Notification emails are not being sent
Incident Report for Nitro Sign
Postmortem

Overview

Starting 28 Sept at 16:28 Dublin time, we stopped sending email notifications to the owners of signed documents.

What Happened

Our digital signature provider (Notarius) carried out planned maintenance that somehow invalidated our certificates, causing our digital signing subsystem to fail. This failure prevented all our signature workflows from completing normally; one symptom was the missing notifications to the owners of signed documents.

Resolution

After identifying the root cause (the Notarius update), we contacted their support team and agreed with them on resetting our certificates.

Root Causes

Unclear. Nothing changed on our side. The most probable cause is that Notarius somehow invalidated our certificates after one of their previous updates to their infrastructure. Refreshing these certificates solved the problem; therefore we can confidently assume that the cause was their planned maintenance update.

Impact

For more than 24 hours, the owners of signed documents did not receive email notifications. The signed documents were correctly stored in the user workspace, but they were not digitally signed.
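
Since this impact went unnoticed until customer support reported it, a periodic check for workflows that finished without a notification could shorten detection time. The following is only a minimal sketch of such a check: `WorkflowRecord`, `find_unnotified`, the field names, and the 15-minute grace period are all hypothetical and do not describe our actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class WorkflowRecord:
    """Hypothetical view of one signature workflow."""
    workflow_id: str
    submitted_at: datetime            # when the last signer submitted the document
    notified_at: Optional[datetime]   # when the owner notification was sent, if ever


def find_unnotified(
    records: Iterable[WorkflowRecord],
    now: datetime,
    grace: timedelta = timedelta(minutes=15),  # assumed tolerance, tune as needed
) -> list[str]:
    """Return IDs of workflows still missing a notification after the grace period."""
    return [
        r.workflow_id
        for r in records
        if r.notified_at is None and now - r.submitted_at > grace
    ]
```

A scheduled job could page the on-call engineer whenever this list is non-empty.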

What Went Well?

  • Notarius tech support was responsive and quickly rotated our certs.

What Didn't Go So Well?

  • No alerts around missing notifications.
  • No direct line with Notarius support.
  • Poor DLQ hygiene. The Dead Letter Queue contained some very old poison pills that slowed us down in replaying the missing events.
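
To illustrate the poison pill problem, here is a minimal sketch of a replay loop that sets aside messages that are too old or that keep failing, instead of retrying them indefinitely. It is not our actual replay tooling: `DeadLetter`, `replay_dead_letters`, the 7-day age cutoff, and the 3-attempt limit are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Iterable


@dataclass(frozen=True)
class DeadLetter:
    """Hypothetical shape of a message sitting in the DLQ."""
    message_id: str
    body: str
    enqueued_at: datetime


@dataclass
class ReplayResult:
    replayed: list[str] = field(default_factory=list)
    quarantined: list[str] = field(default_factory=list)


def replay_dead_letters(
    messages: Iterable[DeadLetter],
    handler: Callable[[DeadLetter], None],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed cutoff for "very old"
    max_attempts: int = 3,                   # assumed retry budget per message
) -> ReplayResult:
    """Replay each message, quarantining stale or repeatedly failing ones."""
    result = ReplayResult()
    for msg in messages:
        # Stale messages are never replayed; they go straight to quarantine.
        if now - msg.enqueued_at > max_age:
            result.quarantined.append(msg.message_id)
            continue
        for _ in range(max_attempts):
            try:
                handler(msg)
            except Exception:
                continue  # try again until the retry budget is spent
            result.replayed.append(msg.message_id)
            break
        else:
            # Every attempt failed: treat the message as a poison pill.
            result.quarantined.append(msg.message_id)
    return result
```

Quarantined messages can then be inspected by hand without blocking the replay of the events that matter.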

Action Items

  • Re-establish a clean and direct line of contact with Notarius.
  • Improve our synthetic tests to also detect missing email notifications during the signature workflow.
  • Improve our alerts around DLQ size.
  • Maintain better DLQ hygiene.
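
As a rough illustration of the DLQ alerting item, the sketch below evaluates a queue snapshot against two thresholds: total depth and age of the oldest message. The names `DlqSnapshot` and `evaluate_dlq` and both default thresholds are hypothetical; real values would depend on our normal traffic and on the metrics our queueing system exposes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DlqSnapshot:
    """Hypothetical point-in-time reading of DLQ metrics."""
    depth: int                      # number of messages currently in the DLQ
    oldest_message_age: timedelta   # age of the oldest message still queued


def evaluate_dlq(
    snapshot: DlqSnapshot,
    max_depth: int = 100,                    # assumed threshold
    max_age: timedelta = timedelta(hours=1), # assumed threshold
) -> list[str]:
    """Return a list of alert messages; an empty list means the DLQ looks healthy."""
    alerts: list[str] = []
    if snapshot.depth > max_depth:
        alerts.append(f"DLQ depth {snapshot.depth} exceeds {max_depth}")
    # Age only matters when the queue is non-empty.
    if snapshot.depth > 0 and snapshot.oldest_message_age > max_age:
        alerts.append(f"Oldest DLQ message is older than {max_age}")
    return alerts
```

Alerting on the age of the oldest message, not only on depth, would also surface a small number of long-lived poison pills before they accumulate.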

Timeline

  • Sept 28 @ 16:28 Our digital signing subsystem started to fail. Documents were no longer digitally signed and notifications to users were missed.
  • Sept 29 @ 10:19am Customer support reported issues with missing email notifications.
  • Sept 29 @ 10:20am Team started to diagnose. Lots of errors in digital-signing-service about failing to sign using our certificates.
  • Sept 29 @ 10:39am Notarius status page reports ongoing maintenance.
  • Sept 29 @ 11:09am NSP is also affected.
  • Sept 29 @ 11:30am Contacted a Notarius tech support engineer and raised the issue with him.
  • Sept 29 @ 12:16pm Notarius is back, but this does not fix the issue.
  • Sept 29 @ 14:47 After determining that the issue was definitely on the Notarius side, we asked their support engineer to rotate our certs.
  • Sept 29 @ 15:41 Notarius completed cert rotation. Our digital signature subsystem starts functioning correctly again. All systems are working correctly now. Email notifications are being sent.
  • Sept 29 @ 15:49 We start to replay old messages in the DLQ, but we notice that among them there are some old poison pills :(
  • Sept 29 @ 18:34 The team decides to pause the replay to avoid bringing down the system.
  • Sept 30 @ 12:55pm All the messages in the DLQ have been replayed successfully. DLQ is purged of its poison pills.
  • Sept 30 @ 12:56pm Incident closed.
Posted Sep 30, 2022 - 15:23 UTC

Resolved
This incident has been resolved.
Posted Sep 30, 2022 - 11:54 UTC
Update
Production systems are OK.

A backlog of failed notifications will need to be re-played.
Posted Sep 29, 2022 - 17:01 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 29, 2022 - 16:59 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 29, 2022 - 10:12 UTC
Investigating
We are currently investigating this issue.
Posted Sep 29, 2022 - 10:03 UTC
This incident affected: Nitro Sign Notification Delivery.