Salsa Latency Issues

Postmortem

Issue Summary:

On January 21st, some Salsa Engage clients began experiencing slow responses from several parts of the product: webpages, email sending, email statistics, and some reports. Over the following days, up until February 7th, some clients continued to experience issues of varying severity. As of February 8th, normal performance resumed.

Root Cause:

The infrastructure for all of Salsa, including both Salsa CRM and Salsa Engage, was being upgraded. The entirety of CRM was upgraded without issue; however, Engage ran into problems when its database was upgraded. This was primarily due to a race condition that surfaced as the new database and infrastructure were spun up. Although the race condition was resolved, a significant backlog had built up by then and needed to ‘catch up’ and re-synchronize. In addition, verbose logging during the upgrade consumed considerably more disk space than had been planned for. Together, these events had a knock-on effect on some reports, elements of email sending, and webpage performance. While no component was entirely non-functional, slowness in some components worsened the performance of many others that were not initially affected. Once the backlog had been purged, logging had been updated, and reports had been manually re-queued, Salsa Engage returned to its expected performance levels, and in fact to better response times, throughput, and output than before the upgrade.

Prevention:

Detailed error aggregation has been put in place and logging capacity limits are now enforced. Database performance has been substantially upgraded and auto-scaling has been enabled. Locked jobs are now automatically identified and released using auto-scaling compute capacity.

Posted Feb 12, 2025 - 15:17 EST

Resolved

The incident has been successfully resolved.
Posted Feb 05, 2025 - 16:29 EST

Update

The reported latency issues with reports and the results page have been addressed. Queue activity is now current and responses should be received within 15 minutes. We will continue to monitor performance.
Posted Feb 05, 2025 - 09:24 EST

Update

The reports queue has been significantly reduced, so reports and results page data are now returned within 1 to 2 hours. The latency issues should resolve completely once the queue has been fully processed, which we expect within the next 24 hours.
Posted Feb 04, 2025 - 17:19 EST

Update

We have received reports of latency associated with reports and the results page. Due to end-of-month activities, we experienced an increase in queue activity, which led to delays. The queue is now reducing as processing catches up, but users may still experience a delay of up to 24 hours in data updates.
Posted Feb 04, 2025 - 13:23 EST

Update

Performance monitoring remains ongoing. We are actively addressing an issue causing minor latency challenges for some users and we will continue to post regular updates.
Posted Feb 04, 2025 - 09:27 EST

Update

We continue to monitor performance and are actively addressing an issue causing minor latency challenges for some users. We are focused on restoring regular operations for all users as quickly as possible.
Posted Feb 03, 2025 - 17:34 EST

Update

Performance has improved following updates completed over the weekend. Performance monitoring has continued with no identified degradation.
Posted Feb 03, 2025 - 12:35 EST

Update

Performance monitoring continued throughout the weekend. While no additional issues have been identified, the technical teams will continue to monitor.
Posted Feb 03, 2025 - 09:31 EST

Update

We are proactively monitoring performance and taking necessary steps to restore regular operations. Throughout the weekend, our technical teams will provide 24/7 coverage, with additional resources in place to minimize delays and ensure information and updates are displayed accurately and promptly.
Posted Jan 31, 2025 - 16:53 EST

Update

We are continuing to monitor performance. No additional issues have been identified.
Posted Jan 31, 2025 - 13:05 EST

Update

Performance monitoring continued overnight. No additional issues have been identified.
Posted Jan 31, 2025 - 09:20 EST

Update

We are continuing to monitor performance. No additional issues have been identified.
Posted Jan 30, 2025 - 17:46 EST

Update

We continue to monitor performance following the updates made to address the remaining residual issues: minor latency, syncing delays, cloning functionality, and email deliverability. No new issues have been identified.
Posted Jan 30, 2025 - 13:11 EST

Update

We are continuing to monitor performance following the updates made to address the remaining residual issues: minor latency, syncing delays, cloning functionality, and email deliverability.
Posted Jan 30, 2025 - 09:06 EST

Update

A few updates have been made to address the remaining residual issues: minor latency, syncing delays, cloning functionality, and email deliverability. We will continue to monitor to ensure there is no further performance regression.
Posted Jan 29, 2025 - 17:02 EST

Update

We continue to actively monitor performance. No new issues have been identified following the implemented fix. We continue to focus on the remaining residual issues (minor latency, syncing delays, cloning functionality, and email deliverability), with full resolution expected within the next day.
Posted Jan 29, 2025 - 13:16 EST

Update

We continued to monitor performance overnight. No further issues have been identified following the implemented fix. We continue to focus on the remaining residual issues (minor latency, syncing delays, cloning functionality, and email deliverability), with full resolution expected within the next day.
Posted Jan 29, 2025 - 09:00 EST

Update

We are still actively monitoring performance. No additional issues have been identified following the implemented fix. Focus continues on the remaining residual issues (minor latency, syncing delays, cloning functionality, and email deliverability), with full resolution expected within the next two days.
Posted Jan 28, 2025 - 17:08 EST

Update

Performance monitoring remains ongoing, and no new issues have been identified following the implemented fix. Focus continues on the remaining residual issues (minor latency, syncing delays, cloning functionality, and email deliverability), with full resolution expected within the next two days.
Posted Jan 28, 2025 - 13:27 EST

Update

We are continuing to monitor the situation and have observed that most systems are performing as expected following the fix implemented during the emergency maintenance window. However, there are still a few residual items we are working to address, including minor latency, syncing delays, cloning functionality, and email deliverability. These issues are gradually improving, and we anticipate resolution within the next 48 hours.
Posted Jan 28, 2025 - 09:22 EST

Update

The email deliverability issues have been resolved. We will continue to monitor performance to ensure there are no further issues.
Posted Jan 24, 2025 - 14:27 EST

Update

Following the emergency maintenance, we have observed significant improvements in overall product performance. However, we are currently investigating reports of email deliverability issues and will provide updates as more information becomes available.
Posted Jan 24, 2025 - 11:21 EST

Monitoring

A fix was implemented during the emergency maintenance window and subsequently tested. The maintenance window has now ended and we are monitoring performance. While Salsa processes the backlog of tasks that built up during this incident over the last couple of days, it will remain sluggish, though increasingly less so. We expect normal performance to resume by Sunday evening.
Posted Jan 23, 2025 - 07:01 EST

Update

We are initiating emergency maintenance to enable further investigation and resolution of this Salsa Engage issue. The maintenance will begin at 05:15 EST and be complete by 06:45 EST. We sincerely apologize for any inconvenience this may cause.
Posted Jan 23, 2025 - 05:03 EST

Update

The team continues to work on resolving the issues with Salsa Engage. While we are still in the Investigating stage, we have identified two possible causes and are actively working to eliminate one of them. Further updates will be provided as we move to the Identified stage.
Posted Jan 23, 2025 - 03:09 EST

Update

We are fully focused on resolving the latency issues affecting the Salsa product. An expanded team is actively working through all aspects of the problem, and we will continue working overnight to resolve it as quickly as possible. We will provide another update tomorrow morning.
Posted Jan 22, 2025 - 17:30 EST

Update

We are continuing to investigate this issue.
Posted Jan 22, 2025 - 14:12 EST

Investigating

We are aware of the latency issues currently affecting the Salsa product and understand the impact this may have on your work. Our team is actively investigating the cause and working to resolve this as quickly as possible.

We will keep you updated on our progress and provide more information as it becomes available.
Posted Jan 22, 2025 - 11:04 EST
This incident affected: Salsa.