Issues with displaying the leave balance and carry over calculation
Incident Report for Calamari
Postmortem

Overview

  • Impact: The problem affected clients using the Time Off module with leave carry-over policies enabled. The system displayed incorrect leave balances, but no data was lost.
  • Timeline: November 17, 2024, 12:30 CET – November 20, 2024, 21:00 CET

Timeline Highlights

  1. November 17, 2024 – 12:30 CET: A version introducing the bug was deployed.
  2. November 18, 2024 – ~03:00 CET: Customers began reporting calculation errors.
  3. November 18, 2024 – 09:50 CET: Root cause identified—issues in the calculation engine connected to the carry-over process.
  4. November 18, 2024 – 11:38 CET: First patch deployed alongside a mechanism to expedite data correction; fixes began rolling out.
  5. November 18, 2024 – 13:33 CET: Realization that the correction process would require several hours; active monitoring continued.
  6. November 19, 2024 – 09:30 CET: Final accounts corrected. However, an oversight affecting the calculation of leave days for 2025 was identified.
  7. November 20, 2024 – 21:00 CET: Enhanced system version deployed, addressing all known issues.

Detection and Cause

The issue was reported by customers approximately 12 hours after deployment. We were deploying improvements to the carry-over engine, and the change unintentionally affected the leave balances displayed to end users. The complexity of the production environment and its scenario-specific conditions were not fully replicated in the test environment, so the issue was not detected there.

Lessons Learned

Testing Improvements - Extend testing cycles for critical modules like the accrual engine.

Staged Roll-outs - Reinstate staged deployments to limit potential impact during updates.

Communication Enhancements - Strengthen customer communication to ensure timely updates during incidents.

Follow-Up Actions

  • We are expanding test cases to cover more edge scenarios in the accrual engine.
  • We are implementing policies that ensure staged (partial) roll-outs for all major changes.
Posted Nov 21, 2024 - 14:42 CET

Resolved
We've restored 100% of accounts and are now monitoring the systems.
Posted Nov 19, 2024 - 16:52 CET
Update
We've restored 100% of accounts in the US and ASIA regions and are steadily restoring the data for clients in the EU. Currently, we've fixed approximately 80% of affected accounts in the EU.
Posted Nov 19, 2024 - 09:30 CET
Update
We are steadily restoring the data. Currently, we've fixed approximately 50% of affected accounts.
Posted Nov 18, 2024 - 17:08 CET
Update
We are steadily restoring the data. Currently, we've fixed approximately 25% of affected accounts.
Posted Nov 18, 2024 - 15:06 CET
Update
We are steadily restoring the data. Currently, we've fixed approximately 15% of affected accounts. Correcting the balances will take longer than we expected.
Posted Nov 18, 2024 - 13:33 CET
Update
We are steadily restoring the data to its state from just before our last release. The ETA for correcting balances in all accounts is 60-90 minutes.
Posted Nov 18, 2024 - 12:46 CET
Monitoring
We are steadily restoring the data to its state from just before our last release. The restoration process is running in the production environment. Carry-over balances will be fixed within the next 30-60 minutes for the first group of our clients.
Posted Nov 18, 2024 - 11:38 CET
Update
We've found a solution to prevent the problem from spreading—it is currently rolling out to the production environment. We are still working on a solution for recalculating carry-over to the correct amount.
Posted Nov 18, 2024 - 10:46 CET
Update
We are working on a solution to ensure the consistency of all our clients' data. We do not have an ETA for solving the problem yet, but we will let you know as soon as we have one.
Posted Nov 18, 2024 - 10:23 CET
Identified
We found the cause of the issue and are working on a solution. The problem is related to our recent release, which contained improvements to the carry-over engine.
Posted Nov 18, 2024 - 09:50 CET
Investigating
We are currently investigating the issue.
Posted Nov 18, 2024 - 02:00 CET
This incident affected: Web Apps - EU, Web Apps - US, and Web Apps - ASIA.