Overview
- Impact: The problem affected clients using the Time Off module with leave carry-over policies enabled. The system displayed incorrect leave balances, but no data was lost.
- Timeline: November 17, 2024, 12:30 CET – November 20, 2024, 21:00 CET
Timeline Highlights
- November 17, 2024 – 12:30 CET: A version introducing the bug was deployed.
- November 18, 2024 – ~03:00 CET: Customers began reporting calculation errors.
- November 18, 2024 – 09:50 CET: Root cause identified: issues in the calculation engine connected to the carry-over process.
- November 18, 2024 – 11:38 CET: First patch deployed alongside a mechanism to expedite data correction; fixes began rolling out.
- November 18, 2024 – 13:33 CET: Realization that the correction process would require several hours; active monitoring continued.
- November 19, 2024 – 09:30 CET: Last affected accounts corrected. However, an oversight affecting the calculation of leave days for 2025 was identified.
- November 20, 2024 – 21:00 CET: Enhanced system version deployed, addressing all known issues.
Detection and Cause
Customers reported the issue approximately 12 hours after deployment. The release contained improvements to the carry-over engine that unintentionally changed the leave balances displayed to end users. The test environment did not fully replicate the complexity of the live environment or its scenario-specific conditions, so the issue was not detected before release.
Lessons Learned
- Testing Improvements: Extend testing cycles for critical modules like the accrual engine.
- Staged Roll-outs: Reinstate staged deployments to limit potential impact during updates.
- Communication Enhancements: Strengthen customer communication to ensure timely updates during incidents.
Follow-Up Actions
- We are expanding the accrual engine test cases to cover more edge scenarios.
- We are implementing policies that require staged (partial) roll-outs for all major changes.