Redirect errors on Curve LMS sites
Incident Report for Thrivist
Postmortem

The shared production Redis database that handles caching for Curve ran out of “freeable memory”, which is essentially a cache server’s equivalent of disk space. When Curve cannot reach Redis, it attempts to redirect the page; the browser then fails to display it and shows a “too many redirects” error. The issue occurred between 11:27am and 11:57am CDT. Thankfully, we received 0 reports of issues from clients, likely because the issue arose during the lunch hour.

We are doing three things to address this:

  1. We are adding alarms so we will be notified when the memory available to Redis drops below a set threshold. This will at least alert us to a similar condition so we can act before it becomes an issue.
  2. We will build automatic clearing of the Redis cache into the Curve application's production deploys. For reference, the cache ran from the beginning of July until today before its freeable memory filled up, so purging at least once a month should keep things flowing smoothly.
  3. We will develop an error page in Curve that is more helpful to end users and links to the Thrivist status page: http://status.thrivist.com/
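
As a rough illustration of the alarm in item 1, the sketch below parses the text returned by the Redis `INFO memory` command and flags when remaining headroom falls below a threshold. It is only a sketch under stated assumptions, not our production monitoring: the function names and the 20% threshold are hypothetical, and it assumes a `maxmemory` limit is configured on the server.

```python
def parse_info_memory(info_text):
    """Collect the integer-valued fields from Redis `INFO memory` output."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.isdigit():
            fields[key] = int(value)
    return fields


def memory_alarm(info_text, min_free_fraction=0.2):
    """Return True when free headroom is below min_free_fraction.

    min_free_fraction is a hypothetical threshold chosen for illustration.
    """
    fields = parse_info_memory(info_text)
    used = fields.get("used_memory")
    limit = fields.get("maxmemory")
    if used is None or not limit:
        # maxmemory of 0 means no limit is configured, so headroom
        # cannot be computed from these fields alone.
        return False
    free_fraction = (limit - used) / limit
    return free_fraction < min_free_fraction
```

A scheduled job could pass the `INFO memory` output to `memory_alarm` and send a notification when it returns True.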
Posted Sep 05, 2018 - 12:27 CDT

Resolved
All Curve LMS production instances are now operational. The cache database was cleared, which resolved the memory issue. We are monitoring the environment and planning a long-term fix to address the issue.
Posted Sep 05, 2018 - 12:03 CDT
Identified
The Curve LMS cache database is having memory issues, causing errors for all production Curve LMS sites. The issue has been identified and a fix is being applied.
Posted Sep 05, 2018 - 11:54 CDT
Investigating
We are seeing redirect errors on client Curve LMS sites. We are investigating the issue.
Posted Sep 05, 2018 - 11:27 CDT
This incident affected: curve (Curve LMS).