The shared production Redis database, which handles caching for Curve, ran out of “freeable memory,” which is essentially a cache server’s disk space. When Curve cannot reach Redis, it attempts to redirect the page; the browser then fails to load it and shows a “too many redirects” error. The issue occurred between 11:27am and 11:57am CDT. Thankfully we received no reports of issues from clients, likely because the issue arose during the lunch hour.
We are doing three things to address this:
- We are adding alarms so we will be notified when Redis's freeable memory drops below a set threshold. This will at least alert us to a similar condition so we can act before it becomes an issue.
- We will build auto-clearing of the Redis cache into the Curve application so that it runs on production deploys. For reference, it took from the beginning of July until today for the freeable memory to run out, so purging at least once a month should keep things flowing smoothly.
- We will develop an error page in Curve that is more helpful to end users and links to the Thrivist status page: http://status.thrivist.com/
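
As a rough illustration of the first and third items, here is a minimal Python sketch. It is not Curve's actual code: the function names, the `CacheUnavailable` exception, the 20% threshold, and the 503 response are all assumptions made for the example. Only the status page URL comes from this report.

```python
from typing import Callable, Tuple

STATUS_PAGE_URL = "http://status.thrivist.com/"  # status page named in this report


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache (Redis) cannot be reached."""


def memory_is_low(freeable_bytes: int, total_bytes: int,
                  min_free_fraction: float = 0.2) -> bool:
    """Return True when freeable memory falls below the alert threshold.

    The 20% default is an assumed value, not a figure from the incident.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return freeable_bytes / total_bytes < min_free_fraction


def render_page(fetch_from_cache: Callable[[str], str], key: str) -> Tuple[int, str]:
    """Return (status_code, body), serving an error page instead of redirecting."""
    try:
        return 200, fetch_from_cache(key)
    except CacheUnavailable:
        # Fail with a helpful page rather than a redirect, so the browser
        # cannot end up in a "too many redirects" loop.
        body = ("Curve is temporarily unavailable. "
                f"Check {STATUS_PAGE_URL} for updates.")
        return 503, body
```

In this sketch, `memory_is_low` would be called by whatever monitoring job raises the alarm, and `render_page` stands in for the request path that currently redirects when Redis is unreachable.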