Errors for some multi-step custom workflows
Incident Report for One Codex
Postmortem

After a deployment on January 17, 2024, some multi-step workflows hit unexpected JSON parsing errors in intermediate steps, causing those workflows to fail.

While investigating these errors, we quickly traced the problem to a series of changes made as part of our upcoming support for extended workflow parametrization options. These included changes to how step outputs, specifically JSON files, were captured. We ran our standard suite of tests to verify these changes before deploying, but we did not test them against customer pipelines specifically, some of which included steps with problematic output schemas.

To prevent such issues moving forward, we plan to expand our test suite to include more complex workflow pipelines and to introduce an additional sign-off and testing step for complex workflow orchestrator changes.
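As an illustration only, the kind of regression check described above might look like the following minimal sketch. It is not our orchestrator's actual code: the names `CapturedOutput`, `capture_step_output`, `SAMPLE_OUTPUTS`, and `run_regression` are hypothetical, and the sample payloads are invented to show different top-level JSON shapes. It relies only on Python's standard `json` module.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CapturedOutput:
    """Result of capturing one step's JSON output (hypothetical type)."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


def capture_step_output(raw: str) -> CapturedOutput:
    """Parse a step's raw output as JSON without assuming a top-level shape."""
    try:
        return CapturedOutput(ok=True, value=json.loads(raw))
    except json.JSONDecodeError as exc:
        # Report the failure instead of raising, so one bad step can be
        # surfaced clearly rather than crashing the capture logic.
        return CapturedOutput(ok=False, error=f"{exc.msg} at position {exc.pos}")


# Invented sample outputs covering shapes a simple test suite might miss.
SAMPLE_OUTPUTS = {
    "object": '{"reads": 1200, "passed": true}',
    "array": '[{"sample": "a"}, {"sample": "b"}]',
    "scalar": "42",
    "nested": '{"results": {"taxa": [{"id": 1, "children": []}]}}',
    "truncated": '{"reads": 1200',
}


def run_regression(samples: dict) -> list:
    """Return the names of samples whose output could not be captured."""
    return [name for name, raw in samples.items()
            if not capture_step_output(raw).ok]


if __name__ == "__main__":
    print("Samples that failed to parse:", run_regression(SAMPLE_OUTPUTS))
```

In a sketch like this, the sample set would be extended with output shapes drawn from more complex pipelines, so that a change to output capture is exercised against them before deployment.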

Timeline:

  • 1:26 PM PST: We received alerting for failed workflow runs.
  • 1:56 PM PST: We identified the issue and began working on a fix.
  • 2:42 PM PST: We deployed updated code and re-ran all affected workflows.
Posted Jan 23, 2024 - 18:53 UTC

Resolved
Some multi-step workflows are experiencing unexpected internal failures across our workflow orchestration platform.
Posted Jan 17, 2024 - 21:26 UTC