fix: zero silent failures, Bedrock graceful handling, exact page reporting #50

Open
Nishit24113 wants to merge 3 commits into main from fix/error-handling-clean

@Nishit24113

Collaborator

Summary

  • Adobe failures: Report "Adobe API failed for this document" and exit cleanly. This removes the BAD_PDF pdf-lib fallback, which degraded output quality by destroying PDF structure tags on re-save
  • Bedrock graceful handling: Images with an extreme aspect ratio (>20:1) are pre-checked and tagged as "Decorative element", with the exact page number logged to CloudWatch; other Bedrock failures log the exact page and fail the pipeline properly
  • Exact page reporting: Added a page_num column to the SQLite image_data table in the Adobe container, so the alt-text station now logs page=6 instead of the misleading pages=1-200
  • Zero silent failures: The failure-handler Lambda writes result/FAILED_<name>.json on every pipeline failure so the UI stops polling and shows the user what happened instead of spinning indefinitely
  • Title-generator + merger: Replaced the return {"statusCode": 500} pattern with raise so the Step Functions Catch fires on failure instead of treating errors as success
  • Env var fix: s3_bucket is passed correctly to ECS containers via JsonPath.string_at("$.s3_bucket") instead of the broken ContainerOverrides indexing
  • Splitter: Adds total_pages and pages_in_chunk to chunk metadata
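
To illustrate the title-generator and merger change, here is a minimal sketch of why a returned 500 dict does not trigger a Step Functions Catch while a raised exception does. The handler names, the `TitleGenerationError` type, and the `generate_title` stand-in are hypothetical and not taken from this PR; only the `return {"statusCode": 500}` versus `raise` contrast comes from the summary above.

```python
import json


class TitleGenerationError(Exception):
    """Hypothetical error type used only in this sketch."""


def generate_title(event):
    # Stand-in for the real work; fails when the input key is missing.
    if "s3_key" not in event:
        raise KeyError("s3_key")
    return f"Title for {event['s3_key']}"


def handler_returning_500(event, context=None):
    # Old pattern: the Lambda invocation itself completes normally, so the
    # state machine receives an ordinary result and the Catch never fires.
    try:
        title = generate_title(event)
    except Exception as exc:
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps({"title": title})}


def handler_raising(event, context=None):
    # New pattern: the exception propagates out of the handler, the
    # invocation is reported as failed, and a Catch on the task can route
    # the execution to a failure handler.
    try:
        title = generate_title(event)
    except Exception as exc:
        raise TitleGenerationError(f"title generation failed: {exc}") from exc
    return {"statusCode": 200, "body": json.dumps({"title": title})}
```

With an empty event, `handler_returning_500({})` returns a dict whose `statusCode` is 500, whereas `handler_raising({})` raises, which is the behaviour the Catch depends on.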

Test plan

  • Deploy to pdf-dev using deploy.sh
  • Test a normal PDF — verify it still produces clean results matching previous runs
  • Test a PDF with images — verify CloudWatch shows page=<N> for Bedrock activity
  • Verify no pages=1-200 appears in CloudWatch logs
  • Confirm a failed PDF produces FAILED_<name>.json in the S3 result/ folder and the UI stops spinning
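
The two CloudWatch checks in the test plan could be partly automated against exported log lines. The sketch below is one possible approach, not part of this PR: `audit_log_lines` is a hypothetical helper, and the surrounding log message text in the usage example is invented. Only the `page=<N>` and `pages=1-200` tokens come from the description above.

```python
import re

# A page range such as "pages=1-200", which should no longer appear.
PAGE_RANGE = re.compile(r"\bpages=\d+-\d+\b")
# An exact page token such as "page=6".
EXACT_PAGE = re.compile(r"\bpage=(\d+)\b")


def audit_log_lines(lines):
    """Return the lines that still log a page range and the exact pages seen."""
    range_lines = [line for line in lines if PAGE_RANGE.search(line)]
    exact_pages = sorted(
        {int(m.group(1)) for line in lines for m in EXACT_PAGE.finditer(line)}
    )
    return {"range_lines": range_lines, "exact_pages": exact_pages}
```

A run is clean when `range_lines` is empty and `exact_pages` contains the pages that actually hold images, for example:

```python
report = audit_log_lines(["Bedrock call page=6", "tagged decorative page=12"])
assert report["range_lines"] == []
```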

…rting

- Adobe failures: report "Adobe API failed for this document" and exit
  cleanly (no pdf-lib BAD_PDF fallback that degraded output quality)
- Bedrock graceful handling: images with extreme aspect ratio (>20:1) are
  tagged as "Decorative element" with exact page number logged to CloudWatch;
  other Bedrock failures log exact page and fail the pipeline properly
- Exact page reporting: store page_num in SQLite image_data table so alt-text
  station can report "page=6" instead of misleading "pages=1-200"
- Zero silent failures: failure-handler Lambda writes result/FAILED_<name>.json
  on every pipeline failure so UI stops polling and shows the user what happened
- Title-generator and merger: return 500 dict replaced with raise so Step
  Functions Catch fires instead of silently continuing
- Env var fix: s3_bucket passed correctly to ECS containers via JsonPath
- Splitter: adds total_pages and pages_in_chunk to chunk metadata
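
As a rough illustration of the aspect-ratio pre-check and the page_num column described in this commit, the following sketch stores a page number per image and tags extreme images without calling Bedrock. The table name `image_data`, the column `page_num`, the 20:1 threshold, and the "Decorative element" label come from the commit message; every other column, function name, and the handling of zero-sized images are assumptions made for the example.

```python
import sqlite3

MAX_ASPECT_RATIO = 20.0  # threshold stated in the commit message (>20:1)


def is_extreme_aspect_ratio(width, height, limit=MAX_ASPECT_RATIO):
    # Assumption: zero or negative dimensions are also treated as extreme.
    if width <= 0 or height <= 0:
        return True
    return max(width, height) / min(width, height) > limit


def build_demo_db():
    # Hypothetical schema: only image_data and page_num are named in the PR.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_data ("
        "id INTEGER PRIMARY KEY, img_path TEXT, page_num INTEGER, "
        "width INTEGER, height INTEGER)"
    )
    conn.executemany(
        "INSERT INTO image_data (img_path, page_num, width, height) "
        "VALUES (?, ?, ?, ?)",
        [("fig1.png", 3, 800, 600), ("rule.png", 6, 2400, 4)],
    )
    conn.commit()
    return conn


def classify_images(conn):
    """Return (page_num, alt_text) pairs; None means Bedrock is still needed."""
    results = []
    rows = conn.execute(
        "SELECT img_path, page_num, width, height FROM image_data "
        "ORDER BY page_num"
    )
    for img_path, page_num, width, height in rows:
        if is_extreme_aspect_ratio(width, height):
            # Log the exact page rather than a chunk-wide range.
            print(f"skipping Bedrock for {img_path} page={page_num}")
            results.append((page_num, "Decorative element"))
        else:
            results.append((page_num, None))
    return results
```

Keeping the page number next to each image row is what lets a later station report a single page instead of the whole chunk range.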
…k pages

- Adobe container: include actual Adobe error (errorCode, statusCode,
  requestTrackingId) in the failure message instead of generic text;
  compute page_start/page_end from chunk key so the failure marker
  shows which chunk range was being processed
- Alt-text container: when all Bedrock requests fail, collect the page
  numbers of failed images from the SQLite DB and include them in the
  failure detail so the user can see which pages triggered the issue
- Failure-handler: pass page_start/page_end from station errors through
  to the result marker; use station message as user-facing summary
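
The failure marker described in this commit could be assembled roughly as follows. This is a sketch under stated assumptions: the `result/FAILED_<name>.json` key pattern and the `page_start`/`page_end` fields come from the commit messages, while `build_failure_marker`, the JSON field names `status` and `message`, the default message, and the choice to strip the file extension from `<name>` are hypothetical.

```python
import json
import posixpath


def build_failure_marker(pdf_key, station_error):
    """Build the S3 key and JSON body for a failure marker.

    pdf_key: S3 key of the source PDF (hypothetical input shape).
    station_error: dict reported by the failing station (hypothetical shape).
    """
    # Assumption: <name> is the PDF file name without its extension.
    name = posixpath.splitext(posixpath.basename(pdf_key))[0]
    marker_key = f"result/FAILED_{name}.json"
    body = {
        "status": "FAILED",
        # The station message doubles as the user-facing summary.
        "message": station_error.get("message", "Processing failed"),
        # Page range is passed through when the station supplied it.
        "page_start": station_error.get("page_start"),
        "page_end": station_error.get("page_end"),
    }
    return marker_key, json.dumps(body)
```

The resulting key and body could then be uploaded with boto3's `put_object(Bucket=..., Key=..., Body=...)` on an S3 client, so that the UI finds the marker and stops polling.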