Production-oriented AWS CDK v2 infrastructure for an event-driven CSV processing pipeline. Validated through policy-as-code with OPA/Rego.
- Fully event-driven — S3 uploads trigger Step Functions via EventBridge, no Lambda glue code
- Secure by default — KMS encryption everywhere, S3 Object Lock (Compliance mode), VPC endpoints, least-privilege IAM
- Policy-as-code — OPA/Rego policies in CI enforce security & operational guardrails on every commit
- Cost-aware — NAT Gateway is only deployed when needed; configurable retention, TTL-based DynamoDB cleanup
- Fargate-native — Long-running CSV processing via ECS Fargate tasks (5-10 min), not Lambda
- Tested infrastructure — Jest snapshot tests + OPA validation + CDK Nag support
flowchart LR
user[User or upstream system] --> raw[(Raw uploads S3 bucket<br/>KMS, TLS, versioning)]
raw -->|Object Created via EventBridge notifications| rule[EventBridge rule]
raw -. server access logs .-> accessLogs[(Access log bucket<br/>KMS, TLS)]
rule --> sfn[Step Functions state machine]
sfn -->|RunTask sync integration| ecs[ECS Fargate task<br/>private subnets, no public IP<br/>read-only root FS, non-root user]
ecs --> raw
ecs --> processed[(Processed files S3 bucket<br/>KMS, Object Lock, versioning)]
ecs --> failed[(Failed files S3 bucket<br/>KMS, Object Lock, versioning)]
ecs --> secrets[Secrets Manager<br/>KMS]
ecs --> logs[CloudWatch Logs<br/>KMS]
ecs --> vpc[Processing VPC<br/>private subnets]
vpc --> endpoints[Interface VPC endpoints<br/>Secrets Manager, Logs, ECR, SQS]
trail[CloudTrail S3 data events] --> cloudtrailBucket[(CloudTrail bucket<br/>KMS, TLS)]
raw --> kms[KMS keys<br/>storage, operational, secrets]
processed --> kms
failed --> kms
secrets --> kms
logs --> kms
sfn --> jobs[(DynamoDB job table<br/>TTL-based auto-expiry)]
rule -. retry after failed workflow start .-> retry[(RetryQueue)]
macie[Amazon Macie] --> logs
macie --> raw
macie --> failed
ecs -. domain writes .-> db[(Main database)]
npm install
aws configure
cdk bootstrap aws://<account-id>/<region>
npm run build && cdk deploySee full deployment docs for configuration options.
| Control | Implementation |
|---|---|
| Encryption at rest | Customer-managed KMS keys per tier (storage, operational, secrets) with auto-rotation |
| Encryption in transit | TLS enforced on all S3 buckets; VPC endpoints for AWS services |
| Object immutability | S3 Object Lock in Compliance mode for processed & failed buckets |
| Access isolation | Bucket policy Deny rules scoped to ECS task role only |
| Secrets management | ECS Secrets injection — task role has no secretsmanager:GetValue |
| Container hardening | Non-root user, read-only root filesystem, digest-pinned images |
| Audit trail | CloudTrail with S3 data events for all data buckets |
| PII visibility | Amazon Macie for sensitive data discovery with EventBridge alerting |
| Network isolation | Private subnets, no public IP, scoped HTTPS egress |
| Feature | Savings |
|---|---|
| Conditional NAT Gateway | ~$32/mo saved when enrichment APIs are not configured |
| DynamoDB TTL auto-cleanup | Free — reduces manual intervention and storage costs |
| Configurable log retention | Pay only for what you need (default: 30 days) |
| PAY_PER_REQUEST DynamoDB | No provisioned capacity waste for variable workloads |
- Step Functions execution logs and X-Ray tracing
- CloudWatch Logs with KMS encryption for processor containers
- Macie findings routed to CloudWatch Logs and SNS alert topic
- CloudTrail data events for object-level audit across all S3 buckets
- DynamoDB job status table with TTL-based expiry
- AWS CDK v2 (TypeScript) — infrastructure as code
- Amazon S3 — raw uploads, processed files, failed artifacts, access logs
- AWS Step Functions — workflow orchestration with native ECS integration
- Amazon ECS Fargate — long-running CSV processor tasks in private subnets
- Amazon DynamoDB — job metadata ledger with TTL-based auto-cleanup
- Amazon EventBridge — S3 event ingestion and Macie finding routing
- Amazon Macie — sensitive data discovery
- AWS CloudTrail — object-level audit trail
- AWS KMS — customer-managed keys per data tier
- Amazon SQS — dead-letter queue for failed workflow invocations
- AWS Secrets Manager — credential and API key injection
- Amazon VPC — isolated networking with interface endpoints
- Open Policy Agent — policy-as-code validation in CI
- Single stack: One CDK stack keeps the infrastructure easy to review. The app avoids premature module boundaries while still separating deployment configuration into a small helper for testability.
- Event-driven orchestration: S3 Object Created events flow through EventBridge directly into Step Functions. This avoids Lambda glue code and keeps retry/failure behavior visible in the workflow.
- Fargate for long-running work: Processing takes 5-10 minutes, which is too long for a simple synchronous request path. A one-off Fargate task fits a black-box container processor without requiring a continuously running ECS service.
- Step Functions native ECS integration: The workflow uses the native
ecs:runTask.syncintegration so Step Functions waits for task completion and can apply workflow-level retry and catch handling. - Private task networking: Fargate tasks run in private subnets with no public IP. Interface VPC endpoints for Secrets Manager, CloudWatch Logs, ECR, and SQS keep service traffic within the AWS network. A NAT gateway is conditionally included only when enrichment API CIDRs are configured.
- Data protection by default: S3 buckets block public access, enforce SSL, use bucket-owner-enforced object ownership, versioning, and customer-managed KMS encryption. Processed and failed buckets use S3 Object Lock in Compliance mode — objects cannot be deleted or overwritten until the retention period expires, even by the root user.
- Multipart uploads for large files: The raw bucket is intended to receive large CSV files through S3 multipart upload. This improves reliability for multi-GB files and lets clients retry individual parts instead of restarting the whole upload.
- Job metadata table: Step Functions persists job status in DynamoDB using a deterministic key based on bucket, object key, and S3 sequencer. Records auto-expire via TTL after the configured retention period.
- Failure isolation: EventBridge target retries are bounded and failed state-machine invocations are sent to
RetryQueue. Processor failures are captured in the state machine and reflected in the job table before the workflow fails. - Workflow observability: Step Functions execution logs and X-Ray tracing are enabled so orchestration failures can be diagnosed without relying only on container logs.
- PII visibility with Macie: Amazon Macie is enabled for the account/region and Macie findings are captured through EventBridge into both an encrypted CloudWatch log group and an SNS topic for operational alerting.
- Per-bucket data retention: Raw uploads, processed files, and failed processing artifacts each have their own configurable retention period via CDK context or environment variables. Raw and failed files default to 7 days because they can contain unsanitized PII. Processed files default to 7 days as well but can be extended independently for audit or reprocessing needs.
- Least-privilege task role: The task role can read raw inputs and write processed/failed outputs. Secrets are injected by the ECS agent via ECS Secrets — the task role has no
secretsmanager:GetValuepermission. - Portable deployment: Account, region, retention periods, processor image, enrichment API CIDRs, processor compute size, and job TTL are all configurable through environment variables or CDK context so the same code can deploy to another AWS account or region without source edits.
- Configurable processor image: The stack does not hardcode the processor container. Deployments provide a digest-pinned image URI through
PROCESSOR_IMAGEor-c processorImage=.... Tags like:latestare rejected at synthesis time.
- Public S3 access is blocked on all buckets.
- Buckets enforce TLS using
enforceSSL. - All S3 buckets (raw, processed, failed) have lifecycle expiration configured independently to control data longevity and PII exposure.
- Processed and failed buckets use S3 Object Lock in Compliance mode — objects are WORM-protected for the full retention period.
- All data buckets log access to a dedicated S3 server access log bucket for auditability.
- S3 access is restricted to only the ECS task role via explicit bucket policy Deny rules.
- S3, DynamoDB, SQS, Secrets Manager, and CloudWatch Logs use separate customer-managed KMS keys per data tier (storage, operational, secrets) to limit blast radius. All keys have rotation enabled.
- CloudTrail is enabled with S3 data events for all three data buckets to support object-level audit.
- Amazon Macie is enabled to support sensitive data discovery and S3 data security findings.
- Macie findings are routed to both CloudWatch Logs and an SNS topic for operational alerting.
- ECS tasks run in private subnets and use a security group that allows DNS plus outbound HTTPS only.
- The placeholder container runs as a non-root user with a read-only root filesystem.
- The ECS task role is granted read access only to the raw bucket and write access only to the processed and failed buckets. Secrets are not accessible to the task role.
- Secrets are injected at container start by the ECS agent using ECS Secrets, not passed as plain-text environment variables.
- Container images must be pinned to a digest (
@sha256:...) — tags like:latestare rejected at synthesis time. - The Step Functions execution role is explicitly defined and scoped to the specific DynamoDB table, ECS task definition, and log group.
- Interface VPC endpoints for Secrets Manager, CloudWatch Logs, ECR, and SQS keep service traffic within the AWS network.
- The EventBridge rule uses an input transformer to strip sensitive S3 event metadata (user identity, source IP) before passing the event to Step Functions.
- HTTPS egress is scoped to the VPC CIDR (for interface endpoints) and configurable enrichment API CIDRs — no wildcard
0.0.0.0/0HTTPS egress. - IAM permissions are intentionally resource-scoped through CDK grants where possible.
- Production systems should still consider malware scanning, stricter API egress allowlists, centralized audit retention, and organization-level guardrails.
- No Lambda preprocessor is included; Step Functions receives the S3 event directly from EventBridge to keep the design small.
- No main relational database, RDS proxy, or domain schema is deployed because the prompt treats the processor and main database as external concerns. The DynamoDB table is only an operational job ledger for orchestration status.
- The stack configures the processor runtime environment but does not implement the processor image itself. The container image is expected to own CSV parsing, enrichment, output writing, and any domain database updates.
- A NAT gateway is conditionally deployed based on enrichment API CIDRs. Interface VPC endpoints cover the majority of AWS service traffic.
- Fargate is simpler than a persistent ECS service or AWS Batch for this use case. AWS Batch could be attractive for heavier scheduling, queues, or very high concurrency.
- Multipart upload initiation and presigned URL generation are intentionally outside this stack because the assignment does not include an upload API. A real product would add an authenticated API for creating multipart upload sessions.
- The job table key includes the S3 sequencer, so overwrites of the same object become separate processing records. If the product needs strict one-record-per-object idempotency, use bucket and key alone with conditional writes.
- Macie findings are routed to CloudWatch Logs and an SNS topic. In production, the SNS topic would typically subscribe Slack/PagerDuty or route to Security Hub for a full alerting workflow.
- CSV parsing, PII scrubbing, enrichment logic, or database writes inside the processor application.
- Upload API, authentication flow, or presigned multipart upload URL generation.
- Main database, schema migrations, RDS proxy, or data access layer.
- Production alerting, dashboards, runbooks, or incident response automation.
- Multi-account deployment pipeline or automated production deployment.
- Full compliance controls such as data subject deletion workflows, legal hold, or centralized audit retention.
- Replace the sample processor image URI with the real pinned processor image in ECR.
- Add user-facing APIs over the job metadata table for progress and retry visibility.
- Add operational alerts for failed workflow executions.
- Add Macie custom data identifiers for domain-specific customer identifiers if the default managed identifiers are not enough.
- Add reserved concurrency controls or EventBridge/SQS buffering if many large files can arrive at once.
- Add integration tests that assert the synthesized IAM policy scope and Step Functions input paths.
npm install
aws configure
cdk bootstrap aws://<account-id>/<region>
npm run build
cdk synth
cdk deployaws configure should point to the AWS account where you want to deploy. The stack is portable across accounts and regions; set the target account and region through your AWS CLI profile, or export them explicitly:
export AWS_ACCOUNT_ID=<your-account-id>
export AWS_REGION=<aws-region>
export PROCESSOR_IMAGE=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/csv-processor@sha256:<hex>
cdk bootstrap aws://$AWS_ACCOUNT_ID/$AWS_REGION
cdk deployIf neither AWS_ACCOUNT_ID nor AWS_REGION is set, CDK uses the active CLI profile/default environment. Resource names are intentionally not exposed in stack outputs to reduce attack surface. Use the AWS CLI or Console to discover resource names after deployment:
# List S3 buckets with the data-processing prefix
aws s3 ls | grep data
# Find the DynamoDB job table
aws dynamodb list-tables | grep ProcessingJobs
# Find the retry queue
aws sqs list-queues --queue-name-prefix DataProcessingInfrastructureAll configuration is available through CDK context (-c key=value) or environment variables.
| Option | CDK Context | Environment Variable | Default |
|---|---|---|---|
| Processor image | processorImage |
PROCESSOR_IMAGE |
Required |
| Raw file retention | rawFileRetentionDays |
RAW_FILE_RETENTION_DAYS |
7 |
| Processed file retention | processedFileRetentionDays |
PROCESSED_FILE_RETENTION_DAYS |
7 |
| Failed file retention | failedFileRetentionDays |
FAILED_FILE_RETENTION_DAYS |
7 |
| Enrichment API CIDRs | enrichmentApiCidrs |
ENRICHMENT_API_CIDRS |
(none) |
| Job retention | jobRetentionDays |
JOB_RETENTION_DAYS |
30 |
| Processor CPU | processorCpu |
PROCESSOR_CPU |
1024 |
| Processor memory | processorMemory |
PROCESSOR_MEMORY |
2048 |
| Log retention | logRetentionDays |
LOG_RETENTION_DAYS |
30 |
# Per-bucket retention
cdk deploy -c rawFileRetentionDays=14 -c processedFileRetentionDays=30 -c failedFileRetentionDays=90
# Processor compute
cdk deploy -c processorCpu=512 -c processorMemory=1024
# Enrichment API CIDRs (enables NAT Gateway automatically)
cdk deploy -c enrichmentApiCidrs=203.0.113.0/24,198.51.100.0/24
# Log retention
export LOG_RETENTION_DAYS=14
cdk deployThe processor image must be pinned to a digest:
cdk deploy -c processorImage=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/csv-processor@sha256:<hex>Job records in DynamoDB expire after a configurable number of days (default 30):
cdk deploy -c jobRetentionDays=60Run CDK Nag security checks by setting CDK_NAG=1:
CDK_NAG=1 npx cdk synthGitHub Actions runs a CI workflow on pull requests and pushes to main:
npm cinpm run buildnpm test -- --runInBandnpx cdk synthopa evalagainst the synthesized template for IAM, S3, networking, logging, compute, and data guardrails
The workflow intentionally does not deploy yet. For deployment automation, add a separate workflow using GitHub OIDC and an AWS IAM role scoped to this stack instead of storing long-lived AWS access keys in GitHub secrets.
OPA/Rego policies in policy/ validate the synthesized CloudFormation template:
| Domain | Policy | Check |
|---|---|---|
| IAM | deny_iam_policy_star_action |
No wildcard IAM actions |
| IAM | deny_iam_role_managed_policy_admin |
No AdministratorAccess policy |
| IAM | deny_iam_role_star_assume |
No wildcard assume-role principals |
| S3 | deny_s3_no_encryption |
Encryption enabled on all buckets |
| S3 | deny_s3_no_block_public_access |
Public access blocked |
| S3 | deny_s3_bucket_acl_not_enforced |
BucketOwnerEnforced ownership |
| S3 | deny_s3_versioning_disabled |
Versioning enabled |
| Network | deny_security_group_open_ingress |
No public ingress |
| Network | deny_security_group_open_egress |
No public egress |
| Logging | deny_log_group_without_retention |
Retention configured |
| Logging | deny_cloudtrail_without_validation |
Log file validation |
| Logging | deny_cloudtrail_without_kms |
KMS encryption |
| Security | deny_kms_key_rotation_disabled |
Key rotation enabled |
| Security | deny_secrets_manager_no_encryption |
KMS on secrets |
| Compute | deny_ecs_task_definition_unpinned_image |
Digest-pinned images |
| Compute | deny_ecs_task_definition_privileged_mode |
No privileged mode |
| Compute | deny_ecs_task_definition_root_user |
Non-root user |
| Data | deny_dynamodb_no_pitr |
Point-in-time recovery |
| Data | deny_sqs_no_encryption |
KMS on queues |
├── .github/
│ ├── workflows/ci.yml # CI pipeline
│ └── PULL_REQUEST_TEMPLATE.md
├── bin/
│ └── data-processing-infrastructure.ts # CDK app entry point
├── lib/
│ ├── data-processing-infrastructure-stack.ts # Main stack
│ └── deployment-config.ts # Configuration resolver
├── policy/ # OPA/Rego policies (6 domains)
│ ├── compute.rego
│ ├── data.rego
│ ├── iam.rego
│ ├── logging.rego
│ ├── network.rego
│ ├── s3.rego
│ └── security.rego
├── test/
│ └── data-processing-infrastructure.test.ts
├── package.json
├── tsconfig.json
├── jest.config.js
├── cdk.json
├── CONTRIBUTING.md
└── LICENSE
MIT