Two stacks are all you need: Permanent data vs. ephemeral compute
I use two CDK stacks on every project. Not three, not seven, just two.
One stack is for data that must never be deleted. The other is for compute I can tear down and rebuild whenever needed.
I have shipped three different production apps with this pattern, and it has held up every time.
The short version: stack boundaries should follow failure tolerance, not abstract architectural neatness.
The two-stack split
The idea is simple: every resource in your app falls into one of two buckets.
Data stack (permanent)
Anything that holds state you cannot afford to lose: databases, S3 buckets, secrets, VPC networking, and setup that would cost real time, budget, or pain to recreate.
This stack deploys first and should not change often. When it does change, review the diff carefully.
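Reviewing that diff is a single CLI call; a sketch assuming the `DataStack` stack ID used in the code later in this post:

```shell
# Show pending CloudFormation changes for the data stack only,
# without deploying anything. Exit code is non-zero when there is a diff.
cdk diff DataStack
```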
Ephemeral stack (everything else)
App Runner services, ECS tasks, Lambdas, API Gateway, CloudFront, and any resource you can rebuild from code without losing state.
Need a big infra change? Delete the ephemeral stack, redeploy, and move on.
This is why my deploys stay under 10 minutes. I am not waiting on database provisioning or networking every time.
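That tear-down-and-rebuild workflow is just two CLI calls; a sketch assuming the `EphemeralStack` stack ID used in the code below:

```shell
# Destroy and rebuild only the ephemeral stack; the data stack is untouched.
cdk destroy EphemeralStack --force   # --force skips the interactive confirmation
cdk deploy EphemeralStack
```

Because nothing stateful lives in this stack, destroying it is a routine operation rather than an incident.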
Dependency rule
The ephemeral stack depends on the data stack, never the other way around.
Here is what the wiring looks like:
```typescript
import * as cdk from 'aws-cdk-lib';
import { DataStack } from '../lib/data-stack';
import { EphemeralStack } from '../lib/ephemeral-stack';

const app = new cdk.App();
const env = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};

const data = new DataStack(app, 'DataStack', { env });
const ephemeral = new EphemeralStack(app, 'EphemeralStack', {
  env,
  vpc: data.vpc,
  database: data.database,
  bucket: data.bucket,
});
ephemeral.addDependency(data);
```

The data stack exposes what the ephemeral stack needs through public properties:
```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

export class DataStack extends cdk.Stack {
  public readonly vpc: ec2.IVpc;
  public readonly database: rds.IDatabaseInstance;
  public readonly bucket: s3.IBucket;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.vpc = new ec2.Vpc(this, 'Vpc', {
      maxAzs: 2,
      natGateways: 1,
    });

    this.database = new rds.DatabaseInstance(this, 'Database', {
      engine: rds.DatabaseInstanceEngine.postgres({
        version: rds.PostgresEngineVersion.VER_16,
      }),
      vpc: this.vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      removalPolicy: cdk.RemovalPolicy.RETAIN,
    });

    this.bucket = new s3.Bucket(this, 'Storage', {
      removalPolicy: cdk.RemovalPolicy.RETAIN,
      autoDeleteObjects: false,
    });
  }
}
```

Notice `removalPolicy: cdk.RemovalPolicy.RETAIN` on the database and bucket. That is the whole point of this stack.
Even if the stack is deleted or a resource is slated for replacement, CloudFormation leaves the database and bucket in place instead of destroying them.
The ephemeral stack takes those resources as props and builds everything else:
```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as apprunner from '@aws-cdk/aws-apprunner-alpha';
import { Construct } from 'constructs';

interface EphemeralStackProps extends cdk.StackProps {
  vpc: ec2.IVpc;
  database: rds.IDatabaseInstance;
  bucket: s3.IBucket;
}

export class EphemeralStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: EphemeralStackProps) {
    super(scope, id, props);

    // App Runner services, ECS tasks, Lambdas, API Gateway, etc.
    // Anything that can be rebuilt from code goes here.
    // Everything references the data stack through props.
    const appService = new apprunner.Service(this, 'AppService', {
      source: apprunner.Source.fromEcrPublic({
        imageIdentifier: 'your-image:latest',
      }),
      vpcConnector: new apprunner.VpcConnector(this, 'VpcConnector', {
        vpc: props.vpc,
        vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      }),
    });
  }
}
```

That is it. Two stacks, one dependency direction, clean separation.
What brought me here
Starting constraints
I did not start with this process. I started with the “right” way and seven stacks.
I had a team of myself, two in-house engineers (high commitment), and two Fiverr engineers (low commitment). Our mission was to take over a NextJS app on a GCP Kubernetes cluster that had not been updated in about six months, finish two in-progress features, and then build an enterprise offering.
The timeline for those features was effectively “yesterday,” which usually means already late and somehow expected to ship in days.
The app had zero infrastructure as code. Everything was deployed manually by an engineer who was mostly out of pocket by then. The main dev who wrote most of the code quit within a week of my onboarding.
It was do or die in a few months.
First implementation
I needed something I could teach quickly, so I migrated everything to AWS CDK with a recommended seven-stack architecture.
Database in private subnets, two App Runner services for main and admin apps, ECS Fargate for OCR, secrets in their own stack, networking in its own stack, and the rest cleanly separated like the guides suggest.
It worked great for about three weeks.
Where seven stacks went sideways
Failure mode 1: Secrets became blocked
Problems started when we needed to add secrets.
We were splitting out a third app for enterprise users, and each app needed its own NextAuth secrets. Those belonged in the secrets stack, which deployed before the app stack.
Except the secrets stack had hard references into both the app stack and database stack. CloudFormation could not modify it cleanly while dependent stacks were already in play.
A simple secret change failed to deploy.
Failure mode 2: Dependencies compounded
We were not going to wipe the app and database to add a secret. So we took a shortcut and put new secrets directly in the app stack.
That got us moving and hit the deadline, but it created technical debt immediately.
Then each new infra change ran into more cross-stack tangles. Eventually we had circular dependencies CloudFormation could not resolve.
Every deploy became a puzzle: order of operations, update sequencing, and deadlocks.
The rewrite
I rewrote the CDK codebase and migrated each environment to two stacks: data and ephemeral.
The data stack goes first and keeps everything painful to lose plus foundational setup. The ephemeral stack holds everything else.
The ephemeral stack depends on the data stack. Full stop.
That rewrite took one weekend, which was about the same amount of time I had been losing every month to deployment weirdness.
More stacks means more boundaries, and every boundary is a place CloudFormation can deadlock your deploy.
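To see why, here is a toy model (plain TypeScript, not CDK; the stack names are hypothetical) of the ordering problem CloudFormation has to solve: a deploy order exists only if the cross-stack reference graph has no cycles.

```typescript
// Each stack lists the stacks it imports values from. A valid deploy
// order is a topological sort of this graph; a cycle means no order exists.
type Graph = Map<string, string[]>;

function deployOrder(deps: Graph): string[] | null {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (stack: string): boolean => {
    if (state.get(stack) === 'done') return true;
    if (state.get(stack) === 'visiting') return false; // cycle detected
    state.set(stack, 'visiting');
    for (const dep of deps.get(stack) ?? []) {
      if (!visit(dep)) return false;
    }
    state.set(stack, 'done');
    order.push(stack); // dependencies land before their dependents
    return true;
  };

  for (const stack of deps.keys()) {
    if (!visit(stack)) return null;
  }
  return order;
}

// Two stacks, one dependency direction: always orderable.
console.log(deployOrder(new Map([
  ['DataStack', []],
  ['EphemeralStack', ['DataStack']],
])));

// Secrets and App referencing each other: no valid order, every deploy deadlocks.
console.log(deployOrder(new Map([
  ['SecretsStack', ['AppStack']],
  ['AppStack', ['SecretsStack', 'DatabaseStack']],
  ['DatabaseStack', []],
])));
```

With two stacks and one dependency direction, a cycle is impossible by construction; with seven stacks referencing each other, one careless reference is enough to create one.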
When it is actually good to split stacks
Cases where more stacks make sense
A two-stack model will not fit every team or app.
If you have a large team parallelizing truly independent services, splitting stacks around team or service boundaries can help.
You also need a different approach if you are hitting CloudFormation's per-stack quotas, most notably the limit of 500 resources in a single stack.
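A quick way to check how close a stack is to the per-stack resource quota; a sketch assuming the `EphemeralStack` stack ID from this post and a local `jq` install:

```shell
# Synthesize the template as JSON and count its resources.
# CloudFormation currently caps a single stack at 500 resources.
cdk synth EphemeralStack --json | jq '.Resources | length'
```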
Cases where fewer stacks make sense
If your app is tiny and has only a handful of resources, one stack is fine. Do not split early just because a guide says to.
Engineering is about fitting the solution to the problem in front of you.
Start with the simplest thing that works. Scale architecture only when constraints demand it.
What’s next
If this two-stack model resonated, the next CDK post goes one level deeper: why I stopped trying to unit test CDK stacks and instead test real deployed infrastructure.
I will walk through how cdk diff, multiple non-prod environments, and end-to-end tests provide more confidence than brittle snapshot tests.
If that sounds useful, sign up below for alerts when new posts go live.