How to Add Tests to Code You Didn't Write
Strategy for adding test coverage to AI-generated code. Start with smoke tests, build up to unit tests.
You're staring at a codebase you didn't write. There are no tests. The code works, mostly, but you're not sure how or why. Every time you change something, something else breaks.
This is the reality for most people working with AI-generated code. The AI wrote thousands of lines, the features work, and now you need to add tests to keep it stable. But how do you write tests for code you don't fully understand?
This post gives you a practical, step-by-step strategy. No theory. No "test everything" advice. Just the specific approach that works for AI-generated codebases.
The Strategy: Smoke Tests → Integration Tests → Unit Tests
Don't start with unit tests. That's the instinct most developers have, and it's the wrong order for AI-generated code. Here's why:
Unit tests require understanding how individual functions work. With AI-generated code, you often don't have that understanding. You didn't write the functions. The naming might be inconsistent. The logic might be scattered.
Start with tests that verify the system works at all, then work inward.
Phase 1: Smoke Tests (1-2 hours)
Smoke tests answer one question: Does the app start and respond?
These tests are fast to write and immediately tell you if something is fundamentally broken. Here's what they look like:
Backend smoke tests:
// Smoke test: Does the server start?
// Assumes supertest is installed and the app is exported from ./app (adjust the path)
const request = require('supertest')
const app = require('./app')
describe('Server health', () => {
  it('should respond to health check', async () => {
    const response = await request(app).get('/health')
    expect(response.status).toBe(200)
  })
  it('should connect to the database', async () => {
    const response = await request(app).get('/health/db')
    expect(response.status).toBe(200)
    expect(response.body.connected).toBe(true)
  })
})
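Frontend smoke tests:
// Smoke test: Do critical pages render?
// Assumes React Testing Library with the jest-dom matchers, and an App component in ./App (adjust the path)
import { render, screen } from '@testing-library/react'
import '@testing-library/jest-dom'
import App from './App'
describe('Critical pages', () => {
  it('should render the homepage without crashing', () => {
    render(<App />)
    expect(screen.getByRole('main')).toBeInTheDocument()
  })
  it('should render the login page', () => {
    render(<App initialRoute="/login" />)
    expect(screen.getByText(/sign in/i)).toBeInTheDocument()
  })
})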
What to check:
- Can the app start in a test environment?
- Do the health endpoints respond?
- Do the main pages render without errors?
- Can the database connect?
- Can you reach the API from the test client?
If any smoke test fails, stop. Fix the foundation before writing deeper tests.
Time investment: 1-2 hours. Value: catches show-stoppers immediately.
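One way to make the "if any smoke test fails, stop" rule mechanical is a separate Jest config that runs only the smoke suite and aborts on the first failure. This is a minimal sketch, not a required setup: the file name jest.smoke.config.js and the tests/smoke/ directory are assumptions for illustration, while testMatch and bail are standard Jest configuration options.

```javascript
// jest.smoke.config.js (hypothetical file name)
const smokeConfig = {
  // Only run test files inside an assumed tests/smoke/ directory.
  testMatch: ['**/tests/smoke/**/*.test.js'],
  // Abort the run after the first failing test suite.
  bail: 1,
}

module.exports = smokeConfig
```

With a file like this you could run the smoke suite on its own using `npx jest --config jest.smoke.config.js` before running anything slower.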
Phase 2: Integration Tests (1-2 days)
Integration tests answer: Do the critical user flows work end-to-end?
These are the most valuable tests for AI-generated code because they test the actual behavior users depend on, without requiring you to understand every internal function.
Identify your critical flows first.
Typical critical flows:
- User signup → email verification → first login
- User creates content → content saves → content appears in list
- User initiates purchase → payment processes → confirmation shown
- User uploads file → file stores → file can be retrieved
describe('User signup flow', () => {
  it('should allow a new user to sign up and log in', async () => {
    // Sign up
    const signupResponse = await request(app)
      .post('/api/auth/signup')
      .send({
        email: 'test@example.com',
        password: 'securePassword123!'
      })
    expect(signupResponse.status).toBe(201)
    // Log in
    const loginResponse = await request(app)
      .post('/api/auth/login')
      .send({
        email: 'test@example.com',
        password: 'securePassword123!'
      })
    expect(loginResponse.status).toBe(200)
    expect(loginResponse.body.token).toBeDefined()
    // Access protected resource with the token from the login step
    const protectedResponse = await request(app)
      .get('/api/dashboard')
      .set('Authorization', `Bearer ${loginResponse.body.token}`)
    expect(protectedResponse.status).toBe(200)
  })
})
Key principle for integration tests: Test what the system does, not how it does it. If you change the internal implementation but the user flow still works, your tests pass. That's correct behavior.
Time investment: 1-2 days. Value: catches the bugs that actually matter to users.
Phase 3: Unit Tests (Ongoing)
Unit tests answer: Does each individual piece work correctly in isolation?
Start writing unit tests only after smoke and integration tests are in place. Focus on:
- Complex business logic — functions with branching, calculations, or data transformations
- Edge cases — what happens with empty input, very large input, special characters
- Utility functions — date formatting, validation, data cleaning
// Unit test for a function you identify as critical
// Assumes calculateSubscriptionTotal is imported from wherever it lives in your codebase
describe('calculateSubscriptionTotal', () => {
  it('should apply monthly pricing correctly', () => {
    // toBeCloseTo avoids false failures from floating-point rounding
    expect(calculateSubscriptionTotal('monthly', 1)).toBeCloseTo(29.99, 2)
    expect(calculateSubscriptionTotal('monthly', 3)).toBeCloseTo(89.97, 2)
  })
  it('should apply discount for annual billing', () => {
    expect(calculateSubscriptionTotal('annual', 1)).toBeCloseTo(299.99, 2)
    expect(calculateSubscriptionTotal('annual', 3)).toBeCloseTo(899.97, 2)
  })
  it('should throw for invalid plan type', () => {
    expect(() => calculateSubscriptionTotal('weekly', 1)).toThrow('Invalid plan type')
  })
})
Time investment: Ongoing. Value: catches subtle logic bugs before they reach production.
How to Understand Code You Didn't Write
The biggest challenge with testing AI-generated code isn't writing the tests — it's understanding the code well enough to know what to test.
Technique 1: Trace the Data Flow
Start at the entry point (API route, button click handler) and follow the data through each function it passes through until it is stored or returned.
You don't need to understand every line. You need to understand the path data takes from input to output.
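If reading the code is not enough to see that path, one option is to temporarily wrap the functions along it so each call records its input and output. This is a minimal sketch: the traced helper and the two pipeline stages are hypothetical stand-ins, not part of any library.

```javascript
// Wrap a function so each call records its name, arguments, and result.
function traced(name, fn, log) {
  return (...args) => {
    const result = fn(...args)
    log.push({ step: name, args, result })
    return result
  }
}

// Hypothetical pipeline stages standing in for real application code.
const log = []
const parseQuantity = traced('parseQuantity', (raw) => Number.parseInt(raw, 10), log)
const applyLimit = traced('applyLimit', (qty) => Math.min(qty, 10), log)

const finalQuantity = applyLimit(parseQuantity('25'))

// The log lists each step in call order with its input and output.
for (const entry of log) {
  console.log(`${entry.step}: ${JSON.stringify(entry.args)} -> ${entry.result}`)
}
```

For async functions the recorded result would be a promise, so a real version would need to await it before logging. Remove the wrappers once you understand the flow.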
Technique 2: Use the Debugger
Set breakpoints at the start of functions and step through the code. Watch how data transforms at each step. This is the fastest way to understand unfamiliar code.
// Add a breakpoint here
async function processOrder(orderData) {
  const validated = validateOrder(orderData) // What does this return?
  const pricing = calculatePricing(validated) // How does this work?
  const result = await saveOrder(pricing) // What gets stored?
  return result
}
Technique 3: Check the Database
If you're unsure what code does, check what it stores. Look at the database schema, then trace back to the code that writes to each table. The schema tells you what the code is supposed to do.
Technique 4: Use AI to Explain the Code
This is meta — using AI to understand AI-generated code — but it works. Paste a function into an AI tool and ask: "What does this function do? What are the edge cases? What could go wrong?"
The AI won't always be right, but it gives you a starting point. Verify the explanation by checking the actual behavior.
Mocking Strategies for AI-Generated Code
When you don't understand internal functions, mocking becomes critical. Mock what you don't understand and test what matters.
Mock External Services
AI-generated code often calls third-party services directly without abstraction. You need to mock these:
// Instead of testing with a real Stripe call
jest.mock('./stripe', () => ({
  chargeCard: jest.fn().mockResolvedValue({ id: 'ch_test', status: 'succeeded' })
}))
// Instead of testing with a real email service
jest.mock('./email', () => ({
  sendEmail: jest.fn().mockResolvedValue({ sent: true })
}))
Mock the Database (for unit tests)
For unit tests, mock database calls. For integration tests, use a test database:
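// Unit test: mock the database
jest.mock('./db', () => ({
  query: jest.fn().mockResolvedValue([{ id: 1, name: 'Test User' }])
}))
// Integration test: use a real test database
// (migrate.latest and seed.run are Knex-style calls; this assumes db is a Knex instance)
beforeAll(async () => {
  await db.migrate.latest()
  await db.seed.run()
})
// Close the connection pool so the test process can exit cleanly
afterAll(async () => {
  await db.destroy()
})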
Mock Time (for time-sensitive code)
AI-generated code often has subtle time bugs. Mock time to catch them:
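describe('Session expiry', () => {
  // jest.setSystemTime only works while Jest's fake timers are enabled
  beforeEach(() => {
    jest.useFakeTimers()
  })
  // Restore the real clock so other tests are not affected
  afterEach(() => {
    jest.useRealTimers()
  })
  it('should expire sessions after 30 minutes', () => {
    const now = new Date('2026-01-01T12:00:00Z')
    jest.setSystemTime(now)
    const session = createSession()
    expect(isSessionValid(session)).toBe(true)
    jest.setSystemTime(new Date('2026-01-01T12:31:00Z'))
    expect(isSessionValid(session)).toBe(false)
  })
})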
Using AI to Help Write Tests
Here's the practical loop for generating tests with AI:
Write tests for this function. Include:
- Happy path cases
- Edge cases (empty input, null, wrong types)
- Error cases
- Boundary conditions
Use [your test framework]. Don't mock anything unless I specify.
As a rough guide, 70-80% of the tests this workflow generates are useful as written. The other 20-30% need human judgment to get right.
Common Pitfalls When Testing AI-Generated Code
Don't test implementation details. AI code changes frequently. If your tests check internal function names or call order, they'll break when the code changes. Test behavior, not structure.
Don't write tests that depend on other tests. Each test should be independent. AI-generated code often has shared state that causes test interdependence; find and eliminate those patterns.
Don't ignore flaky tests. If a test fails sometimes but not always, there's a real bug hiding in there. Usually it's a race condition or shared state issue. Fix the root cause, don't just rerun.
Don't aim for 100% coverage. Coverage above 80% usually means you're testing trivial getters and setters instead of meaningful logic. Focus on the 20% of code that handles 80% of the business value.
The Testing Pyramid for AI-Generated Code
     /  Unit   \        ← Ongoing, focused on complex logic
    /-----------\
   / Integration \      ← 1-2 days, test critical user flows
  /---------------\
 /   Smoke Tests   \    ← 1-2 hours, verify system starts
/-------------------\
Start at the bottom, work your way up. Don't skip levels.
The Bottom Line
Testing code you didn't write is about managing risk, not achieving perfection. You don't need to understand every line of AI-generated code to write tests that keep your product stable.
The smoke test → integration test → unit test progression works because each layer builds on the one below. Smoke tests tell you the system works at all. Integration tests tell you users can complete their workflows. Unit tests catch the subtle bugs in business logic.
Most vibe-coded projects fail because they skip testing entirely. Adding tests after the fact feels like extra work — and it is. But it's the kind of work that prevents you from spending twice as long debugging production issues.
If you're not sure where to start, we can help. Get a free vibe-code assessment and we'll identify exactly which tests will give you the most stability for the least effort.
FAQ
How many tests do I actually need?
Start with 5-10 integration tests covering your critical user flows. That alone catches 80% of the bugs that matter. Add unit tests for complex business logic as you encounter bugs or add features. A typical MVP needs 30-50 tests total to be reasonably stable.
Should I use TDD with AI-generated code?
Not initially. TDD works best when you're writing new code from scratch. With existing AI-generated code, start by understanding what the code does (through integration tests), then add targeted unit tests where needed. You can adopt TDD for new features going forward.
What test framework should I use?
Use whatever your stack's community recommends. For JavaScript/TypeScript: Jest or Vitest. For Python: pytest. For Go: the built-in testing package. For Ruby: RSpec or Minitest. Don't spend time choosing — just pick the standard tool and start.
Can AI write all my tests for me?
AI can generate 70-80% of useful tests, but it makes wrong assumptions about behavior. Always review AI-generated tests, run them against the actual code, and verify the assertions match real behavior. Treat AI test generation as a starting point, not a finished product.
What if I find bugs while writing tests?
Good. That's the point. Write down what you found, fix the critical bugs, and add a test to prevent regression. Finding bugs during testing is infinitely better than finding them in production with angry users.