Testing has always been one of those tasks that developers know is essential but often find tedious. When I decided to add comprehensive unit tests to my NoteBookmark project, I thought: why not make this an experiment in AI-assisted development? What followed was a fascinating 4-hour journey that resulted in 88 unit tests, a complete CI/CD pipeline, and some valuable insights about working with AI coding assistants.
The Project: NoteBookmark
NoteBookmark is a .NET application built with C# that helps users manage and organize their reading notes and bookmarks. The project includes an API, a Blazor frontend, and uses Azure services for storage. You can check out the complete project on GitHub.
The Challenge: Starting from Zero
I'll be honest - it had been a while since I'd written comprehensive unit tests. Rather than diving in myself, I decided to see how different AI models would approach this task. My initial request was deliberately vague: "add a test project" without any other specifications.
Looking back, I realize I should have been more specific about which parts of the code I wanted covered. This would have made the review process easier and given me better control over the scope. But sometimes, the best learning comes from letting the AI surprise you.
The Great AI Model Comparison
GPT-4.1: Competent but Quiet
GPT-4.1 delivered decent results, but the experience felt somewhat mechanical. The code it generated was functional, but I found myself wanting more context. The explanations were minimal, and I often had to ask follow-up questions to understand the reasoning behind certain test approaches.
Gemini: The False Start
My experience with Gemini was... strange. Perhaps it was a glitch or an off day, but most of what was generated simply didn't work. I didn't persist with this model for long, as debugging AI-generated code that fundamentally doesn't function defeats the purpose of the exercise. Note that at the time of this writing, Gemini was still in preview, so I expect it to improve over time.
Claude Sonnet: The Clear Winner
This is where the magic happened. Claude Sonnet became my co-pilot of choice for this project. What set it apart wasn't just the quality of the code (though that was excellent), but the quality of the conversation. It felt like having a thoughtful colleague thinking out loud with me.
The explanations were clear and educational. When Claude suggested a particular testing approach, it would explain why. When it encountered a complex scenario, it would walk through its reasoning. I tried different versions of Claude Sonnet but didn't notice significant differences in results - they were all consistently good.
The Development Process: A 4-Hour Journey
Hour 1-2: Getting to Compilation
The first iteration couldn't compile. This wasn't surprising given the complexity of the codebase and the vague initial request. But here's where the AI collaboration really shined. Instead of manually debugging everything myself, I worked with Copilot to identify and fix issues iteratively.
We went through several rounds of:
- Identify compilation errors
- Discuss the best approach to fix them
- Let the AI implement the fixes
- Review and refine
After about 2 hours, we had a test project with 88 unit tests that compiled successfully. The AI had chosen xUnit as the testing framework, which I was happy with - it's a solid choice I might not have landed on myself, given how rusty I was on the current .NET testing landscape.
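For readers who, like me, hadn't looked at xUnit in a while: the generated tests followed the familiar `[Fact]` pattern. Here's a minimal sketch of the style - the types here are stand-ins I made up for illustration, not the actual NoteBookmark models:

```csharp
using Xunit;

public class BookmarkTests
{
    // A [Fact] is a single, parameterless test case.
    [Fact]
    public void Create_WithValidUrl_SetsProperties()
    {
        var bookmark = new Bookmark("https://example.com", "Example");

        Assert.Equal("https://example.com", bookmark.Url);
        Assert.Equal("Example", bookmark.Title);
    }
}

// Stand-in type so the sketch compiles on its own.
public record Bookmark(string Url, string Title);
```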
Hour 2.5-3.5: Making Tests Pass
Getting the tests to compile was one thing; getting them to pass was another challenge entirely. This phase taught me a lot about both my codebase and xUnit features I wasn't familiar with.
I relied heavily on the `/explain` feature during this phase. When tests failed, I'd ask Claude to explain what was happening and why. This was invaluable for understanding not just the immediate fix, but the underlying testing concepts.
One of those moments was learning about `[InlineData(true)]` and other xUnit data attributes. These weren't features I was familiar with, and having them explained in context made them immediately useful.
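To make that concrete, data attributes let a single test method run against several inputs. This is a hypothetical sketch (the `Note` type is a stand-in, not a real NoteBookmark class), but it shows the pattern:

```csharp
using Xunit;

public class NoteTests
{
    // [Theory] runs the test once per [InlineData] entry,
    // passing each value in as a method argument.
    [Theory]
    [InlineData(true)]
    [InlineData(false)]
    public void IsReadOnly_MatchesArchivedFlag(bool archived)
    {
        var note = new Note { Archived = archived };

        Assert.Equal(archived, note.IsReadOnly);
    }
}

// Stand-in type so the sketch compiles on its own.
public class Note
{
    public bool Archived { get; set; }
    public bool IsReadOnly => Archived;
}
```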
Hour 3.5-4: Structure and Style
Once all tests were passing, I spent time ensuring I understood each test and requesting structural changes to match my preferences. This phase was crucial for taking ownership of the code. Just because AI wrote it doesn't mean it should remain a black box. Let's repeat this: Understanding the code is essential; just because AI wrote it doesn't mean it's good.
Beyond Testing: CI/CD Integration
With the tests complete, I asked Copilot to create a GitHub Actions workflow to run the tests on every push to the main and v-next branches, plus on pull requests. Initially it started modifying my existing workflow that takes care of the Azure deployment. I wanted a separate workflow for testing, so I interrupted it (it was nice that I wasn't "forced" to wait) and asked it to create a new one instead. The result was the `running-unit-tests.yml` workflow, which worked perfectly on the first try.
This was genuinely surprising. CI/CD configurations often require tweaking, but the generated workflow handled everything below (see the sketch after this list):
- Multi-version .NET setup
- Dependency restoration
- Building and testing
- Test result reporting
- Code coverage analysis
- Artifact uploading
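For reference, a workflow covering those steps might look roughly like the following. This is my simplified sketch, not the actual `running-unit-tests.yml` from the repository; the .NET versions are illustrative, and the coverage step assumes the test project references the `coverlet.collector` package:

```yaml
name: Run unit tests

on:
  push:
    branches: [main, v-next]
  pull_request:
    branches: [main, v-next]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Multi-version setup: setup-dotnet accepts a newline-separated list.
      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: |
            8.0.x
            9.0.x

      - name: Restore dependencies
        run: dotnet restore

      - name: Build
        run: dotnet build --no-restore --configuration Release

      # Produces .trx result files plus Cobertura coverage reports.
      - name: Test
        run: >
          dotnet test --no-build --configuration Release
          --logger trx --results-directory ./TestResults
          --collect:"XPlat Code Coverage"

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: ./TestResults
```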
The PR Enhancement Adventure
Here's where things got interesting. When I asked Copilot to enhance the workflow to show test results in PRs, it started adding components, then paused and asked if it could delete the current version and start from scratch.
I said yes, and I'm glad I did. The rebuilt version created beautiful PR comments showing:
- Test results summary
- Code coverage reports (which I didn't ask for but appreciated)
- Detailed breakdowns
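I didn't dig into exactly which actions Copilot used to produce those comments, but for anyone wanting to reproduce something similar, a publishing step can pick up the `.trx` files from the test run and surface them on the PR. The snippet below would slot into the steps list of the workflow sketch above; it uses `dorny/test-reporter` as one commonly used option, which is an assumption on my part rather than what the rebuilt workflow actually does:

```yaml
      # Publishes xUnit results from the .trx files as a check on the PR.
      # The job needs "checks: write" permission for this to work.
      - name: Publish test results
        uses: dorny/test-reporter@v1
        if: always()
        with:
          name: xUnit results
          path: 'TestResults/*.trx'
          reporter: dotnet-trx
```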
The Finishing Touches
No project is complete without proper status indicators. I added a test status badge to the README, giving anyone visiting the repository immediate visibility into the project's health.
Key Takeaways
What Worked Well
- AI as a Learning Partner: Having Copilot explain testing concepts and xUnit features was like having a patient teacher
- Iterative Refinement: The back-and-forth process felt natural and productive
- Comprehensive Solutions: The AI didn't just write tests; it created a complete testing infrastructure
- Quality Over Speed: While it took 4 hours, the result was thorough and well-structured
What I'd Do Differently
- Be More Specific Initially: Starting with clearer scope would have streamlined the process
- Set Testing Priorities: Identifying critical paths first would have been valuable
- Plan for Visual Test Reports: Thinking about test result visualization from the start would have avoided rebuilding the workflow later
Lessons About AI Collaboration
- Model Choice Matters: The difference between AI models was significant
- Conversation Quality Matters: Clear explanations make the collaboration more valuable
- Trust but Verify: Understanding every piece of generated code is crucial
- Embrace Iteration: The best results come from multiple refinement cycles
The Bigger Picture
This experiment reinforced my belief that AI coding assistants are most powerful when they're true collaborators rather than code generators. The value wasn't just in the 88 tests that were written, but in the learning that happened along the way.
For developers hesitant about AI assistance in testing: this isn't about replacing your testing skills; it's about augmenting them. The AI handles the boilerplate and suggests patterns, but you bring the domain knowledge and quality judgment.
Conclusion
Would I do this again? Absolutely. The combination of comprehensive test coverage, learning opportunities, and time efficiency made this a clear win. The 4 hours invested created not just tests, but a complete testing infrastructure that will pay dividends throughout the project's lifecycle.
If you're considering AI-assisted testing for your own projects, my advice is simple: start the conversation, be prepared to iterate, and don't be afraid to ask "why" at every step. The goal isn't just working code - it's understanding and owning that code.
The complete test suite and CI/CD pipeline are available in the NoteBookmark repository if you want to see the results of this AI collaboration in action.