Pair programming with AI rubberducks - is it worth it?

If you are like me, you have a mix of scepticism and curiosity about whether and how we can use AI tools to assist with our software development workflows, beyond writing quick snippets of code. The results can be buggy, incomplete, and hard to maintain. But if we treat the AI tools as virtual pair programmers, can that get us closer to a sweet spot of increased productivity with good quality? This approach could perhaps provide a middle ground between solo development and pair programming, offering benefits of both.

The experiment - pair programming with Claude Code

To explore this hypothesis, I decided to build a relatively simple project using Claude Code (Pro subscription) as my pair programming partner: a command-line tool that syncs local files to BunnyCDN storage. I chose this as it would be complex enough to include some instructive design decisions and meaningful collaboration, but also simple enough to complete in a reasonable timeframe.

The results were mixed, but perhaps more positive than negative.

Setting the ground rules

To make Claude behave like a cooperating pair programmer, I give it some instructions to follow at the start of each session:

You are acting as a pair programming buddy to me. I write this in F#, and this is part of an exercise to get more familiar with F# and writing good F#. You can assume that I have not a lot of experience with F#, and not a lot of experience with .NET. In this work we are using Test-driven development (TDD) to build the solution, and write tests first, and use a red-green-refactor cycle. Implementations should solve the test case in the simplest way possible, and then refactor to build a more complete solution, unless it is trivial to build the complete solution directly. You shall not write any code yourself, only make suggestions - unless I explicitly tell you to write or change code.

At the start of a session, also read the file README.md to learn about the project and the identified tasks that have been completed and what should be worked on. You can also look at the code in the solution to see where we are at currently.

It took a few iterations to land on these instructions. Initially, I kept them in a separate markdown file, which I asked Claude to read at the start of a session. Later in the project, I asked Claude Code to set up context for the project with the /init command. Since the instructions above are specific to my way of working with Claude, I asked Claude to reformat the text so it would be easier for it to parse, and to save it to CLAUDE.local.md. I added that file to .gitignore as well.
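Keeping the personal instruction file out of version control takes a single entry in .gitignore. A minimal sketch, assuming the file sits in the repository root:

```gitignore
# Personal Claude Code instructions, not shared with the rest of the team
CLAUDE.local.md
```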

Claude generated these instructions from that text:

Project Specific Instructions

Pair Programming Context

Act as a pair programming buddy

User is learning F# and .NET (limited experience)

Use Test-Driven Development (TDD) approach with red-green-refactor cycles

Development Process

Write tests first before implementation

Implement minimal solution to pass tests

Refactor to improve design (unless trivial solution is complete)

Follow red-green-refactor cycle

Code Assistance Rules

Make suggestions only - do not write code unless explicitly asked

Help guide the user through F# best practices

Support learning-oriented development approach

Session Initialization

When starting a new session:

1. Read README.md to understand the project

2. Read TASKS.md to see completed work and current priorities

3. Review existing code to understand current state

4. Focus on helping user learn F# through practical TDD implementation

Recently, I found that there is an effort to standardize the name of these instruction files: AGENTS.md. I moved the contents from CLAUDE.md to AGENTS.md, and only added a reference in CLAUDE.md telling Claude to read the contents of AGENTS.md.
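A minimal sketch of what such a reference can look like. It assumes Claude Code's `@path` import syntax for pulling another file into CLAUDE.md; the wording is illustrative, and a plain sentence asking Claude to read AGENTS.md is another option:

```markdown
Project instructions live in AGENTS.md. Read them at the start of every session:

@AGENTS.md
```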

Following TDD with an AI buddy

I started by working with Claude to build a comprehensive list of test cases without writing any implementation code. This proved valuable: it helped focus both me and Claude on the tasks to do.

We then followed the classic red-green-refactor cycle of Test-Driven Development. However, Claude was a bit too eager to suggest a solution for the whole test case directly, rather than the simplest solution that would work with the specific data used in the test case. As a result, there were few refactoring steps.

I found myself periodically reining it in, trying to stick to the TDD principle of writing the minimal code to make tests pass before refactoring.
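To make the difference concrete, here is a small sketch of what "minimal code first" means in F#. It is not taken from the actual project: the function names, the paths, and the `check` helper (which stands in for a real test framework) are all hypothetical.

```fsharp
open System

// Step 1 (red -> green): the first test pins down a single concrete case.
// The simplest thing that passes is to return exactly the expected value.
let toRemotePathV1 (localRoot: string) (localFile: string) : string =
    "images/logo.png"

// Step 2 (next test -> refactor): a second case with different data
// makes the hard-coded answer fail, which justifies the general version.
let toRemotePath (localRoot: string) (localFile: string) : string =
    let root = localRoot.TrimEnd('/') + "/"
    if localFile.StartsWith(root, StringComparison.Ordinal) then
        localFile.Substring(root.Length)
    else
        invalidArg "localFile" "file is not under the local root"

// Plain assertion helper standing in for a test framework.
let check name expected actual =
    if expected <> actual then
        failwithf "%s: expected %A but got %A" name expected actual

check "first case, hard-coded" "images/logo.png" (toRemotePathV1 "/site" "/site/images/logo.png")
check "first case, general" "images/logo.png" (toRemotePath "/site" "/site/images/logo.png")
check "second case, general" "css/main.css" (toRemotePath "/site/" "/site/css/main.css")
```

Claude's tendency was to jump straight to something like the second function. That is often fine for trivial cases, but it skips the step where a new test demonstrates why the generalization is needed.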

As the solution grew more complete and we began integrating smaller building blocks, Claude started suggesting implementation code before writing tests. Again, discipline was required to maintain our TDD approach.

But after I made the instructions more explicit about the TDD workflow and about writing tests first, Claude behaved better.

The Good, the Bad, and the Rubber Duck

The good parts: Claude worked well for me as a virtual rubber duck. Explaining concerns and arguing with it helped clarify my thinking, and there were also cases where it suggested better solutions that I had not considered. I could also start a session at any time I wanted, and keep at it for as long as it worked for me — no scheduling issues. Claude could also do a few things that did not feel like the core work, like making the build script and the command-line parsing setup.

The bad parts: As expected, Claude sometimes acted like an overly eager developer, wanting to take as large a chunk of work as the checklist allowed. If you let it grab larger chunks, it would happily continue on that path until you told it to back down. When I disagreed with its suggestions, it often happily changed its opinion to mine and praised my suggestion. While Claude did push back a few times, it would accept whatever I chose after that.

The functional programming challenge: Claude occasionally suggested solutions that would probably have been more suitable for C# or another language that is not functional-first. So I had to argue with it about following functional programming principles. It was very happy to adjust, though, and to acknowledge the importance of adhering to functional design. It did get better after a while, when more code had been produced.
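The kind of difference I mean can be sketched as follows. This is a purely illustrative example, not code from the project: the `LocalFile` type, the checksum map, and both function names are assumptions made up for the comparison.

```fsharp
// Hypothetical types: a local file and a map from remote path to checksum.
type LocalFile = { Path: string; Checksum: string }

// C#-flavoured style: a mutable collection filled from a loop.
let filesToUploadImperative (files: LocalFile list) (remote: Map<string, string>) : LocalFile list =
    let result = ResizeArray<LocalFile>()
    for file in files do
        match Map.tryFind file.Path remote with
        | Some checksum when checksum = file.Checksum -> ()   // unchanged, skip
        | _ -> result.Add(file)                               // new or changed
    List.ofSeq result

// Functional-first style: the same rule expressed as a filter over immutable data.
let filesToUpload (files: LocalFile list) (remote: Map<string, string>) : LocalFile list =
    files
    |> List.filter (fun file -> Map.tryFind file.Path remote <> Some file.Checksum)

// Both versions select the same files.
let localFiles =
    [ { Path = "index.html"; Checksum = "abc" }
      { Path = "css/main.css"; Checksum = "def" } ]

let remoteChecksums = Map.ofList [ "index.html", "abc"; "css/main.css", "old" ]

let changed = filesToUpload localFiles remoteChecksums
if changed <> filesToUploadImperative localFiles remoteChecksums then
    failwith "the two versions disagree"
```

Both versions work. The second one has no mutable state and reads as a description of the rule rather than a procedure, which is what I was after in this exercise.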

Comparing approaches - controlled vs freestyle programming

This disciplined pair programming approach stood in stark contrast to simply asking Claude to write the solution from scratch, which I also tried. The latter was certainly faster, producing a somewhat working solution quickly. However, that solution was only partially functional and contained multiple bugs, which it could not resolve easily.

Following the generated code was possible, but spotting the bugs proved challenging without a systematic way to test the design and pinpoint problems. The controlled, collaborative approach took longer but resulted in a more robust solution.

Notably, though, the same types of bugs appeared in the suggested code in the collaborative approach, but there they were much easier to pinpoint and fix.

Lessons learned

Seniority is preferred: The temptation to let the AI tool take over and write the code can probably be strong if you are a junior developer, which may not be the best outcome. Maintaining discipline and good software engineering practices requires experience working under such conditions.

Limited scope works best: I think I would use this approach for smaller portions of a solution, with clear boundaries. Small and somewhat vague scopes can work if you mainly do rubberducking, or exploration for learning purposes.

Good practices matter: If we are going to work effectively with AI-assisted development, we need to get better at adhering to solid software engineering practices, perhaps more than we have done in the past.

It can be fun: I must admit I enjoyed parts of the experience. It is not as good as pair/ensemble programming with real, experienced developers, but occasionally nicer than working on problems without interacting with anyone else.

I think treating AI tools as some kind of collaborative partners rather than just code-generating tools might be a way forward for many of us.

The key is maintaining control over the development process while leveraging the AI’s ability to suggest alternatives and serve as an intelligent rubber duck.