Unit Test Automation Using AI: A CLI Tool For Test Generation

We understand your frustration with the endless cycle of manual software testing. While it may have been exciting initially, the novelty wears off quickly. Writing tests can feel as tedious as solving a Rubik’s Cube blindfolded. However, thanks to advancements in AI, unit test automation is now a reality, allowing you to breathe easier.

In this article, we want to present the results of our experiments with AI-based unit test generation. We’ve engineered a command-line interface (CLI) solution that’s about to make testing software a walk in the park. It’s much better than traditional testing tools and can generate tests and identify bugs at warp speed.

We’ll explain what kind of tool we’ve created, its key features, and the reasons why this tool will be useful to you. We’ll show in detail how our tool works, how much time you can save with it, and the results of using it. So buckle up, folks. The time has come to make your testing life easier than finding an unhandled exception in a “Hello World” program.

Our AI-Powered CLI Tool: The Why, The What & Its Key Features

Reasons to Create This Test Automation Software

Picture this: It’s been 10 years since you started creating tests. You’ve analyzed so many requirements and created so many test plans that, even if woken up in the middle of the night, you could come up with pass/fail criteria for any feature or function on the go. But experienced as you might be, you’re tired of how time-consuming the whole process is. Many projects lack the time required for thorough testing.

You’re a human being, and humans are prone to making mistakes. The lack of consistency is frustrating. Testing is also notoriously difficult to scale, and as the application grows in size and complexity, it becomes increasingly difficult to manually test all the features and combinations. As your enthusiasm wanes, the test quality or coverage risks falling into the gutter. Some say that writing unit tests is about as exciting as watching paint dry, and we tend to agree.

There’s also the money issue: what client wouldn’t want to reduce costs by reducing the number of tests? They view them as non-essential, even if we completely disagree with them! Our tool greatly accelerates the testing process, making it more cost-effective.

So you’ll find it much easier to justify the need for unit testing or implementing AI testing tools to your clients. Another point is that software often changes quickly. As it changes, the need to update tests quickly needs to be addressed, too. Our CLI tool will come to the rescue here.

Last but not least, there’s an issue of low test coverage. With our tool, our aim was to boost test coverage without significantly increasing development time or costs. As we were discussing the idea of such a tool for unit test automation, we tested several alternatives. They were lacking when it comes to sound testing and test maintenance.

So, knowing how boring, monotonous, and demotivating the software testing job gets over time, we rolled up our sleeves and got down to work. Surely, as a developer, you want to focus on more important tasks. Software development is full of such tasks. And if you ever need to go much further than unit testing and test a web application, we already covered this topic on our blog, too.

And one more thing. If you want to know who a software tester is and why software testing is important, we invite you to read another piece on our blog.

What Kind Of AI Automation Tool We Created

We’ve created a command-line interface tool powered by artificial intelligence to generate comprehensive test suites faster than you can say “runtime error.” No more writing test cases by hand. Testing made easy.

Our Test Generation Software Features

  1. AI-Powered Test Generation: Using the might of advanced language models—we’ll share the specifics in a moment—our tool can automatically generate unit tests, significantly reducing the time and effort typically required for test writing.
  2. Multi-Language Support: Thanks to the capabilities of the underlying Large Language Models (LLMs), our tool for unit test automation supports test generation for a wide range of programming languages. Think Dart, Swift, or Kotlin. Whatever the tech stack you use, or project you work on, this flexibility allows you to use one single tool everywhere!
  3. Command-Line Interface (CLI): Our tool is designed as a versatile CLI application, making it easy to integrate into various development workflows and environments. And it can do more than simple test generation.
  4. Customizable AI Models: Users of the tool can choose from several cutting-edge LLMs, including Anthropic’s—our personal favorite—Claude 3.5 Sonnet, OpenAI’s GPT-4o, and the locally-run Codestral model, to best suit their specific needs and preferences.
  5. Context options: We give developers the freedom to choose how exactly they want to make the AI-generated code better. They can add extra instructions, example code, and more information. This helps make sure the tests are not just correct, but also match what the team wants.

As Mateusz Sawa, our in-house AI enthusiast at Applandeo and the author of this article, recently noted:

In the end, our tool not only saves time but also helps make our software better and more reliable, allowing developers to focus on creating new features and solving important problems, instead of spending so much time writing tests. I, for one, can finally focus on much more creative tasks and experimenting with my pet projects.

How We Built This AI Unit Testing Tool

Choice of AI Models

For unit test automation, we’ve tested three powerhouse language models:

  • Anthropic’s Claude 3.5 Sonnet
  • OpenAI’s GPT-4o
  • Local Codestral model

The usual suspects, right? But why these three? Well, Claude 3.5 Sonnet and GPT-4o are the top-tier performers that never let you down. The Codestral model is our local hero, perfect for those who care about data privacy as much as they care about AI testing tools. It’s a very secure solution.

Actually, we know more than a thing or two about running AI on-device, locally, and making the most of it in the process. If you want to unlock AI potential in your mobile app with on-device LLMs, or to see how good medium-sized SLMs are, we invite you to rummage through our blog’s archive a bit more. We have articles catering to the most demanding readers.

OK, back to test generation and making manual test writing much more pleasant. Another reason why we chose Codestral is that it’s a perfect option for those unwilling to use cloud-based APIs. Moreover, after some extensive testing, we found that it consistently produced superior results for our use case. But there’s a but: with 22B parameters, it needs a lot of GPU memory (VRAM). And under its non-production license, it can only be used for research, testing, and software development for your pet projects.
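To put that VRAM claim in perspective, here’s a back-of-the-envelope estimate of the memory needed just to hold a 22B-parameter model’s weights. The 20% overhead factor and the quantization levels are rough assumptions of ours, not official Codestral figures:

```python
# Rough VRAM estimate for hosting a 22B-parameter model locally.
# The overhead factor (KV cache, activations, buffers) is a ballpark guess.
def vram_gb(num_params: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) to hold the weights plus runtime overhead."""
    return num_params * bytes_per_param * overhead / 1e9

for label, bytes_pp in [("fp16 ", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{vram_gb(22e9, bytes_pp):.0f} GB")
```

Even aggressively quantized, that’s well beyond a typical consumer GPU, which is why the memory requirement is worth flagging before you commit to the local route.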

Command Line Interface (CLI)

We chose a console application as our primary interface because, let’s face it, real developers live in the terminal. 

Our choice offers:

  • easy integration into your development workflow;
  • a foundation for other interfaces, like IDE plugins;
  • compatibility with various OS and environments;
  • simplified automation and scripting for minimal manual effort and fewer issues in testing.
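As a sketch of that automation-and-scripting angle, here’s how a CLI tool like ours could be wrapped in a script. Note that the command name `testgen` and the `--model` flag are illustrative assumptions, not the tool’s documented interface; only the `-i` extra-instructions parameter is mentioned elsewhere in this article:

```python
import subprocess

def build_command(source_file, model="claude-3.5-sonnet", extra_instructions=None):
    """Assemble a hypothetical CLI invocation; names and flags are illustrative."""
    cmd = ["testgen", source_file, "--model", model]
    if extra_instructions:
        # Extra guidance for the generator, akin to the tool's -i parameter.
        cmd += ["-i", extra_instructions]
    return cmd

def generate_tests(source_file, **kwargs):
    """Run the (hypothetical) CLI and return the generated test suite."""
    result = subprocess.run(build_command(source_file, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Because it’s just a subprocess call, the same pattern drops straight into a pre-commit hook or a CI pipeline step.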

[Image: unit test automation time savings]
Everyone likes to see numbers. These are the results we’ve got with our unit test automation tool.

Experimental AI-Assisted Development

Here’s where it gets meta: a significant portion of our tool was generated using the Claude 3.5 Sonnet model itself. Yes, very Inception-esque. We’ll admit that we did draw inspiration from M.C. Escher’s art. Generative AI is art, after all.

This experiment showed us:

  • “AI-generated code can significantly speed up development” is not just a tidbit AI enthusiasts throw around, but a real fact, as we’ve been using this tool for quite a while now.
  • The quality was surprisingly good, as it required way less refactoring than we initially expected.
  • We gained valuable insights into effective AI prompting for code generation, and now we know the secret handshake of the robot world.
  • We also identified areas still needing human intervention. Complex logic and edge-case handling are areas where mechanical brains still need a bit of schooling.

In the world of software development, the only constant is change. To keep up with this change, we suggest you fill in the form below. This way you’ll stay on top of all the latest developments.

Want to get more of Generative AI?

Sign up to stay up to speed with our latest AI developments

How It All Works

Can anything work better for illustration purposes than a sweet, short video narrated with an AI-generated voice? Exactly, nothing can! So here’s a sneak peek of our CLI tool for unit test automation in action:

If your hands are itching to try it all yourself, here’s a link to the README.md file on our GitHub.

And, finally, if you’re so cool that you want to build it all yourself, without using our ready-made tool for simplifying the quality assurance process and generating unit tests, here’s the link to the prompt we used to create this user-friendly, AI-powered unit testing tool.

By the way, our system was designed with extensibility in mind. So if you ever wish to add some new AI models in the future to broaden functionality, the process should be pretty straightforward.

We Put It To The Test

Did we stress-test our next-generation testing automation tool? You bet! To make sure that our unit testing tool does the job of unit test generation well enough, we checked it against several test cases. AI-based tools do not always live up to the hype, so we had to execute a few tests, check the code quality in real user interactions, and establish some proper testing processes.

Long story short, we threw everything but our office coffee machine at our CLI tool for unit test automation to make sure it works. And it worked. We got satisfying test results—scroll down a bit to see them—and we realized that the code quality was high enough to satisfy even the most demanding software developer.

Our test cases included:

  1. Simple functions (because everyone needs to start somewhere, and adding numbers or sorting lists is a good place to start)
  2. Complex classes (we fed it classes with many methods and properties to see if it could handle more pressure—testing is a stressful affair)
  3. Classes with dependencies (to check if it could juggle classes that depend on other classes, databases, or external services better than our company can juggle relationships between different employees and their personality types. This helped us see if the tool could create tests that use mocks or stubs for these dependencies. It also checked if the AI could understand and test complex relationships between different parts of a system)
  4. Multiple programming languages (because our devs are polyglots at heart, and they are equally fluent in Python, Kotlin, Swift, or Dart)
  5. Edge cases (to see if it could think outside the box and handle stress well)
  6. Real-world examples (because theory is great, but practice makes perfect, and how it performs in our real project is the best testament to the utility of the tool)
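To make scenario 3 concrete, here’s a minimal sketch of what “a class with dependencies” and a mock-based test for it look like. The `UserService` class and its repository are invented for illustration; they are not code from our test suite:

```python
import unittest
from unittest.mock import Mock

class UserService:
    """A class that depends on an injected repository (e.g., a database layer)."""
    def __init__(self, repo):
        self.repo = repo

    def display_name(self, user_id):
        user = self.repo.find(user_id)
        return user["name"].title() if user else "Unknown"

class UserServiceTest(unittest.TestCase):
    """The style of test the tool aims to generate: the dependency is mocked,
    so no real database or external service is touched."""
    def test_formats_existing_user(self):
        repo = Mock()
        repo.find.return_value = {"name": "ada lovelace"}
        self.assertEqual(UserService(repo).display_name(1), "Ada Lovelace")
        repo.find.assert_called_once_with(1)

    def test_handles_missing_user(self):
        repo = Mock()
        repo.find.return_value = None
        self.assertEqual(UserService(repo).display_name(99), "Unknown")
```

The point of this scenario was checking whether the AI could spot the dependency, stub it out, and still cover both the happy path and the missing-record case.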

And We’ve Got Good Results

Well, look for yourself. Generative AI is truly a gem of technology, and AI tools, if handled properly, are capable of truly magical things. No wonder our tool could execute tests and generate code of very high quality: no bugs, high reliability, a user-friendly interface, and comprehensive coverage.

Our AI-powered software for test maintenance and unit test generation handled test inputs well. After a series of head-to-head comparisons to evaluate the relative performance of the AI models we selected, this is what we’ve found:

  • Claude 3.5 Sonnet came out on top, handling both simple and complex cases with commendable grace. It consistently produced high-quality tests that covered most scenarios.
  • GPT-4o was nipping at Sonnet’s heels, often needing just a little extra instruction (parameter -i) to match its performance. We’d give it the silver medal.
  • Codestral proved to be a solid wingman for test writing. It’s great for generating good basic tests, but you might need to step in to keep things going for those tricky edge cases. It does require way more tweaks than its bigger brothers.

However, the performance of these two top models was very close in many situations. It’s worth mentioning that GPT-4o never outright beat Sonnet; at best, the two were evenly matched.

Conclusion: Time Saved, Quality Improved

Ok, that was a long story. But it was totally worth it. We found that

  • the AI models were surprisingly good at creating tests for most situations;
  • even with very complex or unusual code (when your intervention was required), the benefits were still enormous;
  • our tool for test suite generation is more than just a fancy Generative AI handiwork–it’s your ticket to faster, more efficient, and dare we say, more enjoyable testing;
  • a choice between Claude 3.5 Sonnet and GPT-4o is a matter of personal preference or specific project needs;
  • Codestral is also a good local option, albeit with some limitations.

So, are you ready to take your testing game to the next level? Subscribe to Applandeo blog for more insights on AI & GenAI, automation, software development stories, and more cool tools developed by our AI experts. Trust us, the future is VERY exciting!

Thanks for reading till the end. It’s not the end, but a beginning, really.

Let's chat!

Hi, I’m Marcin, COO of Applandeo

Are you looking for a tech partner? Searching for a new job? Or do you simply have any feedback that you'd like to share with our team? Whatever brings you to us, we'll do our best to help you. Don't hesitate and drop us a message!

Drop a message