Use Strong Testing Harnesses

In the previous article I made the case for Automated Testing. Today we will talk about the tools.

  • What qualities does a strong testing framework have?
  • What guidelines should developers use when choosing one?
  • Why do those guidelines matter?
  • What tools would I recommend?

I will use the terms Harness, Framework and Library interchangeably. Be warned.

All of this and more, after a few choice bold words, a digit, a hash, and a hyphen.

Principle #2 - Use Strong Testing Harnesses

So here's the thing - there are hundreds, if not thousands, of testing libraries out there to choose from, each with subtle differences in implementation and reporting options. This is especially true in C# and C++. Depending on who you talk to, the idea may even be floated that it is worth rolling your own.

If you look beyond the first few options, you are wasting time. Most frameworks provide the same features with slightly different syntax, and in the context of this topic it is more important to be writing tests than to worry over syntax. We may have time later on to discuss my preferences and why I have them.

The key features you should be looking for are:

  • Cases are written in the same language as the application.
  • Cases are able to be organized around concepts, in reports and in code.
  • Execution can be automated.
  • Reports are clear and easy to review at a glance, even for untrained eyes.
  • Reports include all relevant details to point you in the right direction when issues arise.
  • Executing single cases is possible without much overhead.

Why are these key in making your decision about testing frameworks?

Let's start at the top.

Same Language

If you are working on a MEAN application and writing your tests in Ruby, you have to understand Ruby on top of JavaScript, Express, Angular, Node, MongoDB, and likely the Cypress.io platform before you can deliver anything useful. This is a subtle kind of context switching, similar in nature to reading two books by different authors a chapter at a time. After a while you will lose context, and the details and idioms of the different languages and APIs will meld together.

When working in C# I use NUnit; when working in JavaScript or TypeScript I use Mocha + Chai.should or Jest; when working in C++ on Windows I use CPPUnit. I have used them time and again and grown to include them in my workflow, even when it comes time to test games or GUI implementations. I deviate only rarely, typically to accommodate my team, and even then a framework written in a different language is agreed to only after the whole team is on board.

Organization

When I say tests are organized around concepts, I mean that your test cases are not necessarily grouped by class or namespace. If I am building out the interface to the player in GenericShooter123ABC, I should be able to organize my test cases by their conceptual type:

  • Player should be able to jump
  • Player should be able to double jump
  • Player should be able to crouch
  • Player should be able to side strafe
  • Player should be able to walk
  • Player should be able to sprint

These are all movement-related concepts and reside in the Player Movement test category. "Player should be able to spend gold" is not a member of this category because it is part of the Economic System, not a movement concept. Organizing tests this way lets you see when movement is complete.
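As a rough sketch - Mocha with Chai's expect style here, and a hypothetical Player class standing in for the real one - grouping by concept looks something like this:

```js
// A rough sketch using Mocha + Chai's expect style. The Player class is a
// hypothetical stand-in for the real one in GenericShooter123ABC.
const { expect } = require('chai');

class Player {
  constructor({ gold = 0 } = {}) {
    this.gold = gold;
    this.airborne = false;
  }
  jump() { this.airborne = true; }
  spendGold(amount) { this.gold -= amount; }
}

// Grouped by concept, not by class or namespace.
describe('Player Movement', () => {
  it('Player should be able to jump', () => {
    const player = new Player();
    player.jump();
    expect(player.airborne).to.equal(true);
  });
});

describe('Economic System', () => {
  it('Player should be able to spend gold', () => {
    const player = new Player({ gold: 100 });
    player.spendGold(25);
    expect(player.gold).to.equal(75);
  });
});
```

Run with a spec-style reporter, the output reads as a checklist under each concept heading.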

You can draw the line at whatever point is useful for you or your team. These tests are here, as stated before, to document features, so you should be able to bring a test report to the review meeting and see definitively that a given feature or set of features is completed.

Automated Testing

If you cannot automate the running of these test cases, it's going to be a real pain to get to a report. To ensure cross-platform support, libraries like V8 go the route of building a simple application that, when run, executes a ton of test cases and generates a running tally of the results for the tester to review - this is the bare minimum.

NUnit 2.6.4 has a GUI toolkit and plugins that patch into Visual Studio to drive execution and reporting of test cases. You compile, it detects that a new DLL is present and executes the test cases, and you get visual results back that can even be used to highlight where failures occurred.

NUnit 3 has removed the GUI tool, so Resharper, command line batch files, and hot keys in Visual Studio are used instead.

Mocha can be paired with tools like nodemon to monitor the file system for changes and execute accordingly. Alternatively, Mocha has a more rudimentary file system monitor built in that can be very handy in a bind. Jest also has a watch mode to help in this regard.
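A rough sketch of wiring that up, using the standard command line entry points for each tool:

```js
// .mocharc.cjs - a minimal sketch turning on Mocha's built-in file watcher.
// Equivalent one-off commands (assuming the tools are installed locally):
//   npx mocha --watch                Mocha's built-in watcher
//   npx jest --watch                 Jest's interactive watch mode
//   npx nodemon --exec "npm test"    nodemon re-running the whole suite
module.exports = {
  watch: true,
  spec: ['test/**/*.spec.js'],
};
```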

CPPUnit is hooked into Visual Studio and tools like Resharper can detect updated libraries and trigger test execution.

Optimally, you should be able to execute them in the context of a continuous integration tool like Hudson or Jenkins, tracking test results and code coverage over time. Hopefully, for all of our sakes, this will become more of an expectation for software teams as time goes on. It takes considerable effort to convince some upper management types of the benefits of these methodologies. Sometimes, though, you have to ask for forgiveness and not permission.

Easy to Read Reports

As much as we would all like to be at the top of our game at all times, more of our time than we would ever like to admit is spent wondering what we did to screw up and cause these 70 insane errors. Tack on the unfortunate fact that we are accountable to non-technical people for our time and effort, and you come to the conclusion that either you will need to spend a lot of time explaining fine-grained detail to people who will never care, or you will need to dumb it down.

Whether we are writing tests for ourselves, our colleagues, QA or translating product-owner-written acceptance tests into technical requirements, someone is likely to want to draw a conclusion from the green checkmarks and red Xs on your report.

No matter which format you choose, make sure special applications are not needed to read them. Your managers and colleagues may want to look at them on their lunch or bathroom break; on their iPad; on their iPhone if they are yuppies; on their Android watch if they are early-adopting losers; or maybe they printed it out and are reading it at Grandma's Pie Shop on 23rd. The point is, you don't know where or when someone will have the time to review the state of your application.

To this end, I recommend that reports be HTML or XLS files for portability and readability.
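One low-effort way to get there with Mocha - assuming the third-party mochawesome reporter, which is a separate install rather than part of Mocha itself - is a reporter setting in the config:

```js
// .mocharc.cjs - a sketch of producing a standalone HTML report with the
// third-party mochawesome reporter (npm install --save-dev mochawesome).
// The result opens in any browser; no special tooling required to read it.
module.exports = {
  reporter: 'mochawesome',
  'reporter-option': ['reportDir=reports', 'reportFilename=test-report'],
};
```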

Include Details in the Report

Alright! Story time!

When I built out my first testing framework a decade or so ago I had a single word returned to the console - PASSED or FAILED. This meant that I could run it and see the result, in a neat little Christmassy kind of way, but when things failed I was completely uninformed.

  • Was it test 20 or 42?
  • Did the UI change on the webpage?
  • Did the field name change?
  • Was it an actual issue with the application or with my framework?

Back then, to figure out the answers to these questions I would walk through the test cases manually. Super dumb, in hindsight, and it likely amounts to weeks or months of my life that I will never get back - Thank you Broadcom for sponsoring me through those golden times of discovery!

It took me a couple months to agree with myself that I needed to trust my code to deliver, even though it was Perl at the time.

  • I started by reporting progress to the console.
  • Then I learned to output a progress reporter to the console using carriage returns (sketched just after this list).
  • Then I output the description of the test case results to a secondary file.
  • Ultimately, I came to the conclusion that I needed a final report that included a stack trace to where the failure occurred.
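For the curious, the carriage-return trick from the second bullet is tiny. A rough sketch:

```js
// A rough sketch of a carriage-return progress reporter: "\r" moves the
// cursor back to the start of the line, so each update overwrites the
// previous one instead of scrolling the console.
const total = 50;
for (let i = 1; i <= total; i += 1) {
  process.stdout.write(`\rRunning case ${i}/${total} ...`);
  // ... execute the case here ...
}
process.stdout.write('\rAll cases executed.           \n');
```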

After a few weeks of iteration, I had transformed a week-long manual test into a seconds-long test session that produced a color-coded XLS file ready for consumption by upper management. It included...

  • clear descriptions of what was being tested,
  • how a given test was executed,
  • where it failed, in the form of a stack trace,
  • when it failed and how long it took to execute,
  • and human-readable descriptions of how to reproduce the issue manually.

It was right 100% of the time, and was useful enough to share with my manager when requested. If you don't see the benefits of including these types of details in your report by now, you likely need other help.
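You don't have to build a framework from scratch to capture those details today; most harnesses hand them to you in hooks. A rough sketch with Mocha - the failures.log path and the JSON format are my own invention:

```js
// A rough sketch of an afterEach hook recording the details listed above.
// Mocha exposes the current test and its error (with stack trace) to hooks.
// Note the plain `function` so `this` is Mocha's context, not an arrow's.
const { appendFileSync } = require('fs');

afterEach(function () {
  const test = this.currentTest;
  if (test && test.state === 'failed') {
    appendFileSync('failures.log', JSON.stringify({
      title: test.fullTitle(),              // what was being tested
      durationMs: test.duration,            // how long it took
      failedAt: new Date().toISOString(),   // when it failed
      stack: test.err && test.err.stack,    // where it failed
    }) + '\n');
  }
});
```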

Allow for Executing Single Cases

A chief concern of mine early on was what I would do when a test case failed. The philosophy of TDD is to write a failing test, make it pass, refactor, and repeat. How does this scale to the insanity of building out a framework? Sure, the first dozen tests are there and now they pass, but what do I do when I get to test 432 and I only care about the dozen I am directly working on?

The frameworks I mentioned above are all very good at letting you run tests individually or filter them with a regular expression.
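In Mocha that means .only on a case or --grep at the command line; Jest has an equivalent -t filter. A rough sketch:

```js
// A rough sketch of narrowing a run to the one case under repair.
describe('Player Movement', () => {
  // .only restricts the run to this case while you iterate on the fix.
  it.only('Player should be able to double jump', () => {
    // ...
  });

  it('Player should be able to crouch', () => {
    // skipped until the .only above is removed
  });
});

// Or filter from the command line without touching the source:
//   npx mocha --grep "double jump"
//   npx jest -t "double jump"
```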

When I was at Rhythm, I worked on an application with enough test cases that the full regression suite took no fewer than 6 hours to complete. These tests were overwhelming to execute in totality. Expecting any of our 12 engineers to execute all of them at any point other than just after merging the trunk into their branch was ridiculous.

When one of the cases failed - even from a networking issue or a simple timeout - we still had to rerun the entire suite before QA would sign off on it. Occasionally, though, there were tests that would crash the system, and depending on scheduling we would resume from the last successful test - without that option I doubt any of us would have delivered on most of our deadlines.

Rolling Your Own

This industry is terrible about reuse, but there are situations where you don't have the opportunity to use well-known tools and libraries, as on embedded or fringe platforms. Thankfully, there are always idioms built into languages to help you out.

Generally speaking, no matter your language or platform of choice, you will find a similar built-in option for reporting errors - an assert of some flavor, at minimum.
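As a rough sketch of what I mean - Node's built-in assert module here, but C's assert.h or C#'s Debug.Assert fill the same role - the bare bones look something like this:

```js
// A rough sketch of a homegrown check built on Node's built-in assert module,
// for platforms where pulling in a full framework is not an option.
const assert = require('assert');

let passed = 0;
let failed = 0;

function check(description, fn) {
  try {
    fn();
    passed += 1;
    console.log(`PASSED  ${description}`);
  } catch (err) {
    failed += 1;
    console.error(`FAILED  ${description}\n${err.stack}`);
  }
}

check('Player should be able to spend gold', () => {
  const gold = 100 - 25; // stand-in for a real expectation
  assert.strictEqual(gold, 75);
});

console.log(`${passed} passed, ${failed} failed`);
process.exitCode = failed > 0 ? 1 : 0;
```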

That said - it is incredibly unlikely that you will ever need to resort to them. You likely don't want to take on the maintenance of a testing framework along with the application you are working on. Keep in mind, there is absolutely no point in building a testing framework that is not itself being tested - fully.

At this point, you are probably convinced that automated testing is a great idea, and of the merits of using well-known, well-documented frameworks to write your tests. These two principles are key, which is why they come so early.

As we move forward we have one more topic to cover before we can dive into code examples and the details. Bear with me - everything will come together soon.

The next principle – Write Great Tests!