Write Automated Tests

An ex-colleague reached out recently to probe my experience in building, maintaining and reporting with automated test frameworks. I have authored and helped design three large testing frameworks over the years. I am not a guru, but what follows is a series of articles highlighting the principles I hold dear, not just because of their simplicity, but because they are rules to live by. Automated testing is important, useful and within the reach of every developer, in every language.

Before we can do much with a test framework, we need to be on the same page about the importance of testing.

Principle #1 – Write Automated Tests

At my first software engineering job, I was brought on to help test a huge, tremendously important application, and I had absolutely no idea how to start. The team would hand me thousands of pages of formal specification documents, direct me to issue trackers, forward me email threads and then sic me on the application to confirm elements were working as advertised. I read specification after specification, found myself working long hours just to stay afloat and ultimately, after an 80-hour week, I folded.

There had to be a better way!

Manually testing from highly technical specifications is inherently complicated. We are all human, and thus flawed. Our brains go numb to the subtle details, and our minds skip implicit assertions for lack of interest or context. It is a losing battle from the outset, and one that no one can win.

  • We all forget things
  • Some of these things are very important
  • Apologies only go so far

After failing to find a number of major issues with the application, I turned to my whiteboard and looked at the problem from a different angle. To avoid these issues in other contexts, most of us turn to tools: bug trackers like Mantis or Redmine; project trackers like Asana, Trello or JIRA; calendars for scheduling. Each of these tools helps with organization and formalizing. What would it look like to formalize a specification into something testable?

When the user logs in, they should see the file download link to an authorized build of the toolkit.

I had already been looking at the specifications for such statements, but they didn’t typically come in this form, so I hadn’t noticed the common structure. I spent a couple of hours pulling out the obvious examples, where the specifications had been structured as above, and dumped them into an Excel document. To my amazement, my week-long testing session had a ton of overlap! There were only 5 classes of user, they performed very similar actions and their permissions overlapped. With some simple collation, I cut my time down to just over two days.
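The collation step can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet itself: the user classes, actions and expected results below are invented for the example, and `collate` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows pulled from the specifications:
# (user class, action, expected result). All values are invented.
spec_rows = [
    ("admin", "log in", "sees download link"),
    ("developer", "log in", "sees download link"),
    ("guest", "log in", "sees no download link"),
    ("admin", "open reports", "sees report list"),
    ("developer", "open reports", "sees report list"),
]


def collate(rows):
    """Group user classes that share the same action and expected result."""
    groups = defaultdict(list)
    for user_class, action, expected in rows:
        groups[(action, expected)].append(user_class)
    return dict(groups)


collated = collate(spec_rows)
for (action, expected), classes in collated.items():
    print(f"{action} -> {expected}: {', '.join(classes)}")

print(f"{len(spec_rows)} statements collapse into {len(collated)} distinct checks")
```

Whether one check per group is really enough depends on how the permissions are implemented, but grouping like this at least makes the overlap visible.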

Most applications start off small, with only a couple dozen test cases that can be executed in a matter of minutes once compilation has completed. Over time, even the most innocent of applications will evolve into a tightly connected jumble of thoughts and intentions. As the application grows, this tight coupling is a major source of heartache and pain, since subtle modifications to the code can have rippling effects.

It is not uncommon for there to be thousands of tests for an application. They vary in complexity: some tests are navigation-based or check for specific text strings on the page, while others require a specific type of user to execute a number of actions that are interdependent and exhaustive. At this job, I was one of a small team of testers, so our work was divided, and it took me 8 to 10 days to execute a regression prior to optimizing for duplicated actions. In reality, it was taking the company approximately 3 man-months to complete the regression, which was needed for our quarterly release.

A man-day is 8 hours of a person testing. A man-month is 30 such days.

Over the course of a year, that adds up to a healthy price tag on testing our work. One entire engineer's worth of salary was paid to test the application. Believe it or not, this is extremely common. A recent client was averaging more than 20 man-months to complete their regression.
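The arithmetic behind that price tag is simple. Here is a quick sketch using the figures above (3 man-months per regression, one regression per quarterly release) and the man-month definition given earlier; the variable names are just for illustration.

```python
HOURS_PER_MAN_DAY = 8
DAYS_PER_MAN_MONTH = 30  # the definition used in this article

man_months_per_regression = 3
releases_per_year = 4  # one regression per quarterly release

man_months_per_year = man_months_per_regression * releases_per_year
hours_per_year = man_months_per_year * DAYS_PER_MAN_MONTH * HOURS_PER_MAN_DAY

print(man_months_per_year)  # 12
print(hours_per_year)  # 2880
```

Twelve man-months a year is one person doing nothing but regression testing, all year.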

Even with my optimization, we were only able to cut about 20% off the time needed to execute the full regression, because most of the tests were still unique.

Leveling Up

I went home to think it through and while parked on my couch, I remembered a saying from my Perl course – CPAN has a module for everything. By the time I was back in the office, I had written a simple test harness using a web browser to march through our website. It was rudimentary at best, without much in the way of visuals and the test cases were explicitly written, but it would report a simple red Fail when the application had issues or a bright green Pass if everything went well. The best part was that 2 days of work resulted in cutting the execution time of my portion of the regression down to 12 seconds. Clearly, there is an argument to be made for automation being beneficial. 

Let’s back up a bit though. In order to get from ~8 days of testing down to 12 seconds, I went through a process of refinement. I started with reading the specifications and email threads, then distilled them down into clear statements of fact about the application, and then I translated those facts into tests about the pages of the application. The tests can be considered an interface to the specifications, cleaned up and formalized so the testing can focus on functionality and internal state modification. Each test in my tool translated to one or many lines in the specifications.
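To make that refinement concrete, here is a minimal sketch in Python of how the login statement from earlier might become an executable check with Pass/Fail reporting. It is not the original harness, which was written in Perl and drove a web browser. `FakeApp`, the user names, the second check and the spec section numbers are all hypothetical stand-ins.

```python
# A stand-in for the real application. The class, its method and the
# user names are hypothetical; a real harness would drive a browser.
class FakeApp:
    AUTHORIZED = {"alice"}

    def login(self, user):
        """Return the list of links the user sees after logging in."""
        links = ["home", "profile"]
        if user in self.AUTHORIZED:
            links.append("download-toolkit")
        return links


# Each check is one statement of fact about the application.
def check_authorized_user_sees_download_link(app):
    return "download-toolkit" in app.login("alice")


def check_unauthorized_user_sees_no_download_link(app):
    return "download-toolkit" not in app.login("bob")


# Every check is tagged with the (made-up) spec section it came from,
# so a failure points straight back to the specification.
CHECKS = [
    ("SPEC 4.2.1", check_authorized_user_sees_download_link),
    ("SPEC 4.2.2", check_unauthorized_user_sees_no_download_link),
]


def run(checks, app):
    """Run every check and return a list of (spec_ref, name, passed)."""
    results = []
    for spec_ref, check in checks:
        try:
            passed = bool(check(app))
        except Exception:
            passed = False  # a crashing check counts as a failure
        results.append((spec_ref, check.__name__, passed))
    return results


results = run(CHECKS, FakeApp())
for spec_ref, name, passed in results:
    print(f"{'Pass' if passed else 'Fail'}  {spec_ref}  {name}")
```

Keeping the spec reference next to each check is a design choice: when something goes red, the report tells you which line of the specification is no longer true.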

This is typically where people smack their heads and ask the bartender for another couple of fingers of Scotch, but we aren’t quite done. There are many more perks with automated testing! For instance, as a developer, I need to be able to know that the changes I made to my code base aren’t breaking anything else in the application, and compile-time type checking doesn’t cut it. With this toolkit, we could execute 40% of our test cases in no more than 12 seconds. With work, we raised the number of included tests to 100% of the documented manual cases, and ultimately this led to the addition of another few hundred cases before moving to a formal data source, and then things got pretty nuts. When I left the team, we had 17,800 test cases being executed in about 18 minutes.

  • We do not know the effects of our code changes until they are implemented
  • Keeping the entire scope of an application in our minds is impossible
  • Manual testing leaves holes that may not be found until the application has been deployed to production
  • We need to be able to iterate on our implementations and find out if there are problems as soon as possible

There are many benefits to automated testing, but these were the most persuasive to me early on.

Other roads

Some teams decide to take the long detour through a dark forest of lies and gluttony called “it’s too much work”. When confronted with a situation where you have to spend multiple years’ worth of testing per year, some teams, or managers, choose instead to ignore reality and use shortcuts.

One argument goes a little like this:

P1 I have a lot of tests to execute.
P2 Executing all of the tests will take more time than is left in the universe.
C Execute no tests.

I have had this conversation with leads on a couple of projects and, without fail, every release is a nightmare: bugs slip through and features aren’t complete.


A second argument goes similarly:

P1 I have a lot of tests to execute.
P2 Executing all of the tests will take more time than is left in the universe.
C Execute the important tests.

This one is problematic because of the complicated nature of importance. Our team defined “important” as the set of tests covering the users’ common interactions. We tracked their actions, monitored the time they spent executing things and focused our time on these common actions. The problems presented themselves consistently in the form of uncommon tasks being buggy and unreliable. This caused a feedback loop where common tasks became more common because doing things the uncommon way was not well-tested, which drove users to do things the common way and neglect new features. In turn, new features were not well-tested, and it continued like that for a long time. It wasn’t until we did some hands-on A/B testing that we even found out about it.


Which brings us to the third most common argument I’ve run into:

P1 I have a lot of tests to execute.
P2 Executing all of the tests will take more time than is left in the universe.
C Execute tests against the newest features.

As with the previous example, this leaves a massive hole in deployment. By focusing on new features, older, heavily relied-upon features move further and further out of phase. Older features are often the ones that drew users to your application in the first place.

If you find yourself getting pulled into one of these situations, or anything remotely similar, you may want to start thinking about a change in scenery.

Closing

If you have made it this far, I can only assume you are looking for more information on automated testing. It is a practice that I hope to pass on to more developers. The change in mentality is not always easy and requires significant changes to a team before it will catch on.

As with everything, we can only do so much, but at least we agree on the importance of testing.

I hope you have enjoyed this. If you have any questions, comments or corrections, please drop them off in the comments.