Introduction
Software development is difficult. Projects often evolve over several years, under changing requirements and shifting market conditions, impacting developer tools and infrastructure. Technical debt, slow build systems, poor debuggability, and increasing numbers of dependencies can weigh down a project. The developers get weary, and cobwebs accumulate in dusty corners of the code base.
Fighting these issues can be taxing and feel like a quixotic undertaking, but don’t worry — the Google Testing Blog is riding to the rescue! This is the first article of a series on “hackability” that identifies some of the issues that hinder software projects and outlines what Google SETIs usually do about them.
According to Wiktionary, hackable is defined as:
hackable (comparative more hackable, superlative most hackable)
- (computing) That can be hacked or broken into; insecure, vulnerable.
- That lends itself to hacking (technical tinkering and modification); moddable.
Obviously, we’re not going to talk about making your product more vulnerable (by, say, rolling your own crypto or something equally unwise); instead, we will focus on the second definition, which essentially means “something that is easy to work on.” This has become the main focus for SETIs at Google as the role has evolved over the years.
In Practice
In a hackable project, it’s easy to try things and hard to break things. Hackability means fast feedback cycles that offer useful information to the developer.
This is hackability:
- Developing is easy
- Fast build
- Good, fast tests
- Clean code
- Easy running + debugging
- One-click rollbacks
In contrast, this is not hackability:
- Broken HEAD (tip-of-tree)
- Slow presubmit (i.e. checks running before submit)
- Builds take hours
- Incremental build/link > 30s
- Can’t attach debugger
- Logs full of uninteresting information
Pillar 1: Code Health
“I found Rome a city of bricks, and left it a city of marble.”
Keeping the code in good shape is critical for hackability. It’s a lot harder to tinker and modify something if you don’t understand what it does (or if it’s full of hidden traps, for that matter).
Tests
Unit and small integration tests are probably the best things you can do for hackability. They’re a support you can lean on while making your changes, and they contain lots of good information on what the code does. It isn’t hackability to boot a slow UI and click buttons on every iteration to verify your change worked - it is hackability to run a sub-second set of unit tests! In contrast, end-to-end (E2E) tests generally help hackability much less (and can even be a hindrance if they, or the product, are in sufficiently bad shape).
Figure 1: the Testing Pyramid.
I’ve always been interested in how you actually make unit tests happen in a team. It’s about education. Writing a product such that it has good unit tests is actually a hard problem. It requires knowledge of dependency injection, testing/mocking frameworks, language idioms and refactoring. The difficulty varies by language as well. Writing unit tests in Go or Java is quite easy and natural, whereas in C++ it can be very difficult (and it isn’t exactly ingrained in C++ culture to write unit tests).
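To make that concrete, here is a minimal sketch in Go of what dependency injection buys you (all names here are invented for illustration): the code under test depends on a small interface rather than a concrete catalog service, so a unit test can substitute an in-memory fake and finish in milliseconds.

package pricing

// PriceSource is the injected dependency. Production code wires in a client
// for the real catalog service; unit tests substitute an in-memory fake.
type PriceSource interface {
    PriceCents(item string) (int, bool)
}

// TotalCents sums the prices of the given items, skipping unknown items.
func TotalCents(src PriceSource, items []string) int {
    total := 0
    for _, item := range items {
        if cents, ok := src.PriceCents(item); ok {
            total += cents
        }
    }
    return total
}

// --- pricing_test.go ---

import "testing"

type fakeSource map[string]int

func (f fakeSource) PriceCents(item string) (int, bool) {
    cents, ok := f[item]
    return cents, ok
}

func TestTotalCentsSkipsUnknownItems(t *testing.T) {
    src := fakeSource{"apple": 100, "pear": 250}
    if got := TotalCents(src, []string{"apple", "pear", "durian"}); got != 350 {
        t.Errorf("TotalCents() = %d, want 350", got)
    }
}

The test never touches a real backend, and the fake doubles as documentation of what the code actually needs from its dependency.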
It’s important to educate your developers about unit tests. Sometimes, it is appropriate to lead by example and help review unit tests as well. You can have a large impact on a project by establishing a pattern of unit testing early. If tons of code gets written without unit tests, it will be much harder to add unit tests later.
What if you already have tons of poorly tested legacy code? The answer is refactoring and adding tests as you go. It’s hard work, but each line you add a test for is one more line that is easier to hack on.
Readable Code and Code Review
At Google, “readability” is a special committer status that is granted per language (C++, Go, Java and so on). It means that a person not only knows the language and its culture and idioms well, but also can write clean, well tested and well structured code. Readability literally means that you’re a guardian of Google’s code base and should push back on hacky and ugly code. The use of a style guide enforces consistency, and code review (where at least one person with readability must approve) ensures the code upholds high quality. Engineers must take care to not depend too much on “review buddies” here but really make sure to pull in the person that can give the best feedback.
Requiring code reviews naturally results in small changes, as reviewers often get grumpy if you dump huge changelists in their lap (at least if reviewers are somewhat fast to respond, which they should be). This is a good thing, since small changes are less risky and are easy to roll back. Furthermore, code review is good for knowledge sharing. You can also do pair programming if your team prefers that (a pair-programmed change is considered reviewed and can be submitted when both engineers are happy). There are multiple open-source review tools out there, such as Gerrit.
Nice, clean code is great for hackability, since you don’t need to spend time to unwind that nasty pointer hack in your head before making your changes. How do you make all this happen in practice? Put together workshops on, say, the SOLID principles, unit testing, or concurrency to encourage developers to learn. Spread knowledge through code review, pair programming and mentoring (such as with the Readability concept). You can’t just mandate higher code quality; it takes a lot of work, effort and consistency.
Presubmit Testing and Lint
Consistently formatted source code aids hackability. You can scan code faster if its formatting is consistent. Automated tooling also aids hackability. It really doesn’t make sense to waste any time on formatting source code by hand. You should be using tools like gofmt, clang-format, etc. If the patch isn’t formatted properly, you should see something like this (example from Chrome):
$ git cl upload
Error: the media/audio directory requires formatting. Please run
git cl format media/audio.
Source formatting isn’t the only thing to check. In fact, you should check pretty much anything you have as a rule in your project. Should other modules not depend on the internals of your modules? Enforce it with a check. Are there already inappropriate dependencies in your project? Whitelist the existing ones for now, but at least block new bad dependencies from forming. Should your app work on Android API level 16 phones and newer? Add linting, so you don’t use level 17+ APIs without gating at runtime. Should your project’s VHDL code always place-and-route cleanly on a particular brand of FPGA? Invoke the layout tool in your presubmit and stop the submit if the layout process fails.
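To make the dependency example concrete, here is a rough sketch of what such a check might look like, written as a small standalone Go program (the module layout and import path are invented; how the check is wired into your presubmit depends entirely on your tooling):

package main

import (
    "fmt"
    "os"
    "strings"
)

// Hypothetical rule: code outside storage/ must not import storage/internal/.
// The presubmit runner passes the paths of the changed files as arguments.
const forbidden = `"example.com/project/storage/internal/`

func main() {
    failed := false
    for _, path := range os.Args[1:] {
        if strings.HasPrefix(path, "storage/") {
            continue // the storage module may use its own internals
        }
        data, err := os.ReadFile(path)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(2)
        }
        for i, line := range strings.Split(string(data), "\n") {
            if strings.Contains(line, forbidden) {
                fmt.Printf("%s:%d: depends on storage internals; use the public API instead\n", path, i+1)
                failed = true
            }
        }
    }
    if failed {
        os.Exit(1) // a non-zero exit blocks the submit
    }
}

A presubmit runner would invoke it with the changed files as arguments and block the submit on a non-zero exit.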
Presubmit is the most valuable real estate for aiding hackability. You have limited space in your presubmit, but you can get tremendous value out of it if you put the right things there. You should stop all obvious errors here.
All this tooling aids hackability: it keeps you from breaking things for other developers and then wasting time going back to fix them. Remember you need to maintain the presubmit well; it’s not hackability to have a slow, overbearing or buggy presubmit. Having a good presubmit can make it tremendously more pleasant to work on a project. We’re going to talk more in later articles about how to build infrastructure for submit queues and presubmit.
Single Branch and Reducing Risk
Having a single branch for everything, and putting risky new changes behind feature flags, aids hackability, since branches and forks often amass tremendous risk by the time they are merged. A single branch smooths out that risk. Furthermore, running all your tests on many branches is expensive. However, a single branch can have negative effects on hackability if Team A depends on a library from Team B and gets broken by Team B a lot. Having some kind of stabilization on Team B’s software might be a good idea there. This article covers such situations, and how to integrate often with your dependencies to reduce the risk that one of them will break you.
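As a sketch of the feature-flag half of this, using nothing more than a plain command-line flag (real projects usually have a dedicated flag or experiment framework):

package main

import (
    "flag"
    "fmt"
)

// useNewRanker guards the risky rewrite. It ships disabled, so everyone keeps
// building and testing the same single branch; rollout (and rollback) is a
// flag flip rather than a branch merge.
var useNewRanker = flag.Bool("use_new_ranker", false, "serve results from the new ranking code")

func rank(items []string) []string {
    if *useNewRanker {
        return rankV2(items) // new, risky path under development
    }
    return rankV1(items) // existing, proven path
}

func rankV1(items []string) []string { return items }

func rankV2(items []string) []string {
    // The rewrite lives on the same branch, behind the flag, until it is ready.
    return items
}

func main() {
    flag.Parse()
    fmt.Println(rank([]string{"b", "a"}))
}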
Loose Coupling and Testability
Tightly coupled code is terrible for hackability. To take the most ridiculous example I know: I once heard of a computer game where a developer changed a ballistics algorithm and broke the game’s chat. That’s hilarious, but hardly intuitive for the poor developer who made the change. A hallmark of loosely coupled code is that it’s upfront about its dependencies and behavior and is easy to modify and move around.
Loose coupling, coherence and so on are really about design and architecture, and they are notoriously hard to measure. It really takes experience. One of the best ways to convey such experience is through code review, which we’ve already mentioned. Education on the SOLID principles, rules of thumb such as tell-don’t-ask, and discussions about anti-patterns and code smells all help here. Again, it’s hard to build tooling for this. You could write a presubmit check that forbids methods longer than 20 lines or cyclomatic complexity over 30, but that’s probably shooting yourself in the foot. Developers would consider that overbearing rather than a helpful assist.
SETIs at Google are expected to give input on a product’s testability. A few well-placed test hooks in your product can enable tremendously powerful testing, such as serving mock content for apps (this enables you to meaningfully test app UI without contacting your real servers, for instance). Testability can also have an influence on architecture. For instance, it’s a testability problem if your servers are built like a huge monolith that is slow to build and start, or if it can’t boot on localhost without calling external services. We’ll cover this in the next article.
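As one illustration of such a hook, here is a small Go sketch (the function and endpoint names are made up): if the backend address is a parameter instead of a hard-coded production URL, a test can point the app at a local fake server that serves canned content.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
)

// fetchHeadline is the app code under test. The "test hook" is simply that the
// backend URL is a parameter rather than a hard-coded production address.
func fetchHeadline(backendURL string) (string, error) {
    resp, err := http.Get(backendURL + "/headline")
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    return string(body), err
}

func main() {
    // A test stands up a local fake that serves canned content, so the UI (or
    // here, a plain function) can be exercised without touching real servers.
    fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, "canned headline for UI tests")
    }))
    defer fake.Close()

    headline, err := fetchHeadline(fake.URL)
    fmt.Println(headline, err)
}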
Aggressively Reduce Technical Debt
It’s quite easy to add a lot of code and dependencies and call it a day when the software works. New projects can do this without many problems, but as the project ages it becomes a “legacy” project, weighed down by dependencies and excess code. Don’t end up there. It’s bad for hackability to have a slew of bug fixes stacked on top of unwise and obsolete decisions; understanding and untangling the software only becomes more difficult.
What constitutes technical debt varies by project and is something you need to learn from experience. It simply means the software isn’t in optimal form. Some types of technical debt are easy to classify, such as dead code and barely-used dependencies. Some types are harder to identify, such as when changing requirements have left the architecture of the project unfit for the task. We can’t use tooling to help with the latter, but we can with the former.
I already mentioned that dependency enforcement can go a long way toward keeping people honest. It helps make sure people are making the appropriate trade-offs instead of just slapping on a new dependency, and it requires them to explain to a fellow engineer when they want to override a dependency rule. This can prevent unhealthy dependencies like circular dependencies, abstract modules depending on concrete modules, or modules depending on the internals of other modules.
There are various tools available for visualizing dependency graphs as well. You can use these to get a grip on your current situation and start cleaning up dependencies. If you have a huge dependency you only use a small part of, maybe you can replace it with something simpler. If an old part of your app has inappropriate dependencies and other problems, maybe it’s time to rewrite that part.
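If you would rather roll your own than adopt a tool, even a small script gets you surprisingly far. Here is a rough Go sketch (it assumes a Go codebase; other languages need other parsers) that walks a source tree and emits a Graphviz DOT graph of which directory imports which package:

package main

import (
    "fmt"
    "go/parser"
    "go/token"
    "os"
    "path/filepath"
    "strings"
)

// Walks a source tree, records which directory imports which package, and
// emits Graphviz DOT so the dependency graph can actually be looked at.
func main() {
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "usage: depgraph <source-root>")
        os.Exit(2)
    }
    fmt.Println("digraph deps {")
    filepath.Walk(os.Args[1], func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") {
            return err
        }
        file, perr := parser.ParseFile(token.NewFileSet(), path, nil, parser.ImportsOnly)
        if perr != nil {
            return nil // skip files that do not parse
        }
        from := filepath.Dir(path)
        for _, imp := range file.Imports {
            fmt.Printf("  %q -> %q;\n", from, strings.Trim(imp.Path.Value, `"`))
        }
        return nil
    })
    fmt.Println("}")
}

Piping the output through dot -Tsvg makes circular or otherwise surprising edges easy to spot.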
The next article will be on Pillar 2: Debuggability.
Did you know that if you select multiple items in Mozilla Thunderbird and press Delete followed quickly by Enter, Thunderbird deletes the messages and then opens multiple empty message windows?
You can often find unexpected behavior when you trigger two actions at once that the user would never do, such as this particular thing I always do.
In Web testing, you can do this using the Enter key to trigger one button while clicking another or by clicking multiple buttons in quick succession.
In mobile testing, you can do this by tapping two things at once or making two gestures at once. Or by doing something and pressing the Home button or the Power button.
In desktop application testing, this can be by clicking a button while pressing a hot key or pressing multiple hot keys at once or in rapid succession.
Regardless, the application should always pause other input while taking an action, and should always check that it has everything it needs when starting an action - in this case, an active message rather than a deleted one.
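A minimal sketch of that second half, in Go with invented names rather than Thunderbird's actual code: the handler re-checks that its target still exists at the moment it runs, instead of trusting a selection that may already be gone.

package main

import (
    "errors"
    "fmt"
)

// A stand-in message store; in the real application this would be the mail database.
var store = map[int]string{1: "hello", 2: "meeting notes"}

// openMessage re-validates its input at the moment it runs, so an Enter that
// was queued behind a Delete is rejected instead of opening an empty window.
func openMessage(id int) (string, error) {
    body, ok := store[id]
    if !ok {
        return "", errors.New("message no longer exists; ignoring open request")
    }
    return body, nil
}

func main() {
    delete(store, 2)                          // the Delete keypress lands first
    if _, err := openMessage(2); err != nil { // then the queued Enter arrives
        fmt.Println(err)
    }
}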
due to capacity limits, I apologize.
However, here are the slides:
And here is the recording:
I am happy to answer any questions by e-mail, phone or Skype. If you want to arrange a session, my contact info is on the final slide.
Literally. I saw this in the back of Ozark Farm and Neighbor magazine:
At the very cheapest, a domain registration + a year of simple hosting with domain purchase + use of templates and standard copy means that any beef above a couple of steaks is pure profit.
Check out this piece at Ministry of Testing:
You know it ain’t gonna die.
Alice in Chains, “The Rooster”:
Chris McMahon, who has always impressed me with his words and his wit, called me out in his blog.
Apropos of my criticism of “Context Driven Approach to Automation in Testing” (I reviewed version 1.04), I ask you to join me in condemning publicly both the tone and the substance of that paper.
Almost exactly a year ago, I reviewed a draft of the paper, and my name is among those listed as reviewing the paper. The feedback I gave was largely editorial (typos and flow), with a few comments about approach that I’ll repeat here:
The first red flag I called out (and will call out again here) is the phrase that describes test automation as something to “automate testing by automating the user”. This is a shallow view of test automation, but beyond that comment, I didn’t push hard on it. In hindsight, this was a mistake (much of The A Word touches on this topic).
In regards to the scenario the authors chose for scenario automation, I thought the choice was weird, and asked for more…context and provided some food for thought.
I think you can add even more emphasis to how and why you chose to automate scene creation. Too many times, testers choose to automate because they can automate. The thought process (for me) may be, “The scene feature is pretty important, and I’m curious what happens if an author has thousands of scenes. Will it cause performance problems, formatting problems, file load problems, or other issues? There’s no way in hell I want to do this manually, so let me write some code to help me run my experiment”. You may also want to discuss alternate implementation ideas – e.g. creating a macro in Notepad++ to create the text and then paste it in, or creating a macro in Word for the same, or using a for loop in a Windows console (e.g. for /L %f in (1,1,1000) do echo “##”>>output.txt&echo “Scene %f”>>output.txt). Using tools for creating and manipulating data could be a whole article.
And that led to my real beef with the paper. It talks about using tools to test – which can be a good thing, but it doesn’t really talk about automation in the way successful teams actually use it.
I think it may be important to talk about purposes of automation and where to apply it – or at least one context of that – as I don’t think that’s discussed enough (but I’ve made a note to myself to write a blog on this very subject). At a BFC (big company) like Microsoft, we write a lot of shared / distributed automation – automated tests that we need to run on a lot of different hardware/platform configurations in some sort of lab setting (tools like SauceLabs or BrowserStack are helpful here). Web apps are famous for this [problem].
I also commented:
The other kind of automation (which you cover in your article) is what I sometimes call exploratory automation (or more often, just “Testing”). This is where we get a test idea, and want to write some quick automation to help learn about the product. While we may turn this sort of test into something that’s distributed / shared someday, its primary purpose is to help me answer questions and learn. There’s (another) story in HWTSAM where I described a case of this. I wrote really ugly brute force automation in C (using things like FindWindow and SendMessage(LBUTTON_DOWN…) to simulate opening and closing a connection to a remote host many times (the only thing this app did). It found a nice memory leak that may not have been found otherwise (or at least not as quickly).
All of this feedback fed my uber-point, which was that while the article talked about test automation, the examples really just talked about using somewhat random tools to help the authors explore or test some software. There was nothing about strategy, or about more typical use of test automation. I asked about it in a comment:
I wonder why you don’t use the word “Tools” in the title – e.g. “A CDT approach to tools and automation in testing” or something like that.
…because the paper is about, as I said above, using non-standard tools to help test. Sure, it’s automation in a sense, but nothing in the paper reflects the way test automation is used successfully in thousands of successful products.
All that said, I do not support this paper as a description of good test automation, and I think it’s an inappropriate way for anyone to learn how to write automation. Chris requested that the authors remove the paper; while I support this, and do believe that the paper can cause more harm than good, there’s so much bad advice on the internet about creating software that removing this one piece of bad advice will hardly make a dent.
I did not realize my name was listed as a reviewer, and although I did (as admitted above) review this paper, I do not want my name associated with it, and will request that the authors remove my name.
There’s a right way and a wrong way to keep test data out of production. Citigroup chose the wrong way:
It turned out that the error was a result of how the company introduced new alphanumeric branch codes.
When the system was introduced in the mid-1990s, the program code filtered out any transactions that were given three-digit branch codes from 089 to 100 and used those prefixes for testing purposes.
But in 1998, the company started using alphanumeric branch codes as it expanded its business. Among them were the codes 10B, 10C and so on, which the system treated as being within the excluded range, and so their transactions were removed from any reports sent to the SEC.
The SEC routinely sends requests to financial institutions asking them to send all details on transactions between specific dates as a way of checking that nothing untoward is going on. The coding error had resulted in Citigroup failing to send information on 26,810 transactions in over 2,300 such requests.
Citigroup was fined $7,000,000 for the problem, which probably stemmed from a lack of communication.
You know when your company wires off some part of the interface because the functionality is incomplete or not ready for the release?
Yeah, it’s like that.
It, too, is a risky maneuver, as it’s generally a last-minute decision, which doesn’t leave you a lot of time to test and ensure it’s wired off completely in all areas where the user would encounter it.
Much like an application that may lack documentation, Sudoku only gives you a partial solution and you have to fill in the rest. Unlike software, however, guessing actually prevents you from solving the puzzle - mainly because you don't see the impact of an incorrect guess until it is too late to change it.
My friend, Frank Rowland, developed an Excel spreadsheet some time ago that he used to help solve Sudoku puzzles, with macros to identify cells that had only one possible correct value. At the time, I thought that was pretty cool. Of course, you still had to enter one value at a time manually. But I thought it was a good example of using a degree of automation to solve a problem.
Fast forward to last week. I was having lunch with Frank and he whips out his notebook PC and shows me the newest version of the spreadsheet. After listening to a math lecture from The Great Courses, he learned some new approaches for solving Sudoku.
Armed with this new information, Frank was successful in practically automating the solution of a Sudoku puzzle. I say "practically" because at times, some human intervention is required.
Now, I think the spreadsheet is very cool and I think that the approach used to solve the puzzle can also be applied to test automation. The twist is that the automation is not pre-determined as far as the numeric values are concerned. The numbers are derived totally dynamically.
Contrast this with traditional test automation. In the traditional approach to test automation (even keyword-driven), you would be able to place numbers in the cells, but only in a repeatable way - not a dynamic way.
In Frank's approach, the actions are determined based on the previous actions and outcomes. For example, as the cells in a block of nine are filled, the known values constrain the possible values in the related cells. The macros know how to deduce the remaining possibilities and eliminate the invalid ones. In this case of "Man vs. Machine", the machine wins big time.
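The heart of that deduction is easy to sketch in code - here in Go rather than Excel macros, and only the simplest rule: for each empty cell, rule out every value already used in its row, column and 3x3 block, and if exactly one candidate remains, fill it in and repeat.

package main

import "fmt"

// solveSingles repeatedly fills any cell that has exactly one legal candidate -
// the same "only one possible value" rule the early spreadsheet used. A zero
// means an empty cell. Real solvers layer more rules (or search) on top.
func solveSingles(g *[9][9]int) bool {
    progress := true
    for progress {
        progress = false
        for r := 0; r < 9; r++ {
            for c := 0; c < 9; c++ {
                if g[r][c] != 0 {
                    continue
                }
                candidate, count := 0, 0
                for v := 1; v <= 9; v++ {
                    if legal(g, r, c, v) {
                        candidate, count = v, count+1
                    }
                }
                if count == 1 {
                    g[r][c] = candidate
                    progress = true
                }
            }
        }
    }
    for r := 0; r < 9; r++ {
        for c := 0; c < 9; c++ {
            if g[r][c] == 0 {
                return false // singles alone were not enough
            }
        }
    }
    return true
}

// legal reports whether v can be placed at (r, c) given its row, column and 3x3 block.
func legal(g *[9][9]int, r, c, v int) bool {
    for i := 0; i < 9; i++ {
        if g[r][i] == v || g[i][c] == v {
            return false
        }
    }
    br, bc := r/3*3, c/3*3
    for i := br; i < br+3; i++ {
        for j := bc; j < bc+3; j++ {
            if g[i][j] == v {
                return false
            }
        }
    }
    return true
}

func main() {
    var grid [9][9]int // fill in a puzzle here; zeros are the blanks
    fmt.Println("solved by singles alone:", solveSingles(&grid))
}

Easy puzzles fall to this rule alone; harder ones need additional elimination rules, or search, layered on top of the same loop.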
I don't have all the answers and I know that work has been done by others to create this type of dynamic test automation. I just want to present the example and would love to hear your experiences in similar efforts.
I think the traditional view of test automation is limited and fragile. It has helped in establishing repeatable tests for some applications, but the problem is that most applications are too dynamic. This causes the automation to fail. At that point, people often get frustrated and give up.
I've been working with test automation since 1989, so I have seen a lot of ideas come down the pike. I really like the possibilities of test automation with AI.
I hope to get a video posted soon to show more about how this works. Once again, I would also love to hear your feedback.