updated: July 2016
Creating a test plan is often a complex undertaking. An ideal test plan is achieved by applying basic principles of cost-benefit analysis and risk analysis, optimally balancing these software development factors:
- Implementation cost: The time and complexity of implementing testable features and automated tests for specific scenarios will vary, and this affects short-term development cost.
- Maintenance cost: Some tests or test plans may vary from easy to difficult to maintain, and this affects long-term development cost. When manual testing is chosen, this also adds to long-term cost.
- Monetary cost: Some test approaches may require billed resources.
- Benefit: Tests are capable of preventing issues and aiding productivity by varying degrees. Also, the earlier they can catch problems in the development life-cycle, the greater the benefit.
- Risk: The probability of failure scenarios may vary from rare to likely, and their consequences may vary from minor nuisance to catastrophic.
This guide puts the onus on the reader to find the right balance for their project. Also, it does not provide a test plan template, because templates are often too generic or too specific and quickly become outdated. Instead, it focuses on selecting the best content when writing a test plan.
Test plan vs. strategy
Before proceeding, two common methods for defining test plans need to be clarified:
- Single test plan: Some projects have a single "test plan" that describes all implemented and planned testing for the project.
- Single test strategy and many plans: Some projects have a "test strategy" document as well as many smaller "test plan" documents. Strategies typically cover the overall test approach and goals, while plans cover specific features or project updates.
For the purpose of this guide, I will refer to both test document types simply as “test plans”. If you have multiple documents, just apply the advice below across your set of documents.
A good approach to creating content for your test plan is to start by listing all questions that need answers. The lists below provide a comprehensive collection of important questions that may or may not apply to your project. Go through the lists and select all that apply. By answering these questions, you will form the contents for your test plan, and you should structure your plan around the chosen content in any format your team prefers. Be sure to balance the factors as mentioned above when making decisions.
- Do you need a test plan? If there is no project design document or a clear vision for the product, it may be too early to write a test plan.
- Has testability been considered in the project design? Before a project gets too far into implementation, all scenarios must be designed as testable, preferably via automation. Both project design documents and test plans should comment on testability as needed.
- Will you keep the plan up-to-date? If so, be careful about adding too much detail, otherwise it may be difficult to maintain the plan.
- Does this quality effort overlap with other teams? If so, how have you deduplicated the work?
- Are there any significant project risks, and how will you mitigate them? Consider:
- Injury to people or animals
- Security and integrity of user data
- User privacy
- Security of company systems
- Hardware or property damage
- Legal and compliance issues
- Exposure of confidential or sensitive data
- Data loss or corruption
- Revenue loss
- Unrecoverable scenarios
- Performance requirements
- Misinforming users
- Impact to other projects
- Impact from other projects
- Impact to company’s public image
- Loss of productivity
- What are the project’s technical vulnerabilities? Consider:
- Features or components known to be hacky, fragile, or in great need of refactoring
- Dependencies or platforms that frequently cause issues
- Possibility for users to cause harm to the system
- Trends seen in past issues
- What does the test surface look like? Is it a simple library with one method, or a multi-platform client-server stateful system with a combinatorial explosion of use cases? Describe the design and architecture of the system in a way that highlights possible points of failure.
- What platforms are supported? Consider listing supported operating systems, hardware, devices, etc. Also describe how testing will be performed and reported for each platform.
- What are the features? Consider making a summary list of all features and describe how certain categories of features will be tested.
- What will not be tested? No test suite covers every possibility. It’s best to be up-front about this and provide rationale for not testing certain cases. Examples: low risk areas that are a low priority, complex cases that are a low priority, areas covered by other teams, features not ready for testing, etc.
- What is covered by unit (small), integration (medium), and system (large) tests? Always test as much as possible in smaller tests, leaving fewer cases for larger tests. Describe how certain categories of test cases are best tested by each test size and provide rationale.
- What will be tested manually vs. automated? When feasible and cost-effective, automation is usually best. Many projects can automate all testing. However, there may be good reasons to choose manual testing. Describe the types of cases that will be tested manually and provide rationale.
- How are you covering each test category? Consider:
- Will you use static and/or dynamic analysis tools? Both static analysis tools and dynamic analysis tools can find problems that are hard to catch in reviews and testing, so consider using them.
- How will system components and dependencies be stubbed, mocked, faked, staged, or used normally during testing? There are good reasons to do each of these, and they each have a unique impact on coverage. (A brief sketch of this trade-off follows at the end of this list.)
- What builds are your tests running against? Are tests running against a build from HEAD (aka tip), a staged build, and/or a release candidate? If only from HEAD, how will you test release build cherry picks (selection of individual changelists for a release) and system configuration changes not normally seen by builds from HEAD?
- What kind of testing will be done outside of your team? Examples:
- External crowdsource testing
- Public alpha/beta versions (how will they be tested before releasing?)
- External trusted testers
- How are data migrations tested? You may need special testing to compare before and after migration results.
- Do you need to be concerned with backward compatibility? You may own previously distributed clients or there may be other systems that depend on your system’s protocol, configuration, features, and behavior.
- Do you need to test upgrade scenarios for server/client/device software or dependencies/platforms/APIs that the software utilizes?
- Do you have line coverage goals?
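Returning to the stubbing question above, here is a minimal sketch (all names are hypothetical) of a test that swaps a real storage service for an in-memory fake. The fake keeps the test small and fast, at the cost of not exercising the real dependency:

# Hypothetical example: testing quota logic against a fake storage backend.
class FakeQuotaStore:
    """In-memory stand-in for the real storage service."""
    def __init__(self):
        self.usage = {}

    def get_usage(self, user_id):
        return self.usage.get(user_id, 0)

def is_over_quota(store, user_id, limit):
    return store.get_usage(user_id) > limit

def test_is_over_quota():
    store = FakeQuotaStore()
    store.usage["alice"] = 150
    assert is_over_quota(store, "alice", limit=100)
    assert not is_over_quota(store, "bob", limit=100)

Your test plan should state which dependencies get this treatment and which are exercised for real (for example, in larger integration tests), since the choice directly affects what the tests can and cannot catch.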
Tooling and Infrastructure
- Do you need new test frameworks? If so, describe these or add design links in the plan.
- Do you need a new test lab setup? If so, describe these or add design links in the plan.
- If your project offers a service to other projects, are you providing test tools to those users? Consider providing mocks, fakes, and/or reliable staged servers for users trying to test their integration with your system.
- For end-to-end testing, how will test infrastructure, systems under test, and other dependencies be managed? How will they be deployed? How will persistence be set-up/torn-down? How will you handle required migrations from one datacenter to another?
- Do you need tools to help debug system or test failures? You may be able to use existing tools, or you may need to develop new ones.
- Are there test schedule requirements? What time commitments have been made, which tests will be in place (or test feedback provided) by what dates? Are some tests important to deliver before others?
- How are builds and tests run continuously? Most small tests will be run by continuous integration tools, but large tests may need a different approach. Alternatively, you may opt for running large tests as-needed.
- How will build and test results be reported and monitored?
- Do you have a team rotation to monitor continuous integration?
- Large tests might require monitoring by someone with expertise.
- Do you need a dashboard for test results and other project health indicators?
- Who will get email alerts and how?
- Will the person monitoring tests simply use verbal communication to the team?
- How are tests used when releasing?
- Are they run explicitly against the release candidate, or does the release process depend only on continuous test results?
- If system components and dependencies are released independently, are tests run for each type of release?
- Will a "release blocker" bug stop the release manager(s) from actually releasing? Is there an agreement on what are the release blocking criteria?
- When performing canary releases (aka % rollouts), how will progress be monitored and tested?
- How will external users report bugs? Consider feedback links or other similar tools to collect and cluster reports.
- How does bug triage work? Consider labels or categories for bugs so that they land in a triage bucket. Also make sure the teams responsible for filing bugs and/or creating the bug report template are aware of this. Are you using one bug tracker, or do you need to set up an automatic or manual import routine?
- Do you have a policy for submitting new tests before closing bugs that could have been caught?
- How are tests used for unsubmitted changes? If anyone can run all tests against any experimental build (a good thing), consider providing a howto.
- How can team members create and/or debug tests? Consider providing a howto.
- Who are the test plan readers? Some test plans are only read by a few people, while others are read by many. At a minimum, you should consider getting a review from all stakeholders (project managers, tech leads, feature owners). When writing the plan, be sure to understand the expected readers, provide them with enough background to understand the plan, and answer all questions you think they will have - even if your answer is that you don’t have an answer yet. Also consider adding contacts for the test plan, so any reader can get more information.
- How can readers review the actual test cases? Manual cases might be in a test case management tool, in a separate document, or included in the test plan. Consider providing links to directories containing automated test cases.
- Do you need traceability between requirements, features, and tests?
- Do you have any general product health or quality goals and how will you measure success? Consider:
- Release cadence
- Number of bugs caught by users in production
- Number of bugs caught in release testing
- Number of open bugs over time
- Code coverage
- Cost of manual testing
- Difficulty of creating new tests
Introduction
Software development is difficult. Projects often evolve over several years, under changing requirements and shifting market conditions, impacting developer tools and infrastructure. Technical debt, slow build systems, poor debuggability, and increasing numbers of dependencies can weigh down a project. The developers get weary, and cobwebs accumulate in dusty corners of the code base.
Fighting these issues can be taxing and feel like a quixotic undertaking, but don’t worry — the Google Testing Blog is riding to the rescue! This is the first article of a series on “hackability” that identifies some of the issues that hinder software projects and outlines what Google SETIs usually do about them.
According to Wiktionary, hackable is defined as:
hackable (comparative more hackable, superlative most hackable)
- (computing) That can be hacked or broken into; insecure, vulnerable.
- That lends itself to hacking (technical tinkering and modification); moddable.
Obviously, we’re not going to talk about making your product more vulnerable (by, say, rolling your own crypto or something equally unwise); instead, we will focus on the second definition, which essentially means “something that is easy to work on.” This has become the main focus for SETIs at Google as the role has evolved over the years.
In Practice
In a hackable project, it’s easy to try things and hard to break things. Hackability means fast feedback cycles that offer useful information to the developer.
This is hackability:
- Developing is easy
- Fast build
- Good, fast tests
- Clean code
- Easy running + debugging
- One-click rollbacks
This is not hackability:
- Broken HEAD (tip-of-tree)
- Slow presubmit (i.e. checks running before submit)
- Builds take hours
- Incremental build/link > 30s
- Can’t attach debugger
- Logs full of uninteresting information
Pillar 1: Code Health
“I found Rome a city of bricks, and left it a city of marble.”
Keeping the code in good shape is critical for hackability. It’s a lot harder to tinker and modify something if you don’t understand what it does (or if it’s full of hidden traps, for that matter).
Tests
Unit and small integration tests are probably the best things you can do for hackability. They’re a support you can lean on while making your changes, and they contain lots of good information on what the code does. It isn’t hackability to boot a slow UI and click buttons on every iteration to verify your change worked - it is hackability to run a sub-second set of unit tests! In contrast, end-to-end (E2E) tests generally help hackability much less (and can even be a hindrance if they, or the product, are in sufficiently bad shape).
Figure 1: the Testing Pyramid.
I’ve always been interested in how you actually make unit tests happen in a team. It’s about education. Writing a product such that it has good unit tests is actually a hard problem. It requires knowledge of dependency injection, testing/mocking frameworks, language idioms and refactoring. The difficulty varies by language as well. Writing unit tests in Go or Java is quite easy and natural, whereas in C++ it can be very difficult (and it isn’t exactly ingrained in C++ culture to write unit tests).
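As a rough sketch of what that education aims at (the class and test below are invented for illustration), dependency injection is what turns a hard-to-test class into an easy one: the collaborator is passed in, so a test can substitute a fast, controllable fake.

# Hypothetical example: injecting the clock makes expiry logic unit-testable.
class Session:
    def __init__(self, created_at, ttl_seconds, clock):
        self.created_at = created_at
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injected, so tests control "now"

    def is_expired(self):
        return self.clock() - self.created_at > self.ttl_seconds

def test_session_expires():
    fake_now = [1000.0]
    session = Session(created_at=1000.0, ttl_seconds=60, clock=lambda: fake_now[0])
    assert not session.is_expired()
    fake_now[0] = 1061.0  # advance the fake clock past the TTL
    assert session.is_expired()

The same pattern applies to network clients, file systems, and random number generators: anything the test needs to control should arrive through the constructor or a parameter rather than being reached for globally.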
It’s important to educate your developers about unit tests. Sometimes, it is appropriate to lead by example and help review unit tests as well. You can have a large impact on a project by establishing a pattern of unit testing early. If tons of code gets written without unit tests, it will be much harder to add unit tests later.
What if you already have tons of poorly tested legacy code? The answer is refactoring and adding tests as you go. It’s hard work, but each line you add a test for is one more line that is easier to hack on.
Readable Code and Code Review
At Google, “readability” is a special committer status that is granted per language (C++, Go, Java and so on). It means that a person not only knows the language and its culture and idioms well, but also can write clean, well tested and well structured code. Readability literally means that you’re a guardian of Google’s code base and should push back on hacky and ugly code. The use of a style guide enforces consistency, and code review (where at least one person with readability must approve) ensures the code upholds high quality. Engineers must take care to not depend too much on “review buddies” here but really make sure to pull in the person that can give the best feedback.
Requiring code reviews naturally results in small changes, as reviewers often get grumpy if you dump huge changelists in their lap (at least if reviewers are somewhat fast to respond, which they should be). This is a good thing, since small changes are less risky and are easy to roll back. Furthermore, code review is good for knowledge sharing. You can also do pair programming if your team prefers that (a pair-programmed change is considered reviewed and can be submitted when both engineers are happy). There are multiple open-source review tools out there, such as Gerrit.
Nice, clean code is great for hackability, since you don’t need to spend time to unwind that nasty pointer hack in your head before making your changes. How do you make all this happen in practice? Put together workshops on, say, the SOLID principles, unit testing, or concurrency to encourage developers to learn. Spread knowledge through code review, pair programming and mentoring (such as with the Readability concept). You can’t just mandate higher code quality; it takes a lot of work, effort and consistency.
Presubmit Testing and Lint
Consistently formatted source code aids hackability. You can scan code faster if its formatting is consistent. Automated tooling also aids hackability. It really doesn’t make sense to waste any time on formatting source code by hand. You should be using tools like gofmt, clang-format, etc. If the patch isn’t formatted properly, you should see something like this (example from Chrome):
$ git cl upload
Error: the media/audio directory requires formatting. Please run
git cl format media/audio.
Source formatting isn’t the only thing to check. In fact, you should check pretty much anything you have as a rule in your project. Should other modules not depend on the internals of your modules? Enforce it with a check. Are there already inappropriate dependencies in your project? Whitelist the existing ones for now, but at least block new bad dependencies from forming. Should your app work on Android 16 phones and newer? Add linting, so you don’t use level 17+ APIs without gating at runtime. Should your project’s VHDL code always place-and-route cleanly on a particular brand of FPGA? Invoke the layout tool in your presubmit and stop the submit if the layout process fails.
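As a rough sketch of what such a check can look like (this is not Chrome’s actual presubmit machinery, just a generic stand-in, and the banned API name is made up), a presubmit script only has to inspect the changed files and refuse the submit when a rule is violated:

# Hypothetical presubmit check: block new uses of a deprecated API.
import subprocess, sys

BANNED = "LegacyHttpClient("  # pattern we no longer want in new code

def changed_files():
    out = subprocess.run(["git", "diff", "--cached", "--name-only"],
                         capture_output=True, text=True, check=True)
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def main():
    failures = []
    for path in changed_files():
        with open(path, encoding="utf-8") as f:
            if BANNED in f.read():
                failures.append(path)
    if failures:
        print("Presubmit failed: LegacyHttpClient is deprecated; use the supported client.")
        for path in failures:
            print("  " + path)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())

The same skeleton extends to dependency rules, API-level linting, or any other project-specific invariant you want to stop at the door.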
Presubmit is the most valuable real estate for aiding hackability. You have limited space in your presubmit, but you can get tremendous value out of it if you put the right things there. You should stop all obvious errors here.
It aids hackability to have all this tooling so you don’t waste time breaking things for other developers and then going back to fix them. Remember to maintain the presubmit well; it’s not hackability to have a slow, overbearing, or buggy presubmit. A good presubmit can make it tremendously more pleasant to work on a project. We’re going to talk more in later articles about how to build infrastructure for submit queues and presubmit.
Single Branch And Reducing Risk
Having a single branch for everything, and putting risky new changes behind feature flags, aids hackability, since branches and forks often amass tremendous risk when it’s time to merge them. Single branches smooth out the risk. Furthermore, running all your tests on many branches is expensive. However, a single branch can have negative effects on hackability if Team A depends on a library from Team B and gets broken by Team B a lot. Having some kind of stabilization of Team B’s software might be a good idea there. This article covers such situations, and how to integrate often with your dependencies to reduce the risk that one of them will break you.
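A minimal sketch of the flag-guarding idea (the flag name and plumbing are invented): the risky code lives on the single branch but only runs when the flag is on, so it can be rolled out gradually and turned off without a revert.

# Hypothetical feature flag guarding a risky new code path on the main branch.
FLAGS = {"use_new_ranker": False}  # flipped per environment or per rollout stage, not per branch

def rank_results(results):
    if FLAGS["use_new_ranker"]:
        return new_experimental_ranker(results)  # risky change, off by default
    return sorted(results, key=lambda r: r["score"], reverse=True)  # existing behavior

def new_experimental_ranker(results):
    # New behavior being developed behind the flag.
    return sorted(results, key=lambda r: (r["score"], r["freshness"]), reverse=True)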
Loose Coupling and Testability
Tightly coupled code is terrible for hackability. To take the most ridiculous example I know: I once heard of a computer game where a developer changed a ballistics algorithm and broke the game’s chat. That’s hilarious, but hardly intuitive for the poor developer that made the change. A hallmark of loosely coupled code is that it’s upfront about its dependencies and behavior and is easy to modify and move around.
Loose coupling, coherence, and so on are really about design and architecture and are notoriously hard to measure. It really takes experience. One of the best ways to convey such experience is through code review, which we’ve already mentioned. Education on the SOLID principles, rules of thumb such as tell-don’t-ask, and discussions about anti-patterns and code smells are all good here. Again, it’s hard to build tooling for this. You could write a presubmit check that forbids methods longer than 20 lines or cyclomatic complexity over 30, but that’s probably shooting yourself in the foot. Developers would consider that overbearing rather than a helpful assist.
SETIs at Google are expected to give input on a product’s testability. A few well-placed test hooks in your product can enable tremendously powerful testing, such as serving mock content for apps (this enables you to meaningfully test app UI without contacting your real servers, for instance). Testability can also have an influence on architecture. For instance, it’s a testability problem if your servers are built like a huge monolith that is slow to build and start, or if it can’t boot on localhost without calling external services. We’ll cover this in the next article.
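A tiny sketch of such a test hook (the endpoint and wiring are invented for illustration): if the app accepts its backend URL as a parameter, a test can point it at a local fake server serving canned content and exercise the client logic without touching production.

# Hypothetical test hook: the code under test takes its backend URL as a parameter,
# so a test can stand up a local fake server with canned content.
import http.server, json, threading, urllib.request

class FakeBackend(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"articles": [{"title": "canned test article"}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def fetch_articles(base_url):
    with urllib.request.urlopen(base_url + "/articles") as resp:
        return json.loads(resp.read())["articles"]

def test_fetch_articles_against_fake():
    server = http.server.HTTPServer(("localhost", 0), FakeBackend)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        articles = fetch_articles("http://localhost:%d" % server.server_port)
        assert articles[0]["title"] == "canned test article"
    finally:
        server.shutdown()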
Aggressively Reduce Technical Debt
It’s quite easy to add a lot of code and dependencies and call it a day when the software works. New projects can do this without many problems, but as the project becomes older it becomes a “legacy” project, weighed down by dependencies and excess code. Don’t end up there. It’s bad for hackability to have a slew of bug fixes stacked on top of unwise and obsolete decisions; understanding and untangling the software becomes ever more difficult.
What constitutes technical debt varies by project and is something you need to learn from experience. It simply means the software isn’t in optimal form. Some types of technical debt are easy to classify, such as dead code and barely-used dependencies. Some types are harder to identify, such as when the architecture of the project has grown unfit to the task from changing requirements. We can’t use tooling to help with the latter, but we can with the former.
I already mentioned that dependency enforcement can go a long way toward keeping people honest. It helps make sure people are making the appropriate trade-offs instead of just slapping on a new dependency, and it requires them to explain to a fellow engineer when they want to override a dependency rule. This can prevent unhealthy dependencies like circular dependencies, abstract modules depending on concrete modules, or modules depending on the internals of other modules.
There are various tools available for visualizing dependency graphs as well. You can use these to get a grip on your current situation and start cleaning up dependencies. If you have a huge dependency you only use a small part of, maybe you can replace it with something simpler. If an old part of your app has inappropriate dependencies and other problems, maybe it’s time to rewrite that part.
The next article will be on Pillar 2: Debuggability.
Did you know that if you select multiple items in Mozilla Thunderbird and press Delete followed quickly by Enter, Thunderbird deletes the messages and then opens multiple empty message windows?
You can often find unexpected behavior when you trigger two actions at once that a user would supposedly never do - such as this particular thing I always do.
In Web testing, you can do this by using the Enter key to trigger one button while clicking another, or by clicking multiple buttons in quick succession.
In mobile testing, you can do this by tapping two things at once, making two gestures at once, or doing something while pressing the Home or Power button.
In desktop application testing, you can do this by clicking a button while pressing a hot key, or by pressing multiple hot keys at once or in rapid succession.
Regardless, the application should pause other input while taking an action, and should check that it has everything it needs when starting an action - in this case, an active (not deleted) message.
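A sketch of the second half of that advice (the handler and message store below are hypothetical): the action handler re-validates its target at the moment it runs, instead of trusting the state that existed when the input was queued.

# Hypothetical handler: re-check that the message still exists before opening it,
# since an earlier queued action (e.g. Delete) may have removed it.
def on_open_message(message_store, message_id):
    message = message_store.get(message_id)  # re-validate at action time
    if message is None or message.get("deleted"):
        return None  # ignore the stale request rather than opening an empty window
    return open_message_window(message)

def open_message_window(message):
    return "window showing: " + message["subject"]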
due to capacity limits, I apologize.
However, here are the slides:
And here is the recording:
I am happy to answer any questions by e-mail, phone or Skype. If you want to arrange a session, my contact info is on the final slide.
Literally. I saw this in the back of Ozark Farm and Neighbor magazine:
At the very cheapest, a domain registration + a year of simple hosting with domain purchase + use of templates and standard copy means that any beef above a couple of steaks is pure profit.
Check out this piece at Ministry of Testing:
You know it ain’t gonna die.
Alice in Chains, “The Rooster”:
Chris McMahon, who has always impressed me with his words and his wit, called me out in his blog.
Apropos of my criticism of “Context Driven Approach to Automation in Testing” (I reviewed version 1.04), I ask you to join me in condemning publicly both the tone and the substance of that paper.
Almost exactly a year ago, I reviewed a draft of the paper, and my name is among those listed as reviewing the paper. The feedback I gave was largely editorial (typos and flow), with a few comments about approach that I’ll repeat here:
The first red flag I called out (and will call out again here) is the phrase that describes test automation as something to “automate testing by automating the user”. This is a shallow view of test automation, but other than a comment, I didn’t push hard on it. In hindsight, this was a mistake (much of The A Word touches on this topic).
Regarding the scenario the authors chose to automate, I thought the choice was weird, and asked for more…context and provided some food for thought.
I think you can add even more emphasis to how and why you chose to automate scene creation. Too many times, testers choose to automate because they can automate. The thought process (for me) may be, “The scene feature is pretty important, I’m curious what happens if an author has thousands of scenes. Will it cause performance problems, formatting problems, file load problems, or other issues, etc. There’s no way in hell I want to do this manually, so let me write some code to help me run my experiment”. You may also want to discuss alternate implementation ideas – e.g. creating a macro in Notepad++ to create the text then paste it in, or creating a macro in Word for the same, or using for in a windows console (e.g. for /L %f in (1,1,1000) do echo “##”>>output.txt&echo “Scene %f”>>output.txt ). Using tools for creating and manipulating data could be a whole article.
And that led to my real beef with the paper. It talks about using tools to test – which can be a good thing, but it doesn’t really talk about automation in the way successful teams actually use it.
I think it may be important to talk about purposes of automation and where to apply it – or at least one context of that – as I don’t think that’s discussed enough (but I’ve made a note to myself to write a blog on this very subject). At a BFC (big company) like Microsoft, we write a lot of shared / distributed automation – automated tests that we need to run on a lot of different hardware/platform configurations in some sort of lab setting (tools like SauceLabs or BrowserStack are helpful here). Web apps are famous for this [problem].
I also commented:
The other kind of automation (which you cover in your article) is what I sometimes call exploratory automation (or more often, just “Testing”). This is where we get a test idea, and want to write some quick automation to help learn about the product. While we may turn this sort of test into something that’s distributed / shared someday, its primary purpose is to help me answer questions and learn. There’s (another) story in HWTSAM where I described a case of this. I wrote really ugly brute force automation in C (using things like FindWindow and SendMessage(LBUTTON_DOWN…) to simulate opening and closing a connection to a remote host many times (the only thing this app did). It found a nice memory leak that may not have been found otherwise (or at least not as quickly).
All of this feedback fed my uber-point, which was that while the article talked about test automation, the examples really just talked about using somewhat random tools to help the authors explore or test some software. There was nothing about strategy, or about more typical use of test automation. I asked about it in a comment:
I wonder why you don’t use the word “Tools” in the title – e.g. “A CDT approach to tools and automation in testing” or something like that.
…because the paper is about, as I said above, using non-standard tools to help test. Sure, it’s automation in a sense, but nothing in the paper reflects the way test automation is used successfully in thousands of successful products.
All that said, I do not support this paper as a description of good test automation, and I think it’s an inappropriate way for anyone to learn how to write automation. Chris requested that the authors remove the paper; while I support this, and do believe that the paper can cause more harm than good, there’s so much bad advice on the internet about creating software that removing this one piece of bad advice will hardly make a dent.
I did not realize my name was listed as a reviewer, and although I did (as admitted above) review this paper, I do not want my name associated with it, and will request that the authors remove my name.
There’s a right way and a wrong way to keep test data out of production. Citigroup chose the wrong way:
It turned out that the error was a result of how the company introduced new alphanumeric branch codes.
When the system was introduced in the mid-1990s, the program code filtered out any transactions that were given three-digit branch codes from 089 to 100 and used those prefixes for testing purposes.
But in 1998, the company started using alphanumeric branch codes as it expanded its business. Among them were the codes 10B, 10C and so on, which the system treated as being within the excluded range, and so their transactions were removed from any reports sent to the SEC.
The SEC routinely sends requests to financial institutions asking them to send all details on transactions between specific dates as a way of checking that nothing untoward is going on. The coding error had resulted in Citigroup failing to send information on 26,810 transactions in over 2,300 such requests.
Citigroup was fined $7,000,000 for the problem, which probably stemmed from a lack of communication.