Software Testing

Very Short Blog Posts (20): More About Testability

DevelopSense - Michael Bolton - Mon, 07/14/2014 - 15:30
A few weeks ago, I posted a Very Short Blog Post on the bare-bones basics of testability. Today, I saw a very good post from Adam Knight talking about telling the testability story. Adam focused, as I did, on intrinsic testability—things in the product itself that it more testable. But testability isn’t just a product […]
Categories: Software Testing

Measuring Coverage at Google

Google Testing Blog - Mon, 07/14/2014 - 13:45
By Marko Ivanković, Google Zürich

Introduction
Code coverage is a very interesting metric, covered by a large body of research that reaches somewhat contradictory results. Some people think it is an extremely useful metric and that a certain percentage of coverage should be enforced on all code. Some think it is a useful tool to identify areas that need more testing but don’t necessarily trust that covered code is truly well tested. Others yet think that measuring coverage is actively harmful because it provides a false sense of security.

Our team’s mission was to collect coverage related data then develop and champion code coverage practices across Google. We designed an opt-in system where engineers could enable two different types of coverage measurements for their projects: daily and per-commit. With daily coverage, we run all tests for their project, where as with per-commit coverage we run only the tests affected by the commit. The two measurements are independent and many projects opted into both.

While we did experiment with branch, function and statement coverage, we ended up focusing mostly on statement coverage because of its relative simplicity and ease of visualization.

How we measured
Our job was made significantly easier by the wonderful Google build system whose parallelism and flexibility allowed us to simply scale our measurements to Google scale. The build system had integrated various language-specific open source coverage measurement tools like Gcov (C++), Emma / JaCoCo (Java) and Coverage.py (Python), and we provided a central system where teams could sign up for coverage measurement.

For daily whole project coverage measurements, each team was provided with a simple cronjob that would run all tests across the project’s codebase. The results of these runs were available to the teams in a centralized dashboard that displays charts showing coverage over time and allows daily / weekly / quarterly / yearly aggregations and per-language slicing. On this dashboard teams can also compare their project (or projects) with any other project, or Google as a whole.

For per-commit measurement, we hook into the Google code review process (briefly explained in this article) and display the data visually to both the commit author and the reviewers. We display the data on two levels: color coded lines right next to the color coded diff and a total aggregate number for the entire commit.


Displayed above is a screenshot of the code review tool. The green line coloring is the standard diff coloring for added lines. The orange and lighter green coloring on the line numbers is the coverage information. We use light green for covered lines, orange for non-covered lines and white for non-instrumented lines.

It’s important to note that we surface the coverage information before the commit is submitted to the codebase, because this is the time when engineers are most likely to be interested in improving it.

Results
One of the main benefits of working at Google is the scale at which we operate. We have been running the coverage measurement system for some time now and we have collected data for more than 650 different projects, spanning 100,000+ commits. We have a significant amount of data for C++, Java, Python, Go and JavaScript code.

I am happy to say that we can share some preliminary results with you today:


The chart above is the histogram of average values of measured absolute coverage across Google. The median (50th percentile) code coverage is 78%, the 75th percentile 85% and 90th percentile 90%. We believe that these numbers represent a very healthy codebase.

We have also found it very interesting that there are significant differences between languages:

C++JavaGoJavaScriptPython56.6%61.2%63.0%76.9%84.2%

The table above shows the total coverage of all analyzed code for each language, averaged over the past quarter. We believe that the large difference is due to structural, paradigm and best practice differences between languages and the more precise ability to measure coverage in certain languages.

Note that these numbers should not be interpreted as guidelines for a particular language, the aggregation method used is too simple for that. Instead this finding is simply a data point for any future research that analyzes samples from a single programming language.

The feedback from our fellow engineers was overwhelmingly positive. The most loved feature was surfacing the coverage information during code review time. This early surfacing of coverage had a statistically significant impact: our initial analysis suggests that it increased coverage by 10% (averaged across all commits).

Future work
We are aware that there are a few problems with the dataset we collected. In particular, the individual tools we use to measure coverage are not perfect. Large integration tests, end to end tests and UI tests are difficult to instrument, so large parts of code exercised by such tests can be misreported as non-covered.

We are working on improving the tools, but also analyzing the impact of unit tests, integration tests and other types of tests individually.

In addition to languages, we will also investigate other factors that might influence coverage, such as platforms and frameworks, to allow all future research to account for their effect.

We will be publishing more of our findings in the future, so stay tuned.

And if this sounds like something you would like to work on, why not apply on our job site?

Categories: Software Testing

Pushing Up All The Levels On The Mixing Board

Eric Jacobson's Software Testing Blog - Fri, 07/11/2014 - 09:50

…may not be a good way to start testing.

I heard a programmer use this metaphor to describe the testing habits of a tester he had worked with. 

As a tester, taking all test input variables to their extreme, may be an effective way to find bugs.  However, it may not be an effective way to report bugs.  Skilled testers will repeat the same test until they isolate the minimum variable(s) that cause the bug.  Or using this metaphor, they may repeat the same test with all levels on the mixing board pulled down, except the one they are interested in observing.

Once identified, the skilled tester will repeat the test only changing the isolated variable, and accurately predict a pass or fail result.

Categories: Software Testing

“Bug In The Test” vs. “Bug In The Product”

Eric Jacobson's Software Testing Blog - Wed, 07/09/2014 - 12:44

Dear Test Automators,

The next time you discuss automation results, please consider qualifying the context of the word “bug”.

If automation fails, it means one of two things:

  1. There is a bug in the product-under-test.
  2. There is a bug in the automation.

The former is waaaaaay more important than the latter.  Maybe not to you, but certainly for your audience.

Instead of saying,

This automated check failed”,

consider saying,

This automated check failed because of a bug in the product-under-test”.

 

Instead of saying,

I’m working on a bug”,

consider saying,

I’m working on a bug in the automation”.

 

Your world is arguably more complex than that of testers who don’t use automation.  You must test twice as many programs (the automation and the product-under-test).  Please consider being precise when you communicate.

Categories: Software Testing

QA Music: Quality Assurance, Defined

QA Hates You - Mon, 07/07/2014 - 04:24

Avenged Sevenfold, “This Means War”:

Are we at war with the others in the software industry who accept poor quality software? Are we at war with ourselves because we give it just slightly less than we’ve got and sometimes a lot less than it takes?

Yes.

Categories: Software Testing

Data Testing vs. Functional Testing

Eric Jacobson's Software Testing Blog - Wed, 07/02/2014 - 12:32

So, you’ve got a green thumb.  You’ve been growing houseplants your whole life.  Now try to grow an orchid.  What you’ve learned about houseplants has taught you very little about orchids.

  • Put one in soil and you’ll kill it (orchids grow on rocks or bark). 
  • Orchids need about 20 degrees Fahrenheit difference between day and night.
  • Orchids need wind and humidity to strive.
  • Orchids need indirect sunlight.  Lots of it.  But put them in the sun and they’ll burn.
  • Fading flowers does not mean your orchid is dying (orchids bloom in cycles).

So, you’re a skilled tester.  You’ve been testing functional applications with user interfaces your whole career.  Now try to test a data warehouse.  What you’ve learned about functionality testing has taught you very little about data testing.

  • “Acting like a user”, will not get you far.  Efficient data testing does not involve a UI and depends little on other interfaces.  There are no buttons to click or text boxes to interrogate during a massive data quality investigation.
  • Lack of technical skills will kill you.  Interacting with a DB requires DB Language skills (e.g., TSQL).  Testing millions of lines of data requires coding skills to enlist the help of machine-aided-exploratory-testing.
  • Checking the health of your data warehouse prior to deployments probably requires automated checks.
  • For functional testing, executing shallow tests first to cover breadth, then deep tests later is normally a good approach.  In data testing, the opposite may be true.
  • If you are skilled at writing bug reports with detailed repro steps, this skill may hinder your effectiveness at communicating data warehouse bugs, where repro steps may not be important.
  • If you are used to getting by as a tester, not reading books about the architecture or technology of your system-under-test, you may fail at data warehouse testing.  In order to design valuable tests, a tester will need to study data warehouses until they grok concepts like Inferred Members, Junk Dimensions, Partitioning, Null handling, 3NF, grain, and Rapidly Changing Monster Dimensions.

Testers, let’s respect the differences in the projects we test, and grow our skills accordingly.  Please don’t use a one-size-fits-all approach.

Categories: Software Testing

Pages