It looks as though FogBugz has decided to offer a little advice in the defect report’s description field:
Its placeholder says:
Every good bug report needs exactly three things: steps to reproduce, what you expected to see, and what you saw instead.
Exactly three things? Well, I must be an overachiever then when I add some analysis or relationships to other bugs, logs, and so on.
But that’s my way.
At Google, we run a very large corpus of tests continuously to validate our code submissions. Everyone from developers to project managers relies on the results of these tests to make decisions about whether the system is ready for deployment or whether code changes are OK to submit. Productivity for developers at Google relies on the ability of the tests to find real problems with the code being changed or developed in a timely and reliable fashion.
Tests are run before submission (pre-submit testing) which gates submission and verifies that changes are acceptable, and again after submission (post-submit testing) to decide whether the project is ready to be released. In both cases, all of the tests for a particular project must report a passing result before submitting code or releasing a project.
Unfortunately, across our entire corpus of tests, we see a continual rate of about 1.5% of all test runs reporting a "flaky" result. We define a "flaky" test result as a test that exhibits both a passing and a failing result with the same code. There are many root causes why tests return flaky results, including concurrency, relying on non-deterministic or undefined behaviors, flaky third party code, infrastructure problems, etc. We have invested a lot of effort in removing flakiness from tests, but overall the insertion rate is about the same as the fix rate, meaning we are stuck with a certain rate of tests that provide value, but occasionally produce a flaky result. Almost 16% of our tests have some level of flakiness associated with them! This is a staggering number; it means that more than 1 in 7 of the tests written by our world-class engineers occasionally fail in a way not caused by changes to the code or tests.
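The definition above (a passing and a failing result with the same code) can be checked mechanically. Here is a minimal sketch, assuming test results are available as `(test_name, code_version, passed)` tuples; that record shape and the function name are made up for illustration and are not Google's tooling.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return names of tests that both passed and failed at one code version.

    `results` is an iterable of (test_name, code_version, passed) tuples.
    This record shape is an assumption for the sketch.
    """
    outcomes = defaultdict(set)
    for test_name, code_version, passed in results:
        outcomes[(test_name, code_version)].add(passed)
    # Flaky: a single code version produced both a pass and a fail.
    return sorted({name for (name, _), seen in outcomes.items()
                   if seen == {True, False}})


runs = [
    ("login_test", "rev1", True),
    ("login_test", "rev1", False),   # same code, different result
    ("search_test", "rev1", True),
    ("search_test", "rev2", False),  # different code, so not flaky by itself
]
flaky = find_flaky_tests(runs)
```

Note that `search_test` is not reported: its failure coincides with a code change, so it may be a legitimate breakage.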
When doing post-submit testing, our Continuous Integration (CI) system identifies when a passing test transitions to failing, so that we can investigate the code submission that caused the failure. What we find in practice is that about 84% of the transitions we observe from pass to fail involve a flaky test! This causes extra repetitive work to determine whether a new failure is a flaky result or a legitimate failure. It is quite common to ignore legitimate failures in flaky tests due to the high number of false positives. At the very least, build monitors typically wait for additional CI cycles to run this test again to determine whether or not the test has been broken by a submission, which adds to the delay in identifying real problems and increases the pool of changes that could have contributed.
In addition to the cost of build monitoring, consider that the average project contains 1000 or so individual tests. To release a project, we require that all these tests pass with the latest code changes. If 1.5% of test results are flaky, 15 tests will likely fail, requiring expensive investigation by a build cop or developer. In some cases, developers dismiss a failing result as flaky only to later realize that it was a legitimate failure caused by the code. It is human nature to ignore alarms when there is a history of false signals coming from a system. For example, see this article about airline pilots ignoring an alarm on 737s. The same phenomenon occurs with pre-submit testing. The same 15 or so failing tests block submission and introduce costly delays into the core development process. Ignoring legitimate failures at this stage results in the submission of broken code.
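The arithmetic behind the "15 tests" figure is worth spelling out. The sketch below makes one simplifying assumption that is not in the text: every test result flakes independently at the same 1.5% rate. In reality flakiness is concentrated in a subset of tests, so treat the second number as a rough illustration only.

```python
def expected_flaky_failures(num_tests, flaky_rate):
    """Expected number of flaky results in one full run of the suite."""
    return num_tests * flaky_rate


def probability_all_pass(num_tests, flaky_rate):
    """Chance that no test flakes, assuming independent, identical rates."""
    return (1.0 - flaky_rate) ** num_tests


expected = expected_flaky_failures(1000, 0.015)   # about 15 tests
all_green = probability_all_pass(1000, 0.015)     # well under one in a million
```

Under that assumption, a fully green run of a 1000-test project would almost never happen by itself, which is why re-runs and triage become part of the release process.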
We have several mitigation strategies for flaky tests during pre-submit testing, including the ability to re-run only failing tests, and an option to re-run tests automatically when they fail. We even have a way to denote a test as flaky - causing it to report a failure only if it fails 3 times in a row. This reduces false positives, but encourages developers to ignore flakiness in their own tests unless their tests start failing 3 times in a row, which is hardly a perfect solution.
Imagine a 15 minute integration test marked as flaky that is broken by my code submission. The breakage will not be discovered until 3 executions of the test complete, or 45 minutes, after which it will need to be determined if the test is broken (and needs to be fixed) or if the test just flaked three times in a row.
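The "fail 3 times in a row" policy and its cost can be sketched in a few lines. The function names here are hypothetical; the sketch only illustrates the rule as described, not the actual implementation.

```python
def run_with_flaky_policy(run_test, max_attempts=3):
    """Report failure only if the test fails `max_attempts` times in a row.

    `run_test` is any zero-argument callable returning True on pass.
    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if run_test():
            return True, attempt
    return False, max_attempts


def worst_case_minutes(test_minutes, max_attempts=3):
    """Time before a genuinely broken test is finally reported as failing."""
    return test_minutes * max_attempts


broken = run_with_flaky_policy(lambda: False)   # a test my change really broke
delay = worst_case_minutes(15)                  # the 15 minute test from above
```

A truly broken test always uses every attempt, so the retry policy trades fewer false alarms for a slower signal on real failures.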
Other mitigation strategies include:
- A tool that monitors the flakiness of tests and if the flakiness is too high, it automatically quarantines the test. Quarantining removes the test from the critical path and files a bug for developers to reduce the flakiness. This prevents it from becoming a problem for developers, but could easily mask a real race condition or some other bug in the code being tested.
- Another tool detects changes in the flakiness level of tests and works to identify the change that caused the test to change the level of flakiness.
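A quarantine rule like the one in the first bullet might look roughly like this. The 5% threshold, the input shape, and all names are assumptions for illustration; the text does not say how "too high" is defined.

```python
from dataclasses import dataclass


@dataclass
class QuarantineDecision:
    test_name: str
    flake_rate: float
    quarantine: bool


def review_tests(flake_counts, threshold=0.05):
    """Decide which tests to pull off the critical path.

    `flake_counts` maps test name -> (flaky_runs, total_runs).
    The default threshold is a made-up value for this sketch.
    """
    decisions = []
    for name, (flaky_runs, total_runs) in sorted(flake_counts.items()):
        rate = flaky_runs / total_runs if total_runs else 0.0
        decisions.append(QuarantineDecision(name, rate, rate > threshold))
    return decisions


decisions = review_tests({"stable_test": (1, 100), "racy_test": (20, 100)})
quarantined = [d.test_name for d in decisions if d.quarantine]
```

As the bullet warns, a rule this simple cannot tell a flaky test from a real race condition in the code under test, so each quarantine still needs a bug filed and a human follow-up.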
In summary, test flakiness is an important problem, and Google is continuing to invest in detecting, mitigating, tracking, and fixing test flakiness throughout our code base. For example:
- We have a new team dedicated to providing accurate and timely information about test flakiness to help developers and build monitors so that they know whether they are being harmed by test flakiness.
- As we analyze the data from flaky test executions, we are seeing promising correlations with features that should enable us to identify a flaky result accurately without re-running the test.
By continually advancing the state of the art for teams at Google, we aim to remove the friction caused by test flakiness from the core developer workflows.
Automation is SO easy.
Let me rephrase that - automation often seems to be very easy. When you see your first demo, or run your first automated test, it’s like magic - wow, that’s good, wish I could type that fast.
But good automation is very different to that first test.
If you go into the garden and see a lovely juicy fruit hanging on a low branch, and you reach out and pick it, you think, "Wow, that was easy - isn’t it good, lovely and tasty".
But good test automation is more like building an orchard to grow enough fruit to feed a small town.
Where do you start?
First you need to know what kind of fruit you want to grow - apples? oranges? (oranges would not be a good choice for the UK). You need to consider what kind of soil you have, what kind of climate, and also what will the market be - you don’t want to grow fruit that no one wants to buy or eat.
In automation, first you need to know what kind of tests you want to automate, and why. You need to consider the company culture, other tools, what the context is, and what will bring lasting value to your business.
Growing pains?
Then you need to grow your trees. Fortunately automation can grow a lot quicker than trees, but it still takes time - it’s not instant.
While the trees are growing, you need to prune them, and prune them hard, especially in the first few years. Maybe you don’t allow them to fruit at all for the first 3 years - this way you are building a strong infrastructure for the trees so that they will be stronger and healthier and will produce much more fruit later on. You may also want to train them to grow into the structure that you want from the trees when they are mature.
In automation, you need to prune your tests - don’t just let them grow and grow and get all straggly. You need to make sure that each test has earned its place in your test suite, otherwise get rid of it. This way you will build a strong infrastructure of worthwhile tests that will make your automation stronger and healthier over the years, and it will bring good benefits to your organisation. You need to structure your automation (a good testware architecture) so that it will give lasting benefits.
Feeding, pests and diseases
Over time, you need to fertilise the ground, so that the trees have the nourishment they need to grow to be strong and healthy.
In automation, you need to nourish the people who are working on the automation, so that they will continue to improve and build stronger and healthier automation. They need to keep learning and experimenting, and to be encouraged to make mistakes - in order to learn from them.
You need to deal with pests - bugs - that might attack your trees and damage your fruit.
Is this anything to do with automation? Are there bugs in automated scripts? In testing tools? Of course there are, and you need to deal with them - be prepared to look for them and eradicate them.
What about diseases? What if one of your trees gets infected with some kind of blight, or suddenly stops producing good fruit? You may need to chop down that infected tree and burn it, because if you don’t, this blight might spread to your whole orchard.
Does automation get sick? Actually, a lot of automation efforts seem to decay over time - they take more and more effort to maintain, technical debt builds up, and often the automation dies. If you want your automation to live and produce good results, you might need to take drastic action and re-factor the architecture if it is causing problems. Because if you don’t, your whole automation may die.
Picking and packing
What about picking the fruit? I have seen machines that shake the trees so the fruit can be scooped up - that might be ok if you are making cider or applesauce, but I wouldn’t want fruit picked in that way to be in my fruit bowl on the table. Manual effort is still needed. The machines can help but not do everything (and someone is driving the machines).
Test execution tools don’t do testing, they just run stuff. The tools can help and can very usefully do some things, but there are tests that should not be automated and should be run manually. The tools don’t replace testers, they support them.
We need to pack the fruit so it will survive the journey to market, perhaps building a structure to hold the fruit so it can be transported without damage.
Automation needs to survive too - it needs to survive more than one release of the application, more than one version of the tool, and may need to run on new platforms. The structure of the automation, the testware architecture, is what determines whether or not the automated tests survive these changes well.
Marketing, selling, roles and expectations
It is important to do marketing and selling for our fruit - if no one buys it, we will have a glut of rotting fruit on our hands.
Automation needs to be marketed and sold as well - we need to make sure that our managers and stakeholders are aware of the value that automation brings, so that they want to keep buying it and supporting it over time.
By the way, the people who are good at marketing and selling are probably not the same people who are good at picking or packing or pruning - different roles are needed. Of course the same is true for automation - different roles are needed: tester, automator, automation architect, champion (who sells the benefits to stakeholders and managers).
Finally, it is important to set realistic expectations. If your local supermarket buyers have heard that eating your fruit will enable them to leap tall buildings at a single bound, you will have a very easy sell for the first shipment of fruit, but when they find out that it doesn’t meet those expectations, even if the fruit is very good, it may be seen as worthless.
Setting realistic expectations for automation is critical for long-term success and for gaining long-term support; otherwise if the expectations aren’t met, the automation may be seen as worthless, even if it is actually providing useful benefits.
Summary
So if you are growing your own automation, remember these things:
- it takes time to do it well
- prepare the ground
- choose the right tests to grow
- be prepared to prune / re-factor
- deal with pests and diseases (see previous point)
- make sure you have a good structure so the automation will survive change
- different roles are needed
- sell and market the automation and set realistic expectations
- you can achieve great results
I hope that all of your automation efforts are very fruitful!
The HTTP Archive crawls the world’s top URLs twice each month and records detailed information like the number of HTTP requests, the most popular image formats, and the use of gzip compression. In addition to aggregate stats, the HTTP Archive has the same data for individual websites plus images and video of the site loading. It’s built on top of WebPageTest (yay Pat!), and all our code and data is open source. HTTP Archive is part of the Internet Archive and is made possible thanks to our sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, dynaTrace, Instart Logic, Catchpoint Systems, Fastly, SOASTA mPulse, and Hosting Facts.
I started the HTTP Archive in November 2010. Even though I worked at Google, I decided to use Internet Explorer 8 to gather the data because I wanted the data to represent the typical user experience and IE 8 was the world’s most popular browser. Later, testing switched to IE 9 when it became the most popular browser. Chrome’s popularity has been growing, so we started parallel testing with Chrome last year in anticipation of switching over. This month, it was determined that Chrome is the world’s most popular browser.
In May 2011, I launched HTTP Archive Mobile. This testing was done with real iPhones. It started by testing 1,000 URLs and has “scaled up” to 5,000 URLs. I put that in quotes because 5,000 URLs is far short of the 500,000 URLs being tested on desktop. Pat hosts these iPhones at home. We’ve found that maintaining real mobile devices for large scale testing is costly, unreliable, and time-consuming. For the last year we’ve talked about how to track mobile data in a way that would allow us to scale to 1M URLs. We decided emulating Android using Chrome’s mobile emulation features was the best option, and started parallel testing in this mode early last year.
Today, we’re announcing our switch from IE 9 and real iPhones to Chrome and emulated Android as the test agents for HTTP Archive.
We swapped in the new Chrome and emulated Android data starting March 1 2016. In other words, if you go to HTTP Archive the data starting from March 1 2016 is from Chrome, and everything prior is from Internet Explorer. Similarly, if you go to HTTP Archive Mobile the data starting from March 1 2016 is from emulated Android, and everything prior is from real iPhones. For purposes of comparison, we’re temporarily maintaining HTTP Archive IE and HTTP Archive iPhone where you can see the data from those test agents up to the current day. We’ll keep doing this testing through June.
This switchover opens the way for us to expand both our desktop and mobile testing to the top 1 million URLs worldwide. It also lowers our hardware and maintenance costs, and allows us to use the world’s most popular browser. Take a look today at our aggregate trends and see what stats we have for your website.
Employers can’t stop the QA mindset:
The NLRB’s ruling last week said that requiring employees to maintain a “positive work environment” is too restrictive, as the workplace can sometimes get contentious. You can’t keep your employees from arguing.
To celebrate, I’m going to turn this smile upside down. Which is just as well, as co-workers fear my smile more than my frown.
Your Web site probably falls far, far short of this.
However, the page still has a common bug. Anyone care to tell me what?
Facebook logs a helpful message to the console to help prevent XSS exploits:
However, if the user displays the console on the right instead of the bottom, this message does not lay out properly in Firefox:
Obviously, Facebook did not test this in all possible configurations. If Facebook tested it at all.
The syllabus is not yet posted on the ASTQB web site, but it will be available there very soon. It will take training providers such as myself a while to create courses and get them accredited, but they will also be out in the marketplace in the coming weeks and months.
Cybersecurity is a very important concern for every person and organization. However, only a small percentage of companies perform continuous security testing to make sure security measures are working as designed. This certification prepares people to work at an advanced level in cybersecurity as security testers. This is a great specialty area for testers looking to branch out into a new field - or to show their knowledge as security testers.
Below is a diagram showing the topics in the certification:
We are committed to increasing diversity at GTAC, and we believe the best way to do that is by making sure we have a diverse set of applicants to speak and attend. As part of that commitment, we are excited to announce that we will be offering travel scholarships this year.
Travel scholarships will be available for selected applicants from traditionally underrepresented groups in technology.
To be eligible for a grant to attend GTAC, applicants must:
- Be 18 years of age or older.
- Be from a traditionally underrepresented group in technology.
- Work or study in Computer Science, Computer Engineering, Information Technology, or a technical field related to software testing.
- Be able to attend core dates of GTAC, November 15th - 16th 2016 in Sunnyvale, CA.
Please fill out the following form to be considered for a travel scholarship.
The deadline for submission is June 1st. Scholarship recipients will be announced on June 30th. If you are selected, we will contact you with information on how to proceed with booking travel.
What the scholarship covers:
Google will pay for standard coach class airfare for selected scholarship recipients to San Francisco or San Jose, and 3 nights of accommodations in a hotel near the Sunnyvale campus. Breakfast and lunch will be provided for GTAC attendees and speakers on both days of the conference. We will also provide a $50.00 gift card for other incidentals such as airport transportation or meals. You will need to provide your own credit card to cover any hotel incidentals.
Google is dedicated to providing a harassment-free and inclusive conference experience for everyone. Our anti-harassment policy can be found at:
The GTAC (Google Test Automation Conference) 2016 application process is now open for presentation proposals and attendance. GTAC will be held at the Google Sunnyvale office on November 15th - 16th, 2016.
GTAC will be streamed live on YouTube again this year, so even if you cannot attend in person, you will be able to watch the conference remotely. We will post the livestream information as we get closer to the event, and recordings will be posted afterwards.
Presentations are targeted at students, academics, and experienced engineers working on test automation. Full presentations are 30 minutes and lightning talks are 10 minutes. Speakers should be prepared for a question and answer session following their presentation.
For presentation proposals and/or attendance, complete this form. We will be selecting about 25 talks and 300 attendees for the event. The selection process is not first come first serve (no need to rush your application), and we select a diverse group of engineers from various locations, company sizes, and technical backgrounds.
The due date for both presentation and attendance applications is June 1st, 2016.
There are no registration fees, but speakers and attendees must arrange and pay for their own travel and accommodations.
Please read our FAQ for the most common questions.
How prepared is your software for this sudden shift?
Venezuelans lost half an hour of sleep on Sunday when their clocks moved forward to save power, as the country grapples with a deep economic crisis.
The time change was ordered by President Nicolas Maduro as part of a package of measures to cope with a severe electricity shortage.
I’m calling this the V.5H bug.
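To make the V.5H bug concrete, here is a small sketch of what goes wrong when software hard-codes a UTC offset. The two offsets (UTC-4:30 before the change, UTC-4:00 after) are stated here as assumptions for illustration, and the date is an arbitrary instant after the change.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration: half-hour zone before, whole-hour after.
OLD_OFFSET = timezone(timedelta(hours=-4, minutes=-30))
NEW_OFFSET = timezone(timedelta(hours=-4))


def local_wall_clock(utc_instant, offset):
    """Format a UTC instant as HH:MM wall-clock time at a fixed offset."""
    return utc_instant.astimezone(offset).strftime("%H:%M")


# Arbitrary instant after the change, chosen only for the example.
instant = datetime(2016, 5, 2, 12, 0, tzinfo=timezone.utc)
stale = local_wall_clock(instant, OLD_OFFSET)     # what un-updated code shows
current = local_wall_clock(instant, NEW_OFFSET)   # what the clocks on the wall show
```

Code that stores instants in UTC and looks up the zone rules from a regularly updated time zone database only needs a data update; code with the offset baked in needs a release.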
I came across this today: Being A Developer After 40
It also applies to testing and software QA. Most of the good testers I know or have known were older than the stereotypical 23-year-old wunderkind. Because they’d seen things.
I’m at no loss for blog material, but have been short on time (that’s not going to change, so I’ll need to tweak priorities). But…I wanted to write something a bit different from normal in case anyone else ever needs to solve this specific problem (or if anyone else knows that this problem already has an even better solution).
Our team uses a tool called Istanbul to measure code coverage. It generates a report that looks sort of like this (minus the privacy scribbling).
For those who don’t know me, I feel compelled to once again share that I think Code Coverage is a wonderful tool, but a horrible metric. Driving coverage numbers up purely for the sake of getting a higher number is idiotic and irresponsible. However, discovering untested and unreachable code is invaluable, and dismissing the tool entirely can be worse than using the measurements incorrectly.
The Missing Piece
Istanbul shows all-up coverage for our web app (about 600 files in 300 or so directories). What I wanted to do was break down coverage by feature team as well. The “elegant” solution would be to create a map of files to features, then add code to the Istanbul reporter to add the feature team to each file / directory, and then modify the table output to include the ability to filter by team (or create separate reports by team).
I don’t have time for the elegant solution (but here’s where someone can tell me if it already exists).
The (or “My”) Solution
This seems like a job for Excel, so first I looked to see if Istanbul had CSV as a reporter format (it doesn’t). It does, however, output JSON and XML, so I figured a quick and dirty solution was possible.
The first thing I did was assign a team owner to each code directory. I pulled the list of directories from the Istanbul report (I copied from the HTML, but I could have pulled from the XML as well), and then used Excel to create a CSV file with file and owner. I could figure out a team owner for over 90% of the files from the name (thanks to reasonable naming conventions!), and I used git log to discover the rest. I ended up with a format that looked like this:
Then it was a matter of parsing the coverage xml created by Istanbul and making a new CSV with the data I cared about (directory, coverage percentage, statements, and statements hit). The latter two are critical, because I would need to recalculate coverage per team.
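Why the raw statement counts matter: averaging per-directory percentages gives the wrong answer when directories differ in size. Here is a minimal sketch with made-up numbers and a hypothetical team name.

```python
def team_coverage(rows):
    """Recalculate coverage per team from raw statement counts.

    `rows` is an iterable of (team, statements, statements_hit) tuples.
    Returns {team: coverage percentage}.
    """
    totals = {}
    for team, statements, hit in rows:
        total, covered = totals.get(team, (0, 0))
        totals[team] = (total + statements, covered + hit)
    return {team: (100.0 * covered / total if total else 0.0)
            for team, (total, covered) in totals.items()}


# Two directories owned by one (hypothetical) team: a tiny one at 100%
# and a large one at 50%.
rows = [("checkout", 10, 10), ("checkout", 990, 495)]
by_team = team_coverage(rows)

# Averaging the two percentages would claim 75%, which overstates it.
naive_average = (100.0 + 50.0) / 2
```

Summing statements and statements hit first, then dividing, weights each directory by its size, so the team number here comes out at 50.5% rather than 75%.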
There was a time (like my first 20+ years in software) when a batch file was my answer for almost anything, but lately – and especially in this case – a bit of PowerShell was the right tool for the job.
The pseudo code was pretty much:
- Load the xml file into a PS object
- Walk the xml nodes to get the coverage data for a node
- Load a map file from a csv
- Use the map and node information to create a new csv
Hacky, yet effective.
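The original script is PowerShell; the same four steps can be sketched in Python for readers who want to try the idea elsewhere. This is not the author's script. It assumes a Clover-style XML layout (`<package name="...">` containing `<metrics statements="..." coveredstatements="..."/>`) and a map CSV with `directory` and `owner` columns; check both against your own files before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_coverage(xml_text):
    """Yield (directory, statements, covered) for each <package> node."""
    root = ET.fromstring(xml_text)
    for package in root.iter("package"):
        metrics = package.find("metrics")
        if metrics is None:
            continue
        yield (package.get("name"),
               int(metrics.get("statements", 0)),
               int(metrics.get("coveredstatements", 0)))


def read_owner_map(csv_text):
    """Load directory -> owner from a CSV with a header row (assumed columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["directory"]: row["owner"] for row in reader}


def build_report(xml_text, map_csv_text):
    """Join coverage data with the owner map and return new CSV text."""
    owners = read_owner_map(map_csv_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["directory", "owner", "coverage_pct",
                     "statements", "statements_hit"])
    for directory, statements, covered in read_coverage(xml_text):
        pct = round(100.0 * covered / statements, 2) if statements else 0.0
        owner = owners.get(directory, "unknown")  # unmapped directories stand out
        writer.writerow([directory, owner, pct, statements, covered])
    return out.getvalue()


# Made-up sample data, only to show the shape of the input and output.
SAMPLE_XML = """<coverage><project>
  <package name="app.cart"><metrics statements="200" coveredstatements="150"/></package>
  <package name="app.search"><metrics statements="50" coveredstatements="10"/></package>
</project></coverage>"""
SAMPLE_MAP = "directory,owner\napp.cart,checkout\n"

report = build_report(SAMPLE_XML, SAMPLE_MAP)
```

The resulting CSV opens directly in Excel, where a pivot on the owner column gives the per-team breakdown.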
I posted the whole script on GitHub here.