Software Testing

Flushing Out the Bugs

Randy Rice's Software Testing & Quality - Fri, 05/29/2015 - 14:18

First, let me say that May has been a very difficult month weather-wise for those of us in Oklahoma and, later in the month, for folks in Texas. Thankfully, the tornadoes and flooding did not affect us personally, but we have friends and neighbors who were impacted, and some of the stories are just tragic. So if you are able, please send a relief gift to the Red Cross designated for these disasters. It would really help those in need.
Here in Oklahoma we have had two years of extreme drought. One of the major lakes was over 31 feet below normal levels. Now, it has risen to 99% capacity. We have some lakes that are 33 feet above normal. In the month of May alone, we have had 27.5 inches of rain, which shatters the record for the wettest month in history (the previous May record was 14.52 inches, in May 2013, and the all-time monthly record was 14.66 inches, in June 1989). Texas has seen similar records broken. In short, we’ve had all the rain we need, thank you. California, we would be happy for you to get some of this for your drought.
The image above is of the main street of my hometown, Chickasha, OK.
Then, there are the tornadoes that make everything even more exciting. One night this month, we had to take shelter twice but no damage, thankfully. Then yesterday morning I was awakened at 5:30 a.m. to the sounds of tornado sirens. That is freaky because you have to act fast to see what is really happening. In this case, the tornado was 40 miles away, heading the opposite direction. I question the decision to sound the alarm in that situation.
Anyway…with that context…
About a week ago, I started noticing ants everywhere in and around our house. I mean parades of them everywhere. Ironically, I even found one crawling on my MacBook Pro!
Then came the spiders, a plethora of other bugs, snakes and even fish in some people’s yards. A friend reported seeing a solid white opossum near his house, which is very unusual.
You may have heard that in one tornado event nearby on May 6, a wild animal preserve was hit, and for a while it was reported that lions, bears, tigers, etc. were loose. Turns out that was a false report, too. But it did make for some juicy Facebook pictures for “Tigernado” movies.
Other weird things have happened as well, such as storm shelters and entire swimming pools popping out of the ground due to the high water table (and poor installation in some cases)!
But back to the ants and bugs and why they are everywhere. Turns out that we have had so much rain, their nests and colonies were destroyed and they are now looking for other habitats. The same has occurred with spiders, snakes, mice and rats.
In fact, my wife and I are finding bugs we have never seen before. I had to look some of them up on the Internet just to know what kind of bug I was killing.
That caused me to think about a new testing analogy to reinforce a really great testing technique. To flush out the bugs in something, change the environment.
Of course, the difference here in this analogy is that software bugs are not like actual bugs in many regards. However, there are some similarities:
· Both have taxonomies
· Both can be studied
· Both can mutate
· Both can travel
· Both can destroy the structure of something
· Both can be identified and removed
· Both can be prevented
· Both can be hidden from plain view
The main differences are:
· Bugs have somewhat predictable behavior – not all software defects do
· Bugs can inhabit a place on their own initiative – software defects are created by people due to errors
(Although I have wondered how squash bugs know to infest only squash plants and nothing else…)
In the recent onslaught of ants, it is the flooding that has caused them to appear in masses. In software, perhaps if you flooded the application with excessive amounts of data such as long data strings in fields, you might see some new bugs. Or, you could flood a website with concurrent transaction load to see new and odd behavior.
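To make the flooding idea concrete, here is a minimal Python sketch that throws progressively longer strings at a field handler and records which sizes make it fail. The handler `save_customer_name`, its 255-character limit, and the helper `flood_field` are hypothetical stand-ins, not part of any real application; swap in whatever field-processing code you are actually testing.

```python
def save_customer_name(name: str) -> str:
    """Hypothetical field handler under test: rejects names over 255 chars."""
    if len(name) > 255:
        raise ValueError("name too long")
    return name.strip()


def flood_field(handler, sizes):
    """Call `handler` with strings of each length; collect the lengths that failed."""
    failures = {}
    for size in sizes:
        payload = "A" * size  # a long run of a single character, a classic flood input
        try:
            handler(payload)
        except Exception as exc:  # deliberately broad: we want to see *any* failure
            failures[size] = type(exc).__name__
    return failures


if __name__ == "__main__":
    sizes = [0, 1, 254, 255, 256, 10_000, 1_000_000]
    print(flood_field(save_customer_name, sizes))
```

A clean, deliberate rejection like the `ValueError` above is the behavior you hope to see; the interesting finds are crashes, silent truncation, or hangs at sizes nobody designed for.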
Perhaps you could do the opposite and starve the environment of memory, CPU availability, disk space, etc. to also cause bugs to manifest as failures.
This is not a new idea by any means. Fault injection has been used for many years to force environmental conditions that might reveal failures and defects. Other forms of fault injection directly manipulate code.
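One lightweight way to starve the environment without actually filling a disk is to inject the fault in-process. The sketch below uses Python's standard `unittest.mock.patch` to make every `open()` call fail with `ENOSPC` ("no space left on device") and checks that the code under test degrades gracefully. The functions `write_report` and `run_with_disk_full` are hypothetical examples, and simulating the error this way is only an approximation of a real full disk.

```python
import errno
from unittest import mock


def write_report(path: str, text: str) -> bool:
    """Hypothetical code under test: returns False instead of crashing on I/O errors."""
    try:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        return True
    except OSError as exc:
        print(f"could not write report: {exc.strerror}")
        return False


def run_with_disk_full(func, *args):
    """Inject a 'disk full' fault by making every open() call raise ENOSPC."""
    fault = OSError(errno.ENOSPC, "No space left on device")
    with mock.patch("builtins.open", side_effect=fault):
        return func(*args)


if __name__ == "__main__":
    # Under the injected fault, the code should report the problem, not crash.
    print(run_with_disk_full(write_report, "report.txt", "hello"))
```

If `write_report` let the exception escape, the injected fault would surface it immediately, which is exactly the kind of failure this technique is meant to flush out.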
Another technique is to test in a variety of valid operational environments that have different operating systems, hardware capacities and so forth. This is a great technique for testing mobile devices and applications. It’s also a great technique for web-based testing and security testing.
The main principle here is that if you can get the application to fail in a way that causes it to change state (such as from “normal state” to “failure state”), then it is possible to use that failure as a point of vulnerability. Once the defect has been isolated and fixed, you have not only removed a defect but also eliminated another security vulnerability.
Remember, as testers we are actually trying to cause failures that might reveal the presence of defects. Failure is not an option – it is an objective!
Although we commonly say that testers are looking for defects (bugs), the bugs themselves are often out of our view. They are in the code, the integration, APIs, requirements, and so forth. Yes, sometimes we see the obvious external bug, like an error message with confusing wording, or no message at all.
However, in the external functional view of an application or system, testers mainly see the indicators of defects. These can then be investigated to determine what is really going on.
As testers, we can dig for the bugs (which can also be productive), or we can force the bugs to manifest themselves by flushing them out with environmental changes.
And let’s be real here. In some software, the bugs are not at all hard to find!
Me? I’ll continue to both dig and flush to find those defects. Even better, I’ll go upstream where the bugs often originate (in requirements, user stories, etc.) and try to find them there!
Categories: Software Testing

Fog Creek Fun

Alan Page - Thu, 05/28/2015 - 12:30

Whaaa…? Two posts on automation in one week?

Normally, I’d refrain, but for those who missed it on twitter, I recorded an interview with Fog Creek last week on the Abuse and Misuse of Test Automation. It’s short and sweet (and that includes my umms and awws).

(potentially) related posts:
  1. An Angry Weasel Hiatus
  2. Why I Write and Speak
  3. Testing with code
Categories: Software Testing

GTAC 2015 Coming to Cambridge (Greater Boston) in November

Google Testing Blog - Thu, 05/28/2015 - 10:56
Posted by Anthony Vallone on behalf of the GTAC Committee

We are pleased to announce that the ninth GTAC (Google Test Automation Conference) will be held in Cambridge (Greatah Boston, USA) on November 10th and 11th (Toozdee and Wenzdee), 2015. So, tell everyone to save the date for this wicked good event.

GTAC is an annual conference hosted by Google, bringing together engineers from industry and academia to discuss advances in test automation and the test engineering computer science field. It’s a great opportunity to present, learn, and challenge modern testing technologies and strategies.

You can browse presentation abstracts, slides, and videos from previous years on the GTAC site.

Stay tuned to this blog and the GTAC website for application information and opportunities to present at GTAC. Subscribing to this blog is the best way to get notified. We're looking forward to seeing you there!

Categories: Software Testing

Upcoming ASTQB Conference 2015 and Free Webinar

Randy Rice's Software Testing & Quality - Tue, 05/26/2015 - 11:03
I hope you had a great weekend!

I want to let you know about two events that will add value to your testing efforts.  I'm very excited to be a speaker at the upcoming ASTQB Conference 2015 in Washington, D.C. on September 14 - 16. This is a special conference because we are giving focus to critical topics such as Cybersecurity Testing, Testing of Critical Applications, The Business Value of Testing, Agile Testing, and Test Leadership. In this conference, our goal is to provide valuable take-away ideas for increasing the reliability, security and quality of software projects.
I often tell my clients there are two places you don't want to be during or after a project - the newspaper and the courtroom. Your organization's reputation is at stake with every project. I'm sure, like me, you hear stories every week of another cyber attack, system failure, or failed project. The costs of these failures are enormous.
At this conference, we are bringing together some of the country's leading experts on cybersecurity, software quality, and test management to provide real-world solutions to some of the most challenging issues we face as software testers.
Early-bird pricing is still available until June 15. But, if you use the code "astqb2015a2u" during registration, you get an extra 10% discount!
To see more information and to register, just go to
Free Webinar - Thursday, June 4 from 1 p.m. to 2 p.m. EDT
As part of the lead-up to the ASTQB Conference 2015, Taz Daughtrey and I will be presenting a free one-hour webinar on how to "Protect Your Company - and Your Software Testing Career." This free sneak peek of the ASTQB Conference will preview the keynote topics, tutorials and breakout sessions that are designed to keep you out of trouble.
In addition to the preview of the conference topics, Taz and I will share some practical information you can use immediately in testing cyber security, mobile applications and agile projects. I'll discuss a few of the new "Free and Cheap Tools" I have found recently as well.
I promise it will be an entertaining and informative time!
To register, visit
Very important note: This webinar will be full at 100 people. We will probably get 300 or more registrants, so if you want to actually get into the session, you will want to log in early - at least 15 minutes prior.
I really hope you can join me at one or both of these events!
Thanks for reading,
Categories: Software Testing

<sigh> Automation…again

Alan Page - Tue, 05/26/2015 - 10:53

I think this is the first time I’ve blogged about automation since writing…or, to be fair, compiling The A Word.

But yet again, I see questions among testers about the value of automation and whether it will replace testers, etc. For example, this post from Josh Grant asks whether there are similarities between automated trucking and automated testing. Of course, I think most testers will go on (and on) about how much brainpower and critical thinking software testing needs, and how test automation can never replace “real testing”. They’re right, of course, but there’s more to the story.

Software testing isn’t at all unique among professions requiring brain power, creativity, or critical thinking. I challenge you to bingoogle “Knowledge Work” or “Knowledge Worker” and not see the parallels to software testing in other professions. You know what? Some legal practices can be replaced by automation or by low-cost outsourcing – yet I couldn’t find any articles, blogs, or anything else from lawyers complaining about automation or outsourcing taking away their jobs (disclaimer – I only looked at the first two pages of results on simple searches). Apparently, however, there are “managers” (1000’s of them if I’m extrapolating correctly) who claim that test automation is a process for replacing human testers. Apparently, these managers don’t spend any time on the internet, because I could only find secondhand confirmation of their existence.

At the risk of repeating myself (or re-repeating myself…) you should automate the stuff that humans don’t want (or shouldn’t have) to do. Automate the process of adding and deleting 100,000 records; but use your brain to walk through a user workflow. Stop worrying about automation as a replacement for testing, but don’t ignore the value it gives you for accomplishing the complex and mundane.
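As an illustration of the "add and delete 100,000 records" kind of chore, here is a minimal Python sketch using the standard `sqlite3` module against a throwaway in-memory database. The `customers` table and the `add_and_delete_records` function are hypothetical; a real check would target your application's own data layer or API.

```python
import sqlite3


def add_and_delete_records(count: int) -> tuple[int, int]:
    """Insert `count` rows, then delete them all; return row counts after each step."""
    conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        rows = ((i, f"customer-{i}") for i in range(count))
        conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
        conn.commit()
        after_insert = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

        conn.execute("DELETE FROM customers")
        conn.commit()
        after_delete = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        return after_insert, after_delete
    finally:
        conn.close()


if __name__ == "__main__":
    print(add_and_delete_records(100_000))  # (100000, 0)
```

This is precisely the mechanical, repetitive work a script does better than a person, which frees the human tester for the user-workflow exploration described above.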

(potentially) related posts:
  1. To Automate…?
  2. Test Design for Automation
  3. Last Word on the A Word
Categories: Software Testing

Your #1 BugMagnet requested feature now works

The Quest for Software++ - Mon, 05/25/2015 - 07:25

BugMagnet 0.8, pushed out to the Chrome Extension store today, allows users to define custom edge cases, boundaries and interesting examples. This was by far the most requested feature since BugMagnet came out, so I certainly hope that the new version helps people be more productive while testing.

Previously, users had to change the main config file and rebuild the extension from the local source files. This was a hassle because it required a development environment setup, plus it effectively required users to maintain their own version of the extension and follow source code updates.

The new version makes configuration changes trivial: Just click on the new “Configure BugMagnet” option in the menu, and you’ll see a pop-up window with the option to add local files. For a description of the configuration file format, see the Github repo main page.

This also means that we can distribute more usage-specific configuration files in the main repository. Previously, when users asked for configuration file changes to be merged into the main repository, I had to make a really difficult decision, balancing things that are useful to the majority against adding interesting boundary conditions. No more! Because people can now load whatever they want, and we can avoid overcomplicating menus for users who don’t need all the additional use cases, I’m happy to take in pull requests for additional libraries of examples. I’ll distribute them through the extras folder on Github, and later make a nice web page that allows people to add such config files with one click.

To get started with BugMagnet, grab it from the Chrome Web store.

In other news, Brian Goad ported BugMagnet to Firefox. You can grab it from the Mozilla Add-ons page. The Firefox version does not support config files yet, but I hope it will soon.

QA Music: Nobody Praying for Me

QA Hates You - Mon, 05/25/2015 - 03:51

Seether, “Nobody Praying For Me”

Categories: Software Testing

The Best Software Testing Tool? That’s Easy…

Eric Jacobson's Software Testing Blog - Thu, 05/21/2015 - 15:55


After experimenting with a Test Case Management application’s Session-Test tool, a colleague of mine noted the tool’s overhead (i.e., the non-test-related waiting and admin effort forced by the tool). She said, “I would rather just use Notepad to document my testing.” Exactly!

Notepad has very little overhead.  It requires no setup, no license, no logging in, few machine resources, it always works, and we don’t waste time on trivial things like making test documentation pretty (e.g., let’s make passing tests green!).

Testing is an intellectual activity, especially if you’re using automation.  The test idea is the start.  Whether it comes to us in the midst of a discussion, requirements review, or while performing a different test, we want to document it.  Otherwise we risk losing it.

Don’t overlook the power of Notepad.

Categories: Software Testing

Today’s Required Reading

QA Hates You - Tue, 05/19/2015 - 04:15

7 timeless lessons of programming ‘graybeards’:

The software industry venerates the young. If you have a family, you’re too old to code. If you’re pushing 30 or even 25, you’re already over the hill.

Alas, the whippersnappers aren’t always the best solution. While their brains are full of details about the latest, trendiest architectures, frameworks, and stacks, they lack fundamental experience with how software really works and doesn’t. These experiences come only after many lost weeks of frustration borne of weird and inexplicable bugs.

Like the viewers of “Silicon Valley,” who by the end of episode 1.06 get the satisfaction of watching the boy genius crash and burn, many of us programming graybeards enjoy a wee bit of schadenfreude when those who have ignored us for being “past our prime” end up with a flaming pile of code simply because they didn’t listen to their programming elders.

(Link via tweet.)

Categories: Software Testing

Fifty Quick Ideas To Improve Your Tests now available

The Quest for Software++ - Tue, 05/19/2015 - 02:00

My new book, Fifty Quick Ideas to Improve Your Tests, is now available on Amazon. Grab it at 50% discount before Friday:

This book is for cross-functional teams working in an iterative delivery environment, planning with user stories and testing frequently changing software under tough time pressure. This book will help you test your software better, easier and faster. Many of these ideas also help teams engage their business stakeholders better in defining key expectations and improve the quality of their software products.

For more info, check out

QA Music: Making A Deal With The Bad Wolf

QA Hates You - Mon, 05/18/2015 - 03:46

AWOLNATION, “Hollow Moon (Bad Wolf)”

Categories: Software Testing

I Prefer This Over That

Test Obsessed - Sun, 05/17/2015 - 11:22

A couple weeks ago I tweeted:

I prefer:
- Recovery over Perfection
- Predictability over Commitment
- Safety Nets over Change Control
- Collaboration over Handoffs

— ElisabethHendrickson (@testobsessed) May 6, 2015

Apparently it resonated. I think that’s more retweets than anything else original I’ve said on Twitter in my seven years on the platform. (SEVEN years? Holy snack-sized sound bytes! But I digress.)

@jonathandart said, “I would love to read a fleshed out version of that tweet.”

OK, here you go.

First, a little background. Since I worked on Cloud Foundry at Pivotal for a couple years, I’ve been living the DevOps life. My days were filled with zero-downtime deployments, monitoring, configuration as code, and a deep antipathy for snowflakes. We honed our practices around deployment checklists, incident response, and no-blame post mortems.

It is within that context that I came to appreciate these four simple statements.

Recovery over Perfection

Something will go wrong. Software might behave differently with real production data or traffic than you could possibly have imagined. AWS could have an outage. Humans, being fallible, might publish secret credentials in public places. A new security vulnerability may come to light (oh hai, Heartbleed).

If we aim for perfection, we’ll be too afraid to deploy. We’ll delay deploying while we attempt to test all the things (and fail anyway because ‘all the things’ is an infinite set). Lowering the frequency with which we deploy in order to attempt perfection will ironically increase the odds of failure: we’ll have fewer turns of the crank and thus fewer opportunities to learn, so we’ll be even farther from perfect.

Perfect is indeed the enemy of good. Striving for perfection creates brittle systems.

So rather than strive for perfection, I prefer to have a Plan B. What happens if the deployment fails? Make sure we can roll back. What happens if the software exhibits bad behavior? Make sure we can update it quickly.
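A Plan B can be as simple as keeping the last known-good version around and switching back to it when a health check fails. The Python sketch below captures only that decision logic; `deploy_with_rollback` is a hypothetical function, and `activate` and `healthy` are assumed hooks into whatever deployment and monitoring tooling you actually use.

```python
from typing import Callable


def deploy_with_rollback(
    current: str,
    candidate: str,
    activate: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> str:
    """Activate `candidate`; if its health check fails, re-activate `current`.

    Returns the version left running. `activate` and `healthy` are
    hypothetical hooks into real deployment tooling.
    """
    activate(candidate)
    if healthy(candidate):
        return candidate
    activate(current)  # Plan B: roll back to the last known-good version
    return current


if __name__ == "__main__":
    log = []
    running = deploy_with_rollback(
        "v41", "v42", activate=log.append, healthy=lambda v: v != "v42"
    )
    print(running, log)  # v41 ['v42', 'v41']
```

The point is not the code itself but the habit it encodes: the path back is designed and exercised before it is needed.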

Predictability over Commitment

Surely you have seen at least one case where estimates were interpreted as a commitment, and a team was then pressured to deliver a fixed scope in fixed time.

Some even think such commitments light a fire under the team. They give everyone something to strive for.

It’s a trap.

Any interesting, innovative, and even slightly complex development effort will encounter unforeseen obstacles. Surprises will crop up that affect our ability to deliver. If those surprises threaten our ability to meet our commitments, we have to make painful tradeoffs: Do we live up to our commitment and sacrifice something else, like quality? Or do we break our commitment? The very notion of commitment means we probably take the tradeoff. We made a commitment, after all. Broken commitments are a sign of failure.

Commitment thus trumps sustainability. It leads to mounting technical debt. Some number of years later, teams find themselves constantly firefighting and unable to make any progress.

The real problem with commitments is that they suggest that achieving a given goal is more important than positioning ourselves for ongoing success. It is not enough to deliver on this one thing. With each delivery, we need to improve our position to deliver in the future.

So rather than committing in the face of the unknown, I prefer to use historical information and systems that create visibility to predict outcomes. That means having a backlog that represents a single stream of work, and using velocity to enable us to predict when a given story will land. When we’re surprised by the need for additional work, we put that work in the backlog and see the implications. If we don’t like the result, we make an explicit decision to tradeoff scope and time instead of cutting corners to make a commitment.
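The forecasting arithmetic behind "using velocity to predict when a given story will land" can be sketched in a few lines of Python. This assumes story points and a plain average of recent velocities, which is only one of several ways to forecast; `iterations_until` is a hypothetical helper, not a tool from the post.

```python
import math
from statistics import mean


def iterations_until(story_index: int, backlog_points: list[int],
                     past_velocities: list[int]) -> int:
    """Estimate how many iterations until the story at `story_index` is done.

    The backlog is a single ordered stream of work, so everything ahead of
    the story (plus the story itself) must be finished first.
    """
    work_ahead = sum(backlog_points[: story_index + 1])
    velocity = mean(past_velocities)
    return math.ceil(work_ahead / velocity)


if __name__ == "__main__":
    backlog = [3, 5, 2, 8, 5, 3]
    velocities = [9, 11, 10]
    print(iterations_until(3, backlog, velocities))  # 18 points / 10 per iteration -> 2

    backlog.insert(1, 5)  # surprise work lands ahead of our story
    print(iterations_until(4, backlog, velocities))  # 23 points -> 3
```

Adding the surprise work to the backlog moves the forecast visibly, which is what makes the scope-versus-time tradeoff an explicit decision rather than a hidden corner cut.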

Aiming for predictability instead of commitment allows us to adapt when we discover that our assumptions were not realistic. There is no failure, there is only learning.

Safety Nets over Change Control

If you want to prevent a given set of changes from breaking your system, you can either put in place practices to tightly control the nature of the changes, or you can make it safer to change things.

Controlling the changes typically means having mechanisms to accept or reject proposed changes: change control boards, review cycles, quality gates.

Such systems may be intended to mitigate risk, but they do so by making change more expensive. The people making changes have to navigate through the labyrinth of these control systems to deliver their work. More expensive change means less change means less risk. Unless the real risk to your business is a slogging pace of innovation in a rapidly changing market.

Thus rather than building up control systems that prevent change, I’d rather find ways to make change safe. One way is to ensure recoverability. Recovery over perfection, after all.

Fast feedback cycles make change safe too. So instead of a review board, I’d rather have CI tell us when the system is violating expectations. And instead of a laborious code review process, I’d rather have a pair work with me in real time.

If you want to keep the status quo, change control is fine. But if you want to go fast, find ways to make change cheap and safe.

Collaboration over Handoffs

In traditional processes there are typically a variety of points where one group hands off work to another. Developers hand off to other developers, to QA for test, to Release Engineering to deliver, or to Ops to deploy. Such handoffs typically involve checklists and documentation.

But the written word cannot convey the richness of a conversation. Things will be missed. And then there will be a back and forth.

“You didn’t document foo.”
“Yes, we did. See section 3.5.1.”
“I read that. It doesn’t give me the information I need.”

The next thing you know it’s been 3 weeks and the project is stalled.

We imagine a proper handoff to be an efficient use of everyone’s time, but handoffs are risky. Too much can go wrong, and when it does, progress stops.

Instead of throwing a set of deliverables at the next team down the line, bring people together. Embed testers in the development team. Have members of the development team rotate through Ops to help with deployment and operation for a period of time. It actually takes less time to work together than it does to create sufficient documentation to achieve a perfect handoff.

True Responsiveness over the Illusion of Control

Ultimately all these statements are about creating responsive systems.

When we design processes that attempt to corral reality into a neat little box, we set ourselves up for failure. Such systems are brittle. We may feel in control, but it’s an illusion. The real world is not constrained by our imagined boundaries. There are surprises just around the corner.

We can’t control the surprises. But we can be ready for them.

Categories: Software Testing

Multi-Repository Development

Google Testing Blog - Fri, 05/15/2015 - 15:00
Author: Patrik Höglund

As we all know, software development is a complicated activity where we develop features and applications to provide value to our users. Furthermore, any nontrivial modern software is composed of other software. For instance, the Chrome web browser pulls roughly a hundred libraries into its third_party folder when you build the browser. The most significant of these libraries is Blink, the rendering engine, but there’s also ffmpeg for audio and video processing, skia for low-level 2D graphics, and WebRTC for real-time communication (to name a few).

Figure 1. Holy dependencies, Batman!
There are many reasons to use software libraries. Why write your own phone number parser when you can use libphonenumber, which is battle-tested by real use in Android and Chrome and available under a permissive license? Using such software frees you up to focus on the core of your software so you can deliver a unique experience to your users. On the other hand, you need to keep your application up to date with changes in the library (you want that latest bug fix, right?), and you also run a risk of such a change breaking your application. This article will examine that integration problem and how you can reduce the risks associated with it.
Updating Dependencies is Hard

The simplest solution is to check in a copy of the library, build with it, and avoid touching it as much as possible. This solution, however, can be problematic because you miss out on bug fixes and new features in the library. What if you need a new feature or bug fix that just made it in? You have a few options:
  • Update the library to its latest release. If it’s been a long time since you did this, it can be quite risky and you may have to spend significant testing resources to ensure all the accumulated changes don’t break your application. You may have to catch up to interface changes in the library as well. 
  • Cherry-pick the feature/bug fix you want into your copy of the library. This is even riskier because your cherry-picked patches may depend on other changes in the library in subtle ways. Also, you still are not up to date with the latest version. 
  • Find some way to make do without the feature or bug fix.
None of the above options are very good. Using this ad-hoc updating model can work if there’s a low volume of changes in the library and our requirements on the library don’t change very often. Even if that is the case, what will you do if a critical zero-day exploit is discovered in your socket library?

One way to mitigate the update risk is to integrate more often with your dependencies. As an extreme example, let’s look at Chrome.

In Chrome development, there’s a massive amount of change going into its dependencies. The Blink rendering engine lives in a separate code repository from the browser. Blink sees hundreds of code changes per day, and Chrome must integrate with Blink often since it’s an important part of the browser. Another example is WebRTC, where a large part of Chrome’s WebRTC implementation resides in the separate WebRTC repository. This article will focus on the latter because it’s the team I happen to work on.
How “Rolling” Works

The open-sourced WebRTC codebase is used by Chrome but also by a number of other companies working on WebRTC. Chrome uses a toolchain called depot_tools to manage dependencies, and there’s a checked-in text file called DEPS where dependencies are managed. It looks roughly like this:
# ...
'' +
'external/webrtc/trunk/webrtc.git' +
'@' + '5727038f572c517204e1642b8bc69b25381c4e9f',

The above means we should pull WebRTC from the specified git repository at the 572703... hash, similar to other dependency-provisioning frameworks. To build Chrome with a new version, we change the hash and check in a new version of the DEPS file. If the library’s API has changed, we must update Chrome to use the new API in the same patch. This process is known as rolling WebRTC to a new version.
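Stripped down, a DEPS entry is a repository path joined to a pinned revision with `@`, and a roll only changes the part after the `@`. The Python sketch below illustrates that idea with hypothetical helpers `parse_dep_spec` and `roll`; it is not how depot_tools (gclient) actually parses or updates DEPS files.

```python
def parse_dep_spec(spec: str) -> tuple[str, str]:
    """Split a 'repository@revision' dependency string into its two parts."""
    # rpartition splits on the *last* '@', so an '@' inside the URL is kept.
    repo, sep, revision = spec.rpartition("@")
    if not sep or not revision:
        raise ValueError(f"dependency is not pinned to a revision: {spec!r}")
    return repo, revision


def roll(spec: str, new_revision: str) -> str:
    """Return the same dependency string pinned to `new_revision`."""
    repo, _ = parse_dep_spec(spec)
    return repo + "@" + new_revision


if __name__ == "__main__":
    spec = "external/webrtc/trunk/webrtc.git@5727038f572c517204e1642b8bc69b25381c4e9f"
    print(parse_dep_spec(spec))
```

Changing the string is the trivial part; as the next paragraph explains, the real cost of a roll is everything that new revision might break.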

Now the problem is that we have changed the code going into Chrome. Maybe getUserMedia has started crashing on Android, or maybe the browser no longer boots on Windows. We don’t know until we have built and run all the tests. Therefore a roll patch is subject to the same presubmit checks as any Chrome patch (i.e. many tests, on all platforms we ship on). However, roll patches can be considerably more painful and risky than other patches.

Figure 2. Life of a Roll Patch.
On the WebRTC team we found ourselves in an uncomfortable position a couple years back. Developers would make changes to the code and there was a fair amount of churn in the interface, which meant we would have to update Chrome to adapt to those changes. Also we frequently broke tests and WebRTC functionality in Chrome because semantic changes had unexpected consequences in Chrome. Since rolls were so risky and painful to make, they started to happen less often, which made things even worse. There could be two weeks between rolls, which meant Chrome was hit by a large number of changes in one patch.
Bots That Can See the Future: “FYI Bots”

We found a way to mitigate this, which we called FYI (for your information) bots. A bot is Chrome lingo for a continuous build machine which builds Chrome and runs tests.

All the existing Chrome bots at that point would build Chrome as specified in the DEPS file, which meant they would build the WebRTC version we had rolled to up to that point. FYI bots replace that pinned version with WebRTC HEAD, but otherwise build and run Chrome-level tests as usual. Therefore:

  • If all the FYI bots were green, we knew a roll most likely would go smoothly. 
  • If the bots didn’t compile, we knew we would have to adapt Chrome to an interface change in the next roll patch. 
  • If the bots were red, we knew we either had a bug in WebRTC or that Chrome would have to be adapted to some semantic change in WebRTC.
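The three outcomes above form a small decision table. As a sketch (the names `FyiVerdict` and `interpret_fyi_run` are hypothetical, not part of Chrome's infrastructure), the mapping from an FYI run's result to what it tells the team might look like this:

```python
from enum import Enum


class FyiVerdict(Enum):
    ROLL_LIKELY_SMOOTH = "a roll will most likely go smoothly"
    ADAPT_TO_INTERFACE = "Chrome must adapt to an interface change in the next roll patch"
    BUG_OR_SEMANTIC_CHANGE = "WebRTC bug, or Chrome must adapt to a semantic change"


def interpret_fyi_run(compiled: bool, tests_passed: bool) -> FyiVerdict:
    """Map an FYI bot result (Chrome built against WebRTC HEAD) to its meaning."""
    if not compiled:
        # Tests cannot run without a build, so `tests_passed` is ignored here.
        return FyiVerdict.ADAPT_TO_INTERFACE
    if not tests_passed:
        return FyiVerdict.BUG_OR_SEMANTIC_CHANGE
    return FyiVerdict.ROLL_LIKELY_SMOOTH


if __name__ == "__main__":
    for compiled, passed in [(True, True), (False, False), (True, False)]:
        print(compiled, passed, interpret_fyi_run(compiled, passed).value)
```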
The FYI “waterfall” (a set of bots that builds and runs tests) is a straight copy of the main waterfall, which is expensive in resources. We could have cheated and just set up FYI bots for one platform (say, Linux), but the most expensive regressions are platform-specific, so we reckoned the extra machines and maintenance were worth it.
Making Gradual Interface Changes

This solution helped but wasn’t quite satisfactory. We initially had the policy that it was fine to break the FYI bots since we could not update Chrome to use a new interface until the new interface had actually been rolled into Chrome. This, however, often caused the FYI bots to be compile-broken for days. We quickly started to suffer from red blindness [1] and had no idea if we would break tests on the roll, especially if an interface change was made early in the roll cycle.

The solution was to move to a more careful update policy for the WebRTC API. For the more technically inclined, “careful” here means “following the API prime directive” [2]. Consider this example:
class WebRtcAmplifier {
 public:
  int SetOutputVolume(float volume);
};
Normally we would just change the method’s signature when we needed to:
class WebRtcAmplifier {
 public:
  int SetOutputVolume(float volume, bool allow_eleven);  // allow_eleven: see footnote 1
};
… but this would compile-break Chrome until it could be updated. So we started doing it like this instead:
class WebRtcAmplifier {
 public:
  int SetOutputVolume(float volume);
  int SetOutputVolume2(float volume, bool allow_eleven);
};
Then we could:
  1. Roll into Chrome 
  2. Make Chrome use SetOutputVolume2 
  3. Update SetOutputVolume’s signature 
  4. Roll again and make Chrome use SetOutputVolume 
  5. Delete SetOutputVolume2
This approach requires several steps, but we end up with the right interface and at no point do we break Chrome.
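A fuller sketch of the transitional interface (the state after adding `SetOutputVolume2`, before Chrome migrates) might look like the following. The volume range, the meaning of `allow_eleven`, and the 0/-1 return codes are assumptions made for illustration, not WebRTC’s real behavior.

```cpp
// Hypothetical transitional WebRtcAmplifier. Assumptions: volume must lie in
// [0.0, 1.0], allow_eleven raises the cap to 1.1, 0 = success, -1 = error.
class WebRtcAmplifier {
 public:
  // Old signature, kept so Chrome keeps compiling during the transition.
  // It forwards to the new method so both entry points behave identically.
  int SetOutputVolume(float volume) {
    return SetOutputVolume2(volume, /*allow_eleven=*/false);
  }

  // New signature. Once Chrome uses it, SetOutputVolume's signature can be
  // updated, Chrome switched back, and this method deleted.
  int SetOutputVolume2(float volume, bool allow_eleven) {
    const float max_volume = allow_eleven ? 1.1f : 1.0f;
    if (volume < 0.0f || volume > max_volume) {
      return -1;  // out of range: keep the previous volume
    }
    volume_ = volume;
    return 0;
  }

  float volume() const { return volume_; }

 private:
  float volume_ = 1.0f;
};
```

Keeping the old method as a thin forwarder is one way to ensure the two entry points cannot drift apart while both exist.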
Results

Once we implemented the above, we could fix problems as they came up rather than in big batches on each roll. We could institute the policy that the FYI bots should always be green, and that changes breaking them should be rolled back immediately. This made a huge difference. The team could work more smoothly and roll more often. This reduced our risk quite a bit, particularly when Chrome was about to cut a new version branch. Instead of doing panicked and risky rolls around a release, we could work out issues in good time and stay in control.

Another benefit of the FYI bots is more granular performance tracking. Before the FYI bots, a roll would frequently regress a bunch of metrics at once, and it’s no fun finding which of the 100 patches in the roll caused the regression! With the FYI bots, we can see precisely which WebRTC revision caused the problem.
Future Work: Optimistic Auto-rolling

The final step on this ladder (short of actually merging the repositories) is auto-rolling. The Blink team implemented this with their ARB (AutoRollBot). The bot wakes up periodically and tries to do a roll patch. If it fails on the trybots, it waits and tries again later (perhaps the trybots failed because of a flake or other temporary error, or perhaps the error was real but has been fixed).
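The wake-up-and-retry behavior described above can be sketched as a small loop. This is a minimal illustration, not the ARB’s actual code: `AutoRoll`, `RollResult`, and the `try_roll`/`wait_fn` callbacks are hypothetical names, and a real bot would sleep between wake-ups and keep running rather than stop after `max_attempts`.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical outcome of one roll attempt on the trybots.
enum class RollResult { kLanded, kTrybotsFailed };

// Tries a roll up to max_attempts times. After every failed attempt it calls
// wait_fn (a real bot would sleep until its next wake-up here), since the
// failure may be a flake or a real error that gets fixed in the meantime.
// Returns the 1-based attempt number that landed, or 0 if none did.
std::size_t AutoRoll(const std::function<RollResult()>& try_roll,
                     const std::function<void()>& wait_fn,
                     std::size_t max_attempts) {
  for (std::size_t attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_roll() == RollResult::kLanded) {
      return attempt;
    }
    wait_fn();
  }
  return 0;
}
```

Passing the attempt and wait steps in as callbacks keeps the retry policy testable without real trybots or real sleeps.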

To pull auto-rolling off, you are going to need very good tests. That goes for any roll patch (or any patch, really), but if you’re edging closer to a release and an unstoppable flood of code changes keeps breaking you, you’re not in a good place.

References
[1] Martin Fowler (May 2006) “Continuous Integration”
[2] Dani Megert, Remy Chi Jian Suen, et al. (Oct 2014) “Evolving Java-based APIs”
  1. We actually did have a hilarious bug in WebRTC where it was possible to set the volume to 1.1, but only 0.0-1.0 was supposed to be allowed. No, really. Thus, our WebRTC implementation must be louder than the others since everybody knows 1.1 must be louder than 1.0.

Categories: Software Testing

Web Performance News of the Week

LoadStorm - Fri, 05/15/2015 - 13:52

This week Bing announced it will add its own mobile-friendliness algorithm to its search results, WordPress released a security update, Google added a new Search Analytics report for web developers, and the FCC refused to delay its net neutrality rules.

Bing will roll out its own mobile-friendly algorithm in the upcoming months

This week Bing announced it would follow Google’s lead, but with a slightly different approach to mobile-friendly search rankings. Bing announced in November that it was investing in mobile-friendly pages, and has since added “Mobile-friendly” tags to relevant sites, resulting in positive user feedback. However, Bing will not be rolling out its own Mobilegeddon. Instead, the mobile-friendliness signal will be balanced against relevance, as Bing continues “to focus on delivering the most relevant results for a given query.” So pages that are not mobile-friendly will not be penalized, and users can expect pages with more relevant content to be shown before mobile-friendly pages with less relevant content.

Shyam Jayasankar, a spokesman for the Bing Mobile Relevance team said, “This is a fine balance and getting it right took a few iterations, but we believe we are now close.”

Bing said its mobile-friendliness detection will consider several factors and highlighted some of the more important ones:

  • Easy navigation – Links should be far enough apart to easily navigate and click the right one.
  • Readability – Text should be readable without requiring zooming or lateral scrolling.
  • Scrolling – Sites should typically fit within the device width.
  • Compatibility – The site must only use content that is compatible with the device, i.e., no plugin issues, Flash, copyright issues on content, etc.

Bing also mentioned considering pop-ups that make it difficult to view the core of the page as a ranking signal (oh please, oh please!). They also stressed that Bingbot mobile user agents must be able to access all the necessary CSS and script files required to determine mobile-friendliness, and that they were very interested in listening to feedback on the mobile ranking.

WordPress security update addresses additional security issues

WordPress rolled out its second security update this month to address a flaw affecting millions of websites. The vulnerability lies in Genericons, a set of vector-based icons often included by default in WordPress sites and plugins (including the Twenty Fifteen theme). The flaw was pointed out by security researchers from Sucuri, a cloud-based security company, who noted that it may be a “bit harder to exploit” than other vulnerabilities but could allow attackers to take control of the sites.
The flaw leaves websites open to a cross-site scripting (XSS) vulnerability, similar to security risks we’ve seen WordPress address in the past month.
Make sure your site is up to date to keep it secure!

Google adds more precise data in their new Search Analytics report

Google has added a new feature to Webmaster Tools to help website managers understand how users find their sites and how their content appears in Google search results. The new Search Analytics report contains data that is more recent and calculated differently from Google’s Search Queries results. The report was added to give users additional options for traffic analysis, allowing more granularity with the ability to filter content and decompose search data for analysis. A fun example Google used to show off the new tool was a comparison of mobile traffic before and after the April 21st mobile update. The Search Queries report will remain available in Google Webmaster Tools for three more months to give webmasters time to adjust.

FCC refuses to delay net neutrality rules

USTelecom, AT&T, and CenturyLink jointly filed a petition asking the U.S. Court of Appeals for the D.C. Circuit to stay the FCC’s Open Internet order. USTelecom president Walter McCormick explained that they are “seeking to stay this ill-conceived order’s reclassification of broadband service as a public utility service.” The FCC denied the petition, refusing to delay net neutrality rules. Digital rights group Public Knowledge commended the decision, arguing that the reclassification would enable the FCC to enforce consumer protections in the future. Several groups have filed separate lawsuits, bringing the total number of lawsuits challenging net neutrality regulations to 10. This week, Free Press and New America’s Open Technology Institute (OTI) filed a motion to intervene in the legal challenges against the FCC’s Net Neutrality rules. In the motion, Free Press stated that they “rely on an open Internet to communicate with its members, activists, allies and the public in furtherance of its mission,” and are therefore a “party in interest in the proceeding.”

The post Web Performance News of the Week appeared first on LoadStorm.

Hero Image Custom Metrics

Steve Souders - Tue, 05/12/2015 - 02:31

The takeaways from this post are:

  1. If your website has a hero image, make sure it loads and renders as early as possible. Many designs that feature a hero image suffer from HID (Hero Image Delay) mostly due to blocking scripts and stylesheets.
  2. You should add custom metrics to your website to make sure you know how quickly (or slowly) important content gets displayed. I’ll describe a new technique for measuring when images are rendered so you can track that as a custom metric.
HID (Hero Image Delay)

The size of websites is growing. The average website contains over 2 MB of downloaded content, of which 1.3 MB (65%) is images. Many websites use images as a major design element in the page; these are called hero images. Because hero images are critical design elements, it’s important that they render quickly, yet they often load too late, frequently getting pre-empted by less critical resources on the page.

Popular websites that use hero images include Jawbone, Homeaway, and Airbnb. Using SpeedCurve’s Responsive Design Dashboard, we can see how their hero images load over time across different screen sizes.


Jawbone

Jawbone’s site features an image of a woman in a sweater wearing the Up fitness tracker. While this picture is stunning, it takes 2.5-3.5 seconds before it’s displayed.

Figure 1: Jawbone Responsive Design Dashboard

Often we think that’s the price we have to pay for rich images like this – they simply take longer to download. But investigating further we find that’s often not the cause of the delayed rendering. Looking at the waterfall chart for Jawbone in Figure 2, we see that the image loaded in ~700 ms. (Look for the white URL ending in “sweater-2000.v2.jpg”.)

So why didn’t the hero image get rendered until almost 2600 ms? First, it’s referenced as a background image in a stylesheet. That means the browser’s preloader can’t find it for early downloading and the browser isn’t even aware of the image’s URL until after the stylesheet is downloaded and its rules are parsed and applied to the page. All of this means the image doesn’t even start downloading until ~650 ms after the HTML document arrives. Even after the image finishes downloading it’s blocked from rendering for ~550 ms by the scripts in the page.

Figure 2: Jawbone waterfall

Homeaway

Homeaway’s entire front page is one big hero image. Without the image, the page looks broken with a gray background and a floating form. For larger screen sizes, the hero image isn’t shown for over 2.5 seconds.

Figure 3: Homeaway Responsive Design Dashboard

The waterfall in Figure 4 shows that the (initial, low-res version of the) hero image loads early and quickly (request #9). But then it’s blocked from rendering for over 2 seconds by other scripts in the page.

Figure 4: Homeaway waterfall

Airbnb

Similar to Homeaway, Airbnb uses a beautiful hero image to cover nearly the entire front page. But again, this critical hero image is blocked from rendering for 1.5 to 2.5 seconds.

Figure 5: Airbnb Responsive Design Dashboard

Once again, the hero image is blocked from rendering because of the many scripts on the page, but Airbnb’s waterfall shows an interesting effect of the preloader. While the preloader, overall, makes pages load much quicker, in this case it actually hurts the user experience for Airbnb users. The Airbnb home page puts several scripts at the bottom of the page, but doesn’t load them asynchronously. While moving scripts to the bottom was a good performance optimization in 2007, that was before preloaders were created. Today, for optimal performance it’s important to load scripts asynchronously.

In this case, when Airbnb is loaded in a modern browser those scripts at the bottom get scheduled earlier by the preloader and end up being requests 3, 5, 6 and 9. They add up to 238K of gzipped JavaScript. Ungzipped it turns into 797K of JavaScript that has to be parsed and executed before the hero image can be displayed.

Figure 6: Airbnb waterfall

Image Custom Metrics

Most of the performance metrics used today focus on the mechanics of the browser (window.onload) or network performance (time to first byte and Resource Timing). Unfortunately, these don’t tell us enough about what matters the most: the user’s experience. When does the page’s content get shown to the user so she can start interacting with the page?

To measure what matters, we need to shift our focus to metrics that better represent the user experience. These metrics are specific to each website, measuring the most important design elements on each page. Because they must be created on a case-by-case basis, we call them custom metrics. The earliest and best-known example of a custom metric is in a 2012 article from Twitter where they describe how their most important performance metric is Time to First Tweet, defined as “the amount of time it takes from navigation (clicking the link) to viewing the first Tweet on each page’s timeline.” Note that they don’t talk about how long it takes to download the tweets. Instead, they care about when the tweet can be viewed.

Sites that have hero images need to do the same thing: focus on when the hero image is viewed. This is trickier than it sounds. There are no hooks in today’s browsers that can be used to know when content is viewable. But we can find a solution by thinking about the three things that block an image from rendering: synchronous scripts, stylesheets, and the image itself being slow.

Talking to some web performance experts (thanks Philip, Pat, Scott, Andy and Paul!), I identified five candidate techniques for measuring when an image is displayed:

  • Resource Timing
  • Image onload handler
  • Mutation observer
  • Polling
  • Inline script timer

I created a test page that has a synchronous script, a stylesheet, and an image that are programmed to take a specific amount of time to download (3 seconds, 5 seconds, and 1 second respectively). Running them in WebPagetest I confirmed that the image isn’t displayed until after 5 seconds. I then implemented each of the techniques and found that:

  • Resource Timing reports a time of ~1.5 seconds, which is accurate for when the image downloads but is not accurate for measuring when the image is viewable.
  • The image onload handler, mutation observer, and polling techniques all report a time of ~3 seconds which is too early.
  • Only the inline script timer technique reports a time that matches when the image is displayed.

This test page addresses the scenarios of a synchronous script and stylesheet. We still need to find an accurate measurement technique for the case when the image itself is slow to load. A slight variation of the test page includes a 7-second image and, of the five techniques, only Resource Timing and image onload handler correctly measure when the image is displayed – the other techniques report a time that is too early. Of those two, image onload is preferred over Resource Timing because it’s simpler and more widely supported.

Therefore, to create a custom metric that determines when an image is displayed you should take the max of the values reported by the inline script timer and image onload.

We’re all pretty familiar with image onload handlers. The inline script timer is simple as well – it’s just an inline script that records the time immediately following the IMG tag. Here’s what the code looks like:

```html
<img src="hero.jpg" onload="performance.mark('hero1')">
<script>performance.mark('hero2')</script>
```

The code above takes advantage of the User Timing API. It’s easy to polyfill for browsers that don’t support it; I recommend using Pat Meenan’s polyfill. You need to take the max value of the hero1 and hero2 marks; this can be done either on the client or on the backend server that’s collecting your metrics. Refer to my test page to see live code of all five measurement techniques.
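If the max is taken on the backend, the computation could look roughly like this. It is a sketch under stated assumptions: the client is assumed to send its User Timing marks as name/time pairs in milliseconds, and `MarkTimes` and `HeroImageRenderTimeMs` are hypothetical names. Returning no value when a mark is missing (for example, if the image had not loaded when the marks were sent) is a design choice, not part of the technique itself.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>

// Hypothetical: User Timing marks received from the page, name -> time in ms.
using MarkTimes = std::map<std::string, double>;

// Hero image render time = max(image onload mark, inline script timer mark).
// "hero1" is recorded by the IMG onload handler, "hero2" by the inline script
// after the IMG tag. Returns std::nullopt if either mark is missing.
std::optional<double> HeroImageRenderTimeMs(const MarkTimes& marks) {
  const auto onload = marks.find("hero1");
  const auto inline_timer = marks.find("hero2");
  if (onload == marks.end() || inline_timer == marks.end()) {
    return std::nullopt;
  }
  return std::max(onload->second, inline_timer->second);
}
```

Taking the max covers both blocking cases discussed above: the inline script timer captures delays from synchronous scripts and stylesheets, while image onload captures a slow image.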

The most important thing to do is get your hero images to render quickly. Use Custom Metrics to make sure you’re doing that well.

Categories: Software Testing

Washington University Discovers Appropriate Tests for Testers

QA Hates You - Mon, 05/11/2015 - 10:35

Well, no, they tested the eye-hand coordination of Albert Pujols a couple years ago, and the tests seem like they’d be applicable to testers as well:

White, who administers these tests frequently as part of her research and clinical work, was especially surprised by Pujols’ performance on two tests in particular, a finger-tapping exercise that measures gross motor performance and a letter cancellation task that measures ability to conduct rapid searches of the environment to locate a specific target.

Asked to place a mark through a specific letter each time it appeared on a page of randomly positioned letters, Pujols used a search strategy that White had never witnessed in 18 years of administering the test.

“What was remarkable about Mr. Pujols’ performance was not his speed but his unique visual search strategy,” White said. “Most people search for targets on a page from left to right, much as they would when reading. In observing Mr. Pujols’ performance, I initially thought he was searching randomly. As I watched, however, I realized that he was searching as if the page were divided into sectors. After locating a single target within a sector, he moved to another sector. Only after locating a single target within each sector, did he return to previously searched sectors and continue his scan for additional targets.”

Asked to depress a tapper with his index finger as many times as possible in 10 seconds, Pujols scored in the 99th percentile, a score almost identical to one earned by Ruth on a similar test of movement speed and endurance. White was impressed not only by Pujols’ tapping speed (2.4 standard deviations faster than normal), but also by the fact that his performance kept improving after repeated trials.

“It was interesting that he actually tapped faster in later trials of the task, suggesting considerable stamina at a high level of performance,” White noted. “Most people tap somewhat slower as the test progresses because their fingers and hands begin to fatigue.”

Pujols tapped with such force, in fact, that, at one point, he actually knocked the tapping key out of alignment. Pujols then helped White repair the finger tapper, tightening a loosened screw with his fingernail, she said.

One additional test I’d pose is: How many members of an agile team can you depress in ten seconds?

Categories: Software Testing

Web Performance News of the Week

LoadStorm - Fri, 05/08/2015 - 15:02

Live streaming apps steal traffic from the pay-per-view Mayweather/Pacquiao fight

Over the weekend, dozens of live streams of the Mayweather/Pacquiao fight were available through Periscope, a fairly new app owned by Twitter. Because apps like Periscope and Meerkat record and broadcast live, it is more challenging to shut down live streams of TV footage before users see them. HBO and Showtime, who co-produced the pay-per-view boxing match, were required to alert Twitter of illegal streams of the so-called fight of the century, while many media firms say it should be the app’s responsibility to police the live streaming of pay-per-view events. Periscope shut down over 30 illegal streams, but internet users still boasted about their ability to watch for free. It was estimated that thousands of illegal streams were available to watch from users who had recorded and simultaneously broadcast the TV footage on their smartphones.

Google Cloud Deployment Manager is now in beta

The Google Cloud Deployment Manager entered beta this week, allowing you to create a description of what you want to deploy and hand off all the dirty work to Google. The tool boasts intent-driven management using a declarative syntax, meaning you simply state the desired outcome of your deployment rather than running scripts to configure your environment or using a separate server to run a configuration tool. The main difference between Deployment Manager and existing open source configuration tools like Puppet, Chef or SaltStack is that Deployment Manager is natively integrated into the Cloud Platform, so you don’t pay anything extra for it or have to deploy and manage separate configuration management software.

Here are the key features:

  • Define your infrastructure deployment in a template and deploy via command line or RESTful API
  • Templates support Jinja or Python, so you can take advantage of programming constructs, such as loops, conditionals, and parameterized inputs for deployments requiring logic
  • UI support for viewing and deleting deployments in Google Developers Console
  • Tight integration with Google Cloud Platform resources, from compute to storage to networking, which provides faster provisioning and visualization of the deployments

Facebook extends deep linking capabilities

Facebook announced that it is extending its deep linking tool beyond engagement ads to also include mobile app install ads. The goal of the tool is to help developers and advertisers send people directly to information they care about, such as a product page, when their app is opened for the first time. Developers can access the feature through App Links, Facebook’s cross-platform standard for deep linking on mobile, or through the Facebook SDK if they’ve enabled App Links. Developers can also define the location they want their ads to link to.

To complement this new feature, Facebook also introduced a new deep link verifier within the app ads helper to verify if deep links are set up properly before running app ads that use the functionality.

PageSpeed Service has been deprecated

This week Google announced that PageSpeed Service, which optimizes websites by rewriting web pages and serving the optimized content via Google servers, has been deprecated and will be discontinued on August 3rd, 2015. All sites using PageSpeed Service will become completely unavailable on that date if their owners do not change their DNS before then. Although Google will be notifying users believed to be affected, it urged users not to rely solely on a notification and to log in to the console to check for any domains that show up as “Enabled” to determine whether they are at risk. Many hosting providers integrate PageSpeed, and users are instructed to check their provider’s documentation.

The post Web Performance News of the Week appeared first on LoadStorm.

Very Short Blog Posts (28): Users vs. Use Cases

DevelopSense - Michael Bolton - Thu, 05/07/2015 - 11:26
As a tester, you’ve probably seen use cases, and they’ve probably informed some of the choices you make about how to test your product or service. (Maybe you’ve based test cases on use cases. I don’t find test cases a very helpful way of framing testing work, but that’s a topic for another post—or for […]
Categories: Software Testing