A big change in the World of Performance for 2015 [this post is being cross-posted from the 2015 Performance Calendar] is the shift to metrics that do a better job of measuring the user experience. The performance industry grew up focusing on page load time (PLT), but teams with more advanced websites have started replacing PLT with metrics that have more to do with rendering and interactivity. The best examples of these new UX-focused metrics are Start Render and Speed Index.
Start Render and Speed Index
A fast start render time is important for a good user experience because once users request a new page, they’re left staring at the old page or, even worse, a blank screen. This is frustrating for users because nothing is happening and they don’t know if the site is down, if they should reload the page, or if they should simply wait longer. A fast start render time means the user doesn’t have to experience this frustration because she is reassured that the site is working and delivering upon her request.
Speed Index, a metric developed by Pat Meenan as part of WebPageTest, is the average time at which visible parts of the page are displayed. Whereas start render time captures when the rendering experience starts, Speed Index reflects how quickly the entire viewport renders. These metrics measure different things, but both focus on how quickly pages render, which is critical for a good user experience.
Critical Resources
The main blockers to fast rendering are stylesheets and synchronous scripts. Stylesheets block all rendering in the page until they finish loading. Synchronous scripts (e.g., <script src="main.js">) block rendering for all following DOM elements. Therefore, synchronous scripts in the HEAD of the page block the entire page from rendering until they finish loading.
I call stylesheets and synchronous scripts “critical blocking resources” because of their big impact on rendering. A few months back I decided to start tracking this as a new performance metric as part of SpeedCurve and the HTTP Archive. Most performance services already have metrics for scripts and stylesheets, but a separate metric for critical resources is different in a few ways:
- It combines stylesheets and synchronous scripts into a single metric, making it easier to track their impact.
- It only counts synchronous scripts. Asynchronous scripts don’t block rendering so they’re not included. The HTTP Archive data for the world’s top 500K URLs shows that the median website has 10 synchronous scripts and 2 async scripts, so ignoring those async scripts gives a more accurate measurement of the impact on rendering. (I do this as a WebPageTest custom metric. The code is here.)
- Synchronous scripts loaded in iframes are not included because they don’t block rendering of the main page. (I’m still working on code to ignore stylesheets in iframes.)
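As a rough illustration of the counting rules above, here is a minimal Python sketch built on the standard library's `html.parser`. It is not the WebPageTest custom metric mentioned earlier; the class name `CriticalResourceCounter` and the sample markup are made up for this example. It assumes a script is synchronous when it has a `src` and neither `async` nor `defer`, and because it only parses the main document, anything loaded inside an iframe is left out.

```python
from html.parser import HTMLParser


class CriticalResourceCounter(HTMLParser):
    """Counts stylesheets and synchronous external scripts in one HTML document."""

    def __init__(self):
        super().__init__()
        self.stylesheets = 0
        self.sync_scripts = 0

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        values = dict(attrs)
        if tag == "link":
            # rel may hold several space-separated tokens
            rel = (values.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets += 1
        elif tag == "script":
            # Assumption: only external scripts without async/defer count as blocking.
            if "src" in names and not ({"async", "defer"} & names):
                self.sync_scripts += 1

    @property
    def critical_resources(self):
        return self.stylesheets + self.sync_scripts


# Hypothetical page: one stylesheet, one sync script, one async, one deferred.
SAMPLE_PAGE = """
<html><head>
  <link rel="stylesheet" href="main.css">
  <script src="main.js"></script>
  <script src="analytics.js" async></script>
</head><body>
  <iframe src="ad.html"></iframe>
  <script src="late.js" defer></script>
</body></html>
"""

counter = CriticalResourceCounter()
counter.feed(SAMPLE_PAGE)
print(counter.stylesheets, counter.sync_scripts, counter.critical_resources)
```

For the sample page this reports one stylesheet and one synchronous script, so two critical resources. A real measurement would run in the browser, where scripts injected at runtime can also be seen.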
I’m confident this new “critical resources” metric will prove to be key for tracking a good user experience in terms of performance. Whether that’s true will be borne out as adoption grows and we gain more experience correlating this to other metrics that reflect a good user experience.
In the meantime, I added this metric to the HTTP Archive and measured the correlation to start render time, Speed Index, and page load time. Here are the results for the Dec 1 2015 crawl:
The critical resources metric described in this article is called “CSS & Sync JS” in the charts above. It has the highest correlation to Speed Index and the second highest correlation to start render time. This shows that “critical resources” is a good indicator of rendering performance. It doesn’t show up in the top five variables correlated to load time, which is fine. Most people agree that page load time is no longer a good metric because it doesn’t reflect the user experience.
We all want to create great, enjoyable user experiences. With the complexity of today’s web apps – preloading, lazy-loading, sync & async scripts, dynamic images, etc. – it’s important to have metrics that help us know when our user experience performance is slipping. Tracking critical resources provides an early indicator of how our code might affect the user experience, so we can keep our websites fast and our users happy.
Here are some things I can think of that testers-who-don’t-code can do to help boost their value:
- Find more bugs - This is one of the most valued services a tester can provide. Scour a software quality characteristics list like this to expand your test coverage and be more aggressive with your testing. You can probably cover way more than automation engineers in a shorter amount of time. Humans are much better at finding bugs than machines. Finding bugs is not a realistic goal of automation.
- Faster Feedback – Everybody wants faster feedback. Humans can deliver faster feedback than automation engineers on new testing. Machines are faster on old testing (e.g., regression testing). Report back on what works and doesn’t while the automation engineer is still writing new test code.
- Give better test reports – Nobody cares about test results. Find ways to sneak them in and make them easier to digest. Shove them into your daily stand-up report (e.g., “based on what I tested yesterday, I learned that these things appear to be working, great job team!”). Give verbal test summaries to your programmers after each and every test session with their code. Give impromptu test summaries to your Product Owner.
- Sit with your users – See how they use your product. Learn what is important to them.
- Volunteer for unwanted tasks – “I’ll stay late tonight to test the patch”, “I’ll do it this weekend”. You have a personal life though. Take back the time. Take Monday off.
- Work for your programmers - Ask what they are concerned about. Ask what they would like you to test.
- What if? – Show up at design meetings and have a louder presence at Sprint Planning meetings. Blast the team with relentless “what if” scenarios. Use your domain expertise and user knowledge to conceive of conflicts. Remove the explicit assumptions one at a time and challenge the team, even at the risk of being ridiculous (e.g., what if the web server goes down? what if their phone battery dies?).
- Do more security testing – Security testing, for the most part, cannot be automated. Develop expertise in this area.
- Bring new ideas – Read testing blogs and books. Attend conferences. Tweak your processes. Pilot new ideas. Don’t be status quo.
- Consider Integration – Talk to the people who build the products that integrate with your product. Learn how to operate their product and perform integration tests that are otherwise being automated via mocks. You just can’t beat the real thing.
- Help your automation engineer – Tell them what you think needs to be automated. Don’t be narrow-minded in determining what to automate. Ask them which automation they are struggling to write or maintain, then offer to maintain it yourself, with manual testing.
- Get visible – Ring a bell when you find a bug. Give out candy when you don’t find a bug. Wear shirts with testing slogans, etc.
- Help code automation – You’re not a coder so don’t go building frameworks, designing automation patterns, or even independently designing new automated checks. Ask if there are straightforward automation patterns you can reuse with new scenarios. Ask for levels of abstraction that hide the complicated methods and let you focus on business inputs and observations. Here are other ways to get involved.
Because I just loaded it onto a cheap MP3 player for my gym workouts, have Poison, “Come Hell or High Water”:
I had a second scenario this week that gave me pause before resulting in the above practice.
ProductA is developed and maintained by ScrumTeamA, who writes automated checks for all User Stories and runs the checks in a CI. ProductB is developed and maintained by ScrumTeamB.
ScrumTeamB developed UserStoryB, which required new code for both ProductA and ProductB. ScrumTeamB merged the new product code into ProductA…but did NOT merge new test code to ProductA. Now we have a problem. Do you see it?
When ProductA deploys, how can we be sure the dependencies for UserStoryB are included? All new product code for ProductA should probably be accompanied by new test code, regardless of the Scrum Team making the change.
The same practice might be suggested in environments without automation. In other words, ScrumTeamB should probably give manual test scripts, lists, test fragments, or do knowledge transfer such that manual testers responsible for ProductA (i.e., ScrumTeamA) can perform the testing for UserStoryB prior to ProductA deployments.
…It seems obvious until you deal with integration tests and products with no automation. I got tripped up by this example:
ProductA calls ProductB’s service, ServiceB. Both products are owned by the same dev shop. ServiceB keeps breaking in production, disrupting ProductA. ProductA has automated checks. ProductB does NOT have automated checks. Automated checks for ServiceB might help. Where would the automated checks for ServiceB live?
It’s tempting to say ProductA because ProductA has an automation framework with its automated checks running in Continuous Integration on merge-to-dev. It would be much quicker to add said automated checks to ProductA than ProductB. However, said checks wouldn’t help because they would run in ProductA’s CI. ProductB could still deploy to production with a broken ServiceB.
My lesson learned: Despite the ease of adding a check to ProductA’s CI, the check needs to be coupled with ProductB.
In my case, until we invest in test automation for ProductB, said check(s) for ServiceB will be checks performed by humans.
When you log into Slack, it provides you an inspirational message. How positive of the program. This particular item always gets me:
The first item on the list is that I couldn’t complete the list in under 24 hours.
Then we get into the physically impossible.
What, this is a rhetorical question? Then why ask it?
Potentially shippable software is the holy grail of agile delivery, according to anyone out there with enough patience to sit through two days of Scrum conditioning. Ten years ago, most of the industry probably wasn’t capable of living up even to that benchmark. But today, potentially shippable software by the end of each iteration should be taken for granted, the same way you expect your next hamburger to be asbestos-free. That’s the bare minimum, but far from being good enough.
In fact, the current thinking around potentially shippable software severely limits what teams could achieve. The move to frequent releases is causing a fundamental change for consumers. Companies that can spot this in their market segments, and adapt quickly, will start running circles around the competition. Those that don’t will be left trying to play bowling on a basketball court.
For an example, just look at transportation. The entire car industry seems to shake and tumble with problems. The data for 2015 isn’t out yet, but just for comparison, Toyota recalled 6.5 million cars last year to deal with switch malfunctions. In 2014 alone, the recalls ordered by the US NHTSA agency involved 63.9 million vehicles. General Motors had to fix 5.8 million vehicles in 2014 to deal with faulty ignition switches that could cause fires. In a similar situation, NHTSA ordered Tesla to deal with problems in almost 30,000 vehicles. The NEMA 14-50 Universal Mobile Connectors could overheat, and potentially cause a fire. Pretty serious stuff, with faulty hardware. Judging by the rest of the industry, this should have been a crisis that would cost at least a few million. Instead, it turned into a ton of free press.
The Tesla UMC recall didn’t require car owners to waste time driving their vehicles to a service shop. It didn’t require the manufacturer to cash out for new parts, or to pay for mechanics’ time. Instead, someone pushed a button. An over-the-air software update remedied the problem until the next time a car gets brought in for a regular check-up.
When one player in a market can respond to a major problem with an automatic software patch, while the others have to pay for parts and labour, they aren’t playing the same game any more. The costs of servicing, of course, plummet. But the impact goes far beyond that.
In similar situations, the owners of the other vehicles had to make a difficult choice of trading their short term plans against security and safety risks. Tesla’s customers were not affected at all. For Tesla, continuous software delivery isn’t just a technical practice, it’s a way to change consumer expectations and open up marketing opportunities. You can dismiss this as an isolated incident, but ten years from now, the car consumer expectations will completely turn. People will expect to have that level of service, and anyone not being able to provide it will be out of business. And it will happen to many other industries as well.
Disrupting business models
When the user expectations and perceptions change, business models have to change as well. Just consider how the typical software sales models changed over the last twenty years. Before the web services became ubiquitous, it was quite normal for consumers to buy a particular version of software. When a new box of their favourite software came out, complete with a stack of 3.5” floppy disks and a printed user guide, consumers would pay for upgrades. This model was sensible when new versions came out every year. But as the Internet took off, and people requested higher bandwidth to enjoy ever increasing quality of funny kitten videos, it became possible to distribute software updates more frequently.
Consumer expectations significantly changed. Technically, it makes a lot of sense to release software several times a month, especially to fix security risks. So people got used to upgrading frequently. Consumers might even like the new features enough to want to install a new version, but nobody wants to pay for software that often. The whole concept of selling versions didn’t make much sense at higher frequency, so companies started to offer free upgrades, and users started to become more entitled. The average internet consumer today expects to get web services for free. Free e-mail, free photo storage, free news. On mobile platforms, apps still sell, but users who pay $0.99 expect to get all future new features free, forever. That requires literally a pyramid scheme where early backers benefit from latecomers, and requires an ever increasing user-base. When the growth stops, commercial models like that fail, much like in a Ponzi scheme.
Just look at operating systems. After OSX went free, the game changed. Microsoft had to make Windows 10 free as well, and move away from versions. Now all Windows will be 10, and instead of a major version every three years, consumers can expect a continuous stream of updates. At the same time, because it can’t be sold any more, Windows collects private information and phones back home with advertising identifiers, so it can make up for the lost revenue.
Changing the expected delivery frequency pretty much killed the old software business model. Instead of charging for new features with paid upgrades, software companies had to come up with completely different ways of financing development. The wholesale sleaze-ball privacy invasion of ad networks is just a way to pass the need for payments from consumers to third parties. Some companies decided to constantly harass their users with micro-payments to unlock individual features. Zynga was one of the first to spot the game change, and it was one of the rising stars at the turn of the decade. But consumers can only suffer constant harassment for so long, and it took only a few years for the whole pyramid to collapse. On the other hand, companies that could charge for rent, such as Dropbox or Github, flourished in the new game. The expectations in the market changed. People want software for free, but they seem happy to pay for a service.
This user entitlement caused by more frequent delivery expectations isn’t a problem just for software. As continuous delivery crosses more into product strategy, it starts to affect customers of all types of products. In October 2014, Tesla announced that new cars coming out of the factory will have a forward radar, ultrasonic sensors and cameras, all wired up to a lane-changing autopilot and high precision digital braking system. Although the news was amazing, not everyone was happy. Richard Wolpert from Los Angeles, for example, bought an older model just a few months before, and to him the world just seemed unfair. Normally, he got the new car features for free, magically. But this one wasn’t coming. So he started a petition to force Tesla to retro-fit radars and sonars into older cars. Dag Rinden of Oslo, Norway, pleaded that lane switching and automatic braking are important for driver security, and that they should be provided for free to all existing owners.
Now let's just take a moment to consider this. Someone bought a car, and later complained that new hardware did not magically appear overnight when it was announced in the news. Continuous delivery doesn't solve this problem, Star Trek replicators do. You and I can laugh about it, because we can distinguish hardware and software, but Richard and Dag don’t care about that. They only see a car, and they got used to getting the new stuff for free. Plus, the new features are potentially life-saving, so surely they are entitled to those. Disappointing users is never good, even when they are clearly wrong. But giving away free radars also isn’t good for business. And the whole mess is a consequence of the fundamental game change.
I’ve never heard of anyone with similar complaints about any other car manufacturers. When you buy a car, it's pretty much clear that it won't one day just get a radar and a sonar. But Tesla trained their users to expect more. They aren’t playing the same game. For all other manufacturers, a car model is something with a fixed design, produced a particular year. For Tesla, that concept of car models just doesn’t work like that. And once there are no more models and yearly versions, people just feel a lot more entitled and expect to get things for free.
Disrupting marketing
Another major side-effect of frequent delivery is that it removes the drama. The more often software ships, the less risky each update becomes. Small changes mean quick testing, and small potential problems. Continuous delivery pipelines help companies deploy with more confidence, prevent surprises, and generally make releases uneventful. But making the releases uneventful also causes problems for marketing. I’ve learnt this the most stupid way possible, on my own skin.
MindMup is a bootstrapped product, and we don’t have a lot of cash to spend on advertising. Apart from slow and steady word of mouth, the typical way for such products to get new users is press coverage. Indeed, the three biggest spikes of user traffic for MindMup over the last three years came from news sites: spending a day on the front page of HackerNews after we open sourced it, and getting reviewed on LifeHacker and PCWorld. However, after those early successes, it took over a year and a half until we could get another big spike. Meanwhile, we shipped a ton of useful stuff, but nobody took notice. By having a continuous stream of small changes instead of big versions, we scored a marketing own-goal. Sure, technically there was no drama in any of the hundreds of releases. But there was also no excitement. No single change was ever big, important and newsworthy enough to be covered by a major channel.
Potentially Shippable in a changed game
Changed consumer expectations, across industries, will put more pressure on companies to roll out software with increasing frequency. That’s a given. Yet the more frequently software ships, the more it has an impact on marketing, business models and consumer expectations. People who design continuous delivery pipelines, and people who break down features into iterative deliveries, now have a magic wand that can disrupt sales or marketing and disorient users.
A nice example of that is how PayPal changed their business dashboard last year. One day, trying to pay something using PayPal, I panicked after several thousand pounds disappeared from our company account. My first thoughts were that our PayPal account got hacked, or that the funds were frozen for some bizarre reason. PayPal is famous for being hostile to digital goods merchants, and frozen funds were an even scarier scenario than getting hacked. I looked through the recent transactions, and I couldn’t see any transfers or withdrawals. In fact, there was nothing suspicious in the list. While I was trying to call the customer service, I spotted a link saying something similar to ‘how do you like our new business dashboard?’ Anyone who has ever done serious software testing would start guessing what happened there. And in fact, it took only three link clicks to find the money. My company has a multi-currency account with PayPal. The old dashboard converted all the money into an approximated value in the primary currency, but the new dashboard only showed the money actually in the primary currency. Someone did an incremental development change, and they either intentionally or mistakenly disregarded multiple currency accounts. I can only assume that most people with multiple currency accounts didn’t think like a software tester that morning, and that the PayPal customer service didn’t exactly have a pleasant day. At the same time, to get software potentially shippable, someone had to cut a huge piece of work into smaller batches. And they made the wrong choice.
As an industry, we need to move the discussion away from ensuring things are potentially shippable towards how exactly that’s achieved. The choice can have a ton of unexpected negative effects on sales and marketing, or it can open up new business opportunities and help companies run much faster than the competition. That’s why software planning and releases have to be driven more by market cycles and marketing opportunities than arbitrary iterations.
And that’s where the problem with the concept of 'Potentially Shippable' starts. Does that mean potentially could be deployed to production? Or does it mean potentially could be released to users? Who determines if potentially should be turned into actually? Or when that should happen?
Deployments are not the same as Releases
When we started fixing this problem for MindMup, one thing became painfully clear. We thought about deployment and releasing as the same thing, but it's much more useful to look at them separately. Deployment is a technical event, bits and bytes of software being moved to production servers or users’ devices. Release is a marketing event, where a new version becomes available to a group of end-users. Think about ‘Deployment’ as the part when an Amazon courier brings a box of cardboard-packed toys, you wrap them up nicely, and hide them in a cupboard. But ‘Release’ is when your children find the toys under a Christmas tree, at exactly the right moment to believe in Santa Claus.
For MindMup, potentially shippable stuff turned into actually shipped almost all the time, in order to reduce technical deployment risks. We mentally coupled deployments and releases, and by doing that, we forced a technical event to have a marketing impact. Going back to the example with presents, it’s as if the children intercepted the couriers and took the presents themselves, along with the delivery slips and the receipts. Sure, at the end everyone got a toy, but the magic of Christmas is gone. And they’ll start arguing about who got a more expensive present and who got shorted. Our software releases were driven by technical cycles, not marketing cycles. No wonder nobody wanted to pick up on any important news.
Once I could spot this in our software, it became easy to see it with many of my consulting clients as well. I don’t have any statistically relevant data to claim an industry-wide pattern, but it looks as if this is quite a common self-inflicted handicap. Deployments and releases are tightly coupled in our minds, it’s just the way we were conditioned to think. I assume that nobody reading this article primarily distributes software on floppy disks in boxes, physically shipped to consumers. Yet that’s still how most people think about releases and deployments.
The solution is quite simple: Decouple deployments and releases. This effectively means being able to put software on production systems that is not necessarily generally available, running alongside software that is visible. It’s the nicely wrapped present, without the receipts or any other controversial crap, waiting for the right moment to make a big impact. That way, the marketing stakeholders can decide on their own when they are going to release it and how. Software releases can be organised around important marketing opportunities, while software deployments can still happen frequently to reduce technical risk. Jez Humble wrote about that in 2012.
The key is in multi-versioning
The problem, of course, is that simple is not the same as easy. Although I can suggest the solution in one sentence, it is quite difficult to pull off in practice. Feature toggles, ever more present in software, lead to unmaintainable spaghetti of code, configuration, and magic. To truly get the benefits of continuous delivery, most companies will likely need a completely different approach to technical architecture and design. Instead of simple toggles and flags, software will need to be designed from the ground up for a multi-tenant, multi-versioned, multi-interface world. This means that every layer of the stack will need to accept calls from potentially different versions of things above it and know how to reply accordingly. It also means that almost every piece of data in transport will need to be tagged with the appropriate version. This will significantly increase the complexity of testing and operating software. But companies that don’t do that will end up playing bowling on a basketball court and wonder why they are not scoring.
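To make "every layer accepts calls from different versions, and data is tagged with its version" a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Request` type, the `api_version` field, the balance data and the conversion rates are invented for illustration, loosely echoing the dashboard story above, and a real system would need far more than a dictionary of handlers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    api_version: int  # version tag travelling with the data
    account_id: str


# Hypothetical balances: account -> {currency: amount}
BALANCES = {"acme": {"GBP": 100.0, "USD": 50.0}}
# Hypothetical fixed conversion rates into the primary currency (GBP)
RATES_TO_GBP = {"GBP": 1.0, "USD": 0.5}


def balance_v1(request):
    """Older contract: one approximate total in the primary currency."""
    funds = BALANCES[request.account_id]
    total = sum(amount * RATES_TO_GBP[cur] for cur, amount in funds.items())
    return {"version": 1, "approx_total_gbp": total}


def balance_v2(request):
    """Newer contract: per-currency balances, no conversion."""
    return {"version": 2, "balances": dict(BALANCES[request.account_id])}


# Both versions are live at once; callers pick one through the version tag.
HANDLERS = {1: balance_v1, 2: balance_v2}


def handle(request):
    """Dispatch on the version tag so old and new callers can coexist."""
    try:
        handler = HANDLERS[request.api_version]
    except KeyError:
        raise ValueError(f"unsupported api_version: {request.api_version}")
    return handler(request)


print(handle(Request(api_version=1, account_id="acme")))
print(handle(Request(api_version=2, account_id="acme")))
```

The point of the sketch is only that the reply is shaped by the caller's version rather than by a global flag, so an old client keeps getting the answer it understands while a new one is rolled out next to it.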
Once the capability for running multiple concurrent versions is in place, it becomes quite easy to make some versions of software available only to certain subsets of users. And so, it becomes easy to minimise the potential negative effects of small incremental changes. Imagine if the new PayPal business dashboard was only shown to customers with a single currency account. Instead of giving all the users a small increment of the improvement, this would give a small group of users 100% of what they need. There would be no user confusion, and the “new” business dashboard would actually be better for whoever could see it. Over time, as the features build up, more users could be brought over to the new system, and then finally, the old version completely retired. Ironically, I’m pretty sure that PayPal has the capability to deploy and release gradually to subsets of users, but they didn’t coordinate it well with the rest of the business.
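The single-currency idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how PayPal or anyone else does it: the `Account` type, the `RELEASE_RULES` table and the dashboard names are all hypothetical. Both dashboards are assumed to be deployed already; only the rule table decides who has been released to the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: str
    currencies: tuple


def is_single_currency(account):
    """Release rule: the new dashboard is complete for these accounts."""
    return len(account.currencies) == 1


# Deployment put both dashboards in production.
# Release is just this table; an empty rule list means "released to nobody".
RELEASE_RULES = {"new_dashboard": [is_single_currency]}


def dashboard_for(account):
    """Return which deployed dashboard version this account should see."""
    rules = RELEASE_RULES.get("new_dashboard", [])
    if any(rule(account) for rule in rules):
        return "new_dashboard"
    return "old_dashboard"


print(dashboard_for(Account("shop-1", ("GBP",))))
print(dashboard_for(Account("shop-2", ("GBP", "USD"))))
```

Widening the release later means adding or loosening a rule, not shipping new code, which is what lets the release date follow a marketing decision rather than a deployment.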
Once the capability for running multiple concurrent versions is in place, it becomes much easier to decide what and how to sell, and what, how and when to open up. Continuous delivery pipelines don’t need to have a negative impact on sales or marketing, and the decisions around those aspects can go back to the people that should be making them. Even more importantly, with proper multi-versioning in place, it becomes a lot easier to make better informed decisions. Focus groups, prototype experiments and customer research can only suggest that people might potentially be able to do something, not that they will actually do it, or get the expected benefits. But with multi-versioned systems, companies don’t have to rely on potential usage data; they can look at actual, real user trends, and weed out bad ideas before they become cemented. At Google, one such test apparently led to an extra $200m a year in revenue.
Ron Kohavi, Thomas Crook, and Roger Longbotham have some chilling statistics in their paper Online Experimentation at Microsoft, where they claim that only about one third of analysed ideas actually achieved what was expected once implemented in software. They also cite a source from Amazon, where the success rate is higher, but still less than 50%.
This means that for the average software company out there, getting multi-versioning right can reduce maintenance costs by fifty to seventy percent, just by helping them drop deadwood, and not waste time on implementing things that just won’t fly. The additional cost of operation and testing can then easily be recovered through a significant reduction in maintenance costs.
So, if you’re still late making your 2016 resolutions, or if all the ones you made already turned out to be unachievable, here's an idea for the next year: push your organisation slightly more towards thinking that continuous delivery isn’t just a technical thing. It’s a game-changer that has massive side-effects on business models and customer expectations. And design your pipeline so that you can decouple deployments from releases. Run the former based on technical risk, and coordinate the latter with marketing cycles.
If I could take one test automation rule to my grave, this would be it. I had forgotten that it was optional.
I know, I know, it seems so tempting to break this rule at first; TestA puts the product-under-test in the perfect state for TestB. Please don’t fall into this trap.
Here are some reasons (I can think of) to keep your tests mutually exclusive:
- The Domino Effect – If TestB depends on TestA, and TestA fails, there is a good chance TestB will fail, but not because the functionality TestB is checking fails. And so on.
- Making a Check Mix – Once you have a good number of automated checks, you’ll want the freedom to break them into various suites. You may want a smoke test suite, a regression test suite, a root check for a performance test, or other test missions that require only a handful of checks...dependencies will not allow this.
- Authoring – While coding an automated check (a new check or updating a check), you will want to execute that check over and over, without having to execute the whole suite.
- Easily Readable – When you review your automation coverage with your development team or stakeholders, you’ll want readable test methods. That usually means each test method’s setup is clear. Everything needed to understand that test method is contained within the scope of the test method.
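Here is a minimal sketch of what that independence can look like, using Python's standard `unittest` module. The `Cart` class is a made-up stand-in for a product under test, and `run_single` is a hypothetical helper, not part of any framework. Each check builds its own state in `setUp` or in the test body, so either one can run alone, in any order, or in any suite.

```python
import unittest


class Cart:
    """Tiny stand-in for a product under test (hypothetical)."""

    def __init__(self):
        self.items = []

    def add(self, name):
        self.items.append(name)

    def remove(self, name):
        self.items.remove(name)


class CartTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so no test inherits state from another.
        self.cart = Cart()

    def test_add_item(self):
        self.cart.add("book")
        self.assertEqual(self.cart.items, ["book"])

    def test_remove_item(self):
        # Builds its own precondition instead of relying on test_add_item.
        self.cart.add("book")
        self.cart.remove("book")
        self.assertEqual(self.cart.items, [])


def run_single(test_name):
    """Run one check in isolation, as you would while authoring it."""
    suite = unittest.TestSuite([CartTests(test_name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_single("test_remove_item")
    print(result.wasSuccessful())
```

Because `test_remove_item` adds its own item first, it still passes when `test_add_item` is skipped, broken, or moved into a different suite.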
The tale has all the hallmarks of technical debt in a huge, unmaintained, bitrotten codebase (the bug itself due to code that hadn’t been used for 8 years), and a really poor, undisciplined devops story.
I’d always sworn I’d never work for a health devices or financial services company because the risks were so great.
Well, so far, I’m keeping half of that pledge.
I decided that in the rare occurrences where I post non-software articles, that I’d use a blog on Medium and post my attempts at story-telling along with the rest of the world.
My first post is here.
This was a small survey of 22 companies worldwide, 19 of which were able to provide accurate information about their tester to developer ratio.
This survey is part of ongoing research I have been conducting since 2000.
Thanks to everyone who has contributed to date.
Before I get into the findings, I want to refer to two articles I have written on this topic. These articles explain why I feel that the data show there is no single ratio that works better than others. Getting the right workload balance is a matter of tuning processes and scope, which includes optimizing testing to get the most efficiency with the resources you have.
You can read these articles at:
The Tester-to-Developer Ratio Revisited
The Elusive Tester to Developer Ratio
The recent findings are:
1. The range of ratios is much tighter. The range was 1 tester to 1 developer on the richer end of the scale, to 1 tester to 7 developers on the leaner end. I feel that some of this is due to the small sample size.
2. The majority of responses (16) indicated just three ratios: 1 tester to 1 developer on the low side to 1 tester to 3 developers on the high side.
3. The most common ratio was 1 tester to 2 developers.
4. The average was also 1 tester to 2 developers.
5. People reported poor, workable and good test effectiveness at all ratios. The variation was wide. There were no noticeable indications that a particular ratio of testers to developers worked any better than another, simply due to the ratio.
This survey showed much richer ratios than any other survey I’ve taken. This could be due to the impact of agile methods. Most of these companies (13) reported they do not anticipate hiring more testers in 2016. I plan to continue this survey to get a more significant sample size.
If you have not contributed to this survey yet, you can still add your responses at: https://www.surveymonkey.com/r/55LVHFZ
I have an after-work event tonight, and rather than leave my car in the garage overnight, I ran to work. Since I’ve moved to downtown Bellevue, I’ve done this a few times – and given that I’m running another half-marathon in 10 days, it was a great opportunity for a long training run before I begin to taper my mileage down a bit leading up to the race.
App Issues
I’ve been a long-time user of a running companion app called Runtastic. It does the usual stuff of tracking mileage, route, pace, and giving voice updates at user-specified intervals. I find it especially valuable when training or racing, because I usually have very specific pace goals. Having a voice tell me how long the last mile/half-mile/km took me lets me know if I’m running too fast (and burning out) or too slow (and putting my goals in jeopardy). Granted, I have a pretty good internal clock, and usually run my paces pretty well without “help”, but the feedback is really useful to me.
Today, I took off from home, started spacing out, and before long, I was a mile or two from home…when I noticed that I had not received any voice feedback yet. I knew exactly what happened (because it’s happened before). When you start the app, it gives you a 15 second timer before the tracking actually starts, along with the ability to add time to the delay up to two minutes or so. I LOVE this feature, because I can give myself time to put my phone into my running belt, walk twenty yards and curse myself for having such a painful hobby before actually exerting any physical energy.
Unfortunately, at least 25% of the time I use the app the countdown fails. But I never know it failed until too late.
What happens is that the countdown stops at 1 second. I hear the voice prompt count down 5-4-3-2-1, and I think it’s tracking, but it’s stuck on one second.
Today was extra painful not only because I wanted to see how I was doing on “race pace”, but also because after I discovered it failed, I restarted the app, set the countdown again…and it “hung” on one second again.
Now, it’s a bug – that’s for sure. Some testers I know would automatically assume that every user in the world was hitting this bug and that public shaming of the company would be the next course of action. I, however, realize that context plays a role in many (every?) parts of software engineering, and that given the value I get from the product (and my very amateur level of running), while this is a painfully annoying bug, it’s not the end of the world.
Intelligence?
Given that I wasn’t concerned at all with pace for the remaining 5+ miles of my commute, my mind began to wander. What follows is a completely made up story of how a Runtastic engineer may discover this issue without me, or anyone else, reporting it.
Pointy Haired Runtastic Manager: Hey super-smart employee (sse). Our default time for delay before a run is 15 seconds. Can you look at the data and see if our estimate for a default delay length is in the ballpark of what people actually use? Someone on the train told me that they thought 30 seconds would be a lot better. I don’t think they’re right, but I want to make a decision based on data.
Super Smart Employee (sse): Sure boss. That data is pretty easy to pull. I’ll take a look!
What SSE is about to do at this point is gather Business Intelligence. They want to use data to make a business decision.
SSE looks at hundreds of thousands of activities from the past six months and sees that nearly 60% of the people just use the default 15 seconds. She quickly generates a scatter graph that shows the outliers and prepares it for her boss. Before sending, she realizes that she wants to exclude the instances where people cancel the activity completely before the countdown completes (phone calls, cold feet (literally, and metaphorically), and a variety of other reasons could cause this). She filters the data and starts to send the report…but – while she’s there, she notices something…interesting. First, the number of “cancelled” activities seems high to her (over 15%). She flips the filter to look only at cancelled activities and things get weirder. Of the 15% of cancelled activities, 90% are cancelled at exactly 1 second.
That’s too weird to be true.
Insight
SSE looks at every activity where the timer was “killed” at one second. Often, those users started another activity within 10-15 minutes.
Or, maybe they all had the same model of phone.
Or maybe they were all running Spotify at the same time.
Or something. Remember. This story is completely made up.
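Since the story is made up, so is the data, but the two queries SSE runs are easy to sketch. The snippet below is only an illustration under stated assumptions: the record layout (configured delay, second at which the countdown was cancelled, or `None` if the run started) and the function names are hypothetical, not anything Runtastic actually collects.

```python
from collections import Counter

# Hypothetical activity records, invented for illustration:
# (configured_delay_seconds, cancelled_at_second or None if the run started)
activities = [
    (15, None), (15, None), (30, None), (15, 1), (15, 1),
    (60, None), (15, 7), (15, None), (15, 1), (45, None),
]


def default_delay_share(records, default=15):
    """Business-intelligence question: among activities that actually
    started, what fraction kept the default countdown delay?"""
    completed = [delay for delay, cancelled_at in records if cancelled_at is None]
    return sum(1 for delay in completed if delay == default) / len(completed)


def cancellation_profile(records):
    """Insight question: how many activities were cancelled, and at which
    second of the countdown did each cancellation happen?"""
    cancelled = [cancelled_at for _, cancelled_at in records if cancelled_at is not None]
    rate = len(cancelled) / len(records)
    return rate, Counter(cancelled)


share = default_delay_share(activities)
rate, by_second = cancellation_profile(activities)
# A spike at a single value in by_second (here, second 1) is the anomaly
# that turns a routine report into a bug investigation.
```

The first function answers the question the manager asked. The second is the "flip the filter" step: grouping cancellations by the second at which they occurred is what makes a pile-up at exactly one second visible.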
The point is that SSE quickly went from gathering BI to using discovery and insight to find a pretty cool bug. Using data!
Aside
I told a story at a conference recently about a team I worked on that used an offshore vendor team to run through a large number of applications for app compatibility testing. We asked them to take notes and to send a report, but not to bother filing bug reports.
Yeah – we had sufficient telemetry and monitoring that we knew about all the bugs and glitches (and had collected call stacks and other helpful information) already. Many, in fact, that the test team didn’t (or couldn’t) notice. Entering the bugs would have been a waste of time. In the rare cases where something weird happened that we didn’t track, we immediately added the appropriate instrumentation to track that class of failure in the future.
I expect that for most of you, my world isn’t your world. But in my world, data driven engineering is critical.
Epilogue?
Since I don’t know how made up my made up story really is, I’m going to report it to Runtastic anyway. I can’t predict the future (or anything else), but I hope the reply to my complaint is, “Yeah – we already knew about that. From the data”.