I wish someone would ask me that question in an interview just so I could say, “A squirrel.”
A half-crazed Mississippi squirrel.
…are not always the full truth. Is that hurting our craft?
Last week, I attended the first Software Testing Club Atlanta Meetup. It was organized by Claire Moss and graciously hosted by VersionOne. The format was Lean Coffee, which was perfect for this meeting.
Photo by Claire Moss
I’m not going to blog about the discussion topics themselves. Instead, I would like to blog about a familiar Testing Story pattern I noticed:
During the first 2 hours, it seemed to me, we were telling each other the testing stories we wanted to believe, the stories we wanted each other to believe. We had to make first impressions and establish our personal expertise, I guess. But during the 3rd hour, we started to tell more candid stories, about our testing struggles and dysfunctions. I started hearing things like, “we know what we should be doing, we just can’t pull it off”. People who, at first impression, seemed to have it all together, seemed a little less intimidating now.
When we attend conference talks, read blog posts, and socialize professionally, I think we are in a bubble of exaggerated success. The same thing happens on Facebook, right? And people fall into a trap: The more one uses Facebook, the more miserable one feels. I’m probably guilty of spreading exaggerated success on this blog. I’m sure it’s easier, certainly safer, to leave out the embarrassing bits.
That being said, I am going to post some of my recent testing failure stories on this blog in the near future. See you soon.
Nearly 20 million Americans have now experienced the broken Obamacare website first hand. But Ben Simo, a past president of the Association for Software Testing, found something more than a cumbersome login or a blank screen—clear evidence of subpar coding on the site.
Here are the slides in PDF format. I'll be posting the video soon on YouTube.
I forgot to mention this before, but on Thursday, Jason Arbon from Applause and uTest will be giving a practically free ($99) course on mobile application quality. Jason is a great teacher, and really knows his stuff in this area.
More information at the SASQAG web site (scroll down a bit on the home page).
Five years ago, Lisa Crispin and Janet Gregory brought testing kicking and screaming into agile with their insanely influential Agile Testing book. They are now working on a follow-up. This got me thinking that it’s about time we remodelled one of our sacred cows: the Agile Testing Quadrants. Although Brian Marick introduced the quadrants a few years earlier, it is undoubtedly Crispin and Gregory who gave the Agile Quadrants their wings. The Quadrants were the centre-piece of the book, the one thing everyone easily remembered. Now is the right time to forget them.
XP is primarily a methodology invented by developers for developers. Everything outside of development was boxed into the role of the XP Customer, which translates loosely from devspeak to plain English as “not my problem”. So it took a while for the other roles to start trying to fit in. Roughly ten years ago, companies at large started renaming business analysts to product owners and project managers to scrum masters, trying to put them into agile boxes. Testers, forever the poor cousins, were not an interesting target group for expensive certification, so they were left utterly confused about their role in the brave new world. For example, upon hearing that their company was adopting Scrum, the entire testing department of one of our clients quit within a week. Developers worldwide, including me, secretly hoped that they’d be able to replace those pesky pedants from the basement with a few lines of JUnit.

And for many people out there, Crispin and Gregory saved the day. As the community started re-learning that there is a lot more to quality than just unit testing, the Quadrants became my primary conversation tool for reducing confusion. I was regularly using that model to explain, in less than five minutes, that there is still a place for testers, and that only one of the four quadrants is really about rapid automation with unit testing tools. The Quadrants helped me facilitate many useful discussions on the big picture missing from the typical developer’s view of quality, and helped many testers figure out what to focus on.
The Quadrants were an incredibly useful thinking model for 200x. However, I’m finding it increasingly difficult to fit the software world of 201x into the same model. With shorter iterations and continuous delivery, it’s difficult to draw the line between activities that support the team and those that critique the product. Why would performance tests not be aimed at supporting the team? Why are functional tests not critiquing the product? Why would exploratory tests be only for business stuff? Why is UAT separate from functional testing? I’m not sure if the original intention was to separate things into those done during development and those done afterwards, but most people out there seem to think about the horizontal Quadrants axis in terms of time (there is nothing in the original picture that suggests that, although Marick talks about a “finished product”). This creates some unjustifiable conclusions – for example, that exploratory testing has to happen after development. The axis also creates a separation that I always found difficult to justify, because critiquing the product can support the team quite effectively, if it is done in time. Taking that to the extreme, with lean startup methods, a lot of critiquing the product should happen before a single line of production code is written.
The Quadrants don’t fit well with all the huge changes that have happened in the last five years, including the surge in popularity of continuous delivery, devops, build-measure-learn, the big-data analytics obsession of product managers, and exploratory and context driven testing. Because of that, a lot of the stuff teams do now spans several quadrants. The more I try to map things that we do now, the more the picture looks like a crayon self-portrait that my three-year-old daughter drew on our living room wall.
The vertical axis of the Quadrants is still useful to me. Separation of business oriented tests and technology oriented tests is a great rule of thumb, as far as I’m concerned. But the horizontal axis is no longer relevant. Iterations are getting shorter, delivery is becoming more continuous, and a lot of the stuff is just merging across that line. For example, Specification by Example helps teams completely merge functional tests and UAT into something that is continuously checked during development. Many teams I worked with recently run performance tests during development, primarily to make sure frequent changes don’t mess things up – more to support the team than anything else.
Dividing tests into those that support the team and those that evaluate the product is not really helping to facilitate useful discussions any more, so it’s time to break that model.
The context driven testing community argues very hard that looking for expected results isn’t really testing – instead, they call that checking. Without getting into an argument about what is or isn’t testing, the division has been quite useful to me in many recent discussions with clients. Perhaps that is a more useful second axis for the model: the difference between looking for expected outcomes and analysing aspects without a definite yes/no answer, where results require skilful analytic interpretation. Most of the innovation these days seems to happen in the second part anyway. Checking for expected results, both from a technical and a business perspective, is now pretty much a solved problem.
Thinking about checking expected outcomes vs analysing outcomes that weren’t pre-defined helps to explain several important issues:
- We can split security into penetration tests/investigations (not pre-defined) and a lot of functional tests around compliance, such as encryption, data protection, authentication, etc. (essentially all checking for pre-defined expected results), debunking the stupid myth that security is “non-functional”.
- We can split performance into load tests (where will it break?) and running business scenarios to prove agreed SLAs and capacity, continuous delivery style, debunking the stupid myth that performance is a technical concern.
- We can have a nice box for ACC-matrix driven exploration of capabilities, as well as a meaningful discussion about having separate technical and business oriented exploratory tests.
- We can have a nice box for build-measure-learn product tests, and have a meaningful discussion on how those tests require a defined hypothesis, and how that is different from just pushing stuff out and seeing what happens through usage analytics.
- We can have a nice way of discussing production log trends as a way of continuously testing technical stuff that’s difficult to automate before deployment, but still useful to support the team. We can also have a nice way of differentiating those tests from business-oriented production usage analytics.
- We could avoid silly discussions on whether usability testing is there to support the team or evaluate the product.
Most importantly, by using that horizontal axis, we can raise awareness about a whole category of things that don’t fit into typical test plans or test reports, but are still incredibly valuable. The 200x quadrants were useful because they raised awareness about a whole category of things in the upper left corner that most teams weren’t really thinking of, but are now taken as common sense. The 201x quadrants can help us raise awareness about some more important issues for today.
That’s my current thinking about it. Perhaps the model can look similar to the picture below.
What do you think?
Earlier this year, we presented Espresso at GTAC as a solution to the UI testing problem. Today we are announcing the launch of the developer preview for Espresso!
The compelling thing about developing Espresso was making it easy and fun for developers to write reliable UI tests. Espresso has a small, predictable, and easy to learn API, which is still open for customization. But most importantly, Espresso removes the need to think about the complexity of multi-threaded testing. With Espresso, you can think procedurally and write concise, beautiful, and reliable Android UI tests quickly.
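As an illustration, here is a minimal sketch of what an Espresso test can look like. The activity and view IDs (GreeterActivity, R.id.name_field, and friends) are hypothetical, and the imports assume the current androidx.test packaging rather than the original preview’s package names:

```java
import static androidx.test.espresso.Espresso.onView;
import static androidx.test.espresso.action.ViewActions.click;
import static androidx.test.espresso.action.ViewActions.typeText;
import static androidx.test.espresso.assertion.ViewAssertions.matches;
import static androidx.test.espresso.matcher.ViewMatchers.withId;
import static androidx.test.espresso.matcher.ViewMatchers.withText;

import androidx.test.ext.junit.rules.ActivityScenarioRule;
import org.junit.Rule;
import org.junit.Test;

public class GreeterActivityTest {

    // GreeterActivity and the R.id.* view IDs below are hypothetical app code.
    @Rule
    public ActivityScenarioRule<GreeterActivity> activityRule =
            new ActivityScenarioRule<>(GreeterActivity.class);

    @Test
    public void greetingAppearsAfterTypingNameAndClickingGreet() {
        // Espresso synchronizes with the UI thread automatically, so the
        // test reads procedurally, with no sleeps or manual waits anywhere.
        onView(withId(R.id.name_field)).perform(typeText("Steve"));
        onView(withId(R.id.greet_button)).perform(click());
        onView(withId(R.id.greeting)).check(matches(withText("Hello Steve!")));
    }
}
```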
Espresso is now being used by over 30 applications within Google (Drive, Maps and G+, just to name a few). Starting from today, Espresso will also be available to our great developer community. We hope you will also enjoy testing your applications with Espresso, and we look forward to your feedback and contributions!
Android Test Kit: https://code.google.com/p/android-test-kit/
As a testing exercise, it would be great if you could draw one or two graphs or illustrations, take a photo, and either upload it to Twitter with the hashtag #testindex or upload it to the Cartoon Tester Facebook page. Go on, it would make you think about testing, which is never a bad thing (especially if you’re a tester!).
*Cracked* (I put the name in italics because it was a magazine in my day, sonny, and I fancy myself the IT world’s Sylvester P. Smythe) has a piece entitled “5 Reasons Tech Companies Make Bad Gadgets (An Inside Look)” that you might want to read.
It’s not about software per se, but it looks awfully familiar.
Cheers, Andy (also known as the Cartoon Tester!)
Here are my slides from the Agile Tour Vienna 2013 Keynote: How best to sabotage your product.
I implied at the end of my last post that I’d follow up after my keynote (I failed – sorry). This was a weird conference for me. While I attended nearly all of the keynotes, I only made it to a few other sessions, and didn’t have as much time to hang out with folks as I would have liked. For better or for worse, I spent the majority of the week in my hotel room working (internet access was much better than in the conference area). Even in hindsight, taking care of the day job was the right thing to do.
But I had a great time at STAR. I’m mostly happy with my keynote and think I delivered everything the way I wanted (incidentally, the keynote is online here). I thought Friday’s panel discussion with Jon Bach and Dawn Haynes was a lot of fun (although I probably couldn’t have been more annoyed with the content and style of Friday’s keynote speaker). Overall, it was another great STAR conference, and I can’t wait to attend another one.
And if you missed me, now that Xbox One is almost out the door, I’m planning to be back at STAR for STAR East this spring in Orlando. I hope to see some of you there.
My data warehouse project team is configuring one of our QA environments to be a dynamic read-only copy of production. I’m salivating as I try to wrap my head around the testing possibilities.
We are taking about 10 transactional databases from one of our QA environments, and replacing them with 10 databases replicated from their production counterparts. This means, when any of our users perform a transaction in production, said data change will be reflected in our QA environment instantly.
- Excellent Soak Testing – We’ll be able to deploy a pre-production build of our product to our Prod-replicated-QA-environment and see how it handles actual production data updates. This is huge because we have been unable to find some bugs until our product builds experience real live usage.
- Use real live user scenarios to drive tests – We have a suite of automated checks that invoke fake updates in our transactional databases, then expect data warehouse updates within certain time spans. The checks use fake updates. Until now. With the Prod-replicated-QA-environment, we are attempting to programmatically detect real live data updates via logging, and measure those against expected results.
- Comparing reports – A new flavor of automated checks is now possible. With the Prod-replicated-QA-environment, we are attempting to use production report results as a golden master to compare against QA report results generated from the data warehouse running the pre-production QA build (see the sketch after this list). Since the data warehouse data supporting the reports should be the same, we can expect the report results to match.
- The Prod-replicated-QA-environment will be read-only. This means instead of creating fake user actions whenever we want, we will need to wait until they occur. What if some don’t occur…within the soak test window?
- No more data comparing? – Comparing transactional data to data warehouse data has always been a bread-and-butter automated check for us. These checks verify data integrity and data loading. Comparing a real live quickly changing source to a slowly updating target will be difficult at best.
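To make the report-comparing idea concrete, here is a minimal Java/JDBC sketch. The connection URLs, the report query, and the daily_sales table are all hypothetical placeholders, and a real check would also need both runs anchored to the same replication point – which is exactly the difficulty the last bullet raises:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ReportGoldenMasterCheck {

    // Hypothetical placeholders: real URLs, credentials, and the actual
    // report query would come from the test environment's configuration.
    static final String PROD_URL = "jdbc:sqlserver://prod-dw;databaseName=Reports";
    static final String QA_URL = "jdbc:sqlserver://qa-dw;databaseName=Reports";
    static final String REPORT_QUERY =
            "SELECT region, SUM(amount) AS total FROM daily_sales "
            + "GROUP BY region ORDER BY region";

    public static void main(String[] args) throws Exception {
        // The production report is the golden master; the QA warehouse,
        // fed by the same replicated data, should produce identical rows.
        List<String> prodRows = runReport(PROD_URL);
        List<String> qaRows = runReport(QA_URL);
        if (prodRows.equals(qaRows)) {
            System.out.println("PASS: QA report matches the production golden master");
        } else {
            System.out.println("FAIL: reports differ (prod=" + prodRows.size()
                    + " rows, qa=" + qaRows.size() + " rows)");
        }
    }

    static List<String> runReport(String url) throws Exception {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(REPORT_QUERY)) {
            while (rs.next()) {
                rows.add(rs.getString("region") + "|" + rs.getBigDecimal("total"));
            }
        }
        return rows;
    }
}
```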
1) That so many state exchanges experienced problems makes me wonder whether any of these IT shops have heard about performance testing. Failover servers? Load balancing? In a way, this was almost engineered (my apologies to all engineers) to fail, because it's the "everybody shows up at the same time" scenario.
2) There were reportedly functional defects in some states that prevented people from even setting up a user account.
3) Once again, the idea prevailed that just because someone in government declared "Let there be a system for...", people assumed the resulting system would be on schedule, adequate quality, etc. There are no magic IT wands. But, on the other hand, how hard is it to build a web site that is just a directory to other sites? Of course, I'm just the consultant looking in from the outside. I've seen simple problems grow into complex monsters once vendors and government meet.
4) Then, of course, there are the flaws in the requirements concerning rate calculations.
I'm glad my state of Oklahoma opted out of building its own exchange.
I will be surprised if the problems are resolved quickly. I've seen these situations before and the more people try and fail to get access, the more they keep trying. It's a death spiral of performance.
Maybe, the people from United Airlines and the various state exchanges could get together and we could all have free insurance!
******* Update *******
In USA Today (http://www.usatoday.com/story/news/nation/2013/10/05/health-care-website-repairs/2927597/) we find the following:
"U.S. Chief Technology Officer Todd Park said the government expected HealthCare.gov to draw 50,000 to 60,000 simultaneous users, but instead it has drawn as many as 250,000 at a time since it launched Oct. 1." and "These bugs were functions of volume,'' Park said. "Take away the volume and it works.''
So, it appears that one contributing factor to the "bugs" (I would suggest this is system failure, not just a "bug") is that the performance targets were set way too low. This is like the infamous Victoria's Secret online fashion show failure at halftime of the Super Bowl a few years back. In performance testing of new launches, you have to take into account the curiosity factor. In the case of Obamacare, you tell 300 million people that a certain day is the day to check it out and expect only 60,000 people to show up? Come on, guys, you have to set your sights higher than that.
This is a great lesson in performance testing. You always go for high numbers for big launches (like the Facebook IPO). Unless, of course, you want to go the public apology route.
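To illustrate the point (this is not a real load-testing harness – a dedicated tool like JMeter would do this properly), here is a minimal Java sketch of the core idea: ramp concurrency well past the official forecast. The target URL and all the numbers are made up:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LaunchDayLoadSketch {

    public static void main(String[] args) throws Exception {
        // If the official forecast is N concurrent users, test well past it:
        // launch-day curiosity can multiply traffic several times over.
        int forecast = 600;      // scaled-down stand-in for 60,000
        int curiosityFactor = 5; // hypothetical multiplier
        int simulatedUsers = forecast * curiosityFactor;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://staging.example.gov/")) // hypothetical target
                .build();

        AtomicInteger errors = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(simulatedUsers);
        for (int i = 0; i < simulatedUsers; i++) {
            pool.submit(() -> {
                try {
                    HttpResponse<Void> response =
                            client.send(request, HttpResponse.BodyHandlers.discarding());
                    if (response.statusCode() >= 500) {
                        errors.incrementAndGet();
                    }
                } catch (Exception e) {
                    errors.incrementAndGet(); // timeouts and refused connections count too
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        System.out.println("Errors at " + simulatedUsers + " concurrent users: " + errors.get());
    }
}
```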
It’s 6:30-ish pm, I’m killing time waiting for my sound check for tomorrow’s keynote, and I thought I’d do a quick brain dump of today’s tutorial session.
Today’s session was “Alan Page: On Testing” – which is a pretty wide open topic. For the slide handouts, I slapped together slides from a bunch of things I could talk about, but my plan all along was different. In a perhaps risky move, I decided that I’d take the first 10 minutes of the session to collect as many questions from the audience as I could, then I loosely grouped the questions and put together a few impromptu talks to cover the answers. I took questions as I went, plus a few ad-hoc questions at the end, and filled the 3.5 hour session.
The problem with this sort of thing is that it exhausts me. I’m wiped out, and I’ve lost half of my voice, but I should be good to go for my keynote in the morning.
The other drawback of this sort of session (and the few pieces of feedback I glanced at reflect this) is that it is polarizing. Attendees either got a ton of value or little value – comments like “Love the unstructured format – tons of great information” were contrasted with “Didn’t like the unstructured format – too much information”. I’m not too concerned, since the conference circuit isn’t really my thing, but I feel a little bad that I didn’t set expectations better for the people in the “don’t like unstructured” group.
More tomorrow after my (structured-ish) keynote.
A quote commonly attributed to Napoleon says:
Rascality has limits; stupidity has not.
I’m testing this application as rascally as I can, but I am only one member of the team, and the users are infinite monkeys with infinite typewriters and hammers and stone tablets.
Am I being stupid enough? How can I be more stupid?
I can’t imagine testing without multiple computers at my disposal. You may want to hold on to your old, out-of-warranty computers if given the choice. Five quick reasons:
- When Computer#1 hits an impediment such as an unrecoverable error, Computer#2 can start testing immediately as Computer#1 reboots.
- I can use both computers to simulate interesting multi-user tests. Example: what if two users attempt to acquire locks at roughly the same time? (See the sketch after this list.)
- I can kick off long-running processes, staggered across 3 separate boxes, so as not to sit idle waiting.
- Different OS’s, browser versions, frameworks, and other software running in the background can be informally tested to narrow down variables.
- Computer#1 can support the administrative work of testing (e.g., documenting tests, bugs, emailing), while Computer#2 can stay clean and focus on operating the product under test.
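Here is a minimal sketch of the lock race from the second bullet, scaled down to two threads in one JVM. The acquireLock method and the in-memory lock table are stand-ins for whatever locking the product under test actually does; in the real test, each attempt would come from a separate computer:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

public class ConcurrentLockCheck {

    // Stand-in for the product's lock service: putIfAbsent returns null
    // only for the caller who got there first, so exactly one attempt wins.
    static final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    static boolean acquireLock(String record, String user) {
        return locks.putIfAbsent(record, user) == null;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch startGun = new CountDownLatch(1);
        for (String user : new String[] {"computer1", "computer2"}) {
            new Thread(() -> {
                try {
                    startGun.await(); // hold both attempts until the gun fires
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                boolean won = acquireLock("record-42", user);
                System.out.println(user + (won ? " acquired the lock" : " was correctly refused"));
            }).start();
        }
        startGun.countDown(); // both attempts race at (roughly) the same instant
    }
}
```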
I got a question from one of the blog readers on how I would describe a spec with examples for a user-interface specific user story, such as “As a user, I want to register in order to log in”. The reader challenged the value of doing a Cucumber test for the registration, because it’s obvious and mostly UI-heavy. First of all, there is nothing obvious about that story. In fact, that is the problem! I wouldn’t even try to describe the spec for it, because that would just be continuing a garbage-in-garbage-out queue. A good user story is a necessary input for a spec workshop. A good user story is one that helps the delivery team reach a shared understanding on what it is about, and that helps the team discuss the needs with their business stakeholders. “As a user, I want to register …” fails miserably there, because it is a lie.
The lie starts with the whole premise. “As a user, I want to register”…. No I don’t. As a user I don’t want to give my private information to another site, have to argue with some arbitrary fascist filter about which combination of letters and numbers is strong enough, try to guess what’s written on some random distorted image, and then have to remember another set of fake privacy answers. That sentence might be in a user story format, but it’s far far from a user story. It’s grammatically correct, but completely false, just like saying “As a citizen of Greece I want to pay my tax so that the EU stops giving us free money”.
As a user, I will suffer registering if it brings me some value, but I don’t want it. As a user, I might want to store my files securely online, to limit access to my data, or to do something else… But registering, and for that matter logging in, is very low on my list of priorities. And that’s where we run into the problem: the lying “user story” misleads teams into wasting time on discussing things that don’t matter that much. Because there is no context, there is no way to know when we’ve done enough, so unnecessary features creep in. Because there is no real benefit there, the “story” will lead to a feature which, on its own, won’t be particularly useful. A team can demo it at the end of the sprint, but there is zero value in it going live on its own. To get real feedback they’ll have to do who knows how many more fake stories, with more feature creep.
A more realistic description of what the users really want to achieve would be better, because it would limit the security aspects to what is really necessary and prevent unnecessary bloat. It might lead to something shorter, sharper, and more deliverable that would actually bring value. A more realistic description of who really wants this might lead to a realisation that it’s not the users who want to register or log in, but possibly the web site operators who want to identify users so that they can charge them correctly, or compliance officers who are concerned about privacy complaints, or marketers who want to harvest e-mail addresses. None of those motives and needs are captured in the starting sentence, so they won’t be discussed or implemented at the same time as the silly login form. It doesn’t matter what you write in the spec or test for this user story – it will fail to deliver.
Stories like this are fake, misleading, and only hurt. They don’t provide a context for a good discussion or prioritisation. They are nothing more than functional task breakdowns, wrapped into a different form to pass the scrutiny of someone who spent two days on some silly certification course, and to provide the organisation some fake comfort that they are now, in fact, agile. Don’t fall into that trap. Challenge your user stories, scrutinise them, and make sure that they capture real stakeholders and their real needs. This leads to much better systems, and much happier users and stakeholders.
For example, MindMup allows people to store files online and come back to them by generating random URLs and storing the references in the local browser profile. No log-in and no registration required, and our users love us for that. We’re at roughly 140K users now, without ever needing to have a database, which means that the system is cheaper to run and has one less point of potential failure. It’s a win-win scenario, just because we were realistic about what our users want. Registration, the first thing that most teams do when building a web site, never really came into the plan.