Software Testing

The Secret of My Success

QA Hates You - Tue, 06/24/2014 - 09:18

Haters gonna hate – but it makes them better at their job: Grumpy and negative people are more efficient than happy colleagues:

Everyone hates a hater. They’re the ones who hate the sun because it’s too hot, and the breeze because it’s too cold.

The rest of us, then, can take comfort in the fact that haters may not want to get involved in as many activities as the rest of us.

But in a twist of irony, that grumpy person you know may actually be better at their job since they spend so much time on fewer activities.

It’s not true, of course.

Haters don’t hate other haters.

But the rest could hold true.

Categories: Software Testing

QA Music: Is There Any Hope for QA?

QA Hates You - Mon, 06/23/2014 - 05:27

Devour the Day, “Good Man”:

Categories: Software Testing

LoadStorm v2.0 Now with Transaction Response Timing – What does this mean for you?

LoadStorm - Tue, 06/17/2014 - 12:17

Today, LoadStorm published a press release announcing our new Transaction Response Timing. For many professional performance testers, especially those used to products like HP LoadRunner or SOASTA CloudTest, wrapping timings around logical business processes and related transactions is a familiar concept. For those of you who aren’t familiar, I’ll explain.

What is Transaction Response Time?

Transaction Response Time represents the time taken for the application to complete a defined transaction or business process.

Why is it important to measure Transaction Response Time?

The objective of a performance test is to ensure that the application is working optimally under load. However, the definition of “optimally” under load may vary between systems.
By defining an initial acceptable response time, we can benchmark the application and determine whether it is performing as anticipated.

The importance of Transaction Response Time is that it gives the project team/application team a time-based measure of how the application is performing. With this information, they can set expectations with users/customers about how long requests should take to process, and explain how the application performed.

What does Transaction Response Time encompass?

The Transaction Response Time encompasses the time taken for the request to reach the Web Server, be processed there, and be passed to the Application Server, which in most instances will make a request to the Database Server. All of this is then repeated in reverse, from the Database Server back through the Application Server and Web Server to the user. Take note that the time the request or data spends in network transmission is also factored in.

To simplify, the Transaction Response Time consists of the following:
  • Processing time on the Web Server
  • Processing time on the Application Server
  • Processing time on the Database Server
  • Network latency between the servers and the client

The following diagram illustrates Transaction Response Time.

Transaction Response Time = (t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9) × 2
Note: the × 2 factors in the time taken for the data to return to the client.

How do we measure?

Measurement of the Transaction Response Time begins when the defined transaction makes its first request to the application. The timer runs until the transaction completes, stopping before the next transaction begins.
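
LoadStorm records these transaction timings for you; purely as an illustration of the concept, here is a minimal Python sketch (using the requests library) that wraps a timer around a hypothetical “log in and add to cart” transaction. The URLs and form fields are made up.

```python
# A minimal sketch of wrapping a timer around one business transaction.
# The base URL, paths, and form fields below are hypothetical; a load testing
# tool such as LoadStorm measures and aggregates these timings for you.
import time
import requests

def login_and_add_to_cart(base_url, user, password):
    session = requests.Session()
    start = time.perf_counter()                       # transaction timer starts

    session.post(f"{base_url}/login", data={"user": user, "password": password})
    session.get(f"{base_url}/catalog/item/42")
    session.post(f"{base_url}/cart/add", data={"item_id": 42, "qty": 1})

    return time.perf_counter() - start                # stops when the transaction completes

if __name__ == "__main__":
    elapsed = login_and_add_to_cart("https://shop.example.com", "demo", "secret")
    print(f"Transaction Response Time: {elapsed:.3f}s")
```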

How can we use Transaction Response Time to analyze performance issues?

Transaction Response Time allows us to identify abnormalities when performance issues surface. These show up as slow transaction responses that differ significantly (or even slightly) from the average Transaction Response Time. From there, we can correlate other measurements, such as the number of virtual users accessing the application at that point in time and system-related metrics (e.g. CPU utilization), to identify the root cause.

With all the data that has been collected during the load test, we can correlate the measurements to find trends and bottlenecks between the response time, and the amount of load that was generated.

How is it beneficial to the Project Team?

Using Transaction Response Time, the Project Team can better relate to their users by using transactions as a common language their users can comprehend. Users will know whether transactions (or business processes) are performing at an acceptable level in terms of time.
Users may be unable to understand the meaning of CPU utilization or memory usage, so the common language of time is ideal for conveying performance-related issues.


You Can Learn From Others’ Failures

QA Hates You - Tue, 06/17/2014 - 04:27

10 Things You Can Learn From Bad Copy:

We’ve all read copy that makes us cringe. Sometimes it’s hard to put a finger on exactly what it is that makes the copy so bad. Nonetheless, its lack of appeal doesn’t go unnoticed.

Of course, writing is subjective in nature, but there are certain blunders that are universal. While poor writing doesn’t do much to engage the reader or lend authority to its publisher, it can help you gain a better understanding of what is needed to produce quality content.

The list is most applicable to content-heavy Web sites, but some of the points apply more broadly to applications in general. Including #8, Grammar Matters:

Obviously, you wouldn’t use poor grammar on purpose. Unfortunately, many don’t know when they’re using poor grammar.

That’s one of the things we’re here for.

(Link via SupaTrey.)

Categories: Software Testing

GTAC 2014: Call for Proposals & Attendance

Google Testing Blog - Mon, 06/16/2014 - 15:57
Posted by Anthony Vallone on behalf of the GTAC Committee

The application process is now open for presentation proposals and attendance for GTAC (Google Test Automation Conference) (see initial announcement), to be held at the Google Kirkland office (near Seattle, WA) on October 28-29, 2014.

GTAC will be streamed live on YouTube again this year, so even if you can’t attend, you’ll be able to watch the conference from your computer.

Speakers
Presentations are targeted at student, academic, and experienced engineers working on test automation. Full presentations and lightning talks are 45 minutes and 15 minutes respectively. Speakers should be prepared for a question and answer session following their presentation.

Application
For presentation proposals and/or attendance, complete this form. We will be selecting about 300 applicants for the event.

Deadline
The due date for both presentation and attendance applications is July 28, 2014.

Fees
There are no registration fees, and we will send out detailed registration instructions to each invited applicant. Meals will be provided, but speakers and attendees must arrange and pay for their own travel and accommodations.



Categories: Software Testing

QA Music: QA’s Place

QA Hates You - Mon, 06/16/2014 - 04:33

The Pretty Reckless with “Heaven Knows”:

Sometimes it does feel like they want to keep us in our place at the end of the process, ainna?

Categories: Software Testing

An Angry Weasel Hiatus

Alan Page - Sun, 06/15/2014 - 21:24

I mentioned this on twitter a few times, but should probably mention it here for completeness.

I’m taking a break.

I’m taking a break from blogging, twitter, and most of all, my day job. Beyond July 4, 2014, I’ll be absent from the Microsoft campus for a few months– and most likely from social media as well. While I can guarantee my absence from MS, I may share on twitter…we’ll just have to see.

The backstory is that Microsoft offers an award – a “sabbatical” – for employees of a certain career stage and career history. I received my award nine years ago, but haven’t used it yet…but now’s the time – so I’m finally taking ~9 weeks off beginning just after the 4th of July, and concluding in early September.

Now – hanging out at home for two months would be fun…but that’s not going to cut it for this long-awaited break. Instead, I’ll be in the south of France (Provence) for most of my break before spending a week in Japan on my way back to reality.

I probably won’t post again until September. I’m sure I have many posts waiting on the other side.

-AW

Categories: Software Testing

Forget the walking skeleton – put it on crutches

The Quest for Software++ - Mon, 06/09/2014 - 06:46

TL;DR version: Modern technology and delivery approaches, in particular the push towards continuous delivery, enable us to take the walking skeleton technique much further. By building up the UI first and delivering the back-end using continuous delivery, without interrupting or surprising users, we can cut the time to initial value delivery and validation by an order of magnitude!

This is an excerpt from my upcoming book 50 Quick Ideas to Improve your User Stories. Grab the book draft from LeanPub and you’ll get all future updates automatically.

The Walking Skeleton has long been my favourite approach to start a major effort or a legacy transition. Alistair Cockburn provides a canonical definition of this approach in Crystal Clear:

A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.

Building on that, Dan North formulated the Dancing Skeleton Pattern. In addition to performing a small function on the target architecture, this pattern involves an interface for developers to interact with the environment and experiment. When the users need to perform a more complex function, developers can make the skeleton implementation “dance”.

The walking skeleton sets up the main architectural components, and the dancing skeleton allows experimentation on the target architecture. Both approaches allow teams to ship something quickly and build on it iteratively. Modern technology and delivery approaches, in particular the push towards continuous delivery, enable us to take those approaches even further.

Instead of validating the architecture with a thin slice, we can aim to deliver value with an even thinner slice and build up the architecture through iterative delivery later. We can start with a piece that users can interact with, skip linking together the architectural components, and instead use a simpler, easier back-end. We can then iteratively connect the user interface to the target platform. In skeleton lingo: don’t worry about making the skeleton walk, put it on crutches and ship it out. While users are working with it, build up the muscles and then take away the crutches.

The core idea of the skeleton on crutches is to ship out the user interface early, and plan for iterative build-up of everything below the user interface. The user interaction may change very little, or not at all, while the team improves the back-end through continuous delivery. With a good deployment pipeline, back-ends can be updated and reconfigured without even interrupting user flow. For example, with MindMup we fully multi-version scripts and APIs, so we can deploy a new version of back-end functionality without interrupting any active sessions. Users with open sessions will continue to use the old functionality until they refresh the page, and they can benefit from updated features on their next visit.
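
The post doesn’t show MindMup’s actual code; as a simplified sketch of the multi-versioning idea, the hypothetical Flask service below keeps the old API version mounted next to the new one, so pages loaded against /api/v1 keep working until the user refreshes and picks up /api/v2.

```python
# A simplified sketch of multi-versioned back-end routes (hypothetical paths).
# Sessions opened before a deploy keep calling /api/v1; new sessions get
# /api/v2, so back-end updates never interrupt an active user.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/v1/documents/<doc_id>")
def get_document_v1(doc_id):
    # legacy behaviour, kept alive for already-open sessions
    return jsonify({"id": doc_id, "format": "legacy"})

@app.route("/api/v2/documents/<doc_id>")
def get_document_v2(doc_id):
    # new behaviour, picked up on the next page refresh or visit
    return jsonify({"id": doc_id, "format": "enriched", "revisions": []})

if __name__ == "__main__":
    app.run(port=8000)
```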

Here are some recent examples of skeletons on crutches on the projects I’ve been involved in:

  • A customer report built with Google Analytics events instead of hooking into a complex backend database. It wasn’t as comprehensive or accurate as the final thing, but the first version was done in two hours, and the final version took several weeks to implement because of third party dependencies.
  • A document uploader which saved directly from users’ browsers to Amazon S3 file storage, instead of uploading to the customer database. The skeleton was not as secure or flexible as the final solution, and did not allow fast searching, but it was done in an hour.
  • A registration system executed using JotForm, an online forms and workflow system which can take payments. This allowed us to start registering users and charging for access without even having a database.

All these solutions were not nearly as good as the final versions, but they were in production very quickly and allowed users to start getting some value early. Some remained in action for a long time – for example we kept registration through JotForm in production for three months, because it was good enough and there were more important things to do.

Key Benefits

A major benefit of skeletons on crutches is early value delivery. A user segment starts getting some value almost instantly. None of the examples I mentioned earlier were particularly difficult to build, but users got value in hours instead of weeks.

Early validation is an important consequence of early value delivery. Instead of showing something to users and asking them to imagine how the final thing would be fantastic, a skeleton on crutches allows them to actually use something in real work. This allows product managers to get invaluable feedback on features and product ideas early and build that learning into the plans for later versions.

For example, the JotForm registration skeleton allowed us to start charging for access months before the overall solution was ready, and the fact that users paid provided huge validation and encouragement that we’re going in the right direction. We were also able to discover some surprising things people would pay for, and change the focus of our business plans.

How to make this work

Aim to deliver the user interfacing part first, on a simplistic architecture. Try to skip all back-end components that would slow down delivery, even if for half a day. Deliver the user interfacing part as close to the final thing as possible, to avoid future surprises. Iterate over that until you start delivering value. Then replace the back-end with as few changes to the user interfacing part as possible to avoid surprises. If you must change the UI, think about multi-versioning to support uninterrupted usage. Build up capacity, performance, and extend functionality by replacing invisible parts with your target architecture iteratively.

There are many good crutches available for developers online. My favourite ones are:

  • Amazon AWS cloud file stores, database systems and queues enable teams to store user documents and information easily, coordinate simple asynchronous flows, and authenticate access to information using pre-signed URLs (a minimal sketch follows this list).
  • JotForm and similar forms processors allow teams to design forms and embed them in web sites. JotForm automatically keeps a record of submissions and can send e-mail confirmations to users and developers. It also supports simple workflows for things such as payments, adding users to a mailing list, or sending them to a landing page.
  • Stripe is a developer-friendly payment processor that is easy to integrate and has an option of sending e-mail invoices to customers.
  • Google Analytics is well known for web statistics, but it can also collect front-end and back-end events and create relatively complex reports on vast amounts of data. Think of it as a powerful reporting front-end for your data, on which you can experiment with reporting structures and requirements.
  • Heroku application servers can run simple tasks, connecting to other crutches, and you can start and shut them down easily on demand. Heroku has lots of third party applications that you can connect easily. They may not do exactly what you need, and they can be quite pricey for large user bases, but Heroku is often a good quick and cheap start for something small.
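
As a concrete example of the AWS crutch above, here is a minimal boto3 sketch that issues a pre-signed URL so a browser can upload a document straight to S3 without any custom back-end in the way. The bucket name and object key are placeholders, and AWS credentials are assumed to be configured.

```python
# A minimal sketch of the S3 crutch: hand the browser a pre-signed URL so
# uploads go directly to Amazon S3 instead of through your own back-end.
# The bucket name and object key below are placeholders.
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

def presigned_upload_url(bucket, key, expires_seconds=900):
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )

if __name__ == "__main__":
    url = presigned_upload_url("my-skeleton-bucket", "uploads/draft.txt")
    print(url)  # the front-end PUTs the file body directly to this URL
```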

If you can’t put a skeleton on remote crutches because of security or corporate policy, then consider using simple components onsite. One interesting example of this is a team at a bank I recently worked with. Their corporate procedures required weeks of bureaucracy to deploy an application with access to a production database. The team set up a web reporting application which executed reports on plain text files using Unix command line tools behind the scenes. This allowed them to deploy quickly, but only worked for quite a limited data set. While the users were happily using the reports in the first few weeks, the team obtained access to the production database and replaced the backend, without anyone noticing anything.

HTTP Archive – new stuff!

Steve Souders - Sun, 06/08/2014 - 23:11
Background

The HTTP Archive crawls the world’s top 300K URLs twice each month and records detailed information like the number of HTTP requests, the most popular image formats, and the use of gzip compression. We also crawl the top 5K URLs on real iPhones as part of the HTTP Archive Mobile. In addition to aggregate stats, the HTTP Archive has the same set of data for individual websites plus images and video of the site loading.

I started the project in 2010 and merged it into the Internet Archive in 2011. The data is collected using WebPagetest. The code and data are open source. The hardware, mobile devices, storage, and bandwidth are funded by our generous sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Radware, dynaTrace Software, Torbit, Instart Logic, and Catchpoint Systems.

For more information about the HTTP Archive see our About page.

New Stuff!

I’ve made a lot of progress on the HTTP Archive in the last two months and want to share the news in this blog post.

Github
A major change was moving the code to Github. It used to be on Google Code but Github is more popular now. There have been several contributors over the years, but I hope the move to Github increases the number of patches contributed to the project.
Histograms
The HTTP Archive’s trending charts show the average value of various stats over time. These are great for spotting performance regressions and other changes in the Web. But often it’s important to look beyond the average and see more detail about the distribution. As of today all the relevant trending charts have a corresponding histogram. For an example, take a look at the trending chart for Total Transfer Size & Total Requests and its corresponding histograms. I’d appreciate feedback on these histograms. In some cases I wonder if a CDF would be more appropriate.
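
For readers weighing histograms against CDFs, the short numpy sketch below computes both from a handful of made-up page transfer sizes; it is only meant to illustrate the difference between the two views.

```python
# A small sketch contrasting a histogram with an empirical CDF for a stat
# such as Total Transfer Size. The sample values below are made up.
import numpy as np

transfer_sizes_kb = np.array([310, 540, 720, 810, 950, 1200, 1600, 2100, 3400, 5200])

# Histogram: how many pages fall into each size bucket.
counts, bin_edges = np.histogram(transfer_sizes_kb, bins=5)
for count, lo, hi in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{lo:6.0f} - {hi:6.0f} KB: {count} pages")

# Empirical CDF: what fraction of pages are at or below each size.
sorted_sizes = np.sort(transfer_sizes_kb)
cdf = np.arange(1, len(sorted_sizes) + 1) / len(sorted_sizes)
for size, fraction in zip(sorted_sizes, cdf):
    print(f"<= {size:5.0f} KB: {fraction:.0%} of pages")
```
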
Connections
We now plot the number of TCP connections that were used to load the website. (All desktop stats are gathered using IE9.) Currently the average is 37 connections per page.  
CDNs
We now record the CDN, if any, for each individual resource. This is currently visible in the CDNs section for individual websites. The determination is based on a reverse lookup of the IP address and a host-to-CDN mapping in WebPagetest.
Custom Metrics

Pat Meenan, the creator of WebPagetest, just added a new feature called custom metrics for gathering additional metrics using JavaScript. The HTTP Archive uses this feature to gather these additional stats:

Average DOM Depth
The complexity of the DOM hierarchy affects JavaScript, CSS, and rendering performance. I first saw average DOM depth as a performance metric in DOM Monster. I think it’s a good stat for tracking DOM complexity. 
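
The HTTP Archive gathers this with a WebPagetest custom metric written in JavaScript (linked above); purely to illustrate the calculation, here is a Python sketch that computes the average element depth of an HTML snippet with the standard-library parser.

```python
# An illustrative calculation of average DOM depth over an HTML snippet,
# using Python's standard-library parser. The HTTP Archive's real metric is
# a WebPagetest custom metric written in JavaScript.
from html.parser import HTMLParser

class DepthTracker(HTMLParser):
    VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.element_depths = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            self.element_depths.append(self.depth + 1)  # void elements never close
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.element_depths.append(self.depth)      # record each element's depth
            self.depth -= 1

tracker = DepthTracker()
tracker.feed("<html><body><div><p>hello <b>world</b></p><img src='x.png'></div></body></html>")
print(sum(tracker.element_depths) / len(tracker.element_depths))  # average DOM depth
```
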
Document Height
Web pages are growing in many ways such as total size and use of fonts. I’ve noticed pages also getting wider and taller so HTTP Archive now tracks document height and width. You can see the code here. Document width is more constrained by the viewport of the test agent and the results aren’t that interesting, so I only show document height. 
localStorage & sessionStorage
The use of localStorage and sessionStorage can help performance and offline apps, so the HTTP Archive tracks both of these. Right now the 95th percentile is under 200 characters for both, but watch these charts over the next year. I expect we’ll see some growth.
Iframes
Iframes are used frequently to contain third party content. This will be another good trend to watch.
SCRIPT Tags
The HTTP Archive has tracked the number of external scripts since its inception, but custom metrics allows us to track the total number of SCRIPT tags (inline and external) in the page.
Doctype
Specifying a doctype affects quirks mode and other behavior of the page. Based on the latest crawl, 14% of websites don’t specify a doctype, and “html” is the most popular doctype at 40%. Here are the top five doctypes.

  • html: 40%
  • html -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd: 31%
  • [none]: 14%
  • html -//W3C//DTD XHTML 1.0 Strict//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd: 8%
  • HTML -//W3C//DTD HTML 4.01 Transitional//EN http://www.w3.org/TR/html4/loose.dtd: 3%

Some of these new metrics are not yet available in the HTTP Archive Mobile but we’re working to add those soon. They’re available as histograms currently, but once we have a few months of data I’ll add trending charts, as well.

What’s next?

Big ticket items on the HTTP Archive TODO list include:

  • easier private instance - I estimate there are 20 private instances of HTTP Archive out there today (see here, here, here, here, and here). I applaud these folks because the code and documentation don’t make it easy to set up a private instance. There are thousands of WebPagetest private instances in the world. I feel that anyone running WebPagetest on a regular basis would benefit from storing and viewing the results in HTTP Archive. I’d like to lower the bar to make this happen.
  • 1,000,000 URLs – We’ve increased from 1K URLs four years ago to 300K URLs today. I’d like to increase that to 1 million URLs on desktop. I also want to increase the coverage on mobile, but that’s probably going to require switching to emulators.
  • UI overhaul – The UI needs an update, especially the charts.

In the meantime, I encourage you to take a look at the HTTP Archive. Search for your website to see its performance history. If it’s not there (because it’s not in the top 300K) then add your website to the crawl. And if you have your own questions you’d like answered then try using the HTTP Archive dumps that Ilya Grigorik has exported to Google BigQuery and the examples from bigqueri.es.
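
If you would rather query those BigQuery exports from code than from the web console, something along these lines works with the google-cloud-bigquery client; the table name is a placeholder, so look up the actual HTTP Archive export names on bigqueri.es first.

```python
# A sketch of querying an HTTP Archive export in BigQuery from Python.
# The table name is a placeholder -- check bigqueri.es for the real export
# names, and make sure Google Cloud credentials are configured.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
    SELECT COUNT(*) AS num_pages
    FROM `httparchive.example_dataset.example_pages`  -- placeholder table name
"""

for row in client.query(QUERY).result():
    print(f"Pages in the crawl: {row.num_pages}")
```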

Categories: Software Testing

Six ways to make impact maps more effective

The Quest for Software++ - Wed, 06/04/2014 - 14:30

We’ve been working very hard over the last few months to make impact mapping easier and more effective with MindMup. Here are 6 ways we think it’s helping. I’d love to hear your ideas and comments on how we can make it even better.

GTAC 2014 Coming to Seattle/Kirkland in October

Google Testing Blog - Wed, 06/04/2014 - 13:21
Posted by Anthony Vallone on behalf of the GTAC Committee

If you're looking for a place to discuss the latest innovations in test automation, then charge your tablets and pack your gumboots - the eighth GTAC (Google Test Automation Conference) will be held on October 28-29, 2014 at Google Kirkland! The Kirkland office is part of the Seattle/Kirkland campus in beautiful Washington state. This campus forms our third largest engineering office in the USA.



GTAC is a periodic conference hosted by Google, bringing together engineers from industry and academia to discuss advances in test automation and the test engineering computer science field. It’s a great opportunity to present, learn, and challenge modern testing technologies and strategies.

You can browse the presentation abstracts, slides, and videos from last year on the GTAC 2013 page.

Stay tuned to this blog and the GTAC website for application information and opportunities to present at GTAC. Subscribing to this blog is the best way to get notified. We're looking forward to seeing you there!

Categories: Software Testing

Like A Prometheus of Vocabulary, I Bring You New Tools

QA Hates You - Wed, 06/04/2014 - 09:07

The Australians and New Zealanders have a word, rort, that we should employ as part of our software testing lexicon.

I’m going to rort this Web site.

Feel free to drop that in your next stand-up. Bad Australian accent is optional.

Categories: Software Testing

E-Commerce Benchmark – Magento Scalability Results

LoadStorm - Tue, 06/03/2014 - 08:55

The Web Performance Lab has been working hard to improve our testing and scoring methodology, and the new benchmark for Magento is here!

We had high expectations for Magento because it is such a favorite among web developers globally. Unfortunately, the testing showed that the scalability of an out-of-the-box implementation of Magento was disappointing. Our target objectives for scalability and web performance were not achieved in the load tests, and we saw slower responses and more errors at a lower volume than we had hoped.

Experiment Methodology

Our Magento e-commerce platform was installed on an Amazon Elastic Compute Cloud (EC2) m3.large instance as its server. Four new scripts were created to model typical e-commerce user activity:

  • Browse and Search
  • Product search and cart abandonment
  • New account registration and cart abandonment
  • User login and purchase

To be effective in our load testing, we ran a series of preliminary tests. The first of our preliminary tests utilized only one script – new account registration and cart abandonment. This test increased VUsers linearly from 1 to 1000 over the period of an hour. Once the error rate reached an unacceptable level, the test was stopped and the number of concurrent users at that point was noted. This test allowed us to find that rough baseline while simultaneously registering new user accounts in our e-commerce platform.

The second test we conducted used all four scripts, weighting the traffic according to typical e-commerce user activities. This test ran for only 15 minutes and increased VUsers linearly from 5 to 500.

Testing Phase

Based on the point of failure indicated by the high error rate, our three scalability tests were run only up to 200 VUsers. The three tests differed minimally in their results, so we chose to analyze the median test for simplicity.

Failure Criteria

For this series of experiments, we’ve defined three main criteria to determine a point of failure in our load tests on the e-commerce stores. If any of the following criteria are met, then that point in the test is the point of failure and we determine the system to be scalable up to that point:

  • Performance error rate greater than 5 %
  • Average Response Time for requests exceeding 1 second
  • Average Page Completion Time exceeding 10 seconds
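
To make the three criteria concrete, here is a small illustrative Python check that walks per-minute results and reports the first point of failure; the field names and sample numbers are made up rather than taken from LoadStorm output.

```python
# An illustrative check of the three failure criteria against per-minute
# results. Field names and sample values are made up, not LoadStorm output.
def point_of_failure(samples):
    for s in samples:
        if (s["error_rate_pct"] > 5
                or s["avg_response_time_s"] > 1
                or s["avg_page_completion_s"] > 10):
            return s  # first sample where any criterion is breached
    return None

samples = [
    {"minute": 25, "vusers": 100, "error_rate_pct": 0.0, "avg_response_time_s": 0.4, "avg_page_completion_s": 3.1},
    {"minute": 33, "vusers": 134, "error_rate_pct": 2.1, "avg_response_time_s": 0.9, "avg_page_completion_s": 11.2},
    {"minute": 35, "vusers": 145, "error_rate_pct": 5.3, "avg_response_time_s": 1.1, "avg_page_completion_s": 13.8},
]

failure = point_of_failure(samples)
if failure:
    print(f"Point of failure: minute {failure['minute']} at {failure['vusers']} concurrent VUsers")
```
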
Data Analysis & Results

Performance Metric 1: Error Rates greater than 5 %

Error rates for the Magento platform were nonexistent until the 26-minute mark in the test, where they slowly began to increase. They didn’t exceed 5 % until 35 minutes into the test, at 145 concurrent VUsers.

We encountered three types of errors during this load test: request read timeouts, internal server errors (500 errors), and forbidden errors (403 errors). An internal server error means that the web site’s server experienced a problem but could not give specific details on what the exact problem was. Request read timeouts indicate that the server took longer to respond than the time limit set in LoadStorm’s parameters, so the request was cancelled. After investigating which requests yielded 403 errors, we found that it was the checkout process in one script that became “forbidden” halfway into the test. Interestingly, the occurrence of these errors appeared almost Gaussian with respect to time, and even dropped back down to 0% at the 53-minute mark.

Performance Metric 2: Average Response Time exceeding 1 second for a request

The average response time for requests in our Magento store started out very good and remained acceptable up until around 100 concurrent VUsers. It degraded consistently from that point on and reached just over a second (1.001 seconds) at the 137-VUser mark. If we examine throughput and requests per second and compare them with average response time, we can easily see graphically the point in our load test where we begin to reach our scalability limits.

Performance Metric 3: Average Page Completion Time exceeding 10 seconds

The last key metric to analyze is the average page completion time for the home page of our e-commerce site. Given the average response time, it was unsurprising that page completion started out very fast but reached unacceptable speeds after the 100-VUser point in the test. Average page completion time exceeded 10 seconds at 134 concurrent VUsers, reaching 11.24 seconds.

This means that the average user would have to wait over 10 seconds just for the home page to load, which is completely unacceptable. Studies on the human mind and usability indicate that website response time limits of 10 seconds will keep a user’s attention. A delay greater than 10 seconds affects the user’s flow of thought, and will often make them leave a site immediately. Ideally, what we like to see for page completion time is 5 seconds or less.

Conclusions

Magento had very interesting results, but its performance was disappointing because we expected to reach 500 concurrent users and higher throughput. As we saw with the WooCommerce platform, the point of failure in our load test was determined by average page completion time, a crucial facet of user experience.

Based on our criteria for failure, we’ve determined the system to be scalable up to 134 concurrent VUsers. To score our e-commerce platform, we took four different metrics that reflect volume and weighted them evenly. We compared the actual results with what we considered to be reasonable goals for each metric of scalability. The average across the four goals gives us our overall Scalability Score.
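
The post doesn’t spell out the scoring arithmetic, so the snippet below is only one plausible reading of “four evenly weighted metrics compared against goals”: each actual value is divided by its goal, capped at 100%, and the four ratios are averaged. The metric names and goal values are assumptions, not LoadStorm’s.

```python
# One plausible reading of an evenly weighted Scalability Score: compare each
# actual result to its goal, cap the ratio at 100%, and average the four.
# The metric names and goal values below are assumptions, not LoadStorm's.
def scalability_score(actuals, goals):
    ratios = [min(actuals[metric] / goals[metric], 1.0) for metric in goals]
    return 100 * sum(ratios) / len(ratios)

goals   = {"peak_vusers": 500, "requests_per_sec": 400, "throughput_mbps": 50, "pages_per_sec": 40}
actuals = {"peak_vusers": 134, "requests_per_sec": 120, "throughput_mbps": 16, "pages_per_sec": 11}

print(f"Scalability Score: {scalability_score(actuals, goals):.0f}%")
```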

Improving the scalability of any site builds a competitive advantage. Preparing for and load testing a site is a way to gain insight into existing bottlenecks that can be improved to ensure a pleasurable online shopping experience. It is unrealistic to expect that an e-commerce site will be successful if it is not scalable. It is essential to investigate the performance limitations of any site, and to implement appropriate solutions to improve the overall user experience. The user experience is just as important as the application’s feature set, and it’s vital for web developers to appreciate the scalability and speed of their apps.

Stay tuned for more e-commerce platforms in the queue!


A Quiz Where I Proudly Scored 0

QA Hates You - Tue, 06/03/2014 - 08:24

The 32 Words That, Used Incorrectly, Can Make You Look Bad.

So much of our written communication, including emails, texts, tweets, and online conversations, is informal, but I still take pride in using full sentences, correct punctuation, and the right word for the job. It differentiates me from people who don’t, and I hope it serves me well in picking up clients and contacts.

Categories: Software Testing

Testing on the Toilet: Risk-Driven Testing

Google Testing Blog - Fri, 05/30/2014 - 17:10
by Peter Arrenbrecht

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

We are all conditioned to write tests as we code: unit, functional, UI—the whole shebang. We are professionals, after all. Many of us like how small tests let us work quickly, and how larger tests inspire safety and closure. Or we may just anticipate flak during review. We are so used to these tests that often we no longer question why we write them. This can be wasteful and dangerous.

Tests are a means to an end: To reduce the key risks of a project, and to get the biggest bang for the buck. This bang may not always come from the tests that standard practice has you write, or not even from tests at all.

Two examples:

“We built a new debugging aid. We wrote unit, integration, and UI tests. We were ready to launch.”

Outstanding practice. Missing the mark.

Our key risks were that we'd corrupt our data or bring down our servers for the sake of a debugging aid. None of the tests addressed this, but they gave a false sense of safety and “being done”.
We stopped the launch.

“We wanted to turn down a feature, so we needed to alert affected users. Again we had unit and integration tests, and even one expensive end-to-end test.”

Standard practice. Wasted effort.

The alert was so critical it actually needed end-to-end coverage for all scenarios. But it would be live for only three releases. The cheapest effective test? Manual testing before each release.

A Better Approach: Risks First

For every project or feature, think about testing. Brainstorm your key risks and your best options to reduce them. Do this at the start so you don't waste effort and can adapt your design. Write them down as a QA design so you can point to it in reviews and discussions.

To be sure, standard practice remains a good idea in most cases (hence it’s standard). Small tests are cheap and speed up coding and maintenance, and larger tests safeguard core use-cases and integration.

Just remember: Your tests are a means. The bang is what counts. It’s your job to maximize it.

Categories: Software Testing

E-Commerce Benchmark – WooCommerce Results

LoadStorm - Wed, 05/28/2014 - 11:58

The new e-commerce experiments are here! The goal of these experiments is to expose how well these e-commerce platforms perform in their most basic setup when running on an Amazon m3.large server. To test the scalability of each platform, we will run three hour-long tests to ensure reproducible results. This time around we are utilizing quick, preliminary testing to establish a rough baseline for the amount of users each platform can handle. In addition, the Web Performance Lab has modified the criteria that will be used to determine performance failure.

WooCommerce Experiment Methodology

The first platform we re-worked in this series of experiments was WooCommerce. After initializing the WooCommerce store, a set of four scripts was created to simulate user activity. In our preliminary load test, traffic was ramped up from 1 to 1000 VUsers over the period of an hour, using a single script that completed the new account registration process in the WooCommerce store. Parameterizing this script with user data allowed us to create new accounts that we could use to log in with during later tests, while at the same time obtaining an estimate of how many users the system could handle. Error rates spiked to an unacceptable 14.9% at just under 200 concurrent VUsers, and the test was stopped, leaving us with 147 new accounts created in the WooCommerce system.
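
LoadStorm handles script parameterization through its own interface; simply to illustrate the idea of giving each simulated registration unique data, here is a hypothetical Python snippet that generates a CSV of test accounts a script could draw from.

```python
# A hypothetical illustration of parameterized registration data: write a CSV
# of unique accounts so every simulated registration creates a distinct user.
# The field names and address format are made up.
import csv

def write_test_accounts(path, count):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["email", "first_name", "last_name", "password"])
        for i in range(1, count + 1):
            writer.writerow([f"loadtest+{i}@example.com", "Load", f"Tester{i}", f"Pa55word{i}"])

if __name__ == "__main__":
    write_test_accounts("registration_data.csv", 1000)  # one row per potential VUser
```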

The second preliminary load test was run using all four scripts, weighting the traffic to mimic typical user transactions.

Because we already had a rough idea of where the test would fail, we only ran the test for 15 minutes, scaling up linearly to 500 VUsers. This time error rates remained relatively low until around 220 concurrent users, where they spiked to about 12%. Based on these results, we decided to run the three scalability tests scaling linearly from 1 to 250 concurrent VUsers. Each test would hold the peak load of 250 concurrent VUsers for the last 10 minutes of the hour-long test.

Criteria for Failure

The tough question, and the crux of why we’re re-doing these experiments, is determining what scalability really means.

As performance engineers, we understand that all of the metrics matter and are tightly coupled. However, after a lot of research and discussion, the Web Performance Lab agreed upon a set of three main criteria for determining a point of failure in our scalability tests; the test fails at the point where any one of them is met. The following three criteria were used to determine the system’s scalability:

  • Performance error rate greater than 5 %
  • Average Response Time for requests exceeding 1 second
  • Average Page Completion Time exceeding 10 seconds

All three tests had very similar results, varying by less than 1% in the number of overall errors and requests sent between two of the tests. Due to this uniformity, we chose to analyze the median of the three tests. These are the results of that test, based on our new criteria for failure.

Data Analysis & Results

Performance Metric 1: Error Rates greater than 5 %

It wasn’t until around the 35-minute mark in all three tests that error rates began to increase, jumping to 5.05% at 38 minutes in our median test run. The two types of errors we experienced were request read timeouts and internal server errors (500 errors). An internal server error means that the web site’s server experienced a problem but could not give specific details on what the exact problem was. Request read timeouts indicate that the server took longer to respond than the time limit set in LoadStorm’s parameters, so the request was cancelled. No specific requests stood out as yielding these errors, which is good because it means there weren’t any particularly troublesome requests being made in the load test.

Performance Metric 2: Average Response Time exceeding 1 second for a request

It’s important to note that this is the average response time for all requests made in a minute interval, not an entire page. Taking a closer look at the performance metrics of all three tests, we can see several key indicators of performance limitations:


In the beginning of the test, the average response time for requests was very low, but it appears to increase in an exponential pattern as the concurrent users increase linearly. Throughput and requests per second scale proportionally with the linearly increasing VUsers. Interestingly, there appears to be a strong correlation between the point where those two metrics plateau and the point where average response time spikes. This happens just before the 30-minute mark, which is indicative of scalability limits.

Performance Metric 3: Average Page Completion Time exceeding 10 seconds

The last key metric to study is the average page completion time. We chose to focus on home page completion for our testing purposes, as it’s the page most commonly hit. We realize we could have chosen a shorter time limit for a page to complete, but we decided to be generous in our benchmarking. I expected to see page completion time begin to drop off near or slightly before the 38-minute mark, along with the drop in throughput and the increase in error rate. Surprisingly, our criteria for failure were met at the 29-minute mark, at 147 concurrent VUsers, with average page completion time pushing 15 seconds. This means that the average user would have to wait over 10 seconds just for the home page to load, which is completely unacceptable.

Conclusions – Where does our system fail?

Based on our criteria for failure, we’ve determined the system to be scalable up to 147 concurrent VUsers. Our system’s bottlenecks appear to be on the server side, and our limitation is average page completion speed, a crucial metric of performance scalability.

To score our e-commerce platform, we took four different metrics that reflect volume and weighted them evenly. We compared the actual results with what we considered to be a reasonable goal for each metric for scalability.

Determining a set of criteria to base our scalability on was a challenge. As a young performance engineer, analyzing the results and studying how all of the different pieces correlate has been like solving a puzzle. Based on our original e-commerce experiments, I expected a slightly more scalable system. The metrics we used are absolutely essential as a predictive tool for benchmarking and identifying the real performance limitations in the WooCommerce platform.

What did you think of our experiment? Give us feedback on what you liked and on what you’d like to see done differently below!


Documenting Tests – Part 2 (The Tester Leaves)

Eric Jacobson's Software Testing Blog - Tue, 05/27/2014 - 11:10

What if the test documentation is written per the detail level in this post… 

Document enough detail such that you can explain the testing you did, in-person, up to a year later.

…and the tester who wrote it left the company? 

Problem #1:

How are new testers going to use the old test documentation for new testing?  They aren’t, and that’s a good thing.

Write tests as late as possible because things change.  If you agree with that heuristic, you can probably see how a tester leaving the company would not cause problems for new testers.

  • If test documentation was written some time ago, and our system-under-test (SUT) changed, that documentation might be wrong.
  • Let’s suppose things didn’t change.  In that case, it doesn’t matter if the tester left because we don’t have to test anything.
  • How about if the SUT didn’t change but an interfacing system did? In this case, we may feel comfortable using the old test documentation (in a regression test capacity). In other words, let’s talk about when the write-tests-as-late-as-possible heuristic is wrong and the test documentation author has left the company. If you agree that a test is something that happens in one’s brain and not a document, wouldn’t we be better off asking our testers to learn the SUT instead of copying someone else’s detailed steps? Documentation at this level of detail might be a helpful training aid, but it will not allow an unskilled tester to turn off their brain. Get it?

Problem #2:

How are people going to understand old test documentation if the tester who left must be available to interpret it? They won’t understand it in full. They will probably understand parts of it. An organization with high tester turnover and lots of audit-type inquiries into past testing may need more than this level of detail.

But consider this: planning all activities around the risk that an employee might leave is expensive. A major trade-off of writing detailed test documentation is lower quality.

Categories: Software Testing

Secrets of Successful Assessments

Randy Rice's Software Testing & Quality - Fri, 05/23/2014 - 08:23
Thanks to everyone who attended the webinar on "Secrets of Successful Assessments" that Tom Staab and I presented yesterday.

Here are the items from the webinar:

Recording of the session is at http://youtu.be/7Tk04F4NU5Y
(By the way, I have other videos there on my channel. Please subscribe to it if you want notifications of when I post new ones. We have 80 subscribers so far!)

The assessment checklist is at: http://www.riceconsulting.com/assessment_checklist.docx

The slides are at: http://www.softwaretestingtrainingonline.com/cc/public_pdf/Secrets%20of%20Successful%20Software%20Assessments%2005152014%20-%20RR.pdf


Congratulations to the winner of my book, Testing Dirty Systems, Emma Ball from Columbus, OH!


Have a great Memorial Day weekend and don't forget to honor those who have fought and died for our country!

Randy




Categories: Software Testing

James Bach’s Beautiful Testing Metaphors

Eric Jacobson's Software Testing Blog - Thu, 05/22/2014 - 14:30

(and one of Michael Bolton’s)

One of my testers took James Bach’s 3-day online Rapid Testing Intensive class.  I poked my head in from time to time, couldn’t help it.  What struck me is how metaphor after metaphor dripped from Bach’s mouth like poetry.  I’ve heard him speak more times than I can count but I’ve never heard such a spontaneous panoply of beautiful metaphors.  Michael Bolton, acting as assistant, chimed in periodically with his own metaphors.  Here are some from the portions I observed:

  • A tester is like a smoke alarm, their job is to tell people when and where a fire is.  However, they are not responsible for telling people how to evacuate the building or put out the fire.
  • (Michael Bolton) Testers are like scientists.  But scientists have it easy;  They only get one build.  Testers get new builds daily so all bets are off on yesterday’s test results.
  • Buying test tools is like buying a sweater for someone.  The problem is, they feel obligated to wear the sweater, even if it’s not a good fit.
  • Testers need to make a choice; either learn to write code or learn to be charming.  If you’re charming, perhaps you can get a programmer to write code for you.   It’s like having a friend who owns a boat.
  • Deep vs. Shallow testing.  Some testers only do Shallow testing.  That is like driving a car with a rattle in the door…”I hear a rattle in the door but it seems to stay shut when I drive so…who cares?”.
  • Asking a tester how long it will take to test is like being diagnosed with cancer and asking the doctor how long you have to live.
  • Asking a tester how long the testing phase will last is like asking a flight attendant how long the flight attendant service will last.
  • Complaining to the programmers about how bad their code looks is like being a patron at a restaurant and walking back into the kitchen to complain about the food to the chefs.  How do you think they’re going to take it?
  • Too many testers and test managers want to rush to formality (e.g., test scripts, test plans).  It’s like wanting to teleport yourself home from the gym.  Take the stairs!

Thanks, James.  Keep them coming.

Categories: Software Testing

Editorial on End of Net Neutrality

LoadStorm - Mon, 05/19/2014 - 13:34
The FCC Proposes to End Net Neutrality

This week’s web performance news hits close to home for performance engineers. The concept of “the end of net neutrality” leaked a couple weeks ago, and the public response was very strong. Learn more about what net neutrality means to you from our blog post last week. Last Thursday (5/15) the FCC voted on a proposal that would allow internet “fast lanes” for companies willing and able to pay for them. The proposal was accepted by a 3-2 majority vote, divided down party lines, with the three Democratic commissioners voting for the proposal and the two Republican commissioners voting against. The FCC will now take comments from the public before a final vote on the proposal takes place in July. Read on for my editorial on why we all need to rally together to fight this proposal and protect net neutrality:

What is Net Neutrality?

The principle of net neutrality is that all internet content should be treated equally by internet service providers. An open environment and the free exchange of information between people are regarded as sacred, and if this proposal were enacted, internet service providers would be able to give some content and websites preferential treatment.

The Plans

The proposal made by Tom Wheeler, the chairman of the FCC, moves to allow internet service providers (such as Comcast, Time Warner, and Verizon) to offer “fast lanes” to companies who pay for them. Many argue that allowing the ISPs to determine which sites and content are seen at what speed could lead to both censorship and monopolization of the internet. Censorship is a possible outcome because ISPs would have the legal ability to give some content faster speeds than others; an ISP could also prevent controversial content from reaching a large audience by denying it access to the “fast lanes”. A form of monopolization could occur as large corporations are able to afford the fast highways while small businesses and start-ups are left with just the bumpy back roads. These small businesses would have the slowest (and therefore poorest-performing) websites and would easily be squelched by large corporations who can afford to pay for the “fast lanes.”

How will this affect load and performance testing?

Web performance is like an intricate and living piece of art. Each website contains a different framework and serves a different purpose. Optimization requires understanding what you’re changing and why, as well as knowledge of useful testing tools and the metrics that matter. The innovation and research into the neuroscience of attaining great performance will be dead at the hands of money. The piece of web development affected by this proposal is how ISPs handle the delivery of information to end users, which is really a critical portion of web development.

For example, if two companies have the same good infrastructure and back-end performance, and Company A can afford to pay for the “fast lane,” its end users will continue to see excellent web performance. However, if Company B cannot or will not pay the ISPs for the “fast lane,” its end users will see markedly decreased web performance. Therefore, prepare as you might, developing a truly stellar web application will lose its value to large corporations who are able to afford to pay the ISPs for preferred service.

Quality development and internet competition are at risk. A popular example of the importance of equal internet opportunity is the replacement of MySpace by Facebook. This change would most likely not have been possible without net neutrality, as Facebook would not have been able to compete financially. Likewise, new e-commerce sites or social media platforms will not stand a competitive chance, regardless of the benefits they may bring to market.

What can you do?

Nearly 75,000 people have petitioned the White House to protect and maintain net neutrality. You can sign it here now. In addition, the FCC has offered an email address for people to voice their thoughts on the neutrality plans. Please take a minute to email openinternet@fcc.gov, which has been set up by the FCC to take public comments on this issue. We want to know what you think about net neutrality too! Share your thoughts in the comments below!

