Load & Perf Testing
Quick Throughput Experiment
One of the first things we performance engineers do with a new server application is to conduct a quick throughput experiment. The goal is to find the maximum throughput that the server can deliver. In many cases, it is important that the server be capable of delivering this throughput with a certain response time bound. Thus, we always qualify the throughput with an average and 90th percentile response time (i.e. we want 90% of the requests to execute within the stated time). Any decent workload should therefore measure both the throughput and response time.
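Here is a minimal JavaScript sketch of what "qualifying the throughput" looks like in practice. It turns a run's logged response times into throughput, average, and 90th percentile response time. It assumes a hypothetical array of per-request latencies in milliseconds, collected over a fixed measurement window. It uses the nearest-rank definition of a percentile; other tools interpolate, so their numbers may differ slightly.

```javascript
// Summarize one test run: throughput plus average and 90th percentile
// response time. `latenciesMs` is a hypothetical array of per-request
// response times (ms) logged during a window of `durationSec` seconds.
function summarizeRun(latenciesMs, durationSec) {
  if (latenciesMs.length === 0) {
    throw new Error("no requests were logged");
  }
  // Sort a copy numerically (the default sort compares values as strings).
  var sorted = latenciesMs.slice().sort(function (a, b) { return a - b; });
  var sum = 0;
  for (var i = 0; i < sorted.length; i++) {
    sum += sorted[i];
  }
  // Nearest-rank percentile: the smallest logged value such that at least
  // 90% of the requests completed within it. Integer math avoids rounding
  // surprises from multiplying by 0.9.
  var rank = Math.ceil((sorted.length * 90) / 100);
  return {
    throughputPerSec: sorted.length / durationSec,
    avgMs: sum / sorted.length,
    p90Ms: sorted[rank - 1]
  };
}
```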
Let us assume we have such a workload. How do we best estimate the maximum throughput within the required response time bounds? The easiest way is to run a number of clients (emulated users, virtual users, or vusers) that drive load against the target server without any think time. The flow for each vuser looks like this:
Create Request ==> Send Request ==> Receive Response ==> Log statistics

This sequence of operations is executed repeatedly, with no pauses in between (i.e., no think time), for long enough to get statistically valid results. To find the maximum throughput, run a series of tests, increasing the number of clients each time. Simple, isn’t it?
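The vuser flow above can be sketched as a closed-loop driver. This is a minimal Node.js illustration, not how any particular load testing tool implements it. `sendRequest` is a hypothetical async function you would supply to issue one request against the server under test.

```javascript
// Closed-loop load driver sketch for Node.js: each virtual user (vuser)
// repeatedly creates a request, sends it, waits for the response and logs
// the latency, with no think time between iterations.
// `sendRequest` is a hypothetical async function supplied by the caller.
async function runVuser(sendRequest, deadline, latenciesMs) {
  while (Date.now() < deadline) {
    const request = { createdAt: Date.now() };        // Create Request
    const start = process.hrtime.bigint();
    await sendRequest(request);                       // Send Request, Receive Response
    const elapsedNs = process.hrtime.bigint() - start;
    latenciesMs.push(Number(elapsedNs) / 1e6);        // Log statistics
    // No think time: go straight back to the top of the loop.
  }
}

async function runLoadTest(numVusers, durationMs, sendRequest) {
  const deadline = Date.now() + durationMs;
  const latenciesMs = [];
  const vusers = [];
  for (let i = 0; i < numVusers; i++) {
    vusers.push(runVuser(sendRequest, deadline, latenciesMs));
  }
  await Promise.all(vusers);
  // Requests still in flight at the deadline are allowed to finish, so this
  // slightly overstates throughput for very short runs.
  return {
    completed: latenciesMs.length,
    throughputPerSec: latenciesMs.length / (durationMs / 1000),
    latenciesMs,
  };
}
```

Calling runLoadTest with an increasing number of vusers, one run at a time, gives the series of tests described above. The returned latencies can then be reduced to average and 90th percentile response times.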
A little while ago, I realized that without the proper training, this isn’t that simple. I came across such an experiment with the following results:
VUsers    Throughput (requests/sec)
5000      88318
10000     88407
20000     88309
25000     88429
30000     88392
35000     88440
What is wrong with these results ?
First, the throughput is roughly the same at all loads. This probably means that the system was already saturated below the base load level of 5,000 Vusers. Recall that the workload has no think time; with this many users repeatedly submitting requests, the server is certain to be overloaded. Note that the server in this case is a single system with 12 cores and hyper-threading enabled, i.e., 24 hardware threads. A multi-threaded server application typically uses one or more threads to receive requests from the network, then hands each request to a worker thread for processing. Allowing for context switching, waiting, locking, etc., one can assume that at most about 4x as many server threads as hardware threads can run productively, or about 96 server threads in this case. Since each Vuser submits a request and then waits for the response, it probably takes 2-2.5x as many Vusers as server threads to saturate the system. By this rule of thumb, one would need at most 200-250 Vusers.
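The rule of thumb can be written out as a small calculation. The multipliers (4 server threads per hardware thread, 2-2.5 Vusers per server thread) are the rough estimates from the paragraph above, not measured constants. The function also assumes that hyper-threading doubles the hardware thread count, as on this 12-core server.

```javascript
// The saturation rule of thumb from the text. All multipliers are rough
// estimates, not measured constants.
function saturationVuserRange(cores, hyperThreading) {
  var hardwareThreads = cores * (hyperThreading ? 2 : 1);  // 12 cores with HT -> 24
  var serverThreads = hardwareThreads * 4;                  // ~4 server threads per hardware thread
  return {
    serverThreads: serverThreads,
    minVusers: serverThreads * 2,                           // 2x server threads
    maxVusers: serverThreads * 2.5                          // 2.5x server threads
  };
}
```

For 12 cores with hyper-threading, this gives 96 server threads and 192-240 Vusers, which the text rounds to 200-250.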
After I explained the above, the tests were re-run with the following results:
VUsers    Throughput (requests/sec)
1         1833
10        18226
50        74684
100       86069
200       88455
300       88375

Notice that the maximum throughput is nearly the same as in the first set, but it is achieved with far fewer Vusers (aka clients). So does it really matter? Doesn’t it look better to say that the server could handle 35,000 connections rather than 300? No, it doesn’t. The reason becomes obvious when we look at the response times.
The Impact of Response Times

The graph below shows how the 90th percentile response time varied in both sets of experiments:
For the first experiment, with a very large number of Vusers, the response times are in the hundreds of milliseconds. When the number of Vusers was pared down to just reach saturation, the server responded a hundred times faster! This makes intuitive sense: if the server is inundated with requests, they simply queue up, and the longer a request waits for processing, the longer its response time.
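The size of the gap can be cross-checked from the throughput tables alone with Little’s Law. The experiment does not mention it, but it holds for any closed workload in steady state. With zero think time, N = X × R (Vusers = throughput × response time), so the average response time is R = N / X. The sketch applies this to the highest-load row of each experiment. Note that it yields average response times, not the 90th percentile plotted in the graph.

```javascript
// Little's Law for a closed loop with zero think time: N = X * R, so the
// average response time is R = N / X (converted here to milliseconds).
function avgResponseTimeMs(vusers, throughputPerSec) {
  return (vusers / throughputPerSec) * 1000;
}

var firstRun = avgResponseTimeMs(35000, 88440);  // first experiment, 35,000 Vusers
var secondRun = avgResponseTimeMs(300, 88375);   // second experiment, 300 Vusers
console.log(firstRun.toFixed(1) + " ms vs " + secondRun.toFixed(1) + " ms");
console.log("ratio: " + (firstRun / secondRun).toFixed(0) + "x");
```

This gives roughly 396 ms for 35,000 Vusers versus about 3.4 ms for 300 Vusers, a ratio of about 117. That is in line with the "hundreds of milliseconds" and "hundred times faster" observations.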
Summary

When running throughput experiments, it is important to take into account the type of server application, the hardware characteristics, etc., and to run appropriate load levels. Otherwise, although you may find the maximum throughput, you will have no idea what response time the server can deliver at that throughput.
Estimating concurrent users based on past traffic
Today we received an excellent question from a customer of ours:
We were wondering if you all have any information that says “X Unique visitors per day translates into Y simultaneous users at any given time.”
Essentially, we’re looking for a way to determine how many simultaneous users we should load test with if we know the site’s normal daily traffic.
While every site is different, we recommend the following line of reasoning to help you find the answer. Suppose your site gets 100K unique visitors per day, with peak traffic in the mornings and afternoons. Assume that 40% of the traffic comes between 7AM and 11AM, 40% between 4PM and 9PM, and 20% at other times. This means that during your peak hours you’ll get roughly 8-10% of your daily unique visitors per hour (40% spread over the four morning hours is 10% per hour; over the five evening hours it is 8%), or up to 10K uniques per hour in our example.
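The same arithmetic as a small function. It assumes traffic is spread evenly across the hours of each peak window; real traffic within a window will not be perfectly flat. The function name is hypothetical.

```javascript
// Hourly unique visitors during a peak window, assuming the window's share
// of daily traffic is spread evenly across its hours.
function hourlyUniques(dailyUniques, shareOfDay, windowHours) {
  return (dailyUniques * shareOfDay) / windowHours;
}

var morningPeak = hourlyUniques(100000, 0.4, 4);  // 7AM-11AM
var eveningPeak = hourlyUniques(100000, 0.4, 5);  // 4PM-9PM
```

This yields 10,000 uniques per hour for the morning window and 8,000 for the evening window.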
Now that you know how many unique visitors you’ll get in an hour, you can start turning that into concurrent users. To do that, it’s important to understand that a unique visitor is roughly equivalent to one transaction (one run of your script). So what you really want to figure out is how many users you need to reach 10K transactions in an hour.
Suppose your script (or scripts) takes an average of 2 minutes to complete. That means a single user will execute 30 transactions in an hour, so to reach 10K transactions you’d need 334 users (10K divided by 30, rounded up). If you decide to create realistic scripts that include human think time, the scripts will take that much longer and you’ll need that many more concurrent users. For example, if adding think time makes the script five times longer, so that it now takes 10 minutes to run, each user executes only 6 transactions per hour and you’ll need 1,667 users (10K divided by 6).
Of course, this calculation will only get you the load on a typical day (assuming a single hour sees 10% of traffic). Your traffic patterns may vary, or you may want to prepare for a larger surge. For example, if you want to test what happens when 60% of the daily traffic visits in an hour, then you’d need 2,000 users (60K divided by 30).
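The calculation in the last two paragraphs fits in one function. This sketch treats one unique visitor as one script transaction, as described above. It rounds up because you cannot run a fraction of a user. The function name is hypothetical.

```javascript
// Concurrent users needed to reach a target number of transactions per hour,
// treating one unique visitor as roughly one script transaction.
function concurrentUsers(transactionsPerHour, scriptMinutes) {
  var transactionsPerUserPerHour = 60 / scriptMinutes;
  // Round up: a fraction of a user cannot be simulated.
  return Math.ceil(transactionsPerHour / transactionsPerUserPerHour);
}
```

concurrentUsers(10000, 2) returns 334, concurrentUsers(10000, 10) returns 1667, and concurrentUsers(60000, 2) returns 2000, matching the figures above.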
Eliminating concurrent access to sensitive data
We recently had a customer from a large clothing retailer ask us whether there was any way to ensure that data, such as a username/password combination, could be “checked out” so that only one concurrent user could use it at a time. This is very common with logins, since many systems prevent the same account from logging in concurrently from multiple IP addresses.
While BrowserMob does not have a concept in which data rows can be “checked out”, some simple scripting can achieve the same results. The key is in creative use of the browserMob.getUserNum() and browserMob.getTxCount() APIs. You can learn more about them by reading up on the BrowserMob APIs.
The getUserNum function returns 0, 1, 2, etc., depending on which concurrent user is running the script. So in a 100-user test, getUserNum will return a value between 0 and 99. It’s important to understand that it returns the same value for the same user throughout the test.
The getTxCount function returns 1, 2, 3, etc., based on the number of cycles that specific user has run. It is effectively a counter of the transactions that particular user has executed. So user 1 and user 100 will both get 1 from getTxCount on their first transaction, but by the time user 100 starts, user 1 might already be on transaction 50.
Now suppose you want to run a 1000-user test in which the same account is never logged in concurrently. All you need to do is pre-create 1000 user accounts (test-0 through test-999) and then write your script like so:

    // getUserNum() is 0..999 in a 1000-user test and stays fixed per user,
    // so each concurrent user always logs in with its own account.
    var userId = browserMob.getUserNum();
    var username = "test-" + userId;
    var password = "password";
    selenium.type("username", username);
    selenium.type("password", password);

This works great, but what if you want to use more than 1000 logins? Suppose you want to use up to 10,000 logins across a 1000-user test. This is where the getTxCount function comes in:

    var loginsPerUser = 10;
    var userNum = browserMob.getUserNum();  // 0..999, fixed for this user
    var txCount = browserMob.getTxCount();  // 1, 2, 3, ... for this user
    // Each user owns a private block of 10 account numbers:
    // userNum * 10 through userNum * 10 + 9.
    var userId = userNum * loginsPerUser + txCount % loginsPerUser;
    var username = "test-" + userId;
    var password = "password";
    selenium.type("username", username);
    selenium.type("password", password);

This allocates 10 logins per concurrent user. So user 0 gets usernames test-0, test-1, …, test-9, while user 8 gets usernames test-80, test-81, …, test-89, and so on. Because of the modulo operator (%), each user’s ten usernames simply wrap around once they’ve all been used. (Since getTxCount starts at 1, user 0’s first transaction logs in as test-1, and test-0 comes up on its tenth.)
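Because the allocation is plain arithmetic, it can be checked outside a load test. The sketch below is plain JavaScript with no BrowserMob APIs; `loginFor` and `checkNoSharedLogins` are hypothetical helper names. It reproduces the userId formula from the script and verifies that no two concurrent users are ever handed the same account.

```javascript
// Same formula as the BrowserMob script above, as a plain function.
function loginFor(userNum, txCount, loginsPerUser) {
  var userId = userNum * loginsPerUser + (txCount % loginsPerUser);
  return "test-" + userId;
}

// Returns true if every account is used by exactly one user and all
// numUsers * loginsPerUser accounts are handed out.
function checkNoSharedLogins(numUsers, loginsPerUser) {
  var owner = {};
  for (var user = 0; user < numUsers; user++) {
    for (var tx = 1; tx <= loginsPerUser; tx++) {
      var name = loginFor(user, tx, loginsPerUser);
      if (owner.hasOwnProperty(name) && owner[name] !== user) {
        return false;  // two different users would share this login
      }
      owner[name] = user;
    }
  }
  return Object.keys(owner).length === numUsers * loginsPerUser;
}
```

checkNoSharedLogins(1000, 10) returns true: the 1000 users draw from 10,000 distinct accounts with no overlap.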
Advanced handling of page timeouts in Selenium
Because both our load testing and website monitoring services are based on Selenium, we have a unique ability to measure the performance of things like page load times, AJAX timings, and other in-browser interactions.
Selenium has both a setTimeout command and a waitForPageToLoad command. Both take a timeout value, which controls how long Selenium waits for a page to load or an element to appear. When using our services, most people stick with the default of 30 seconds. If the timeout is reached, an error is thrown, the script aborts, and the transaction is recorded as a failure.
However, sometimes people want to know when pages take more than X seconds to load, but don’t want to necessarily interrupt the flow of the script. In fact, just last week we got this request from a customer:
Ability to trigger an alert based on a set threshold (by the user) – not using the timeout. This basically came from the performance issues we are experiencing. Here’s the scenario:
- We have a specific page that takes 2+ minutes to load.
- The page load timeout in the Selenium script was set to 60s.
- BMob properly reported the “timeout” error but when this error happens, BMob quits the script.
- This is not ideal for me since I want to still be able to see how long the page takes to load.
- Increasing the page load timeout for the page in question works, but now I don’t have a way (that I know of) to still trigger an alert after 60s.
I should be able to set a threshold for any page I choose that would then send a notification alert.
In other words, this customer wanted a way to still report a transaction as a failure and receive an alert, while allowing the script to continue. Fortunately, our support for JavaScript as a scripting language provides the answer:

    var timeout = 90000;
    ...
    // start of script
    ...
    var start = new Date().getTime();
    selenium.waitForPageToLoad(timeout);
    var end = new Date().getTime();
    ...
    // rest of script
    ...
    if ((end - start) > 45000) {
        throw "An important page took longer than 45 seconds to load";
    }

This script sets the timeout to a very long value (90 seconds) but still reports an error if a specific page takes longer than 45 seconds to load. Because the threshold check happens at the end, the rest of the script still executes even when the page exceeds the 45-second threshold.
The only problem with this script is that if the page takes longer than 90 seconds, then the rest of the script will still not run because waitForPageToLoad will throw an exception. You can solve that too with a little code:

    var start = new Date().getTime();
    try {
        selenium.waitForPageToLoad(timeout);
    } catch (e) {
        // this will happen after 90 seconds
        // todo: recover and send the browser to the next URL
    }
    var end = new Date().getTime();

The important thing to remember with this use of try/catch is that you need to recover properly from the error. Simply catching the error and trying to continue may not work. For example, if the next Selenium command clicks a button that should have been loaded by the last page, there may be no way to recover. However, if the next step is simply visiting a new URL, you may be able to get away with a simple open() command in the catch block.
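A script could combine both ideas: time every important page against its own threshold, recover from hard timeouts, and fail the transaction once at the end. This is a sketch in the same JavaScript style. It assumes the Selenium object exposes waitForPageToLoad and open as used above. createPageTimer and recoveryUrl are hypothetical names, and an exception thrown by open() itself still propagates.

```javascript
// Hypothetical helper: per-page thresholds, timeout recovery, and a single
// failure at the end of the script.
function createPageTimer(sel, timeoutMs, recoveryUrl) {
  var breaches = [];
  return {
    // Waits for the current page and records a breach, instead of aborting,
    // when the page is slower than thresholdMs or hits timeoutMs.
    waitForPage: function (label, thresholdMs) {
      var start = new Date().getTime();
      var timedOut = false;
      try {
        sel.waitForPageToLoad(timeoutMs);
      } catch (e) {
        timedOut = true;
        sel.open(recoveryUrl);  // move to a page the script can continue from
      }
      var elapsed = new Date().getTime() - start;
      if (timedOut) {
        breaches.push(label + " timed out after " + timeoutMs + " ms");
      } else if (elapsed > thresholdMs) {
        breaches.push(label + " took " + elapsed + " ms (threshold " + thresholdMs + " ms)");
      }
      return elapsed;
    },
    // Call once, as the last statement of the script.
    failIfAnySlow: function () {
      if (breaches.length > 0) {
        throw "Slow pages: " + breaches.join("; ");
      }
    }
  };
}
```

In a script you might create `var pages = createPageTimer(selenium, 90000, recoveryUrl);`, call `pages.waitForPage("search results", 45000);` after each navigation you care about, and end with `pages.failIfAnySlow();`.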