Fix Memory Leaks in Java Production Applications

Adding more memory to your JVMs (Java Virtual Machines) might be a temporary fix for memory leaks in Java applications, but it certainly won't address the root cause of the issue. Instead of crashing once per day, the application may just crash every other day. "Preventive" restarts are also just another desperate measure to [...]
Categories: Load & Perf Testing

On load testing and performance, April 2013

LoadImpact - Mon, 04/29/2013 - 07:06

What are others saying about load testing and web performance right now? Apparently, more people have things to say about how to make things scale than about how to measure it, but in any case, this is what caught our attention recently:

  • Boundary.com has a two-part interview with Todd Hoff, founder of High Scalability and advisor to several start-ups. Read the first part here: Facebook Secrets of web performance
  • Another insight into how the big players do it is this 30-minute video from this year's PyCon US. Rick Branson of Instagram talks about how they handle their load, as well as the Justin Bieber effect.
  • The marketing part of the world has really started to understand how page load times affect sales as well as Google rankings. In this article, online marketing experts portent.com explain all the hoops they went through to get to sub-second page load time. An interesting read indeed.
  • One of my favorite sources for LAMP related performance insights is the MySQL performance blog. In a post from last week, they explain a bit about how to use their tools to analyze high load problems.
What big load testing or web performance news did we miss? Have your say in the comments below.

 

Categories: Load & Perf Testing

Balancing the Load

A question that every online application provider will face eventually is: does my application scale? Can I add an extra 100 users and still ensure the same user experience? If the application architecture is properly designed, the easiest way is to put additional servers behind a load balancer to handle more traffic. In this article we [...]
Categories: Load & Perf Testing

Emulated Mobile Monitoring

BrowserMob - Mon, 04/22/2013 - 17:07

We often get asked how WPM can monitor mobile websites.  The standard answer has been to script an HTTP request interceptor to change the User-Agent request header to match a mobile browser.  This works in some cases, but has a number of drawbacks.
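
To make the idea concrete, here's a minimal Python sketch of UA spoofing at the HTTP level (the URL and UA string are illustrative placeholders; this is not WPM's actual interceptor API):

# Minimal illustration of User-Agent spoofing, not WPM's interceptor API.
# The UA string below is an example iOS Safari value; substitute your own.
import requests

MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) "
             "AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 "
             "Mobile/10B142 Safari/8536.25")

resp = requests.get("http://www.example.com/", headers={"User-Agent": MOBILE_UA})
print(resp.status_code, len(resp.text))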

 

  • To the webpage's JavaScript the browser still appears to be Chrome or Firefox and may not trigger the loading or layout of the mobile website.
  • The features available to the webpage may be different from the mobile device.
  • The screen resolution and page layout don't match real devices.

 

Our solution to this is to use a custom browser, where we can have a high degree of control over the browser features and rendering.  We use a custom build of PhantomJS to emulate Android Chrome and iOS Safari.  The same WebDriver scripting API we use with Chrome and Firefox works with our emulated browsers, thanks to Ivan De Marino's excellent WebDriver implementation for PhantomJS.
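
For a rough feel of the scripting model, here's a hedged sketch using the stock Selenium Python bindings with a plain PhantomJS build (WPM's custom build and device profiles are not assumed here; the UA string and viewport size are illustrative placeholders):

# Sketch: driving PhantomJS via WebDriver with a mobile User-Agent,
# using the standard Selenium Python bindings (not WPM's custom build).
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = dict(DesiredCapabilities.PHANTOMJS)
caps["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) "
    "AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B142 Safari/8536.25"
)

driver = webdriver.PhantomJS(desired_capabilities=caps)
driver.set_window_size(320, 568)  # approximate iPhone 5 CSS viewport
driver.get("http://www.google.com/")
driver.save_screenshot("google-iphone.png")
driver.quit()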

 

This approach allows us to match the rendering of real devices quite closely.  Matching the rendering is necessary as it verifies that we are loading the same site and content as a real mobile device.

Here are example screenshots of google.com rendered with different device profiles:

 

google.com on iPhone

 

google.com on iPad

google.com on desktop Chrome

 

On the other hand, many sites do not have dedicated mobile content and display essentially the same on mobile devices as they do on desktops with larger screens.  In this case the device browser gives the web page a virtual screen width of 980 pixels to render into, and the page is then scaled down to fit on the real device screen.  (Note: it could also be scaled up, depending on the native resolution of the device; e.g. the Retina iPad has a higher resolution than many desktops, and content is often scaled up so it doesn't appear too small.)  If the page overflows the 980-pixel limit, the content is cropped and the user must scroll the page to see the rest.  Here's an example of a page rendered at a virtual screen width of 980:

 

apple.com on iPhone

 

apple.com on desktop Firefox

Features

 

Here's a short feature list:

 

  • Android Chrome and iOS Safari can be emulated via several device profiles:
    • iPhone (Retina)
    • iPad (Retina)
    • iPad Mini
    • LG Optimus G
    • HTC Droid DNA
    • Samsung Galaxy S3
  • Screen layout & rendering match mobile devices.
  • Touch event support.
  • Emulated Mobile Monitoring is available from all locations.
  • Existing Selenium/WebDriver API is supported.

Walkthrough

 

All WPM accounts have access to Emulated Mobile Monitoring.

 

Writing your monitoring script follows the same workflow as existing desktop browser scripting.  To validate a script in a mobile browser, select the browser type from the browser drop-down and click revalidate/validate.

After the job validates, a screenshot of the webpage will be displayed.  (Note: this does not include the surrounding chrome of the mobile browser; the screenshot is equivalent to the browser in full-screen mode.)

 

Once the script has validated, it can be used in a Monitor.  Either create a new Monitor or edit an existing one to bring up the Monitor Settings.  From here an emulated browser can be selected from the "Select Browser" drop-down.

 

Note: Each sample costs four monitoring units just like regular browser monitoring.

 

Local Script Validator

 

Local Script Validator can be used to run the emulated mobile scripts locally.

 

  • Follow the Local Script Validator setup instructions.
  • Download WPM PhantomJS and save it to a location on your PATH (only available for Windows right now).
  • Run local validator with one of the following -browser options:

 

Browser   Device              Option
Safari    iPhone 5            iPhone5-emu
Safari    iPad Retina         iPad-emu
Safari    iPad Mini           iPadMini-emu
Chrome    LG Optimus G        OptimusG-emu
Chrome    Samsung Galaxy S3   SamsungS3-emu
Chrome    HTC Droid DNA       DroidDNA-emu

 

E.g.

 

  validator -browser iPhone5-emu myscript.js

 

Limitations

 

  • Emulated Mobile browsers are not (yet) available for Load Testing.
  • Differing WebKit versions.  The version of WebKit in PhantomJS differs from the version on real devices.  This sometimes causes problems with some of the more recent WebKit features.  We are in the process of upgrading to a newer WebKit version, but for now the versions are:
    • PhantomJS WebKit: 534.34
    • iOS 6.1 Webkit: 536.26
    • Android 4.1.1 Webkit: 537.22 (534.30 builtin)
    • Android 4.2 Webkit: 537.22
  • Differing sets of plugins.
    • PhantomJS: None
    • iOS: QuickTime
    • Android: Flash, PDF reader.

 

Links

 

  • For a detailed description of page layout on different devices see: http://developer.apple.com/library/ios/#documentation/AppleApplications/Reference/SafariWebContent/UsingtheViewport/UsingtheViewport.html
  • http://phantomjs.org/
  • https://github.com/detro/ghostdriver
  • http://community.neustar.biz/community/wpm/blog/2012/10/02/neustar-script-local-validator-user-guide-for-windows

Categories: Load & Perf Testing

Load testing tools vs monitoring tools

LoadImpact - Mon, 04/22/2013 - 03:41

So, what's the difference between a load testing tool (such as http://loadimpact.com/) and a site monitoring tool such as Pingdom (https://www.pingdom.com/)? The answer might seem obvious to all you industry experts out there, but nevertheless it's a question we sometimes get. They are different tools used for different things, so an explanation is called for.

Load testing tools

With a load testing tool, you create a large amount of traffic to your website and measure what happens to it. The most obvious measurement is to see how the response time changes when the web site is under the load created by the traffic. Generally, you want to find out either how many concurrent users your website can handle, or you want to look at the response times for a given number of concurrent users. Think of it as success simulation: what happens if I have thousands of customers in my web shop at the same time? Will it break for everyone, or will I actually sell more?

Knowing a bit about how your website reacts under load, you may want to dig deeper and examine why it reacts the way it does. When doing this, you want to keep track of various indicators on the web site itself while it receives a lot of traffic. How much memory is consumed? How much time is spent waiting for disk reads and writes? What's the database response time? And so on. Load Impact offers server metrics as a way to help you do this. By watching how your web server (or servers) consumes resources, you gradually build a better and better understanding of how your web application can be improved to handle more load, or just to improve response times under load.

Next up, you may want to start using the load testing tool as a development tool. You make changes that you believe will improve the characteristics of your web application, and then you make another measurement. As you understand more and more about the potential performance problems in your specific web application, you iterate towards better performance.

Monitoring tools

A site monitoring tool, such as Pingdom (https://www.pingdom.com/), might be related, but it is a rather different creature. A site monitoring tool sends requests to your web site at a regular interval. If your web site doesn't respond at all or, slightly more advanced, answers with some type of error message, you will be notified.
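
To make the concept concrete, here's a minimal Python sketch of the core loop such a tool runs (illustrative only; real services like Pingdom add distributed probe locations, email/SMS alerting and reporting on top, and the URL and threshold below are placeholders):

# Minimal sketch of a site monitor: request the page at an interval,
# flag errors and slow responses. Real tools add alerting, multiple
# probe locations and reporting on top of this core loop.
import time
import requests

URL = "http://www.example.com/"   # placeholder site to watch
SLOW_THRESHOLD = 2.0              # seconds; tune this from your load tests

while True:
    try:
        started = time.time()
        resp = requests.get(URL, timeout=10)
        elapsed = time.time() - started
        if resp.status_code >= 400:
            print("ALERT: error response", resp.status_code)
        elif elapsed > SLOW_THRESHOLD:
            print("ALERT: slow response: %.2fs" % elapsed)
    except requests.RequestException as exc:
        print("ALERT: site unreachable:", exc)
    time.sleep(60)  # check once a minute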

An advanced site monitoring tool can check your web site very often, once every minute for instance. It will also test from various locations around the world to catch network problems between you and your customers. A site monitoring tool should be able to notify you by email and SMS as soon as something happens to your site. You are typically able to set rules for when you are notified and for what events, such as 'completely down', 'slow response time' or 'error message on your front page'. In recent years, the functionality has gotten more advanced, and besides just checking if your web site is up, you can test that entire workflows are working, for instance that your customers can place an item in the shopping cart and check out.

Most site monitoring tools also include reporting so that you can find out what your service level has been like historically. It's not unusual to find out that the web site you thought had 100% uptime actually has a couple of minutes of downtime every month. With proper reporting, you should be able to see whether downtime per month is trending up or down.

Sounds like a good tool, right? We think it deserves to be mentioned that whenever you detect downtime or slow response times with a site monitoring tool, you typically don't know why the site is down or slow. But you know you have problems, and that's a very good start.

One or the other?

Having a bit more knowledge about the difference between these types of tools, we also want to shed some light on how these can be used together.

First of all, you don't choose one or the other type of tool; they are simply used for different things. Like a measuring tape and a saw: when you're building a house, you want both. We absolutely recommend that if you depend on your web site being accessible, you should use a site monitoring tool. When fine-tuning your site monitoring tool, you probably want to set a threshold for how long you allow a web page to take to load. If you have conducted a proper load test, you probably know what response times are acceptable and when page load times actually indicate that the web server has too much load.

Then, when your site monitoring tool suddenly begins to alert you about problems and you want to dig in and understand why, that's when the load testing tool becomes really useful. As long as the reason for your downtime can be traced back to a performance problem with the actual web server, a load testing tool can take you a long way.

Recently, I had a client that started getting customer service complaints about their web site not working. The first step was to set up a web site monitoring tool to get more data in place. Almost immediately, the monitoring tool was giving alerts: the site wasn't always down, but it was quite often rather slow. The web shop was hosted on a standard web hosting package at a local company. I quickly found out that the problem was that the web shop software was simply using a lot of server resources, and this was very easy to confirm using a load testing tool. Now the client is in the process of moving the site to a Virtual Private Server where resources can be added as we go along. Both types of tools played an important role in solving this problem quickly.

Questions? Tell us what you want to know more about in the comments below.

 

Categories: Load & Perf Testing

So What? – Monitoring Hadoop beyond Ganglia

Over the last couple of months I have been talking to more and more customers who are either bringing their Hadoop clusters into production or have already done so and are now getting serious about operations. This leads to some interesting discussions about how to monitor Hadoop properly, and one thing pops up quite often: Do [...]
Categories: Load & Perf Testing

Evolving an APM Strategy for the 21st Century

I started in the web performance industry – well before Application Performance Management (APM) existed – during a time when external, single page measurement ruled the land. In an ecosystem where no other solutions existed, it was the top of the data chain to support the rapidly evolving world of web applications. This was an [...]
Categories: Load & Perf Testing

Top 5 ways to improve Wordpress under load

LoadImpact - Thu, 04/11/2013 - 06:38

Wordpress claims that more than 63 million web sites are running the Wordpress software. So for a lot of users, understanding how to make Wordpress handle load is important. Optimizing Wordpress's ability to handle load is very closely related to optimizing its general performance, a subject with a lot of opinions out there. We've actually talked about this issue before on this blog. Here are the top 5 things we recommend you fix before you write that successful blog post that drives massive amounts of visitors.

#1 - Keep everything clean and up to date

Make sure that everything in Wordpress is up to date. This is not primarily a performance consideration; it's mostly important for security reasons. But various plugins do gradually improve their performance, so it's a good idea to keep the Wordpress core, all plugins and your theme up to date. And do check for updates often: I have 15 active plugins on my blog, and I'd say there are 4-6 upgrades available per month on average.
The other thing to look out for is to keep things clean. Remove all themes and plugins that you don't currently use. Deactivate and physically delete them.
As an example, at the time of writing this, my personal blog had 9 plugins that needed upgrading, and I had also left the default Wordpress theme in there. I think it's a pretty common situation, so do what I did and make those upgrades.

#2 - Keep the database optimized

There are two ways that a Wordpress database can become a performance problem. The first is that Wordpress stores revisions of all posts and pages automatically. This is there so that you can always go back to a previous version of a post. As handy as that can be, it also means that the one db table with the most queries gets cluttered. On my blog, I have about 30 posts but 173 rows in the wp_posts table. For any functionality that lists recent posts, related posts and similar, this means that the queries take longer. Similarly, the wp_comments table keeps a copy of all comments that you've marked as spam, so the wp_comments table may also gradually grow into a performance problem.
The other way that you can optimize the Wordpress database is to have MySQL do some internal cleanup. Over time, the internal structure of the MySQL tables also becomes cluttered. MySQL provides an internal command for this: 'OPTIMIZE TABLE [table_name]'. Running OPTIMIZE TABLE can improve query performance by a couple of percent, which in turn improves page load performance.
Instead of using phpmyadmin to manually delete old post revisions and to run the optimize table command, you should use a plugin to do that, for instance WP Optimize.
Installing WP-Optimize on my blog, it told me that the current database size was 7.8 MB and that it could potentially remove 1.3 MB from it. It also told me that a few important tables could be optimized, for instance wp_options, which is used in every single page request that Wordpress will ever handle.
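
If you're curious what the plugin automates, the same cleanup can be sketched directly against the database (a hedged sketch assuming the default wp_ table prefix and placeholder credentials; back up your database before trying anything like this):

# Sketch of the manual cleanup that WP-Optimize automates.
# Assumes the default "wp_" table prefix and placeholder credentials.
import mysql.connector

conn = mysql.connector.connect(user="wp", password="secret", database="wordpress")
cur = conn.cursor()

# Delete old post revisions that clutter the wp_posts table.
cur.execute("DELETE FROM wp_posts WHERE post_type = 'revision'")

# Delete comments already flagged as spam.
cur.execute("DELETE FROM wp_comments WHERE comment_approved = 'spam'")
conn.commit()

# Reclaim space and defragment the hot tables.
for table in ("wp_posts", "wp_comments", "wp_options"):
    cur.execute("OPTIMIZE TABLE %s" % table)
    cur.fetchall()  # OPTIMIZE TABLE returns a result set that must be consumed

conn.close()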

 

#3 - Use a cache plugin

Probably the single most effective way to increase the amount of traffic your Wordpress site can handle is to use a cache plugin. We've tested cache plugins previously on the Load Impact blog, so we feel quite confident about this advice. The plugin that came out on top in our tests a few years ago was W3 Total Cache. Setting up W3 Total Cache requires attention to detail well beyond what other Wordpress plugins typically require. My best advice is to read the installation requirements carefully before enabling the page cache functionality, since not all features will work in all hosting environments. Read more about various Wordpress cache plugins here, but be sure to read the follow-up.

#4 - Start using a CDN

By using a CDN (content delivery network), you get two great performance enhancements at once. First of all, web browsers limit the number of concurrent connections to your server, so when downloading all the static content from your Wordpress install (CSS, images, JavaScript etc.), requests actually queue up, since not all of them are downloaded at the same time. By placing as much content as possible on a CDN, you work around this limitation, since your static content is now served from a different web server. The other advantage is that a CDN typically has more servers than you do, so there's a big chance that (a) one of their servers is closer to the end user than your server, and (b) they have more bandwidth than you do.
There are a number of ways you can add a CDN to your Wordpress install. W3 Total Cache from #3 above supports several CDN providers (CloudFlare, Amazon, Rackspace) or even lets you provide your own. Another great alternative is to use the CloudFlare Wordpress plugin that they provide themselves.

#5 Optimize images (and css/js)

Looking at the content that needs to be downloaded, regardless of whether it comes from a CDN or from your own server, it makes sense to optimize it. For CSS and JS files, a modern CDN provider like CloudFlare can actually minify them for you. And if you don't go all the way and use an external CDN, the W3 Total Cache plugin can also do it for you.
For images, you want to keep the downloaded size as low as possible. Yahoo! has an image optimizer called Smush.it that will drastically reduce the file size of an image without reducing quality. But rather than dealing with every image individually, you can use a great plugin named WP-Smushit that does this for you as you go along.
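
As a rough illustration of the "shrink your images" idea, here's a minimal sketch that recompresses JPEGs with the Pillow library (an assumption for illustration only; Smush.it uses its own lossless tooling, and the upload path is a placeholder):

# Re-encode every JPEG in the uploads directory (path is a placeholder).
# Note: this is lossy re-encoding, unlike Smush.it's lossless optimization;
# it only illustrates the general idea of shrinking image payloads.
import glob
from PIL import Image

for path in glob.glob("wp-content/uploads/*.jpg"):
    img = Image.open(path)
    img.save(path, "JPEG", quality=85, optimize=True)
    print("optimized", path)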

Conclusion and next step

There is lots and lots of content online that will help you optimize Wordpress performance, and I guess it's no secret that these top 5 tips are not the end of it. In the next post, I will show you how a few of these pieces of advice measure up in reality in the Load Impact test bench.

Categories: Load & Perf Testing

Top 8 Application Performance Landmines

We have been blogging about the same problems and problem patterns we see while working with our customers over the past few years. There have always been the classic application performance landmines in the areas of inefficient database access, misconfigured frameworks, excessive memory usage, bloated web pages and not following common web performance best [...]
Categories: Load & Perf Testing

Scheduled Maintenance for Saturday, April 6, 2013

BrowserMob - Thu, 04/04/2013 - 09:07

We will be performing maintenance on Saturday, April 6, 2013 between 10:00 PM and 3:00 AM the next morning (PST).  Intermittent loss of connectivity may occur during this time.

 

Monitoring, alerting and load testing will not be affected during this time.

 

If you have any concerns, please contact us.

 

Thank you.

Categories: Load & Perf Testing

Just Don’t Panic

When we set up our application performance monitoring tool to correctly notify us about unexpected performance degradation, we often know about problems before our users start making calls to support (see our previous post on proactive APM). But what happens when we actually learn about a major performance outage? Psychologists identified two ways people falter [...]
Categories: Load & Perf Testing

Python - Re-tag FLAC Audio Files (Update Metadata)

Corey Goldberg - Tue, 04/02/2013 - 09:16

I had a bunch of FLAC (.flac) audio files together in a directory. They came from various sources, and their metadata (tags) was somewhat incomplete or incorrect.

I managed to manually get all of the files standardized in the "%Artist% - %Title%.flac" file name format. However, what I really wanted was to clear their metadata and save only "Artist" and "Title" tags, pulled from the file names.

I looked at a few audio tagging tools in the Ubuntu repos and came up short finding something simple that covered my needs. (I use Audio Tag Tool for MP3s, but it has no FLAC support.)

So, I figured the easiest way to get this done was a quick Python script.

I grabbed Mutagen, a Python module to handle audio metadata with FLAC support.

This is essentially the task I was looking to do:

#!/usr/bin/env python

import glob
import os

from mutagen.flac import FLAC

for filename in glob.glob('*.flac'):
    artist, title = os.path.splitext(filename)[0].split(' - ', 1)
    audio = FLAC(filename)
    audio.clear()
    audio['artist'] = artist
    audio['title'] = title
    audio.save()

It iterates over .flac files in the current directory, clearing the metadata and rewriting only the artist/title tags based on each file name.

I created a repository with a slightly more full-featured version, used to re-tag single FLAC files:
https://github.com/cgoldberg/audioscripts/blob/master/flac_retag.py

Categories: Load & Perf Testing

How Bon-Ton Stores aligns Business Goals with IT Requirements

Two or three times a year, Bon-Ton Stores products are featured on Jill’s “Steals and Deals” segment on the Today Show. The products are promoted with huge discounts. As soon as the segment first airs on the East Coast, the “Steals and Deals” site displays the featured products, with links to Bon-Ton’s site, usually directly [...]
Categories: Load & Perf Testing

Run Comparisons in Faban

Performance & Open Source - Mon, 04/01/2013 - 15:01

I recently checked in a feature that allows fairly extensive comparisons of different runs in Faban. Although the 'Compare' button has been part of the Results list view for a while, it has been broken for a long time. It finally works!

When to use Compare

The purpose of this feature is to compare runs performed at the same load level (aka Scale in Faban) and on the same benchmark rig. Perhaps you are tuning certain configs and/or code and are doing runs to analyze the performance differences between these changes. The Compare feature lets you look at multiple runs at the same time along multiple dimensions: throughput, average and 90th-percentile response times, average CPU utilization, etc. This gives a single-page view that can quickly point out where one run differs from another.

How to use Compare

This is easy. On the results view in the dashboard, simply select the runs you want to compare using the check box at the left of each row. Then click the Compare button at the top of the screen.

The screenshot below shows this operation:

Comparison Results

The first part of the comparison report looks like the image below. The report combines tables with graphs to make the data relevant. For example, Run Information is a summary table that describes the runs, whereas Throughput is a graph that shows how the overall throughput varied during the length of the test for all runs.


How can I get this code?

The code is currently in the main branch of the Faban code on GitHub. Fork it and try it out. Once I get some feedback, I will fix any issues and cut a new binary release.


Categories: Load & Perf Testing

Squeezelite - Headless Squeezebox Emulator

Corey Goldberg - Mon, 04/01/2013 - 07:19

Use Squeezebox, without buying a Squeezebox...

Recently, Logitech discontinued most Squeezebox streaming music players. However, the media server is Open Source, so it looks like some form of Logitech Media Server (LMS) will live on, no matter what Logitech eventually does with it.

I've been a user of the Squeezebox network music player since it was released by SlimDevices (SliMP3/SlimServer), and throughout the transition to Logitech. I've owned 3 Squeezebox models over the years... currently enjoying the Squeezebox Touch, with music streamed from Logitech Media Server.

It works flawlessly for streaming my own music collection (FLAC/MP3/etc), and streaming radio (Pandora/Slacker/Sirius/etc), to my HiFi. I use the digital (S/PDIF) outputs, and sometimes the DAC/analog (RCA) outputs.

Now... with the release of Squeezelite, you can build your own Squeezebox, or use an existing computer/laptop with digital output as a Squeezebox.

Squeezelite is a cross-platform, headless, LMS client that supports playback synchronization, gapless playback, direct streaming, and playback at various sampling rates. It runs on Linux using ALSA audio output and other platforms using PortAudio. It is aimed at supporting high quality audio.

I gave Squeezelite 1.0 a try on Ubuntu 12.04, with S/PDIF optical output to my DAC. It worked like a charm!

Squeezelite info:
https://code.google.com/p/squeezelite/

Squeezelite download (precompiled binaries for x86/amd64/arm):
https://code.google.com/p/squeezelite/downloads/list

Enjoy the music.

Categories: Load & Perf Testing

What to do if A/B testing fails to improve conversions?

A/B and multivariate testing are often used to improve the conversion funnel. What these tools do is randomly present alternative images, text or other design elements to gather statistics about how these things affect site visitors. Companies have had great success using such solutions, but sometimes multiple rounds of testing still produce inconclusive data: [...]
Categories: Load & Perf Testing

Can You See the Storm Coming?

As much as we try to avoid performance problems, they do happen. It is inevitable. But it is possible to learn to react fast, and on some occasions fast enough that the impact on the end users is negligible. Despite operators' best efforts, 73% of performance issues are reported by users, according to "APM: Getting [...]
Categories: Load & Perf Testing

Announcing the Load Impact API beta

LoadImpact - Wed, 03/20/2013 - 02:54

 We are pleased to announce the Load Impact API beta!

 the developer.loadimpact.com API documentation site

For people who do not know what an API is, or what it is good for: our API allows you to do basically everything you can do when logged in at loadimpact.com, like configuring a load test, running it, and downloading the results data. But the API can be used by another program, communicating with Load Impact over the Internet. This means that a developer can write an application that uses our API to configure and run load tests on the Load Impact infrastructure - and this can happen completely without human involvement, if the developer so chooses.

The API is very useful for companies with a mature development process, where they e.g. do nightly builds - and run automated tests - on their software. The API allows them to include load tests in their automated test suites, and in that way monitor the performance and scalability of their application while it is being developed. This is useful for getting an early indication that some piece of newly produced code doesn't perform well under load/stress. The earlier such problems are detected, the lower the risk of developers wasting time on code tracks that don't meet the performance criteria set for the application.
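
As a sketch of what such an integration might look like in a nightly build (the endpoint paths, field names and ids below are hypothetical placeholders; consult developer.loadimpact.com for the real reference):

# Hypothetical sketch of kicking off a load test from a nightly build.
# Endpoint paths, field names and ids are illustrative placeholders;
# see developer.loadimpact.com for the actual API reference.
import time
import requests

API = "https://api.loadimpact.com/v2"  # hypothetical base URL
AUTH = ("YOUR_API_TOKEN", "")          # placeholder API token

# Start a previously configured load test (hypothetical endpoint and id).
test = requests.post(API + "/test-configs/123/start", auth=AUTH).json()

# Poll until the run finishes, then fail the build if it didn't complete.
while True:
    status = requests.get(API + "/tests/%s" % test["id"], auth=AUTH).json()
    if status["status"] in ("finished", "aborted", "failed"):
        break
    time.sleep(30)

assert status["status"] == "finished", "load test did not complete cleanly"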

The API can also be used by other online services or applications that want to include load testing functionality as part of their service or product, but where it is preferable to avoid building a complete load testing solution like Load Impact from scratch. They can use our API to integrate load testing functionality into their own product, with Load Impact providing that functionality for them.

We have created a whole new documentation section for the API at http://developer.loadimpact.com where you can find the API reference and some code examples. We will be delighted to hear from you if you are using the API, so don't hesitate to get in touch with us! Feedback or questions are very welcome!

 

Categories: Load & Perf Testing

Know your node.js

LoadImpact - Tue, 03/19/2013 - 09:17

As a follow-up to last month's column about PHP vs Node.js, I hit some problems with Node under load. As with all technologies, Node.js does have some limitations that may or may not be a problem for your specific use case. If the last column comparing PHP and Node.js had a deeper message, it was that if you want to scale, you have to know your stack.

To be completely clear, when I say stack I mean the layers of technology used to serve HTTP requests. One of the most common stacks out there is simply called LAMP: (L)inux, (A)pache2, (M)ySQL, (P)HP (or Perl). You now see a lot of references to LNMP, where Apache2 is replaced with Nginx. When building Node.js applications, things can vary a lot, since Node.js comes with its own HTTP server. In my previous text, I used Node.js together with MySQL on a Linux box, so I guess we can dub that the LNM stack if we absolutely need a name for it.

And when I say know your stack, I mean that if you want to produce better-than-average performance numbers, you have to be better than average at understanding how the different parts of your stack work together. There are hundreds of little things that most of us never knew mattered that suddenly become important when things come under load. As it happens, watching your application work under load is a great way to force yourself to get to know your stack a little better.

Background

When testing Apache/PHP against Node.js, I found that the raw performance of Node.js, as well as its ability to handle many concurrent clients, was excellent. Faster and more scalable than Apache2/PHP. One reader pointed out that the test wasn't very realistic, since there was just a single resource being queried and no static content involved. Apache2/PHP could very well do relatively better if some of the content were static. So I set up a test to check this, and while running it, Node.js crashed. As in stopped working. As in would not serve any more HTTP requests without manual intervention. So to keep it short, Apache2/PHP won that round.

But in the spirit of 'know your stack', we need to understand why Node.js crashed. The error message I got was this:

Unhandled 'error' event "events.js:71"

First of all, it took a fair amount of googling to figure out what the error message was really about. Or rather, the error message was saying that something happened and there's no error handler for it. So good luck.

Fixing it.

The first indication I got via Google and Stack Overflow was that this might be an issue with Node.js before 0.8.22, and sure enough, I was running 0.8.19. So the first thing I did was upgrade to version 0.8.22. That did not fix the problem at all (though a later and greater version is of course a nice side effect). With almost all other software involved being up to date, this actually required some structured problem solving.

Back to the drawing board

I eventually managed to trace the error message down to a 'too many open files' problem, which is interesting, as it answers the crucial question: what went wrong? This happened at roughly 250 concurrent users with a test that was accessing 6 different static files. This is what it looks like in LoadImpact:

Depending a little on timing, and exactly when each request comes in, that would roughly indicate that some 1500 (6 files times 250 users) files can be open at the same time, give or take. Most Linux systems are, by default, configured to allow a relatively small number of open files per process, e.g. 1024. The Linux command to check this is ulimit:

$ ulimit -n
1024

1024 is the default on a lot of distros, including the Ubuntu 12.10 I was running the tests on. So my machine had 1024 as the limit, but it appears that I had 1500 files open at the same time. Does this make any sense? Well, sort of; there are at least 3 factors involved here that would affect the results:

  1. LoadImpact simulates real browsers (simulated browser users, or SBUs). An SBU only opens 4 concurrent connections to the same server, even if the script tells it to download 6 resources. The other 2 resources are simply queued.
  2. Each open TCP socket counts as an open file. So each concurrent TCP connection is an open file. Knowing that our limit is 1024, that would indicate that node.js could handle up to 256 concurrent users if each user uses the maximum of 4 open connections.
  3. In our sample, each request for a static resource also opens a file and thereby occupies another file handle. This file is open for less time than the actual connection, but still, for a certain time, a single request can consume 2 open file handles.

So in theory, the limit for concurrent simulated browser users should be 256 or less. But in reality, I saw the number of concurrent users go all the way up to 270 before the Node.js process died on me. The explanation for that is most likely just timing: not all SBUs will hit the server at exactly the same time. In the end, hitting problems at around 250 concurrent users squares well with the open files limit being the problem. Luckily, the limit on the number of open files per process is easy to change:

$ ulimit -n 2048

The next test shows real progress. Here's the graph:

Problem solved (at least within the limits of this test).

Summary

Understanding what you build upon is important. If you choose to rely on Node.js, you probably want to be aware of how that increases your dependency on various per-process limitations in the operating system in general, and on the maximum number of open files in particular. You are more affected by these limitations since everything you do takes place inside a single process.

And yes, I know. There are numerous more or less fantastic ways to work around this particular limitation, just as there are plenty of ways to work around limitations in any other web development stack. The key thing to remember is that when you select your stack, framework, language or server, you also select all the limitations that come with it. There's (still) no silver bullet, even if some bullets are better out of the box than others.

Having spent countless hours with other web development languages, I think I'm in a good position to compare, and yes indeed: Node.js delivers some amazing performance. But at present, it comes with a bigger responsibility to 'know your stack' than a lot of the others.

Categories: Load & Perf Testing

Let’s Not Play Blame Games

When the Operations team gets an alert about potential performance problems that users might be experiencing, it is usually either the infrastructure or the actual application that is causing those problems. Things get interesting when neither the ISP nor the application provider is willing to admit fault. Can we tell who is to blame? Could [...]
Categories: Load & Perf Testing