Performance testing images from a customer perspective

Recently we moved Net-A-Porter to use a dynamic product image service created by our Product Management team. Not only will it allow us to improve our customer experience, but it will also decrease the time it takes to get products to market.

MRP (Mr Porter) and TON (The Outnet) had already implemented the system, but at NAP (Net-A-Porter) we were waiting for a few additional features, and also to see how it handled the other brands.

Previously NAP had all of the image assets mounted to the web servers with a CDN sitting in front, allowing them to be served extremely quickly. We wanted to make sure that the service creating and storing the images for us wouldn’t degrade performance for our customers.

We wanted to be able to collect performance metrics to compare NAP image performance to the other brands. I created a simple tool to test different image requests in bulk and collect the average response time.

How the tool works

I’m going to briefly explain how the tool works, but if you would rather look for yourself, it is available on GitHub.

The tool is written using NodeJS and, as an opportunity to learn something, it uses the proposed ES7 async/await functions. However, because these haven’t landed in Node natively yet, I used Babel to compile them into something usable. This was (maybe surprisingly to some) my first time setting up Babel; normally if I’m writing ES6 I only care about modern browsers or Node 6, but this time I had no choice. After some fiddling and only 7(!) dependencies I got it working. It was great to start thinking about how I will use async/await in the future, but my Babel reservations remain: I don’t like the idea of writing code that is not the code that is going to be executed. I find the required build step frustrating, and if I had a syntax error within an async function, it would swallow the error and just exit with no stack trace (this could be my implementation, so please correct me if I’m wrong).

It took a little adjusting to get into the flow of writing and using async functions, but once I did I really enjoyed them, and the code looks very clean. This was just an internal tool so I didn’t performance test the code, and I don’t know how it would perform in production, but if you have not already looked into async/await I recommend doing so; here is a good starting point.
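
To give a flavour of the style, here is a minimal sketch (not the tool’s actual code) of using async/await to average response times; timeRequest and averageResponseTime are hypothetical helpers:

const http = require('http');

// Resolves with the elapsed time (in ms) for a single GET request
function timeRequest(url) {
    return new Promise((resolve, reject) => {
        const start = Date.now();
        http.get(url, (res) => {
            res.resume(); // drain the body, we only care about timing
            res.on('end', () => resolve(Date.now() - start));
        }).on('error', reject);
    });
}

// Awaiting inside a plain for loop keeps the requests sequential
async function averageResponseTime(urls) {
    let total = 0;
    for (const url of urls) {
        total += await timeRequest(url);
    }
    return total / urls.length;
}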

Back to the tool: by default it will collect 100 PIDs (Product IDs) from each brand using our Listing and Details API or, as it’s known within the company, LAD. You can get a list of those PIDs like this:

$ curl "http://api.net-a-porter.com/${brand}/GB/100/0/pids?visibility=visible&whatsNew=Now"
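
Within the tool the same request is made from Node. Here is a hedged sketch, assuming the endpoint returns a JSON array of PIDs (the real response shape may differ):

const http = require('http');

// Hypothetical helper: fetch the PID list for a brand from LAD
function fetchPids(brand) {
    const url = `http://api.net-a-porter.com/${brand}/GB/100/0/pids?visibility=visible&whatsNew=Now`;
    return new Promise((resolve, reject) => {
        http.get(url, (res) => {
            let body = '';
            res.on('data', (chunk) => { body += chunk; });
            res.on('end', () => resolve(JSON.parse(body)));
        }).on('error', reject);
    });
}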

We can then use this list of PIDs to generate a URL for a product image. Thankfully all three brands have a similar structure so we can use ES6 template literals:

const imageUrl = `http://cache.${brand}.com/images/products/${pid}/${pid}_in_m2.jpg`;

${pid}_in_m2.jpg is the most commonly used image on listing pages; it means “index shot” at size “m2”.

We use this image URL twice: once as-is and again with a cache buster appended to the end. Each PID is then requested ten times. Using the elapsedTime metric from Request gives us an average for both cached (CDN) and origin requests. This allows us to compare brands, CDN, origin and the service. Once the results have been collected they are saved to the file system and a graph is displayed directly in the terminal.
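
As a rough sketch of that measurement (the time option and elapsedTime come from the Request library; the cache-buster parameter name is made up):

const request = require('request');

// One timed GET; resolves with Request's elapsedTime in milliseconds
function timedGet(url) {
    return new Promise((resolve, reject) => {
        request({ url: url, time: true }, (err, res) => {
            if (err) return reject(err);
            resolve(res.elapsedTime);
        });
    });
}

// The cached request hits the CDN as-is; the cache buster forces origin
const cachedUrl = imageUrl;
const originUrl = `${imageUrl}?bust=${Date.now()}`;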

[Figure: performance graph displayed in the terminal]

To achieve this the tool uses blessed and blessed-contrib, two great terminal interface libraries for Node.
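
For the curious, a minimal blessed-contrib line chart looks something like this (a sketch rather than the tool’s actual rendering code; the series data is made up):

const blessed = require('blessed');
const contrib = require('blessed-contrib');

const screen = blessed.screen();
const chart = contrib.line({ label: 'Average response time (ms)', showLegend: true });
screen.append(chart);

chart.setData([
    { title: 'CDN', x: ['run 1', 'run 2', 'run 3'], y: [12, 10, 11] },
    { title: 'origin', x: ['run 1', 'run 2', 'run 3'], y: [85, 80, 82] }
]);

screen.key(['escape', 'q', 'C-c'], () => process.exit(0));
screen.render();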

There are other options to tweak how the application tests performance, more details can be found in the repository: https://github.com/NET-A-PORTER/product-image-perf

Benchmarking

When testing performance it is crucial to collect benchmarks before you make any changes so you can quantify any regressions or improvements. All of the benchmarks will compare NAP, which is using images stored on a web server, to MRP and TON, which both use the image service.

Benchmarking the service’s origin requests

The image resizing service has its own internal Varnish cache; once an image has been generated it will remain in the cache for four weeks. The first set of benchmarks collected random product sets each time, the hypothesis being that requests to cold caches would result in increased latency. This can be done within the tool by calling:

$ RANDOM_PIDS=true npm start

[Figure: file mount vs cold cache]

We can see that requests to both cache and origin are very similar; however, when requesting the service’s origin there’s a noticeable increase in latency.

[Figure: file mount vs cold cache, second run]

To be sure this was not an anomaly, the same test was repeated multiple times and we observed similar results. This is expected, because if the image is not available at the CDN or in the application’s cache, it needs to be generated.

Benchmarking the service’s cached requests

Next we wanted to compare a cold cache of the service to a warm cache. The first set of benchmarks would target a fixed product list, the hypothesis being that the first call would be slower as the images would need to be generated (as above), but the second call would see better performance.

[Figure: file mount vs cold cache]

This matches the pattern of the previous tests, which is exactly what we want. Now we want to see if we get improved latency by calling the same product set again.

[Figure: warm cache run]

Origin and Cache are now much closer; TON has slightly more latency on origin, but further testing showed this was just an anomaly.

Conclusion

We can see that requests for images not in the product resizing service’s cache will cause a performance hit, but once cached this should be insignificant. We have our benchmarks, so we can write a hypothesis going forward and make a rollout plan.

Rollout

As with most of our software releases we didn’t want to go big bang; we wanted to roll out incrementally, monitor performance and then increase traffic if we were happy with the results.

Our plan was to roll out to 10% of the images first, then increase this as time went on. We thought of a really simple way to implement this using rewrite rules. Here is a sample image URL:

https://cache.net-a-porter.com/images/products/757193/757193_in_m2.jpg

We decided to rewrite all requests for images whose PID had a 0 as the last digit to the new service, with a simple regular expression:

https://cache\.net-a-porter\.com/(images/products/([0-9]+0/.*))

So while the previous image request (PID: 757193) would go to the original file mount, the following would go to the new image resizing service:

https://cache.net-a-porter.com/images/products/398360/398360_in_m2.jpg

That would give us about 10% of traffic going to the new service; we can then increase the percentage to 50% and finally 100% once we are happy with the results.
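
The routing decision itself boils down to a test like this (a hypothetical sketch of the rule, not our actual CDN configuration):

// PIDs ending in 0 are roughly 10% of traffic
const toNewService = /^\/images\/products\/[0-9]+0\//;

function shouldUseImageService(path) {
    return toNewService.test(path);
}

shouldUseImageService('/images/products/398360/398360_in_m2.jpg'); // true
shouldUseImageService('/images/products/757193/757193_in_m2.jpg'); // false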

Synthetic performance metrics

Having rolled out the new service to 10% of PIDs, we now want to compare how they measure up. You can use a fixed PID set with the following command:

$ NAP_LOCAL_PIDS=true npm start

This will load the PIDs in

pids/${brand}.json

In this case, 100 PIDs ending in 0.

We can now compare the previous NAP benchmarks to the performance of the image resizing service.

Cold cache

[Figure: first run, cold caches, file mount]

This shows the performance of a cold cache request to the file-mounted images on NAP.

[Figure: first run, cold caches, image service]

This graph, however, shows the impact of moving to the image resizing service: if the assets have not been previously generated there is an increase in latency.

Warm cache

[Figure: fifth run, file mount]

Here again we can see that with warm caches the performance of the images coming from the service is very similar to the file mount.

[Figure: fifth run, image service]

The good news, though, is that as the service gets images into its cache the performance improves and is comparable with the file-mounted images.

However, these are just synthetic tests.

Real user metrics

From the synthetic tests we could see increased latency if a product’s image had not been requested before, with performance returning to normal on the second request. At this point you could be tempted to accept this regression, thinking it will only introduce increased latency for a small percentage of customers, i.e. the first ones to ever view that product. However, we have about 20,000 products live at any one time, more being added multiple times a week, and an application cache time of a month. The chance of a customer regularly hitting uncached images could be higher than we think. The synthetic tests have only given us a partial understanding; we have not taken into account user behaviour.

That is why it is important to also use Real User Metrics (RUM) when calculating the impact of this service.

There are multiple ways of collecting performance metrics and multiple metrics you can collect. The most commonly collected metric within RUM tools is document complete, which often requires adding something to the head of the document that essentially does something like this:

var start = new Date().getTime();
document.onreadystatechange = function () {
    if (document.readyState === "complete") {
        var end = new Date().getTime();
        var time = end - start;
        // time is document complete in ms,
        // which is then beaconed to the RUM tool API
    }
};

While this works well, for modern web apps it is a little crude. Enter the Performance API, which can give us much more granularity. I’m going to go into much more detail about our approach to RUM and the Performance API in a future post, but for the purpose of this post we are going to look at two specific metrics that take advantage of performance.mark and performance.measure.

These two metrics are used to determine certain states of the page’s lifecycle. What we deem “ready” is having the page’s meaningful content available to the user, which in these cases includes the loading of the product imagery. I have also highlighted on the graph each stage of the image service rollout; this will allow us to see if it has affected customers’ perceived performance of the page.
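
As a rough sketch of how such a metric can be recorded (the mark and measure names here are illustrative, not necessarily the ones we use):

// ...once the meaningful content (e.g. the hero image) has loaded:
performance.mark('page-ready');

// Measure from navigation start to that mark
performance.measure('time-to-ready', 'navigationStart', 'page-ready');

var measure = performance.getEntriesByName('time-to-ready')[0];
// measure.duration (in ms) is then beaconed to the RUM tool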

Listing page ready

[Figure: listing page ready, with rollout stages highlighted]

Listing page ready is a metric that takes into consideration the loading of product images within the viewport.

Product details page ready

[Figure: product details page ready, with rollout stages highlighted]

Product details page ready is a metric for when the hero product image is loaded.

Conclusions and caveats

The concern about having a large product set and the likelihood of not hitting a warm cache proved to be unfounded, as shown in our Real User Metrics. This could potentially be explained by the nature of fashion itself: customers are heavily invested in what’s “hot right now”, follow trends and trust our editorially curated lists. Mostly viewing the first page of products in popular sections ensures the images are always available. It will be interesting to see if this pattern changes during a sale period, when user behaviour changes from browsing to hunting for bargains.

The service is now up and running, and proving to be a success. It’s helping the business get products to market faster, reducing load on our web servers and allowing us to reclaim years’ worth of disk space. Importantly (for the frontend team) we haven’t impacted the performance experienced by the customer.

During this post I haven’t covered the potential impact that larger (in kilobytes) images might have on performance. This could obviously affect the time an image takes to come down the wire, but also the time the browser spends painting it, having to work harder to make sense of the larger dataset. Thankfully, comparing the service with the original showed the assets were the same size as those on the file mount, so I could eliminate these factors.

There are also some regular variables on the NAP site that can affect our performance metrics. There are three upload days for NET-A-PORTER, when new products become available to purchase; these days and times have become a trained pattern for our customers. Due to this, and the desire to shop the latest trends, there is a significant increase in our site traffic. The more traffic, the hotter the caches, and this is why you can see consistent peaks and troughs in performance metrics.

I will, however, be talking about that more next time.
