The bugfix that could make the internet 5% faster

I’ve been working with Google Analytics for the last three years. When I started working with it, it was already a huge player in the market, but I’ve seen enormous growth over these years. Google Analytics is the most widely used web analytics solution in the world: it’s currently present on 44.67% of the top million websites. ga.js is the most popular JavaScript snippet in the history of the internet.

[Chart: Google Analytics usage on top websites. Source: builtwith.com]

Imagine the responsibility of the Google engineering team that maintains the ga.js file. While dealing with constant changes and new features in Google Analytics, they still have to make sure their code runs as fast as possible on every browser in existence. They must support IE 5.5 and low-end mobile devices, because otherwise those browsers wouldn’t show up in Google Analytics reports. And they must do all of this while keeping the code from hurting the host website’s performance.

I must say that they do a great job maintaining that code. The asynchronous syntax, while confusing at first, is a very clever way to push script loading and execution down the queue, so browsers don’t delay page loading just to register a GA pageview. It’s clear that the GA team takes great care with how fast and seamless their code is.

The one point that still bothers me a lot regarding performance is the Google Analytics cookies. Let’s take a look at what GA cookies look like:

>document.cookie
"__utma=96182344.347392035.1326382423.1326382423.1326382423.1; __utmb=96182344.1.10.1326382423; __utmc=96182344; __utmz=96182344.1326382423.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
>document.cookie.length
188

This is a minimal GA cookie set. It can get longer if you use Custom Variables or Google Website Optimizer, but let’s settle for the minimum for now.
These cookies are used internally by GA to keep state and are manipulated by the code in the ga.js file. Unlike most other cookies you see out there, these never need to reach your web servers. Yet they are sent to your site with every single HTTP request.
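To make that state concrete, here’s a quick sketch (a hypothetical helper, not part of ga.js) that pulls the __utm* cookies out of the document.cookie string shown above:

```javascript
// Sketch: extract the GA (__utm*) cookies from a document.cookie string.
// Works on the example cookie string shown earlier in the post.
function parseGaCookies(cookieString) {
  var ga = {};
  cookieString.split('; ').forEach(function (pair) {
    var i = pair.indexOf('=');
    var name = pair.slice(0, i);
    // Only keep the Google Analytics cookies.
    if (name.indexOf('__utm') === 0) ga[name] = pair.slice(i + 1);
  });
  return ga;
}
```

Running it on the cookie dump above yields four entries (__utma, __utmb, __utmc, __utmz), all of which live and are updated entirely on the client.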

According to Google’s SPDY whitepaper, the average HTTP request is 700–800 bytes long. That means the GA cookies represent about 25% of that request’s size. Once you factor in that GA is present on about 50% of top websites, these useless GA cookie bytes traveling around the internet account for roughly 12% of all HTTP request traffic.
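The back-of-the-envelope math looks like this (assuming the 188-byte cookie from above, a 750-byte average request per the SPDY whitepaper, and GA on roughly half of sites per the builtwith numbers):

```javascript
// Back-of-the-envelope estimate of the GA cookie overhead.
const cookieBytes = 188;       // the minimal GA cookie set shown above
const avgRequestBytes = 750;   // midpoint of the SPDY whitepaper's 700-800
const gaShareOfSites = 0.5;    // GA presence on top websites, roughly

const shareOfRequest = cookieBytes / avgRequestBytes;       // ≈ 0.25
const shareOfAllRequests = shareOfRequest * gaShareOfSites; // ≈ 0.125

console.log((shareOfRequest * 100).toFixed(1) + '%');     // "25.1%"
console.log((shareOfAllRequests * 100).toFixed(1) + '%'); // "12.5%"
```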

I posted a bug about this issue on GA-Issues a while ago. The idea is to use HTML5 localStorage to keep this state on browsers that support it. So far it has attracted no attention. This bug fix could easily make the average HTTP request around 5% faster, and we’re talking about the average speed of the whole internet.
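A minimal sketch of what the proposed fix could look like (names and structure are my own, not taken from ga.js): probe localStorage with a real write, and fall back to cookies when it’s missing or disabled.

```javascript
// Hypothetical sketch of the proposed fix: keep GA state in localStorage
// when available, fall back to document.cookie otherwise.
// `win` is passed in instead of using the global window, for testability.
function createGaStore(win) {
  var ls = null;
  try {
    // Probe localStorage: it can be absent, disabled by the user, or throw
    // on write (private browsing, quota), so a real write is the safe check.
    ls = win.localStorage;
    ls.setItem('__ga_probe', '1');
    ls.removeItem('__ga_probe');
  } catch (e) {
    ls = null;
  }
  return {
    get: function (name) {
      if (ls) return ls.getItem(name);
      var m = win.document.cookie.match(
        new RegExp('(?:^|; )' + name + '=([^;]*)'));
      return m ? decodeURIComponent(m[1]) : null;
    },
    set: function (name, value) {
      if (ls) {
        ls.setItem(name, value); // stays in the browser, never sent over HTTP
      } else {
        // Old behavior: the cookie rides along with every request.
        win.document.cookie = name + '=' + encodeURIComponent(value);
      }
    }
  };
}
```

On the localStorage path the state never appears in a Cookie header; on the fallback path everything behaves exactly as it does today.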

The real picture is not that bad, since this only affects HTTP requests, not HTTP responses, and the responses are where the real data is. Still, it’s funny to see something that huge going around unnoticed.

  • jemerick

    One of the recommended things to do is to serve static content from a cookie-less domain so that the cookies are not sent with every static content request which would help a bit.

  • AM

    You are seriously overstating the effect. That is 5% of the request traffic. Requests are tiny in comparison to responses.

    • AM

      I do see the GA cookie as overhead and localStorage might be a good option in some cases. A few things I would want to consider in designing it:

      - How many bytes are we adding to the ga.js script to make it work, since we clearly also need to keep supporting cookies for older browsers
      - Some sites might expect to be able to read the GA cookie as well, perhaps to feed in-house analytics. I know I’ve thought about storing a log of the GA cookie value along with user_id and other data I’m not sending to GA, to do more detailed analysis
      - Is there anything that can be done to shorten/clean up the cookie first? I haven’t looked in a while, but I had the feeling there was some bloat in the cookie content that might be a good first target

  • Fred

    It will not make the internet 5% faster, only the requests; the responses are the big part.

  • John Doe

    You do realize that these cookies track your *every move* on the web, don’t you?
    Open your browser’s Preferences dialog and turn off cookies now!
    After that, install Adblock Plus and block Google Analytics completely.
    Can’t be faster than that: no downloading of script files, no cookies, etc…

    • Jordan Louis

      Google Analytics doesn’t track your *every move* on the entire web.  The Google Analytics tracking code only feeds data to the individual account or accounts owned by the owner of the website you’re visiting.  This allows the owner of a website to determine what content is being used the most, what content is being searched for, and whether the users of the website are getting what they came for.  While Google stores this data on their own servers, the data on these servers is specific to each visited website on which Google Analytics is installed, and is not easily accessible by Google themselves.  This data is not personally identifiable to any degree whatsoever, unless you’re among those who still confuse an IP address with a single person’s identity.

      If you’re really afraid of being tracked everywhere you go, I suggest finding a nice little cabin in the woods, hunting your food, cutting your own firewood, and living entirely off the grid.  Otherwise, I’m afraid that blocking Google Analytics should be the least of your worries.

      • Anon

        > [GA tracking data] is not easily accessible by Google themselves

        Uh, I think you completely missed the entire business model behind GA. They can access the data, and they do access the data to more accurately target ads. Facebook does the same thing with metrics returned from pages embedding Facebook widgets (e.g., comments and “Like this!” buttons).

        > This data is not personally identifiable to any degree whatsoever

        It would be both trivial and financially advantageous for Google to internally correlate GA tracking metrics with Google accounts. To assume they’re not doing so is folly.

  • http://www.jqueryin.com Corey Ballou

    Since we’re talking about web scale, we might as well talk about the economics of an operation as large as Google Analytics. Consider the amount of bandwidth Google uses simply to host the globally available analytics JavaScript. If they were to implement a cross-browser localStorage solution with cookie fallbacks, they’d drastically increase the size of ga.js and thereby their bandwidth and overhead. I’d argue that they’re not responding because it’s not in their best interest to throw away money when they can simply parse your cookies client side.

  • HugoCrd

    You are talking about the average speed of the whole web, not the whole internet, aren’t you?

  • Pouet

    Most ISPs offer asymmetric links – for instance I get around 30 Mbps down, 1 Mbps up on my DSL line. So each uploaded byte is worth 30 downloaded, which means not sending that cookie is more or less like removing 5 kbytes from each page (in terms of speed).

  • Matthew Barry

    localStorage is not the same as cookies: it’s only accessible through JavaScript, meaning there would need to be another request after ga.js loads to send your tracking information back to Google.  Cookies are the ideal solution if Google needs to track who makes every request to their servers, which I assume they do.

    • http://eduardo.cereto.net/ Eduardo Cereto Carvalho

      That’s not how they send the GA information. They get all the information from the cookies and send it in the __utm.gif request’s query parameters. They don’t need cookies at all.
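      To make that concrete, a rough sketch of the mechanism (parameter names loosely follow the real ones, but this is not the actual ga.js code): all state travels in the __utm.gif query string, so no Cookie header is ever required.

      ```javascript
      // Illustrative sketch: report a pageview by packing tracking state
      // into the __utm.gif query string instead of relying on cookies.
      // (Parameter names are illustrative, not the exact ga.js set.)
      function buildBeaconUrl(account, page, visitorState) {
        var params = {
          utmac: account,      // GA account id
          utmp: page,          // page path being tracked
          utmcc: visitorState  // visitor state read from cookies/localStorage
        };
        var query = Object.keys(params).map(function (k) {
          return k + '=' + encodeURIComponent(params[k]);
        }).join('&');
        return 'http://www.google-analytics.com/__utm.gif?' + query;
      }
      ```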

      • Matthew Barry

        You’re right, I was thinking the tracking was sent through the cookies, but after looking at the implementation it makes a lot more sense.  My mistake!

  • http://twitter.com/rasmus Rasmus Lerdorf

    I kind of doubt the 5% number for the typical request. But requests that already carry a bunch of cookies and other header data where the GA cookie makes the request headers overflow the MTU will definitely see improvement by being cut to fit in a single packet. Especially for mobile devices on slower connections.

  • Guest

    I also think they should hire you.

    • http://eduardo.cereto.net/ Eduardo Cereto Carvalho

      they did ;)

  • Nathan Friemel

    There are some issues to consider before declaring localStorage the be-all and end-all.
    * The localStorage API is synchronous/blocking, which could hurt JS-heavy sites or sites that do a lot of JS animation
    * localStorage feature checks would have to be added
    * localStorage can be present but disabled by the user: another check to perform
    * The I/O speeds of most browser localStorage implementations are pretty terrible, though surprisingly fast on most mobile devices, probably due to solid-state storage
    * There is a per-domain size limit on localStorage; while Google’s tracking id probably won’t be 5MB in size, what if a domain is already using up its allotted space? Another check to perform
    * All these checks would ultimately require a fallback
    * Even minified and gzipped, all these checks and fallbacks would probably increase the size of ga.js by more than the 200 bytes we are trying to save. I know ga.js is sent once and cached while the 200 bytes are sent with every request, but it is up to the site owner/developer to minimize the number of requests

    • http://eduardo.cereto.net/ Eduardo Cereto Carvalho

      Thanks for that, Nathan. Of course I didn’t consider every single corner case; I just assumed it would work for the majority, with the fallback kicking in otherwise. The ga.js overhead, on the other hand, is not a big problem, since that file is cached.

  • Anonymous

    “the average HTTP request is 700-800 bytes long”. That’s well below the usual MSS (1420-1460 bytes). IMHO it doesn’t really matter whether an HTTP request is 1300 bytes or 200 bytes; as long as it’s below ~1400, it travels in a single packet.