JavaScript vs Logs
A war has been raging over the last few years between consumers and advertisers. A never ending cat and mouse game where each side tries to out-do the other. It’s tracking vs blocking, clicks vs privacy, readers vs publishers.
Gone are the days when adding the GA/et-al snippet to your blog captured almost every user. My theory is that, for tech oriented blogs, GA misses a significant portion of traffic. Tech readers are more aware and actually care about their privacy online.
But how big is the problem? What am I not seeing in GA? Am I talking a load of nonsense? To find out I used GoAccess to parse & track the stats for this blog from the nginx logs. The results have been quite revealing. Let’s look at February to July of 2018:
That’s a huge difference! Much more than I ever expected. Note that this after some cleanup. GoAccess uses servers logs so all traffic counts - including crawlers, static files and bad actors. The logs were filtered and processed as so:
# GET requests, 200 responses, no static/invalid files
grep "GET" access-feb-jun.log | \
grep '" 200 ' | \
grep -vE "\?|\x|http" | \
grep -vE "\.(png|php|css|ico|xml|js|jpg|zip|cgi|gif|woff|woff2|eot|svg|ttf)" \
> access-feb-jun-clean.log
# Parse with goaccess, removing redirects and crawlers
goaccess \
--log-format=COMBINED \
--ignore-crawlers \
--ignore-status=301 \
--ignore-status=302 \
-o report.csv \
access-feb-jun-clean.log
The result gives a clean(ish) log of legit traffic - which is roughly what GA should be tracking. As we can see, this is not the case - not even close. Even if we account for 10% still incorrect logs - GA is missing roughly 70% of traffic to this blog.
I have removed GA as it’s both useless and a privacy concern for anyone not using a blocker. The results provided by GoAccess are more than enough, even if they contain a few false positives.
Bonus item!: using server logs means tracking (rough) RSS readership is possible!
# GET requests, 200 responses, no static/invalid files
grep "/index.xml" access-feb-jun.log \
> access-feb-jun-rss.log
goaccess \
--log-format=COMBINED \
-o report.csv access-feb-jun-rss.log
- Next: On the Consolidation of the Web
- Previous: Why You Should Try pyinfra
- Archive: All Ramblings