A long while back I built a web app which provided search and aggregated stats for Valve gameservers. At any time there are ~50k of these registered with the "master server", but only 20-40% of them are publicly accessible. The original collector was written in PHP and took about 20 minutes to discover and collect stats from the gameservers. I have rebuilt the collector in Python, and it can now collect from all ~50k servers in under a minute, given a suitable (>=100 Mbit/s) connection.
In this post I'll outline how we can build such a collector. To do so we'll make use of two wonderful Python packages: gevent, which lets us fire off requests in parallel (essential to achieve the speeds desired), and python-valve, which talks the Valve master and game server protocols. The collecting process has two parts: a) read server addresses from the master server, and b) read information directly from each server.
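To make the two-part shape concrete, here is a minimal sketch of the fan-out, not the collector itself. It swaps in the standard library's `ThreadPoolExecutor` for gevent so that it runs without third-party packages; a gevent pool would play the same role. `fetch_addresses`, `fetch_info` and `collect` are hypothetical stand-ins that return made-up data: no master server or gameserver is contacted, and nothing here is python-valve's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_addresses():
    """Part (a): stand-in for reading (host, port) pairs from the master server."""
    # Hypothetical sample data from a documentation-only IP range.
    return [("192.0.2.%d" % i, 27015) for i in range(1, 11)]


def fetch_info(address):
    """Part (b): stand-in for querying a single gameserver directly."""
    host, port = address
    # Pretend that servers with an even last octet never answer, to mimic
    # the registered-but-unreachable servers.
    if int(host.rsplit(".", 1)[1]) % 2 == 0:
        raise TimeoutError("no response from %s:%d" % address)
    return {"address": address, "players": 0}


def collect(addresses, query=fetch_info, workers=100):
    """Query every address concurrently; return (results, unreachable)."""
    results, unreachable = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its address so failures can be reported.
        futures = {pool.submit(query, addr): addr for addr in addresses}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except OSError:
                # Timeouts and socket errors are OSError subclasses in Python 3.
                unreachable.append(futures[future])
    return results, unreachable
```

Calling `collect(fetch_addresses())` returns the answered and unanswered servers as two lists. The point of the design is that one slow or dead server only ties up its own worker, so total time is bounded by the timeout and the pool size rather than by the number of servers.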
In spite of the risks, I always install the latest OS X betas on my personal laptop/dev machine. This always brings a whole host of compatibility issues and broken things, but discovering and fixing these is all part of the fun! This post is a summary of the issues and solutions found so far - hopefully it'll be of help to someone. I shall keep updating it as I discover more.