Amazon S3 to Apache Common Log Format Converter
I put this together for a client a couple of months ago and am only now getting around to blogging about it. At the time I couldn't find any tools that made this task easier. I assume the folks who build web stats analyzers will eventually support the S3 log format natively.
Until then, here's a little converter that makes Amazon S3 logs readable by your favourite web log analyzer.
The work is done by one gnarly regular expression, which was easy to put together with the help of Pyreb.
You can download the Python source here; as always, your mileage may vary.
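The converter's own regular expression isn't reproduced here, so what follows is only a minimal sketch of the same idea. It assumes the field order that AWS documents for S3 server access logs (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, status, error code, bytes sent, object size, total time, turn-around time, referer, user agent, then newer trailing fields). The names `S3_LINE` and `s3_to_combined` are hypothetical. The sketch does not handle escaped quotes inside the quoted fields.

```python
import re
from typing import Optional

# Hypothetical pattern for the leading fields of an S3 server access log line.
# Newer S3 logs append more fields (version id, host id, signature version, ...);
# the trailing ".*" simply ignores them.
S3_LINE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request>[^"]*)" '
    r'(?P<status>\S+) (?P<error>\S+) (?P<bytes>\S+) (?P<size>\S+) '
    r'(?P<total_time>\S+) (?P<turnaround>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)".*'
)


def s3_to_combined(line: str) -> Optional[str]:
    """Rewrite one S3 access log line as an Apache combined-format line.

    Returns None if the line does not match the assumed S3 layout.
    """
    m = S3_LINE.match(line)
    if m is None:
        return None
    f = m.groupdict()
    # Combined format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
    # S3 logs have no identd (%l) or HTTP auth user (%u) field, so emit "-".
    return '{ip} - - [{time}] "{request}" {status} {bytes} "{referer}" "{agent}"'.format(**f)
```

Lines that don't match come back as `None`, so a caller can skip or report them instead of emitting malformed entries to the analyzer.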
Nice effort. I keep a little list of tools like this over at s3stat.com, so I’ll be sure to add a link.
And hey, you might appreciate a heads-up about S3stat itself. It's a little service that downloads and processes S3's server access logs, converts them much as your code does, runs the output through Webalizer, and uploads the results back to S3. Everything the lazy man needs to get web stats for his S3 account. Check it out when you get a chance:
http://www.s3stat.com/
This is an excellent script. One thing you may want to note: your script creates Apache logs in the 'combined' format, not the 'common' format. If you're like me and want the more valuable 'combined' logs, this script is for you.
For those who don't know, the combined format is just the common format with two additional fields: the referer and the user agent.
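To make the difference concrete, here is a small sketch that mirrors Apache's standard `LogFormat` definitions as Python format strings. The template names, the `format_entry` helper and the sample field values are illustrative assumptions, not part of the script.

```python
# Apache's usual definitions:
#   common:   %h %l %u %t "%r" %>s %b
#   combined: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMMON = '{host} {ident} {user} [{time}] "{request}" {status} {bytes}'
# Combined is common plus the quoted referer and user agent at the end.
COMBINED = COMMON + ' "{referer}" "{agent}"'


def format_entry(fields: dict, combined: bool = True) -> str:
    """Render one log entry; unused keys (e.g. referer in common mode) are ignored."""
    template = COMBINED if combined else COMMON
    return template.format(**fields)


# Hypothetical sample entry
entry = {
    "host": "192.0.2.3", "ident": "-", "user": "-",
    "time": "06/Feb/2019:00:00:38 +0000",
    "request": "GET /index.html HTTP/1.1", "status": "200", "bytes": "113",
    "referer": "-", "agent": "S3Console/0.4",
}
print(format_entry(entry, combined=False))
print(format_entry(entry))
```

Because each combined line is a common line with two extra fields appended, an analyzer that only understands the common format can usually be fed combined logs by dropping those last two quoted fields.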
Thanks for a great script,
Angelo