Earlier this year, I was tasked with creating an application that would download information from our organization’s website using Python. The tricky part was that the payload would be JSON that had been gzipped and then encrypted. Could Python do all that? Well, that’s what I wanted to find out. Now it’s time for you to learn what I discovered.
Python and Encryption
The first order of business was to figure out the encryption. The payload was supposed to be AES encrypted. While Python doesn’t seem to have a built-in module for this sort of thing, there is an excellent PyCrypto package for Python 2.x that works just fine. Unfortunately, their main site doesn’t list how to install it on Windows. You need to do some compiling of your own to get it to work (using Visual Studio, I think), or you can download Michael Foord’s builds here. I went with the latter.
Here’s the basic code I ended up using:
from Crypto.Cipher import AES

cipher = AES.new(key, AES.MODE_ECB)
gzipData = cipher.decrypt(encData).strip('\000')
The encData variable is just the contents of the file that’s been downloaded using urllib2. We’ll look at how to do that soon enough. Just be patient. The key was provided by one of my fellow developers. Anyway, once you’ve decrypted it and stripped the null bytes off the result, you end up with the gzipped data.
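The strip call in the snippet above is easier to follow with a small stand-alone sketch. AES works on 16-byte blocks, so my guess (the server side isn’t shown anywhere in this post) is that the payload was padded with null bytes before it was encrypted. The helper names here are made up for illustration, and the sketch is written for Python 3 with no PyCrypto needed:

```python
BLOCK_SIZE = 16  # AES always works on 16-byte blocks


def pad_with_nulls(data):
    """Hypothetical sender side: pad data with null bytes to a full block."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0:
        return data
    return data + b"\x00" * (BLOCK_SIZE - remainder)


def strip_null_padding(data):
    """Receiver side: drop the trailing null bytes again."""
    return data.rstrip(b"\x00")


payload = b"compressed bytes go here"   # 24 bytes, not a multiple of 16
padded = pad_with_nulls(payload)         # 32 bytes, ready for a block cipher
restored = strip_null_padding(padded)
```

One thing to watch for: if the real data happens to end in a null byte, stripping will eat that byte too. A scheme like PKCS#7 padding avoids that, but both ends have to agree on it.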
Decompressing Gzipped Files
The documentation about gzipped stuff is pretty confusing. Do you use gzip or zlib? It took quite a bit of trial and error for me to figure that out, mainly because my colleague was giving me the wrong file format. This part actually ended up being super easy to do too:
import zlib

jsonTxt = zlib.decompress(gzipData)
If you do the above, you’ll end up with the decompressed data. Yes, it really is that simple.
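If you’re stuck on the gzip-versus-zlib question like I was, this rough Python 3 sketch may help. With its default settings, zlib.decompress expects zlib-format data and raises zlib.error on a real gzip file. Passing 32 + zlib.MAX_WBITS as the second argument tells it to detect either header automatically. The payload and the decompress_any helper are made up for the example:

```python
import gzip
import zlib

payload = b'{"keyName": {"status": "ok"}}'   # made-up sample data

zlib_data = zlib.compress(payload)   # zlib container
gzip_data = gzip.compress(payload)   # gzip container (what a .gz file holds)


def decompress_any(data):
    """Decompress zlib- or gzip-format data by auto-detecting the header."""
    return zlib.decompress(data, 32 + zlib.MAX_WBITS)


# The default call only understands the zlib container.
try:
    zlib.decompress(gzip_data)
    default_handles_gzip = True
except zlib.error:
    default_handles_gzip = False

from_zlib = decompress_any(zlib_data)
from_gzip = decompress_any(gzip_data)
```

So if your colleague hands you the "wrong" format, the auto-detect form is a handy way to find out which one you actually have.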
JSON and Python
Starting with Python 2.6, you get a json module shipped with Python. You can read about it here. If you’re stuck with an older version, then you can download the module from PyPI instead. Or you can use the simplejson package, which is what I used.
import simplejson

json = simplejson.loads(jsonTxt)
Now you’ll have a structure of nested dictionaries. Basically, you’ll want to do something like this to use it:
data = json['keyName']
That will return another dictionary with different data. You’ll want to study the data structure a bit to figure out the best way to access what you want.
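Here’s a small sketch of what digging through the parsed data can look like, using the standard library json module on Python 3. I can’t share the real payload, so everything below keyName (the calls list and its id and status fields) is invented for illustration:

```python
import json

# Made-up sample payload; the real structure will differ.
json_txt = """
{
    "keyName": {
        "calls": [
            {"id": 1, "status": "open"},
            {"id": 2, "status": "closed"}
        ]
    }
}
"""

parsed = json.loads(json_txt)

# The top level is a dictionary, so index it by key...
data = parsed["keyName"]

# ...and keep walking down through the nested dictionaries and lists.
open_ids = [call["id"] for call in data["calls"] if call["status"] == "open"]

# A quick way to see what you have at any level:
top_level_keys = sorted(parsed.keys())
```

Printing the keys at each level, as in the last line, is usually the fastest way to get your bearings in an unfamiliar structure.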
Putting it all Together
Now let’s put it all together and show you the completed script:
import simplejson
import urllib2
import zlib

from Crypto.Cipher import AES
from platform import node
from win32api import GetUserName

version = "1.0.4"
uid = GetUserName().upper()
machine = node()

#----------------------------------------------------------------------
def getData(url, key):
    """
    Downloads and decrypts gzipped data and returns a JSON string
    """
    try:
        headers = {"X-ActiveCalls-Version":version,
                   "X-ActiveCalls-User-Windows-user-ID":uid,
                   "X-ActiveCalls-Client-Machine-Name":machine}
        request = urllib2.Request(url, headers=headers)
        f = urllib2.urlopen(request)
        encData = f.read()

        cipher = AES.new(key, AES.MODE_ECB)
        gzipData = cipher.decrypt(encData).strip('\000')

        jsonTxt = zlib.decompress(gzipData)
        return jsonTxt
    except:
        msg = "Error: Program unable to contact update server. Please check configuration URL"
        print msg

if __name__ == "__main__":
    json = getData("some url", "some AES key")
In this particular example, I needed to also let the server know which version of the application was requesting data, who the user was, and which machine the request came from. To do all that, we use urllib2’s Request class to pass custom headers to the server with that information. The rest of the code should be pretty self-explanatory. One thing to notice is that getData returns the decompressed JSON text, so you still need to run it through simplejson.loads to get your dictionaries.
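If you’re on Python 3, urllib2 has been folded into urllib.request, and the header trick carries over pretty much unchanged. This sketch only builds the request and never sends it. The build_request helper, the example.com URL and the SOMEUSER value are placeholders I made up, not part of the original script:

```python
import platform
import urllib.request


def build_request(url, version, uid, machine):
    """Build (but do not send) a request carrying the custom headers."""
    headers = {
        "X-ActiveCalls-Version": version,
        "X-ActiveCalls-User-Windows-user-ID": uid,
        "X-ActiveCalls-Client-Machine-Name": machine,
    }
    return urllib.request.Request(url, headers=headers)


request = build_request(
    "http://example.com/data",   # placeholder URL
    "1.0.4",
    "SOMEUSER",                  # placeholder user ID
    platform.node(),
)

# Request normalizes the capitalization of header names, so compare them
# case-insensitively when inspecting what will be sent.
sent_headers = {name.lower(): value for name, value in request.header_items()}
```

To actually fetch the data you would pass the request to urllib.request.urlopen and call read() on the result, just like the urllib2 version does.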
Wrapping Up
I hope that all made sense and that it’s helpful to you in your Python adventures. If not, check out the links I provided in the various sections and do a little research. Have fun!
According to RFC 2616, spaces are invalid in HTTP header names.
Updated the post so the headers no longer have spaces. Sorry about that.