Python’s “batteries included” philosophy even extends to object serialization, which is handled by the pickle module. Serialization goes by other names, such as marshalling or flattening, but in Python it’s known as “pickling”. In Python 2, the pickle module also has an optimized C-based version known as cPickle that can run up to 1000 times faster than the ordinary pickle (in Python 3 there is no separate cPickle; the plain pickle module uses its C implementation automatically). The documentation comes with an important warning, though, so it is reprinted below:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
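To make the warning concrete, here is a minimal sketch of one mitigation described in the Python documentation: subclassing Unpickler and overriding find_class so that only an allow-list of globals can be loaded. It assumes Python 3 (where the builtins module exists and pickle.Unpickler can be subclassed); the names SAFE_BUILTINS, RestrictedUnpickler and restricted_loads are illustrative and not part of pickle. This narrows the attack surface, but it is not a substitute for only unpickling data you trust.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: only these builtins may be referenced by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside SAFE_BUILTINS."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # Plain containers reference no globals, so they load normally.
    print(restricted_loads(pickle.dumps([1, 2, {"a": 3}])))

    # A pickle that references eval is rejected when the global is looked up.
    try:
        restricted_loads(pickle.dumps(eval))
    except pickle.UnpicklingError as err:
        print("Rejected:", err)
```

The allow-list approach is deliberately strict: anything not named explicitly is refused, which is usually easier to reason about than trying to block known-bad names.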
Now that we have that out of the way, we can start learning how to use pickle! By the end of this post, you may be hungry!
Writing a Simple Pickle Script
We will start out by writing a simple script that demonstrates how to pickle a Python list. Here’s the code:
import pickle

#----------------------------------------------------------------------
def serialize(obj, path):
    """
    Pickle a Python object
    """
    # Pickle files must be opened in binary mode
    with open(path, "wb") as pfile:
        pickle.dump(obj, pfile)

#----------------------------------------------------------------------
def deserialize(path):
    """
    Extracts a pickled Python object and returns it
    """
    with open(path, "rb") as pfile:
        data = pickle.load(pfile)
    return data

#----------------------------------------------------------------------
if __name__ == "__main__":
    my_list = list(range(10))
    pkl_path = "data.pkl"
    serialize(my_list, pkl_path)
    saved_list = deserialize(pkl_path)
    # print() with a single argument works in both Python 2 and 3
    print(saved_list)
Let’s take a few minutes to study this code. We have two functions, the first of which is for saving (or pickling) a Python object. The second is for deserializing (or unpickling) the object. To do the serialization, you just need to call pickle’s dump function and pass it the object to be pickled and a file handle opened in binary mode. To deserialize the object, you just call pickle’s load function. You can pickle multiple objects into one file, but the pickling works like a FIFO (first in, first out) queue, so you’ll get the items out in the order that you put them in. Let’s change the code above to demonstrate this concept!
import pickle

#----------------------------------------------------------------------
def serialize(objects, path):
    """
    Pickle each Python object in a list, one after another
    """
    with open(path, "wb") as pfile:
        for obj in objects:
            pickle.dump(obj, pfile)

#----------------------------------------------------------------------
def deserialize(path):
    """
    Extracts three pickled Python objects and returns them
    """
    with open(path, "rb") as pfile:
        # Objects come back in the same order they were dumped
        lst = pickle.load(pfile)
        dic = pickle.load(pfile)
        string = pickle.load(pfile)
    return lst, dic, string

#----------------------------------------------------------------------
if __name__ == "__main__":
    my_list = list(range(10))
    my_dict = {"a":1, "b":2}
    my_string = "I'm a string!"
    pkl_path = "data.pkl"
    serialize([my_list, my_dict, my_string], pkl_path)
    data = deserialize(pkl_path)
    # print() with a single argument works in both Python 2 and 3
    print(data)
In this code, we pass in a list of three Python objects: a list, a dictionary and a string. Note that we have to call pickle’s dump function once for each of these objects. When you deserialize, you’ll need to call pickle’s load function the same number of times.
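If you don’t know in advance how many objects a file holds, one common approach is to keep calling load until it raises EOFError. The following is a minimal sketch of that idea; dump_all and load_all are hypothetical helper names, and the temporary directory is used only for the demonstration.

```python
import os
import pickle
import tempfile


def dump_all(objects, path):
    """Pickle each object in turn into a single file."""
    with open(path, "wb") as pfile:
        for obj in objects:
            pickle.dump(obj, pfile)


def load_all(path):
    """Yield every pickled object in the file, in the order it was written."""
    with open(path, "rb") as pfile:
        while True:
            try:
                yield pickle.load(pfile)
            except EOFError:
                # load() raises EOFError once the file is exhausted
                break


if __name__ == "__main__":
    pkl_path = os.path.join(tempfile.mkdtemp(), "data.pkl")
    dump_all([list(range(10)), {"a": 1, "b": 2}, "I'm a string!"], pkl_path)
    for item in load_all(pkl_path):
        print(item)
```

Writing load_all as a generator means the caller can stop early without reading the rest of the file.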
Other Notes about Pickling
You can’t pickle everything. For example, you cannot pickle Python objects that wrap C/C++ resources underneath, such as wxPython widgets. If you try to, you’ll receive an error, typically a PicklingError or a TypeError. According to the documentation, the following types can be pickled:
- None, True, and False
- integers, long integers, floating point numbers, complex numbers
- normal and Unicode strings
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).
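The last item is what lets you pickle instances that hold something unpicklable. As a rough sketch, the hypothetical Counter class below carries a threading.Lock, which cannot be pickled; __getstate__ drops it from the saved state and __setstate__ recreates it on load. The is_picklable helper is likewise an illustrative name, and the exception types it catches are the ones commonly raised for unpicklable objects rather than an exhaustive list.

```python
import pickle
import threading


class Counter(object):
    """Hypothetical class that holds an unpicklable lock."""

    def __init__(self, start=0):
        self.value = start
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and leave the lock out of the pickle
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then create a fresh lock
        self.__dict__.update(state)
        self.lock = threading.Lock()


def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(threading.Lock()))   # a bare lock is refused
    print(is_picklable(Counter(5)))         # the wrapper class is accepted
    restored = pickle.loads(pickle.dumps(Counter(5)))
    print(restored.value)
```

Note that the class must be defined at the top level of a module, as the list above requires, because pickle stores only a reference to the class by name.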
Also note that if you use the cPickle module for its faster processing time, you cannot subclass its Pickler() and Unpickler(), because in cPickle they are actually functions rather than classes. That’s a rather sneaky gotcha that you need to be aware of.
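Because cPickle only exists in Python 2, code that wants the speedup when it is available often uses an import fallback along these lines. This is a common idiom rather than anything mandated by the documentation; in Python 3 the plain pickle module already uses its C implementation when it can, so the fallback branch loses nothing.

```python
try:
    import cPickle as pickle  # Python 2: optimized C implementation
except ImportError:
    import pickle  # Python 3: no cPickle; pickle uses its C accelerator itself

data = {"a": 1, "b": [1, 2, 3]}

# dumps()/loads() work on in-memory data instead of files
restored = pickle.loads(pickle.dumps(data))
print(restored == data)
```

If you need to subclass Pickler or Unpickler under Python 2, import the plain pickle module for that code instead of relying on this alias.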
With protocol version 0, pickle’s output data format uses a printable ASCII representation. Let’s take a look at the contents of the data.pkl file written by the second script, just for fun:
(lp0
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
a.(dp0
S'a'
p1
I1
sS'b'
p2
I2
s.S"I'm a string!"
p0
.
Now, I’m not an expert on this format, but you can kind of see what’s going on. Each pickled object ends with a period (the STOP opcode), which is how you can tell where one section ends and the next begins. Also note that in Python 2 the pickle module uses protocol version 0 by default. There are also protocols 1 and 2, which are binary formats, and Python 3 adds protocol 3 and later. You can specify which protocol you want by passing it in as the 3rd argument to pickle’s dump function.
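To see how the protocol argument changes the output, here is a small sketch that pickles the same list with protocols 0, 1 and 2 (the ones available in both Python 2 and 3) and then uses the standard pickletools module to disassemble the protocol 0 result. The variable names are illustrative, and the exact byte counts depend on the Python version, so none are shown.

```python
import pickle
import pickletools

my_list = list(range(10))

# Pickle the same object with each of the three classic protocols.
sizes = {}
for protocol in (0, 1, 2):
    payload = pickle.dumps(my_list, protocol)
    sizes[protocol] = len(payload)
    # Every protocol round-trips to an equal object.
    assert pickle.loads(payload) == my_list

print(sizes)

# Disassemble the protocol 0 pickle into one opcode per line;
# the final opcode, STOP, is the trailing "." in the raw data.
pickletools.dis(pickle.dumps(my_list, 0))
```

The same protocol number can be passed to pickle.dump when writing to a file; load detects the protocol on its own, so nothing extra is needed when reading.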
Finally, there’s a really cool video on the pickle module from PyCon 2011 by Richard Saunders.
Wrapping Up
At this point, you should be able to use pickle for your own data serialization needs. Have fun!
- Python documentation on the pickle module
- The Python wiki article: Using Pickle
- Python documentation on the marshal module
- Effbot’s marshal page
Just as an afterthought, shouldn’t you hash the pickle file to make sure it doesn’t change before reading it back in? I guess I am paranoid. 🙂
Yeah, probably…but I did warn you. Never trust a file unless you know you made it yourself…when you were lucid.
You did! Sad my attention span is that short. 🙂