Tuesday, May 12, 2009

Store pickled data in a database using SQLAlchemy

I've been using SQLAlchemy (and Elixir) to write the data collection code for GovCheck. My first choice would have been to use Django's ORM itself, but that proved harder and more time consuming that I thought worth it.

One of the problems that I've had to solve is how to store lots of static data in a SQL database without adding more than 100 columns into a DB table. A good solution to this problem is to store this data with a pickled dictionary. The problem with that is the in-built PickleType defined by SQLAlchemy does not also base64 encode the result of the pickle operation on the dictionary (I wanted to do this because the custom field type I'd defined within the Django app was something similar to this, which assumes base64 encoded data coming in). However, fortunately, it's rather easy to create custom Type's for SQLAlchemy and use them in your models. Here is the custom type I created to store the data.

import base64
try:
import cPickle as pickle
except:
import pickle

class EncodedPickleType(types.TypeDecorator):
"""
This class should be used whenever pickled data needs to be stored
(instead of using the in-built PickleType). The reason for this is
that the in-built type does not encode the pickled string using
base64, which is what the Django field type expects.
"""
impl = types.Text

def process_bind_param(self, value, dialect):
dumps = pickle.dumps
if value is None:
return None
return base64.b64encode(dumps(value))

def process_result_value(self, value, dialect):
loads = pickle.loads
if value is None:
return None
if not isinstance(value, basestring):
return value
return loads(base64.b64decode(value))


You can then use this within your model definitions and feed it a dictionary. The field will take care of pickling and encoding it as well as decoding and unpickling the data when accessing the data.

1 comment:

Unknown said...

it would be helpful if you included the full example of your implementation w.r.t. flask and sqlalchemy as your post title suggests.

thanks