Mapper Syntax

Documents

Documents are the basic building blocks of MongoKit. They define the schema of each document and how that document should be accessed.

Document Class

from mongokit import Document, Connection

class MyDocument(Document):
    structure = {
        ...
    }
    required_fields = [
        ...
    ]
    default_values = {
        ...
    }

You can read more about the structure attribute and the required_fields and default_values descriptors below. They are the primary definition of a document. MongoKit also supports i18n, indexes, and migration.

Registering

Once a document has been defined, it must be registered with a Connection:

connection = Connection()
connection.register([MyDocument])

Optionally, the register method can be used as a decorator:

@connection.register
class MyDocument(Document):
    structure = {...}

Database and Collection

To use a Document, you must call it from a collection. In pymongo’s syntax, you would use connection.<database>.<collection> to access the collection. Once you have the collection, you can create a new document:

>>> connection.database.collection.MyDocument()
{... new Document's default values ...}

As a shortcut, you can define the database and collection names in the Document definition:

@connection.register
class MyDocument(Document):
    __collection__ = 'collection_name'
    __database__ = 'database_name'
    structure = {...}

Now, we can have access to our document directly from the connection:

>>> connection.MyDocument()
{... new Document's default values ...}

Note that if you want to specify the __database__, you should also specify the __collection__ attribute.

It is also possible to access the Document from the database:

>>> connection.database.MyDocument() # this will use __collection__ as collection name

This matches the typical pattern of creating and passing around a db object:

>>> connection = Connection()
>>> db = connection[MONGODB_DATABASE_NAME]
>>> db.MyDocument()

Changing Collection Dynamically

You might need to specify a different database or collection dynamically. For instance, say you want to store each User in a database named after that user.

>>> # Python 3
>>> class User(Document):
...     structure = {
...         'login':str,
...         'screen_name':str
...     }
>>> connection.register([User])
>>> # Python 2
>>> class User(Document):
...     structure = {
...         'login':unicode,
...         'screen_name':unicode
...     }
>>> connection.register([User])

Like pymongo, MongoKit allows you to change these parameters on the fly:

>>> user_name = 'namlook'
>>> user_collection = connection[user_name].profile

This returns a reference to the collection 'profile' in the database 'namlook'.

Now, we can query the database through our new collection:

>>> profiles = user_collection.User.find()
>>> user = user_collection.User()
>>> user['login'] = 'namlook@namlook.com'
>>> user['screen_name'] = 'Namlook'

Calling user.save() will save the object into the collection profile of the database namlook.

Dot Notation

If you want to use dot notation (JavaScript-style attribute access), you must set the use_dot_notation attribute to True:

# Python 3
class TestDotNotation(Document):
    use_dot_notation = True

    structure = {
        'foo':{
            'bar': str
        }
    }

# Python 2
class TestDotNotation(Document):
    use_dot_notation = True

    structure = {
        'foo':{
            'bar': basestring
        }
    }
>>> connection.register([TestDotNotation])
>>> doc = connection.database.TestDotNotation()
>>> doc.foo.bar = 'blah'
>>> doc
{'foo': {'bar': 'blah'}}

Note that if an attribute is not in the structure, the value is set as a plain Python attribute and is not added to the document:

>>> doc.arf = 3 # arf is not in structure
>>> doc
{'foo': {'bar': 'blah'}}

If you want to be warned when a value is set as an attribute, set the dot_notation_warning attribute to True.
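Under the hood, dot notation is just attribute access mapped onto dictionary keys. A minimal, MongoKit-independent sketch of the idea (this is not MongoKit's actual implementation):

```python
class DotDict(dict):
    """Toy dict exposing keys as attributes, for illustration only."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        if isinstance(value, dict) and not isinstance(value, DotDict):
            # keep the wrapper stored in place so nested writes stick
            value = DotDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        # attribute assignment writes straight into the dict
        self[name] = value

doc = DotDict({'foo': {'bar': None}})
doc.foo.bar = 'blah'
```

After the assignment, both `doc['foo']['bar']` and `doc.foo.bar` read back 'blah'.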

Polymorphism

In the following example, we have two objects, A and B, which inherit from Root. And we want to build an object C from A and B. Let’s build Root, A and B first:

# Python 3
from mongokit import *
class Root(Document):
    structure = {
        'root': int
    }
    required_fields = ['root']

class A(Root):
    structure = {
        'a_field': str,
    }
    required_fields = ['a_field']


class B(Root):
    structure = {
        'b_field': str,
    }

# Python 2
from mongokit import *
class Root(Document):
    structure = {
        'root': int
    }
    required_fields = ['root']

class A(Root):
    structure = {
        'a_field': basestring,
    }
    required_fields = ['a_field']


class B(Root):
    structure = {
        'b_field': basestring,
    }

Polymorphism just works as expected:

class C(A,B):
    structure = {'c_field': float}

>>> c = C()
>>> c == {'b_field': None, 'root': None, 'c_field': None, 'a_field': None}
True
>>> C.required_fields
['root', 'a_field']
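The merging of inherited descriptors follows Python's method resolution order. A rough, illustrative sketch of how structure and required_fields could be collected across the MRO with plain classes (not MongoKit's actual code):

```python
class Root:
    structure = {'root': int}
    required_fields = ['root']

class A(Root):
    structure = {'a_field': str}
    required_fields = ['a_field']

class B(Root):
    structure = {'b_field': str}

class C(A, B):
    structure = {'c_field': float}

def collect(cls, attr):
    """Merge a dict or list attribute over the MRO, base classes first."""
    merged = {}
    for klass in reversed(cls.__mro__):
        value = klass.__dict__.get(attr)  # only attributes defined on klass itself
        if isinstance(value, dict):
            merged.update(value)
        elif isinstance(value, list):
            for item in value:
                merged.setdefault(item)
    return merged

merged_structure = collect(C, 'structure')
merged_required = list(collect(C, 'required_fields'))
```

With these definitions, merged_required comes out as ['root', 'a_field'] and merged_structure contains all four fields, matching the behavior shown above.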

Descriptors

In the MongoKit philosophy, the structure must be simple, clear and readable. So all descriptors (validation, requirement, default values, etc.) are described outside the structure. Descriptors can be combined and applied to the same field.

required_fields

This descriptor describes the required fields:

# Python 3
class MyDoc(Document):
    structure = {
        'bar': str,
        'foo':{
            'spam': str,
            'eggs': int,
        }
    }
    required_fields = ['bar', 'foo.spam']

# Python 2
class MyDoc(Document):
    structure = {
        'bar': basestring,
        'foo':{
            'spam': basestring,
            'eggs': int,
        }
    }
    required_fields = ['bar', 'foo.spam']

If you want to reach nested fields, just use the dot notation.

default_values

This descriptor allows you to specify default values at document creation:

# Python 3
class MyDoc(Document):
     structure = {
         'bar': str,
         'foo':{
             'spam': str,
             'eggs': int,
         }
     }
     default_values = {'bar': 'hello', 'foo.eggs': 4}

# Python 2
class MyDoc(Document):
     structure = {
         'bar': basestring,
         'foo':{
             'spam': basestring,
             'eggs': int,
         }
     }
     default_values = {'bar': 'hello', 'foo.eggs': 4}

Note that the default value must match the declared type. Again, to reach nested fields, use dot notation.

validators

This descriptor brings a validation layer to a field. It takes a function which returns False if the validation fails, True otherwise:

# Python 3
import re
def email_validator(value):
   email = re.compile(r'(?:^|\s)[-a-z0-9_.]+@(?:[-a-z0-9]+\.)+[a-z]{2,6}(?:\s|$)',re.IGNORECASE)
   return bool(email.match(value))

class MyDoc(Document):
   structure = {
      'email': str,
      'foo': {
        'eggs': int,
      }
   }
   validators = {
       'email': email_validator,
       'foo.eggs': lambda x: x > 10
   }

# Python 2
import re
def email_validator(value):
   email = re.compile(r'(?:^|\s)[-a-z0-9_.]+@(?:[-a-z0-9]+\.)+[a-z]{2,6}(?:\s|$)',re.IGNORECASE)
   return bool(email.match(value))

class MyDoc(Document):
   structure = {
      'email': basestring,
      'foo': {
        'eggs': int,
      }
   }
   validators = {
       'email': email_validator,
       'foo.eggs': lambda x: x > 10
   }

You can add a custom message to your validators by raising a ValidationError instead of returning False:

def email_validator(value):
   email = re.compile(r'(?:^|\s)[-a-z0-9_.]+@(?:[-a-z0-9]+\.)+[a-z]{2,6}(?:\s|$)',re.IGNORECASE)
   if not email.match(value):
      raise ValidationError('%s is not a valid email' % value)

Make sure to include one '%s' in the message. MongoKit uses it to refer to the name of the field containing errors.
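Independently of MongoKit, such a validator callable can be exercised on its own. In this sketch, ValidationError is a local stand-in class defined purely for illustration:

```python
import re

class ValidationError(Exception):
    """Local stand-in for mongokit's ValidationError, for illustration."""

EMAIL = re.compile(
    r'(?:^|\s)[-a-z0-9_.]+@(?:[-a-z0-9]+\.)+[a-z]{2,6}(?:\s|$)',
    re.IGNORECASE)

def email_validator(value):
    # raise with a message instead of returning False
    if not EMAIL.match(value):
        raise ValidationError('%s is not a valid email' % value)
    return True
```

Calling `email_validator('user@example.com')` returns True, while an invalid value raises ValidationError with the custom message.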

You can also pass params to your validator by wrapping it in a class:

class MinLengthValidator(object):
    def __init__(self, min_length):
        self.min_length = min_length

    def __call__(self, value):
        if len(value) >= self.min_length:
            return True
        else:
            raise Exception('%s must be at least %d characters long.' % (value, self.min_length))

# Python 3
class Client(Document):
    structure = {
      'first_name': str
    }
    validators = { 'first_name': MinLengthValidator(2) }

# Python 2
class Client(Document):
    structure = {
      'first_name': basestring
    }
    validators = { 'first_name': MinLengthValidator(2) }

In this example, first_name must contain at least 2 characters.
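The same wrapping pattern works for any parameterized check. As a further (hypothetical) example in the spirit of MinLengthValidator above, a range validator for numeric fields:

```python
class RangeValidator(object):
    """Hypothetical parameterized validator: value must fall in [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __call__(self, value):
        if self.low <= value <= self.high:
            return True
        raise Exception('%s must be between %s and %s.'
                        % (value, self.low, self.high))
```

It would be wired up the same way, e.g. `validators = {'age': RangeValidator(0, 150)}` (the age field here is hypothetical).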

Adding Complex Validation

If a validator is not enough, you can override the validate method to fit your needs.

For example, take the following document:

# Python 3
class MyDoc(Document):
    structure = {
        'foo': int,
        'bar': int,
        'baz': str
    }

# Python 2
class MyDoc(Document):
    structure = {
        'foo': int,
        'bar': int,
        'baz': basestring
    }

We want to be sure, before saving our object, that foo is greater than bar. To do that, we just override the validate method:

# add this method to MyDoc
def validate(self, *args, **kwargs):
    assert self['foo'] > self['bar']
    super(MyDoc, self).validate(*args, **kwargs)

Skipping Validation

Once your application is ready for production and you are sure that the data is consistent, you might want to skip the validation layer. This will make MongoKit significantly faster (as fast as pymongo). In order to do that, just set the skip_validation attribute to True.

TIP: It is a good idea to create a RootDocument and to inherit all your document classes from it. This will allow you to control the default behavior of all your objects by setting attributes on the RootDocument:

class RootDocument(Document):
    structure = {}
    skip_validation = True
    use_autorefs = True

class MyDoc(RootDocument):
    structure = {
        'foo': int
    }

Note that you can always force the validation at any moment on saving even if skip_validation is True:

>>> con.register([MyDoc]) # No need to register RootDocument as we do not instantiate it
>>> mydoc = tutorial.MyDoc()
>>> mydoc['foo'] = 'bar'
>>> mydoc.save(validate=True)
Traceback (most recent call last):
...
SchemaTypeError: foo must be an instance of int not basestring

Quiet Validation Detection

By default, when validation is on, each error raises an exception. Sometimes, you just want to collect all errors in one place. This is possible by setting the raise_validation_errors attribute to False, which causes all errors to be stored in the validation_errors attribute:

class MyDoc(Document):
    raise_validation_errors = False
    structure = {
        'foo': set,
    }
>>> con.register([MyDoc])
>>> doc = tutorial.MyDoc()
>>> doc.validate()
>>> doc.validation_errors
{'foo': [StructureError("<type 'set'> is not an authorized type",), RequireFieldError('foo is required',)]}

validation_errors is a dictionary which takes the field name as key and the Python exceptions as value. There are two issues with foo here: one with its structure (set is not an authorized type) and another with the required fields (foo is required but is not specified).

>>> doc.validation_errors['foo'][0].message
"<type 'set'> is not an authorized type"

Validate Keys

If the keys of a document are not known in advance but we want to validate some deeper structure, we use a Python type as the key; in dotted paths (such as required_fields) that key is written as "$<type>":

# Python 3
class MyDoc(Document):
    structure = {
        'key': {
            str: {
                'first': int,
                'secondpart': {
                    str: int
                }
            }
        }
    }

    required_fields = ['key.$str.first']

# Python 2
class MyDoc(Document):
    structure = {
        'key': {
            unicode: {
                'first': int,
                'secondpart': {
                    unicode: int
                }
            }
        }
    }

    required_fields = ['key.$unicode.first']

Note that if you use a Python type as a key in the structure, generate_skeleton won't be able to build the entire underlying structure:

>>> con.register([MyDoc])
>>> tutorial.MyDoc() == {'key': {}}
True

So, neither default_values nor validators will work for such fields.
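The typed-key check itself is easy to picture. A standalone sketch of validating a mapping against a single typed-key spec such as {str: int} (illustrative, not MongoKit's implementation):

```python
def validate_typed_keys(spec, data):
    """Validate every key/value of `data` against a one-entry spec
    whose key is a Python *type*, e.g. {str: int}."""
    (key_type, value_type), = spec.items()  # exactly one typed-key entry
    for key, value in data.items():
        if not isinstance(key, key_type):
            raise TypeError('key %r must be an instance of %s'
                            % (key, key_type.__name__))
        if not isinstance(value, value_type):
            raise TypeError('%r must be an instance of %s'
                            % (value, value_type.__name__))
    return True
```

For example, `validate_typed_keys({str: int}, {'a': 1, 'b': 2})` passes, while a string value raises TypeError.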

The Structure

The structure is a simple dict which defines the document’s schema.

Field Types

Field types are simple python types. By default, MongoKit allows the following types:

# Common types between python 3 and python 2

None # Untyped field
bool
int
float
list
dict
datetime.datetime
bson.binary.Binary
bson.objectid.ObjectId
bson.dbref.DBRef
bson.code.Code
type(re.compile(""))
uuid.UUID
CustomType

# Python 3 types
bytes
str

# Python 2 types
basestring
long
unicode

Untyped field

Sometimes you don’t want to specify a type for a field. In order to allow a field to have any of the authorized types, just set the field to None in the structure:

class MyDoc(Document):
  structure = {
    'foo': int,
    'bar': None
  }

In this example, bar can be any of the above types except for a CustomType.

Nested Structure

MongoDB allows documents to include nested structures using lists and dicts. MongoKit's structure dict can describe these nested structures as well.

Dicts

Python’s dict syntax {} is used for describing nested structure:

# Python 3
class Person(Document):
  structure = {
    'biography': {
      'name': str,
      'age': int
    }
  }

# Python 2
class Person(Document):
  structure = {
    'biography': {
      'name': basestring,
      'age': int
    }
  }

This validates that each document has a biography dict containing a string name and an integer age.

If you want to nest a dict without type validation, you must use the dict type keyword instead:

class Person(Document):
  structure = {
    'biography': dict
  }

If you don’t specify the nested structure or don’t use the dict type keyword, you won’t be able to add values to the nested structure:

class Person(Document):
  structure = {
    'biography': {}
  }
>>> bob = Person()
>>> bob['biography']['foo'] = 'bar'
>>> bob.validate()
Traceback (most recent call last):
...
StructureError: unknown fields : ['foo']

Using dict type is useful if you don’t know what fields will be added or what types they will be. If you know the type of the field, it’s better to explicitly specify it:

# Python 3
class Person(Document):
  structure = {
    'biography': {
      str: str
    }
  }

# Python 2
class Person(Document):
  structure = {
    'biography': {
      unicode: unicode
    }
  }

This will add another layer to validate the content. See the Validate Keys section for more information.

Lists

The basic way to use a list is without validation of its contents:

class Article(Document):
    structure = {
        'tags': list
    }

In this example, the tags value must be a list but the contents of tags can be anything at all. To validate the contents of a list, you use Python’s list syntax [] instead:

# Python 3
class Article(Document):
    structure = {
        'tags': [str]
    }

# Python 2
class Article(Document):
    structure = {
        'tags': [basestring]
    }

You can also validate an array of complex objects by using a dict:

# Python 3
class Article(Document):
    structure = {
        'tags': [
            {
            'name': str,
            'count': int
            }
        ]
    }

# Python 2
class Article(Document):
    structure = {
        'tags': [
            {
            'name': basestring,
            'count': int
            }
        ]
    }

Tuples

If you need a structured list with a fixed number of items, you can use tuple to describe it:

# Python 3
class MyDoc(Document):
    structure = {
        'book': (int, str, float)
    }

# Python 2
class MyDoc(Document):
    structure = {
        'book': (int, basestring, float)
    }
>>> con.register([MyDoc])
>>> mydoc = tutorial.MyDoc()
>>> mydoc['book']
[None, None, None]

Tuples are converted into a simple list and add another validation layer: fields must have the declared types:

>>> # Python 3
>>> mydoc['book'] = ['foobar', 1, 1.0]
>>> mydoc.validate()
Traceback (most recent call last):
...
SchemaTypeError: book must be an instance of int not str

>>> # Python 2
>>> mydoc['book'] = ['foobar', 1, 1.0]
>>> mydoc.validate()
Traceback (most recent call last):
...
SchemaTypeError: book must be an instance of int not basestring

And they must have the right number of items:

>>> mydoc['book'] = [1, 'foobar']
>>> mydoc.validate()
Traceback (most recent call last):
...
SchemaTypeError: book must have 3 items not 2

As tuples are converted to lists internally, all list operations are available:

>>> mydoc['book'] = [1, 'foobar', 3.2]
>>> mydoc.validate()
>>> mydoc['book'][0] = 50
>>> mydoc.validate()
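The tuple checks described above boil down to a length check plus per-position type checks. A minimal standalone sketch of that validation (not MongoKit's implementation):

```python
def validate_tuple(spec, values):
    """Validate `values` against a tuple of types, e.g. (int, str, float)."""
    if len(values) != len(spec):
        raise TypeError('value must have %d items not %d'
                        % (len(spec), len(values)))
    for position, (expected, value) in enumerate(zip(spec, values)):
        if not isinstance(value, expected):
            raise TypeError('item %d must be an instance of %s not %s'
                            % (position, expected.__name__,
                               type(value).__name__))
    return True
```

For instance, `validate_tuple((int, str, float), [1, 'foobar', 3.2])` passes, while a wrong length or a wrong per-position type raises TypeError.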

Sets

The Python set type is not supported by pymongo. If you want to use it anyway, use the Set() custom type:

# Python 3
class MyDoc(Document):
  structure = {
    'tags': Set(str),
  }

# Python 2
class MyDoc(Document):
  structure = {
    'tags': Set(unicode),
  }

Using Custom Types

Sometimes we need to work with complex objects while keeping their footprint in the database fairly simple. Take a datetime object: it is useful for date computations, and though MongoDB can store datetime objects, we may want to store only their string representation.

MongoKit allows you to work with a datetime object and store its string representation, converted on the fly. In order to do this, we have to implement a CustomType:

>>> import datetime

A CustomType object must implement two methods and one attribute:

  • to_bson(self, value): converts the value to an authorized type before it is saved to the database.
  • to_python(self, value): converts the value taken from the database back into a Python object.
  • validate(self, value, path): optional; adds a validation layer. See the Set() CustomType code for an example.
  • You must specify a mongo_type property on the CustomType class. It describes the type of the value stored in MongoDB.
  • If you want more validation, you can specify a python_type property: the Python type the value will be converted to. Specifying it is recommended, as it also serves as documentation.
  • The init_type attribute describes the empty value. For example, if you implement the Python set as a CustomType, you would set init_type to set. Note that init_type must be a type or a callable instance.

# Python 3
class CustomDate(CustomType):
    mongo_type = str
    python_type = datetime.datetime # optional, just for more validation
    init_type = None # optional, fill the first empty value

    def to_bson(self, value):
        """convert type to a mongodb type"""
        return datetime.datetime.strftime(value, '%y-%m-%d')

    def to_python(self, value):
        """convert type to a python object"""
        if value is not None:
           return datetime.datetime.strptime(value, '%y-%m-%d')

    def validate(self, value, path):
        """OPTIONAL : useful to add a validation layer"""
        if value is not None:
            pass # ... do something here

# Python 2
class CustomDate(CustomType):
    mongo_type = unicode
    python_type = datetime.datetime # optional, just for more validation
    init_type = None # optional, fill the first empty value

    def to_bson(self, value):
        """convert type to a mongodb type"""
        return unicode(datetime.datetime.strftime(value,'%y-%m-%d'))

    def to_python(self, value):
        """convert type to a python object"""
        if value is not None:
           return datetime.datetime.strptime(value, '%y-%m-%d')

    def validate(self, value, path):
        """OPTIONAL : useful to add a validation layer"""
        if value is not None:
            pass # ... do something here

Now, let’s create a Document:

class Foo(Document):
    structure = {
        'foo':{
            'date': CustomDate(),
        },
    }

Now, we can create Foo objects and work with python datetime objects:

>>> foo = Foo()
>>> foo['_id'] = 1
>>> foo['foo']['date'] = datetime.datetime(2003,2,1)
>>> foo.save()

The object saved in the db has the string footprint as expected:

>>> tutorial.find_one({'_id':1})
{u'_id': 1, u'foo': {u'date': u'03-02-01'}}

Querying an object will automatically convert the CustomType into the correct python object:

>>> foo = tutorial.Foo.get_from_id(1)
>>> foo['foo']['date']
datetime.datetime(2003, 2, 1, 0, 0)
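Stripped of the CustomType machinery, the two conversion methods are an ordinary strftime/strptime pair, which round-trips as plain functions:

```python
import datetime

def to_bson(value):
    """Store a '%y-%m-%d' string, as CustomDate does."""
    return datetime.datetime.strftime(value, '%y-%m-%d')

def to_python(value):
    """Restore a datetime object from the stored string."""
    if value is not None:
        return datetime.datetime.strptime(value, '%y-%m-%d')

stored = to_bson(datetime.datetime(2003, 2, 1))   # '03-02-01'
restored = to_python(stored)
```

The restored value equals the original datetime, which is exactly what the query example above relies on.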

OR, NOT, and IS operators

You can also use boolean logic to do field type validation.

OR operator

Let's say that we have a field which can be either a string, an int or a float. We can use the OR operator to tell MongoKit to validate the field:

>>> # Python 3
>>> from mongokit import OR
>>> from datetime import datetime
>>> class Account(Document):
...     structure = {
...         "balance": {'foo': OR(str, int, float)}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['balance']['foo'] = '3.0'
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
>>> # but
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <str or int or float> not datetime
>>> # Python 2
>>> from mongokit import OR
>>> from datetime import datetime
>>> class Account(Document):
...     structure = {
...         "balance": {'foo': OR(unicode, int, float)}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['balance']['foo'] = u'3.0'
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
>>> # but
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <unicode or int or float> not datetime

NOT operator

You can also use the NOT operator to tell MongoKit that you don't want a given type for a field:

>>> # Python 3
>>> from mongokit import NOT
>>> class Account(Document):
...     structure = {
...         "balance": {'foo': NOT(str, datetime)}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['balance']['foo'] = 3
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
>>> # but
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <not str, not datetime> not datetime
>>> account['balance']['foo'] = u'3.0'
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <not str, not datetime> not str
>>> # Python 2
>>> from mongokit import NOT
>>> class Account(Document):
...     structure = {
...         "balance": {'foo': NOT(unicode, datetime)}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['balance']['foo'] = 3
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
>>> # but
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <not unicode, not datetime> not datetime
>>> account['balance']['foo'] = u'3.0'
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: balance.foo must be an instance of <not unicode, not datetime> not unicode

IS operator

Sometimes you might want to define a field which accepts only values limited to a predefined set. The IS operator can be used for this purpose:

>>> # Python 3
>>> from mongokit import IS
>>> class Account(Document):
...     structure = {
...         "flag": {'foo': IS('spam', 'controversy', 'phishing')}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['flag']['foo'] = 'spam'
>>> account.validate()
>>> account['flag']['foo'] = 'phishing'
>>> account.validate()
>>> # but
>>> account['flag']['foo'] = 'foo'
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: flag.foo must be in ['spam', 'controversy', 'phishing'] not foo

>>> # Python 2
>>> from mongokit import IS
>>> class Account(Document):
...     structure = {
...         "flag": {'foo': IS(u'spam', u'controversy', u'phishing')}
...     }
>>> # Validation
>>> con.register([Account])
>>> account = tutorial.Account()
>>> account['flag']['foo'] = u'spam'
>>> account.validate()
>>> account['flag']['foo'] = u'phishing'
>>> account.validate()
>>> # but
>>> account['flag']['foo'] = u'foo'
>>> account.validate()
Traceback (most recent call last):
...
SchemaTypeError: flag.foo must be in [u'spam', u'controversy', u'phishing'] not foo

Schemaless Structure

One of the main advantages of MongoDB is the ability to insert schemaless documents into the database. As of version 0.7, MongoKit allows you to save partially structured documents. For now, this feature must be activated. It will be the default behavior in a future release.

To enable schemaless support, use the use_schemaless attribute:

class MyDoc(Document):
    use_schemaless = True

Setting use_schemaless to True allows the structure to be left unset; however, you can still specify one:

# Python 3
class MyDoc(Document):
    use_schemaless = True
    structure = {
        'title': str,
        'age': int
    }
    required_fields = ['title']

# Python 2
class MyDoc(Document):
    use_schemaless = True
    structure = {
        'title': basestring,
        'age': int
    }
    required_fields = ['title']

MongoKit will raise an exception only if required fields are missing:

>>> doc = MyDoc({'age': 21})
>>> doc.save()
Traceback (most recent call last):
...
StructureError: missed fields : ['title']
>>> doc = MyDoc({'age': 21, 'title': 'Hello World !'})
>>> doc.save()

Indexes

Sometimes, it's desirable to have indexes on your dataset, especially unique ones. In order to do that, you must fill the indexes attribute. The indexes attribute is a list of dictionaries with the following keys:

'fields': a field name or a list of field names (required)
'unique': should this index guarantee uniqueness? (optional, False by default)
'ttl': time window, in seconds, during which this index will be recognized by subsequent calls to ensure_index (optional, 300 by default); see the pymongo documentation for ensure_index for details.
'check': check that the field name is present in the structure (optional, True by default). Set it to False if you don't know the field name in advance.
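Conceptually, each descriptor is later translated into an ensure_index call. A sketch of how a descriptor could be normalized into pymongo-style (field, direction) pairs plus options, assuming the descriptor shape described above (illustrative, not MongoKit's actual code):

```python
def index_arguments(descriptor):
    """Normalize an indexes descriptor into (keys, options)."""
    fields = descriptor['fields']
    if isinstance(fields, str):
        fields = [fields]  # a single field name is allowed
    # plain field names default to ascending order (direction 1)
    keys = [field if isinstance(field, tuple) else (field, 1)
            for field in fields]
    return keys, {'unique': descriptor.get('unique', False)}

keys, options = index_arguments({'fields': ['standard', 'other.deep'],
                                 'unique': True})
```

Here keys comes out as [('standard', 1), ('other.deep', 1)] with options {'unique': True}.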

Example:

>>> # Python 3
>>> class MyDoc(Document):
...     structure = {
...         'standard':str,
...         'other':{
...             'deep':str,
...         },
...         'notindexed':str,
...     }
...
...     indexes = [
...         {
...             'fields':['standard', 'other.deep'],
...             'unique':True,
...         },
...     ]

>>> # Python 2
>>> class MyDoc(Document):
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'notindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':['standard', 'other.deep'],
...             'unique':True,
...         },
...     ]

or if you have more than one index:

>>> # Python 3
>>> class Movie(Document):
...     db_name = 'test'
...     collection_name = 'mongokit'
...     structure = {
...         'standard':str,
...         'other':{
...             'deep':str,
...         },
...         'alsoindexed':str,
...     }
...
...     indexes = [
...         {
...             'fields':'standard',
...             'unique':True,
...         },
...         {
...             'fields': ['alsoindexed', 'other.deep']
...         },
...     ]

>>> # Python 2
>>> class Movie(Document):
...     db_name = 'test'
...     collection_name = 'mongokit'
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'alsoindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':'standard',
...             'unique':True,
...         },
...         {
...             'fields': ['alsoindexed', 'other.deep']
...         },
...     ]

By default, the index direction is set to 1. You can change the direction by passing a list of tuple. Direction must be one of INDEX_ASCENDING (or 1), INDEX_DESCENDING (or -1), INDEX_OFF (or 0), INDEX_ALL (or 2) or INDEX_GEO2D (or ‘2d’):

>>> # Python 3
>>> class MyDoc(Document):
...     structure = {
...         'standard':str,
...         'other':{
...             'deep':str,
...         },
...         'notindexed':str,
...     }
...
...     indexes = [
...         {
...             'fields':[('standard',INDEX_ASCENDING), ('other.deep',INDEX_DESCENDING)],
...             'unique':True,
...         },
...     ]

>>> # Python 2
>>> class MyDoc(Document):
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'notindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':[('standard',INDEX_ASCENDING), ('other.deep',INDEX_DESCENDING)],
...             'unique':True,
...         },
...     ]

To prevent adding an index on the wrong field (misspelled, for instance), MongoKit checks the indexes descriptor against the structure by default. In some cases you may want to disable this check. To do so, add "check": False:

>>> class MyDoc(Document):
...    structure = {
...        'foo': dict,
...        'bar': int
...    }
...    indexes = [
...        # I know this field is not in the document structure, don't check it
...        {'fields':['foo.title'], 'check':False}
...    ]

In this example, we index the field foo.title which is not explicitly specified in the structure.

Internationalization

Sometimes you might want to present your data in different languages and need i18n fields. MongoKit provides helpers to do this.

i18n with dot_notation

Let’s create a simple i18n BlogPost:

>>> # Python 3
>>> from mongokit import *
>>> class BlogPost(Document):
...     structure = {
...             'title':str,
...             'body':str,
...             'author':str,
...     }
...     i18n = ['title', 'body']
...     use_dot_notation = True
>>> # Python 2
>>> from mongokit import *
>>> class BlogPost(Document):
...     structure = {
...             'title':unicode,
...             'body':unicode,
...             'author':unicode,
...     }
...     i18n = ['title', 'body']
...     use_dot_notation = True

Declare your structure as usual and add an i18n descriptor. The i18n descriptor tells MongoKit that the fields title and body will exist in multiple languages.

Note the use of the use_dot_notation attribute. Using i18n with dot notation is more convenient but a little slower (not critically so, though). We will see later how to use i18n in a much faster (but less convenient) way.

Let’s create a BlogPost object and fill some fields:

>>> # Python 3
>>> con = Connection()
>>> con.register([BlogPost])
>>> blog_post = con.test.i18n.BlogPost()
>>> blog_post['_id'] = 'bp1'
>>> blog_post.title = "Hello"
>>> blog_post.body = "How are you ?"
>>> blog_post.author = "me"

>>> # Python 2
>>> con = Connection()
>>> con.register([BlogPost])
>>> blog_post = con.test.i18n.BlogPost()
>>> blog_post['_id'] = u'bp1'
>>> blog_post.title = u"Hello"
>>> blog_post.body = u"How are you ?"
>>> blog_post.author = u"me"

Now let's say we want to write our blog post in French. We select the language with the set_lang() method:

>>> # Python 3
>>> blog_post.set_lang('fr')
>>> blog_post.title = "Salut"
>>> blog_post.body = "Comment allez-vous ?"

>>> # Python 2
>>> blog_post.set_lang('fr')
>>> blog_post.title = u"Salut"
>>> blog_post.body = u"Comment allez-vous ?"

The author field is not i18n, so we don't have to set it again.

Now let's play with our object:

>>> # Python 3
>>> blog_post.title
'Salut'
>>> blog_post.set_lang('en')
>>> blog_post.title
'Hello'

>>> # Now, let's see how it works:
>>> blog_post
{'body': {'fr': 'Comment allez-vous ?', 'en': 'How are you ?'}, '_id': 'bp1', 'title': {'fr': 'Salut', 'en': 'Hello'}, 'author': 'me'}

>>> # Python 2
>>> blog_post.title
u'Salut'
>>> blog_post.set_lang('en')
>>> blog_post.title
u'Hello'

>>> # Now, let's see how it works:
>>> blog_post
{'body': {'fr': u'Comment allez-vous ?', 'en': u'How are you ?'}, '_id': u'bp1', 'title': {'fr': u'Salut', 'en': u'Hello'}, 'author': u'me'}

The title field is actually a dictionary whose keys are the languages and whose values are the texts. This is useful if you don't want to use the dot notation. Let's save our object:

>>> # Python 3
>>> blog_post.save()
>>> raw_blog_post = con.test.i18n.find_one({'_id':'bp1'})
>>> raw_blog_post
{'body': [{'lang': 'fr', 'value': 'Comment allez-vous ?'}, {'lang': 'en', 'value': 'How are you ?'}], '_id': 'bp1', 'author': 'me', 'title': [{'lang': 'fr', 'value': 'Salut'}, {'lang': 'en', 'value': 'Hello'}]}

>>> # Python 2
>>> blog_post.save()
>>> raw_blog_post = con.test.i18n.find_one({'_id':'bp1'})
>>> raw_blog_post
{u'body': [{u'lang': u'fr', u'value': u'Comment allez-vous ?'}, {u'lang': u'en', u'value': u'How are you ?'}], u'_id': u'bp1', u'author': u'me', u'title': [{u'lang': u'fr', u'value': u'Salut'}, {u'lang': u'en', u'value': u'Hello'}]}

Now, the title field looks a little different. It is a list of dictionaries with the following structure:

[{'lang': lang, 'value': text}, ...]

So, when an i18n object is saved to the MongoDB database, its structure is changed. This is done to make indexing possible.

Note that you can still use this form even if you enable dot notation.
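To make the mapping concrete, here is a simplified, pure-Python sketch of the conversion between the two representations (an illustration only, not MongoKit's actual internals):

```python
def to_mongo(value_by_lang):
    # In memory: {'fr': 'Salut', 'en': 'Hello'}
    # Stored:    [{'lang': 'en', 'value': 'Hello'}, {'lang': 'fr', 'value': 'Salut'}]
    return [{'lang': lang, 'value': value}
            for lang, value in sorted(value_by_lang.items())]

def from_mongo(stored):
    # Reverse the transformation when loading from MongoDB.
    return {item['lang']: item['value'] for item in stored}

title = {'fr': 'Salut', 'en': 'Hello'}
stored = to_mongo(title)
assert stored == [{'lang': 'en', 'value': 'Hello'},
                  {'lang': 'fr', 'value': 'Salut'}]
assert from_mongo(stored) == title
```

The list form is what makes indexing practical: a single index on the subdocument fields covers every language, whereas a dict keyed by language would need one index per language.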

Default language

By default, the language is English ('en'). You can change it easily by passing arguments at object creation:

>>> blog_post = con.test.i18n.BlogPost()
>>> blog_post.get_lang() # english by default
'en'
>>> blog_post = con.test.i18n.BlogPost(lang='fr')
>>> blog_post.get_lang()
'fr'

You can also specify a fallback language. This is useful if a field has not been translated yet:

>>> blog_post = con.test.i18n.BlogPost(lang='en', fallback_lang='en')
>>> blog_post.title = u"Hello"
>>> blog_post.set_lang('fr')
>>> blog_post.title # no title in french yet
u'Hello'
>>> blog_post.title = u'Salut'
>>> blog_post.title
u'Salut'
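The fallback lookup can be sketched in plain Python (a simplified model of the behavior above, not the library's actual code):

```python
def translate(translations, lang, fallback_lang=None):
    """Return the text for `lang`, falling back to `fallback_lang`
    when no translation exists yet."""
    if lang in translations:
        return translations[lang]
    if fallback_lang is not None:
        return translations.get(fallback_lang)
    return None

title = {'en': 'Hello'}
assert translate(title, 'fr', fallback_lang='en') == 'Hello'  # no French yet
title['fr'] = 'Salut'
assert translate(title, 'fr', fallback_lang='en') == 'Salut'  # French wins now
```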

i18n without dot notation (the fast way)

If speed is very important to you, you might not want to use the dot notation (which adds some extra wrapping). The API is less pleasant, but you can still use i18n. Let's take our BlogPost:

>>> # Python 3
>>> from mongokit import *
>>> class BlogPost(Document):
...     structure = {
...             'title':str,
...             'body':str,
...             'author':str,
...     }
...     i18n = ['title', 'body']

>>> con = Connection()
>>> con.register([BlogPost])
>>> blog_post = con.test.i18n.BlogPost()
>>> blog_post['_id'] = 'bp1'
>>> blog_post['title']['en'] = "Hello"
>>> blog_post['body']['en'] = "How are you ?"
>>> blog_post['author'] = "me"

>>> # Python 2
>>> from mongokit import *
>>> class BlogPost(Document):
...     structure = {
...             'title':unicode,
...             'body':unicode,
...             'author':unicode,
...     }
...     i18n = ['title', 'body']

>>> con = Connection()
>>> con.register([BlogPost])
>>> blog_post = con.test.i18n.BlogPost()
>>> blog_post['_id'] = u'bp1'
>>> blog_post['title']['en'] = u"Hello"
>>> blog_post['body']['en'] = u"How are you ?"
>>> blog_post['author'] = u"me"

As you can see, the title and body fields are now dictionaries that take the language as key. The result is the same:

>>> # Python 3
>>> blog_post
{'body': {'en': 'How are you ?'}, '_id': 'bp1', 'title': {'en': 'Hello'}, 'author': 'me'}

>>> # Python 2
>>> blog_post
{'body': {'en': u'How are you ?'}, '_id': u'bp1', 'title': {'en': u'Hello'}, 'author': u'me'}

The good thing is that you don't have to use set_lang() and get_lang() anymore; the bad thing is that the syntax gets uglier:

>>> # Python 3
>>> blog_post['title']['fr'] = 'Salut'
>>> blog_post['title']
{'fr': 'Salut', 'en': 'Hello'}
>>> blog_post['body']['fr'] = 'Comment allez-vous ?'
>>> blog_post['body']
{'fr': 'Comment allez-vous ?', 'en': 'How are you ?'}
>>> # Python 2
>>> blog_post['title']['fr'] = u'Salut'
>>> blog_post['title']
{'fr': u'Salut', 'en': u'Hello'}
>>> blog_post['body']['fr'] = u'Comment allez-vous ?'
>>> blog_post['body']
{'fr': u'Comment allez-vous ?', 'en': u'How are you ?'}

Note that you don't have to worry about missing an i18n field. Validation will take care of that:

>>> # Python 3
>>> blog_post['body'] = 'Comment allez-vous ?'
>>> blog_post.save()
Traceback (most recent call last):
...
SchemaTypeError: body must be an instance of i18n not str

>>> # Python 2
>>> blog_post['body'] = u'Comment allez-vous ?'
>>> blog_post.save()
Traceback (most recent call last):
...
SchemaTypeError: body must be an instance of i18n not unicode

i18n with different type

i18n in MongoKit was designed to handle any Python type authorized in MongoKit. To illustrate, let's take a contrived example: temperature.

>>> class Temperature(Document):
...     structure = {
...        "temperature":{
...           "degree": float
...        }
...     }
...     i18n = ['temperature.degree']
...     use_dot_notation = True

>>> con.register([Temperature])
>>> temp = con.test.i18n.Temperature()
>>> temp.set_lang('us')
>>> temp.temperature.degree = 75.2
>>> temp.set_lang('fr')
>>> temp.temperature.degree = 24.0
>>> temp.save()

This example shows that floats can be translated too. Using i18n to handle temperatures is a bad idea, but you may find a useful application of this feature.

Using i18n with different types also allows you to translate lists:

>>> # Python 3
>>> class Doc(Document):
...     structure = {
...        "tags":[str]
...     }
...     i18n = ['tags']
...     use_dot_notation = True

>>> con.register([Doc])
>>> doc = con.test.i18n.Doc()
>>> doc.set_lang('en')
>>> doc.tags = ['apple', 'juice']
>>> doc.set_lang('fr')
>>> doc.tags = ['pomme', 'jus']
>>> doc
{'tags': {'fr': ['pomme', 'jus'], 'en': ['apple', 'juice']}}

>>> # Python 2
>>> class Doc(Document):
...     structure = {
...        "tags":[unicode]
...     }
...     i18n = ['tags']
...     use_dot_notation = True

>>> con.register([Doc])
>>> doc = con.test.i18n.Doc()
>>> doc.set_lang('en')
>>> doc.tags = [u'apple', u'juice']
>>> doc.set_lang('fr')
>>> doc.tags = [u'pomme', u'jus']
>>> doc
{'tags': {'fr': [u'pomme', u'jus'], 'en': [u'apple', u'juice']}}

Using DBRef

MongoKit has optional support for MongoDB’s autoreferencing/dbref features. Autoreferencing allows you to embed MongoKit objects/instances inside another MongoKit object. With autoreferencing enabled, MongoKit and the pymongo driver will translate the embedded MongoKit object values into internal MongoDB DBRefs. The (de)serialization is handled automatically by the pymongo driver.

Autoreferences allow you to pass other Documents as values. Pymongo (with help from MongoKit) automatically translates these object values into DBRefs before persisting to MongoDB, and translates them back when fetching, so that you have the data values of your referenced object. See the autoref_sample for further details/internals on this driver-level functionality. To enable it in your own MongoKit code, simply define the following class attribute on your Document subclass:

use_autorefs = True

With autoref enabled, MongoKit’s connection management will attach the appropriate BSON manipulators to your document’s connection handles. We require you to explicitly enable autoref for two reasons:

  • Using autoref and its BSON manipulators (as well as DBRefs) can carry a performance penalty. We opt for performance and simplicity first, so you must explicitly enable autoreferencing.
  • You may not wish to use auto-referencing in some cases where you’re using DBRefs.

Once you have autoref enabled, MongoKit will allow you to define any valid subclass of Document as part of your document structure. If your class does not set `use_autorefs` to True, MongoKit's structure validation code will REJECT your structure.

A detailed example

First let’s create a simple doc:

>>> class DocA(Document):
...    structure = {
...        "a":{'foo':int},
...        "abis":{'bar':int},
...    }
...    default_values = {'a.foo':2}
...    required_fields = ['abis.bar']

>>> con.register([DocA])
>>> doca = tutorial.DocA()
>>> doca['_id'] = 'doca'
>>> doca['abis']['bar'] = 3
>>> doca.save()

Now, let's create a DocB which has a reference to DocA:

>>> class DocB(Document):
...    structure = {
...        "b":{"doc_a":DocA},
...    }
...    use_autorefs = True

Note that to be able to use a Document in the structure, we must set use_autorefs to True.

>>> con.register([DocB])
>>> docb = tutorial.DocB()

The default value for an embedded doc is None:

>>> docb
{'b': {'doc_a': None}}

The validation acts as expected:

>>> docb['b']['doc_a'] = 4
>>> docb.validate()
Traceback (most recent call last):
...
SchemaTypeError: b.doc_a must be an instance of DocA not int

>>> docb['_id'] = 'docb'
>>> docb['b']['doc_a'] = doca
>>> docb
{'b': {'doc_a': {'a': {'foo': 2}, 'abis': {'bar': 3}, '_id': 'doca'}}, '_id': 'docb'}

Note that the reference can be not only cross-collection but also cross-database. So, it doesn't matter where you save the DocA object as long as it can be fetched with the same connection.

Now for the interesting part. If we change a field in an embedded doc, the change applies to every DocA with the same '_id':

>>> docb['b']['doc_a']['a']['foo'] = 4
>>> docb.save()

>>> doca['a']['foo']
4

Required fields are also supported in embedded documents. Remember that DocA requires the 'abis.bar' field. If we set it to None via the docb document, a RequireFieldError is raised:

>>> docb['b']['doc_a']['abis']['bar'] = None
>>> docb.validate()
Traceback (most recent call last):
...
RequireFieldError: abis.bar is required

About cross-database references

pymongo's DBRef doesn't require a database by default, so MongoKit needs this information to fetch the correct Document.

An example is better than a thousand words. Let's create an EmbedDoc and a Doc object:

>>> # Python 3
>>> class EmbedDoc(Document):
...   structure = {
...       "foo": str,
...   }

>>> class Doc(Document):
...   use_dot_notation=True
...   use_autorefs = True
...   structure = {
...       "embed": EmbedDoc,
...   }

>>> con.register([EmbedDoc, Doc])
>>> embed = tutorial.EmbedDoc()
>>> embed['foo'] = 'bar'
>>> embed.save()


>>> # Python 2
>>> class EmbedDoc(Document):
...   structure = {
...       "foo": unicode,
...   }

>>> class Doc(Document):
...   use_dot_notation=True
...   use_autorefs = True
...   structure = {
...       "embed": EmbedDoc,
...   }

>>> con.register([EmbedDoc, Doc])
>>> embed = tutorial.EmbedDoc()
>>> embed['foo'] = u'bar'
>>> embed.save()

Now let’s insert a raw document with a DBRef but without specifying the database:

>>> raw_doc = {'embed':DBRef(collection='tutorial', id=embed['_id'])}
>>> doc_id = tutorial.insert(raw_doc)

Now see what happens when we want to load the data:

>>> doc = tutorial.Doc.get_from_id(doc_id)
Traceback (most recent call last):
...
RuntimeError: It appears that you try to use autorefs. I found a DBRef without database specified.
 If you do want to use the current database, you have to add the attribute `force_autorefs_current_db` as True. Please see the doc for more details.
 The DBRef without database is : DBRef(u'tutorial', ObjectId('4b6a949890bce72958000002'))

This means that you may be loading data generated by map/reduce, or raw data (fixtures, for instance), where the database information is not set in the DBRef. The error message tells you that you can set force_autorefs_current_db to True to allow MongoKit to use the current database by default (here 'test'):

>>> tutorial.database.name
u'test'

NOTE: When you enable this option, be very careful that you are using the correct database. If you experience strange behavior (like a document not being found), check this first.
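For reference, a DBRef is stored as a plain subdocument of the shape {'$ref': collection, '$id': id, '$db': database}, where '$db' is optional. The decision MongoKit has to make can be sketched like this (a simplified illustration, not the library's actual code):

```python
def dbref_database(raw_ref, current_db=None, force_current_db=False):
    """Pick the database to dereference into: use '$db' when present,
    otherwise fall back to the current database only if explicitly
    allowed (cf. the force_autorefs_current_db attribute)."""
    if '$db' in raw_ref:
        return raw_ref['$db']
    if force_current_db:
        return current_db
    raise RuntimeError("DBRef without database specified")

ref_with_db = {'$ref': 'tutorial', '$id': 'doca', '$db': 'test'}
assert dbref_database(ref_with_db) == 'test'

ref_without_db = {'$ref': 'tutorial', '$id': 'doca'}
assert dbref_database(ref_without_db, current_db='test',
                      force_current_db=True) == 'test'
```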

Reference and dereference

You can get the DBRef of a document with the get_dbref() method. The dereference() method lets you get a Document back from a DBRef. You can pass a Document class to tell MongoKit which model it should dereference to:

>>> dbref = mydoc.get_dbref()
>>> raw_doc = con.mydb.dereference(dbref) # the result is a regular dict
>>> doc = con.mydb.dereference(dbref, MyDoc) # the result is a MyDoc instance

GridFS

MongoKit implements GridFS support and brings some helpers to facilitate the use of relatively small files.

Let's create a document Doc which has two GridFS attachments named source and template:

>>> from mongokit import *
>>> class Doc(Document):
...        structure = {
...            'title':unicode,
...        }
...        gridfs = {'files':['source', 'template']}

You might want to add files to GridFS on the fly without knowing their names in advance. The API allows you to add "containers" to gridfs, so the gridfs declaration looks like this:

gridfs = {
  'files':['source', 'template'],
  'containers': ['images'],
}

As you can see, nothing hard. We just declare our attachment files in the gridfs attribute. Filling this attribute will generate an fs attribute at runtime. This fs attribute is actually an object which deals with GridFS.

>>> connection = Connection()
>>> connection.register([Doc])
>>> doc = connection.test.tutorial.Doc()
>>> doc['title'] = u'Hello'
>>> doc.save()

Before using a GridFS attachment, you have to save the document. This is required because, under the hood, MongoKit uses the document's _id to link it with the GridFS files.

The simple way

All gridfs attachments are accessible via the fs object. Now, we can fill the source and template:

>>> doc.fs.source = "Hello World !"
>>> doc.fs.template = "My pretty template"

And that's it! Behind the scenes, MongoKit opens a GridFS file, fills it with the value, and closes it.

Note that you have to be careful with the type: attachments only accept str (Python 2) or bytes (Python 3).

You can read any attachment in a very simple way:

>>> doc.fs.source
'Hello World !'
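Conceptually, the fs object behaves like a mapping from attachment names to bytes, exposed through attribute access. The following in-memory stand-in sketches that behavior (FakeFS is a hypothetical class for illustration; the real object reads and writes GridFS files linked to the document's _id):

```python
class FakeFS(object):
    # Declared attachments, mirroring: gridfs = {'files': ['source', 'template']}
    _declared = ('source', 'template')

    def __setattr__(self, name, value):
        # Only declared attachments may be written, and only bytes are
        # accepted (str on Python 2, bytes on Python 3 in MongoKit).
        if name not in self._declared:
            raise AttributeError('%s is not a declared attachment' % name)
        if not isinstance(value, bytes):
            raise TypeError('attachments only accept bytes')
        self.__dict__[name] = value  # reads go through normal attribute lookup

fs = FakeFS()
fs.source = b'Hello World !'
assert fs.source == b'Hello World !'
```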

You can add any image you want to the container “images”:

>>> doc.fs.images['image1.png'] = "..."
>>> doc.fs.images['image1.png']
'...'
>>> doc.fs.images['image2.png'] = '...'

This is very useful when you want to store a number of files but don't know their names in advance.

In earlier versions, if you had python-magic installed (sudo easy_install -U python-magic), the content type of the file was guessed automatically.

Changed in version 0.5.11: there were many problems with the python-magic support, so it has been removed.

If you do not know the stored file names, you can list them by iterating over fs:

>>> [f.name for f in doc.fs]
['source', 'template']

You can list a container as well. The container name is accessible via the container attribute:

>>> for f in doc.fs.images:
...    print('%s/%s' % (f.container, f.name))
images/image1.png
images/image2.png

The full way

While the previous method is very easy, it might not be enough if you're dealing with very big files or want to use some file-related features (for instance, using seek() to avoid loading the whole file into memory).

You can do that using the get_last_version() method on the fs object:

>>> f = doc.fs.get_last_version("source")
>>> f.read(10)

If you want to create a file and write to it, use the new_file() method on the fs object. new_file() takes the file name and any other properties pymongo accepts:

>>> f = doc.fs.new_file('source')
>>> f.write("Hello World again !")
>>> f.close()

Since PyMongo 1.6, you can use the with statement to handle write operations:

>>> with doc.fs.new_file("source") as f:
...     f.write("Hello World again !")
...

Containers work with the full API as well:

>>> f = doc.fs.images.new_file('image1.png')
>>> f.write('...')
>>> f.close()
>>> f = doc.fs.images.get_last_version('image1.png')
>>> f.read(10)

The whole PyMongo GridFS API is supported:

>>> id = doc.fs.put("Hello World", filename="source")
>>> doc.fs.get(id).read()
'Hello World'
>>> doc.fs.get_last_version("source")
<gridfs.grid_file.GridOut object at 0x1573610>
>>> doc.fs.get_last_version("source").read()
'Hello World'
>>> f = doc.fs.new_file("source")
>>> f.write("New Hello World!")
>>> f.close()
>>> doc.fs.source
'New Hello World!'
>>> new_id = doc.fs.get_last_version("source")._id
>>> doc.fs.delete(new_id)
>>> doc.fs.source
'Hello World'