Multi-document consistency with MongoDB

Europython 2012

Anders Hammarquist <iko@openend.se>

Me

Python since 1.4

OpenEnd (Strakt) since 2001

Python, Databases, Email,
Domain-specific languages,
System integration, etc

Introduction

This talk is about how to deal with the fact that MongoDB does not support transactions when you need to have consistent data across multiple documents.
MongoDB
A NoSQL database that stores free-form structured documents.
Transaction

A set of database updates that we want atomically written.

This is not a transaction in the ACID sense, although many properties apply. The isolation level is equivalent to SQL Read Committed, so you will see changes made by other committed transactions. Multi-document consistency can only be guaranteed if you observe the commit locks when reading (we don't).

The lack of consistency is not a big problem: any update that depended on an inconsistent read will fail to commit due to incompatible changes.

Eutaxia

The Open End collaborative todo system.

We have successfully transitioned Eutaxia from a transaction-dependent PostgreSQL backend to MongoDB.

Background: Eutaxia

Grew out of an earlier advanced helpdesk system

Custom object db on top of SQL

Move from single-process to multiple servers

MongoDB a nice fit for our data model

Background: MongoDB

Document store
JSON (BSON) object
Python dict
Collections
Sets of related documents
Searchable
Give me documents that look like this

Background: MongoDB

Distributable
Replication
Sharding
No transactions
Updates atomic only for a single document

Background: MongoDB



Now what?

Background: MongoDB

Two-phase commit
MongoDB cookbook
Solves a specific problem of accounting
Not a good fit for free-form data
Write-intent journal
Similar, but kept separate
Store diffs

Model

Normally, you'd read and write your documents directly. Now, you write a transaction diff to a separate collection instead, and when you're done, you apply the diff to the main collection(s).
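The flow above can be sketched in memory; this is a minimal illustration where plain dicts stand in for MongoDB collections, and all names are illustrative:

```python
# Plain dicts stand in for MongoDB collections; a "commit" records
# per-key (old, new) diffs instead of touching the data directly.
data = {'doc1': {'count': 0}}   # the main collection
commits = []                    # the separate commit collection

def prepare(docid, key, new):
    # Record the intended change as a diff in the commit collection.
    old = data[docid][key]
    commit = {'changes': {docid: {key: (old, new)}}}
    commits.append(commit)
    return commit

def apply(commit):
    # Apply the diff to the main collection, then discard the commit.
    for docid, changes in commit['changes'].items():
        for key, (old, new) in changes.items():
            data[docid][key] = new
    commits.remove(commit)

apply(prepare('doc1', 'count', 1))
```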

Write-intent

Moves transaction to application:

Store changes in a commit collection

Apply changes (may fail!)

Profit!

Commit objects

Generate when transaction runs

Diff of documents to change

Data for new documents

List of documents to remove

Input that created the change
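The "diff of documents to change" can be computed generically from the before and after versions of a document; a minimal sketch (the function name is illustrative):

```python
def diff_documents(old, new):
    # Produce a per-key (old value, new value) mapping for every key
    # whose value differs between the two versions of a document.
    changed = {}
    for key in set(old) | set(new):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed

diff_documents({'count': 0, 'keys': []}, {'count': 1, 'keys': ['a']})
# == {'count': (0, 1), 'keys': ([], ['a'])}
```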

Applying commit

Lock affected documents (fail)

Apply changes (fail)

Unlock documents

Look for another commit
from another process
Possible to see inconsistent data
unless you ignore locked documents
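Readers that must never see mid-commit state can filter out locked documents; a minimal in-memory sketch, assuming documents carry a `_handled_by` lock field as in the example code later:

```python
def consistent_view(docs):
    # Skip documents currently locked by an in-flight commit;
    # the remaining, unlocked documents are mutually consistent.
    return [doc for doc in docs if doc.get('_handled_by') is None]
```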

Locking failure

Mark commit as not running

Another writer is running a commit:
Leave this commit for that writer
Avoid deadlocks
Else: Try again
Race condition:
Slight chance of dropping commit
Fix by regularly looking for commits

Change failure

Something changed in the database

Application dependent
Re-run code to generate new diff
Reduce chance
E.g. add list append
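Recording some changes as operations rather than (old, new) value pairs reduces conflicts: a list append still applies even if another commit grew the list in the meantime. A sketch, with a hypothetical change representation:

```python
class Conflict(Exception):
    pass

def apply_change(doc, key, change):
    # A ('set', old, new) diff conflicts if the old value no longer
    # matches; an ('append', item) operation only assumes the key
    # holds a list, so it applies even after concurrent appends.
    if change[0] == 'set':
        _, old, new = change
        if doc[key] != old:
            raise Conflict(key)
        doc[key] = new
    elif change[0] == 'append':
        doc[key] = doc[key] + [change[1]]
    return doc
```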

Other failures

Other failure scenarios, not covered

MongoDB write errors
Out of disk, crashes etc
Record what changes have been applied
Application crash leaving locks
Cleanup routine that finds dead locks
Unhandled exceptions
Unlocked, partially applied changes

Demo



Demo

Example

# Colour
{
   '_id' : ObjectId(),
   'keys' : [],
   'count' : 0,
   'colour' : 'red', # 'green', 'any'
}

Example: Commit document

# Commit
{
   '_id' : ObjectId(), # document id
   'handled_by' : ObjectId(), # process id
   'input' : {},
   'changes' : {
      str(docid) : {
        key : ( old, new )
      }
   }
}

Example: add_key()

 def main():
     commit = committer.add_key()
     committer.process(commit)

 def add_key(self):
     # Generate a random key
     colour = random.choice(['red', 'green'])
     key = random.choice('abcd…')

     return self.prepare_commit(colour, key)
We have three documents, 'red', 'green' and 'any', to which we want to add keys. We randomly pick a colour and key and add the key to the document of the chosen colour and the 'any' document.

Example: prepare_commit()

 def prepare_commit(self, colour, key):
     ckey = self.database.data.find_one(
        {'colour' : colour})
     ccount = ckey['count']
     ckeys = ckey['keys']

     commit = {
       'handled_by': self.id,
       'input' : { 'colour': colour,
                   'key' : key},
       'changes' : { str(ckey['_id']) : {
          'count' : ( ccount, ccount + 1),
          'keys' : ( ckeys, ckeys + [key]) } },
     }
     return commit

Example: process()

def process(self, commit):
  while commit:
    try:
      self.commit(commit)
    except Locked:
      self.unhandle(commit)     # mark commit as not running
      if self.other_handlers():
        return                  # leave it for the other writer
    except Conflict:
      self.remove(commit)       # remove and recreate the commit,
      commit = self.recreate(commit)
      continue                  # then try again
    else:
      self.remove(commit)       # done: remove the commit

    commit = self.next_commit() # look for another commit

Example: commit()

def commit(self, commit):
  affected = commit['changes'].items()
  try:
    self.lock(affected)

    docs = {}
    for docid, changes in affected:
      doc = docs[docid] = self.database.data.find_one(
         {'_id': ObjectId(docid)})
      for k, (prev, _) in changes.items():
        if doc[k] != prev: raise Conflict()

    for docid, changes in affected:
      doc = docs[docid]
      for k, (_, new) in changes.items(): doc[k] = new
      self.database.data.save(doc)

  finally:
    self.unlock(self.id)

Example: lock()

 def lock(self, affected):
   for docid, _ in affected:
     doc = self.database.data.find_and_modify(
       {'_id': ObjectId(docid), '_handled_by': None},
       {'$set' : {'_handled_by': self.id}})
     if doc is None:
       raise Locked()

 def unlock(self, selfid):
   self.database.data.update(
     {'_handled_by': selfid},
     {'$unset' : {'_handled_by': True}},
      multi=True)

Example: limitations

Only changes, no add or delete

No MongoDB error checking

Dataset chosen to always generate conflict
If you have data like this, you may want to consider other solutions.

Summary

Write changes to a separate commit collection

Apply changes to data

Handle failures

Summary

Unlocked data will be consistent
except for errors in db

Commits may be delayed in case of lock contention

Consistency in case of failures can be improved