Monday, April 28, 2008

Google App Engine: Many-to-many JOIN

GAEUpdate: After reading this, you might want to check out GAE: [A Better] Many-to-many JOIN, which gives an improved way of doing this, plus goes into why you shouldn't normalize your data.

A public library has many books. In SQL-speak, this is a one-to-many relationship. (For the sake of the argument, I'll assume each library has only one copy of a given book). It follows then, that many libraries have many books. This is a many-to-many relationship. On the heels of my recent post GAE: One-to-many JOIN, here is an example showing how to do a many-to-many JOIN using the Google App Engine Datastore.

You can download this entire sample here.

A many-to-many SQL query for our library scenario would look something like
SELECT
*
FROM
library
INNER JOIN
libraries_books
ON
library.KEY=libraries_books.library_KEY
INNER JOIN
books
ON
libraries_books.book_KEY=books.KEY
To duplicate this functionality in the Datastore, we have to model our data as follows. (Full code listing here.)
# These are used for linking/ordering
class Books(db.Model):
notes = db.StringProperty(required=False)

class Libraries(db.Model):
notes = db.StringProperty(required=False)

# Data models
class Library(db.Model):
name = db.StringProperty(required=True)
address = db.StringProperty(required=True)
city = db.StringProperty(required=True)
library_list = db.ReferenceProperty(Libraries,
required=True, collection_name='ref_libs')

class Book(db.Model):
title = db.StringProperty(required=True)
author = db.StringProperty(required=True)
library = db.ReferenceProperty(Library,
required=True, collection_name='books')
book_list = db.ReferenceProperty(Books,
required=True, collection_name='ref_books')
The Library and Book models share a one-to-many relationship. This is setup using the Book.library db.ReferenceProperty. Nothing really new here (if you read my one-to-many post, anyway).

We need some additional references to pull off the many-to-many relationships, however, plus a couple extra Models. (It's important to note that the db.ReferenceProperty in itself only allows for a one-to-many relationship. That's why we need more than one get the many-to-many behavior.) I've created the Libraries and Books models for this. You may notice that they have an optional, largely unnecessary property named notes. This can pretty much be ignored. We really just need these entities to exist in order to point to them from our Library and Book entities.

The Library model contains a reference to Libraries through a property named library_list. Book has a reference to Books via book_list. Having references to both Libraries and Books allows us to manipulate the sorting for each collection, as you will see below.

When the page loads in our browser, the first thing we do is create entities from our models, and give them some data.
# Library collection
libs = Libraries()
libs.put()

# Books collection
books = Books()
books.put()

# Setup libraries
lib1 = Library(name='lib1', address='street a', city='city1',
library_list=libs)
lib2 = Library(name='lib2', address='street b', city='city2',
library_list=libs)
lib1.put()
lib2.put()

# Books:
# Both libraries
book1 = Book(title='book1', author='author one',
library=lib1, book_list=books)
book2 = Book(title='book1', author='author one',
library=lib2, book_list=books)
# Only first library
book3 = Book(title='book2', author='author one',
library=lib1, book_list=books)
# Both libraries
book4 = Book(title='book3', author='author two',
library=lib1, book_list=books)
book5 = Book(title='book3', author='author two',
library=lib2, book_list=books)
book1.put()
book2.put()
book3.put()
book4.put()
book5.put()
We declare our "link" entities, libs and books, first. Next we create two library instances, lib1 and lib2, and assign libs to library_list to create a one-to-many relationship from Library to Libraries.

A Book entity has two relationships to setup. A one-to-many relationship to a given Library entity, and a one-to-many relationship to the Books entity. These are established through the library and book_list properties, respectively.

After we store our data, we use the collections in our Library and Book models to create two objects that we will pass to our template.
libs_books = libs.ref_libs.order('name')
books_libs = books.ref_books.order('author').order('-title')

template_values = {
'libs_books': libs_books,
'books_libs': books_libs
}
Both libs_books and books_libs contain many-to-many relationships between libraries and books. But libs_books references books from libraries, allowing you to sort by library, and books_libs does the opposite, referencing libraries from books, letting you sort by books. This is certainly more clumsy and more work than our SQL counterpart, which just needs an ORDER BY clause to sort either way.

On to the template. To output books by library, we have to iterate over every library lib in libs_books, and then iterate over every book referenced to lib.
{% for lib in libs_books %}
{% for book in lib.books %}
<tr>
<td>{{ lib.name }}</td>
<td>{{ lib.address }}</td>
<td>{{ lib.city }}</td>
<td>{{ book.title }}</td>
<td>{{ book.author }}</td>
</tr>
{% endfor %}
{% endfor %}
Because of the way references are setup in libs_books, we are able to order the output based on the libraries, as you can see in the first table below.

results

The second table above shows the output from books_libs, which we use to order by books. Here's how we generate the data in the template:
{% for book in books_libs %}
<tr>
<td>{{ book.title }}</td>
<td>{{ book.author }}</td>
<td>{{ book.library.name }}</td>
<td>{{ book.library.address }}</td>
<td>{{ book.library.city }}</td>
</tr>
{% endfor %}
We don't have to use nested loops, and we simply use book.library as a normal reference (not a back-reference) to get the library associated to the given book. The reason we don't have to nest is because a Book entity has a many-to-one relationship with a Library entity, so each book is already attached to a Library. Library entities have a one-to-many relationship to Book entities, so every time you get lib, you have to find it's many, which requires the second loop.

There you have it. A first blush example, to be sure, but I think it conveys the core steps required to duplicate the behavior of a relational many-to-many JOIN.

No comments:

Post a Comment