After getting some feedback on my original post, a simpler, more SQL analogous way to obtain the many-to-many behavior was pointed out to me.
I've created another sample (download it here), and will go over it below. Afterwards, I'll talk about why you shouldn't model your data this way, and instead should denormalize your data for optimization in the Datastore.
Here are the new data Models. (The full code listing is here.)
class Libraries(db.Model): notes = db.StringProperty() class Books(db.Model): notes = db.StringProperty() class Library(db.Model): name = db.StringProperty() address = db.StringProperty() city = db.StringProperty() libscol = db.ReferenceProperty(Libraries, collection_name='libscol') def books(self): return (x.book for x in self.librarybook_set) class Book(db.Model): title = db.StringProperty() author = db.StringProperty() bookscol = db.ReferenceProperty(Books, collection_name='bookscol') def libraries(self): return (x.library for x in self.librarybook_set) class LibraryBook(db.Model): library = db.ReferenceProperty(Library) book = db.ReferenceProperty(Book)I still have the
Books
and Libraries
models, as you can see. These are needed to collect the Library
and Book
entities so I can easily iterate over them and output. The Book
model contains a reference to Books
, via Book.bookscol
, and Library
to Libraries
, via Library.libscol
.The
LibraryBook
model just contains references to the Library
and Book
models. This creates our "join". After we add libraries and books to the Datastore, we will link them to each other using LibraryBook
entities.When the page loads, we first create and store our data entities.
# Library collection libs = Libraries() libs.put() # Book collection books = Books() books.put() # Setup libraries lib1 = Library(name='lib1', address='street a', city='city1', libscol=libs) lib2 = Library(name='lib2', address='street b', city='city2', libscol=libs) lib1.put() lib2.put() book1 = Book(title='book1', author='author one', bookscol=books) book1.put() book2 = Book(title='book2', author='author one', bookscol=books) book2.put() book3 = Book(title='book1', author='author two', bookscol=books) book3.put() book4 = Book(title='book2', author='author two', bookscol=books) book4.put() book5 = Book(title='book3', author='author two', bookscol=books) book5.put() l1 = LibraryBook(library=lib1, book=book1) l2 = LibraryBook(library=lib1, book=book2) l3 = LibraryBook(library=lib1, book=book4) l4 = LibraryBook(library=lib2, book=book4) l5 = LibraryBook(library=lib2, book=book5) l6 = LibraryBook(library=lib2, book=book3) l7 = LibraryBook(library=lib2, book=book1) l1.put() l2.put() l3.put() l4.put() l5.put() l6.put() l7.put()First, we create our
Libraries
and Books
entities, libs
and books
. These will be passed into each Library
and Book
entity we create.After we create our books and libraries, we generate a lot of
LibraryBook
entities, assigning a library and a book to each one. Each LibraryBook
entity now links one library with one book. As you may have noticed, some books are assigned to both libraries, some are not.Library
contains a method called books()
. It returns every book in the librarybook_set
as an iterable data structure. Because LibraryBook
holds a reference to Library
, any Library
entity (say, lib1
), is given a back-reference to the collection of LibraryBook
entities. If you do not define a collection_name
, GAE automatically creates one by appending "_set" to the model name. This is where librarybook_set
came from, in case you were wondering.Given a library entity like
lib1
, the books()
method allows us to easily return all the books at that library by simply assigning or iterating over lib1.books()
. The Book
model contains a method called libraries()
which does just the opposite: allows you to get all the libraries where a given book resides.Our data has been created and linked. Now we pass it in to the template.
template_values= { 'lib': lib1.name, 'books_at_lib': lib1.books(), 'forbook': book1.title, 'libs_by_book': book1.libraries(), 'libs_books': libs.libscol.order('name'), 'books_libs': books.bookscol.order('-author').order('title') }In this example, we not only display all libraries and all books (via
libs_books
and books_libs
) the way we did in the previous post, but also output all books at a library (books_at_lib
), and all libraries that contain a given book (libs_by_book
).Here's the template, if you want to take a look at it.
Denormalize your data
As I stated before, the GAE Datastore is not a relational database. Databases were designed for compactness and efficiency, and normalization is used, in part, as a way to minimize the size of your data on disk.The Datastore has been built, first and foremost, with scalability in mind. Scalability means, in essence, "add more servers as needed, without re-writing your code". Specifically to the GAE Datastore, it means "disk space is cheap, stop worrying about it, and scale".
Consider modifying our
LibraryBook
model above to look likeclass LibraryBook(db.Model): library = db.ReferenceProperty(Library) book = db.ReferenceProperty(Book) booktitle = db.StringProperty() libraryname = db.StringProperty()Now, we are not only storing each book's title in the
LibraryBook
entity, but we are also storing it in the title
property of the referenced Book
entity. While this is obviously not space efficient, and certainly not the elegant, normalized way of storing relational data our brains are used to, it scales well and is fast.It scales because the Datastore runs on who knows how many commodity computers in the background (without the knowledge of our application), and it's fast because we have the most commonly needed fields available immediately. If you need to poke further into the data, like to get the street address of the library, you would use the referenced models, and our JOIN then comes into play.
(Thanks, Ben the Indefatigable for illuminating this.)
No comments:
Post a Comment