Google touts the platform's ease of development, and stepping through the samples reinforce that it is, in fact, quite easy. However, it doesn't take long to discover what will probably be the biggest hurdle for developers entrenched in the relational database paradigm: The Google Datastore. It's not a relational database, and it's not an OOP wrapper to a relational database. It's a web-specialized data storage mechanism, accessed through classes called Models, and objects called Entities.
I'm willing to bet that most of the developers playing with the SDK will first really "get" this when they move past the simple "one table" queries in the samples, and try to do a basic JOIN query. Although there is a SQLlike syntax called Gql -- as stated in the Docs -- there is no JOIN.
To get this functionality, you have to use
db.ReferenceProperty
to link one object to another. Here's a short demonstration of how it's done. I figure this is much needed, since there seems to be no good examples for it in the Google documentation. (The best information I could find was in the GAE discussion group.)Below, I've listed example.py in its entirety (don't worry, it's short), and I'll refer to each pertinent section by the line numbers. (You can download the entire sample here. Put it under the SDK folder, and run it like any of the GAE samples.)
1 import osThe above stores a URL someone has entered, and then stores other URLs that match it by some degree (the "affinity"). The affinity is a numeric score. This is a simple one-to-many relationship, and to get at the data using standard SQL, we'd write something like:
2 import cgi
3 import wsgiref.handlers
4
5 from google.appengine.ext import webapp
6 from google.appengine.ext import db
7 from google.appengine.ext.webapp import template
8
9 class MainPage(webapp.RequestHandler):
10 def get(self):
11
12 url = EnteredUrl(url="http://domain.com/page.html")
13 url.put()
14
15 match1 = AffinityUrl(
16 url="http://domain.com/dir/page1.html",
17 affinity = .83,
18 entered_url=url
19 )
20 match1.put()
21
22 match2 = AffinityUrl(
23 url="http://domain.com/dir/page2.html",
24 affinity = .8301,
25 entered_url=url
26 )
27 match2.put()
28
29 matched_urls=url.matched_urls.order('-affinity')
30
31 aff_entries = AffinityUrl.all().order('url')
32
33 template_values = {
34 'url' : url.url,
35 'matched_urls': matched_urls,
36 'aff_entries': aff_entries
37 }
38
39 path = os.path.join(os.path.dirname(__file__), 'index.html')
40 self.response.out.write(template.render(path, template_values))
41
42 class EnteredUrl(db.Model):
43 url = db.StringProperty(required=True)
44
45 class AffinityUrl(db.Model):
46 url = db.StringProperty(required=True)
47 affinity = db.FloatProperty(required=True)
48 entered_url = db.ReferenceProperty(EnteredUrl,
49 required=True, collection_name='matched_urls')
50
51 def main():
52 application = webapp.WSGIApplication(
53 [('/', MainPage)],
54 debug=True)
55 wsgiref.handlers.CGIHandler().run(application)
56
57 if __name__ == "__main__":
58 main()
SELECTHere are the steps using the GAE Datastore.
entered_url.url,
affinity_url.url,
affinity_url.affinity
FROM
entered_url
JOIN
affinity_url
ON
entered_url.KEY=affinity_url.FOREIGN_KEY
Lines 42-49.
First, let's define the data Model.
EnteredUrl
defines a single string property, url
, for the obvious reason. AffinityUrl
defines a string property for url
, as well as a float affinity
property, for storing the score.Lines 48-49.
Also,
AffinityUrl
defines a db.ReferenceProperty
named entered_url
, which refers to an EnteredUrl
object. This is the link between our two data objects, and how we effectively do a JOIN. The collection_name
, matched_urls, is used to refer to the collection of AffinityUrl
objects that will be linked.Lines 12-13.
When the page is loaded in the browser we create an
EnteredUrl
entity named url
, setting its url
property to a string value.Lines 15-27.
We setup two
AffinityUrl
objects, and assign them both a url and a numeric score. Additionally, we point entered_url
to our EnteredUrl
object, url
. We have just linked one object (url
) to many (match1
, and match2
).Line 29.
This line queries the data in the one-to-many way, and stores it in an object,
matched_urls
, which I pass through to the template for iteration and output. This is where the collection name we defined in the db.ReferenceProperty
attributes is used. Note that the collection name, matched_urls, is called like a method from url
, since url
is the object being referenced.Line 31.
Additionally, for illustration, I query the
AffinityUrl
object data and save it in aff_entries
. Just as in SQL, where you can JOIN tables, or query them individually, the App Engine allows you to do both. (Hopefully, you've realized by now that although they look and are accessed differently, these linked entities are behaving quite a lot like relational database tables.)In the template, I output the data from
matched_urls
by getting each AffinityUrl
object in the collection, and displaying that URL. Note that because of the .order('-affinity')
call, we are displaying the URLs with the closest affinity at the top (descending order).<table>Load this up in your browser, and refresh a few times, and this is what you get:
{% for affurl in matched_urls %}
<tr><td>{{ affurl.url }}</td></tr>
{% endfor %}
</table>
You may have noticed from the code that I also pass all the data stored in the
AffinityUrl
model (line 31) to the template as well. This is output in the second table, above.Because I've refreshed the page several times, I've generated and stored the
match1
and match2
objects multiple times to the Datastore. This highlights something strikingly different about the Datastore and a SQL table. SQL statements like the one I give will display all the entries that match between EnteredUrl
and AffinityUrl
, even if entries in AffinityUrl
are duplicated. As you can see, even though we have duplicate AffinityUrl
entities stored, the reference from the EnteredUrl
entity is smart enough to realize that they are duplicates, and only displays the ones that are unique. Update: please see the comments for a correction of the previous statements. The Datastore is creating new entities each time with a unique ID...The Datastore takes a little getting used to, especially for those experienced in the standard relational data models. (Good ol' paradigm shift.) The GAE documentation feels unfinished or at least rushed, which is unfortunate. I personally think they should have concentrated more on giving good examples that demonstrate mapping relational concepts to Datastore concepts, since the majority of developers looking at the GAE will be old hands at the relational stuff.
I'm sure they'll get there eventually. In the meantime, I hope you found this tutorial useful.
It is worth explaining further that in line 49, as another critical contribution to the 'join-equivalent' GAE functionality, the GAE framework automatically creates a Collection named 'matched_urls' on each EnteredUrl instance AND the framework automatically adds a reference to the given AffinityUrl into that matched_urls collection of the given EnteredUrl instance.
ReplyDeleteThanks for sharing this.
ReplyDelete