Posted by Aidan, 22
Last week's Google Developer Day was a huge amount of fun. Particularly interesting was Mano Marks' codelab session on Google App Engine.
I'd played about with App Engine before, but it was only when going off on a tangent and exploring the codelab example code that I started to get a feel for some of the application growth / scalability limitations that the framework poses at the moment. (In the course of the event I managed to find two bugs in App Engine, and then a third bug in Google Chrome when trying to submit the issue.)
Updating domain models through the Django-esque framework can be fraught, lovely though it is. Removing or modifying properties can be done silently, which is fair enough, but records are only updated at write-time. That is to say, if a property is removed, each object must be read and then persisted before the change will be propagated across all data. Changing a property type is harder still, as a new temporary property must be added, filled with data from the old, the old removed, and then the data cast and moved back. Google recommend that you don't fetch records several hundred at a time when running these processes, as App Engine may injudiciously kill lengthy requests, but more importantly because their fetch offset method is inefficient and unreliable with an offset over 1,000. Principally, this is because the fetch method queries the dataset when run, and then only returns the limited data specified in the parameters.
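One way round the offset problem is to page by key rather than by offset: ask for the next small batch of records whose key is greater than the last one seen. Here's a rough sketch of the idea against a plain in-memory list standing in for the datastore. The names `fetch_after` and `migrate_all`, and the record layout, are made up for illustration and are not App Engine API.

```python
# Hypothetical stand-in for a datastore: records sorted by a unique key.
RECORDS = [{"key": k, "old_prop": str(k)} for k in range(1, 2501)]


def fetch_after(last_key, limit):
    """Return up to `limit` records with key > last_key, in key order.

    A real datastore would answer this with an indexed range query, so the
    cost would depend on `limit` rather than on how far through the data
    we are. This list scan only imitates the behaviour.
    """
    batch = [r for r in RECORDS if last_key is None or r["key"] > last_key]
    return batch[:limit]


def migrate_all(batch_size=100):
    """Read and re-save every record in small batches, paging by key."""
    last_key = None
    batches = 0
    while True:
        batch = fetch_after(last_key, batch_size)
        if not batch:
            break
        for record in batch:
            # The "migration": cast the old string property to an int
            # under a new name, dropping the old property as we go.
            record["new_prop"] = int(record.pop("old_prop"))
        last_key = batch[-1]["key"]
        batches += 1
    return batches


batches_run = migrate_all()
```

Each pass only needs to remember the last key it handled, so a killed request can pick up where it left off.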
Simply put, it's expensive to page through all data without querying specifically for each chunk, as the underlying mechanism to return data doesn't distinguish between domain and relational data. Having built an ORM framework in the past, we have some opinions on the matter. Google's solution (when it arrives, see the PDF on their thoughts on scaling) will differ, not least as they have some very different considerations for App Engine. Here's a snippet from the FAQ for our LPC framework software, which describes our own solution to the problem when we encountered it in 2000:
Differentiation between domain and relational data
The LothianProductionsCommon (LPC) framework provides highly scalable search and retrieval functionality. Key to this is the underlying differentiation between domain object data representing Cacheable objects and domain object relational data representing their relationships.
Need for flexible primary keys
Cacheable objects must have a simple primary unique key (Id), and the framework assumes that this can be represented by a long integer in most cases (domains where objects cannot be keyed by a numeric ID can be supported through use of the MathHelper's BaseDecToBaseN methods -- if implemented at factory level this allows for IDs to use a 37-byte alphanumeric base, though this approach is highly undesirable).
Partial and complete cache-assisted domain object retrieval
Where an object's primary key is known, it can be fetched simply by using the appropriate broker's Get or Gets method (users should note that polymorphism is not exploited by the broker templates as it cannot be modelled in WSDL when exposing brokers to SOAP requests). This will retrieve the objects required by primary key, and can pass through the cache on an object-by-object basis, optimising requests for groups of objects that are only partially cached.
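As a rough illustration of a partially cached Gets, the sketch below only asks the data provider for the IDs it hasn't already got. The `Broker` class and its `load_many` callable are hypothetical stand-ins, not the LPC's actual broker templates.

```python
class Broker:
    """Hypothetical broker: fetches objects by ID, consulting a cache first."""

    def __init__(self, load_many):
        self._load_many = load_many  # callable: list of IDs -> {id: object}
        self._cache = {}

    def gets(self, ids):
        # Only IDs missing from the cache go to the data provider.
        missing = [i for i in dict.fromkeys(ids) if i not in self._cache]
        if missing:
            self._cache.update(self._load_many(missing))
        # Return in the order requested, skipping IDs that don't exist.
        return [self._cache[i] for i in ids if i in self._cache]

    def get(self, id_):
        found = self.gets([id_])
        return found[0] if found else None


requested = []  # records what the fake data provider was asked for


def fake_load(ids):
    requested.append(list(ids))
    return {i: {"id": i} for i in ids}


broker = Broker(fake_load)
broker.gets([1, 2, 3])
overlap = broker.gets([2, 3, 4])  # only object 4 needs loading
```

After the second call, `requested` holds `[1, 2, 3]` and then `[4]`: objects 2 and 3 came straight from the cache.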
Get and Gets calls the only way to retrieve objects
This Get or Gets broker call is the only way to retrieve domain objects, and it will reference the underlying application configuration, bringing out the data manipulation language (DML). When using SQL, for instance, the DML may read as:
SELECT id, name, description, modified FROM test WHERE {id}
Scheme-based optimisation of Gets calls
The factories will generate valid DML from this template string. In the case of SQL queries, there are various layers of optimisation determined by the ConnectionScheme being used. Firstly, when using Gets to retrieve multiple domain objects from a database, the scheme will decide whether using OR or IN statements is most effective for the given database type.
SELECT id, name, colour FROM test WHERE id = 1 OR id = 2
SELECT id, name, colour FROM test WHERE id IN (1, 2)
The scheme will also split large Gets calls where required by the underlying data provider. For instance, if a Gets call is querying for 40,000 objects and the data provider only permits 15,000 bound variables per statement, the scheme will transparently split the query into three and merge the results. Additionally, if the data provider has a query length limit -- for example, of 2,000 characters per query -- the scheme can also transparently split and remerge the query.
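To make the template expansion and splitting concrete, here's a minimal sketch. `build_gets_statements` and its parameters are invented for this example and say nothing about how the real ConnectionScheme is written. For brevity it inlines the IDs as integers, where a real scheme would use bound variables.

```python
def build_gets_statements(template, ids, max_vars, use_in=True):
    """Expand the {id} placeholder into one statement per chunk of IDs."""
    statements = []
    for start in range(0, len(ids), max_vars):
        chunk = ids[start:start + max_vars]
        if use_in:
            clause = "id IN (%s)" % ", ".join(str(int(i)) for i in chunk)
        else:
            clause = " OR ".join("id = %d" % int(i) for i in chunk)
        statements.append(template.replace("{id}", clause))
    return statements


TEMPLATE = "SELECT id, name, colour FROM test WHERE {id}"

# Seven IDs with a (pretend) limit of three variables per statement.
in_statements = build_gets_statements(TEMPLATE, [1, 2, 3, 4, 5, 6, 7], max_vars=3)

# The same template rendered with OR for a provider that prefers it.
or_statements = build_gets_statements(TEMPLATE, [1, 2], max_vars=3, use_in=False)
```

The caller would run each statement in turn and merge the rows, so the split stays invisible to whoever called Gets.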
Why separate relationship resolution?
Consider the following queries run sequentially on a static dataset:
SELECT id, name, colour FROM table WHERE colour IN ('red', 'blue')
SELECT id, name, colour FROM table WHERE colour IN ('red')
Running and caching results of both could lead to an unnecessary connection and query being made, as well as the unnecessary transfer of data from data provider to application server, and the unnecessary use of cache space for what are duplicate results.
By splitting out the relational information in the queries, the LPC has an opportunity to cache data more effectively with minimal overhead. For instance, the following query will return only the primary key of the data to be retrieved.
SELECT id FROM table WHERE colour IN ('red', 'blue')
= 1, 2, 3, 4, 5
These IDs can then be loaded if not cached, and cached appropriately.
SELECT id, name, colour FROM test WHERE id IN (1, 2, 3, 4, 5)
It is likely the next query will contain duplicate IDs.
SELECT id FROM table WHERE colour IN ('red')
= 1, 2, 3
And in this case it will turn out that we have already retrieved and cached objects #1, #2 and #3. In this particular case we can see that the obviated Gets query does not look that costly -- but what if 99% of Gets would be looking for object #2, and what if each object has many large fields?
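The whole sequence can be tried out with a small in-memory SQLite database. This is only a sketch of the idea: the `resolve` and `gets` functions, the sample rows and the module-level cache are assumptions made for the demonstration, not LPC code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "a", "red"), (2, "b", "red"), (3, "c", "red"),
     (4, "d", "blue"), (5, "e", "blue"), (6, "f", "green")],
)

cache = {}   # id -> row
loaded = []  # which IDs each gets() call actually had to fetch


def resolve(colours):
    """Relationship query: return matching primary keys only."""
    marks = ", ".join("?" for _ in colours)
    sql = "SELECT id FROM test WHERE colour IN (%s) ORDER BY id" % marks
    return [row[0] for row in conn.execute(sql, colours)]


def gets(ids):
    """Load uncached objects by primary key, then answer from the cache."""
    missing = [i for i in ids if i not in cache]
    if missing:
        marks = ", ".join("?" for _ in missing)
        sql = "SELECT id, name, colour FROM test WHERE id IN (%s)" % marks
        for row in conn.execute(sql, missing):
            cache[row[0]] = row
    loaded.append(missing)
    return [cache[i] for i in ids]


first = gets(resolve(["red", "blue"]))  # loads objects 1 to 5
second = gets(resolve(["red"]))         # objects 1 to 3 are already cached
```

The second search still costs one cheap ID-only query, but no object data crosses the wire for it.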
Managing relationships
Having established that domain objects can only be retrieved by a Get or Gets call, and that they can only be retrieved by a unique primary key, the next step is to define a search framework to resolve the relationship between terms like "colour IN ('red', 'blue')", as above, and a set of results like "1, 2, 3, 4, 5". It is also clear that, as well as caching domain objects, being able to cache relationship resolution queries and their answers would be beneficial.
SearchCriteria objects contain collections of SearchTerms in the LPC, allowing users to build simple queries as above, where "colour IS red OR blue", and more complex queries with multiple sort orders, fuzzy matching and layers of Boolean logic. SearchCriterias are also sensibly used to key the cache of resolved relationship queries.
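Keying a cache on the criteria themselves works as long as two equal sets of criteria compare and hash the same. A minimal sketch of that, using frozen dataclasses, might look like this. The field names and the `get_relationship_ids` helper are assumptions for illustration and are far simpler than the real SearchCriteria.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchTerm:
    field: str
    values: Tuple[str, ...]


@dataclass(frozen=True)
class SearchCriteria:
    terms: Tuple[SearchTerm, ...]
    order_by: Tuple[str, ...] = ()


relationship_cache = {}
resolver_calls = []  # records every cache miss


def get_relationship_ids(criteria, resolver):
    """Return the IDs for `criteria`, calling `resolver` only on a cache miss."""
    if criteria not in relationship_cache:
        resolver_calls.append(criteria)
        relationship_cache[criteria] = resolver(criteria)
    return relationship_cache[criteria]


red_or_blue = SearchCriteria((SearchTerm("colour", ("red", "blue")),))
same_again = SearchCriteria((SearchTerm("colour", ("red", "blue")),))

ids_first = get_relationship_ids(red_or_blue, lambda c: [1, 2, 3, 4, 5])
ids_second = get_relationship_ids(same_again, lambda c: [1, 2, 3, 4, 5])
```

The second lookup is built from scratch but equals the first, so it hits the cache and the resolver runs only once.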
When handed to the GetRelationship call, the criteria will be dynamically transformed and interpreted to suit the data provider being used. This allows for dynamic changes in data provider type, and portability -- the LPC can be moved from a MySQL back end to an Oracle one despite substantial SQL format differences, or even backed onto a set of XML files with a SQL to XQL transform. GetRelationship will return a Relationship object, which provides a primitive ordered list of IDs and accompanying lists of relevance scores (used when the underlying query engine can provide such information) and modification timestamps, as well as the SearchCriteria used to generate the relationship. The modification timestamps are used by the caching mechanism to assist with object expiry. If we can see in the relationship query that the object has been modified since it was last cached, we know the cache can be invalidated for that particular object.
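The timestamp check can be sketched in a few lines. The `Relationship` shape below is cut down to IDs and modification times, and both `invalidate_stale` and the `cached_at` field are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relationship:
    ids: List[int]
    modified: List[int]  # one modification timestamp per ID, same order


def invalidate_stale(cache, relationship):
    """Drop cached objects that have been modified since they were cached."""
    stale = []
    for obj_id, modified in zip(relationship.ids, relationship.modified):
        entry = cache.get(obj_id)
        if entry is not None and entry["cached_at"] < modified:
            del cache[obj_id]
            stale.append(obj_id)
    return stale


cache = {
    1: {"cached_at": 100, "obj": "object one"},
    2: {"cached_at": 100, "obj": "object two"},
}
relationship = Relationship(ids=[1, 2, 3], modified=[90, 150, 120])

stale_ids = invalidate_stale(cache, relationship)
```

Here object 2 was modified after it was cached, so it is evicted and will be reloaded by the next Gets. Object 1 is still fresh, and object 3 was never cached in the first place.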
Start to finish, retrieving domain objects from a query
Posted by Aidan, 26
Be cracked by software
Unlike the older iPhone, the 3G can't... yet. So in the meantime you probably need a SIM card adapter.
Easily sync with Google Apps
It can and it can't. The email works well, but synchronisation of calendars or contacts requires Nuevasync. Nuevasync is free, works well, and doesn't need to be trusted with any of your passwords.
Work nicely with push data
Yes, it will if you have a nice Exchange server, or if you use MobileMe. But using MobileMe with Google Applications isn't straightforward, and neither is migration between the two. Hopefully Google will introduce push support soon. Emoze offer a push service to sit on top of other email providers, but it's not a quick win.
Let you swap tasks
Surprisingly, you can't switch between applications. If you're browsing the web and need to dip into your contacts, the web browser gets closed. Fortunately when you re-open your browser it will remember where it was, but the process takes time and other applications don't work so well.
Include copy and paste
So you've been emailed a thirteen-digit tracking code for your Royal Mail parcel and want to track it online? You cannot copy the number from email and paste into your browser, and -- as above -- switching between the two involves closing the former. There is an application called iCopy which emulates copy and paste, but it involves sending your "clipboard" data in plaintext over the web. Top tip: carry around a notepad and pen with your iPhone 3G.
Run quickly
Whilst the scrolling is usually lag-free, there can be quite a delay loading applications. If you have a few hundred contacts, prepare to wait a few seconds before your iPhone starts responding when you press "Contacts".
Provide Open Source or free RDP and VNC support
It doesn't, not yet, though there are commercial implementations of both, and limited "free" versions of those.
Have native office applications
No, it doesn't have word processing, spreadsheets or presentations.
Have native chat applications
There are plenty of web based chat systems that work with or are built for the iPhone 3G. But do you want to put your chat account data into those systems? And remember, when you want to switch to another application on the iPhone, your web browser will be closed. MobileChat has been around for a while but it's not free and their servers have been falling over since launch.
Have a task list application
The best it'll do is get you into the iPhone version of RememberTheMilk. A native application would be preferable, as whilst the web solution looks lovely, parts are cryptic. After fifteen minutes of playing with it I think I can do everything apart from mark a task as completed.
Use OpenVPN
There's no native client yet, so the office can wait.
Let you play Angband, Moria or Nethack
Whilst there is a port of Nethack available for the iPhone 3G, the touchscreen interface doesn't allow for enough 1-touch commands. It's possible to play any of the games in a terminal through an SSH session, but there's still a problem.