I've been looking into writing a web app that will run on Google App Engine, but before I commit myself to the platform I'd like to know what, if any, limitations there are. I'm aware of the basic CPU/bandwidth restrictions that Google places on the free service, but I'm wondering more about development restrictions like how BigTable compares to a standard relational database and what Python libraries aren't available on the GAE platform (and what alternatives Google provides).
Basically I'm looking for any hidden roadblocks before I commit to the platform. Thanks for your help!
I think your biggest obstacle (but definitely not a limitation) is overcoming the relational mindset that has become mainstream in the industry. The relational model has its place in this world, but it doesn't solve all problem domains, and you can't depend on it in GAE.
What this means in more practical terms is that you will have to design/architect your applications differently. In several scenarios the norms you have been accustomed to will not apply. Perhaps one of the most obvious examples is that of shared counters. Anyway, all this is hard to see until you have concrete examples. For starters, however, I have compiled a playlist [1] of some of the most helpful App Engine sessions that were presented at Google I/O last year. I would advise you to watch them, as they do a great job of helping one understand how the platform works.
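To make the counter example a bit more concrete: a common way to handle a heavily updated counter on GAE is to split it across several entities (often called a sharded counter) and sum them on read. The sketch below only illustrates the idea in plain Python; the class name, the shard count, and the in-memory list standing in for datastore entities are all assumptions, not GAE API calls.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several datastore entities."""

    def __init__(self, num_shards=20):
        # Each list slot plays the role of one shard entity (hypothetical layout).
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a shard at random so concurrent writers rarely touch the same one.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Reading the counter means summing every shard.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())  # prints 1000
```

The trade-off is that writes get cheaper and less contended while reads get more expensive, which is the opposite of what a single row in a relational table would give you.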
Update: Just recently on Google App Engine's Blog they describe non-relational databases [2]. I think this will help to give more context.
[1] http://www.youtube.com/view_play_list?p=DA31F43DE4107B05
There are a lot of constraints in the google app engine and it is quite difficult to list them all.
A good start for you might be to check the list of most voted feature requests and figure out if your project can work without those: http://code.google.com/p/googleappengine/issues/list. There are other small issues related to the dashboard and inconsistencies between the SDK and the public application, but most of these can be avoided.
One thing that I have found quite annoying is that there is no 'blueprint' for building apps in GAE. Even for quite simple web apps, you'll find that there isn't a known path for deciding how best to structure your data to get good performance out of it, nor are there simple ways to profile an app and work out how to get around its bottlenecks.
Building an app deployable on GAE is a fundamental shift from the traditional way of writing apps and while it is exciting, I do think that it might require more effort on your side.
./alex
PS: I'm currently working on a GAE based project, that is deployed here: http://the.dailycloud.net
Having developed a couple of web applications with GAE, I would add:
There are a number of limitations, though it is a promising product.
The most ironic of them is that GAE didn't support text search in its database API.
I thought Google was all about search, and mostly text search.
UPDATE: at last Google is adding Full-text Search [1]
At last we are adding a full text search service to App Engine. The upcoming service will be built on top of the very infrastructure used by Google. In addition to full text search queries we will also offer numeric, geo, date search capabilities, and much more. This session will cover the basic full text search API, briefly outline more advanced features, and how full text search ties to existing services such as datastore.
[1] http://www.google.com/events/io/2011/sessions/full-text-search.html
I have identified/researched the following limitations so far.
Performance will surprise you. GAE is optimized for many tiny queries, and you get warned if a query takes any CPU time at all. You get 6.5 free hours per day (at last check), but it's a somewhat mystical number, so you should test against your own workload.
You'll find that time as you measure it doesn't relate to the CPU or datastore CPU time, because (for example) under the covers there might be multiple machines updating indexes during deletes/updates. Some users have found huge CPU usage when uploading bulk data - many hours of usage for e.g. 20 min of real time.
Your Java instance might need to be powered up if it hasn't been hit in (I think) 20 minutes. The benefit is that Google can pass the savings from this instance management on to you as lower costs, but it does mean you'll experience a short delay and see a high CPU warning on the first request after an idle period.
In many cases, Python datastore access is faster than Java JDO. You'll likely find that using the low-level API for Java is faster.
Some developers seem to have experienced more datastore errors than you would expect (around 0.4-1% maybe?). I haven't yet.
I explored Google AppEngine for my own amusement a while ago, making a lunch 8ball in the process.
http://blurry-lunch.appspot.com/
The system is very easy to work with, and to hit the ground running.
Limitations I saw in Bigtable mostly revolved around dataset size and access time. Part of the application I was making randomly chooses a location out of a list of locations. To do this, I pull out the complete list of locations and then select a random element in Python. The default index is a non-linear GUID, and I didn't bother to set up a separate attribute on the object, since finding the next available ID in the system isn't how the Datastore is designed to work.
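For what it's worth, one commonly suggested alternative to loading the whole list (not what I did above) is to store a random number on each record at write time and then look up the first record whose number is at or above a fresh random value. The sketch below simulates that in plain Python; the class, the `rand` property name, and the sorted list standing in for a datastore index are all hypothetical.

```python
import bisect
import random


class LocationIndex:
    """In-memory stand-in for a datastore kind with an indexed `rand` property."""

    def __init__(self):
        self._keys = []   # sorted random keys, playing the role of the index
        self._names = []  # location names, kept parallel to _keys

    def add(self, name):
        # Assign each record a random float once, at write time.
        key = random.random()
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._names.insert(pos, name)

    def pick_random(self):
        # Equivalent in spirit to "WHERE rand >= r ORDER BY rand LIMIT 1".
        if not self._names:
            raise LookupError("no locations stored")
        r = random.random()
        pos = bisect.bisect_left(self._keys, r)
        if pos == len(self._keys):
            pos = 0  # wrap around when r is above every stored key
        return self._names[pos]


index = LocationIndex()
for name in ["Deli", "Noodle bar", "Taco truck"]:
    index.add(name)
print(index.pick_random())
```

Note that the pick is only approximately uniform, because it depends on the gaps between the stored keys, but it touches one record instead of the entire list.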
The problem is, if you need linear access to a massive amount of records to perform an operation, you could run into a limit where your request takes too long, but what data set would cause such a delay isn't clearly defined, as Google's systems are massively clustered.
As far as external Python libraries go, you should be fine, as long as they are pure Python. You will have to bundle them yourself into the directory structure that gets uploaded for deployment.
You should be aware that you will be locking yourself into their platform, as there is no production-ready system that supports their API. They have a standalone appserver for debug purposes, but it certainly isn't suitable for an actual deployment.
Another thing is that GAE is still in beta, with no commercial support options, and you could not currently run a wildly successful web application without a commercial plan. The limits are too low to even survive a slashdotting for static content.
EDIT: Of course this is all as of Feb 19th, 2009. This could all change wildly, or GAE could even be turned off.
If you think you might move to a different hosting platform later, try to isolate the things that are unique to GAE (mostly storage and authentication) from your app's core logic; this way you'd be able to swap them out with a relatively small amount of work. And do not use their WebApp framework (it's really basic anyway); stick with widely adopted tools like Django [1] or some WSGI toolbox (e.g. Werkzeug [2]). Speaking of the limits: it's always desirable to make your app responsive, so careful coding to not hit timeout limits is a Good Thing™. :)
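A rough sketch of what that isolation might look like, assuming a tiny note-taking app: the core logic talks only to a small storage interface, and a platform-specific class implements it. All names here (`NoteStore`, `InMemoryNoteStore`, `shout_note`) are made up for illustration; a GAE-backed class would implement the same two methods using the datastore.

```python
from abc import ABC, abstractmethod


class NoteStore(ABC):
    """Storage interface the core logic depends on (hypothetical name)."""

    @abstractmethod
    def save(self, note_id, text):
        ...

    @abstractmethod
    def load(self, note_id):
        ...


class InMemoryNoteStore(NoteStore):
    """Stand-in backend; a GAE-backed class would implement the same methods."""

    def __init__(self):
        self._notes = {}

    def save(self, note_id, text):
        self._notes[note_id] = text

    def load(self, note_id):
        return self._notes.get(note_id)


def shout_note(store, note_id):
    # Core logic: knows only the NoteStore interface, not the platform.
    text = store.load(note_id)
    return text.upper() if text is not None else None


store = InMemoryNoteStore()
store.save("n1", "hello")
print(shout_note(store, "n1"))  # prints HELLO
```

Moving hosts then means writing one new `NoteStore` subclass rather than touching every handler.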
Other than that, I found developing for GAE really fun and entertaining, perhaps due to the exotic nature of the platform. ;)
[1] http://www.djangoproject.com/
There are many limitations that are not marketed by Google, but you'll hit them sooner or later ;). One of them was just posted at http://stackoverflow.com/questions/572780/cpython-internal-structures.
Many limitations can be overcome by changes in algorithms that will do your application good anyway. For instance, they recently raised the timeout for total request processing from 10 sec to 30 sec (you can spend that time in many ways that don't consume other resources; the simplest is querying an external system). Changes you make to your application to fit into 30 seconds will make your app better!
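As an illustration of the kind of algorithm change I mean, long jobs can be cut into slices that each stop before the deadline and report where to resume. This is only a sketch in plain Python: the function name and the 25-second default (headroom under the 30-second limit) are my assumptions, and how you persist the resume index between requests is up to you.

```python
import time


def process_in_budget(items, handle, start=0, budget_seconds=25.0):
    """Process items[start:] until the time budget runs out.

    Returns the index to resume from, or None when everything is done.
    """
    deadline = time.monotonic() + budget_seconds
    for index in range(start, len(items)):
        if time.monotonic() >= deadline:
            return index  # caller stores this and continues in a later request
        handle(items[index])
    return None


seen = []
resume_at = process_in_budget(list(range(10)), seen.append, budget_seconds=5.0)
print(resume_at, seen)
```

Each request does a bounded amount of work, which also tends to make the app more responsive in general.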
A similar approach applies to many of the other limitations. Try it, push it to its limits, and see if it fits you. Good luck!
In addition to the other answers, there is a maximum of 5000 indexed properties per entity (source: [1]), but it really seems to be a limit of 5000 indexed property values per entity.
I summed up more limits of the datastore here http://code.google.com/p/xydra/wiki/AppEngine
[1] http://www.youtube.com/watch?v=AgaL6NGpkB8&feature=player_embedded#
You are not limited to using Google's hosting of AppEngine. For example with AppScale [1] you can migrate to Amazon EC2 or your own colo servers.
When you think about it, the restrictions relative to standard Python are all to do with avoiding security holes. If your application really does require unavailable features, you could look at Amazon's new micro-instances, now free for the first year for the first one. With any good VPS you are in total control and could, for example, use NodeJS, which seems to be a very credible alternative.
[1] http://code.google.com/p/appscale/
Database: there are many issues with that; notably, you can't have joins in queries.
Portability: code written for GAE can't be ported anywhere else (unless you use Django or web2py, but they also have limitations on GAE).
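On the join point: since queries cannot join, related data usually ends up either denormalized or stitched together in application code. Here is a minimal sketch of the second option, using plain dicts as stand-ins for two entity kinds. All names and data are hypothetical, and the dict lookup stands in for a batch get by key.

```python
# Two "kinds" modelled as plain Python structures (hypothetical data).
authors = {"a1": {"name": "Ann"}, "a2": {"name": "Bob"}}
posts = [
    {"title": "First", "author_key": "a1"},
    {"title": "Second", "author_key": "a2"},
    {"title": "Third", "author_key": "a1"},
]


def posts_with_author_names(posts, authors):
    # Step 1: collect the distinct keys so each author is fetched once.
    wanted = {post["author_key"] for post in posts}
    # Step 2: one batch lookup by key (a dict lookup stands in for it here).
    fetched = {key: authors[key] for key in wanted if key in authors}
    # Step 3: stitch the results together in application code.
    return [
        (post["title"], fetched.get(post["author_key"], {}).get("name"))
        for post in posts
    ]


joined = posts_with_author_names(posts, authors)
print(joined)
```

This is more code than a SQL JOIN, which is part of why it counts as a limitation.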
I'd like to know why you want to use GAE rather than a proper database backed site.
I built a couple of test projects on GAE, and both times things went smoothly initially and then suddenly hit a stupid roadblock. Missing features are one problem; reliability is another.
Backend writes fail often enough to cause trouble, and the failures are well documented.
I scrapped my last GAE project when my site suddenly started serving 404 pages overnight. I hadn't changed anything and there are no logs for debugging. If it was a live production site, what would you do in those circumstances? You're basically stuffed.
One of the limitations that concerned me was the limit on updating entities. I cannot remember where I read it, and I will update the post if I find it.
The restriction is that you can only write to the same entity 5 times per second.
If someone has heard of this restriction and has a link please add as a comment.
Another limitation is support. There is an error at the moment which causes 500 errors randomly, and some apps need to upload their code again to get the app working (see the django-nonrelation Google group to see what I am talking about). There is no word on when this will be fixed.
Also, I once tried to run an app that integrated with Salesforce, and because Salesforce took more than 10 seconds to service the request, it failed (though that limit has since been raised to 30 s).