1. Dynamic instances and processing time
This Google App Engine feature came to me as a surprise, though it makes perfect sense: your site is slow if it has low traffic.
Google App Engine runs Python code on instances. By default, instances are dynamic: they are shut down if they do not get enough traffic (requests per minute). Thus, when you only get an individual hit now and then, App Engine must start a new instance for each hit.
When this happens, you see the following in the App Engine console logs for every request while traffic is low:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
It is not always OK to add a 500 to 2000 millisecond processing delay on top of the normal processing time. Google’s own recommendation was that each page should be served within 200 milliseconds.
There are three ways to mitigate this issue:
- Use the App Engine premium feature “Always on” ($0.30 / day), which keeps your instance always running
- Use a cron job or similar to keep your instance alive (polling once a minute seems to do the job)
- Optimize your imports and split your code into several modules with few imports each, so that start-up is fast (modules are imported only once)
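The third option can be sketched as follows. This is a minimal illustration, not code from our application: `render_report` is a hypothetical handler helper, and the standard `json` module stands in for whatever heavy dependency your own code pulls in. The point is only that an import placed inside a function runs on the first call instead of at instance start-up.

```python
def render_report(data):
    """Hypothetical helper that needs a 'heavy' module only occasionally."""
    # Deferred import: executed the first time this function is called,
    # not when the instance loads the main module. Python caches the
    # module afterwards, so later calls do not pay the import cost again.
    import json  # stand-in for a heavy dependency (assumption)

    return json.dumps(data, sort_keys=True)


def front_page():
    """Hypothetical cheap handler that never touches the heavy module."""
    return "Hello"
```

Requests served by `front_page` never trigger the deferred import, so a freshly started instance can answer them without first loading the heavy module. Whether this helps in practice depends on which handlers receive your first request after a cold start.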
We are using Zabbix to monitor our sites (sidenote: I don’t recommend Zabbix as a first choice of monitoring software, as it is very difficult to use and has a bad user experience, alienating both sysadmins and developers). This is what we had before the optimizations, when App Engine was starting a new process for every request:
… and this is the output we got after the optimizations:
Here is the corresponding diagram, after the optimizations, from the App Engine dashboard itself. These processing times are without network latency. As far as I know, Google does not expose the endpoints of App Engine hosting, so you don’t know from which part of the world your responses come. By comparing this diagram to the diagram above, you can see how Internet traffic affects your App Engine application.
2. The PITA of dying instances
For some reason, App Engine instances sometimes misbehave. This causes HTTP requests to die ungracefully.
Normally this is not a problem, as you only lose a few page loads now and then. People are used to “Internet grade” service and can hit the refresh button if they have problems opening a page.
However, if you are monitoring your site and the site raises an unnecessary alarm in the middle of the night, waking up your bastard operator from Hell, he will be very angry the next morning and tell you to migrate the crappy software from unreliable Python / App Engine to more reliable PHP servers 🙁
This is what you see in App Engine logs:
A serious problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
After digging deeper, you see that the problem is in instantiating a new object in the database, which exceeds the 30-second hard limit for processing an HTTP request:
2011-03-09 05:06:20.794 / 500 30094ms 86cpu_ms 40api_cpu_ms 0kb Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727),gzip(gfe),gzip(gfe),gzip(gfe)
<class 'google.appengine.runtime.DeadlineExceededError'>:
Traceback (most recent call last):
  File "/base/data/home/apps/mfabrikkampagne/1.347249742610459821/main.py", line 494, in main
    run_wsgi_app(application)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 97, in run_wsgi_app
    run_bare_wsgi_app(add_wsgi_middleware(application))
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 115, in run_bare_wsgi_app
    result = application(env, _start_response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 515, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/mfabrikkampagne/1.347249742610459821/main.py", line 296, in get
    try: self.session = Session()
So it looks like there is a temporary hiccup in Google App Engine’s Datastore (Bigtable?). In the example above the error comes from the gaeutilities Session model, but it could be any other model.
The best way to handle this situation is to adjust your monitoring software, Zabbix in our case. Zabbix allows you to configure triggers so that they don’t raise an alarm on every bad item state change. Instead, you can use the min() function and trigger the alarm only after the trigger condition has failed every time during a monitoring period. Just make sure that the trigger period is at least twice as long as the update interval of your web scenario: this way Zabbix logs at least two item values and tolerates one of them being a failure.
For example, if
- The update interval of the web scenario is 60 seconds
- Then the trigger function must check for a minimum failure value of 1 over 2*60 seconds + some buffer = 150 seconds:
{xxx.fi:web.test.fail[de.mfabrik.com].min(150)}=1
This will allow one failed response before triggering the alarm.
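To see why this tolerates a single failure, here is a small Python sketch of the idea behind the trigger. It is a simplified model, not Zabbix's implementation: `trigger_fires` is a hypothetical name, and each sample is assumed to be a `(timestamp_in_seconds, value)` pair where the value is 1 for a failed check and 0 for a successful one.

```python
def trigger_fires(samples, now, window):
    """Return True if every sample inside the window is a failure.

    samples: list of (timestamp_seconds, value), value 1 = failed, 0 = OK
             (a simplification assumed for this sketch)
    now:     current time in seconds
    window:  length of the evaluation period in seconds (150 in the text)
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    # The minimum over the window is 1 only if no check succeeded.
    return bool(recent) and min(recent) == 1


# Checks arrive every 60 seconds, the window is 150 seconds.
isolated_failure = [(0, 0), (60, 1), (120, 0)]
real_outage = [(0, 1), (60, 1), (120, 1)]

print(trigger_fires(isolated_failure, now=120, window=150))  # False
print(trigger_fires(real_outage, now=120, window=150))       # True
```

A single dying instance produces one failed sample surrounded by successful ones, so the minimum stays at 0 and nobody gets woken up. Only when every check in the window fails does the alarm go off.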