Monday, November 29, 2010

For storing objects in a database, I chose to serialize the objects in a way that maintains compatibility between object versions, which ruled out Java serialization. My first implementation serialized to JSON using Jackson, which was simple because it could serialize and deserialize the objects that I was already using. However, there were more compact serialization schemes. Apache Thrift and Google Protocol Buffers looked fairly equivalent in functionality. Both require generating code from an IDL (interface definition language), making them less convenient than JSON. I chose Protocol Buffers since its serialized output seemed to be slightly smaller than that of Thrift.

I wanted to keep using the objects that I was using before, since they include annotations and logic that can't be cleanly added to the protobuf-generated code. So I used java.beans.Introspector to extract the property values from the objects to be stored in the database and put them into the protobuf objects, and vice versa when reading them back.
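The property copying can be sketched roughly as follows. This is only an illustration, not the code from the project: the Player bean and the BeanValues helper are hypothetical names, and a plain Map stands in for the protobuf-generated builder so that the example needs nothing beyond the JDK.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class BeanValues {
    // Hypothetical domain object standing in for the annotated classes.
    public static class Player {
        private String name;
        private int score;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getScore() { return score; }
        public void setScore(int score) { this.score = score; }
    }

    /** Reads every readable bean property into a name -> value map. */
    static Map<String, Object> extract(Object bean) {
        Map<String, Object> values = new TreeMap<>();
        try {
            // Stopping at Object.class skips the "class" pseudo-property.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    values.put(pd.getName(), getter.invoke(bean));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    /** Writes map entries back through the matching setters. */
    static void apply(Map<String, Object> values, Object bean) {
        try {
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method setter = pd.getWriteMethod();
                if (setter != null && values.containsKey(pd.getName())) {
                    setter.invoke(bean, values.get(pd.getName()));
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Player p = new Player();
        p.setName("ann");
        p.setScore(7);
        System.out.println(extract(p));
    }
}
```

With real protobuf classes, the map operations would be replaced by calls on the generated builder and message, which is the part that depends on the schema.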

Monday, November 22, 2010

AWS (Amazon Web Services) is pretty nice. It not only has what's needed to deploy and scale a web service, but everything can be controlled through their REST (or SOAP) API, so that everything can be automated.

One thing that should have been obvious was how a service would start when an instance starts. The first time, I created an instance from a prebuilt image, copied the code to it, then logged in and started it manually. Eventually, I created an image that included all the code. I finally figured out that I could start it in /etc/rc.local.

The next thing I needed to figure out was how to update the software. Everything would be in a single war file, except for static assets pushed up to the CloudFront CDN (Content Delivery Network), but automatically building a new image from an existing image with the war file replaced looked difficult. I could do it manually by starting an instance, replacing the war file, and then creating a new image, and I could automate that manual process. Another way seemed to involve turning the image into a loopback filesystem, replacing the war file in the filesystem, turning the filesystem back into an image, and uploading the image. Finally, I figured out that I could just upload the war file to Amazon S3 (Simple Storage Service) and have the instance download it from S3 when it starts up, which makes creating new images unnecessary.

However, the process of getting a working image was slow and tedious, since starting and stopping instances, and creating images were all really slow. Once I had an image that worked, setting up load balancing was trivially easy. Setting up Auto Scaling also looks very easy, once I figure out what metrics to use for launching and terminating instances.

Monday, November 15, 2010

JAX-RS is really nice for writing REST services. I'm using Jersey 1.3, with Guice 2.0 and jersey-guice to configure things. However, the Jersey JSON provider is horrible. It messes up writing empty and single element arrays, and it writes boolean and numerical values as strings. My first hack around that was to return a JSON string instead of the objects, doing the conversions in-line using Jackson 1.5. That was ugly, so I overwrote META-INF/services/javax.ws.rs.ext.MessageBodyReader and META-INF/services/javax.ws.rs.ext.MessageBodyWriter in the Jersey jar, replacing the Jersey JSON providers with org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider, and it was much better.

I also use the Jersey REST client to talk to Facebook. However, Facebook returns text/javascript;charset=UTF-8 as the content-type, which is not recognized by either the Jersey JSON provider, or the Jackson JAX-RS JSON provider. Again, the first thing I did was to get the content as an InputStream, which I sent to Jackson for object binding. Then, I figured that I could just extend the Jackson JSON provider to accept text/javascript and use that class as the JSON provider.

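import javax.ws.rs.Consumes;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.ext.Provider;
import org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider;

// Jackson JSON provider that also accepts Facebook's text/javascript responses.
@Provider
@Consumes({MediaType.APPLICATION_JSON, "text/json", "text/javascript"})
@Produces({MediaType.APPLICATION_JSON, "text/json"})
public class MyJaxbJsonProvider extends JacksonJaxbJsonProvider {
    @Override
    protected boolean isJsonType(MediaType mediaType) {
        // Treat any */javascript media type as JSON; defer to Jackson otherwise.
        if (mediaType != null && "javascript".equals(mediaType.getSubtype())) {
            return true;
        }
        return super.isJsonType(mediaType);
    }
}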

Now, the code no longer has ugly hacks for JSON. However, any time I get a new Jersey jar, I'll have to overwrite the provider list in the jar, which is ugly. Perhaps I could put my own javax.ws.rs.ext.MessageBodyReader and javax.ws.rs.ext.MessageBodyWriter in a jar, and have it take precedence, but it didn't work the first time I tried it. It might depend on the classpath order of the jars, and having to use the classpath order of jar files to override things is also ugly.

Also on JSON, RFC 4627 says that slashes may be backslash-escaped, though the example in section 8 does not escape its slashes. The JSON from Facebook does backslash-escape the slashes, but the JSON produced by Jackson does not. I wonder why backslash-escaped slashes were explicitly included in the JSON specification. Perhaps it was inherited from JavaScript, in which case I wonder why it was that way in JavaScript. My only guess is that it had something to do with regular expressions, if it was from JavaScript. Otherwise, it just seems weird.
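To make the equivalence concrete, here is a minimal sketch of decoding the string escapes listed in RFC 4627 section 2.5, written without a JSON library. The class and method names are made up for the illustration, and the input is the text between the quotes of a JSON string rather than a whole JSON document. Both the escaped form that Facebook sends and the plain form that Jackson writes decode to the same Java string.

```java
class JsonSlash {
    /** Decodes the escape sequences allowed inside a JSON string body. */
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(++i);
            switch (e) {
                // A quote, backslash or slash after a backslash stands for itself.
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    if (i + 4 >= body.length()) {
                        throw new IllegalArgumentException("short \\u escape");
                    }
                    // Four hex digits name one UTF-16 code unit.
                    out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("bad escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "http:\\/\\/example.com\\/a"; // as Facebook would send it
        String plain = "http://example.com/a";        // as Jackson would write it
        System.out.println(unescape(escaped).equals(unescape(plain)));
    }
}
```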

Monday, November 8, 2010

For one reason or another, I have to access some datacenter hosts under a generic developer account that has no write access to anything except /tmp. One thing that bugs me about those hosts is the long delay while xauth tries to lock $HOME/.Xauthority. I've explicitly unset DISPLAY, but that makes no difference.

Monday, November 1, 2010

My initial attempt at making an iframe Facebook app immediately ran into a snag. It seems that if the Safari browser's cookie configuration is "Only from sites I visit", subtitled "Block cookies from third parties and advertisers.", then cookies don't get saved or sent to the iframe, which means saving the session id in a cookie fails. Having sessions is critical, but I don't know if other browsers have this problem, or whether it's acceptable to fail for users that block iframe cookies (or all cookies). The session id could be put in the URL as a matrix parameter with a simple configuration of the servlet container, but it's painful to have to make sure all the URLs that need the session id are rewritten. Plus, the session id would get leaked in Referer headers.
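For illustration, this is roughly what the rewriting does to a URL. In a servlet container the rewriting is normally done by calling HttpServletResponse.encodeURL on each link; the helper below is a hypothetical, JDK-only stand-in that shows where the jsessionid matrix parameter ends up, namely after the path and before any query string or fragment.

```java
class SessionUrls {
    /**
     * Appends ";jsessionid=<id>" to the path part of a URL, keeping any
     * query string or fragment after it. Illustrative only; a container's
     * encodeURL also decides whether rewriting is needed at all.
     */
    static String withSessionId(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // "abc123" is a made-up session id.
        System.out.println(withSessionId("/app/list?page=2", "abc123"));
    }
}
```

Every link, form action, and redirect that should stay in the session has to go through this kind of rewriting, which is why missing one is easy.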

Once I made sure all the URLs had the proper rewriting, it still didn't work. Although Glassfish v3 (74.2) rewrites all the URLs, it never recognizes the session id in the URL, and creates a new session instead. When I switched to Tomcat, it worked with cookies disabled.