Monday, May 18, 2009

Until recently, I didn't know just how expensive the Xerces Java XML DOM parser was, in particular when instantiating a new parser, or rather, a new DocumentBuilder. There are all these ObjectFactory classes that, if a system property isn't set, scan the classpath for a file that contains the value they need. And, at least in the older versions, the value is not cached, so each time a new XML parser is needed, the lookup runs through the classpath again.

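That made for thread dumps where lots of threads were blocked on ZipEntry, which is no surprise with over 100 jar files in the classpath. I think setting a bunch of system properties such as javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.SAXParserFactory helped quite a bit, since the factory lookup checks the system property first and only falls back to the classpath scan when it isn't set.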
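Here is a rough sketch of what setting those properties can look like in code. It is not what I actually deployed, just an illustration: the class name PinXmlFactories and its methods are made up for this example, and whether it removes the repeated scan depends on the JAXP and Xerces versions in use. The idea is to do the expensive lookup once at startup, then record the implementation class name in the system property so later newInstance() calls find it there first.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper (name and structure are assumptions for illustration).
public final class PinXmlFactories {
    private PinXmlFactories() {}

    // Call once at application startup, before parsers are created in bulk.
    public static void pin() {
        // Each newInstance() here pays the normal lookup cost one time;
        // the resulting implementation class name is then pinned.
        pinIfUnset("javax.xml.parsers.DocumentBuilderFactory",
                DocumentBuilderFactory.newInstance().getClass().getName());
        pinIfUnset("javax.xml.parsers.SAXParserFactory",
                SAXParserFactory.newInstance().getClass().getName());
    }

    // Leaves any value already supplied (for example via -D) untouched.
    private static void pinIfUnset(String key, String implClassName) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, implClassName);
        }
    }

    public static void main(String[] args) throws Exception {
        pin();
        // Later lookups resolve the factory from the system property.
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
        System.out.println(System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
    }
}
```

The same effect can be had without code by passing -Djavax.xml.parsers.DocumentBuilderFactory=<implementation class name> on the java command line, if you already know which implementation you want.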

No wonder JSON has gained so much popularity.

Before learning all this, I really hated all the code that built XML by hand, because I would always get assigned bugs due to <, >, or & causing problems. (Once, when I filed a bug on such an issue, someone "fixed" it by doing URLEncoder.encode(). Wrong: that produces URL percent-encoding, not XML escaping. I fixed that fix.) I used to think it would be better to just create a DOM Document, and then serialize that out. Now, I'm not so sure.
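For the build-it-by-hand route, the minimum is to replace the XML special characters with their predefined entities before concatenating. The sketch below is only an illustration (the XmlEscape class is a made-up name, not code from the project in question), and it does not deal with control characters that are illegal in XML.

```java
// Hypothetical helper (name is an assumption for illustration).
public final class XmlEscape {
    private XmlEscape() {}

    // Replaces the five characters that have predefined XML entities.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <name>a &lt; b &amp;&amp; c &gt; d</name>
        System.out.println("<name>" + escape("a < b && c > d") + "</name>");
    }
}
```

URLEncoder.encode() would instead turn < into %3C and spaces into +, which an XML parser passes through as literal text, so the reader of the document gets mangled data rather than the original string.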
