Monday, August 31, 2009

I got assigned a bug today because of my own stupidity. Originally it was a new feature that required restructuring some code. I did the restructuring, but then I marked the task complete and forgot to hook up the final piece that enabled the new feature.

More specifically, there was an interface that permitted the submission of multiple items. The code accepting the items would then put each item individually into a queue. The dequeue processor would then pass the item to a handler.

The new requirement was that one of the handlers had to be able to process multiple items at once. Another new requirement was that a single submission with multiple items results in a single billing record.

So the queue had to be modified to handle multiple items in a single submission, and so did the handler interface. The existing handlers were adapted to the new interface by adding a multiple-item method to the base class. That method looped over the items and called the original single-item method for each one.
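Here is a minimal Java sketch of the base-class adapter described above. The names (Item, Handler, handle) are hypothetical and don't come from the actual code. Existing handlers only implement the single-item method. The new batch method in the base class loops over the items and delegates to it. The handler that motivated the change overrides the batch method instead.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Item, Handler, LoggingHandler, and BatchHandler are hypothetical names.
public class HandlerSketch {
    public static final class Item {
        private final String id;
        public Item(String id) { this.id = id; }
        public String id() { return id; }
    }

    public abstract static class Handler {
        // Original single-item signature that the existing handlers already implement.
        protected abstract void handle(Item item);

        // New multiple-item signature; the default loops and delegates to the
        // single-item method, so existing handlers work unchanged.
        public void handle(List<Item> items) {
            for (Item item : items) {
                handle(item);
            }
        }
    }

    // An existing handler that only knows about single items.
    public static class LoggingHandler extends Handler {
        public final List<String> seen = new ArrayList<>();
        @Override protected void handle(Item item) { seen.add(item.id()); }
    }

    // The handler that motivated the change: it processes the whole submission at once,
    // e.g. so that one submission produces one billing record.
    public static class BatchHandler extends Handler {
        public int batches = 0;
        @Override protected void handle(Item item) { handle(List.of(item)); }
        @Override public void handle(List<Item> items) { batches++; }
    }
}
```

Handlers that never override the batch method still work, one item at a time. That is why testing them doesn't reveal whether the batch handler was actually finished.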

So after all that, I tested some handlers to make sure they worked as they did before, and they were fine. I just forgot to finish changing the one handler that was supposed to process multiple items at once, which was the original motivation for all these changes.

Of course, there's the matter of error handling with multiple items. It was decided that if processing any item failed, the entire submission would be considered to have failed, even if some items had already been processed successfully. I suspect this will eventually be unsatisfactory and will have to be changed, which will probably require more extensive restructuring of the code. I don't know which problem with it will turn out to matter most, so I'm not going to do anything about it until there are new requirements.
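Here is a minimal sketch of the all-or-nothing policy. It assumes a hypothetical process method that takes the items and a per-item handler, so it isn't the actual code. Nothing is rolled back: items handled before the failure stay handled, and that is the part I expect to become a problem.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "any item failure fails the whole submission".
public class SubmissionProcessor {
    public enum Status { SUCCEEDED, FAILED }

    // Processes items in order. The first exception marks the entire submission as failed,
    // even though earlier items may already have been processed (they are not undone).
    public static <T> Status process(List<T> items, Consumer<T> handler) {
        for (T item : items) {
            try {
                handler.accept(item);
            } catch (RuntimeException e) {
                return Status.FAILED;
            }
        }
        return Status.SUCCEEDED;
    }
}
```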

Friday, August 28, 2009

One of my tendencies is to code subexpressions inline if they are only used once. This makes the code less readable, because the lines get longer and the expressions more complicated. However, I do it because then I don't have to invent a name for the subexpression. Furthermore, in Java, if the type of the subexpression isn't already imported, then either the type has to be imported or the fully qualified class name has to appear in the declaration.

Decent type inference would push me towards naming the subexpression rather than coding it inline. But I would still have to come up with a name for it, so my expressions would still tend to be a little more complicated than those written by most others.
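To make the trade-off concrete, here is a hypothetical Java sketch of the same computation written three ways (the method names are made up). The third version uses local variable type inference with var, which Java only gained later, in Java 10. With var, the declaration no longer needs the type spelled out or imported, and only the naming problem remains.

```java
import java.util.List;

public class SubexpressionStyles {
    // Inline: no name to invent and no import for Stream, at the cost of a denser expression.
    static long inline(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).map(String::toUpperCase).distinct().count();
    }

    // Named, explicit type: Stream isn't imported, so the declaration needs the fully qualified name.
    static long named(List<String> words) {
        java.util.stream.Stream<String> longWords = words.stream().filter(w -> w.length() > 3);
        return longWords.map(String::toUpperCase).distinct().count();
    }

    // Named, inferred type (var, Java 10+): no type to spell out or import; only the name remains.
    static long inferred(List<String> words) {
        var longWords = words.stream().filter(w -> w.length() > 3);
        return longWords.map(String::toUpperCase).distinct().count();
    }
}
```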

Wednesday, August 26, 2009

One issue that probably bothers me more than it should is returning mutable objects out of a cache. I probably should just return a copy and not worry about it, even though I won't be modifying it. But it bothers me that all those copies get made. On the other hand, when I do find a need to modify one of those objects, I won't have to worry about remembering to make a copy then.
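Here is a minimal sketch of a hypothetical cache holding mutable lists. It shows the copy-on-read approach next to a read-only view via Collections.unmodifiableList. The view is an alternative that avoids the copies, but it pushes the copying onto whichever caller eventually needs to modify the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache illustrating two ways to hand out cached mutable objects.
public class ListCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // Copy on the way in too, so the caller can't later mutate the cached entry.
    public void put(String key, List<String> value) {
        cache.put(key, new ArrayList<>(value));
    }

    // Option 1: return a copy. One allocation per call, but callers may freely modify the result.
    public List<String> getCopy(String key) {
        List<String> value = cache.get(key);
        return value == null ? null : new ArrayList<>(value);
    }

    // Option 2: return a read-only view. No copy; attempts to modify it throw
    // UnsupportedOperationException, so a caller that needs changes must copy explicitly.
    public List<String> getView(String key) {
        List<String> value = cache.get(key);
        return value == null ? null : Collections.unmodifiableList(value);
    }
}
```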

One of the great things about Java is that Strings are immutable. And if I were worried about malicious code, the String class is final. So, at least with String, Integer, and the other primitive wrapper classes, I don't have to worry about weird things happening. Of course, the Java platform has elaborate security management to support applets, RMI, and other mechanisms for running untrusted code.

However, when designing and implementing an API, how much should be done to protect against malicious usage? I don't think it's worth taking steps against malicious API usage, but it is worth thinking about. It's valuable to be able to handle naive usage that might take the same path and that otherwise wouldn't be anticipated.

Monday, August 24, 2009

One former coworker liked to add pointless parameter validation to methods. For example, a lookup method started like this:

if (name == null || name.trim().length() == 0) {
    log("some verbiage");
    return null;
}

I'd prefer that the code just not be there at all. If null is passed in and causes a NullPointerException, there's probably something wrong with the code passing it in. The exception's stack trace would help track down that problem more than the verbiage in the log statement would. There are situations where a null argument should be expected and should return null, but this was not one of them.

The check for an empty or whitespace-only string was really pointless, since the lookup would have returned null anyway for anything that wasn't in the table. In any case, the code calling the lookup method would either check the result and log something if it was null, or just use the result and cause a NullPointerException. Either way, the end result would be correct, though the NullPointerException made the logs uglier.
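Here is a sketch of the lookup without the validation. It assumes a hypothetical table backed by ConcurrentHashMap, which rejects null keys with a NullPointerException. Blank names simply miss like any other unknown name, and a null name fails immediately with a stack trace pointing at the caller. With a plain HashMap, get(null) quietly returns null instead, so whether a null argument throws depends on the table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LookupExample {
    // Hypothetical lookup table; ConcurrentHashMap throws NullPointerException for null keys.
    private final Map<String, Integer> table = new ConcurrentHashMap<>();

    public LookupExample() {
        table.put("alice", 1);
        table.put("bob", 2);
    }

    // No validation: "" and "   " aren't in the table, so they return null like any other miss.
    public Integer lookup(String name) {
        return table.get(name);
    }
}
```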

It wasn't as if the calls were part of an API to be delivered externally, or even to another internal team. And even if they had been, the checking would still have been mostly pointless.

Friday, August 21, 2009

Some XML files contain values that require additional parsing, instead of using XML to structure that data. I suppose it's because something like

<values>1,2,3</values>

is more compact and more friendly for human editing than

<values><value>1</value><value>2</value><value>3</value></values>
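Here is a minimal sketch using the JDK's built-in DOM parser (the class and method names are hypothetical). With the compact form, the XML parser only hands back the text "1,2,3", so a second, ad hoc parse is still needed. The structured form comes out of the parser already split.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ValuesParsing {
    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Compact form: XML only delimits the text, so the comma-separated list is parsed by hand.
    public static List<Integer> fromCompact(String xml) throws Exception {
        String text = parse(xml).getDocumentElement().getTextContent();
        List<Integer> result = new ArrayList<>();
        for (String part : text.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }

    // Structured form: the XML parser has already separated the values into elements.
    public static List<Integer> fromStructured(String xml) throws Exception {
        NodeList nodes = parse(xml).getElementsByTagName("value");
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(Integer.parseInt(nodes.item(i).getTextContent().trim()));
        }
        return result;
    }
}
```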

The worst example is SVG. SVG seems designed to be compact, using many single-letter element and attribute names, but it also had to be XML, which is not compact. It also has data, such as path and transform values, with its own syntax. If that data were instead structured with XML, the size of SVG documents would explode. My conclusion is that XML is being shoehorned into places where it's just awkward. That's probably because there are lots of XML tools and expertise, and no obvious alternative.
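SVG path data is a small language of its own. Here is a simplified Java tokenizer for it (a hypothetical helper, not part of any SVG library). In path data, separators are optional, so "M10,20 L30-40z" is valid and the minus sign alone separates 30 from -40. The sketch only splits out command letters and numbers. It silently skips anything else and doesn't check the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified tokenizer for SVG path data: command letters, and numbers with an optional
// sign, fraction, and exponent. Whitespace, commas, and anything unrecognized are skipped.
public class PathTokenizer {
    private static final Pattern TOKEN = Pattern.compile(
            "[MmLlHhVvCcSsQqTtAaZz]|[-+]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][-+]?\\d+)?");

    public static List<String> tokenize(String d) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(d);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```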

Wednesday, August 19, 2009

One misconception that I've seen a few times while browsing around is the idea that function calls (and, with them, circular data structures) are JSON. They are JavaScript, and they work in browsers when using eval. But they won't work with JSON parsers, which are being built into the latest browsers and are faster and safer than eval.

Monday, August 17, 2009

I just recently posted about rarely seeing multithreading problems, and now I get called into work in the evening. Users were reporting that they could see other users' data, starting just after a new build went out to production. I had never looked at this code before, and the lowest-level code I was initially pointed to looked fine, so I had to grep around to see what was calling what. In a class three calls up, there was a suspicious-looking static declaration of a User object. According to the history in source control, it had been changed to static for no apparent reason. There wasn't even a stupid reason, such as using it in a static method.

Friday, August 14, 2009

A few years ago, while dumping out some Java class files, I was thinking that it would be nice if the Javadoc were embedded in the class files. That way, tools like javap (or IDEs) could be much more useful when given some jar file. With JDK 1.5 and annotations, such things do get embedded into class files. Why not have an option to compile Javadoc into class files as annotations?
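Annotations can already carry this kind of information. Here is a sketch with a hypothetical @Doc annotation (not a real JDK annotation) standing in for compiled Javadoc. With RUNTIME retention, the text is stored in the class file's RuntimeVisibleAnnotations attribute, where javap -v will list it. It can also be read back with reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class DocAnnotationSketch {
    // Hypothetical annotation standing in for compiled-in Javadoc.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    public @interface Doc {
        String value();
    }

    @Doc("Looks up a user by name; returns null if there is no such user.")
    public static String lookup(String name) {
        return null;
    }

    // Reads the documentation text back out of the compiled class, or null if absent.
    public static String docFor(String methodName, Class<?>... params) throws NoSuchMethodException {
        Method m = DocAnnotationSketch.class.getMethod(methodName, params);
        Doc doc = m.getAnnotation(Doc.class);
        return doc == null ? null : doc.value();
    }
}
```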

Wednesday, August 12, 2009

I'm surprised that I only recently made this realization: Java arrays are just a special case of a generic class. Of course, since arrays have been part of the JVM from the beginning, they get special treatment. The absence of type erasure and covariance are the major examples. I'm surprised that I didn't make this connection sooner, because in some other languages, arrays are not conceptually different from other generic types. In Standard ML, for example, 'a array is just another type with a single type parameter.
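Here is a small sketch of that special treatment (the class and method names are mine). Arrays are covariant, so the compiler accepts storing an Integer through an Object[] reference to a String[]. The runtime store check then throws ArrayStoreException. Arrays also keep their element type at runtime, while a List<String> and a List<Integer> are the same class after erasure.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayVsGeneric {
    // Covariance: String[] is a subtype of Object[], so this compiles...
    static boolean storeIntoCovariantArray() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // ...but every store is checked at runtime
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    // No erasure: the array's element type is still known at runtime.
    static Class<?> componentType() {
        return new String[0].getClass().getComponentType();
    }

    // Erasure: both lists are plain ArrayList at runtime.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }
}
```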

Monday, August 10, 2009

Most of the JavaScript examples of setTimeout I've seen use a string for the first argument,

setTimeout("do some stuff", 100);

where the string contains JavaScript to be interpreted. I prefer using a function for the first argument,

setTimeout(function() { do some stuff }, 100);

I would imagine that the function is more efficient, but perhaps the string form is so commonly used that JavaScript implementations have heavily optimized it. I think the function is cleaner, and in principle it allows syntax checking before it runs. If parameters are involved in building the string, script injection attacks become possible. The function also has lexical scoping. The string, by contrast, is evaluated in the global scope, so it can't see the caller's local variables.
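Java has no string-evaluating equivalent of setTimeout, but the function form has a close analogue. Here is a hypothetical sketch using ScheduledExecutorService. The lambda is compiled and type-checked along with the rest of the code, and it captures the local variable greeting lexically. Since no code string is ever built, there's nothing to inject into.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DelayedTask {
    // Runs a lambda after a delay, like passing a function to setTimeout.
    public static String greetLater(String name, long delayMillis) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            String greeting = "Hello, " + name; // captured by the lambda below
            ScheduledFuture<String> future =
                    scheduler.schedule(() -> greeting + "!", delayMillis, TimeUnit.MILLISECONDS);
            return future.get(); // waits for the delayed task to finish
        } finally {
            scheduler.shutdown();
        }
    }
}
```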

Friday, August 7, 2009

Multithreaded code can be a source of pitfalls, but I've only rarely seen problems with threading. That's mostly because the vast majority of code doesn't have to deal with or be aware of multiple threads, and the code that does is generally written by people who know what they're doing.

The worst example I've seen was some code using a static variable to build a string. I'm sure it worked fine when it was first tested. Under more load, though, it generated these gigantic strings that mixed information from multiple transactions.
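Here is a minimal sketch of that kind of bug and its fix. It assumes a hypothetical message-building method and isn't the actual code.

```java
public class MessageBuilder {
    // The pattern described above: one builder shared by every call on every thread.
    // It's reset at the start of each call, so a single-threaded test passes, but under
    // load concurrent calls append into the same buffer and mix their transactions.
    // (StringBuilder isn't thread-safe at all; StringBuffer would only make each append atomic.)
    private static final StringBuilder SHARED = new StringBuilder();

    public static String buildShared(String user, String action) {
        SHARED.setLength(0);
        SHARED.append(user).append(": ").append(action).append('\n');
        return SHARED.toString();
    }

    // Fix: a local builder, so each call (and therefore each thread) has its own buffer.
    public static String build(String user, String action) {
        StringBuilder sb = new StringBuilder();
        sb.append(user).append(": ").append(action).append('\n');
        return sb.toString();
    }
}
```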

In some code by someone who I assume generally knew what he was doing, there was a comment saying that synchronization wasn't needed because the JVM does operations on longs atomically. I didn't change anything, but I'm pretty sure the JVM specification explicitly does not guarantee that reads and writes of non-volatile longs are atomic. Even if they were, an increment is a separate read and write. I suppose individual reads and writes could be atomic in practice on modern 64-bit systems.
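Here is a sketch of the distinction, with hypothetical names. However individual reads and writes behave, count++ is a read-modify-write, so the comment's reasoning doesn't cover increments. AtomicLong from java.util.concurrent.atomic makes the whole increment atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

public class Counters {
    // Unsafe: plainCount++ reads, adds, and writes. Even if each 64-bit read and write
    // happened to be atomic, two threads can interleave and lose increments.
    private long plainCount = 0;

    // Declaring the field volatile would make single reads and writes of the long atomic,
    // but ++ would still be a non-atomic read-modify-write.

    // AtomicLong performs the whole read-modify-write atomically, without explicit locking.
    private final AtomicLong atomicCount = new AtomicLong();

    public void incrementPlain() { plainCount++; }
    public void incrementAtomic() { atomicCount.incrementAndGet(); }

    public long plain() { return plainCount; }
    public long atomic() { return atomicCount.get(); }
}
```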

About 10 years ago, one guy said that he made some methods synchronized because he couldn't figure out what some problem was and thought that making the methods synchronized might help. I don't think it did, though.

Wednesday, August 5, 2009

A minor case of code that, to me, is more complicated and less correct than it needs to be is the validation of email address formats where I work. It's pretty much always done with a regular expression, but the JavaMail API is always available too. javax.mail.internet.InternetAddress.parse() will throw a javax.mail.internet.AddressException for an invalidly formatted email address, so why are regular expressions always used? I'm guessing that it's ignorance and coding by copying. It works well enough, so I'm not going to change it. But I'd do email address format validation with JavaMail rather than with a regular expression.

Monday, August 3, 2009

Since many sites are offering JSON as an alternative to XML, and since JSON can be much more lightweight than XML, I started looking into Java JSON libraries. I wanted something fast and lightweight.

The JSON.org implementation is low-level and clunky, so I looked at gson and Jackson. One of the most obvious deficiencies of the JSON.org implementation is that it only deserializes from a String, not from a Reader (or a Stream). That means the entire value has to be read into a String and then parsed. I don't imagine that I'll be dealing with huge JSON values any time soon, but it is a concern. The other implementations don't have that deficiency. They also provide a streaming or event API, which would be useful for huge JSON values where only a small amount of the data is of interest.

For the near term, I'll only be using the deserialization, and both of them seem fine. Jackson appears slightly bulkier than gson, in that it's broken into multiple jar files, which admittedly is a silly metric. I like the Jackson API a little better, though.

For serialization, I'm not satisfied with either one, but I like what Jackson offers better. gson serializes all the fields, including private fields. Jackson is closer to what I want, in that it serializes POJOs. However, it introspects the class returned by getClass() rather than a class that's passed in. I'd like to pass in the class to restrict which properties get serialized. That would allow the object to have other public getters, used by other code, that aren't serialized. Something like this:

public <T> void serializeToJSON(Writer writer, T object, Class<T> type);

where the 3rd parameter would typically be an interface.
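The serialization I wanted can be sketched without any library. Here is a hypothetical TypedJson helper (not Jackson's or gson's API) that introspects the passed-in type rather than getClass(), so public getters outside the interface are left out. It's deliberately simplified: it handles only getX-style getters and string, number, boolean, or null values, with no nesting and no special handling for NaN or infinity.

```java
import java.io.IOException;
import java.io.Writer;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: serializes only the properties declared by 'type' (typically an interface).
public class TypedJson {
    public static <T> void serializeToJSON(Writer writer, T object, Class<T> type) throws IOException {
        Method[] getters = Arrays.stream(type.getMethods())
                .filter(m -> m.getParameterCount() == 0 && m.getReturnType() != void.class
                        && m.getName().startsWith("get") && m.getName().length() > 3
                        && !m.getName().equals("getClass"))
                .sorted(Comparator.comparing(Method::getName)) // getMethods() order is unspecified
                .toArray(Method[]::new);
        writer.write('{');
        for (int i = 0; i < getters.length; i++) {
            String name = getters[i].getName().substring(3);
            name = Character.toLowerCase(name.charAt(0)) + name.substring(1);
            Object value;
            try {
                value = getters[i].invoke(object);
            } catch (ReflectiveOperationException e) {
                throw new IOException("cannot read property " + name, e);
            }
            if (i > 0) writer.write(',');
            writeString(writer, name);
            writer.write(':');
            if (value == null) writer.write("null");
            else if (value instanceof Number || value instanceof Boolean) writer.write(value.toString());
            else writeString(writer, value.toString());
        }
        writer.write('}');
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static void writeString(Writer writer, String s) throws IOException {
        writer.write('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') { writer.write('\\'); writer.write(c); }
            else if (c < 0x20) writer.write(String.format("\\u%04x", (int) c));
            else writer.write(c);
        }
        writer.write('"');
    }

    // Example types: Summary is the serialized view; Person has an extra public getter.
    public interface Summary { String getName(); int getAge(); }

    public static class Person implements Summary {
        public String getName() { return "Ann"; }
        public int getAge() { return 42; }
        public String getPasswordHash() { return "secret"; } // public, but not part of Summary
    }
}
```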

Update: serialization the way I want is in Jackson 1.2.