I write the code: May 2009

Friday, May 29, 2009

When I was in college, my first class that used computers, a numerical methods class using FORTRAN77, had us use vi as the editor. So I learned how to use vi for basic editing.

When I was in graduate school, the first computers I got set up with in my department were a VMS cluster. vi was not available, and everyone used emacs. So I started using emacs. This was emacs 18; emacs 19 and later had yet to be released. Pretty soon, I found vi-mode, and then viper-mode, which I used for a few months. I started using a bunch of features that vi didn't have, such as splitting the window. Eventually, because viper-mode was horribly slow, I abandoned it entirely and just used the straight emacs bindings. So, that's how I came to be an emacs user.

However, to this day, if I want to edit a file from the shell, I'll start up vi, because vi is quicker to type than emacs, and vi (or elvis, or vim, or whichever vi happens to be installed) starts up much more quickly.

But I'll always have an emacs window around. One emacs feature that I overuse is hippie-expand. I have it bound to M-TAB, or, as I type it, C-M-i (or on keyboards without a meta key, C-[ C-i). I suppose identifier completion is a feature of IDEs, which I hate using. But I've been using hippie-expand for over 15 years.

Wednesday, May 27, 2009

I don't like making gratuitous changes to code, even to code that I don't like. It clutters the source control history with useless entries, and it makes comparing branches more difficult than it otherwise would be.

However, a lot of code is written by somewhat blindly copying code that is already there. So I would like the code that gets copied to be code that I like, rather than code that I don't like. That makes me a little more likely to clean up code that I don't like if it is code that tends to get copied.

Monday, May 25, 2009

When I first started using Java, I used wildcard imports all the time. Now, I never use them. They make grepping for the uses of a class much more difficult.

When I first learned about import static (introduced in JDK 1.5), I thought it was something that I would never use. I still think I'll never use a wildcard import static.

However, there were a bunch of classes that extended one abstract class, and I wanted to change them to extend another abstract class. Those classes used a bunch of static methods in the original abstract class, so, at first, I started changing each reference to be prefixed with the original class name. After a while of this stupidity, I remembered the import static feature, and that made the change a lot easier.

Another import feature that I sometimes want is an "import as" feature. Something like


import java.sql.Date : SQLDate;

or

import SQLDate = java.sql.Date;

because java.sql.Date and java.util.Date can't both be imported.
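Lacking an "import as", the usual workaround is to import one of the two Date classes and spell out the other one's fully qualified name wherever it is used. A minimal sketch follows. DateConversions and its methods are hypothetical names.

```java
import java.util.Date;

// java.util.Date is imported; java.sql.Date is always written out in full,
// which is the workaround for the two classes sharing a simple name.
final class DateConversions {
    private DateConversions() {}

    static java.sql.Date toSqlDate(Date utilDate) {
        return new java.sql.Date(utilDate.getTime());
    }

    static Date toUtilDate(java.sql.Date sqlDate) {
        return new Date(sqlDate.getTime());
    }
}
```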

Friday, May 22, 2009

When restructuring the code last month for more modularity, I moved each module into its own jar file, each loaded with its own ClassLoader. One of the modules used Google's gdata API, and moving the gdata jar files into the module caused the problem of the javax.activation framework not finding the handler for application/atom+xml documents. At first, I just kept the gdata jar files in the main classpath so that it would work, as I had a bunch of other things to work on.

When I got back to looking into it, I googled around a bit and discovered that I needed to set the thread's context ClassLoader to the module's ClassLoader with Thread.currentThread().setContextClassLoader() (and then restore the previous one on the way back out). That made it work.
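The set-and-restore pattern looks roughly like the following minimal sketch. ContextClassLoaders and callWith are hypothetical names. The finally block restores the previous loader even when the call throws.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs work with the given ClassLoader installed as the
// current thread's context ClassLoader, then restores whatever was there.
final class ContextClassLoaders {
    private ContextClassLoaders() {}

    static <T> T callWith(ClassLoader loader, Supplier<T> work) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return work.get();
        } finally {
            // Restore the previous loader even if the work throws.
            thread.setContextClassLoader(previous);
        }
    }
}
```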

Right now, it's pretty ugly, because I call Thread.setContextClassLoader() within the module using gdata, and that doesn't extend well to other modules. All calls to the module should be wrapped with Thread.setContextClassLoader(), so that the module doesn't have to worry about it. The cleanest way I see to do this without disturbing the existing code is to create a wrapper module object for each module, which is kind of gross. But now that I know about Thread.setContextClassLoader(), the next time I'm doing something similar, I'll accommodate it in the design.
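If modules are called only through interfaces, a JDK dynamic proxy could replace the per-module wrapper: one generic InvocationHandler sets the context ClassLoader around every call. The following is a sketch under that assumption. ContextClassLoaderProxy is a hypothetical name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Wraps any module object behind its interface so every method call runs
// with the module's ClassLoader as the thread's context ClassLoader.
// Assumes callers only use the module through the interface iface.
final class ContextClassLoaderProxy {
    private ContextClassLoaderProxy() {}

    static <T> T wrap(Class<T> iface, T module, ClassLoader moduleLoader) {
        InvocationHandler handler = (p, method, args) -> {
            Thread thread = Thread.currentThread();
            ClassLoader previous = thread.getContextClassLoader();
            thread.setContextClassLoader(moduleLoader);
            try {
                return method.invoke(module, args);
            } catch (InvocationTargetException e) {
                // Rethrow the module's own exception, not the reflection wrapper.
                throw e.getCause();
            } finally {
                thread.setContextClassLoader(previous);
            }
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }
}
```

The module code stays untouched, and whoever loads the module wraps it once, right after instantiating it.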

Among the things involved with moving each module into its own jar file, I moved all the static images associated with the module from the web directory into the module jar file, and added a servlet to serve them up, including sending the "Last-Modified" header and honoring the "If-Modified-Since" header.
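The conditional-GET decision in such a servlet can be sketched with the standard library alone. The servlet plumbing is left out: reading the If-Modified-Since header, and sending a 304 or the image. ConditionalGet is a hypothetical name. The one subtle point is that HTTP dates have one-second resolution, so the resource timestamp has to be truncated before comparing.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical helper for Last-Modified / If-Modified-Since handling.
final class ConditionalGet {
    private ConditionalGet() {}

    // HTTP date format, e.g. "Fri, 22 May 2009 12:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Value for the Last-Modified response header.
    static String lastModifiedHeader(long lastModifiedMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(lastModifiedMillis));
    }

    // True if a 304 Not Modified can be sent instead of the resource.
    static boolean notModified(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return false;
        }
        try {
            long since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant().toEpochMilli();
            // Drop milliseconds: HTTP dates only carry whole seconds.
            long lastModifiedSeconds = lastModifiedMillis / 1000 * 1000;
            return lastModifiedSeconds <= since;
        } catch (DateTimeParseException e) {
            // An unparseable header is ignored, so the full resource is sent.
            return false;
        }
    }
}
```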

The modules are now pretty modular. Until six months ago, there was a file that had all of the modules (5 at the time) hard-coded into it. Until last month, the SNMP code had all the modules (9 at the time) hard-coded into it.

Wednesday, May 20, 2009

Where I work, there are many non-native English speakers, as well as a substantial team in India. One thing this leads to is annoying misspellings in the source code. I think that they just don't catch their typos, and then many of them get established as public method names or public constants that cannot be easily changed.

Here are some examples:

ALBUB (ALBUM)
OBJCET (OBJECT)
Billilng (Billing)

The biggest problem is that grepping through code will miss these typos, unless I'm aware that they are there.

Of course, native speakers also make misspellings, such as "compatable", etc.

Monday, May 18, 2009

Until recently, I didn't know just how expensive the Xerces Java XML DOM parser was, in particular when instantiating a new parser, or rather, a new DocumentBuilder. There are all these ObjectFactory classes that, if a system property isn't set, scan the classpath for a file that contains the value they need. And, at least in the older versions, the value is not cached, so each time a new XML parser is needed, the classpath gets scanned again.

That made for thread dumps where lots of threads were blocked on ZipEntry. And there were over 100 jar files in the classpath. I think setting a bunch of properties such as javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.SAXParserFactory helped quite a bit.
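Besides setting those properties, you can avoid repeating the lookup by keeping parsers around. A minimal sketch, assuming a per-thread cache suits the application: look up the DocumentBuilderFactory once, keep one DocumentBuilder per thread (DocumentBuilder is not thread-safe), and call reset() before each reuse. Setting the javax.xml.parsers.DocumentBuilderFactory system property (for Apache Xerces-J, to org.apache.xerces.jaxp.DocumentBuilderFactoryImpl) additionally lets the factory lookup skip the classpath search. XmlParsers is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Factory looked up once; one DocumentBuilder per thread, reused.
final class XmlParsers {
    private XmlParsers() {}

    private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();

    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        // The factory is not guaranteed to be thread-safe either.
        synchronized (FACTORY) {
            try {
                return FACTORY.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
    });

    static Document parse(String xml) {
        DocumentBuilder builder = BUILDER.get();
        builder.reset(); // clear any state left over from the previous parse
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```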

No wonder JSON has gained so much popularity.

Before learning all this, I really hated all the code that built XML by hand, because I would always get assigned bugs caused by unescaped <, >, or & characters. (Once, when I filed a bug on such an issue, someone "fixed" it by calling URLEncoder.encode(). Wrong. I fixed that fix.) I used to think it would be better to just create a DOM Document and then serialize it out. Now, I'm not so sure.
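If XML is built by hand, the values inserted into it need XML escaping, not URL encoding. URLEncoder.encode() produces application/x-www-form-urlencoded text, so "a&b" becomes "a%26b", which an XML parser then reads literally as "a%26b". A minimal escaping helper for text content and quoted attribute values is sketched below. XmlEscape is a hypothetical name, and the helper does not deal with characters that are illegal in XML altogether, such as most control characters.

```java
// Escapes the five characters that are special in XML text and in
// single- or double-quoted attribute values.
final class XmlEscape {
    private XmlEscape() {}

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```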

Friday, May 15, 2009

Despite what I've written so far, my first impulse when maintaining code written by other people is not to rewrite it. My preference is to work within the design evident in the code to extend it as needed. That is, if I have enough time and interest to understand the design of the code. One thing I really like about source control is that looking at the history of the changes often helps in understanding the code.

Sometimes I'm not given enough time to understand the design, or the code is just too gross for me to want to figure it out. In those cases, I generally stick in some hack. If I didn't understand the code, the hack could have other problems, though.

The general culture of conservatism with respect to code changes where I work also encourages minimizing code changes, rather than rewriting things.

However, if I have to fix something too many times and the code is too crappy, I'll rewrite it, even if I get chewed out for making changes that are too extensive.

Wednesday, May 13, 2009

Code formatting is something I don't mess with too much. I'm not going to make gratuitous formatting changes in code. They add useless entries to the source control history. And the biggest problem with reformatting code is that it needlessly makes tracking differences between branches harder.

However, even if I'm not going to mess with it, there are matters of formatting that I don't like.

Lots of people indent with tabs. Plus, they generally use 4-space tabs. I use emacs, which defaults to 8-space tabs. So tab-indented code generally looks like crap to me. I could do (setq tab-width 4) if it really bugged me, but it usually doesn't. I always indent with spaces.

Some people put trailing whitespace on lines, which I find annoying. I think lots of editors put indenting whitespace on blank lines. But some people put trailing whitespace on non-blank lines.

This one guy always put a space after an open parenthesis, and a space before a close parenthesis, which I think looks bad.

I also don't like source files that don't end with a newline. I don't know how to get vi to save a text file that doesn't end with a newline, but it seems to me that IDEs and Microsoft Windows based editors make the end of the file ambiguous, because there are files that either end with no newline, or end with multiple blank lines. Ending with multiple blank lines is annoying the way trailing whitespace on lines is annoying. However, files that don't end with a newline make unified diffs look bad when the last line is involved.

Tuesday, May 12, 2009

Implicit in a lot of the code I see seem to be some rules that I don't agree with.

One seems to be that there should be no more than one return statement. For example, something like


   Object result = null;

   ...

   if (result == null) {
       ...
   }

   if (result == null) {
       ...
   }

   if (result == null) {
       ...
   }

   if (result == null) {
       ...
   }

   ... and on and on ...

   return result;
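For comparison, here is the same shape written with early returns, where each step returns as soon as it has an answer and later steps don't need to re-check. This is a sketch with hypothetical names (SettingLookup, and three maps standing in for whatever sources the real code consults).

```java
import java.util.Map;

// Tries several sources in order and returns as soon as one has the key,
// instead of threading a nullable result through a chain of if blocks.
final class SettingLookup {
    private SettingLookup() {}

    static String lookup(String key, Map<String, String> overrides,
                         Map<String, String> config, Map<String, String> defaults) {
        String value = overrides.get(key);
        if (value != null) {
            return value;
        }
        value = config.get(key);
        if (value != null) {
            return value;
        }
        return defaults.get(key); // null if no source has the key
    }
}
```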

More often, though, it tends to go like


    Object result = null;
    try {
        ...
        result = ...;
    } catch (Exception e) {
        ...
    }
    return result;
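Returning directly from the try block, and from the catch block for the failure case, removes the need for the null-initialized result variable. The sketch below also catches only the exception it expects rather than Exception. Ports and parsePort are hypothetical names, and the text argument is assumed to be non-null.

```java
// Returns the parsed port, or the fallback if the text is not a number.
final class Ports {
    private Ports() {}

    static int parsePort(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```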

There also seems to be a rule that a result variable has to be declared, so I see things like


    Object result = ...;
    return result;

where I would just write


    return ...;

And then there's code from beginners who don't seem to realize that boolean expressions result in boolean values


    boolean result;
    if (...)
        result = true;
    else
        result = false;
    return result;
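Since the condition already is a boolean value, the whole if/else collapses into returning it. The sketch below uses a hypothetical method, isAdult, to show this.

```java
// The comparison itself is the boolean result; no variable or if/else needed.
final class Ages {
    private Ages() {}

    static boolean isAdult(int age) {
        return age >= 18;
    }
}
```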

Monday, May 11, 2009

I see lots of code that is more complicated than it needs to be.

When writing struct classes, this one guy always adds an elaborate hashCode() method, in which he combines the hashcodes of all the fields, and always adds an elaborate equals() method, in which he compares all the fields, and always adds an elaborate toString() method. Of course, when new fields are added, they aren't always added to the hashCode(), equals(), and toString() methods. The hashCode() result would still be okay, but the equals() would be incorrect, though it's never actually used. And the overly elaborate toString() doesn't matter anyhow, since the only thing it affects is the debug logs.
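To see why hashCode() "would still be okay" while equals() becomes wrong, take a field that was later left out of both methods. Objects that are equal still get equal hash codes, so the hashCode() contract holds. But equals() now considers two objects equal even though they differ in the new field. GridPoint and its fields are hypothetical.

```java
import java.util.Objects;

// label was added later and not folded into equals() or hashCode().
final class GridPoint {
    final int x;
    final int y;
    final String label; // the newer field

    GridPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y; // label missing: semantically wrong
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // label missing, but still consistent with equals()
    }
}
```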

In another case, I was coding some module that depended on some custom HTTP header, as well as the module system. So I asked that all the HTTP headers be passed in, as some future module might need some other custom HTTP header. But, this other guy writes this elaborate code to read a configuration parameter for that one HTTP header, and extract that one HTTP header, put that one HTTP header into the Map<String,String[]> that was intended to hold all the HTTP headers, and pass that in. As it turns out, the module needed 2 custom HTTP headers. Fortunately, I had already ripped out that elaborate code and passed in all the headers.

There's also this enum that has a constructor that takes a String that is supposed to be the name of a class. All the enum elements are constructed with hard-coded Strings of class names of classes that are known at compile time. The only thing that String is used for is in Class.forName(). So, it would be much simpler to just have the constructor take a Class as the argument. Furthermore, since the String can't be checked at compile time like the Class could be, there is a stupid validate() method that does Class.forName() on that String and logs an error if the class is not found. Furthermore, since all the classes have to be a subclass of a known class, after the Class.forName() is done, a Class.asSubclass() call is done, which is another stupid run-time check that could be done at compile-time by declaring the enum constructor argument to be Class<? extends KnownClass>.
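Declaring the constructor argument as Class<? extends KnownClass> moves both checks to compile time. A misspelled class or a class that isn't a subclass simply fails to compile, so neither validate() nor asSubclass() is needed. The sketch below uses hypothetical names, with BaseHandler standing in for the known class.

```java
// BaseHandler stands in for the known superclass.
abstract class BaseHandler {
    abstract String name();
}

class AlbumHandler extends BaseHandler {
    @Override String name() { return "album"; }
}

class TrackHandler extends BaseHandler {
    @Override String name() { return "track"; }
}

enum HandlerType {
    ALBUM(AlbumHandler.class),
    TRACK(TrackHandler.class);

    // Checked by the compiler: must exist and must extend BaseHandler.
    private final Class<? extends BaseHandler> handlerClass;

    HandlerType(Class<? extends BaseHandler> handlerClass) {
        this.handlerClass = handlerClass;
    }

    BaseHandler newHandler() {
        try {
            return handlerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + handlerClass, e);
        }
    }
}
```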

Sunday, May 10, 2009

I don't like using debuggers. It makes running things slow, especially when stepping or when hitting breakpoints. And then, inspecting the state is laborious. So I don't use them.

I prefer debug logging. There are a number of reasons why I prefer looking at logs.

They remain after-the-fact

When I'm assigned a bug that was filed a few days ago, I can go back to the logs of when it occurred.
I don't have to figure out how to reproduce a bug from the steps in the report. I can go back to the logs and see what was done.

If worst comes to worst, they can be turned on in production systems much, much more easily than a debugger can be attached to a production system.

Of course, that means the logging has to be meaningful. I can find almost any logging meaningful, though, since I look at logs in context, and with the source code available. Most people I work with seem to not look at the context of the logs, and just want to grep for specific strings to see specific things, so they're not as good as I am at using the logs to diagnose problems.

Many people I work with also do overly elaborate formatting of their logs. They also like adding extraneous line breaks or other strings such as "---->" in an attempt to make their logs visually distinct or easy to grep for. My preferred logging is like this


  log.log("variable1="+variable1+",variable2="+variable2);

which shows the relevant information. I suppose most of the people I work with wouldn't find the logging I add useful, because they'd be using a debugger.

Some people I've worked with don't like the logs to be "cluttered" with stuff they don't find useful, and remove stuff. I rarely remove logging, because almost anything could become useful. The worst thing they do is remove the logging of stack traces that aren't expected. They'd log an exception like this:


  log.log("Something failed: "+ex.getMessage());

and, in the logs would be:


Something failed: null

which is pretty useless. Or, they'd have


  log.log("Something failed: "+ex);

which would give


Something failed: java.lang.NullPointerException

which is almost as useless. I would write


  log.log("",ex);

which would print the stack trace, as the "Something failed:" (or whatever that string was) is useless compared to the stack trace, but all the logging interfaces require a message.
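The posts don't say which logging interface log.log() belongs to. In java.util.logging, the equivalent of log.log("",ex) is to pass the exception as the last argument, which keeps the whole stack trace. Concatenating ex or ex.getMessage() into the message keeps only the class name or the message, and the message may be null. LoggingExample and handle are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Logs a failure with its full stack trace attached to the log record.
final class LoggingExample {
    private static final Logger LOG = Logger.getLogger(LoggingExample.class.getName());

    private LoggingExample() {}

    static void handle(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException ex) {
            // The Throwable argument is what makes the handler print the trace.
            LOG.log(Level.WARNING, "Something failed", ex);
        }
    }
}
```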

Ideally, every log item would automatically contain the timestamp, the thread name, the source filename and line number. Our logs, however, don't have the source filename or line number, but do have the log category, which, being the class name, is almost always good enough. I've had to deal with not having the thread name in some other logs, and that can make things much more difficult, but at least I've had access to the source code and could guess. For those logs, the code was written by people who hated "clutter" in the logs, so they didn't log stack traces, and I really didn't like having to diagnose problems in that code.
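With java.util.logging, a custom Formatter can put the timestamp, thread name, and category on every line and append any stack trace. The following is a sketch under stated assumptions. LogRecord carries no thread name, so this reads the current thread, which is only correct for handlers that format synchronously on the logging thread, such as ConsoleHandler and FileHandler. Line numbers aren't available from LogRecord at all. ContextFormatter is a hypothetical name.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// One line per record: timestamp [thread] LEVEL category: message,
// followed by the stack trace when the record has a Throwable.
final class ContextFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        StringBuilder sb = new StringBuilder();
        sb.append(Instant.ofEpochMilli(record.getMillis()))
          .append(" [").append(Thread.currentThread().getName()).append("] ")
          .append(record.getLevel()).append(' ')
          .append(record.getLoggerName()).append(": ")
          .append(formatMessage(record))
          .append(System.lineSeparator());
        if (record.getThrown() != null) {
            StringWriter trace = new StringWriter();
            record.getThrown().printStackTrace(new PrintWriter(trace));
            sb.append(trace);
        }
        return sb.toString();
    }
}
```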

Saturday, May 9, 2009

I recently realized that there are some languages that are relatively popular, yet I've never found compelling enough to even play with. They are Ruby and Python.

I work with Java. I've played around with JavaScript quite a bit. In the past, I've played around with scheme, which JavaScript reminds me of a lot, particularly in how they both encourage continuation passing style. JavaScript doesn't have call-with-current-continuation, though.

In the past, I've done a little with C++, and a lot with C. I've also used FORTRAN77 and Fortran90 extensively, as well as a little bit of HPF, until I gave up on the buggy compiler.

I started out with BASIC and 6502 machine code and assembly. I played with 68000 assembly for a summer.

In more scripting type languages, I've used perl quite a bit. As an emacs user, I've found emacs lisp quite handy. I used to use M-ESC quite a bit before it got changed to M-: (eval-expression), which I now use, and is my desktop calculator. When I played with tcl, though, I'd be writing in C, and it was more like sh. I've never even tried to use csh.

I've liked the syntax of Haskell, Standard ML, and OCaml, and loved their type systems. I played with Scala a little, recapturing some of the feel of those strongly-typed functional languages. I never tried using erlang, though. I've even fiddled with J, the ASCII-based APL derivative that looks like line noise.

I've dabbled in Objective-C, but not Smalltalk. I've experimented with Sather, but not Eiffel. I played with Ada 95 a little. I've played with Pascal some. The only Computer Science class I'd ever taken taught Pascal. The only other computer class that I've taken was FORTRAN77 for physics students.

I've played with lots of computer languages, yet I've still never touched Python, which I've known about for a long time, or Ruby, which I've known about for less.

Friday, May 8, 2009

I really like the Spring Framework inversion of control (or dependency injection) container. I restructured a small part of the codebase 6 months ago to use it and got chewed out for checking that in. That code didn't get reverted, though.

For modular code, what people where I work do is write a factory class that does Class.forName(), and then Class.newInstance(). There are numerous factory classes with a single method doing the Class.forName() and Class.newInstance() thing. Spring eliminates that. Plus, the objects built up by Spring are all configured by dependency injection. I don't think many people here really get that, though.
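The difference can be sketched in plain Java with hypothetical names. StorageFactory is the kind of single-method reflective factory described above. CatalogService shows constructor injection: it only declares what it needs, and whatever assembles the objects passes that in. With Spring, that would be the container, driven by its configuration.

```java
// Hypothetical names throughout.
interface Storage {
    String load(String id);
}

class MemoryStorage implements Storage {
    @Override public String load(String id) { return "item-" + id; }
}

// The reflective factory pattern: a class name in, an instance out.
final class StorageFactory {
    private StorageFactory() {}

    static Storage create(String className) {
        try {
            return Class.forName(className).asSubclass(Storage.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot create " + className, e);
        }
    }
}

// Constructor injection: no lookup code, the dependency is handed in.
final class CatalogService {
    private final Storage storage;

    CatalogService(Storage storage) {
        this.storage = storage;
    }

    String describe(String id) {
        return "Catalog entry " + storage.load(id);
    }
}
```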

Thursday, May 7, 2009

Half of all my check-ins of the past 6 months have been in the past month. The main reasons are

Version 2.0
2 people on the team left

Something else that may be related is that at a recent company meeting, a high-up person said that the development of version 1.0 was too slow and the approach was too conservative.

The advent of version 2.0 meant that version 1.x was branched off, and that management and QA are all focused on that branch. So, when I'm not fixing bugs in the 1.x branch, I can do stuff in the main branch, essentially getting a head start on the stuff planned for version 2.0.

Before the 2 people left, I was not as free to check in anything more than minor changes without having a meeting to discuss what was being changed. And when I did, I got a talking to. One glaring example was when the guy who chewed me out for checking some changes in was the same guy who dropped by my cube the previous day to say that we needed those changes, which, it seemed to me, were fairly simple.

The top changes that I've made in the past month are

restructuring some code that greatly increased modularization
OAuth consumer support
adding localization infrastructure

Wednesday, May 6, 2009

This was created so that I could use Google's OAuth Playground to track down why my OAuth implementation wasn't working.

It was helpful in that I could see that there was nothing wrong with how I generated the signature base string. I then hard-coded the timestamp and nonce from the playground and found that I was calculating the signature incorrectly.

So then, I hard-coded the remaining parameters, the consumer key and the consumer secret, and now I was generating the correct signature. I then removed all the hard-coded parameters and logged the consumer key and secret. The logs showed that the consumer secret was wrong: it was the consumer key. So I looked at all the calls to see if I could have possibly swapped the parameters or maybe inadvertently called an overloaded method with shifted parameters, but that wasn't the case. Plus, the actual consumer key had the correct value.
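This failure mode follows from how OAuth 1.0 HMAC-SHA1 signing works (RFC 5849, section 3.4.2). The consumer secret never appears in the base string. It only goes into the HMAC key, as the percent-encoded consumer secret, "&", and the percent-encoded token secret. So a wrong secret gives a well-formed but wrong signature even when the base string matches the playground's. A minimal sketch follows. OAuthSigner is a hypothetical name, and percentEncode patches URLEncoder's form encoding into the RFC 3986 encoding OAuth requires.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class OAuthSigner {
    private OAuthSigner() {}

    // RFC 3986 percent-encoding: fix up the spots where form encoding differs.
    static String percentEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // HMAC-SHA1 over the signature base string, keyed by both secrets.
    static String sign(String baseString, String consumerSecret, String tokenSecret) {
        String key = percentEncode(consumerSecret) + "&" + percentEncode(tokenSecret);
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```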

Finally, I got to the configuration file, and saw


    <property name="consumerKey"><value>${consumer-key}</value></property>
    <property name="consumerSecret"><value>${consumer-key}</value></property>

Just a stupid little mistake.
