Monday, March 15, 2010

After getting reacquainted with Haskell, I really like using it. However, using lots of higher-order functions and operators makes for dense APL-like code that is inaccessible to those unfamiliar with those functions and operators. And, parser combinators introduce very expressive and extensible ways to create parsers, but would also look foreign to the uninitiated. But these operators and functions also can be used to make very elegant and maintainable code.

For example, I'm writing code to deal with JVM class files. To write to a class file, I want to convert various things to a list of bytes, so I made a typeclass for that:

class ToBytes a where
toBytes :: a -> [Word8]

Then, since many objects, when converted to a list of bytes, are just the concatenation of the conversion of component objects, I defined

concatAccessors :: [a -> [b]] -> a -> [b]
concatAccessors = flip (.) (flip ($)) . flip concatMap

which is an example of dense APL-like code, using the (.) and ($) operators and the flip and concatMap higher-order functions. However, it enables this:

instance ToBytes JVMClass where
toBytes = concatAccessors [
toBytes . jvmClassMagic,
toBytes . jvmClassMinorVersion,
toBytes . jvmClassMajorVersion,
toBytes . jvmClassConstantPool,
toBytes . jvmClassAccessFlags,
toBytes . jvmClassThisClass,
toBytes . jvmClassSuperClass,
toBytes . jvmClassInterfaces,
toBytes . jvmClassFields,
toBytes . jvmClassMethods,
toBytes . jvmClassAttributes
]

For most of the cases, having the ToBytes type class doesn't buy me that much. It saves having to have separate names for jvmClassToBytes, word16ToBytes, word32ToBytes, etc, but they still have to be defined. However, when there are lists of items in the class file format, it is preceded with a 16 bit count, so I can define

instance ToBytes a => ToBytes [a] where
toBytes l = toBytes (fromIntegral (length l) :: Word16) ++ concatMap toBytes l

so the constant pool (with a hack for the off by 1 length and the double-sized long and double constants), and the lists of interfaces, fields, methods, and attributes can be taken care of by defining toBytes for single items.

For reading class files, I learned how to make a state-transformer monad, which looks like this

newtype ST st a = ST (st -> (a,st))

instance Monad (ST st) where
p >>= q = ST (uncurry (s . q) . s p) where s (ST t) = t
return = ST . (,)

runST :: ST st a -> st -> a
runST (ST t) = fst . t

There are the uncurry higher-order function and the (.) operator making things hard to understand for those unfamiliar with them, not to mention the odd looking (,) constructor. However, it enables code like this:

readJVMClass :: ST State JVMClass
readJVMClass = do
magic <- readWord32
minorVersion <- readWord16
majorVersion <- readWord16
constantPool <- readConstantPool
accessFlags <- readWord16
thisClass <- readWord16
superClass <- readWord16
interfaces <- readList readWord16
fields <- readList readField
methods <- readList readMethod
attributes <- readList readAttribute
return (JVMClass magic minorVersion majorVersion constantPool accessFlags
thisClass superClass interfaces fields methods attributes)

which is very maintainable.

Using Functor and Applicative

instance Functor (ST st) where
fmap f p = do { a <- p; return (f a) }

instance Applicative (ST st) where
pure = return
p <*> q = do { f <- p; a <- q; return (f a) }

it could be written more concisely, but with more weird operators:

readJVMClass = JVMClass <$> readWord32
<*> readWord16
<*> readWord16
<*> readConstantPool
<*> readWord16
<*> readWord16
<*> readWord16
<*> readList readWord16
<*> readList readField
<*> readList readMethod
<*> readList readAttribute

readJVMClass (readWord32, readWord16, ...) doesn't return a JVMClass (Word32, Word16, ...), it returns a state transformer, which is a function that takes a state, and returns an updated state and a JVMClass (Word32, Word16, ...). readJVMClass just builds a state transformer that chains other state transformers and extracts the values that they return. To actually do the reading and building, one must use runST.

No comments:

Post a Comment