There has been some recent discussion on languages people should learn if they want to broaden their understanding of programming generally. For example, Fogus, and Eugene Wallingford in response. There has also been a recent burst of posts to proggit on API design (some of which I find questionable, by the way.)
I’d like to attempt a merger of these two ideas: what should people learn to improve their ability to design programs and APIs, beyond simply looking at good or bad examples, or learning specific languages? Instead, I think there are some interesting concepts people should be familiar with:
Objects. I’ll start with the classic that I assume everyone has already learned something of, but here is a small list of things you should know that many do not:
- When inheritance is actually called for, and when you should avoid it.
- “Generics” and subtyping, in particular: covariance, contravariance, invariance. (Types are important here, so I suggest using a statically typed OO language for this, also avoiding C++ because templates are untyped. If this is all perplexing to you, I’ve heard good things about the book “Effective Java” but I haven’t read it myself.)
- (There is much more to list about doing OO well, but part of this whole exercise is to give people the tools to come up that list themselves, so I will stop here for now.)
Datatypes. Here I refer to the usual data type declarations of functional languages. Some examples include Haskell (data), ML (datatype), Scala (case class), F#/OCaml (type), and of course many others. Datatypes and objects are actually dual to each other:
|objects||unlimited||fixed (virtual functions)||dynamic dispatch|
Things to understand:
- How the datatype world gets by without ‘null’.
- How datatypes and objects can be transformed into each other. (huge hint: datatypes are to existential types as objects are to the visitor pattern.)
- How damn awesome pattern matching is. (For example, “multiple dispatch problem? Pffft!”, nested patterns, etc.)
- Can you think of any object hierarchies in code you’ve written that would have been vastly better represented as datatypes, instead? (Obvious choice: anywhere you’ve used the visitor pattern. Next choice: anywhere you would have used the visitor pattern, but all that extra syntactic noise made it not worthwhile. But beyond that?)
- Further reading: On understanding data abstraction, revisited.
“Complete types.” I’ve completely made up this term, so bear with me for a moment. There are a number of types where it doesn’t matter whether we choose to represent them as objects or datatypes because both the list of constructors and the list of necessary functions are fixed. That is, all other functions can be written in terms of the fixed list. (To really be “complete” it does have to be ALL functions, not just those any sane person would want to write!)
For example, lists. You need ‘cons’ and ‘nil’ constructors, and you need the ‘fold’ operation. (Did you expect me to list ‘null’ ‘head’ and ‘tail’ as the operations on a list? Why might those be a very poor interface to lists? Beyond just, “all of them can be written using fold,” I mean.) You don’t need anything else. Any function over a list can be written as a fold, and any way of constructing a list (e.g. append) can be written as a function that just uses the two constructors (and probably fold, too.)
Other examples include Boolean (though you need lazy evaluation, or a simulation of it, to write the ‘if’ function in its interface), Pairs, and Maybe (forget ‘isJust’ and ‘fromJust’, what is the function in its interface?).
- Come up with another example of a “complete type.”
- Compare the implementation of Maybe as both an object and a datatype.
(okay, I’m sliding into “give homework” mode here. Sorry! I love these because they let us really directly apples-to-apples compare datatypes and objects, albeit for very simple cases.)
Typeclasses. Haskell is the canonical example, but Scala may also approximate typeclasses. Type classes actually bring us back to the object “side of the fence” from datatypes, but in a slightly different way: they’re most comparable to interfaces from OO languages.
Further, “constructor classes” allow us to make more than just types members of a type classes, and instead allow type constructors. This is the (often unnoticed) mechanism behind type classes like Functor and Monad in Haskell.
Things to see:
- Get a feel for how powerful the two-level system of “datatypes with typeclasses on top” is. If you came to the conclusion earlier that datatypes would make for a superior “default” way of representing things, typeclasses should start to make you wonder if objects are even needed at all!
- But… come up with a case where “typeclasses over datatypes” is insufficient/cumbersome and objects lead to a superior design. Bonus points: do so in a way that doesn’t involve encapsulating mutation.
- Go figure out Functor, Monad, and such. The Typeclassopedia may help.
Modules. Here, you’re pretty much limited to OCaml (standard ML, too, but a lot fewer resources are out there for learning it…)
- Understand structures, signatures, functors (not to be confused with haskell functors), and sharing.
- Compare and contrast modules with typeclasses (this may be hard! I’m not even sure I fully understand it…)
- Read Of Course ML has Monads. Also ML Modules and Haskell Type Classes: A Constructive Comparison and Modular Type Classes.
Relations. This is an important, and often overlooked, area of program design, despite the fact that it’s probably one of the most common types of programs today. (Perhaps because there is no “relation-oriented programming language.” The closest is probably Prolog, and that’s not what I have in mind at all.)
- Take a look at relational algebra, and of course SQL. (Both, because they do take different perspectives!)
- If you haven’t looked at an ORM before, try to learn one.
- If you have, you’re probably a bit “tainted” in that you view relations through the lens of what your ORM makes easy. Try to design a perfectly reasonable database that your favorite ORM sucks at actually using. Bonus: go beyond obvious things like “my orm expects there to always be a single integer primary key”. Double bonus: do so without googling “why my favorite ormsucks” ;)
- Consider how you would adapt relations to datatypes instead of objects (and how to adapt datatypes to relations.) (Perhaps look at Haskell DB libraries? I haven’t, so I’m not sure what’s there…)
- If you’ve worked on an app with a DB recently, try to think about what your “interface” to the database actually is. (I mean, to start you can look at “all queries we run” but odds are that’s not a very good interface!) (There is a “cross cutting” issue that makes this somewhat difficult: joins operate over many tables at once, so we can’t just do the obvious “give a simple interface to each table” (something ORMs frequently attempt.) But, we do actually have a similar situation with pattern matching on datatypes: you can match on multiple types at once.)
- (Suggestions for further reading welcome.)
Attribute grammars. Consider taking this one with a grain of salt, because attribute grammars are what I work on every day, so I’m biased. But, I think they’re an often overlooked data representation technique with a lot of really cool properties (probably overlooked because the word “grammar” is somewhat inappropriately in the name, for historical reasons.)
Attribute grammars are a bit like “datatypes with typeclasses.” Except we call our types “nonterminals”, our constructors “productions”, and our type classes (restricted to one function) “attributes.” However, they have a number very important key differences:
- AGs are not limited in constructors (productions) and functions (attributes.) (This should cause a problem: spot it. We solve this problem in our AG system with forwarding.)
- AGs permit flowing information down the tree (inherited attributes) as well as up (synthesized attributes.) The type class approach would be limited to up (though inherited attributes can be simulated by making what flows up a function of what should flow down.)
- AGs memoize the value of attributes during an evaluation, and demand no explicit ordering, as function calls would. (Very helpful, see: this discussion, for example.)
AGs are most frequently used to do language processing. (This is how ‘grammar’ got in the name.) Unfortunately, there’s no one system I can point to as a good one to play with. Our tool, Silver, is a great little stand-alone language. But UUAG (preprocesses to Haskell), JastAdd (preprocesses to Java), and Kiama (embedded in Scala) are all also good, and each has different trade-offs, which would be a whole post to itself.
- Further reading: Why Attribute Grammars Matter
Objects. By now, having gone through all the above (some months/years later!), hopefully coming back to objects makes you reconsider just what you think of as “object-oriented programming.”
Misc. Other things that have substantial effect on program design include: mutation, garbage collection, lazy evaluation, higher-order functions and closures, IO, and a whole lot more that I can’t think of at the moment. If you actually wade through learning the above, you’ll pick up on these, but it’s worth taking the time to think about them separately.
For example: Java might force us to look at null/head/tail being the operations on lists because it’s hard to write the higher-order function for fold in Java, whereas it’s quite easy in Haskell. Lazy evaluation can allow us to construct some kinds of circular data structures, without resorting to mutation. (My favorite example of this is when constructing closures for recursive functions in an interpreter.) Cleanly separating the IO parts from the pure parts of a program, as Haskell forces you to do, is not only a general improvement to the usefulness of an API, it makes writing tests dramatically easier. Actually, the ease with which good tests can be written is probably a pretty good “code smell” indicator, too.
One last observation: We have a bunch of different things on the “object side of the fence:” objects, typeclasses, modules, attribute grammars, and there exist many more (just think about all the different kinds of object-oriented languages.) But there’s basically just datatypes (and relations, I suppose) on the other side. Why is that?
Well, if you go back to thinking about the “dual of datatypes” (using existential types) what you actually get isn’t actually objects. It’s something that subsumes objects, typeclasses, modules, etc. I’d personally consider it too unwieldy to work with on a regular basis, which is why we pick some nice properties we want and come up with restricted versions of it to actually let programmers play with. (I believe that full power is actually available with “first class modules,” but I’m not 100% certain on that.)