Advice on writing a programming language

Walter Bright, the guy behind the D language, wrote an article about writing your own programming language. If you’re interested in such things, I suggest reading it. It’s pretty good.

I want to disagree about one thing, though. And add a couple of points that I think are important.

Parsing is actually pretty important

“There are a few things I (perhaps surprisingly) suggest should not be considerations… Easy parsing. It isn’t hard to write parsers with arbitrary lookahead. The looks of the language shouldn’t be compromised to save a few lines of code in the parser. Remember, you’ll spend a lot of time staring at the code. That comes first.

…Somewhat more controversial, I wouldn’t bother wasting time with lexer or parser generators… They’re a waste of time. Writing a lexer and parser is a tiny percentage of the job of writing a compiler.”

This isn’t entirely bad advice, but I’d call it mostly bad. The lexer and parser generators we have are all pretty dismal, it’s true. We know how to do better, but no one’s just gone and done it. Probably partly because it’s an entirely thankless job. (“Here’s the perfect parser generator!” “Great, now I’m not going to bother switching to it, see…”)

But this sentiment that parsing difficulty is unimportant and parsing formalisms should be tossed aside is entirely misplaced. As far as I can tell, it’s based on the belief that you just end up writing one parser.

But more than just your compiler needs to parse code. Just a smattering of tools that would want to parse the code for your language include:

  • IDE and text editor tooling
  • Documentation generators
  • REPLs
  • Code reformatters
  • Debuggers and profiling tools
  • Refactoring tools
  • Code highlighters, renderers for LaTeX and HTML
  • Lint tools
  • Static analysis tools
  • Alternative compiler implementations
  • Web toys wanting a JavaScript parser, or apps that prefer a Java-native parser, rather than a C library (or vice versa)
  • Even build and packaging tools

And more. Pretty much everything that might work with your code will benefit from being able to parse it. All of these tools will sprout up like weeds for a language that’s easy to parse and popular. (So, Java: Sure. C++: Nnnnope!) Anyone who has an idea can plop a grammar into a parser generator and hack together something that works well to do whatever, and gets improved from there.

If the language is hard to parse, then many of those people eager to contribute get stuck, either trying and failing to build a parser or trying and failing to learn the daunting internals of your compiler. (Oh, and you did build your compiler as a library with a stable API, right?)
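
To make that concrete, here’s a minimal sketch in Python of what “one grammar, many tools” looks like. The toy language (nothing but `let NAME = EXPR;` statements, where an expression is names and numbers joined by `+`) and every function name below are made up for illustration. The point is only that once `parse` exists, a reformatter and a lint check are each a few lines on top of it.

```python
import re

# Hypothetical toy language: a program is a sequence of `let NAME = EXPR;`
# statements, and EXPR is one or more names/numbers joined by `+`.
TOKEN = re.compile(
    r"\s*(?:(?P<kw>let\b)|(?P<name>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>[=+;]))"
)

def tokenize(src):
    """Split source text into (kind, text) pairs."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse(src):
    """Return a list of (name, terms) pairs, one per `let` statement."""
    toks = tokenize(src)
    i = 0

    def expect(kind, text=None):
        nonlocal i
        if i >= len(toks) or toks[i][0] != kind or (text is not None and toks[i][1] != text):
            raise SyntaxError(f"expected {text or kind} at token {i}")
        i += 1
        return toks[i - 1][1]

    def term():
        nonlocal i
        if i < len(toks) and toks[i][0] in ("name", "num"):
            i += 1
            return toks[i - 1]
        raise SyntaxError(f"expected a name or number at token {i}")

    stmts = []
    while i < len(toks):
        expect("kw", "let")
        name = expect("name")
        expect("punct", "=")
        terms = [term()]
        while i < len(toks) and toks[i] == ("punct", "+"):
            i += 1
            terms.append(term())
        expect("punct", ";")
        stmts.append((name, terms))
    return stmts

def reformat(src):
    """Tool 1: a reformatter that prints one canonical statement per line."""
    return "\n".join(
        f"let {name} = {' + '.join(text for _, text in terms)};"
        for name, terms in parse(src)
    )

def unused_names(src):
    """Tool 2: a lint check listing variables that are defined but never read."""
    stmts = parse(src)
    used = {text for _, terms in stmts for kind, text in terms if kind == "name"}
    return [name for name, _ in stmts if name not in used]
```

Feed both tools the same messy source, say `"let a=1+2;let   b = a +a ; let c = 3;"`, and `reformat` hands back three tidy lines while `unused_names` reports `b` and `c`. Neither tool needed anything from a compiler.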

What’s strange is he seems to get it:

“[They should have] Context-free grammars. What this really means is the code should be parsable without having to look things up in a symbol table. C++ is famously not a context-free grammar. A context-free grammar, besides making things a lot simpler, means that IDEs can do syntax highlighting without integrating most of a compiler front end. As a result, third-party tools become much more likely to exist.”

I mean, right there, that’s what I’m saying. C++ needs an almost full-stack compiler just to parse the language, and if you go look at Clang, you’ll discover it’s got a ton of stuff integrated. Documentation generator, refactoring support, code completion support, and code formatter all integrated directly into the same repo as the full-stack compiler.
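
If the symbol-table point seems abstract, the classic example is the C and C++ statement `a * b;`. It declares `b` as a pointer if `a` names a type, and multiplies two variables otherwise. Here’s a toy sketch in Python; the `classify` function and the shape of what it returns are invented for illustration, and it handles only this one statement shape.

```python
def classify(stmt, type_names):
    """Classify a C-style statement shaped like `X * Y;`.

    `type_names` stands in for the symbol table: the set of identifiers
    currently known to name types. The same tokens parse differently
    depending on what is in it.
    """
    tokens = stmt.replace("*", " * ").replace(";", " ; ").split()
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("this sketch only handles statements shaped like `X * Y;`")
    left, _, right, _ = tokens
    if left in type_names:
        # `X` is a type, so this declares `Y` as a pointer to `X`.
        return ("declaration", f"pointer to {left}", right)
    # Otherwise it is an expression statement multiplying two values.
    return ("expression", "multiply", left, right)
```

Calling `classify("a * b;", {"a"})` yields a declaration, and `classify("a * b;", set())` yields a multiplication. A syntax highlighter with no symbol table can only guess which one it’s looking at.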

I’m not sure where the disconnect is here. C++ is especially bad in what’s necessary just to parse it, but from a tool builder’s perspective it’s not any worse than any other language you can only parse through its compiler. All the hardship is in having to link against a probably unstable and poorly documented API just to parse the language. Whatever abominable things happen under the hood to accomplish this don’t make it better or worse as far as the library user is concerned.

And so, I disagree. Use an LR parser generator. It’ll keep your language parsable, and make it easier to change early on. When your language becomes popular enough that you have the problem of parsing errors not being friendly enough for your users, celebrate that fact and hand-roll only then.

And then reap the benefit of all the tooling an easily parsed language gets, since you know you kept it LR(1).

On language adoption

“You’ll be doing what every nascent rock band does — play shopping malls, high school dances, dive bars, and so on, slowly building up an audience.”

I think the trick to language adoption is simpler. Pick an initial application area, and focus on libraries.

People picked up Ruby because of Rails. People picked up PHP because it was designed to work with a web server from the beginning and directly integrated things like MySQL support. People are picking up Go because it has all the libraries you need to write back-end web services. People picked up JavaScript because it was the only damned option that did what it did.

If you’re creating a language you think is great for webapps, build your Rails equivalent. If you think it’s great for game development, make sure it’s got the right libraries like SDL, GL, or whatever libraries are used with phones these days.

And don’t just crappily expose these libraries and expect people to bite. This should be the selling point. You created this language presumably because you think it’s better. Show how. “Look at how you use SDL and GL in MySystemsAndGamesLang! Isn’t this so much better than what you’re used to?”

On marketing

One last parting shot that I couldn’t resist. When it does come time to promote this awesome language you’ve created, sell it on its own merits. Don’t disparage the competition.

If the first bits of marketing to reach most potential users is “attention C++ users: your language sucks,” … this might not work very well, even if true.


4 Responses to Advice on writing a programming language

  1. Abscissa says:

    While you make a good point about the variety of tools that need to parse a language, I think it’s worth noting that a parser alone is often not enough. It IS a necessary baseline for many things, but once it’s in place, there can be a lot of demand for more advanced features that require at least some level of semantics. That leads to a stronger case for compiler-as-a-lib, which reduces some of the need for third party parsers (although as you’ve said, certainly not all of the need).

    “If the first bits of marketing to reach most potential users is “attention C++ users: your language sucks,” … this might not work very well, even if true.”

    Normally, attacking a more widely-established competitor may be a bad marketing tactic, but I think C++ is somewhat of a special case here. Compared to other major languages, C++ has a fairly large proportion of users who very much dislike the language and use it simply because there’s not much alternative for their particular use-case. For example, a lot of AAA game devs DO have a lot of hate for C++ even though it’s the de-facto standard for game engines (and often even for game code, depending on the engine).

    So I think “Tired of suffering through C++’s x, y and/or z problems?” could be a reasonable approach in certain cases, whereas replacing “C++” with, say, “Python” or “Scala” would much more likely alienate your language from its potential audience.

    • Ted Kaminski says:

      I agree: compilers as libraries is absolutely essential nowadays, and many tools will be able to reuse that code.

      But I think there are still a lot of reasons why you might not be willing or able to simply reuse the compiler library. I mentioned platforms as one possibility (native vs JVM vs CLR vs JavaScript), but there are quite a few others as well. For example, perhaps you want to change the language a little bit to add special annotations for your static analysis. Or you want to analyze things in comments that the original compiler’s parser just discarded. Or someone wants to create a literate programming tool, and so needs to embed that grammar inside something else. And so forth.

      In some ways, ease of parsing is like a fallback API. By all means, have a well-designed and stable API to use the compiler as a library, but saying that “ease of parsing doesn’t matter” is like saying “the underlying API can be horrible; that’s fine, we’ll hide it under an abstraction layer.” I’m of the opinion that both should be well-designed. :)

  2. Pingback: Reddits | cyndichristopher

  3. Craig says:

    “And don’t just crappily expose these libraries and expect people to bite”

    Wow, don’t use up all your insight on just one post — save some for later! I need some time to properly absorb the subtleties here.
