Walter Bright, the guy behind the D language, wrote an article about writing your own programming language. If you’re interested in such things, I suggest reading it. It’s pretty good.
I want to disagree about one thing, though. And add a couple of points that I think are important.
Parsing is actually pretty important
There are a few things I (perhaps surprisingly) suggest should not be considerations… Easy parsing. It isn’t hard to write parsers with arbitrary lookahead. The looks of the language shouldn’t be compromised to save a few lines of code in the parser. Remember, you’ll spend a lot of time staring at the code. That comes first.
…Somewhat more controversial, I wouldn’t bother wasting time with lexer or parser generators… They’re a waste of time. Writing a lexer and parser is a tiny percentage of the job of writing a compiler.
This isn’t entirely bad advice, but I’m going to go with “mostly.” The lexer and parser generators we have are all pretty dismal, it’s true. We know how to do better, but no one’s just gone and done it. Probably partly because it’s an entirely thankless job. (“Here’s the perfect parser generator!” “great, now I’m not going to bother switching to it, see…”)
But this sentiment that parsing difficulty is unimportant and parsing formalisms should be tossed aside is entirely misplaced. As far as I can tell, it’s based on the belief that you just end up writing one parser.
But more than just your compiler needs to parse code. Just a smattering of tools that would want to parse the code for your language include:
- IDE and text editor tooling
- Documentation generators
- Code reformatters
- Debuggers and profiling tools
- Refactoring tools
- Code highlighters, renderers for Latex and HTML
- Lint tools
- Static analysis tools
- Alternative compiler implementations
- Even build and packaging tools
And more. Pretty much everything that might work with your code will benefit from being able to parse it. All of these tools will sprout up like weeds for a language that’s easy to parse and popular. (So, Java: Sure. C++: Nnnnope!) Anyone who has an idea can plop a grammar into a parser generator and hack together something that works well to do whatever, and gets improved from there.
If the language is hard to parse, then many of those people eager to contribute either get stuck trying and failing to build a parser or trying and failing to learn to use the daunting internals of your compiler. (Oh, and you did build your compiler as a library with a stable API, right?)
What’s strange is he seems to get it:
[They should have] Context-free grammars. What this really means is the code should be parsable without having to look things up in a symbol table. C++ is famously not a context-free grammar. A context-free grammar, besides making things a lot simpler, means that IDEs can do syntax highlighting without integrating most of a compiler front end. As a result, third-party tools become much more likely to exist.
I mean, right there, that’s what I’m saying. C++ needs an almost full-stack compiler just to parse the language, and if you go look at Clang, you’ll discover it’s got a ton of stuff integrated. Documentation generator, refactoring support, code completion support, and code formatter all integrated directly into the same repo as the full-stack compiler.
I’m not sure where the disconnect is here. C++ is especially bad in what’s necessary just to parse it, but from a tool builders perspective it’s not any worse. All the hardship is in having to link against a probably unstable and poorly documented API just to parse the language. What abominable things happen under the hood to accomplish this doesn’t make it better or worse as far as the library user is concerned.
And so, I disagree. Use an LR parser generator. It’ll keep your language parsable, and make it easier to change early on. When your language becomes popular enough that you have the problem of parsing errors not being friendly enough for your users, celebrate that fact and hand-roll only then.
And then reap the benefit of all the tooling an easily parsed language gets, since you know you kept it LR(1).
On language adoption
You’ll be doing what every nascent rock band does — play shopping malls, high school dances, dive bars, and so on, slowly building up an audience.
I think the trick to language adoption is simpler. Pick an initial application area, and focus on libraries.
If you’re creating a language you think is great for webapps, build your rails equivalent. If you think it’s great for game development, make sure it’s got the right libraries like SDL, GL, or whatever libraries are used with phones these days.
And don’t just crappily expose these libraries and expect people to bite. This should be the selling point. You created this language presumably because you think it’s better. Show how. “Look at how you use SDL and GL in MySystemsAndGamesLang! Isn’t this so much better than what you’re used to?”
One last parting shot that I couldn’t resist. When it does come time to promote this awesome language you’ve created, sell it on its own merits. Don’t disparage the competition.
If the first bits of marketing to reach most potential users is “attention C++ users: your language sucks,” … this might not work very well, even if true.