Mere thoughts on a biological programming language

I occasionally get ideas that I don’t have the time or opportunity to pursue, so I thought I’d organize and write about one of them. Synthetic biology is an extremely interesting area, one that might have sent me down a different career path if I’d been born a little later or known more about it when I was younger. A few years back, some clever people managed to create the first 100% synthetic genome (as in “we turned bits and nucleotides into DNA”), insert it into a cell, and have it thrive.

It’s interesting to think about what the end-game of this technology is. There are multitudes of problems solvable by designing novel organisms in a computer and synthesizing them, even with a relatively crude understanding of biochemistry. So there’s a question closer to my area: what would a programming language for organisms look like?

There are a few things we can answer with relative certainty. We can look at the question from three approaches: top-down, bottom-up, and black-box. What, about programming languages in general, can we say with reasonable certainty will apply to even a biological programming language? And how far can we go from what the result must be (DNA) backwards to what the language must be like? And regardless of what the programming language looks like, what properties must be true about it?

The language will be grounded in type theory. This might (to someone not familiar with PL theory) sound a bit unreasonable, since we still live in a world where most computer programming languages aren’t grounded in type theory, yet. But type theory is the fundamental theory of programming languages (and they’re hard at work extending that to logic and all of mathematics), so it’s not too controversial from that perspective. The only real reason our current languages aren’t grounded in type theory is that they’re a mess of historical accidents.

If this seems hard to imagine, the next probable property might help: the ambient monad for a biological language will be totally different from computer programming languages. (That is, different from languages for Von Neumann/Turing machines.) Most of our programming languages pervasively allow things like state, mutation, and IO. Even Haskell, with its reputation for being “pure,” pervasively permits non-termination of computations. A biological language will likely share none of these “ambient monads”, though I can make no guesses at the moment as to which ones we might find convenient to use. (I suppose there will be a sort of ambient state for the cell, but it will be different from how we typically think of state in current languages.)

If that’s still hard to believe, let me try it a different way: a type theory that permits no effects is really just another way of writing down data structures. Once you start wanting to describe a complicated enough data structure, you start wanting more and more abstraction, modularity, and composition features until you end up with a programming language. And that should (darn it) be grounded in type theory.

So next, bottom-up: the language must compile down to DNA. Of course. But we can also draw a nice line in the sand between regulatory genes and (I don’t know of a word for “non-regulatory gene” so I’ll call them) IO genes. I don’t know enough biology to know if “real” (i.e. evolved) organisms obey anything even remotely like a nice separation rule between these two kinds of genes (and evolution suggests they almost certainly do not), but it doesn’t matter. Just as a C compiler cannot generate all possible assembly programs, we can live with our bio language only generating genomes that are “well-behaved” in this way.

But this means that IO genes are the “foreign functions” of the language, and 100% of the purpose of our “program” is to generate the regulatory genes. We’ll almost certainly never write a program that somehow compiles down to the genes for constructing “chlorophyll a”. That’s too complicated. Too much chemistry, and the algorithms involved in figuring out a chemical structure like that are complex. (In the “complexity theory” sense.) You don’t want a compiler solving a problem like that; you want to solve it once and for all, study it carefully, and then re-use the result. Happily, this means evolution gives us tons of building blocks right from the start.

The regulatory side is perfectly reasonable, though. We can decide how we’re going to go about promoting and suppressing expression of IO genes, and then generate a set of regulatory genes that does this exactly according to the program. Again, we’re taking advantage of the fact that we don’t need to be able to reproduce all the different kinds of regulation that have evolved. We only need enough that we can regulate gene expression at all.

Foreign “IO” genes are what the name suggests: both inputs and outputs of the program. That is, some of these will be pure sensing: they detect something about the environment of the cell and cause a change in gene expression. Meanwhile, others will be pure output: they will cause something physical to happen only when they are expressed, but will cause no (direct) effects on gene expression. Others may be both. But this is not the only sensing that can go on: many functional parts of the cell (stomata, for example) will “sense,” but purely within their own chemistry, not under the direct control of gene expression.

The regulatory genes generated by the compiler will be intra-cell only. Probably. It’s possible to rely on “foreign IO” genes to accomplish communication with the environment, including other cells. And this is likely a good idea, because there are a lot of different ways cells can communicate, so it might be unwise to try to fix a few in stone and bake them into the language.

Metadata will be associated with all foreign genes. We’ll want to be able to simulate our programs in the machine, in order to debug and test them. To do that, we need to be able to abstract far away from the actual chemical machinery of the cell, because otherwise it is totally computationally infeasible. Even if inter-cell communication is part of the core of the language and thus does not need to be part of the metadata for foreign genes, we’ll still want to be able to run statistics on things like oxygen exchange, to make sure no cells will be starved and things like that. Since these are the result of the physical effects of expressed genes (i.e. the IO of the cell), we’ll need information on what those effects will be to simulate them without having to resort to simulating the chemistry.
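To make the metadata idea slightly more concrete, here’s a purely hypothetical sketch (written as C++, only because I have to write it in something). Every type and field name here is invented for illustration; it’s just meant to show the kind of information a simulator could consume instead of simulating the chemistry.

// Purely hypothetical sketch: all names invented, nothing here is a real design.
#include <string>
#include <vector>

struct ForeignGene {
  std::string name;            // some evolved, already-studied building block
  bool sensesEnvironment;      // pure input: triggers changes in expression
  bool hasPhysicalEffect;      // pure output: does something when expressed
  // Coarse numbers a simulator could use in place of real chemistry,
  // e.g. to run statistics on oxygen exchange across a tissue.
  double oxygenPerUnitExpression;
  double energyPerUnitExpression;
};

struct GenomeProgram {
  std::vector<ForeignGene> ioGenes;   // the "foreign functions"
  // ... plus the regulatory network the compiler would generate ...
};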

 

So, finally, if it’s not obvious, I should note that I’m no biologist. This is just interesting. So with these few ideas in mind, the next question is: what’s the first step? If we designed a prototype language of this sort, we’d probably want to follow the work on synthetic genomes. Take the first synthetic genome, separate it into regulatory and “IO” genes as best we can, and then rewrite all the regulatory parts within the language, operating on the IO genes as “foreign functions”. Or at least, do so for a small part of it at first. (After all, a “trivial” program exactly reproducing the organism would consist of all the current genes as IO genes, and no program code at all. So we can start with parts and grow to the whole.)

Next, compile it, put it into a cell, and see if the new-but-not-really-different organism manages to survive and behave the same. This also happens to be basic science: you’d be verifying your understanding of the regulatory network’s behavior by creating a totally synthetic gene regulatory network.

And then, as the technology to synthesize genomes becomes easier, and the loop between “design genome -> test organism -> measure results” becomes tighter, the scientific opportunities start to explode.


Programmers as glue factories

Chris Granger has a wonderful post up “Toward a better programming” based on his Strange Loop talk. I’m in general agreement with most of it, and probably with anyone making a serious attempt to understand why programming is terrible.

Of course, there’s something about it I want to comment on:

Not once did I hear that programming is “solving problems.” Instead, I found out that it’s just clumping things together with half-dried glue and praying that it holds water long enough to replace it with a new clump of things. I heard over and over again that we’re plumbers and glue factories…

Good.

I mean, this is almost a success story, right? Ignore the metaphors that are framing this as a really bad thing for a moment. Wouldn’t we like it if most programming were about composing together small components to solve our larger problems? Would this not essentially be nirvana for programmers? If, for almost all the smaller problems we encountered, we could just use an existing solution, probably better designed, and fully debugged already? Would this not make us wizards?

It is, of course, not a success story, but I think that’s because:

  1. All our programming languages are terrible.
  2. All our standard libraries, libraries, and frameworks are terrible.
  3. All our tools are terrible.

And yet, despite all that, we’re actually sort of close. We have all those severe problems, and yet we’re still able to build systems by trying to compose together parts. It’s just painful and fragile (“teacups stacked on top of teacups”) instead of wonderful, and well… no wonder. Look at that list again.

If you haven’t clicked the link to read that post, please do. There are lots of good ideas there. But for the rest of this post, instead of talking about those, I want to elaborate a little more constructively about *why* I think all our languages (etc) are terrible.

Abstraction, Modularity, Composition

I think the history of programming languages, in the broadest possible terms, is the discovery of these three things as hugely important parts of programming. In fact, I believe we’ve discovered them in this order, over decades.

Abstraction goes back to the beginning of programming. The essential observation of the lambda calculus (and Lisp, Scheme, etc.) was that abstraction was actually the bare minimum, sufficient to write any program.

Great abstractions are ones that hide away details that really don’t matter. Your task really does get easier using them. Good ones hide away details that usually don’t matter, but you aren’t tied to them in any way. If the details start mattering, just use a different abstraction, no problem. Poor abstractions hide details we care about, tie us down in using them, and give us no good way of dealing with it.

(For example, most of the historical arguments about garbage collection have been between people whose particular problem domains disagree on whether garbage collection is a great or poor abstraction. For most problem domains, it’s great. For some, it’s poor.)

Modularity as a principle wasn’t really developed or understood until the 80s. The basic idea is designing interfaces to isolate components of your program. This isn’t really about the module system of a programming language; instead, it’s generally down at the level of types. In object-oriented languages, that means interfaces or classes, and it involves principles like encapsulation.

By and large, the adoption of modularity as an important principle in software design coincided with the object-oriented movement. Unfortunately, this has meant that most languages in use today were based on the ad-hoc dogmas of that movement, and not the underlying principles. Many of them do not support modular development of programs very well at all, despite being considered “object-oriented.”

Composition is only just recently coming to be understood as an important principle. I’m not sure if there has ever been a good manifesto on this, or if instead it’s just sort of slowly congealing in the community’s collective minds. It’s making its way through to practice via mantras like “favor composition over inheritance” or the SOLID principles.

The lack of support for composition in languages like Java is the reason for the existence of things like Dependency Injection frameworks. These frameworks are little more than an elaborate song-and-dance routine that tries to enable composition by reducing it to a modularity problem. (And the frameworks I’ve seen only handle some parts of the problem, partially.)

The major advantage of composition is that it’s almost a cheat code for turning poor abstractions into good ones. If an abstraction hides a detail you care about, and that abstraction is composed together from smaller ones, then you can just break it apart, replace the part that doesn’t work for you, and re-compose it back together again and have a new great abstraction to use. (The state of the art for this, besides dependency injection, is inheritance, which is anti-modular and anti-composable. Elaborating on this might make a good post, later…)
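To make that a bit more concrete, here’s a minimal sketch of the “break it apart, swap a piece, re-compose” move. All the names are invented for illustration; the point is only that the larger abstraction is assembled from parts it doesn’t hard-code.

// Minimal sketch of composition; all names are invented for illustration.
#include <cstddef>
#include <memory>

struct EvictionPolicy {                        // the small piece we might care about
  virtual ~EvictionPolicy() = default;
  virtual bool shouldEvict(std::size_t entries) const = 0;
};

struct EvictNever : EvictionPolicy {
  bool shouldEvict(std::size_t) const override { return false; }
};

struct EvictOverLimit : EvictionPolicy {
  std::size_t limit;
  explicit EvictOverLimit(std::size_t l) : limit(l) {}
  bool shouldEvict(std::size_t entries) const override { return entries > limit; }
};

struct Cache {                                 // the larger abstraction, composed from parts
  std::unique_ptr<EvictionPolicy> policy;
  explicit Cache(std::unique_ptr<EvictionPolicy> p) : policy(std::move(p)) {}
  // ... the rest of the cache never needs to know which policy it got ...
};

int main() {
  // The one detail we cared about gets replaced; everything else is reused as-is.
  Cache bounded(std::make_unique<EvictOverLimit>(100));
  Cache unbounded(std::make_unique<EvictNever>());
}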

 

I could go on, but I think this is sufficient to get my point across. Our programs are fragile glued together horrors because they’re built on poor abstractions. Abstractions that have to be poor because our languages lack support for modularity and especially composition, and often foist just plain no-excuses poor abstractions on us besides. (e.g. String in Haskell.)

If and when we start working in a world with languages that properly support these principles, and libraries that take advantage of them, will the woes of the world’s glue factories be solved? I don’t know for certain. For all I know, next decade we’ll have discovered all-important principle #4.

But I am very skeptical of any proposal to radically remake how programming is done. I think we have a very clear set of principles for building good software that our programming languages do not enable, our libraries do not follow, and our tools do not support. I’m not convinced the problem with programming is anything truly deeper than “well, let’s just do that already. Sheesh.”


Advice on writing a programming language

Walter Bright, the guy behind the D language, wrote an article about writing your own programming language. If you’re interested in such things, I suggest reading it. It’s pretty good.

I want to disagree about one thing, though. And add a couple of points that I think are important.

Parsing is actually pretty important

There are a few things I (perhaps surprisingly) suggest should not be considerations… Easy parsing. It isn’t hard to write parsers with arbitrary lookahead. The looks of the language shouldn’t be compromised to save a few lines of code in the parser. Remember, you’ll spend a lot of time staring at the code. That comes first.

…Somewhat more controversial, I wouldn’t bother wasting time with lexer or parser generators… They’re a waste of time. Writing a lexer and parser is a tiny percentage of the job of writing a compiler.

This isn’t entirely bad advice, but I’m going to go with “mostly.” The lexer and parser generators we have are all pretty dismal, it’s true. We know how to do better, but no one’s just gone and done it. Probably partly because it’s an entirely thankless job. (“Here’s the perfect parser generator!” “great, now I’m not going to bother switching to it, see…”)

But this sentiment that parsing difficulty is unimportant and parsing formalisms should be tossed aside is entirely misplaced. As far as I can tell, it’s based on the belief that you just end up writing one parser.

But more than just your compiler needs to parse code. Here is just a smattering of the tools that will want to parse code in your language:

  • IDE and text editor tooling
  • Documentation generators
  • REPLs
  • Code reformatters
  • Debuggers and profiling tools
  • Refactoring tools
  • Code highlighters, renderers for LaTeX and HTML
  • Lint tools
  • Static analysis tools
  • Alternative compiler implementations
  • Web toys wanting a Javascript parser, or apps that prefer a Java native parser, rather than a C library (or vice versa)
  • Even build and packaging tools

And more. Pretty much everything that might work with your code will benefit from being able to parse it. All of these tools will sprout up like weeds for a language that’s easy to parse and popular. (So, Java: Sure. C++: Nnnnope!) Anyone who has an idea can plop a grammar into a parser generator, hack together something that does the job well enough, and improve it from there.

If the language is hard to parse, then many of those people eager to contribute either get stuck trying and failing to build a parser or trying and failing to learn to use the daunting internals of your compiler. (Oh, and you did build your compiler as a library with a stable API, right?)

What’s strange is that he seems to get it:

[They should have] Context-free grammars. What this really means is the code should be parsable without having to look things up in a symbol table. C++ is famously not a context-free grammar. A context-free grammar, besides making things a lot simpler, means that IDEs can do syntax highlighting without integrating most of a compiler front end. As a result, third-party tools become much more likely to exist.

I mean, right there, that’s what I’m saying. C++ needs an almost full-stack compiler just to parse the language, and if you go look at Clang, you’ll discover it’s got a ton of stuff integrated. Documentation generator, refactoring support, code completion support, and code formatter all integrated directly into the same repo as the full-stack compiler.
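If a concrete example helps, here’s the classic illustration of why a symbol table is needed just to parse (this bites C too, via typedef): the same tokens parse completely differently depending on what a name was declared to be.

// The statement "a * b;" parses two different ways depending on what 'a' is:
namespace one {
  struct a {};
  void f() { a * b; }          // 'a' names a type here, so this declares a pointer 'b'
}
namespace two {
  int a = 6, b = 7;
  int f() { return a * b; }    // 'a' names a variable here, so '*' is multiplication
}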

I’m not sure where the disconnect is here. C++ is especially bad in what’s necessary just to parse it, but from a tool builder’s perspective it’s not any worse. All the hardship is in having to link against a probably unstable and poorly documented API just to parse the language. What abominable things happen under the hood to accomplish this doesn’t make it better or worse as far as the library user is concerned.

And so, I disagree. Use an LR parser generator. It’ll keep your language parsable, and make it easier to change early on. When your language becomes popular enough that you have the problem of parsing errors not being friendly enough for your users, celebrate that fact and hand-roll only then.

And then reap the benefit of all the tooling an easily parsed language gets, since you know you kept it LR(1).

On language adoption

You’ll be doing what every nascent rock band does — play shopping malls, high school dances, dive bars, and so on, slowly building up an audience.

I think the trick to language adoption is simpler. Pick an initial application area, and focus on libraries.

People picked up Ruby because of Rails. People picked up PHP because it was designed to work with a webserver from the beginning and directly integrated things like mysql support. People are picking up Go because it has all the libraries you need to write back-end web services. People picked up Javascript because it was the only damned option that did what it did.

If you’re creating a language you think is great for webapps, build your rails equivalent. If you think it’s great for game development, make sure it’s got the right libraries like SDL, GL, or whatever libraries are used with phones these days.

And don’t just crappily expose these libraries and expect people to bite. This should be the selling point. You created this language presumably because you think it’s better. Show how. “Look at how you use SDL and GL in MySystemsAndGamesLang! Isn’t this so much better than what you’re used to?”

On marketing

One last parting shot that I couldn’t resist. When it does come time to promote this awesome language you’ve created, sell it on its own merits. Don’t disparage the competition.

If the first bit of marketing to reach most potential users is “attention C++ users: your language sucks,” … this might not work very well, even if true.


A newbie’s guide to understanding Clang AST construction

I’ve just jumped into reading Clang code a bit, and these are some notes on how it works. Accuracy completely disavowed, I’m obviously no expert on Clang internals! (But, you know, corrections welcome.) I’m mostly interested in C, so I’ll probably skip some stuff important for C++ (and ObjC) here.

A high-level overview

The general structure for compiling a file looks standard, something like “Lex -> Parse -> Sema -> AST -> whatever” where “whatever” might be something like “CodeGen”. These are all names of modules you’ll find under clang/lib/ and the corresponding clang/include/clang/.

Lex and Parse have mostly the obvious responsibilities, with the exception that Parse appears to handle management of scopes (for name lookup). Sema is a module that tries to separate out from Parse the semantic analysis and AST construction code. Sema is pretty much a bunch of functions, called by Parse, that do type checking and AST construction.

That’s right: type checking is done during parsing of the source file, and not performed on the AST. I’m assuming this is an awful requirement imposed by C++ syntax. (And Parse handling scopes is probably the result of C syntax and the typedef hack.)

On the bright side, it does mean that ASTs, once constructed, have all type information already filled out. (For those familiar with the “AST typing problem” typically discussed in functional languages, Clang is essentially resolving this by not having a tree data structure without full type information at all.)

Scope overview

The scoping mechanism is somewhat nifty. To see an example, look for ParseForStatement in Parse/ParseStmt.cpp. To open a scope, it first decides what kind (just continue/break scope, or also including declarations) and then creates a ParseScope object, which does the following:

/// ParseScope - Introduces a new scope for parsing. The kind of
/// scope is determined by ScopeFlags. Objects of this type should
/// be created on the stack to coincide with the position where the
/// parser enters the new scope, and this object's constructor will
/// create that new scope. Similarly, once the object is destroyed
/// the parser will exit the scope.

It’s just a lifetime management object: the scope opens where the constructor runs, and closes either explicitly, or when the destructor is called.
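If that RAII trick is unfamiliar, here’s the general shape of it as a simplified stand-in (this is not Clang’s actual ParseScope, just the pattern it uses):

// Simplified stand-in for the pattern; not Clang's actual ParseScope.
#include <string>
#include <vector>

struct ScopeStack {
  std::vector<std::vector<std::string>> scopes;   // innermost scope is at the back
};

class ScopeRAII {
  ScopeStack &stack;
public:
  explicit ScopeRAII(ScopeStack &s) : stack(s) {
    stack.scopes.emplace_back();                  // scope opens when this is constructed
  }
  ~ScopeRAII() {
    stack.scopes.pop_back();                      // and closes when it's destroyed
  }
};

void parseForStatementLike(ScopeStack &s) {
  ScopeRAII forScope(s);                          // e.g. the scope for a for-statement
  // ... parse init / condition / body; declarations land in the innermost scope ...
}                                                 // destructor runs here, exiting the scope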

Related to scoping, of course, is looking up identifiers. I’ll avoid the full details; let’s just look at how ActOnIdExpression in Sema/SemaExpr.cpp works. First, lookup results are stored in a LookupResult (Sema/Lookup.h and Sema/SemaLookup.cpp), but actual lookups are done by function call, e.g. LookupParsedName (Sema/SemaLookup.cpp).

That just calls LookupName, which gets an iterator from an IdResolver which, you know what? Let’s not go further on that. Anyway, yeah, lookups happen, and at least you can see the Scope being passed into LookupParsedName.

Typechecking overview

I won’t look too much at the diagnostic system. It’s neat, if you want to check it out yourself: it handles things like translations (I assume) and emits a relevant snippet of the original code to give context to an error.

The first thing to do, when trying to show an example of type checking, is to think of something in C that isn’t type correct. This is trickier than it sounds in the land of C, where if(0["Hello, world!"]); is perfectly okay.

So, let’s go with structs. if(expr_of_type_struct_foo) ought to be an error, so long as that’s not a pointer, anyway. Looking at ActOnIfStmt goes nowhere, unfortunately, so we back up to looking at the control flow for ParseIfStatement and discover it calls ActOnBooleanCondition on the expression first.

This leads to CheckBooleanCondition and we find that it:

  1. transforms the expression with DefaultFunctionArrayLvalueConversion, and
  2. checks isScalarType() on the type of the resulting expression.
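Paraphrased, that shape looks roughly like the following. This is how I read it, not Clang’s actual code, and the helper name is mine; it assumes Clang’s Sema headers.

// Paraphrase of the check described above; not Clang's actual code.
#include "clang/Sema/Sema.h"
using namespace clang;

static ExprResult checkBooleanConditionSketch(Sema &S, Expr *E) {
  ExprResult Converted = S.DefaultFunctionArrayLvalueConversion(E);   // step 1
  if (Converted.isInvalid())
    return ExprError();
  Expr *Cond = Converted.get();
  if (!Cond->getType()->isScalarType())                               // step 2
    return ExprError();   // the real code emits a "not a scalar" diagnostic here
  return Cond;
}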

This leaves us with looking at the default conversions, but C’s rules are complex and, well:

ExprResult Sema::DefaultFunctionArrayConversion(Expr *E) {
ExprResult Sema::DefaultLvalueConversion(Expr *E) {
ExprResult Sema::DefaultFunctionArrayLvalueConversion(Expr *E) {
ExprResult Sema::UsualUnaryConversions(Expr *E) {
ExprResult Sema::DefaultArgumentPromotion(Expr *E) {
ExprResult Sema::DefaultVariadicArgumentPromotion(Expr *E, VariadicCallType CT,

There they are in Sema/SemaExpr.cpp.

AST overview

Let’s pick function declarations as our example. Here’s a Create function for a FunctionDecl (found in AST/Decl.h):

static FunctionDecl *Create(ASTContext &C, DeclContext *DC,
                            SourceLocation StartLoc,
                            const DeclarationNameInfo &NameInfo,
                            QualType T, TypeSourceInfo *TInfo,
                            StorageClass SC,
                            bool isInlineSpecified,
                            bool hasWrittenPrototype,
                            bool isConstexprSpecified = false);

Here are a few notes about it:

  • ASTContext appears to be relevant to memory management only: it’s essentially a region that owns all AST node memory.

  • DeclContext is how contextual information is handled in the AST. The children have a pointer to their parent. We can find out what’s relevant (edited slightly):

    tedinski@smorgasbord:~/repos/llvm/tools/clang/include/clang$ egrep -R "class.*:.*public DeclContext" `find . -name "*.h"`
    ./AST/DeclCXX.h:class LinkageSpecDecl : public Decl, public DeclContext {
    ./AST/Decl.h:class TranslationUnitDecl : public Decl, public DeclContext {
    ./AST/Decl.h:class NamespaceDecl : public NamedDecl, public DeclContext, 
    ./AST/Decl.h:class FunctionDecl : public DeclaratorDecl, public DeclContext,
    ./AST/Decl.h:class BlockDecl : public Decl, public DeclContext {
    ./AST/Decl.h:class CapturedDecl : public Decl, public DeclContext {
    ./AST/DeclObjC.h:class ObjCMethodDecl : public NamedDecl, public DeclContext {
    ./AST/DeclObjC.h:class ObjCContainerDecl : public NamedDecl, public DeclContext {

    That gives us a good idea of what our DeclContext means.

  • SourceLocation: The actual representation of this is annoyingly complicated, but it identifies a specific character in a specific file, plus extra information, like include call stack, etc. I’m going to ignore the details but infer from the name “StartLoc” that it’s the start of the declarator. I tried to verify this and fell into a rabbit hole.

  • DeclarationNameInfo: Before we figure out what this is, let’s back up a second. What’s a FunctionDecl?

    class FunctionDecl : public DeclaratorDecl, public DeclContext,
                         public Redeclarable 
    class DeclaratorDecl : public ValueDecl 
    class ValueDecl : public NamedDecl 
    class NamedDecl : public Decl 

    Okay. What do each of these progressively want?

    Decl(Kind DK, DeclContext *DC, SourceLocation L)
    NamedDecl(Kind DK, DeclContext *DC, SourceLocation L, DeclarationName N)
    DeclaratorDecl(Kind DK, DeclContext *DC, SourceLocation L,
                   DeclarationName N, QualType T, TypeSourceInfo *TInfo,
                   SourceLocation StartL)

    Okay, let’s start with DeclarationName

    DeclarationName(const IdentifierInfo *II)

    IdentifierInfo is gross, and I don’t want to look at it. I hope it’s just an artifact of trying to be super-efficient. But it seems to just be a name, plus some other info of (to me) unknown importance that’s hopefully not relevant to C. On to DeclarationNameInfo:

    struct DeclarationNameInfo {
    private:
      /// Name - The declaration name, also encoding name kind.
      DeclarationName Name;
      /// Loc - The main source location for the declaration name.
      SourceLocation NameLoc;
      /// Info - Further source/type location info for special kinds of names.
      DeclarationNameLoc LocInfo;

    Not bad. There are an awful lot of operations on this, all of which seem to use LocInfo and be for C++ only. So that’s probably not relevant. It’s pretty much just a name and location. Okay.

  • Going back to the things that go into FunctionDecl, way up there, next up is QualType, but I’m going to save this for last.

  • Then there’s TypeSourceInfo. I just don’t even know. It’s probably something to do with locating the type part of the declarator (as opposed to the identifier part) in the source files. Its implementation is an abomination of who knows what, though.

  • Next is StorageClass. From Basic/Specifiers.h. These are none/extern/static/etc.

  • The boolean parameters to a FunctionDecl look mostly syntactic. I’d say hasWrittenPrototype is semantic, though.

Then, there are more pieces the parser fills after construction:

NewFD->setParams(Params);

I’m going to stop there in looking at ASTs. There’s probably a lot of confusion in that little dump of information up there, but let’s return to QualType instead.

Type representation overview

At a conceptual level, this works simply enough, but the focus on efficiency makes everything more difficult (as usual). Types are rather simple: there is QualType, which is essentially a pair of a Qualifiers and a Type. Qualifiers are things like const, restrict, and volatile. Type has constructors for just about everything: BuiltinType, ComplexType, TypedefType, PointerType, etc. Even ParenType, apparently just to keep all the syntax relevant. Structs are apparently called RecordType.

To see a full list, you can consult AST/TypeNodes.def.

If a type wraps other types, like PointerType for example, then those wrapped types are QualTypes. This is, of course, how things like “const char * const t” get represented.

Type checking then proceeds via a truly shocking number of “isX” boolean functions you can find on Type in AST/Type.h. The general structure seems to be that if you need to examine the types in any depth, you will, for example, call isBuiltinType() and if true call castAs<BuiltinType>(). Or, I suppose, call getAs() and check for null.
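As a tiny example of that pattern (the function name is mine, and this is just how I understand the API so far; it assumes Clang’s AST headers):

// Small example of the getAs/isX pattern; the function name is invented.
#include "clang/AST/Type.h"
using namespace clang;

static bool isPointerToConstChar(QualType QT) {
  const PointerType *PT = QT->getAs<PointerType>();  // null if it isn't a pointer
  if (!PT)
    return false;
  QualType Pointee = PT->getPointeeType();           // the wrapped QualType
  // The inner const in "const char * const" lives on the pointee's qualifiers.
  return Pointee.isConstQualified() && Pointee->isCharType();
}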

There’s quite a lot I haven’t explored, but I hope this exercise in tracking down things and grepping through the source has been interesting to someone.


Lazy Webbing a Personal Cloud?

The shutdown of Reader left me… sad. Feedly has done okay as a replacement, though I’m not sure what their business plan is. I’d discovered Tiny Tiny RSS, which actually looks quite nice, and has the distinct advantage of staying up as long as you decide to keep it up. But… well, I’m not really a “host things myself” person at the moment. (Hello, from my wordpress.com blog!) And just an RSS reader isn’t enough to motivate me to start.

Drew Crawford has written a step by step guide to setting up a mail server, complete with IMAP, spam filtering, push, and encryption. It’s excellent. I’m posting pretty much to save that link for myself later. Because RSS and email is almost enough motivation.

So my friendly lazyweb, are there good open source replacements for Dropbox?

How about Google calendar or docs? (Or, since docs seems like a lot, maybe just the word processing part of it?) Contacts? (Actually wait, isn’t IMAP supposed to do that?)

Now that I think about it, I wonder how hard it is to set up a private VPN server.

And what host has the cheapest VM with decent disk space? (Since only you’d be accessing your own box, I feel like bandwidth, cpu, etc aren’t such a big deal.)

Also, what other neat stuff can you cram into your personal “cloud?” (After all, everything cool is a cloud now, right?)

Oh. And who’s working on the scripted/chef/puppet/flavor of the month version of Drew’s post? I know you exist.

Hey, the title of this post did say Lazy.

Update 2013-7-20: Saw this link on HN about replacing dropbox. Doesn’t quite cut it though. For one thing, needs inotify, not “sleep 2”. For another, well, the android app isn’t open source. And, you know, other platforms exist, too.

Update 2013-8-20: Ansible scripts that set up a lot of this stuff. Nice!


The use of closed nonterminals

Motivated by a modular well-definedness analysis for higher-order attribute grammars with forwarding that I developed (for which I just noticed I really need to get a pdf up), I added the concept of “closed nonterminals” to Silver this last year.

Nonterminals in Silver allow extensions to introduce new productions, but only so long as those productions forward back down to the host language.  Extensions are then free to introduce new synthesized attributes, confident that they can always be evaluated via forwarding.  It’s a nice way of solving a very, very strong form of the expression problem.

Closed nonterminals are introduced because we wanted to sometimes make the dual trade-off.  Instead, extensions are allowed to introduce new attributes, but only so long as those attributes have a default value.  Then, new (not necessarily forwarding) productions can be introduced by extensions, confident that if they lack an explicit equation, there will be a default value instead.

In some sense, this switches from a “datatype-like” form of extension to an “object-like” form of extension.  We were motivated to introduce this to handle transformations from concrete syntax tree to abstract syntax trees.  Without closed nonterminals, you’d have a really stupid time trying to write what your new concrete syntax productions have to forward to.  When they’re closed, you don’t have to bother with that, and instead you just can’t add any new truly meaningful attributes to the concrete syntax tree… which is fine because we don’t want to.
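For the object-oriented reader, the trade-off maps onto something familiar (an analogy only, not Silver semantics, and all the names below are invented):

// C++ analogy for the "object-like" trade-off; not Silver semantics.
#include <string>

struct Def {                    // think: a closed nonterminal
  virtual ~Def() = default;
  // An extension adding a new "attribute" must supply a default value,
  // so productions (subclasses) without an explicit equation still work:
  virtual std::string doc() const { return ""; }
};

// New "productions" can be added freely, without forwarding:
struct VarDef : Def {};                                      // relies on the default
struct FunDef : Def {
  std::string doc() const override { return "a function"; }  // explicit "equation"
};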

At the time of introduction, I also noticed one other specific usage for closed nonterminals… as part of an environment pattern.  We’d write something like:

nonterminal Defs with ...
closed nonterminal Def with ...

To allow extensions to introduce entirely new kinds of definitions into the environment.

Since then, I’d started to feel the urge to use them a little more often, nearly always the same case of handling lists of things: Defs with Def, top level Dcls with Dcl, etc.  However, I had resisted so far, because I wasn’t sure why this would be okay.  Perhaps down the road, we’d suddenly discover this was the wrong decision?  I’d want an explanation for why this is right, and not just convenient now, before I’d start to allow it to be used.  There was a good story for concrete syntax, and for the environment pattern, but not so far for much else.

Simply choosing to make all “list element” nonterminals closed all the time is definitely wrong: Stmts, for example, very very much does not mean Stmt can be a closed nonterminal.  It’d eliminate the ability of an extension to introduce any sort of new analysis.

But, I think I’ve come up with an explanation: any list that exists to only permit an unordered collection of syntax can have its “list element” be a closed nonterminal.  A  good example is modifiers: “public static final,” “final public static,” who cares? The point is simply to recognize what’s there.

I’m still not completely convinced this is sufficient justification for using a closed nonterminal, but it appears to be a necessary condition for some particular class of valid uses.  Modifiers seems very safe, but the idea of making top-level declarations (which meet this criterion, at least for Silver itself) closed still makes me nervous…  The trouble is, we’re running into some extensions we want to write that demand it be closed, but I don’t yet know what sort of interesting extensions would be precluded by doing so.  Which, of course, isn’t evidence that there aren’t any. :)


Extending C

I’m looking for a few good language extensions to C.

I want to demonstrate the practicality of composable language extensions, and to do that, I’m going to extend C with a bunch of neato stuff.  But I haven’t yet determined just what neato stuff that will be.

I have a number of good ideas motivated by C libraries.  For example:

  • MPI. Parallelism is always good, and there should be quite a lot of lowish-hanging fruit for a language extension for MPI.
  • Bison. Parsing is really well-addressed by DSLs, so this is an obvious choice.
  • Lua. There’s a lot of regularity to the interface between these languages that could be very well addressed by a language extension.  This isn’t extending C with Lua, but rather an extension to C that makes using the Lua C API dramatically easier.
  • OpenCL. I haven’t fully investigated OpenCL yet, but I very much imagine this will be as amenable to a language extension as MPI is. And GPU programming is another obvious hot topic.

However, all of these involve potentially complicated design work.  In the interest of laziness (er, time-efficiency) and, as a bonus, to speak more directly to the research community, I’d instead like to implement language extensions others have already pioneered, but implement them as composable language extensions.  This way, the focus is on the advantages of this approach, and the particular extensions are already regarded as interesting.  Of course, this leaves me adrift: there are a lot of papers that extend C.  And I haven’t really paid attention to this area in the past, so I’m somewhat clueless as to which are actually regarded as good.

So, dear #lazyweb, what are your favorite papers that extend C?
