Wind-powered cars

There’s a neat, long article over at Wait But Why that I found quite interesting. It purports to be about Elon Musk and Tesla (which might dissuade some readers) but it’s actually about climate change and electric cars.

I wanted to add something to it, but that means you should probably read that article first.

One of the first things everyone hears about when looking at renewable energy like wind and solar is variability. Everyone, upon hearing that the variability of wind is a problem, thinks the same thing: gosh, the wind might not blow and there won’t be enough power!

This is wrong. The actual variability problem is the other way around: high winds create too much power. (Why aren’t low winds a problem? Well, they can be an issue, but low wind speed is actually pretty rare 100 meters up. The kinds of calm days we experience are really the result of hot-air bubbles close to the surface shielding us from the winds higher up.)

Wind becomes uneconomical if you can’t actually use the power you’re generating; it simply goes to waste. This is why people talk about “grid infrastructure upgrades” as something necessary to support wind power. Iowa gets about 30% of its power from wind, and so it wants to build a connection to Chicago. To sell power, not buy it. Because excess is what we worry about.

Enter electric cars and the “smart grid.” How do you use up excess energy? Why not offer it up at a discount to anyone willing to defer drawing it until an excess is available? Your car might have a couple of slider settings that look like this:

[Car charging diagram]

The end result? A fleet of electric cars absorbs excess capacity when we need it to, and also reduces electrical demand when there isn’t excess capacity. Smoothing out demand to match supply.

This is one of the things I love about energy innovations: synergy. Electric cars and wind energy make each other better. There’s also a nice synergy between nuclear energy (as baseload) and grid storage… but perhaps another day.


Algebraic data types: without them, we all suffer

Perhaps you’ve heard of “stringly typed” code. The use of strings instead of actual, you know, proper data structures, in places where it seems completely inappropriate.
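
To make that concrete, here’s a tiny sketch of my own (the function and its magic strings are invented purely for illustration, in the same Haskell used later in this post):

-- Stringly typed: nothing in the signature says that only "on" and "off"
-- are meaningful, and a typo like "Off" fails silently, at runtime.
setPower :: String -> IO ()
setPower "on"  = putStrLn "powering up"
setPower "off" = putStrLn "powering down"
setPower other = putStrLn ("ignoring unknown setting: " ++ other)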

Why do programmers write such things? I’m sure some of it is simple bad programming. But there are some legitimate advantages to stringly typed code, if you can manage to hold your nose long enough to give it a fair evaluation.

  • Strings behave like actual values. You can pass them around and copy them quite easily in pretty much all languages.
  • You don’t have to write any serialization/deserialization code. Woohoo!
  • Write-fast programming! Not having to design schemas and write data structures with all the associated “boilerplate” code is the same reason many give for using dynamic languages.

I can’t in good conscience write a list of advantages without also mentioning the disadvantages:

  • Difficult to use and understand. No clues given for what values should be passed in, or might be expected to come out. Documentation becomes harder and more important, and documentation is already rarely good enough.
  • Difficult to refactor. Since way too many things are strings, it’s very hard to find all the implications of a code change. Testing also becomes harder.
  • Biggest source of security vulnerabilities since buffer overflows. It’s not just SQL injection: it’s very easy for validation code to disagree with all the possibilities for how the code consuming the string might behave. To see both of these sorts of bugs in action at the same time, consider how mysql_escape_string was buggy and mysql_real_escape_string became necessary.

Okay. So, the question is: can we get rid of these disadvantages? Well, let’s take one small step in the right direction: JSONly typed code. It’s all the rage these days. You tend to see it less “in-the-small” compared to stringly typed code, so you’re less likely to find simple functions expecting a JSON true/false object the way you can find methods expecting one of “true” or “false” as strings. But it shows up everywhere these days, regardless of appropriateness. Configuration files, same-machine IPC mechanisms, databases…

Let’s compare it to the advantages of stringly typed code:

  • Strings already behave like values largely because they’re effectively immutable; we can get the same value-like semantics from JSON by simply treating it as immutable, with no real loss.
  • Serialization/deserialization code is already written for us in a library. Score!
  • All the “boilerplate” data structure construction / visiting code: also in the library.
  • Still no schemas to think about writing, so code fast! Zoom zoom!

So, a slight cost in using a library, but with mostly the same advantages and with one extra really huge one:

We can actually represent tree-like data.

Think about your standard, popular, typed languages: C, C++, Java, C#. Actually representing tree-like data is a horrible mess of writing a ton of classes along with visitor interfaces and so on. It’s a pain.

Think about your standard, popular, relational databases: MySQL, Postgres, SQL Server. How do you represent tree-like data? I’m not even sure this admits a one-sentence description, so let’s go with: it’s a pain.

Dynamic languages (equivalently, NoSQL databases) manage a little better, simply because they don’t have to bother with writing down overly-verbose boiler-plate code. JSON as a datatype brings these advantages even to typed languages.

But it still comes at the cost (of course) of giving up meaningful types. Consider again the same disadvantages listed for stringly typed code:

  • It’s still completely up to good documentation to know what to pass in, and what to expect out. I think every web developer has encountered an infuriating API that gives undocumented JSON error responses sometimes.
  • Refactoring is still a concern, though JSON is more often used at interface boundaries where API stability is expected anyway. Even so, seemingly innocent changes can cause problems for clients, simply because the interface is under-specified.
  • Security vulnerabilities are apparently massively reduced, but I suspect this is partly because security researchers haven’t yet had a field day in this area. JSON does seem to significantly reduce the possibility of radically reinterpreting the meaning of a value, but “validation vs interpretation” mismatches are still possible, and I’m sure they’re out there.

So, how do we completely fix all these disadvantages, while retaining all the advantages? Well, the answer is: algebraic data types.

JSON, after all, is just one simple algebraic datatype:

data JSONValue
 = JSONNull
 | JSONBool   Bool
 | JSONFloat  Double
 | JSONString String
 | JSONArray  [JSONValue]
 | JSONObject (Map String JSONValue)

That’s some simplified Haskell, but the important thing is: we’re almost done. All we need is some code to serialize/deserialize. We could even have Haskell derive that code for us (deriving Show and Read), but the output wouldn’t actually be JSON syntax.

Compare that to the equivalent code we’d have to write in, say, Java. 7 short lines versus, what? A few hundred? And the code we write to consume/build these data structures is also going to be quite a bit simpler and shorter.
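
To make the “consume” side concrete, here’s a minimal sketch (the function name and its behaviour are my own invention, assuming the JSONValue declaration above and Data.Map from the containers package):

import qualified Data.Map as Map

-- Look up a field and require it to be a string. Every other case falls
-- through to Nothing, and pattern matching makes the cases explicit.
stringField :: String -> JSONValue -> Maybe String
stringField key (JSONObject fields) =
  case Map.lookup key fields of
    Just (JSONString s) -> Just s
    _                   -> Nothing
stringField _ _ = Nothing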

Let me pick another example. How about a (simplified) representation of C types?

data CType
 = CTypeBuiltin   [Qualifier] BuiltinCType
 | CTypePointer   [Qualifier] CType
 | CTypeTag       [Qualifier] CName
 | CTypeArray     CType (Maybe Int)
 | CTypeFunction  CType [CType]

Not too shabby for 6 lines. But consider what it would take to do this with classes.

Then consider what choice you’d make, if you wanted to store a table mapping names to types in SQLite.
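
Just to make those 6 lines feel concrete, here are a couple of familiar C types written as plain values (my own sketch; the supporting Qualifier, BuiltinCType and CName definitions are assumptions, since the post doesn’t spell them out):

-- Assumed supporting definitions, purely for illustration:
data Qualifier    = Const | Volatile
data BuiltinCType = CVoid | CChar | CInt
type CName        = String

-- "const char *": a (non-const) pointer to const char.
constCharPtr :: CType
constCharPtr = CTypePointer [] (CTypeBuiltin [Const] CChar)

-- "const char *argv[]": an array of unknown length of such pointers.
argvType :: CType
argvType = CTypeArray constCharPtr Nothing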

As for the relative advantages and disadvantages:

  • We get values the same way JSON does: immutability.
  • We can automatically generate serialization/deserialization code, as long as the particulars of the syntax don’t matter too much.
  • We don’t need to write any boilerplate.
  • The code precisely documents the range of valid values.
  • The code is easily refactorable. I haven’t gone into pattern matching and exhaustiveness checking, but algebraic data types are an absolute dream for “change something, fix compiler errors, wow everything just works!”
  • The opportunities for validation errors are virtually eliminated. Our C types example requires only checking that a CName is actually a valid identifier (and thus not, say, a pile of arbitrary code). It’s possible to perfectly enforce this by making the constructor of the CName type always check it, and throw an exception otherwise. (Something that can’t be done for JSON. A sketch of such a constructor follows this list.)
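
Here’s a minimal sketch of such a smart constructor (the names are mine; I’ve used Either to report failure rather than throwing an exception, but the idea is the same):

import Data.Char (isAlpha, isAlphaNum)

-- The CName data constructor would not be exported from its module, so
-- mkCName is the only way to build one, and it always validates.
newtype CName = CName String

mkCName :: String -> Either String CName
mkCName s
  | isValidIdentifier s = Right (CName s)
  | otherwise           = Left ("not a valid C identifier: " ++ s)
  where
    isValidIdentifier (c:cs) =
      (isAlpha c || c == '_') && all (\x -> isAlphaNum x || x == '_') cs
    isValidIdentifier [] = False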

The lack of algebraic data types in modern languages and relational databases is, in my opinion, the biggest cause of frustration, error and bad code since the lack of generics. I hope one day even conservative languages (like C and Java) will incorporate them.


Human-computer interaction and programming

The other week, I was dismissive of an attempt at creating a “non-programming” sort of programming. I’d like to explain myself a bit more.

The trouble, I think, is that many people confuse the problem. Programming is pretty terrible, but I don’t think that’s because we write programs in a programming language. I think it’s because we write poor programs in poor languages, with poor tools, using poor libraries, all of which are poor mostly because we’ve built everything on shaky foundations we can’t fix, and basically because we still don’t know what we’re doing as an industry. That’s why we’re so fond of metaphors like “it’s held together with duct tape and baling wire.”

When people try to create “non-programming” sorts of programming, they generally fix none of those issues. And they remove the successes we do have with languages. And they mistakenly believe that “ordinary” people are incapable of programming. And then they make a truly, horribly fundamental mistake: they try to make things graphical and clicky.

You see, I think there are three basic levels of communication from humans to computers. There’s the “point and click” level, the expert “real-time” level, and then there’s the language level.

You’re probably familiar with all of these, but let me make sure I’m clear. We browse the web almost entirely at the “point and click” level. It’s essentially how we interact with tablets and smart phones.

For the expert “real-time” (for lack of a better term) level, we can look to the kinds of sound boards that audio engineers use. Or to vim as used by programmers, with its myriad commands. Or even to games, like World of Warcraft, once players bind every ability to a key on their keyboard (and aren’t clicking on icons anymore).

Or even just a keyboard when you need to type a lot of text. Doing that on a phone or tablet is awful!

Already with just these two levels, I think people generally appreciate the difference. I’ve heard many people parrot the comment that tablets are only good for consuming content, not creating it. And I hope nobody tries to write papers or essays on a tablet without a keyboard.

But once we get to the next level, linguistic communication, people seem to forget that lesson, or somehow think it doesn’t apply. Then we get LabVIEW and Excel, and all manner of other mistakes in trying to program by pointing and clicking. Somewhere along the line, I heard someone joke that you can’t do anything truly useful without language, just pointing and grunting like a caveman. Well, you can (see above: surf the web! “useful”…), but when you really need language, it is truly irreplaceable. When humans need to communicate with humans, we sure haven’t figured out anything better yet.

The trouble, of course, is that these things (like LabVIEW and Excel) have actually succeeded pretty well, so I can’t be totally right that they’re useless. But the trouble is not that they’re useless, it’s that they inevitably lead to horrible messes. It is extremely common to find multi-million dollar businesses relying on incredibly error-prone piles of crap that no one wants to touch and no one understands. We have entire research programs dedicated to trying to figure out how to mitigate this mess. And they have a “horror stories” link on their front page.

So I’m extremely skeptical of this sort of thing.

But let me point out one bright side: Scratch. It’s point and click programming. But there are two amazingly important differences between Scratch and other kinds of point and click programming I’ve seen. One: it’s still quite linguistic. Look at the front-page example. An event hook, a loop, and a series of commands. No, you didn’t type all that in letter by letter, but you’re still pretty much writing in a language, and kids get it. And two: it’s a stepping stone for learning, not an attempt to “make programming easier” or to make programming “not programming.” This makes all the difference in the world. Nobody is going to try to build anything but toys for their own education with Scratch.

So here’s my tl;dr bit of advice for anyone trying to replace Excel, or whatever: you can have a Scratch-like learning mode for using your software, to make it easy to use for non-programmers. But the point of this shouldn’t be to “make programming easier” or wave your hands and claim it’s “not programming”, it’s to teach those users programming. Confront the hard problem straight on, and solve it. One day, they can turn that mode off, and then write their programs.

Oh, and also: don’t make up a programming language that ignores everything we’ve learned about programming languages over the past 60 years. It’d certainly help us avoid a big mess if those users aren’t then writing programs in yet another poorly thought out, undesigned language.


The Fermi paradox… isn’t.

There is a common topic among people who like to think about the future and space travel: the Fermi paradox. Simply put, if you take even tiny-seeming growth rates for any civilization in the galaxy, it should be everywhere in the galaxy within a billion or so years. And there have been 14 billion years, so where are they?

The Wikipedia page suggests quite a lot of possible explanations, many of them nonsense, unfortunately. But it’s worth questioning the very basic assumptions of the argument, rather than resorting to making up extra things to explain the problem away.

The paradox rests on a few foundations:

  1. Civilizations are at least relatively common.
  2. Civilizations survive.
  3. Civilizations expand.

There are several other totally non-essential assumptions that are sometimes made. For example, the whole bit about radio broadcasts on the Wikipedia page isn’t really relevant, because radio won’t be detectable above the background noise at any real distance. But it doesn’t matter, because given a bit more time, you’d expect them to simply be everywhere.

There are already ways to question the first two assumptions. Most of them somewhat implausible. Dystopian sci-fi is all the rage these days (possibly partly fueled by inaction on climate change), so blathering about civilizations destroying themselves is a popular one. But I’m not such a pessimist. The sticking point is that even if most civilizations die off, well… the rest ought to be enough.

The trick of the paradox is that even if the numbers are tiny (1 civilization per million stars! only 1 in ten thousand survive!), you multiply by a big number (300 billion stars × one in a million × one in ten thousand = 30 civilizations), and you basically only need to get 2 for the paradox to work. There’s us, and there’s someone else, and given enough time, they ought to be everywhere.

But… I don’t see people question the last assumption enough. Why expand? Usually this is hand-waved away (if the expansion rate is only 0.00000001% per year! Wow so little!) but that’s not good enough. The number really could be indistinguishable from zero.

Here are the UN population projections to 2100. If anything, I think these are optimistic, and they suggest a leveling off around 11 billion humans. This isn’t due to any sort of resource constraint. It’s entirely due to the birth rate falling off a cliff (in fact, going below replacement, at which point we expect population to decline!) as soon as a population reaches a certain level of economic security and education (particularly for women).

So, here’s a thought, if we top out at 11 billion humans, just what are we going to fill those 300 billion stars with?

There are of course some counter arguments. For example, what about AI? Or curing aging? But I think these too are dead ends. I’m not worried about AI, but that’s a topic for another post. As for curing aging… I predict it will lead to an even more severe drop-off in the birth rate. (You know the “ticking biological clock?” Yeah, why rush? Without that deadline hanging around…)

One can also object that maybe other species won’t be like humans in this respect, but… why assume otherwise? We don’t really have any reason to. I don’t think this objection is sufficient when we’re calling the problem a paradox. You don’t get paradoxes from unfounded assumptions. Just don’t make that assumption, problem solved.

Ultimately, I don’t think civilizations will expand beyond maybe a handful of nearby stars, and even that’s iffy. (It depends on whether there are any real, unpredictable existential threats that can hit a whole star system at once.) The long-term expansion rate will be determined only by the rate at which stars die (the sun should be useful for another 8 billion years, even if the Earth isn’t; why move before then?). Which means the growth rate is indistinguishable enough from 0 that the paradox is no more.


Programming is the new literacy

Chris Granger has a post ironically titled “Coding is not the new literacy.” It’s pretty good, despite the early part where he attempts to redefine literacy in order to have a catchy title:

This is certainly accurate, but defining literacy as interpreting and making marks on a sheet of paper is grossly inadequate. […snip…] …composition and comprehension. And they are what literacy really is.

So if you ignore that bit, it goes back to, in effect, arguing that programming is the new literacy:

Reading and writing gave us external and distributable storage. Coding gives us external and distributable computation.

I think the probable source of his disconnect is that he’s working on Yet Another “hey, maybe people should program without actually programming, so let’s call it modeling and then make a GUI to click on and…” and, well, I don’t really need to know more; that’s going nowhere.

I’m definitely 100% on the side of programming being necessary to have any level of higher communication between human and computer beyond pointing and grunting. Language is the highest form of communication between humans and there’s no reason to believe this isn’t true between humans and computers as well.

The trouble for bringing programming to the masses isn’t that we’ve stubbornly insisted on writing code in a programming language as the way to do it. It’s that our languages, tools, platforms, and so forth are all basically crap. Programming is legit pretty stupid, and we have to put up with a lot of mistakes we can’t seem to actually fix. Mistakes piled on other mistakes, that require us to make still further mistakes.

Granger ends by referring to Alan Kay. Kay is someone I also agree with, in the sense that he’s identified the right problem. Unfortunately, Kay also proposes solutions (or at least, suggests directions in which to find solutions) that I think are… quite incorrect.

Ultimately, I think this is the real issue. We don’t need prophets to tell us what the new One Correct way is. We need to be able to just fix our past mistakes. Just the ability to not be married to designs and ideas we invented 30 years ago, and realized were garbage 25 years ago, but still persist because we’re unable to change anything.

That’d be nice.


More on Abstraction

Almost a year ago (evidently), I wrote briefly about Abstraction, Modularity, and Composition as important principles in designing software. I’ve had a few more thoughts to add about abstraction.

There are two sides to the abstraction “coin”: there’s (for lack of any better terminology) pattern abstraction, and then there’s property abstraction. Each of these is simple enough to be illustrated with just the lambda calculus itself as an example.

What prompted me to write today was that I saw a claim that macro assemblers are really all-powerful, and that all these languages we’ve invented are “really” just macros piled on assemblers. This is, of course, bollocks.

This amounts, essentially, to the claim that the only abstraction that matters is pattern abstraction. Pattern abstraction is pretty much just taking commonly repeated patterns, and creating an abstraction with the varying parts as parameters. So, if we find ourselves iterating over lists of things often enough, we might go write:

apply f (h:t) = f h >> apply f t
apply f [] = return ()

And we’ve now abstracted away this iteration pattern, and we can now just write the function we want to run on each element of the list, and re-use the rest. Identify the pattern, and abstract it out.
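
As a quick usage sketch (the names here are mine, taking the two lines above as ordinary Haskell):

-- Print every element of a list by supplying only the per-element action.
printAll :: Show a => [a] -> IO ()
printAll = apply print

main :: IO ()
main = printAll [1, 2, 3 :: Int]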

Apply this to macro assemblers, and it amounts to the claim that class definitions and the like from higher-level languages only matter in the assembly code they cause to be emitted. The pattern is the only thing that matters.

But that’s not all there is to it. In particular, with our apply function, we’ve abstracted away an iteration pattern, but when can we use it? For such a tiny little two-line thing, maybe this isn’t a big deal; we can just go read the code. But for bigger and bigger abstractions, “just read the code” isn’t good enough anymore. We could go read the documentation, but when we have more and more abstractions, “just read the docs” isn’t really good enough anymore either. Read what docs? Should we know everything about everything before we write any code?

We start to see the beginnings of property abstraction when we look at static types:

apply :: (a -> Action) -> [a] -> Action

(I’ve simplified this a lot, because I don’t want this to be about Haskell. Just read “Action” as a return type like “void” in C, where the point is to have side effects, not actually return something.)

Before, we could look at that pattern in a purely untyped way. We could pass anything into it, and the only way to know what would happen is to run it (or at least, simulate running it in your head.) But now we know some properties about the function, and about its parameters.

Abstractions always have an “outside” and an “inside” and both matter, and both have properties we’re interested in. An “abstraction” is not merely a repeated pattern like “apply”. It’s also something we’re holding abstract about which we know some properties, like “f” within apply. (And in some sense, “apply” for the rest of the program.)

My point of this isn’t that static types are great, or whatever. It’s that properties are great. The more we actually know about the stuff we’re working with, the less horrible programming is.

I think “property abstraction” is the main reason that, for example, Java has succeeded, while Lisp has languished. Everyone likes to make fun of Java, but despite its flaws, I do think it’s a good thing that, in Java, everyone knows it’s just plain not possible for an object to suddenly mutate what class it is. Lisp advocates go on about the “power” of being able to do something like that, but I think they’ve forgotten the “power” of knowing that sort of thing can’t happen to you. (Ditto Python vs CLOS, if you want an example without static types.)

Anyone who thinks languages are just macro assemblers is making the same mistake. They’re looking only at the patterns, and not the properties.


Mere thoughts on a biological programming language

I occasionally get ideas that I don’t have the time or opportunity to pursue, so I thought I’d organize and write about one of them. Synthetic biology is an extremely interesting area, one that might have led me down a different career path if I’d been born a little later, or had known more about it when I was younger. A few years back, some clever people managed to create the first 100% synthetic genome (as in “we turned bits and nucleotides into DNA”), inserted it into a cell, and the cell thrived.

It’s interesting to think about what the end-game of this technology is. There are multitudes of problems solvable by designing novel organisms in a computer and synthesizing them, even with a relatively crude understanding of biochemistry. So there’s a question closer to my area: what would a programming language for organisms look like?

There are a few things we can answer with relative certainty. We can look at the question from three approaches: top-down, bottom-up, and black-box. What, about programming languages in general, can we say with reasonable certainty will apply to even a biological programming language? And how far can we go from what the result must be (DNA) backwards to what the language must be like? And regardless of what the programming language looks like, what properties must be true about it?

The language will be grounded in type theory. This might (to someone not familiar with PL theory) sound a bit unreasonable, since we still live in a world where most computer programming languages aren’t grounded in type theory, yet. But type theory is the fundamental theory of programming languages (and people are hard at work extending it to logic and all of mathematics), so it’s not too controversial from that perspective. The only real reason our current languages aren’t grounded in type theory is that they’re a mess of historical accidents.

If this seems hard to imagine, the next probable property might help: the ambient monad for a biological language will be totally different from computer programming languages. (That is, different from languages for Von Neumann/Turing machines.) Most of our programming languages pervasively allow things like state, mutation, and IO. Even Haskell, with its reputation for being “pure,” pervasively permits non-termination of computations. A biological language will likely share none of these “ambient monads”, though I can make no guesses at the moment as to what ones we might find convenient to use. (I suppose there will be a sort of ambient state for the cell, but it will be different from how we typically think of state in current languages.)

If that’s still hard to believe, let me try it a different way: a type theory that permits no effects is really just another way of writing down data structures. Once you start wanting to describe a complicated enough data structure, you start wanting more and more abstraction, modularity, and composition features until you end up with a programming language. And that should (darn it) be grounded in type theory.

So next, bottom-up: the language must compile down to DNA. Of course. But we can also draw a nice line in the sand between regulatory genes and (I don’t know of a word for “non-regulatory gene,” so I’ll call them) IO genes. I don’t know enough biology to say whether “real” (i.e. evolved) organisms obey anything even remotely like a clean separation between these two kinds of genes (and evolution suggests they almost certainly do not), but it doesn’t matter. Just as a C compiler cannot generate all possible assembly programs, we can live with our bio language only generating genomes that are “well-behaved” in this way.

But this means that IO genes are the “foreign functions” of the language, and 100% of the purpose of our “program” is to generate the regulatory genes. We’ll almost certainly never write a program that somehow compiles down to the genes for constructing chlorophyll a. That’s too complicated. Too much chemistry, and the algorithms involved in figuring out a chemical structure like that are complex (in the “complexity theory” sense). You don’t want a compiler solving a problem like that; you want to solve it once and for all, study it carefully, and then re-use the result. Happily, this means evolution gives us tons of building blocks right from the start.

The regulatory side is perfectly reasonable, though. We can decide how we’re going to go about promoting and suppressing expression of IO genes, and then generate a set of regulatory genes that does this exactly according to the program. Again, we’re taking advantage of the fact that we don’t need to be able to reproduce all the different kinds of regulation that have evolved. We only need enough that we can regulate gene expression at all.

Foreign “IO” genes are what the name suggests: both the inputs and the outputs of the program. That is, some of them will be pure sensing: they detect something about the environment of the cell and cause a change in gene expression. Meanwhile, others will be pure output: they cause something physical to happen only when they are expressed, but have no (direct) effect on gene expression. Others may be both. But this is not the only sensing that can go on: many functional parts of an organism (stomata, for example) will “sense” purely within their own chemistry, not under the direct control of gene expression.

The regulatory genes generated by the compiler will be intra-cell only. Probably. It’s possible to rely on “foreign IO” genes to accomplish communication with the environment, including other cells. And this is likely a good idea, because there are a lot of different ways cells can communicate, so it might be unwise to fix a few in stone and bake them into the language.

Metadata will be associated with all foreign genes. We’ll want to be able to simulate our programs in the machine, in order to debug and test them. To do that, we need to be able to abstract far away from the actual chemical machinery of the cell, because otherwise it is totally computationally infeasible. Even if inter-cell communication is part of the core of the language and thus does not need to be part of the metadata for foreign genes, we’ll still want to be able to run statistics on things like oxygen exchange, to make sure no cells will be starved and things like that. Since these are the result of the physical effects of expressed genes (i.e. the IO of the cell), we’ll need information on what those effects will be to simulate them without having to resort to simulating the chemistry.


So, finally, if it’s not obvious, I should note that I’m no biologist. This is just interesting. So with these few ideas in mind, the next question is: what’s the first step? If we designed a prototype language of this sort, we’d probably want to follow the work on synthetic genomes. Take the first synthetic genome, separate it into regulatory and “IO” genes as best we can, and then rewrite all the regulatory parts within the language, operating on the IO genes as “foreign functions”. Or at least, do so for a small part of it at first. (After all, a “trivial” program exactly reproducing the organism would consist of all the current genes as IO genes, and no program code at all. So we can start with parts and grow to the whole.)

Next, compile it, put it into a cell, and see if the new-but-not-really-different organism manages to survive. And behaves the same. This also happens to be basic science: you’d be verifying your understanding of the regulatory network’s behavior by creating a totally synthetic gene regulatory network.

And then, as the technology to synthesize genomes becomes easier, and the loop between “design genome -> test organism -> measure results” becomes tighter, the scientific opportunities start to explode.
