New Post at the DevBlog
http://guidewiredevelopment.wordpress.com/2008/03/18/enhancements-in-gscript/
Pretty cool feature of GScript.
Dynamic Languages are Wrong
Dynamic Languages Are Just Wrong For 99% of All Development
There has been a flurry of excitement over ruby and a few other dynamically typed languages in the last few years, driven mainly by rails on the server side and javascript in the browser. Rails is a great project and is far better for most websites than the J2EE stack, but that unfortunately obscures the fact that the language it is built on, ruby, while superior to Java in many ways, just isn't the right thing for most developers.
This is a relatively new opinion of mine. You can see previous posts of mine where I'm very enthusiastic about ruby. And, at some level, I'm still enthusiastic about it. We've translated many of ruby's features into GScript at Guidewire, and I'm forever in its debt for that. But that doesn't change the fact that I think it is wrong for most developers.
In order to prove this somewhat ridiculous claim, I'll compare what I consider the key features of ruby with how GScript matches up in its statically typed world.
Terseness
Ruby is incredibly terse when compared with many statically typed languages. As a motivating example, a simple enterprise-y method definition might look like this (Note: I intentionally include an assignment to a local var in order to contrast with GScript):
def employees_over_age( age )
emps = @employees.find_all { | e | e.age > age }
emps
end
Compare that with the five to ten lines of java you would have to write to accomplish the same task, with all the generics and types you would have to annotate. I can't even bear to write it all out.
But let's look at the same function defined in GScript:
function employeesOverAge( age : int ) : Employee[] {
var emps = _employees.findAll( \ e -> e.age > age )
return emps
}
I'll admit, it is more code. But not a ton more and I think most of the additional code is pretty reasonable: you have to annotate in and out types at the method level, which is good because you can restrict your implementation details from leaking out of the method. You have to put an explicit return statement in the code which I actually find more readable. And you have a slightly more verbose but also more consistent syntax for blocks.
So ruby wins in terseness but GScript gets pretty darned close even though it is statically typed.
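For anyone who wants to run the ruby version on its own, here is a minimal sketch that wraps it in a class. The `Company` class and the `Employee` struct are hypothetical scaffolding, not part of the original example; the method body is the same `find_all` call, local variable included.

```ruby
# Hypothetical scaffolding so the snippet can run standalone.
Employee = Struct.new(:name, :age)

class Company
  def initialize(employees)
    @employees = employees
  end

  # Same logic as the example above: employees strictly older than age.
  def employees_over_age(age)
    emps = @employees.find_all { |e| e.age > age }
    emps
  end
end

company = Company.new([Employee.new("Ann", 34), Employee.new("Bob", 52)])
puts company.employees_over_age(40).map(&:name).inspect # => ["Bob"]
```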
Open Classes
Ruby elegantly (or hackily, according to taste) solves another common problem: what if someone hasn't designed a class to your liking, omitting an obvious method or two, and you want to add that functionality yourself? In java, this has led to a proliferation of *Util classes: StringUtil, ObjectUtil, DateUtil, FileUtil, etc. There are thousands of these util classes filled with static methods floating around java code bases. Some code bases are so large (*cough* Guidewire *cough*) that there are multiple versions of these utility classes, often with subtly different names.
In ruby, you can simply add a method to a class like so:
class String
def my_method()
puts( "Holy Crap!!! I've added a method to Strings!" )
end
end
These are referred to as "Open Classes." Pretty neat, eh? Well, it's so neat that we decided we needed something like that in GScript, so we added something called Enhancements. Here is an equivalent Enhancement:
enhancement MyStrEnhancement : String {
function myMethod() {
print( "Holy Crap!!! I've added a method to Strings!")
}
}
Not bad, eh? And because GScript is statically typed and because we provide an IDE for it, you will get very nice code completion when you hit '.' after a string object, with your shiny new method available and quite discoverable.
So I think GScript actually wins here by a hair because it formalizes the class extension mechanism in a new language construct, but both are just about equivalent. The point is that you don't need dynamic typing for this very useful feature.
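The difference from the java *Util approach is mostly about where the call lives and how discoverable it is. A small ruby sketch contrasting the two styles; `StringUtil` and `word_count` are hypothetical names chosen for illustration.

```ruby
# java-style: a static helper tucked away in a *Util module.
module StringUtil
  def self.word_count(str)
    str.split.size
  end
end

# ruby open class: reopen String and add the method directly.
class String
  def word_count
    split.size
  end
end

puts StringUtil.word_count("open classes are neat") # => 4
puts "open classes are neat".word_count             # => 4
```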
MetaProgramming
The really killer aspect of ruby and what lets rails clean J2EE's clock in terms of ease-of-development is the metaprogramming ability you have available. This allows you to dynamically generate classes on the fly based on, well, whatever you damned well please. This is how ActiveRecord builds classes based on your database schema, with no code-gen phase in the middle to gunk up the works. You change the schema and, *bam*, your class is updated.
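As a rough illustration of the idea (not how ActiveRecord is actually implemented), here is a ruby sketch that builds a class at runtime from a list of column names, the way a schema-driven mapper might. `build_model` and the column list are hypothetical; a real mapper would read its columns from the database.

```ruby
# Build a class at runtime from "schema" metadata (here just column names).
def build_model(columns)
  Class.new do
    # One reader/writer pair per column, generated from the metadata.
    columns.each { |col| attr_accessor col }

    # Constructor takes a hash of column => value.
    define_method(:initialize) do |attrs = {}|
      attrs.each { |col, value| public_send("#{col}=", value) }
    end

    # Expose the columns the class was generated from.
    define_singleton_method(:columns) { columns }
  end
end

Person = build_model([:name, :age])
ann = Person.new(name: "Ann", age: 34)
puts ann.name               # => Ann
puts Person.columns.inspect # => [:name, :age]
```

Changing the column list changes the class the next time it is built, with no generated source file in between.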
It's hard to contrast this with GScript's alternative in a succinct, blog-friendly way, but I'll try. GScript has an "open" typesystem allowing anyone to implement a TypeLoader and custom Types in java (which underlies GScript). That TypeLoader can construct its types based on whatever metadata it wants, just like in ruby. At Guidewire we use this feature to build type systems on top of our web-UI files, on top of our internationalization properties files and on top of our OR layer, to name just a few. This allows GScript code to access these resources in a typesafe way but without any sort of a code-gen step.
GScript's Type System, therefore, is very flexible in much the same way that Ruby's is: developers can implement their APIs in terms of types that they create dynamically rather than statically, based on whatever metadata they like.
An Anti-Feature: DSLs
A lot of developers are excited about DSLs in ruby that take advantage of the flexible nature of the ruby language. I'm not sure they are such a good idea. I think developers, on balance, would prefer to program in one sufficiently powerful language. I would imagine this is especially true in the enterprise space. I also think language design is pretty difficult and putting a bunch of people to work churning out specialized languages wouldn't turn out as well as we might hope. I think there may even be a biblical story about that sort of thing.
Rather than domain specific languages, I think there should be domain specific type systems: as I mentioned above we have type systems for our web layer, our OR layer, our permissions layer, etc. and it all works out grand. You have access to all these resources in a single language, GScript, presented in (one hopes) a nice API shaped by the dynamically generated types of the particular TypeLoader.
No new syntax to learn, just more libraries. Nice.
So Why Are Dynamic Languages Wrong?
Really it boils down to two reasons: tools and static verification. The first reason is far more important than the second one.
Being able to hit '.' and see what the hell you can do with an object is priceless, particularly on larger projects. I know some people say "don't get involved in larger projects" but, well, sometimes it happens. Refactoring tools (yeah, yeah, Smalltalk, blah blah blah) are far easier to implement correctly with statically typed languages than dynamically typed languages. And total-program analysis tools become possible. If the syntactic and expressive price is low enough (and in GScript, it is) then there is no reason to give up all this functionality for a dynamic language.
Static verification has gotten a bit of a bad name lately and we often joke at Guidewire that "well, it compiled, it must be right." Still, when you are making big changes and you have tens of thousands of tests to run, it is really nice to have something relatively fast (a compiler) point out things you have obviously missed at compilation time rather than waiting to run a series of test suites (even on our distributed testing cluster, it often takes up to an hour to hear back about every test after a checkin.)
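To make the tradeoff concrete, here is a small ruby sketch (the `Invoice` class is hypothetical): a misspelled method name in a rarely taken branch loads and runs without complaint, and only fails when that branch actually executes. A static checker would flag the typo before any test ran.

```ruby
class Invoice
  def initialize(total)
    @total = total
  end

  def total_with_fee(rush)
    if rush
      @total.ceill + 10 # typo: Integer has no `ceill`, noticed only when rush is true
    else
      @total
    end
  end
end

invoice = Invoice.new(100)
puts invoice.total_with_fee(false) # => 100, the typo goes unnoticed
begin
  invoice.total_with_fee(true)
rescue NoMethodError => e
  puts "caught at runtime: #{e.class}"
end
```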
But really, I could have stopped at '.'
Good code completion pretty much QED's the argument in my book.
GScript...
Check it out.
GScript rocks.
Java and XML: Let's not use them together
- Java isn't flexible enough, both syntactically and with respect to its type system. (Let's leave aside the lack of a reasonable lambda-style syntax for the moment, which is higher on my List of Things That Make Me Cuss When I'm Programming In Java, but is not as relevant for this article.)
- XML is, by design, horrifically redundant. (This is almost acceptable for what it was intended to be: a non-human readable data format that didn't have the negative connotations associated with s-expressions. It is totally unacceptable now that people are forced to look at it all day long.)
What I'd like to look at in this post is why I think XML has become such a huge part of java development and why I think that is unfortunate. The short answer to the first part is:
People need Domain Specific Languages (DSLs)
Why do people need DSLs? Because there are whole chunks of applications that don't need a full, general programming language, and for which a full, general programming language is poorly suited. O/R mapping is a good and common enough example. Build tools are another: you want a syntax that encapsulates the common operations so you don't end up generating reams of general code to do basic activities.
As noted in this paper, there is a continuum between libraries and DSLs. So, why have DSLs at all? Why not simply design libraries?
The answer to that question is: syntax. As much as academics might scoff at it, syntax matters, and it matters a lot.
This is precisely why ruby is enjoying so much success right now: ruby's syntax and evaluation rules are so flexible that they allow you to create minimal, expressive DSLs with very little effort. You simply have to get your head around how meta-programming in ruby works, and you are off to the races. Ruby on rails is a DSL for building web applications, and a pretty darned good one.
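To make the library/DSL continuum concrete, here is a minimal ruby sketch of the same build description written twice: once as ordinary library calls and once through a tiny internal DSL built on `instance_eval`. `BuildSpec`, `task`, and `build` are hypothetical names for illustration, not rails or rake APIs.

```ruby
# A plain library: callers build up state with explicit method calls.
class BuildSpec
  attr_reader :tasks

  def initialize
    @tasks = {}
  end

  # Register a task along with the tasks it depends on.
  def task(name, depends_on: [])
    @tasks[name] = depends_on
    self
  end
end

# Library style: every call names its receiver.
spec = BuildSpec.new
spec.task(:compile)
spec.task(:test, depends_on: [:compile])

# DSL style: instance_eval runs the block with the spec as self,
# so the receiver disappears from the call sites.
def build(&block)
  spec = BuildSpec.new
  spec.instance_eval(&block)
  spec
end

dsl_spec = build do
  task :compile
  task :test, depends_on: [:compile]
end

puts dsl_spec.tasks == spec.tasks # => true
```

The semantics are identical; only the syntax changed, which is exactly the point about syntax mattering.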
Java is, of course, much more locked down than ruby. And this isn't necessarily a bad thing. The java designers were coming from a world replete with horrible C macro-kludges, so it's understandable that they decided to leave out syntactic extensions. If every man is a language designer, you end up with a ton of badly designed languages. But you also end up with a few very well designed ones. And I'm not entirely convinced that the vast majority of useful, small DSLs aren't simply badly designed languages that answer a particular specific need, akin to the Germans' relationship with soldiering.
In any event, that's wandering a bit off point. The facts on the ground today are that java developers have been in need of a way to design and implement DSLs for a while now (even when they don't call it that) and the accepted way to do it has become XML.
Why?
My theory is this: in java, DSLs usually start out as a library, then progress to a library with a smidgen of configuration. XML became the de facto standard for config files during the .com boom, property files apparently not being cool enough, so config information ended up in XML files. Additionally, XSDs give us rudimentary tools for verifying a language's syntax (though not its semantics).
All fine and well. I might pick another syntax for structured configuration (say, YAML), but whatever. XML is reasonably suited for simple declarative programming.
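For comparison, this is the kind of simple structured configuration YAML handles with less ceremony than XML, read here with ruby's standard `yaml` library. The keys and values are made up for illustration.

```ruby
require "yaml"

config_text = <<~YAML
  database:
    host: localhost
    port: 5432
  features:
    - search
    - reports
YAML

# safe_load only builds plain data: hashes, arrays, strings, numbers.
config = YAML.safe_load(config_text)
puts config["database"]["port"]  # => 5432
puts config["features"].inspect  # => ["search", "reports"]
```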
But then we java developers started doing more and more in those config files and, at some point, they began to cross over that invisible line and become semantically crucial parts of our applications. They no longer simply contained a few flags used to slightly modify runtime behavior. They became an XML-based programming language for crucial subsystems.
This is unfortunate, for many reasons. Among them:
- We have traded java, a language that, while certainly not beautiful, is at least plausible, for one that was never designed for human consumption. XML is utterly miserable to use in large quantities. See ant build files, and weep.
- We now have to think in two different syntaxes. I maintain that this is a difficult transition for a significant portion of the programming populace.
- We cannot have any sort of locality with related java code. Again, the syntax is so utterly foreign that it is like mixing Japanese and English. Even if we could put it in the same file, or add IDE support to navigate from one to the other, it wouldn't work well.
- And, most interestingly to me, it becomes difficult to communicate whatever type information we have built into our DSL to Java. We have two choices I can see:
  - We can do java code generation off of our XML-based DSLs, which everyone hates. Among other things, it takes time, requires a lot of infrastructure work and can introduce some nasty build dependencies. We do a fair amount of this at Guidewire.
  - We can communicate with our DSL via non-typesafe mechanisms (usually hashes and strings). This is the preferred mechanism because it is the easiest. Simply do nothing! But then one wonders why we spend so much time crucifying ourselves on the cross of type safety in java, when increasing amounts of our application code reside across this great type-unsafe divide.
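A small ruby sketch of the second option's failure mode (keys and names are hypothetical): with a string-keyed hash, a misspelled key quietly yields `nil`, whereas a `Struct` with declared members at least fails loudly on a misspelled accessor. In java, a typed equivalent would be rejected by the compiler rather than at runtime.

```ruby
# Config passed around as a hash of strings: typos are silent.
settings = { "timeout" => 30, "retries" => 3 }
timeout = settings["timout"] # misspelled key
puts timeout.inspect         # => nil, and no error raised

# Same data with declared members: a typo raises instead of returning nil.
Settings = Struct.new(:timeout, :retries)
typed = Settings.new(30, 3)
begin
  typed.timout
rescue NoMethodError => e
  puts "caught: #{e.class}"  # => caught: NoMethodError
end
```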
So, that outlines why I think we ended up with so much XML in our java applications, and why I view that as an unfortunate thing. Now the hard part: what can be done about it.
Frankly, I have no idea.
My first reaction is that we need to open up java with a type-safe macro language to allow for syntactic extensions. But as nonchalant as ruby has made me about language extensions, it still seems insane in java. The macro (meta?) language needs the ability to communicate with the java type system easily, making it easy to generate coherent error messages.
I realize, of course, that I may simply be saying something as absurd as "let's make hard problems easy," but I have to believe that there is a better way than the current state of things.
I'm going to spend some quality time with OCaml/camlp4 over the next month and see if I get anything out of it.
All Ordered Combos
OK, OK, OK, last one, I promise
Nowhere near as elegant as the inject method below, but this covers both assignment and block usages, so you can pick your poison. If you pass in a block then you have linear rather than exponential memory usage, although the run time is of course equivalent.
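The code itself isn't shown here, so the following is only a hypothetical sketch of a `power_set` that supports both usages described: without a block it builds and returns the whole array; with a block it yields each subset as it is generated, so only the current subset is held in memory. Subsets are enumerated from the bits of an integer counter, which may differ from the original approach.

```ruby
# Hypothetical sketch: return all subsets, or yield them one at a time
# when a block is given (in which case nothing is accumulated).
def power_set(items)
  results = [] unless block_given?
  # Each integer in 0...2**n picks a subset: bit i set means items[i] is in it.
  (0...(1 << items.size)).each do |mask|
    subset = items.each_index.select { |i| mask[i] == 1 }.map { |i| items[i] }
    if block_given?
      yield subset
    else
      results << subset
    end
  end
  results
end

all = power_set([:a, :b])       # assignment usage: [[], [:a], [:b], [:a, :b]]
power_set([:a, :b]) { |s| p s } # block usage: prints each subset, returns nil
```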
OK, I'm done. This is my final answer.
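The inject-based version isn't shown here either. A common way to write a power set with `inject` (a hypothetical sketch, not necessarily the author's code) doubles the list of subsets for each element. Note that it materializes all 2^n subsets at once, which is exactly the memory cost discussed in the emails.

```ruby
# Start with just the empty subset; each element doubles the collection:
# every existing subset, plus every existing subset with the element added.
def power_set_inject(items)
  items.inject([[]]) { |subsets, x| subsets + subsets.map { |s| s + [x] } }
end

puts power_set_inject([1, 2]).inspect # => [[], [1], [2], [1, 2]]
```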
Wait... Maybe we should add an optional argument with a default value that limits the output, to prevent inadvertently calling it with an array that will take forever to return...
You know what this blog needs? More power_set().
A witty exchange of emails followed (the sort of thing that makes you love working at an engineering oriented company), and Jim made the point that all the offered implementations were horribly memory inefficient. He offered an iterator-based solution in java.
Well, I for one am not going to stand here and let our favorite little programming language have its name dragged through the mud. So here was the iterator-based solution I came up with:
power_set()
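Neither Jim's java iterator nor the ruby solution is reproduced here. As a hypothetical sketch of the iterator idea in ruby, an `Enumerator` produces subsets on demand, so a caller who only wants the first few can stop early with `first(n)` instead of passing a limit argument.

```ruby
# Hypothetical sketch: an external iterator over subsets, generated on demand.
def power_set_enum(items)
  Enumerator.new do |y|
    (0...(1 << items.size)).each do |mask|
      # Keep the items whose index bit is set in mask.
      y << items.select.with_index { |_, i| mask[i] == 1 }
    end
  end
end

subsets = power_set_enum((1..30).to_a)
p subsets.first(3) # => [[], [1], [2]], without generating the other subsets
p subsets.next     # java-style iteration, one subset per call: []
```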
Dynamic Languages
Despite how much I love Ruby, I'm still skeptical of how well it will perform in a large system. I just don't have experience with large, dynamically typed systems, and a lot of older engineers I respect shudder at the idea. Maybe the prevalence of unit-testing will change this (Martin Fowler seems to think so.) I guess we will have to wait and see how the Rails projects turn out to provide evidence one way or the other.
Testing and Change
* Test-first development is hard when a GUI layer is involved
* End-to-end tests are not worth the effort until you have a 1.0 product. And perhaps they aren't even worth the effort until you have a 2.0 product, where certain application paths have been established and need to be maintained.
* Unit testing can get really nasty when there are elaborate dependencies between classes
  * It is very hard to keep dependencies low. It requires effort at every step. If you aren't constantly watching it, you will introduce them.
* A flexible sample data and configuration generation platform is crucial for a good test environment
* If tests aren't easy to write, they won't get written or they will be written poorly
Why *their* programming language is cooler than *my* programming language
-- Good ol' quicksort
quicksort [] = []
quicksort (x:xs) = quicksort [y | y <- xs, y < x]
                   ++ [x]
                   ++ quicksort [y | y <- xs, y >= x]
-- The list of the Fibonacci numbers
fib = 1 : 1 : [a + b | (a, b) <- zip fib (tail fib)]
Holy. Crap. Too bad I had to get a master's in CS to understand what the hell is going on here.
:/
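For comparison, a ruby rendering of the same two examples. Ruby has no list comprehensions, so this sketch uses `partition` for the quicksort and an `Enumerator` standing in for Haskell's lazy infinite list. It is a translation for side-by-side reading, not an exact equivalent.

```ruby
# Quicksort: split the tail around the head, sort each side, and concatenate.
def quicksort(list)
  return [] if list.empty?
  pivot, *rest = list
  smaller, larger = rest.partition { |y| y < pivot }
  quicksort(smaller) + [pivot] + quicksort(larger)
end

# An endless Fibonacci sequence, produced on demand.
fib = Enumerator.new do |y|
  a, b = 1, 1
  loop do
    y << a
    a, b = b, a + b
  end
end

puts quicksort([3, 1, 4, 1, 5]).inspect # => [1, 1, 3, 4, 5]
puts fib.first(10).inspect              # => [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```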