I'm not saying Java is a terrible language, compared to something like Perl for example. I don't think learning Java will make you a stupider person like learning Visual Basic will. But Java, like PvdL, has a sinister underside. It has tremendous problems that don't get discussed openly. And most significantly, Sun has controlled the buzz around Java with such frightening success that it has slid into prominence on greased rails, amazingly without ever opening a dialogue about the virtues of other non-C languages and other serious runtime environments.
I think Java is a bad choice for many of the projects that have tried to use it. During its hype phase, Sun promised us and we in turn promised each other certain things about Java, with incredible vociferousness and emphasis (not unlike the way Linux promises stability and security), and we have dismally failed to coax Java into delivering them. Now that Java is old enough to document its own failure in copious detail, it is not okay to keep using it in new projects as if it were fresh, unproven ``technology'' with great promise, but a lot of kinks that we will have to tolerate as the price for working on the bleeding edge. It's too late to be so forgiving and optimistic. It is time to honestly survey what has happened with Java, and why.
Even when Java was new, its hype came at the price of an honest survey. Had the Java-hypers bantering across neighboring coffee tables been a little more conscious of their own history, we would probably still have been excited about Java, but we would not have been so surprised when things didn't pan out so well. More importantly, we would not find ourselves so discouraged by the failure of the only-game-in-town that we would continue with this pathetic denial, pretending Java is still being massaged into usefulness well after it screwed us over so decisively.
Let's consider some much older languages that delivered on some of Java's promises.
emacs has a web browser written in elisp---it displays images and, in my experience anyway, is much more useful than Hotjava. The best news and mail reader in existence is called Gnus, and is also written in elisp. Gnus is good enough that many people patient enough to learn it choose to use it instead of a plethora of traditional machine-binary mailreaders on their platform-of-choice, which is more than I can say for Java news and mail readers which generally only see use because their victims are banished to some kind of web ``kiosk'' and need to use the office's expensive proprietary webmail software.
emacs runs on more platforms than Java does.
elisp programs, in my experience at least, do not suffer from the portability problems that Java programs encounter when moved from one JRE to another. The same emacs codebase builds into the emacs editor-and-runtime-environment on all the platforms that emacs supports. The elisp environment is consistent.
As far as I know, emacs does not have a sandbox-jail for running untrusted programs on web pages. Neither does Javascript, but let's consider Javascript a marketing artifact with nothing whatsoever in common with Java. So, I must grant that emacs doesn't attempt a sandbox, and Java does with at least some meaningful success. emacs also lacks threads.
However, elisp is a more portable language than Java, and elisp has more user applications with relevance to the ``web revolution'' or whatever than Java does. emacs is more convincing than Java as a successful ``middleware'' platform.
You don't mention one of the first and most successful systems with bytecode, the UCSD p-system. It was largely responsible for making Pascal THE language to know in the late 70s and early 80s, and gave us the term ``p-code'' which is now used generically for any interpreted bytecode. Java borrows its syntax from C and its runtime architecture from Pascal. These are the two most prominent so-called ``high level'' languages since punch-cards. What architectural basis could possibly be less ``innovative'' than a synthesis of the oldest and topmost two players?
Microsoft programs were compiled to p-code into the late 80s, and the MS compilers would emit p-code up until the mid-90s.
I think the main thing he left out, though, was how not-fun writing code in Java is.
Perhaps an architecture based on obsolete compilers from Microsoft, the kings of BASIC? Oh, wait. Java is that, too.
The article that <0@pigdog.org> cites goes on to explain that p-code was necessary because of the primitive state of compiler tools at the time. Reusing p-code back-ends produced an inferior compiler, but saved work for the compiler developers compared to a monolithic design. The p-system defines rigid technical specifications at the boundary between the compiler's front-end and back-end. This means the Pascal--to--p-code compiler can get its funding from a separate group than the p-code--to--native-insn compiler, thus penetrating the market with less investment and maximizing ROI. The rigidity of the p-code is what protects business investments in the p-system. All this comes at the expense of modern compiler design, which uses abstract and constantly-advancing intermediate representations, often far less brute-force and literal, based on strongly-typed functional languages like Haskell.
Now that the market for C compilers is mature, it's possible to fund a compiler without resorting to p-code. Java still splits the compiler investment: for example, Sun funds development of 'javac' while Digital funds the Alpha/Tru64 runtime environment. Two companies pool investment to get Java servlets running on the Alpha, and their responsibility splits along the boundary of the simplified, inflexible p-code.
Sure, there are other reasons for p-code, but Pascal's history shows how, in a more realistic context, decisions Sun tries to pass off as clever architecture fit better as cynical bows to unpleasant business realities.
An OpenPROM user can write small Forth programs at an interactive prompt and execute them immediately. One can also write OpenPROM-Forth programs to run at boot. These 'nvramrc' programs contend with other information for 4kB of nonvolatile storage on the clock chip. OpenPROM is a complete minimalist operating system including device drivers for all interesting hardware in the system, memory management, a text editor, and a debugger. The OpenPROM operating system can even load and run programs stored on the disk. Solaris is one of the most popular programs for OpenPROM OS.
Solaris does not include hardware drivers for anything where an OpenPROM driver will suffice instead. The OpenPROM Forth runtime environment is slow, so, for example, it's totally unsuitable for disk access. Solaris includes SCSI and IDE drivers for direct hardware access to disks.
However, when Solaris users log in to ``the console'' without the X Window System, it is the OpenPROM framebuffer driver that paints characters onto, scrolls, clears, or flashes the screen. Solaris passes small Forth programs to OpenPROM for execution under its runtime environment. The console on Suns is notoriously slow, but one can boot and halt Solaris without clearing the screen. OpenPROM drivers can also stand in for non-critical-path functions in real drivers which must be fast. For example, an Ethernet driver needs to access hardware directly for sending and receiving packets, but the OpenPROM network driver might be an appropriate way to load the hardware address into the Ethernet chip. X needs to write directly to the framebuffer, but it doesn't need to know about monitor sync timings, or how to program resolutions and bit depths into the framebuffer's registers.
SBus and PCI expansion cards for Suns contain a ROM with OpenPROM drivers burned onto it. At boot, OpenPROM enumerates all the cards and links in their drivers. The drivers are stored in ``F-code'', a ``compiled'' version of OpenPROM Forth. There is no SPARC code stored on these ROMs, so Sun could theoretically pull another CPU-switcharoo without updating the OpenPROM drivers on their old expansion cards.
Obviously, it doesn't make sense to write general-purpose applications in OpenPROM Forth, but its existence somewhat knocks down Java's claims of revolutionary novelty.
OpenPROM is a particularly ironic example of how little new ground Java has broken, because the same company that claims to revolutionize the industry with fresh ideas like bytecode and The Java OS has been using these same ideas for the last ten years with no fanfare. Why have ideas that used to be taken for granted suddenly become frantically-hyped propaganda?
Even more laughable, OpenPROM compatibility is a catastrophic disaster---far worse than Java---ever since Apple and Firmworks began releasing hideously bug-ridden implementations. I have seen nothing so awful in a Sun machine made in the last ten years. Apple is the kiss-of-death for OpenFirmware. But, you must admit, the unrealized idea is kind of neat---PCI framebuffers and disk controllers might have worked for console output or booting in PCI-based machines from both Sun and Apple. Oh well. In the mean time, at least all the old SBus cards and machines seem to work together fairly well.
NewtonScript addresses the cost and power economies of PDAs better than any other PDA environment in existence. Both RAM and stable storage are relatively expensive, and the former is power-hungry as well. Fast CPUs, however, can be cheap and use little power. The executable images of NewtonScript programs are very small.
One writes his or her NewtonScript program on a Macintosh, ``compiles'' it into a binary image with the NTK (Newton Toolkit), and then sends it to the PDA. Compiled NewtonScript is not an arm32 executable, but rather requires the NewtonOS as a runtime environment. Sound familiar?
I already mentioned the web browser written in NewtonScript. There are also FAX programs, email user agents, web servers, and NewtonWorks---the word processor. Newtons have keyboards, printer drivers, and support for modems and networks. Newton software comes in shrink-wrapped boxes and is still for sale. Newtons are computers. The only thing missing is the concept of a ``double click''. All Newton programs must be written in NewtonScript.
Now, where is the Java PDA? The Java word processor? The Java web browser? The Java printer drivers? Where can I buy them? And who has written the JRE that can run all these programs in 16MB of FLASH and 4MB of RAM?
And, if someone does some day manage to bring these things into existence, I must still ask, where is the recognition that the Newton did it first and did it better?
Also, ObjectPascal strikes me as somewhat of a pet project, like someone just didn't want to let go of Delphi and tried to drag everyone kicking and screaming along with him. But, whatever.
Oberon is a large project with tangled goals. One can break off a piece of it to prove a variety of points. I first heard about it through a Slarshdot post about their now-defunct ``Slim Binaries.''
Like Java, p-code, F-code, .elc files, and compiled NewtonScript, ``slim binaries'' aren't CPU-specific. An Oberon runtime would accommodate them with equivalent efficiency on any CPU. But ``slim binaries'' were supposed to be much smaller than Java .class or .jar files, and were supposed to be faster to translate into the host CPU's machine language. The combination of the two would supposedly make Oberon ``Juice'' applets embedded in web pages load faster than Java applets. The Oberon people wrote some plug-in for Netscape 1.1 or something that would run their ``Juice'' applets.
The current ETH Oberon System apparently chips away at the much-hyped Java Operating System. I don't know much about it.
Anyway, Oberon sounds kind of neat. I've never tried it, but, as we will discuss later, they claim compiler technology which is decisively a generation ahead of Java, and their small binary size goes well with things like celfones and Newtons.
I've heard all the Java hype about portability and abstraction. I've heard the claims about why nothing short of Java can ever be truly portable across CPUs, because of the sizes of variables and pointers or exposed endianness or something. Then I wrote a Java program or two myself. The first things I noticed were:
No, don't just brush this off. Old programs exist. I can run old C programs. I can even run old C programs that have been compiled into machine code. In fact, I can run old C programs compiled into MIPS machine code on Digital's ancient ``Ultrix'' Unix for their discontinued ``pmax'' workstations. I can run these programs on my NEC MC-R700A ``MobilePro'' PDA running NetBSD, which just happens to have a vaguely-related CPU core even though it is a totally different type of device. These programs were written and compiled long before Java was even proposed, on a completely different discontinued machine, with a discontinued C compiler, compiling to a two-generations-older version of the MIPS core, and a variant of Unix that's two or three generations old. Why can I run them alongside modern C programs on a modern ``PDA'' laptop, but I can't run a Java program written six months ago on any current machine anywhere?
Why do we see all these ``Best with MSIE 5.0'' banners on web pages, and find that the ``Pure Java'' applets contained therein do not work in any other JRE? Some pages actually have multiple copies of a single applet on them, one in a .cab for the MS JRE, and one in a .jar for the Netscape JRE. I might be inclined to hear the usual Microsoft conspiracy theories if I didn't have all the same problems myself switching between Netscape 4.6 and the Sun JDK on a Solaris machine.
Okay then, it must be ``Because the developers didn't write portable Java code.'' They used constructs in their Java programs that weren't portable. Excuse me, did I hear that right? That is precisely what Java's portability promises claim is impossible to do. It is exactly the supposed portability problem with C programs. If the programmer refrains from using unportable constructs, unpadded or endian-dependent structures, function calls, whatever, then C programs are very portable. Java has this problem, too, and worse than ever. ``Portable code''---Java claimed to be a portable language, and simply isn't. Don't condescend to me with vague wizard-behind-the-curtain whitepaper essays about endianness and the size of pointers and integers. The Java API is not consistent. The Java API is not as consistent as the C API.
Fortunately, Sun is on top of this problem. Their latest ``Web Start'' wrapper claims to let arbitrary released versions of the JRE coexist on a single machine, so each Java program you run can demand its favorite revision of the JRE and your Windows machine will automatically download and install the right JRE release from Sun (at 50MB of disk and who knows how much VM, each). Thanks to Web Start, it's slightly more practical to run Java programs, so long as you're running them on Windows. If not, you'd better hope your vendor offers not only the latest JRE, but backports of every JRE that Sun released, including a separate port of every incremental JRE release Sun has ever made. You'd better hope the portability problems in your Java program actually are between JRE release versions, and not between Sun's JRE and your vendor's. And you'd better be prepared to handle the revision management yourself in case your vendor hasn't designed a ``Web Start'' framework that exactly mimics Sun's.
I don't get it. C has various standards---ANSI C, POSIX, periodic releases of the standard library---but I have never found a C program that only works with a C library that implements an older POSIX standard, and breaks if linked against a newer POSIX. I've never heard of vendors shipping multiple C libraries to accommodate old programs. Sun's own Solaris actually surpasses their competitors in this regard, yet they're unable to repeat the performance with Java. Why the failure, if Java is architected to be portable?
The whole reason we were told Java's portability was so important is that it's supposed to run applets inside web pages, and without good revision management only a small fraction of the theoretically very diverse browser installed base would work. If Java was designed from the ground up for use in web pages, why can't anyone predict how an applet will behave without testing it in all the relevant browsers? It should be enough to test in the JRE's appletviewer. Why are standalone Java programs so much easier to write than applets? And why don't all standalone JREs ship with a working copy of Hotjava?
If Java was designed to be portable, why is it so much easier to port C programs to different Unixes than it is to port Java programs to Java Runtime Environments on different Unixes? I've heard people complain they cannot get the Freenet distribution (an anonymous file-sharing and publishing architecture written in Java) to work on JRE X, so they are trying JREs Y and Z instead to see if problems are less catastrophic there. They download every JRE they can get their hands on and hope one of them works. If Freenet were a C program, it would have been picked up by all the Unix package collections by now, and would have the same ``write once, run anywhere'' property as lynx or mutt or any other popular Unix freeware. Since Freenet is written in Java, it's a portability nightmare, and only a small inner circle has gotten it almost-working. Java's decoy claims of portability have in effect killed the Freenet, and dragged the Freenet architecture down to the same level of broken fantastic promises that Java makes. ``The mythical Freenet about which we have heard so much.''
If Java itself is portable, then why isn't there a portable way to install and run a Java program without dealing with spaghetti .class-files, setting CLASSPATH, and referring to arcane modules contained within .jar files? Why do we have to use a Unix shell script to start a supposedly-portable Java program? C defines source code, header files, libraries, object files, and fully-linked programs. This includes a pretty clear concept of what is a program, and a uniform way for starting programs. Java does not.
Java programs can have more than one entry point. Java programs are shipped in class files stored inside .jar files, sort of like a plastic bag full of split peas. One or more of the peas will contain an entry point for the program. You have to reach into the bag, find the right pea for the situation, invoke it in the right way, and hope all the other peas it depends on are present in the bag, or in some other nearby bag(s) of peas that hopefully is/are findable.
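The pea-bag problem can be sketched in a few lines. Suppose a hypothetical app.jar holds the two classes below (the names are invented for illustration): each declares a public static void main, so each is an equally legal entry point, and no field of the .class format marks either one as ``the program.''

```java
// A hypothetical jar holds both of these classes. Each declares a
// public static void main, so each is a legal entry point; nothing
// in the .class file format marks one of them as "the program."
class ServerMain {
    static String describe() { return "server entry point"; }
    public static void main(String[] args) {
        System.out.println(describe());
    }
}

class AdminTool {
    static String describe() { return "admin entry point"; }
    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Which pea you invoke is decided only at launch time, with something like `java -cp app.jar ServerMain` versus `java -cp app.jar AdminTool`. A jar manifest's Main-Class attribute can nominate a default, but nothing requires a jar to carry one.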
emacs is a little bit freer about its namespace and cooperation
between programs than C, and there could be some dispute about where
one elisp program ends and another begins. But, it's portable.
Installing an elisp program involves installing some .elc files in the
site-lisp directory, adding a line or two of autoloads to the .emacs
startup configuration file, and invoking the program by pressing M-x
and typing the program's command name.
Java defines nothing past .class files. The way class files find each
other is unportable---it's left up to each JRE, and there are usually
several options: .class file in the current directory, .class file in
the CLASSPATH, .class file fetched from a ``base'' URL, .jar file,
.zip file, .cab file. Nor is the entry point to a Java program
portably defined---no .class file is marked specially as startable, so
it is hard to even identify how many startable programs you have just
installed. There is no singular documentation library, nor is there a
uniform namespace for starting programs as provided by M-x.
IS Networks's
MindTerm, for example, is either an Applet or an Application, has
several entry points, and is distributed as the same set of .class
files redundantly smushed together in three different ways: two .jar
files and one .cab file. Which entry point and which .class file
collector actually works for running the program depends on your
JRE.
If Java was designed to be portable, then the design failed. Yes,
Java sucks. But surely this is just a matter of the tower toppling
because of the mad, insatiable pace of economic development. The
foundation of Java is good. Java broke new ground for the industry.
The next version, or some differently-branded but fundamentally
similar successor, will undoubtedly deliver what was promised. All
Hail, The Next Version! Java merely needs to ``stabilize.'' And, in
any case, the Industry has Learned Things from Java, which will
positively influence other unrelated Technologies.
I wonder.
Because it's ``standard''? This actually makes some sense at
first---everyone has this huge library available all the time, so you
don't have to bother with finding, fetching, and installing just the
libraries you need. However, it creates the ``version skew'' problem.
If I require an updated gzip inflate-deflate library, I'm supposed to
get it by upgrading to the latest JDK. New JDKs are not compatible
with old Java source code, so before I can use the new gzip library, I
have to port my program to the new JDK. This is annoying enough
itself, until it quickly leads to the nightmare of having several
JDKs installed on a single machine in support of a single project,
with some vague long-term hope of eventually porting all the modules
to one JDK so that the program can actually run.
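To make the gzip example concrete: the deflate library is welded into the JDK as java.util.zip, so the version you get is whichever one your JDK release ships, not something your program chooses. A minimal round-trip sketch:

```java
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ZipSkew {
    // Compress and decompress a buffer with the JDK's built-in zlib
    // classes. Which zlib implementation you get is decided by the
    // JDK release you run under, not by the program itself.
    static byte[] roundTrip(byte[] input) throws Exception {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] compressed = new byte[input.length * 2 + 64];
        int clen = deflater.deflate(compressed);
        deflater.end();

        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, clen);
        byte[] restored = new byte[input.length];
        inflater.inflate(restored);
        inflater.end();
        return restored;
    }
}
```

To pick up a newer deflate implementation, you upgrade the whole JDK and take the source-compatibility consequences along with it.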
Languages like C, Perl, and elisp permit their programmers to upgrade
components incrementally, while maintaining a uniform interface to
the overall collection of code and documentation. With C, one can
solve the find-fetch-install problem with something like NetBSD's
packages collection, which makes adding a popular C library as easy
as adding one dependency line to a Makefile. Perl and elisp tend to
slowly absorb popular libraries into a Java-ish standard collection,
but the collection is not as opaque as a JRE.
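For example, a single dependency line in a pkgsrc-style package Makefile is all it takes (the library name, version pattern, and category path here are illustrative, not copied from any real package):

```make
# One line declares the C library dependency; the package system
# fetches, builds, and installs it before building this package.
DEPENDS+=	libpng>=1.2:../../graphics/png
```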
Well, then, Java's standard library is better because it is
well-documented? People used to say this all the time because one of
Sun's marketing prongs was ``literate programming,'' meaning that
.java files contain the code and the documentation in the same file.
Now that Sun's marketing has calmed down, I don't think I need to do
anything but yawn at this. Everyone has documentation. There are
plenty of systems like Docbook, POD, Texinfo for writing your
documentation that are of similar quality to Java's. Putting
documentation and code in the same file is controversial, not
impressive or revolutionary. It's important to keep code and
documentation in-sync, but CVS can do this, too.
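For reference, the ``literate programming'' in question is nothing more exotic than doc-comments; a minimal sketch of what the javadoc tool consumes:

```java
class Doc {
    /**
     * Returns the greater of two ints.
     *
     * The javadoc tool scrapes comments like this one out of the
     * .java source and renders them as separate HTML pages.
     *
     * @param a first value
     * @param b second value
     * @return the larger of a and b
     */
    static int max2(int a, int b) {
        return (a > b) ? a : b;
    }
}
```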
What I don't like is projects that offer both source code and
documentation, but install them far apart from each other, or treat
the source code like something dirty and don't install it at all. For
systems that ship with source, the source is part of the
documentation. On BSD, for example, the C source code for libc is in
/usr/src/lib/libc. You can read about a library function, then
quickly find its source code and look at the actual implementation.
If putting source and documentation in one file had meant I could
follow a link in the documentation to actual source, I'd be a fan.
But that didn't happen.
Sun's latest 1.4.0 JDK (s'cuse me, ``JSDK'') perpetrates the ultimate
irony of literate programming: the documentation and source code are
released on different dates and under different licenses. (XXX --
what did I mean by this?)
However, Java's bytecode is supposedly distinct from these three in an
interesting way. Compiled NewtonScript, .elc files, and F-code are
all logical representations of the original program that maintain most
of its linguistic structure, and do not at all resemble a machine
language that could run on a real CPU. Part of Sun's original pitch
was that Java bytecode was supposed to run on the as-yet-unreleased
Java CPU, an imaginary chip that would run Java ``natively''.
This observation makes me wonder if we wouldn't be better off throwing
out Java and writing programs in C, then cross-compiling them for the
VAX CPU. Instead of a JRE, we could simply write a VAX emulator for
all interesting architectures, and put virtual-VAX sandboxes inside
web browsers. VAX insns would become the Language Of The Web. We
could standardize a crippled miniature virtual VMS called WebVMS for
building into web browsers, to give all these VAXlets access to the
network, the local filesystem, a GUI toolkit. We would have
VAX-compatible smartcards. There is no reason a VAX emulator can't
translate VAX instructions into native-CPU instructions just like a
JRE's JIT does. This can be done quite well---like I said, the VAX
emulator for the Alpha is faster than any physical VAX CPU. I suspect
VAX machine-code would also be similarly compact to Java bytecode,
since machine-code-compactness was the biggest priority when the VAX
was designed. There is no missing piece to invent or design: we
simply agree that, henceforth, all applications will be VAX
applications so as to be equally inconvenient for everyone. Voila!
Portability!
However, it turns out Sun's pitch was basically a lie in practice.
First of all, there is no Java CPU and never will be. Sun did make
some ``Javastations'', but they fulfilled the JavaCPU dream in name
only. They were merely proprietary SPARCstations running JavaCPU
emulators just as regular workstations do. They claimed their
picoJava chips would be 20 times faster than JIT on
i386, but then quickly backpedaled with MAJC and ``throughput computing,'' and now just
focus on memory footprint (*cough* Oberon *cough*) for, I don't know what, Sidekicks or smartcards or something, rather than speed.
Second, Java bytecode retains a lot of linguistic structure and is
readily disassembled back into source code, so the uselessness of Java
in protecting your company's proprietary software source code is a
popular industry joke.
From my point of view, these are positive developments. I now believe
Sun made these claims as part of their spin. You can imagine how
people other than me would react to Java:
Q. But why would I write programs in Java when C programs are compiled
into the very same instructions the CPU runs?
A. We're coming out with a Java CPU!
Q. My competitors will use Java's intermediate representation to steal
my source code. The machine languages C programs are compiled into
are ugly when disassembled, which protects our ``intellectual
property'' from ``theft.''
A. Java bytecode throws out all the linguistic structure. It's like a
stack-based CPU. It can't be disassembled!
Fortunately, both Answers are fictions. No one designed anything as
stupid as a Java CPU---new CPUs are still designed intelligently using
modern techniques and expecting their programs to be translated. And
I think bytecode is about like emacs .elc. It's very easy to
disassemble, to the point where programmers disassemble their bytecode
to diagnose compiler problems.
The ``uniqueness'' of the embedded market is that embedded devices
usually draw no clear distinction between the operating system and the
application. The programmer might draw such a distinction, but the
user does not. Indeed, to the user, there is no distinction between
the device and the software.
I have a Motorola i85s celfone, which has some minuscule Java-branded
runtime environment in it. When I complain to someone that ``my celfone
crashed,'' or say that Motorola's stuff has ``crappy software,'' they
usually sneer at me and deliver some speech about not being
technically-inclined and not ``understanding all that,'' but that they
also own a celfone, and they can make phone calls with it. There is no
distinction between software and phone.
K, so JavaOS can more cheaply provide the feature, ``if one program
misbehaves, it won't bring down the whole operating system.'' Pardon
me, but if embedded-device users won't even admit that the device has
software in it, how will it pacify them to learn that their celfone's
operating system didn't crash---it was just the ``phone application''
that crashed? I'm betting they won't know or care.
There is no need to segregate applications in a delivered embedded device.
The device needs to not crash, at all. The embedded market makes all
this nonsense about memory protection and kernel integrity irrelevant to
everyone except the developer.
Java's ``standard'' class library presents another problem for embedded
devices. It is gigantic. Every embedded device that wants itself
stamped with ``Java'' has to choose a different incompatible subset of
the standard library to include. This decision plays out in two
slightly different scenarios.
A ``closed'' embedded device is one scenario. We want to design a
sealed product and use Java for the benefit of our engineers, who like
and know Java, or who feel that this particular OS kit has some
attractive characteristic. Here, the standard class library is
annoying because it is too tightly integrated with the JRE. We buy an
embedded Java kit from someone and are forced to install the entire
kit onto the device. Some standard classes in the kit will take up
space in the device's ROM without getting used, while other classes
that we might have appreciated will have to be implemented by hand. A
C-based kit like vxworks can provide a huge standard library and
statically link in only the symbols (functions/methods and variables)
that our C application uses. The finished ROM image will contain only
code that actually gets used. It will not contain pieces of the
standard library that never get called. In fact, it won't even
contain symbol names: the textual names of variables, methods,
classes, and so on. Compiled Java .class files must contain these
because Java's analogue to the C ``linker'' is part of the JRE, which
puts us in an absurd position where using short, inscrutable variable
names will actually reduce the size of the ROM chip we must put inside
our VCR. This factor naturally pales in comparison with including
standard library code that is never used, but it illustrates how
Java's inflexible runtime linking is problematic for embedded
devices.
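The point about symbol names is easy to demonstrate: the JRE links by text, so every class and method name must survive verbatim into the shipped .class files. A sketch using the reflection API, which just makes the string-based lookup that the runtime linker always performs explicit:

```java
import java.lang.reflect.Method;

class RuntimeLink {
    // The JRE resolves classes and methods by their textual names,
    // which is why .class files must carry every symbol name in full.
    // Reflection exposes the same lookup directly: both strings below
    // are resolved at run time.
    static int lengthOf(String className, String methodName, String arg)
            throws Exception {
        Class<?> c = Class.forName(className);
        Method m = c.getMethod(methodName);
        return (Integer) m.invoke(arg);
    }
}
```

A C linker would have resolved both names at build time and discarded them; the JRE cannot, because the same lookup may happen on the device, long after shipping.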
The second scenario is something like a Java celfone which allows
users to download tiny Java craplets over i-mode or WAP to customize
their phones. In this situation, we must rigidly define the
``standard'' API so that craplet publishers can run the same image on
everyone's phone. Here, the question becomes ``what is Java?'' Do we
mean application-Java, web-page-Java, Keitai-Java, or smartcard-Java?
Motorola/Nextel Java, or NEC/NTTDoCoMo Java? The existence of a
``standard'' class library that isn't standard, but rather is
individually customized and abridged to match the scale of each
endeavor to which Java is applied, becomes very confusing. I know my
i85s phone is a ``Java'' phone, and as a consumer I recognize the
``Java'' brand. but does this mean I can run the same applets on my
phone that NTT DoCoMo subscribers are running on Java-branded phones
in Japan? Based on our experience so far with a grab-bag of
incompatible Linux-on-i386 JREs, I can't imagine there is any
consistency between phones except that which is forcefully imposed by
OEM distributors like NTT DoCoMo, onto their walled-garden subscribers
only. If I write a Java applet for celfones, who is my audience? All
Java-branded phones? I don't think so. As a phone user, the
basically untainted DoCoMo brand becomes more important than the
scattergun Java brand. As a developer, this is many times more
annoying than the well-known incompatibilities between JREs in web
browsers, because
However, machinery differs in its adeptness at solving various
problems---for example, different machines have different kinds of
floating point units, or none at all. Some machines are good at
vector math like Cray's YMP CPUs, or running massively parallel
instruction streams like clustered PeeCees or Cray's air-cooled
Alphas. Symmetric shared-memory multiprocessing currently starts
hitting diminishing returns once you have about eight CPUs. SGI has
multi-CPU towers
where several chunky-desktop-size cases are connected by a high-speed,
low-level bus called HSSI such that groups of eight CPUs inside each
case have an affinity for their internal block of physical memory, but
all CPUs can transparently access all memory---Irix supposedly has
some rather impressive distributed computing features so that, if you
write programs that match their internal behaviour to this
architecture, Irix will properly align threads across the chunky cases
which make up the tower. Most modern CPU instruction set
architectures have branch-prediction for the indirect function calls
that object-oriented languages use. Some, like the Symbolics Ivory
CPU or the IBM AS/400, have special memories that can do hardware
type-checking. Like I said, machinery differs in its adeptness at
solving various problems.
Similarly, languages differ at their adeptness and convenience in
varying kinds of description. XSLT or XSSSLT (or whatever
it's called) is a functional language that is very good at
describing how to translate a single proprietary XML description of
content into several nonproprietary markup languages like the web's
HTML, WAP's WML, and DoCoMo's cHTML. Perl is ``good with text
files'', but XSSSLT is more adept at describing this particular
translation.
It is up to us to figure out how to execute a program efficiently. We
might choose a specific machine which is adept at the task. We also
have a bag of well-known tricks. C uses one such trick: we can use a
very expensive translator called ``the compiler'' to translate the
program into a different language whenever it changes. Since the
program changes very seldom, this is efficient. However, there are
drawbacks to this trick: it makes some algorithms inconvenient to
describe in C, and it makes some ``optimizations'' impossible.
It is said, ``those who do not study Lisp are doomed to reimplement
it. Poorly.'' The popular C compiler 'gcc' uses a Lisp-ish language
internally to represent the program being translated. The fact that
the C compiler must do this, and must be itself written in C, is
probably part of what makes the C compiler itself so slow.
The JIT's idea of translating one machine language into another at
runtime in a way that takes advantage of linguistic constructs
describing repetition like ``loops'' to save translation-time is not
new. Apple translated m68k code into ppc code this way when they
replaced their old Macintoshes with ``Power Macintoshes''. Digital
did the same thing translating VAX insns to Alpha insns. The
``Executor'' product for running Macintosh programs on PeeCees
translates m68k code into i386 code this way. These ``emulation''
tools are short, hand-optimized routines for translating from one
relatively simple imperative language into another. Their ambition
pales when compared to a modern Lisp runtime environment.
Lisp is an interesting language to bring into our Java discussion
because it is one where investing huge amounts of work into the
runtime environment can produce a huge execution speed increase.
Popular C compilers and CPU emulators, by comparison, have progressed
past the point of rapidly-diminishing returns. Modern Lisp tools,
just as Java's JIT does, avoid translating code from Lisp into the
target machine language multiple times when that code doesn't change.
Note that when I use this particular language to describe how good
Lisp environments work, the behaviour is identical to a C environment.
Concepts like ``loading'' or ``running'' a program, or ``rebooting'' a
machine, are fragile and subject to re-interpretation. If you are
going to pontificate about ``The Interpreter Problem'', you cannot
presume that .foo files are compiled into .o files and then linked
into .app files. The Newton PDA doesn't even have ``files''.
Suppose my Lisp Machine lacks the notion of ``loading'' programs. All
programs are always loaded. I merely decide which ones I want to run,
and when I run them, they are translated into machine language
whenever necessary. When I want to power down or ``reboot'' the
machine, the state of all these loaded programs is saved to disk.
Suppose this ``state'' includes the machine language versions of
anything that got translated. Now, this may not be a particularly
desirable way to build a Lisp Machine, but performance-wise it seems
comparable to the way popular machines accommodate C programs.  The
only difference is that you have to write ``Makefiles,'' and I don't.
When I say that ``programs are compiled whenever they change'' boils
down to the same thing in any language, I'm serious.
Programming environments also perform optimization.
Optimization is the idea that a language's runtime environment should
understand the meaning of things written in the language enough to
draw simple logical conclusions about equivalent programs, written in
the same language, which will run faster. In C, optimization happens
only in the translator. C compilers perform ``static'' optimization,
meaning they must operate without ``running'' the program. This, in
itself, is a fragile statement because even a human who looks at
source code and figures out what it will do is ``running'' the code
inside his or her brain, and in that sense static optimizers may
``run'' parts of the program they're optimizing: stated less
ambiguously, a C compiler is not allowed to change its optimization
decisions unless you change the program. A simple static optimization
might be to realize that (x * 128) is equivalent to (x << 7).
While Fortran is usually considered categorically inferior to C, the
language permits more static optimizations. Math routines will often
run faster if written in Fortran than if written in C, given a good
compiler---particularly on SIMD CPUs like Cray YMP or PowerPC
AltiVec.
Lisp runtime environments are allowed to change their optimization
decisions while a program is running. This is dynamic
optimization, and can give Lisp code speed advantages over Fortran
and C. It is not Lisp which makes dynamic optimization possible, but
the traditional architecture of Lisp runtime environments: Lisp
programs are not irrevocably translated into machine code once like a
C compiler does, but rather translated as needed while they run. We
could run C programs this way, or using a tool like 'stalin' we can
obliterate the dynamic optimization opportunity by irrevocably
translating a Lisp-ish program. The Lisp runtime architecture has the
freedom to change its optimization decisions after profiling the
running program.  Running the program on the CPU can reveal its nature
in ways that sort-of-running it on the static optimizer cannot.
The idea of ``just-in-time compiling'' reflects a fundamental
ignorance of quality Lisp runtime environments. It is a C-centric
attitude: naturally, thinks the C bigot, everything must be translated
into machine code by a ``compiler''. In Java, we will place the
``compiler'' in a slightly different spot, but it will still perform
only static optimizations like its Cish inspiration. Lisp runtime
environments include JIT functionality as a matter of course. Any
well-designed runtime environment will conserve its translation
effort. Pushing the envelope in a Lisp runtime environment is thus
about more complicated issues than the fragile and ambiguous word
``compiler'', issues like designing more aggressive dynamic
optimizations. Java has given a name, ``JIT,'' to a practice that,
before Java's invention, was considered mandatory.
It gets worse. Java's intermediate ``bytecode'' representation
obliterates much of the linguistic structure that exists in Java
programs before compiling them into ``bytecode''. The intelligence of
the predicate-based-AI used for optimization is very limited compared
to the intelligence of the humans that created the linguistic
structures (in Java) which are obliterated by 'javac' translating Java
into bytecode. Destroying this linguistic structure is harmful to
optimization. To see why, consider the static optimization I
described earlier of (x * 128) into (x << 7).
That's why translating from Java into Java bytecode makes optimizers
less effective. C does not have this problem, because the optimizer
has both the high-level language and the target CPU's characteristics
available. OpenPROM F-code, NewtonForth bytecode, and Oberon
slim-binaries also do not have this problem, because their bytecode
preserves the high-level language's linguistic structure better than
Java's bytecode.
It is hard to make any clear statements about ``optimizing'' without
getting bitten by a practical exception, especially with my limited
knowledge, but the points are: the language in which a program is
written affects how it can be optimized, and optimization is something
that works best with high-level languages.
I understand that the ridiculous compiled vs. interpreted
argument is no doubt popular among former GWBASIC programmers who
learned that C programs are faster than GWBASIC programs because C
programs are ``compiled ahead of time'' into ``the very same
instructions that run on the physical CPU.''
Hopefully it is now clear why this is infuriatingly narrow reasoning.
All programs that run on a CPU are eventually compiled into ``the very
same'' machine code. Otherwise, they couldn't run. Victims of
Microsoft ROM BASIC can use whatever words they want to interfere with
their understanding of this fact, but there is no formal distinction
between interpreting and compiling. There are all sorts of wacky
schemes for executing programs written in programming languages, and
some of them are more clever than others. I assume the internal
design of GWBASIC is astonishingly unclever, but we will probably
never know. The strategy most C environments use to execute programs
may be more clever than GWBASIC, but it's still a very simple
strategy. I believe the performance reputation C enjoys is not a
consequence of C compilers being the fastest way to execute any given
algorithm. Rather, I think the speed reputation is primarily
attributable to our collective inability, so far, to implement
complicated language tools and make them generally available to
everyone for free. I have a copy of gcc for my NEC MobilePro, but a
copy of Allegro CL, I do not
have.
Contrary to what marketers would have us believe, the Java
architecture actually snubs the lessons of state-of-the-art notC
runtime environments that preceded it.  Not only is the JIT neither
new nor impressive, but Java's bytecode architecture precludes the use
of well-known optimization techniques for designing fast runtime
environments.
I pity the poor suckers who contribute to porting Sun's JDKs, and then
end up forbidden from redistributing their own work. NetBSD
contributors are fed up with politically
discriminatory licensing exceptions. Java-compatible environments
fork like the low end of a
broomhandle over licensing arguments. You are better off writing
programs in Modula2 than Java, in terms of available Unix runtime
environments. This is not a joke: cvsup seems to be working a lot
more often than Freenet. Perhaps it has to do with the Modula2 team's
allowing contributors to freely redistribute their work, instead of
claiming ownership and attempting to profit from the improvements that
others contribute.
There's some recent improvement in the NetBSD Java situation,
but I'm not optimistic this situation will last long. Personally, I
bet NetBSD will be plunged back into the dark ages of running JREs
under COMPAT_LINUX emulation very soon, but for now it's just my
prediction.  As is, the SCSL restrictions mean that to use the Java
port I've linked to, you must accept that your copy of Java can ``go
away'' at any time.
When I say ``go away,'' I don't mean you get stuck with an old
version. I mean, if you didn't download this BSD-Java yesterday, you
might not be able to get it today, PERIOD. The people at eyesbeyond.com
hosting that Java port not only can't post binaries for download, they
can't even post a mirror of Sun's Java source code to which their
patches cleanly apply. When Sun comes out with a new Java version and
stops distributing their old source code, it instantly becomes
impossible for anyone to build Java on BSD, whether old version or
new. If for any reason the JRE you downloaded stops working, you
simply no longer have any way to run the Java program that you could
run yesterday. Nor is there any legal way to install the Java you
already have onto your friend's BSD system. Do you really want to
develop an in-house application in Java with that kind of risk of
having your platform yanked out from underneath you? Is that an
appropriate platform upon which to conduct any kind of serious
consulting business?
The difference between variants of Java is more rigid than between C
development environments, because a Java developer is typically
compelled to use a JDK that matches the JRE, implying a specific set
of development tools and libraries. A C developer must also use
certain tools to match the target environment, but since the C tools
are not so tightly integrated at least he or she can reasonably expect
to keep a favorite text editor and 'make' tool across platforms, and
with a little luck might not be compelled to use an MS Windows host
like J2ME and JavaCard developers targeting the ``smaller scale''
environments often find themselves compelled to do.
While neither C nor Java's situation is ideal, it's important to
realize that Java's ``scalability'' is more a matter of market
penetration than actual uniformity.
It turns out where Java is actually used a lot these days is on the
back-end, on the web server, with WebSphere or JSP or something. Any
language could fit in that role---it's not architecturally difficult.
I'd also like it if designers of future elaborate database programs
would look at ``the latency problem''. This click-[wait]-scroll stuff
really is kind of irritating, and the Java sandbox and web integration
might have addressed it, but ultimately didn't. For example, if I
read Usenet news under emacs and Gnus, the portable Gnus program
(written in elisp and distributable, in compiled form, across all
emacs architectures) will pre-fetch the next ten or twenty news
articles that it thinks me likely to read. If I read news with
DejaNews a.k.a. Google Groups with an ordinary web browser, each news
article is fetched only after I select its HTML link because there is
no sane way to describe the prefetch logic to my web browser in HTML.
Actually Google Groups is very nicely done, and its CGI programs
combine multiple articles on a single web page, but the prefetch of
Gnus is smarter and more transparent. If elisp had become ``the
language of the web,'' Google could simply download a preconfigured
``Gnus'' to my computer, and give me the cleverest prefetch and lowest
latency available.
I think problems like this generalize well across many database
applications---for example, anything that returns ``20 results per
page.'' The particular angle on the Java hype that I'm refuting seems
unusually greedy compared to solving the click-[wait]-scroll
problem---behind the veil of all this fancy language is a desire to
sign on gigantic numbers of sheeplike users instantly whether they
like it or not, by force-loading Java environments with
advertiser-friendly features onto their computers along with this
web-browser program that they actually want, sort of like a trojan
horse, and then embedding invisible Java programs into web pages that
they will accidentally visit. Anyway, that accurately describes how
I've run most actual Java programs in this lifetime. The browser
says: Loading Java. ``Criminy.'' A window pops
up: The applet
globoNetDynamics.CommerceBasket2000PRO.userTrack.dbFrobometer is
requesting the following privileges: complete control of your
audiovisual experience and access to everything on your disk. Would
you like to (a) Submit or (b) crash the browser? ``What
happened to the so-called Sand Box?'' Thank you for
submitting to the JavaControls dialog box. ``Why isn't
anything happening? Where's the Java? I thought this
was supposed to be cool, like running programs is cool. Oh, there it
is, that rotating logo in the lower right corner. Watery crap!
Thanks a lot, Java. Woo hoo. Raise the fucking roof.''
Is that the real advantage of all this language-of-the-web
stuff? There ought to be some small place left for the idea that
high-quality programs will attract users. Why bother when you can
DRIVE hits to your site BY THE THOUSANDS with OPT-IN JAVA
SPADVERTIZING!
But, what about when we're through inventing new languages and want to
pick one of them in which to write useful programs and sell to consumers?
Or what about when there is no hard disk for VM backing-store? Is ``The
C Operating System'' still the most convenient solution? I doubt it.
It gets worse. Modern CPUs are ``The C microprocessor'', because they
include MMUs designed to accommodate the memory allocation style of
``The C Operating System'', supervisor bits to accommodate its notion
of a ``kernel'', branch prediction to accommodate its function-calling
conventions, and so on. C has developed a harmful stranglehold on our
unnaturally limited concept of what a ``computer'' is. If we are ever
to design the Robot Masters that will ultimately wrest the planet from
human control, we must first wrest the computing community from C's
control.
There are a few problems with the JavaOS as a conceptual result of
this rebellion against ``The C Operating System''. First, Java is too
C-like. Java programs are basically C++ programs with Purify-style
code instrumentation. In one sense, it's amazing that such a trivial
change as Purify-ing everything can allow us to re-evaluate all sorts
of fundamental OS decisions, throw huge chunks of code out the window,
and come up with a final architecture that isn't totally absurd. In
another sense, the end result doesn't perform too well, and we can do
better. We have already done better.
We've already seen several variants of The Lisp Machine, where the
entire operating system is written in Lisp. Symbolics and TI both
offered useful Lisp machines, and most of them remain in the hands of
happy owners. Apple's NewtonOS is The Forth Operating System, and
attains levels of code-compactness that Java can't touch. The AS/400
is also a sort of ``Database Machine'', where the processor is a
tweaked PowerPC with hardware support for whatever weird languages IBM
wants people to use.
Also, there are operating systems like vxworks that give up on memory
protection but remain ``The C Operating System''---for example, on
vxworks if a program dies, you can restart it, but any memory it
allocated ``leaks'': remains allocated until a reboot. In a celfone
where the Phone Application starts at poweron and never dies, this
makes a lot of sense, and it frees the OS from some of C's usual
performance-sucking needs. For one thing, the kernel and application
can malloc(...) out of the same giant heap.
Anyway, I have to admit the JavaOS is my favorite of all Java's many
flimsy marketing claims. It's important to remember, whenever someone
proposes a ``single-language'' operating system, that we are already
[ab]using a single-language OS: Unix.
In any case, we've been around this [language]OS block before. And,
like so many other aspects of Java, the JavaOS is a half-baked attempt
compared to the superior notC operating systems like Genera which
preceded it.
The mindblowing complexity of Java's ``standard library'' creates this
confusing problem. First, there are scores of slightly different
Java-ish languages out there, all calling themselves Java. The
standard library, isn't.  Second, it is trivial to add IJG's libjpeg
to my C application written under vxworks, but if I want to add JPEG
decompression to my Motorola-i85s-Java application, the tight
integration of JREs means there is no meaningful guarantee that I am
technically able to plagiarize this JPEG code from some other, larger
JRE---much less that I can do this legally.
The 6502 CPU in the Apple //e lacked a multiplication instruction, so
integer multiplication in C might be translated by ``the compiler''
into an additive loop in 6502 assembly.  Which of the following
patterns is easier to recognize and optimize into (x << 7)?
I assert that it is easier to write correct static optimizer patterns
that will recognize the first case as a shift-left than the second
case.  If it's unclear how a computer would do static optimization,
try writing down some AI rules in English for a static optimizer that
runs on Grad Students.  Which optimization is easier to program the
Grad Student to perform?

    answer = (x * 128);

    accumreg = x; ireg = 128;
    for (ireg--; ireg != 0; ireg--)
        accumreg += x;
    answer = accumreg;
Remember, with respect to the negative impact of Sun's licenses, we're
talking only about running Java on NetBSD/i386.  The Java 1.4 port
above is for i386 only.  The fact that Java on free OSes doesn't run
on anything but i386 is a separate argument---here, we're just
pointing out that once you've accepted for the sake of argument that
you must run i386, you can't even choose whether you'd like to run
Java programs on BSD or Linux without a lot of pain and a very
realistic risk of having your Java go away when a later version of the
OS, the JRE, or the SCSL comes out.
Unix VM and sbrk(...)-based memory allocation are costly
consequences of supporting C programs.  Other languages have other
costly requirements, and I'm far from convinced that ``The C operating
system'' is the best tradeoff.  It is good in one sense: it can
support practically any language.  Even if not at imaginably-optimal
efficiency, at least it can do so without catastrophic crashes.  And
while sbrk(...) and Unix VM may not be the best way of allocating
memory for notC languages with garbage collection, still it is at
least a reasonably inexpensive (in hardware and software) way of
giving any language the memory it requires.
A student, in hopes of understanding the Lambda-nature, came to
Greenblatt. As they spoke a Multics system hacker walked by. ``Is it
true,'' asked the student, ``that PL-1 has many of the same data types
as Lisp?'' Almost before the student had finished his question,
Greenblatt shouted, ``FOO!'' and hit the student with a stick.
For one thing, the kernel and application can malloc(...) out of the
same giant heap, instead of doing ridiculous things like growing
``system'' memory from one end of address space and ``user'' memory
from the other end, like other primitive operating systems
(pre-Multifinder MacOS, PalmOS) do.
rants / map / carton's page / Miles Nordin <carton@Ivy.NET>
Last update (UTC timezone): $Id: java_languageoftomorrow.html,v 1.2 2006/09/18 02:32:53 carton Exp $