Nexenta talk



Hi Brian, thanks for your talk.  Honestly I think it was more
interesting to me than Ian's for what I'm doing right now. :) I am
actually very interested in how Sun plans to make money, both as a
general business-strategy thing and as an
I-hope-Sun-does-well-so-I-have-some-choices thing, and I expect a big
part of this will be selling access to certain branches of a
complicated revision tree.  This brings in revision control, which is
also an interesting topic to me.  But since the Indiana project is
just starting and the audience was uninitiated, I guess the talk was
necessarily short on technical details.

Anyway, I didn't really take Nexenta seriously before, because I
thought it threw away the best parts of both Linux and Solaris.  From
Solaris it loses the careful integration, release engineering, i18n,
and support offering, replaced with ``at least what the Nexenta people
have used themselves does work, and for everything else it's `open
source' so fix it yourself instead of complaining.''  From Linux it
loses the real freedom of having almost all the source available.
With Solaris(/Nexenta) only a very tiny amount is available: huge
chunks of the base system, the toolchain, and even the kernel are
missing, to an extent that no Linux zealot would _dream_ of
tolerating, and new binary-only drivers are constantly committed, even
for hardware that Sun sells itself, while people still go around
saying things like ``didn't even know Solaris was open-source now.''
But the demos kind of shocked me by making Nexenta look polished,
practical, and attractive, and I was thinking ``maybe I wouldn't feel
so far behind if I were using this,'' so I think it was a good idea to
show them and talk about them.


I've been thinking about the differences among a bunch of different
package systems:

 * BSD systems which are all pkgsrc-like.  Within these there are some
   weird things going on like:

   * NetBSD's ``bulk builds''

   * NetBSD's 'pkgviews' (this one is potentially important to Solaris
     because it can actually solve the ``glibc version'' problem.
     Someone in the back mentioned this example as a more insidious
     incarnation of the IMHO-mostly-solved multiple-JRE-versions
     problem Mark brought up.)

   * FreeBSD's 'portupgrade' (I don't fully understand this tool yet.
     I don't know if it's Gentoo-ish or NetBSD-ish.)

 * Gentoo, which differs from BSD in a few interesting ways, like

   * the lack of a ``base'' system

   * the weaker way build dependencies are treated, which makes
     bootstrapping possible without a base system and makes small
     library upgrades very fast, but which also makes it possible to
     ``corrupt'' your system and creates the need for all these
     checking ``etools'' like 'revdep-rebuild'.

   * the cleaner-than-BSD way that multiple installable revisions of
     the same package are handled.  You still aren't allowed to
     install two versions of the same package at once, as you can
     with pkgviews, but in Gentoo at least you can _find_ the version
     you want to install more easily, and the KEYWORDS mask (stable
     'x86' vs unstable '~x86') automates this in a way that BSD can't.
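
To make the KEYWORDS point concrete, here is a rough Python sketch of
how I understand the masking to pick a version.  The ebuild table, the
version numbers, and the function names are all made up for
illustration; this is not Portage's actual resolver, just the idea of
filtering by keyword and then taking the newest version that is still
visible.

```python
# Hypothetical ebuild data: version -> KEYWORDS string, as in an ebuild.
EBUILDS = {
    "1.2.8": "x86 amd64",
    "1.2.9": "x86 ~amd64",
    "1.3.0": "~x86 ~amd64",
}


def version_key(version):
    """Turn '1.2.9' into (1, 2, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def best_visible(ebuilds, accept_keywords):
    """Return the newest version whose KEYWORDS overlap accept_keywords,
    or None if every version is masked."""
    accepted = set(accept_keywords)
    visible = [v for v, kw in ebuilds.items() if set(kw.split()) & accepted]
    return max(visible, key=version_key) if visible else None


# A stable x86 user accepts only 'x86'; a testing user also accepts '~x86'.
stable = best_visible(EBUILDS, ["x86"])
testing = best_visible(EBUILDS, ["x86", "~x86"])
print(stable, testing)
```

With this made-up table the stable user gets 1.2.9, and accepting
'~x86' as well makes 1.3.0 visible.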

 * The Linux binary package systems

   * RedHat's lesson of charging more to let you keep _older_
     packages, and fixing the security bugs in them.  Cheaper/free
     package systems follow the same revision tree as their upstream
     projects, which means you have to upgrade to an actual new
     version to fix security bugs more often than a CentOS user or
     RedHat contract-holder does.

   * all the horrible grief Linux binary systems get from letting you
     or forcing you to download non-revision-controlled packages
     directly from vendors outside the distribution project.

   * The difference between source dependencies and binary
     dependencies.  The Linux binary distributions constantly have to
     stress and wring their hands over the
     shared-library-version-problem.  When should we increment the
     library's version number and force a rebuild/upgrade of all the
     dependent packages?  When should we simply release a new version
     of the shared-library binary package, but leave the version
     number marked on the .so inside unchanged so you can keep using
     all your old dependent packages?

     BSD has this library version number problem only for the base
     system.  And it's only on the base system that they make binary
     ABI commitments on formal releases.  In the package system,
     shared library version numbers aren't really used at all unless
     someone's carefully set up two packages by hand to permit two
     copies of the same library to be installed at once.  They're not
     used for dependencies.  In general the version is just
     libfoo.so.0.0, and the package dependencies enforce that
     libraries are only installed on the same system with compatible
     packages (``compatible'' meaning packages that were built from
     source against that exact library).  Make _any_ change to the
     shared library, and you have to rebuild all the dependent
     packages.  This means they only care about source dependencies.

     Suppose upgrading from libpng-1.2.8 to libpng-1.2.9 requires
     rebuilding Mozilla because the ABI of libpng has changed (some
     .h files changed), but no changes to the Mozilla source code are
     required.  Then libpng 1.2.8 and 1.2.9 are source-compatible,
     but not binary-compatible.

             |         dpkg/rpm          |          pkgsrc
             | old         | new         | old         | new
     --------+-------------+-------------+-------------+-------------
     library | 1.2.8-1     | 1.2.9-0     | 1.2.8-nb1   | 1.2.9
     package |             |             |             |
     --------+-------------+-------------+-------------+-------------
     soname  | libpng.so.2 | libpng.so.3 | libpng.so.0 | libpng.so.0
     --------+-------------+-------------+-------------+-------------
     Mozilla | 2.0-1       | 2.0-2       | 2.0-nb1     | 2.0-nb1
     package |             |             |             |

     Because some .h files changed, both Linux and BSD need to rebuild
     Mozilla.  There's no way around that.  But in BSD, the dependency
     of Mozilla on libpng, in the _installed_ system although not in
     the installable pkgsrc tree, is tracked abstractly, as one object
     on another object, not by version number.  If you touch libpng at
     all, whether you change libpng's version number or not, pkgsrc
     will want to rebuild Mozilla.  The ``new'' Mozilla 2.0-nb1 on the
     disk, and the binary package you'd make from it, isn't the same
     as the old one.  Yet the two have the same version number because
     nothing has changed in the source package: neither the Mozilla
     code nor the build instructions for Mozilla itself.

     Binary dependencies aren't tracked at all.  And while you can
     have ``binary'' packages in pkgsrc, the package version number
     marked on the binary package doesn't uniquely identify it like it
     does in Linux binary systems.  A binary package in pkgsrc is
     identified by a tuple of { the version of the NetBSD base system
     used to build the package, the date/CVS branch of the /usr/pkgsrc
     tree used to build the package, the package name and version }.
     In practice people get a little sloppy and slightly Linuxy with
     this, and the package system doesn't actually record the CVS
     date/branch inside the binary package (maybe it should), but if
     you want an absolutely guaranteed-to-work system you have to
     install one big bag of binary packages all at once, all built
     from source at the same time.  The pkgsrc guys run continuous
     ``bulk builds'' to produce these bags of consistent binary
     packages.  The alternative is, don't use binary packages---use
     /usr/pkgsrc---then you can upgrade individual things and rebuild
     dependencies as needed.

     This is actually a bit paradoxical, because on one hand the BSD
     way reduces the amount of regression testing you need to do.  On
     Linux, the burden of one of these libpng.so.2 -> libpng.so.3
     changes, having to release, download, and install all those
     Mozilla 2.0-1 -> 2.0-2 binary packages, is enormous, so as much
     as possible they will change libpng but keep the same soname
     libpng.so.2 in both the old and new packages, so Mozilla doesn't
     need to be rebuilt.

             |         dpkg/rpm          |          pkgsrc
             | old         | new         | old         | new
     --------+-------------+-------------+-------------+-------------
     library | 1.2.9-0     | 1.3.0-0     | 1.2.9       | 1.3.0
     package |             |             |             |
     --------+-------------+-------------+-------------+-------------
     soname  | libpng.so.3 | libpng.so.3 | libpng.so.0 | libpng.so.0
     --------+-------------+-------------+-------------+-------------
     Mozilla | 2.0-2 *     | 2.0-2 *     | 2.0-nb1 +   | 2.0-nb1 +
     package |             |             |             |

     * these two are the same mozilla.bin

     + although the package version number is the same, mozilla is
       rebuilt between these two.  mozilla.bin may have a different
       checksum.

     Sometimes the Linux guys will make a mistake, and give
     incompatible versions of a library the same soname.  This can be
     a disaster, because it causes problems for only a small subset of
     customers that you can identify only by the exact version numbers
     of every single package on their system.  It's hard to detect
     and hard to reproduce: a really nasty bug in your release.  And
     the instructions to customers on how to avoid the bug can be
     complicated: ``don't use this version of libpng!  but it's ok to
     keep using it if you have this version of this, and that version
     of that, and ...''  So I like the BSD way better.

     On the other hand, if you do not use binary packages and build
     from source using /usr/pkgsrc, the packages high up in the
     dependency tree will tend to have binaries with many different
     checksums across customers.  One guy rebuilt libpng for a
     security flaw, but kept the old libpango.  Another guy used a
     later version of /usr/pkgsrc, so he got new libpng, and also new
     libpango which was upgraded for some non-security-related thing.
     Both have the exact same revision of the Mozilla package, but
     different Mozilla binaries.  A package like that, with lots of
     dependencies, could exist as hundreds of different mozilla.bin's,
     all supposedly correct and working, but if there is a bug
     somewhere, again, it's hard to track
     down.  The official BSD answer to this is, I think, ``If you
     don't like that, (1) use our quarterly stable branches like
     pkgsrc 2007Q2, and (2) always upgrade everything when you upgrade
     anything.''

     I like the BSD package system.  I think eliminating classes of
     bugs is Good, and I think rebuilding and downloading things is
     relatively cheap now.  But if you have no base system, as with
     Gentoo, it's really not good, because it's hard or impossible for
     closed-source vendors to release consistently-working software
     that depends on libraries inside something like pkgsrc or Gentoo.

     I think open-source and closed-source software each work better
     with a fundamentally different package system architecture, and
     I kind of like the status quo where I get a consistent-ABI
     platform from Sun, and then I add on all this open source stuff
     with pkgsrc.  However, I'm not sure this is going to fly with the
     way Solaris sysadmins like to do security upgrades.  And I'm sure
     it won't fly with Sun's evangelical push to ``be more like
     Linux,'' which makes me sad, the way GNOME trying to be more like
     Windows makes me sad.
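
The two rebuild policies compared in the tables above can be sketched
in a few lines of Python.  This is a toy model under my own
assumptions: the DEPENDS graph and both function names are
hypothetical, and neither dpkg/rpm nor pkgsrc exposes anything like
this API.  It only shows the difference between ``rebuild when the
soname changes'' and ``rebuild when the library is touched at all.''

```python
# Toy model of the two rebuild policies.  All names are hypothetical.

# package -> set of packages it was built against (made-up graph)
DEPENDS = {
    "mozilla": {"libpng", "libpango"},
    "libpango": set(),
    "libpng": set(),
}


def soname_policy(old_soname, new_soname, dependents):
    """dpkg/rpm-like: dependents need new binaries only if the soname
    changed between the old and new library package."""
    if old_soname == new_soname:
        return []
    return sorted(dependents)


def touch_policy(depends, touched):
    """pkgsrc-like: rebuild everything that depends on `touched`,
    directly or indirectly, whatever version or soname it now carries."""
    rebuild = set()
    frontier = [touched]
    while frontier:
        current = frontier.pop()
        for pkg, deps in depends.items():
            if current in deps and pkg not in rebuild:
                rebuild.add(pkg)
                frontier.append(pkg)
    return sorted(rebuild)


# First table: the soname is bumped, so Mozilla gets a new binary package.
print(soname_policy("libpng.so.2", "libpng.so.3", ["mozilla"]))
# Second table: same soname, so the binary distribution leaves Mozilla alone,
print(soname_policy("libpng.so.3", "libpng.so.3", ["mozilla"]))
# while the pkgsrc-like policy rebuilds it anyway.
print(touch_policy(DEPENDS, "libpng"))
```

The class of bug I described, incompatible libraries sharing one
soname, corresponds to soname_policy returning an empty list when a
rebuild was actually needed; touch_policy cannot make that mistake,
at the cost of rebuilding more often.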

 * There was a ``package'' filesystem in QNX Neutrino, which they have
   abandoned I think.  I believe it was a sort of dual-booting thing,
   where you could have multiple simultaneous collections, each of
   hundreds of packages, some packages shared between two collections,
   and a mesh of dependencies.  Then, you can choose at boot time, ``I
   want Package Set A'', and the system will make things appear as
   though only files mentioned in Set A, not Set B, are installed.  I
   don't know how it was used.

   I could imagine it being an alternative to FLASHing the ROM of an
   embedded system.  High-end systems usually have ``Bank A'' and
   ``Bank B'', or they have a FLASH filesystem like Cisco where each
   boot image is a single big file, and a ROM monitor lets you choose
   among them.  With this ``package filesystem,'' the vendor could say
   ``install this tiny patch package on your system'', and if it
   didn't work, you'd be able to roll back the install with the
   Package Filesystem.

   This sounds eerily like the ``package management service'' Ian
   hinted at, but it could be completely different.  I suppose I
   should read about it instead of speculating.
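
Just to pin down what I am imagining for the package filesystem, here
is a tiny Python sketch.  It is only my guess at the idea from the
description above, not how QNX implemented it: the package names, the
file contents, and visible_files() are all invented for illustration.

```python
# Hypothetical model: each package maps paths to file contents, and a
# "package set" is an ordered list of packages selected at boot time.
PACKAGES = {
    "base": {"/bin/sh": "sh-v1", "/bin/app": "app-v1"},
    "tiny-patch": {"/bin/app": "app-v2"},
}
PACKAGE_SETS = {
    "A": ["base"],                # the system before the patch
    "B": ["base", "tiny-patch"],  # the same system plus the patch package
}


def visible_files(packages, package_set):
    """Merge the chosen packages into one view of the filesystem.
    Packages later in the list override earlier ones on the same path."""
    view = {}
    for name in package_set:
        view.update(packages[name])
    return view


# Booting Set B shows the patched file; booting Set A is the rollback.
patched = visible_files(PACKAGES, PACKAGE_SETS["B"])
rolled_back = visible_files(PACKAGES, PACKAGE_SETS["A"])
print(patched["/bin/app"], rolled_back["/bin/app"])
```

Nothing is deleted when the patch is installed, so rolling back is
just choosing the other set at boot.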

Ian focused on the weirdest things, like whether the tool downloads
packages for you, or you do it by hand.  Who gives a shit?  Package
systems are really complicated and interesting.  I'd like to see what
Sun comes up with, though I think I will miss the freedom and the
readily-available easy-to-tweak source code I get with pkgsrc.
