Thread support in operating systems


What are threads, and why do we use them

Most of the programs I run on my Unix shell host, castrovalva.Ivy.NET, are unthreaded. I know this because they work, and NetBSD's threads are mostly broken on architectures other than i386, so working programs must be using the thread libraries lightly or not at all. Here are some programs which are, or can be, unthreaded:

All these programs are the Past. Threads are the Future.

Two motivations conspire to squeeze off the possibility of workable unthreaded platforms. What are they?

First, fashion! ``Unthreaded platforms,'' I say...what's a platform, anyway? It's where all the money comes from in selling computers and software. Keep people on the platform, and they can be controlled and milked for cash. Never mind how to do the milking---that solves itself. Just keep them on the platform, and milk...er, cash, flows. Java is a platform. Windows is a platform, but whenever the scum at Microsoft build some new ramshackle asbestos-laden shanty hut onto the side of Windows, the new thing becomes an even stricter platform: Office is definitely a platform. .NET is the biggest sheep-herding exercise in decades. There are either/or platforms, and there are nested platforms where one is built upon the other, but a platform always has this containing feature, so there is major stumbling and rework when you try to move off it. Ever feel like Mac users are a cult? That's the platform-building in action. Giant web sites like Amazon, Google Maps, and Facebook that have ``APIs'' are trying to become platforms. Platforms == riches.

It's easier to communicate with others if they're on the same platform. You can borrow programs others are running as long as you don't have to cross platforms. Data will be richer if it hasn't crossed platforms. If the ``data'' is an email from your friend telling you how to fix your computer, that won't work as well if the email has crossed platforms, either, which leads right into the next point about platforms.

More importantly, you can borrow/steal developers from others on the same platform---if someone's been stuck on a platform for a long time, if his mind is infected with it, then he'll work more efficiently if you don't try to relocate him. People get stuck on platforms by creating the platform just as much as they do by merely using it. The bigger it becomes, the more there is to relearn, reimplement, adapt, if you try to leave. The platforms of tomorrow are the ones taught in third-world for-profit tech schools. The old Unix tribe is dying out, mostly because we've been outmaneuvered in indoctrinating the next generation of worker drones.

What do threads have to do with platforms? Well, lots of platforms have threads now. Java and Windows do for sure, but also Perl and Python and, well, almost anything new. Without threads, you're kicked off the platform. And we create the threaded uberplatform by expecting to have threads whenever we write new programs---even if we don't have .NET Win_LPCreateThreadEx32(), we want to have some function that makes a familiar thread-like thing, and we whine if we don't get it. Porting programs across platforms, even big ones like Firefox, is doable, but porting a threaded program to an unthreaded platform is impossible. Firefox needs threads on all platforms where it runs.

To be fair, threads are useful. They make some programs simpler to write. But that's not why they're the future. They're the future because all the (fewer every year) politically powerful platforms of the future include them.

Second reason: performance. Our timesharing system can accomplish more computation if it has many runnable contexts to spread among CPU cores instead of just a few. An unthreaded program has only one context, and can only keep one CPU busy. The spreading out of single programs onto many CPUs can take many forms besides threads, but most of the ideas for making computers of the future faster depend on this spreading, and threads are probably the simplest imaginable model. Here are some of the ways having multiple threads can speed up computing:
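To make the one-context-keeps-one-CPU-busy point concrete, here is a minimal sketch in C with POSIX threads. It cuts an array into slices and gives each slice to its own thread, so the kernel has several runnable contexts to hand out to cores. The names parallel_sum, sum_slice, struct slice, and MAX_WORKERS are made up for this illustration. Whether it actually runs faster than a plain loop depends on the machine, the data size, and the thread implementation underneath.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16          /* arbitrary cap chosen for this sketch */

/* One slice of the array, handed to one worker thread. */
struct slice {
    const long *data;
    size_t len;
    long sum;                   /* written by the worker, read after join */
};

/* Thread body: add up one slice and store the result in the slice. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long total = 0;

    for (size_t i = 0; i < s->len; i++)
        total += s->data[i];
    s->sum = total;
    return NULL;
}

/* Hypothetical helper: sum data[0..n) using up to nworkers threads. */
long parallel_sum(const long *data, size_t n, int nworkers)
{
    pthread_t tid[MAX_WORKERS];
    struct slice slices[MAX_WORKERS];
    long total = 0;

    if (nworkers < 1)
        nworkers = 1;
    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;

    size_t chunk = n / (size_t)nworkers;

    /* Start one thread per slice; the last slice takes the remainder. */
    for (int i = 0; i < nworkers; i++) {
        size_t start = (size_t)i * chunk;

        slices[i].data = data + start;
        slices[i].len = (i == nworkers - 1) ? n - start : chunk;
        slices[i].sum = 0;

        int rc = pthread_create(&tid[i], NULL, sum_slice, &slices[i]);
        assert(rc == 0);
        (void)rc;
    }

    /* Wait for every worker, then combine the partial sums. */
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
        (void)rc;
        total += slices[i].sum;
    }
    return total;
}
```

Calling parallel_sum(v, 1000, 4) on an array holding 1 through 1000 gives the same answer as a single loop; the only difference is how many contexts the scheduler gets to place. On most systems you build it with the compiler's pthread option (for example -pthread with gcc or clang).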

How are threads implemented: part one, experimental M:1 systems

Threads are a programming language idea, so it's wrong to talk about their implementation except as an entire stack extending all the way up to a high-level language. But, in general, we do. We only care about implementing threads in C. We don't need to look into other languages like Perl, Java, Python, Lisp, where threads all exist mostly through adaptation layers over the POSIX threads for C API, sharing whatever limitations C threads have on that platform. (There might be a small CPU- and ABI-dependent piece in the non-C language's own implementation. I'm not sure.)

Let's talk about some of the implementations of POSIX threads for C, on Unix.

The first reliable thread implementation for Unix was MIT pthreads (now GNU Pth is something like it). Pth exists entirely within the userspace context of a single process, with no help from the Unix kernel. No one uses Pth any more. Why? Not modifying the kernel is the cornerstone of the Pth architecture, and it's a decision that introduces unsolvable problems. Here are some problems with Pth which are severe enough to make it not a good enough foundation for a modern threaded platform:
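For a feel of what ``entirely within the userspace context of a single process'' means, here is a rough sketch of two cooperative threads multiplexed onto one kernel context with getcontext(), makecontext(), and swapcontext(). This is not Pth's code, and I'm not claiming Pth is built this way; it only shows the M:1 idea. It assumes a system that still ships <ucontext.h> (glibc does; some other libcs don't). The names worker, yield_to_scheduler, run_two_user_threads, and run_log are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define NTHREADS   2
#define STACK_SIZE (64 * 1024)  /* arbitrary per-thread stack size */
#define LOG_MAX    16

static ucontext_t sched_ctx;            /* the "scheduler" (our caller) */
static ucontext_t thr_ctx[NTHREADS];    /* one saved context per user thread */
static char stacks[NTHREADS][STACK_SIZE];
static int done[NTHREADS];
static int current;                     /* index of the running user thread */

static char run_log[LOG_MAX + 1];       /* records who ran, in order */
static size_t log_len;

static void record(char c)
{
    if (log_len < LOG_MAX) {
        run_log[log_len++] = c;
        run_log[log_len] = '\0';
    }
}

/* Cooperative yield: save this thread's context, resume the scheduler. */
static void yield_to_scheduler(void)
{
    swapcontext(&thr_ctx[current], &sched_ctx);
}

/* Body of each user thread: run a step, yield, run another, yield. */
static void worker(int id)
{
    for (int step = 0; step < 2; step++) {
        record((char)('A' + id));
        yield_to_scheduler();
    }
    done[id] = 1;
    /* Returning here switches to uc_link, i.e. back to the scheduler. */
}

/* Round-robin over both user threads until each has finished. */
const char *run_two_user_threads(void)
{
    log_len = 0;
    run_log[0] = '\0';

    for (int i = 0; i < NTHREADS; i++) {
        int rc = getcontext(&thr_ctx[i]);
        assert(rc == 0);
        (void)rc;
        done[i] = 0;
        thr_ctx[i].uc_stack.ss_sp = stacks[i];
        thr_ctx[i].uc_stack.ss_size = STACK_SIZE;
        thr_ctx[i].uc_link = &sched_ctx;
        makecontext(&thr_ctx[i], (void (*)(void))worker, 1, i);
    }

    while (!done[0] || !done[1]) {
        for (current = 0; current < NTHREADS; current++) {
            if (!done[current])
                swapcontext(&sched_ctx, &thr_ctx[current]);
        }
    }
    return run_log;
}
```

The two workers take turns, so the log reads ABAB. The kernel never learns that there are two threads: it sees one process with one context, which is why a library built purely like this cannot put its threads on two CPUs at once.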

GNU Pth is important because, even after Linux and *BSD had kernel-provided thread implementations, those implementations stayed incomplete, buggy, or outright broken for many years. Linux took a very long time to switch over to NPTL, which is their third-try kernel thread implementation (fourth-try? At least LinuxThreads and NGPT came first) and which, unlike the earlier ones, actually aims to be mostly correct. Even after NPTL existed and people started bragging about it, it took a couple more years to make it onto release-engineered desktops and servers.

Likewise, BSD still has buggy kernel threads---as of 2008 this applies to at least:

Of course, the latter two problems aren't worse than Pth, but the first two are. The point is, Pth is finished, and kernel threads in free Unix still aren't! Linux I think is almost finished, and maybe FreeBSD/i386 but I'm not positive. NetBSD 5.0 will probably be finished.

How are threads implemented: part two, mature systems

Which Unixes definitely have a finished thread implementation as of 2008? Linux and I think {Free,Open}BSD, but on i386 only, were all mostly finished (watch out for HyperThreading, broken gdb's, missing libc functions, low performance on certain workloads) a couple years after 2006. And any vendor Unix, like AIX, OSF/1, Irix, Solaris. They were all finished as of 1997. Let's have a look at what those latter guys are doing, or rather what they did a long time ago.

There was once serious debate about how to make the most efficient thread architecture. Here are some of the old proposals:

Summary

Solaris and other proprietary Unixes had bug-free threads a full decade before any free operating system. Linux was the first free operating system to have decent, reliable, performant POSIX threads, though they offer them after a couple of heinously ugly false starts that never happened on proprietary Unixes.

1:1 thread architectures are the easiest to make mature/bug-free, complete, and decently performant on all workloads. Both M:1 and M:N architectures are abandoned, industry-wide.


Miles Nordin <carton@Ivy.NET>
Last update (UTC timezone): $Id: thread.html,v 1.7 2008/01/28 22:40:08 carton Exp $