I think processes could replace threads in most cases, but common OSs are hampering us.
Fork is slow on Windows. On Unix, having lots of processes crowds ps (which creates a disincentive), and if you want to manage a tree of threads effectively you have to do fiddly work managing a thread group and, if you want to be fast, wrap your head and your software around shared-memory IPC.
I think that if support for multiple processes in mainstream OSs were more effective than it is, we'd both spend less time worrying about threads and write more stable software.
* sharing memory - sometimes you have lots of immutable data, like modules, graphs, whatnot. Yes, there is copy-on-write and no, it doesn't work well on any Python implementation out there. Also, sometimes the data is mostly immutable, but not quite.
* serializing lots of data is a mess and, even when feasible, is usually a big performance hit if you want to exchange actual objects.
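For flat data at least, one way around both points is a named shared-memory block. Below is a minimal sketch, assuming Python 3.8+ and its standard `multiprocessing.shared_memory` module; `publish` and `read_sum` are hypothetical helper names made up for this example. It only covers plain integers, not arbitrary Python objects, which is exactly where the pain described above remains.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Pack a list of integers into a new shared block and return its handle."""
    size = struct.calcsize("q") * len(values)
    shm = shared_memory.SharedMemory(create=True, size=size)
    # Write the raw 64-bit integers straight into the shared buffer (no pickling).
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm


def read_sum(name, count):
    """Attach to an existing block by name and sum the integers stored in it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{count}q", shm.buf, 0))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()
```

A worker process would call `read_sum(shm.name, count)` after being handed the block's name; the creator calls `shm.close()` and `shm.unlink()` once everyone is done.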
With Plan 9's rfork() (and Linux's less nice clone()), you can create new processes that share the memory of the old process, which addresses those two answers. Though I'm not sure whether this reinforces OP's point or just indicates that the distinction between threads and processes can be fuzzy.
RFMEM If set, the child and the parent will share data and bss segments. Otherwise, the child inherits a copy of those segments. Other segment types, in particular stack segments, will be unaffected. May be set only with RFPROC.
So, it basically is a way to start a process that shares all its globals (including static variables, I think) with another process, but not other memory. That is more secure than having threads, but also more restricted, as one cannot share heap-allocated structures between such processes. I guess this feature gets used most in Fortran code where nothing gets allocated dynamically.
It also makes it easier to selectively kill a thread of execution from the command line, but I do not see when that might be useful.
I think you're wrong and reasoning in the wrong direction. I think that green threads / fibers / lightweight threads are going to replace threads. Haskell uses them, for example. They are much more lightweight, and you can get additional correctness guarantees from them: (1) if you bind the "runner" threads to specific CPUs, you have to worry less about memory fences; (2) if the other fibers that use the same resources run on the same "runner" thread, you can "lock" shared data simply by preventing a (lightweight) context switch. The only problem is that they are hard to implement in C(++) (you need stack switching if you want to be reasonably fast), so not many compilers/interpreters implement them.
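To make the second guarantee concrete, here is a rough sketch that uses Python generators as stand-ins for fibers. This is an assumption for illustration only, not how Haskell's runtime works, and `run_round_robin` and `incrementer` are made-up names. All fibers run on one "runner", and a switch can only happen at a `yield`, so the read-modify-write between yields needs no lock.

```python
from collections import deque


def run_round_robin(fibers):
    """Run generator-based fibers on a single runner until all have finished."""
    ready = deque(fibers)
    while ready:
        fiber = ready.popleft()
        try:
            next(fiber)  # resume the fiber until its next yield
        except StopIteration:
            continue  # fiber finished; do not reschedule it
        ready.append(fiber)


def incrementer(counter, times):
    """Increment a shared counter, switching only at the explicit yield."""
    for _ in range(times):
        # No other fiber can run between this read and write,
        # because there is no yield (switch point) in between.
        value = counter["n"]
        counter["n"] = value + 1
        yield  # the only place a context switch may happen
```

Running three such fibers with 100 increments each over a shared `{"n": 0}` leaves the counter at 300, with no locking anywhere.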
I see this sentiment everywhere. Processes don't solve every problem, nor do threads, and nor do fibers. They're each useful in different places.
Despite the GIL, threads are still useful in Python for handling asynchronous operations. One or more threads pull items off a work queue, process them, and then put the results somewhere else. The GIL is a problem only when operations are CPU-bound, but Python is pretty damn slow at that anyway. Alternatively, you can sidestep the GIL by creating a thread in a C extension that does the heavy lifting and then calls back into Python with the result.
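The work-queue pattern described above looks roughly like this with the standard `queue` and `threading` modules. It's a minimal sketch: `run_workers` and its parameters are hypothetical names, and it does no error handling if `handler` raises.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def run_workers(items, handler, num_workers=4):
    """Process items on a pool of threads; return (item, result) pairs."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is _STOP:
                break
            results.put((item, handler(item)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so draining the queue here is safe.
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Results come back in completion order, not submission order, so sort them or carry an index if order matters. This pays off when `handler` mostly waits on I/O, since blocking I/O calls release the GIL.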
Fork doesn't work for much in Python; for example, if you want to create a bunch of data and then fork a bunch of processes to calculate stuff using the data (say the data is a bunch of word count indexes or something), you can't even read the data from the Python processes without triggering copy-on-write. This is because all these little reference counts are scattered throughout the data and they get changed just by reading.
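You can watch this happen in CPython with `sys.getrefcount`. A small sketch, assuming CPython, where the reference count lives in the object's own header; `refcount_delta` is a made-up helper name. Merely binding another name to an object writes to the object's memory, which is what dirties the page after a fork.

```python
import sys


def refcount_delta(obj):
    """Return how much obj's reference count changes when it is merely bound to a name."""
    before = sys.getrefcount(obj)
    alias = obj  # "just reading": no data is modified, only a new reference taken
    after = sys.getrefcount(obj)
    del alias
    return after - before
```

Try it on a freshly created object such as a list. Immortal objects in newer CPython versions (`None`, small integers, and so on) report a fixed count and won't show the effect.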
I find it strange that more attention hasn't been paid to the more obvious path: multiple interpreters in one process. You can safely run a separate Python interpreter in each thread of a process. That has more overhead than threading but less than multiprocessing, and it takes care of both the Windows forking problem and the general Unix "housekeeping" problem.