
I think processes could replace threads in most cases, but common OSs are hampering us.

Fork is slow on Windows. On unix, having lots of processes crowds ps (which creates a disincentive), and if you want to effectively manage a tree of processes you have to do fiddly work managing a process group and (if you want to be fast) wrapping your head and software around shared memory IPC.

I think that if support for multiple processes in mainstream OSs was more effective than it is, we'd both spend less time worrying about threads and write more stable software.



Two answers really:

* sharing memory - sometimes you have lots of immutable data, like modules, graphs, whatnot. Yes, there is copy-on-write, and no, it doesn't work well on any Python implementation out there. Also, sometimes the data is mostly immutable, but not quite.

* serializing lots of data is a mess and even if feasible is usually a big performance hit if you want to exchange actual objects.
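To make the second point concrete, here is a minimal sketch of what "exchanging actual objects" between processes involves. The `graph` dictionary is a made-up stand-in for the kind of mostly-immutable data mentioned above, and `pickle` is just one common serializer, not necessarily what any given IPC layer uses.

```python
import pickle

# Hypothetical stand-in for "lots of immutable data": a small adjacency list.
graph = {node: [(node + 1) % 1000, (node + 7) % 1000] for node in range(1000)}

# To hand the object to another process it first has to become bytes...
payload = pickle.dumps(graph, protocol=pickle.HIGHEST_PROTOCOL)

# ...and the receiving side rebuilds a complete, independent copy.
received = pickle.loads(payload)

print(len(payload), "bytes serialized")
print("equal contents:", received == graph)
print("same object:", received is graph)
```

Both the serialized bytes and the rebuilt copy cost time and memory proportional to the data, which is the performance hit being described.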


With Plan 9's rfork() (and Linux's less nice clone()), you can create new processes that share the memory of the old process, which addresses those two answers. Though I'm not sure if this reinforces OP's point or just indicates that the distinction between threads and processes can be fuzzy.


What are the uses of a process that shares address space with its parent?


I wondered, too, so I had to look this up. http://cm.bell-labs.com/magic/man2html/2/fork shows that it is not the whole address space:

RFMEM If set, the child and the parent will share data and bss segments. Otherwise, the child inherits a copy of those segments. Other segment types, in particular stack segments, will be unaffected. May be set only with RFPROC.

So, it basically is a way to start a process that shares all its globals (including static variables, I think) with another process, but not other memory. That is more secure than having threads, but also more restricted, as one cannot share heap-allocated structures between such processes. I guess this feature gets used most in Fortran code where nothing gets allocated dynamically.

It also makes it easier to selectively kill a thread of execution from the command line, but I do not see when that might be useful.


I think you're wrong; you're reasoning in the wrong direction. I think that green threads / fibers / lightweight threads are going to replace threads. Haskell uses them, for example. They are much more lightweight, and you can get additional correctness guarantees from them: (1) if you bind the "runner" threads to specific CPUs, you have to worry less about memory fences; (2) if the other fibers that use the same resources run on the same "runner" thread, you can "lock" shared data simply by preventing a (lightweight) context switch. The only problem is that they are hard to implement in C(++) (you need stack switching if you want to be reasonably fast), so not many compilers/interpreters implement them.


I see this sentiment everywhere. Processes don't solve every problem, nor do threads, and nor do fibers. They're each useful in different places.

Despite the GIL, threads are still useful in Python for handling asynchronous operations. One or more threads pull items off a work queue, process them, and then put the results somewhere else. The GIL is a problem only when operations are CPU-bound, but Python is pretty damn slow at that anyway. Alternatively, you can sidestep the GIL by creating a thread in a C extension that does the heavy-lifting, then calls back into Python with the result.
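The work-queue pattern described above can be sketched roughly as follows. The names `worker` and `run`, the `None` sentinel, and the squaring step (a placeholder for real I/O-bound work) are all illustrative assumptions, not anything from a specific codebase.

```python
import queue
import threading


def worker(tasks, results):
    """Pull items off the task queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        # Placeholder for a real blocking operation (network call, disk read).
        results.put((item, item * item))


def run(items, num_workers=4):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:  # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    # All producers have exited, so draining until empty is safe here.
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


squares = run(range(10))
print(squares)
```

With blocking I/O in place of the arithmetic, the GIL is released while each thread waits, which is why this pattern stays useful in CPython.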


Fork doesn't work for much in Python. For example, if you want to create a bunch of data and then fork a bunch of processes to calculate stuff using the data (say the data is a bunch of word count indexes or something), you can't even read the data from the Python processes without triggering copy-on-write. This is because all these little reference counts are scattered throughout the data and they get changed just by reading.
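A small, CPython-specific sketch of why reads turn into writes: the reference count is stored inside the object itself, and merely taking another reference to the object updates it. `sys.getrefcount` is a CPython facility (the absolute numbers it reports are an implementation detail); the variable names are illustrative.

```python
import sys

# A fresh, ordinary object standing in for one entry in a large data set.
entry = object()

before = sys.getrefcount(entry)

# "Reading" the entry, e.g. binding it to a local name or pulling it out of
# a container, takes a new reference, which writes to the object's header.
reader = entry

after = sys.getrefcount(entry)

print("refcount before:", before)
print("refcount after: ", after)
print("changed by:", after - before)
```

After a fork, that write lands on a memory page shared with the parent, so the kernel has to copy the page even though the program never modified the data logically.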


This doesn't quite apply to PyPy (which doesn't have refcounting), but the point stands. In PyPy's case, the culprits are the GC flags and the moving GC.


I find it strange that more attention hasn't been paid to the more obvious path: Multiple interpreters in one process. You can safely run a separate Python interpreter in each thread of a process. More overhead than threading, but less than multiprocessing, and takes care of both the Windows forking problem and the general unix "housekeeping" problem.



