From: Venkatesh Srinivas
Subject: GSoC Segments: What have I been doing, anyway?
Date: Mon, 16 Aug 2010 10:29:54 -0400

Hi 9fans,

So GSoC is more or less over!

First, I really need to thank David Eckhardt and Erik Quanstrom for putting up with me this summer; dealing with me can be as frustrating as pulling teeth with a screwdriver when the patient only speaks another language. Next time I see either or both of them, I owe them beer/non-alcoholic beverage or pizza [1]. Also, thanks to Devon for everything, including waiting over an hour for me downtown.

I have been working on the segment code and making forays into libthread. Let me first talk about what exactly I did, without talking about the motivations.

In Plan 9 fresh out of the box, a process's address space is constructed from a series of segments: contiguous ranges of address space backed by the same object. By default, a process has a small number of segments: a Text segment, backed by the image; a Stack segment, backed by anonymous memory; a Data segment to back the heap; and a BSS segment for the usual purpose. Each process also has a small set of slots, currently four, for other segments, obtained via the segattach() system call and released via the segdetach() system call.

When a process calls rfork(RFPROC), segments from the "shared" class are shared across the fork and "memory" class segments are copy-on-write across the fork; each process gets its own stack. When a process calls rfork(RFMEM | RFPROC), all segments except the Stack segment are maintained across the fork. When a process calls exec(), segments marked SG_CEXEC are detached; the rest are inherited across the exec(). The Stack segment can never be inherited. Across an rfork(RFMEM | RFPROC), new segattach()es and segdetach()es are not visible - in Ron Minnich's terminology, we have shared memory, but not shared address spaces.

First, I modified the segment slot structures to lift the limit of four user segments. I made the segment array dynamic, resized in segattach(). The first few elements of the array are as in the current system: the special Text, Data, BSS, and Stack segments. The rest of the segment array is address-ordered and searched via binary search. The user/system interface doesn't change, except that the limit on segment attaches now comes from the kernel memory allocator rather than from a fixed per-process limit.

I further changed segattach() to add more flags:

SG_NONE: A segment with the SG_NONE flag set has no backing store. Any access, read or write, causes a fault. This flag is useful for placing red zones at user-desired addresses. It is an error to combine the SG_NONE and SG_COMMIT flags.

SG_COMMIT: A segment with the SG_COMMIT flag set is fully pre-faulted and its pages are not considered by the swapper. An SG_COMMIT segment is maintained at commit status across an exec() and rfork(RFMEM | RFPROC). If we are unable to pre-fault all of the pages of the segment in segattach(), we cancel the attach. It is an error to combine the SG_COMMIT flag with SG_NONE.

SG_SAS: A segment attached with the SG_SAS flag appears in the address space of all processes related to the current one by rfork(RFPROC | RFMEM). An SG_SAS segment will not overlap a segment in any process related via rfork(RFMEM | RFPROC).
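To make the new flags concrete, here is a minimal usage sketch. It is not taken from the prototype sources; the length is arbitrary, I let the kernel pick the address, and the flags of course exist only in the prototype kernel:

#include <u.h>
#include <libc.h>

void
main(void)
{
	void *v;

	/* A megabyte of pre-faulted, unswappable anonymous memory,
	 * visible to every proc sharing this address space via
	 * rfork(RFMEM|RFPROC); va=0 lets the kernel pick the address. */
	v = segattach(SG_COMMIT|SG_SAS, "memory", 0, 1024*1024);
	if(v == (void*)-1)
		sysfatal("segattach: %r");

	/* ... use v ... */

	if(segdetach(v) < 0)
		sysfatal("segdetach: %r");
	exits(nil);
}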
Finally, I changed libthread. Currently, libthread allocates thread stacks via malloc()/free(). I converted libthread to allocate thread stacks via segattach() - each thread stack consists of three segments, an anonymous segment flanked by two SG_NONE red zones (a rough sketch of this layout appears at the end of this mail).

I have posted a prototype (very generously called a 'prototype') implementation of the above interface to sources; the prototype kernel omits a number of the checks claimed above. SG_SAS faults are not handled, so SG_SAS segments must be SG_COMMIT. SG_COMMIT has no limit, which makes it very easy to crash a system by draining the page queue. The prototype libthread is of considerably higher quality, I think, and would be usable as a production-grade implementation of these interfaces. The prototype kernel is usable, though - I have run it alone on my terminal for approximately a month.

More importantly, the prototype kernel shows that the interface can be implemented efficiently - even when using three segattach()es per thread stack, creating 1024 threads took 2.25s real time on a 400MHz AMD K6, versus 0.87s real time with the original libthread and 9 kernel. Creating processes with thousands of segments is not incredibly speedy, but it is workable, and there is a lot of low-hanging fruit that could improve performance.

The SG_SAS work is fairly unusual for Plan 9 - each process originally had a single, fixed-size segment slot array. Now a process has a per-process segment array and a second, shared segment array. The shared array is referenced by all processes created by rfork(RFMEM | RFPROC); the shares are unlinked on exec() or rfork(RFPROC).

The SG_SAS logic was added to match the current semantics of thread stacks - since they are allocated by malloc() and free() from the Data segment, they are visible across rfork(RFMEM | RFPROC). This is as expected - a thread can pass a pointer to a stacked buffer to an ioproc(), for example. To allow standalone segments to be used the same way, they needed to appear across rfork(). This interface would also support a libc memory allocator that uses standalone segments, rather than constraining it to use sbrk() or pre-allocated segments. That was my original motivation for this project, though it is a problem I did not get a chance to address.

Any thoughts or discussion on the interface would rock.

Thanks,
-- vs

[1] http://undeadly.org/cgi?action=article&sid=20100808121724
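(The stack-layout sketch referred to above. This is not the actual libthread code - the helper name and assumed page size are made up, and the flag combination is a guess based on the prototype's requirement that SG_SAS segments be SG_COMMIT.)

#include <u.h>
#include <libc.h>

enum { Redzone = 4096 };	/* assumed page size */

/* Attach a thread stack at 'base' (page-aligned, chosen by the caller):
 * an anonymous shared segment flanked by two SG_NONE red-zone pages.
 * Returns the usable stack bottom, or (void*)-1 on failure. */
void*
stackattach(uchar *base, ulong stacksize)
{
	if(segattach(SG_NONE, "memory", base, Redzone) == (void*)-1)
		return (void*)-1;
	if(segattach(SG_SAS|SG_COMMIT, "memory", base+Redzone, stacksize) == (void*)-1)
		return (void*)-1;
	if(segattach(SG_NONE, "memory", base+Redzone+stacksize, Redzone) == (void*)-1)
		return (void*)-1;
	return base+Redzone;
}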