Saturday, August 6, 2011

malloc

Today I was asked to comment on a lecture that will be given to second year students in October. This lecture is part of the UNIX System Programming course and is a support for one of the most interesting project of the semester: make your own memory allocator. By implementing the standard C/C++ routines malloc(), free(), calloc() and realloc(), students will be familiar with the system calls brk()/sbrk() and will have a much deeper understanding of a process execution environment.

The lecture first introduce how memory is managed by the Linux kernel, then speaks about the different memory sections of a process execution environment. Finally, students are invited to follow some demonstrations, meant to explain how brk() and sbrk() will have to be used in the project. Here’s a little summary:

There is basically 3 main sections mapped in memory: the code segment (or text segment), the data segment and the stack.

The code segment holds the executable instructions interpreted by the processor. It is read-only and has a fixed-size.

The stack segment stores information about the current state of the process. I’ll probably need a hole other bunch of lines to explain how it is used so I’ll try to keep in simple. The stack is used every time a subroutine is called, storing data on top before the routine call (push) and restoring it on return (pop).

The data segment has read and write permissions and is divided in 3 parts:

  • The data part which holds the initialised global and static variables.
  • The BSS which holds the uninitialised global and static variables (it is filled with zeros before execution).
  • The heap which is the section managed by the memory allocator. It can grow or shrink during run-time via the system calls brk() and sbrk() (mmap() can also be used, but let’s leave that apart).


So, as you can guess, every time you call malloc(), the memory allocator will find in the heap a suitable place for you and will return a pointer to that specific spot. If there is none, it will extend the heap.

A good memory allocator has to:

  • be fast: of course! Just count how many time you call malloc() in one of your programs.
  • manage memory very efficiently: free blocks can be used again to avoid endless system calls, adjacent free blocks can bee merged to avoid fragmentation, …
  • check for programming mistakes: release an already freed memory is an error, programmer can give an invalid pointer, …

You can see why writing your own memory allocator can be a good idea. There’s no need for a complex one at first. You can start easy with a simple call to sbrk() every time and then improve it with a better data structure for storing your pointers to free blocks. Ore maybe make sure there’s no race conditions on a multi-threaded application?
And there’s also that feeling when xeyes and Bash runs with your malloc().

No comments:

Post a Comment