
Wednesday, March 10, 2010

Segments in a.out

Ever wondered what the hell actually gets written into the a.out file when you compile C code? Well, let's dig into the object file and see what it contains.

Whenever you write C code, it basically contains the code (functions etc.) which you want to execute and the data (global, local etc.) on which those functions operate. So what the compiler (and linker) does (in reality it does a hell of a lot more, but just to keep things simple) is take the code and the data and put them into segments.

Various segments in the object file a.out:

Note: These segments are not the ones used by the hardware; they are simply parts of an object file.

Data segment : contains the global and static variables which are initialized.
BSS segment : contains the global and static variables which are not initialized.
Text segment : contains the code (the compiled program instructions).

The main point of dividing the binary file into these segments is that the loader just needs to take these segments and map them into memory.

BSS: This segment in the object file actually does not contain anything except the size of the BSS segment needed. The variables are not initialized yet, so there is no need to store an image of them in the file. This size value is later used by the loader to reserve that much memory for the segment.

How segments are laid out in memory:
The segments conveniently map into objects that the runtime linker can load directly! The loader just takes each segment image in the file and puts it directly into memory. The segments essentially become memory areas of an executing program, each with a dedicated purpose.

The text segment contains the program instructions. The loader copies that directly from the file into memory (typically with the mmap() system call), and need never worry about it again, as program text typically never changes in value or size. Some operating systems and linkers can even assign appropriate permissions to the different sections in segments: for example, text can be made read-and-execute-only, some data can be made read-write-no-execute, other data made read-only, and so on.

The data segment contains the initialized global and static variables, complete with their assigned values.

The size of the BSS segment is then obtained from the executable, and the loader obtains a block of this size, putting it right after the data segment. This block is zeroed out as it is put in the program's address space. The entire stretch of data and BSS is usually just referred to jointly as the data segment at this point. This is because a segment, in OS memory management terms, is simply a range of consecutive virtual addresses, so adjacent segments are coalesced. The data segment is typically the largest segment in any process.

We still need some memory space for local variables, temporaries, parameter passing in function calls, and the like. A stack segment is allocated for this. We also need heap space for dynamically allocated memory. This will be set up on demand, as soon as the first call to malloc() is made.

Note that the lowest part of the virtual address space is unmapped; that is, it is within the address space of the process, but has not been assigned to a physical address, so any references to it will be illegal. This is typically a few Kbytes of memory from address zero up. It catches references through null pointers, and pointers that have small integer values.
