Inspirational Quotes and Speeches

Monday, February 28, 2011

Introduction

Chapter 1: Introduction

1.1 What is the kernel?

The kernel is the core of an operating system. The operating system receives requests from the user and processes them on the user's behalf. Requests are received through a command shell or some other kind of user interface and are processed by the kernel. So the kernel acts as the engine of the operating system that enables a user to use a computer system. The shell is the outer part of the operating system that provides an interface through which the user communicates with the kernel.

Fig. 1.1 Layout of an Operating System

1.2 Kernel Components

The major components of a kernel are:

  • Low-Level Drivers: Architecture-specific drivers responsible for initializing the CPU, the MMU and on-board devices.
  • Process Scheduler: The scheduler is responsible for fairly allocating CPU time slices to different processes.
  • Memory Manager: The memory management system is responsible for allocating memory to different processes and sharing it among them.
  • File System: Linux supports many file system types, e.g. FAT, NTFS, JFFS and many more. The user doesn't have to worry about the complexities of the underlying file system type, because Linux provides a single interface, called the virtual file system (VFS). Using this single VFS interface, users can use the services of different underlying file systems, and the complexities of the individual file systems are hidden from them.
  • Network Interface: This component of the Linux kernel provides access to, and control of, different networking devices.
  • Device Drivers: These are the high-level drivers for devices, as opposed to the architecture-specific low-level drivers above.
  • IPC: The Inter-Process Communication subsystem allows different processes to share data among themselves.

Fig. 1.2 Kernel components

1.3 Integration Design

As we saw, the kernel is made up of different components. The integration design describes how these components are combined to create the kernel's binary image.

There are mainly two integration designs used for operating system kernels: monolithic and micro. In a monolithic design, all the kernel components are built together into a single static binary image. At boot time, the entire kernel gets loaded and then runs as a single process in a single address space. All the kernel components and services exist in that static kernel image, and all of them are running and available all the time. Also, since everything inside the kernel resides in a single address space, no IPC mechanism is needed for communication between kernel services. For all these reasons, monolithic kernels offer high performance. Most Unix kernels are monolithic.

The downside of this design is that once the static kernel image is loaded, you can't add or remove any component or service from the kernel. Its memory footprint is also high, so resource consumption is higher in monolithic kernels.

The second kind of kernel is the microkernel. In a microkernel, a single static kernel image is not built; instead, the kernel is broken down into small, separate services. At boot time, the core kernel services are loaded and run in privileged mode. Whenever some other service is required, it has to be loaded before it can run. Unlike in a monolithic kernel, services are not up and running all the time; they run as and when requested. Also, unlike in monolithic kernels, services in microkernels run in separate address spaces, so communication between two services requires an IPC mechanism. For these reasons, microkernels do not perform as well as monolithic kernels, but they require fewer resources to run.

The Linux kernel takes the best of both designs. Fundamentally, it is a monolithic kernel: the entire kernel and all its services run as a single process, in a single address space, achieving very high performance. But it also has the capability to load and unload services at run time in the form of kernel modules.

1.4 User Mode and Kernel Mode

In a system, the Linux kernel runs in a special privileged mode compared to user applications. The kernel runs in a protected memory space and has access to all of the hardware. This memory space and this privileged state are collectively known as kernel space or kernel mode. In contrast, user applications run in user space and have limited access to resources and hardware. User-space applications cannot directly access kernel-space memory, but the kernel has access to the entire memory space.

1.5 Different Contexts of Kernel Code

The entire kernel code can be divided into three categories, based on the context in which it executes:

  • Process Context
  • Interrupt Context
  • Kernel Context

1.5.1 Process Context

User applications can't access kernel space directly, but there is an interface through which they can call functions defined in kernel space. This interface is known as the system call interface. A user application can request kernel services by making a system call.

read() and write() are examples of system calls. When a user application calls read() or write(), this in turn invokes sys_read() or sys_write() in kernel space. In this case, the kernel code executes at the request of the user-space application. Kernel code that executes at the request, or on behalf, of a user application is said to run in process context. All system calls fall into this category.

1.5.2 Interrupt Context

Whenever a device wants to communicate with the kernel, it sends an interrupt signal. The moment the kernel receives an interrupt request from the hardware, it starts executing a routine in response to that request. This routine is called an interrupt service routine or interrupt handler, and it is said to execute in interrupt context.

1.5.3 Kernel Context

Some code in the Linux kernel is invoked neither by a user application nor by an interrupt. This code is integral to the kernel and keeps running all the time; memory management, process management and I/O scheduler code all fall into this category. This code is said to execute in kernel context.

1.6 Linux Kernel Versioning

To find out the Linux kernel version, you can use the 'uname' command with the '-r' option. uname is a useful command, and you should go through all its options. For example, running the command on my machine gives:

# uname -r

2.6.18-1.798.fc6

In the command output, you can see a dotted decimal string, 2.6.18. This is the Linux kernel version. In this string, the first value (2) is the major release number, the second value (6) is the minor release number, and the third value (18) is the revision number. The major release combined with the minor release is called the kernel series. So, as per the above output, I am using the 2.6 kernel series on my machine.

There is something more to the minor release number. An odd minor release number denotes a development release, while an even minor release number denotes a stable release. Development releases change quickly and are meant for developers to experiment with; they are not recommended for production environments. Stable releases, on the other hand, are meant for production use. The only changes to a stable release are usually bug fixes and the addition of new drivers.

This holds for Linux kernels prior to the 2.6 series. From 2.6 onwards, all releases are considered stable. Our discussion is primarily based on the 2.6 kernel series, but we will highlight the important differences between the 2.6 and 2.4 series.

1.7 Linux Kernel Sources

To build the Linux kernel, you will need the latest, or any other stable, kernel sources. In our examples we have taken the sources of stable kernel release 2.6.33. Different versions of the Linux kernel sources can be found at http://www.kernel.org; get the latest or any stable release from there.

Let's assume you have downloaded the stable kernel release sources to your system. Kernel sources are compressed into a tarball, something like 'linux-2.6.33.tar.gz'. Put this tarball under the /usr/src directory. You may already have another directory named 'linux' under /usr/src. A word of caution: please do not touch the /usr/src/linux directory during our experiments. Just save the newly acquired kernel source tarball directly under /usr/src and untar it using the command:

# tar -xvzf linux-2.6.33.tar.gz

This command will create a directory named linux-2.6.33 and extract the entire kernel source into it. Whew!!! The first big job is done.

In case you do not have access to high-speed internet, you can also get these sources from a friend and copy them here.

1.8 Exploring Kernel Sources

Let's get introduced to this beast called the Linux kernel source. At this point, if you struggle to understand a few things, please don't panic. As the course progresses, we will discuss them in detail and you will be able to understand them completely. For now, just keep moving ahead.

Most of the kernel source is written in C. It is organized into various directories and subdirectories, each named after what it contains.

The directory structure of the kernel sources may look like the diagram below.

Fig. 1.3 Linux Kernel source tree

Here is a brief introduction to the directories that you will see in the Linux kernel sources.

1. arch/ : Linux can run on everything from handheld devices to huge servers. It supports Intel x86, Alpha, MIPS, ARM, SPARC and many other processor architectures. The arch/ directory contains a subdirectory for each processor architecture, and each subdirectory holds the architecture-dependent code. For example, the code for a PC is under arch/x86 (arch/i386 in kernels before 2.6.24, when the 32-bit and 64-bit x86 trees were merged), and the code for ARM processors is under arch/arm.

2. init/ : A boot loader such as LILO (LInux LOader) loads the kernel into memory, and control is then passed to an assembly routine, arch/x86/kernel/head_32.S or head_64.S on x86. This routine is responsible for low-level hardware initialization, and hence it is architecture specific. Once hardware initialization is done, control is passed to the start_kernel() routine defined in init/main.c. This routine is analogous to the main() function of a C program: it is the starting point of the generic kernel code. After the architecture-specific setup is done, kernel initialization starts, and this initialization code is kept under the init/ directory. It is responsible for proper kernel initialization, including initialization of page addresses, the scheduler, traps, IRQs, signals, timers, the console, etc. The code under this directory is also responsible for processing the boot-time command-line arguments.

3. crypto/ : This directory contains the source code of various cryptographic algorithms, e.g. the MD5 and SHA-1 hash functions, the Blowfish and Serpent ciphers, and many more. Most of these algorithms can be built as kernel modules, which can be loaded and unloaded at run time. We will talk about kernel modules in subsequent chapters.

4. Documentation/ : This directory contains documentation of the kernel sources.

5. drivers/ : Device driver code can be thought of as split into two parts. One part communicates with the user: it takes commands from the user, displays output to the user, etc. The other part communicates with the device: it controls the device, sends commands to and receives data from the device, etc. The part of the device driver that communicates with the user is hardware independent and resides under this drivers/ directory, which contains the source code of the various device drivers. Device drivers can be built as kernel modules. As a matter of fact, the majority of the Linux kernel code is device driver code, so much of our discussion will also revolve around device drivers.

This directory is further divided into subdirectories according to the kind of driver code they contain.

  • drivers/block/ - contains drivers for block devices, e.g. hard disks.
  • drivers/cdrom/ - contains drivers for proprietary CD-ROM drives.
  • drivers/char/ - contains drivers for character devices, e.g. terminals, serial ports, mice, etc.
  • drivers/isdn/ - contains ISDN drivers.
  • drivers/net/ - contains drivers for network cards.
  • drivers/pci/ - contains drivers for PCI bus access and control.
  • drivers/scsi/ - contains drivers for the SCSI interface.
  • drivers/ide/ - contains drivers for IDE devices.
  • drivers/sound/ - contains drivers for various sound cards.

The other part of a device driver, the one that communicates with the device, is hardware dependent, or more specifically bus dependent: it depends on the type of bus the device uses for communication. This bus-specific code resides under the arch/ directory.

6. fs/ : Linux supports many file systems, e.g. ext2, ext3, FAT, VFAT, NTFS, NFS, JFFS and many more. The source code for each supported file system is in its own subdirectory of this directory, e.g. fs/ext2, fs/ext3. Linux also provides a virtual file system (VFS) that acts as a wrapper around these different file systems. The VFS interface lets the user use different file systems under one single root ('/'). The code for the VFS also resides here. Data structures related to the VFS are defined in include/linux/fs.h; please take note, it is a very important header file for kernel development.

7. kernel/ : This is one of the most important directories in the kernel. It contains the generic code for kernel subsystems, i.e. code for system calls, timers, the scheduler, DMA, interrupt handling and signal handling. The architecture-specific kernel code is kept under arch/*/kernel.

8. include/ : Along with kernel/, the include/ directory is very important for kernel development. It contains the generic kernel headers, organized into many subdirectories (for example include/linux/). Architecture-specific header files are kept separately: under include/asm-<arch>/ in older kernels and under arch/<arch>/include/asm/ in newer ones.

9. ipc/ : Code for all three System V IPC mechanisms (semaphores, shared memory and message queues) resides here.

10. lib/ : The kernel's library code is kept under this directory. Architecture-specific library code resides under arch/*/lib.

11. mm/ : This too is a very important directory from a kernel development perspective. It contains the generic code for the memory management and virtual memory subsystem; again, the architecture-specific code is in arch/*/mm/. This part of the kernel is responsible for requesting and releasing memory, paging, page-fault handling, memory mapping, various caches, etc.

12. net/ : The code for the kernel's networking subsystem resides here. It includes code for various protocols such as TCP/IP, ARP, Ethernet, ATM and Bluetooth, as well as the socket implementation; quite an interesting directory to look into for networking geeks.

13. scripts/ : This directory contains the kernel build and configuration subsystem, i.e. the scripts and code used to configure and build the kernel.

14. security/ : This directory contains the kernel's security framework (Linux Security Modules) and security modules built on it, such as SELinux.

15. sound/ : This directory contains the code for the sound subsystem.

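16. Modules: Strictly speaking, there is no top-level module/ directory in the kernel sources. When the kernel is compiled, a lot of code is compiled as modules, which can be added to the running kernel later. Running "make modules_install" installs them under /lib/modules/<kernel version>/, which will not exist for a newly built kernel until its modules have been installed at least once.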

Apart from these important directories, there are also a few files at the root of the kernel sources.

1. COPYING - Copyright and licensing information (GNU GPL v2).

2. CREDITS - A partial list of people who have contributed to the Linux project.

3. MAINTAINERS - A list of the maintainers of kernel subsystems and drivers. It also describes how to submit kernel changes.

4. Makefile - The kernel's main, or top-level, makefile.

5. README - The release notes for the Linux kernel. It explains how to install and patch the kernel, and what to do if something goes wrong.

1.9 Documentation

You can use the make documentation targets to generate the Linux kernel documentation. With these targets, the documents can be built in formats such as PDF, HTML, man pages and PostScript.

To generate the kernel documentation, run any of the following commands from the root of your kernel sources:

# make pdfdocs

# make htmldocs

# make mandocs

# make psdocs

1.10 Source Browsing

Browsing the source code of a large project like the Linux kernel can be very tedious and time consuming. Unix systems offer two tools, ctags and cscope, for browsing the codebase of large projects, and source browsing becomes very convenient with them. The Linux kernel has built-in support for both through its "make cscope" and "make tags" targets.

Using cscope, you can:

  • Find all references to a symbol
  • Find a function's definition
  • Find the caller graph of a function
  • Find a particular text string
  • Change a particular text string
  • Find a particular file
  • Find all the files that include a particular file

There are a few good tutorials available on how to use cscope:

    1. Using Cscope on large projects (example: the Linux kernel)

    2. Using Cscope with Vim

Please refer to these tutorials to learn how to use cscope for browsing source code.