Processes
Processes are perhaps the most important abstraction an operating system provides, since each process represents a running program on the system.
Processes Visualized
The visualization below shows a snapshot of the process hierarchy on the CS 134 Linux server we use for this class. Each node in the visualization represents a process, and the edges between nodes represent parent-child relationships between processes. Nodes are color coded according to the user that owns the process.
The visualization is interactive:
- Hover your mouse over a node to see more information about the process.
- Click on empty space and drag to pan the visualization.
- Use the mouse wheel (or scrolling gesture) to zoom in and out.
- Click and drag a node to move it around to improve the layout.
- Click the checkbox at the bottom to show or hide kernel threads.
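The parent-child edges in the visualization can be rebuilt from nothing more than each process's PID and its parent's PID. Here is a minimal Python sketch of that idea. The snapshot is made up for illustration: the PIDs and the sshd and bash entries are hypothetical, not taken from the server, and the function names are our own.

```python
from collections import defaultdict

def build_tree(pairs):
    """Map each parent PID to a sorted list of its children's PIDs."""
    children = defaultdict(list)
    for pid, ppid in pairs:
        children[ppid].append(pid)
    return {parent: sorted(kids) for parent, kids in children.items()}

def render(children, names, pid, depth=0):
    """Return the subtree rooted at pid as indented lines of text."""
    lines = ["  " * depth + f"{pid} {names[pid]}"]
    for child in children.get(pid, []):
        lines.extend(render(children, names, child, depth + 1))
    return lines

# Hypothetical snapshot of (pid, ppid, name) triples.
SNAPSHOT = [
    (1, 0, "systemd"),
    (612, 1, "cron"),
    (700, 1, "sshd"),
    (1501, 700, "sshd"),
    (1502, 1501, "bash"),
]

names = {pid: name for pid, _, name in SNAPSHOT}
tree = build_tree((pid, ppid) for pid, ppid, _ in SNAPSHOT)
print("\n".join(render(tree, names, 1)))
```

Each node only records who its parent is; the tree shape falls out of grouping processes by parent PID.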
Processes on the CS 134 Linux Server
So the root process is systemd, right? And it is PID 1, so that's the first process started by the Linux kernel.
That's right.
OMG! There are so many processes on the system! Some of them are running programs that I recognize, like ssh and man, but there are so many I have no idea about. Like, what is wireplumber? Or gsd-housekeepin, or cron?
And what's with all the kernel threads? Why are there so many of them?
The important point here is less about the specifics of all these programs and more about giving you the overall sense that on a modern machine, even when it seems to be doing very little, there are actually myriad processes running.
Okay, sure, but still, what about all those kernel threads? Why are there so many of them?
Deeper Dive: Kernel Threads
Kernel threads are a special type of process that runs in kernel space. They are used by the kernel to perform tasks that require kernel-level privileges, such as managing hardware devices, handling interrupts, and scheduling other processes. Kernel threads are created by the kernel itself and are not associated with any particular user program.
In many cases (for example, managing the migration of processes between CPUs) the Linux kernel uses one kernel thread per CPU to handle the task. As the number of available cores in a machine rises, this design choice does lead to a proliferation of kernel threads, since the total is (number of distinct kernel tasks) × (number of cores).
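To make the multiplication concrete, here is a small Python sketch that counts per-CPU kernel threads by task. It assumes the common Linux naming pattern in which a per-CPU thread is called task/N, with N the CPU number (for example migration/0). The sample list is invented for a hypothetical 4-core machine, and the function name is our own.

```python
from collections import Counter

def per_cpu_tasks(thread_names):
    """Count threads per task for names of the form 'task/N' (N = CPU number)."""
    counts = Counter()
    for name in thread_names:
        task, sep, cpu = name.partition("/")
        if sep and cpu.isdigit():
            counts[task] += 1
    return dict(counts)

# Hypothetical thread names as they might appear on a 4-core machine.
sample = [
    f"{task}/{cpu}"
    for task in ("migration", "ksoftirqd", "cpuhp")
    for cpu in range(4)
]
sample.append("kthreadd")  # not a per-CPU thread, so it is ignored

counts = per_cpu_tasks(sample)
print(counts)
print(sum(counts.values()))  # 3 tasks x 4 cores
```

Doubling the core count in this sketch doubles the number of threads without adding any new kind of work.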
But today we're thinking more about user-level processes, so we'll leave the kernel threads aside for now.
If you're curious about the processes on a Mac, click the button below and it'll change the visualization above to show the process hierarchy on a Mac (don't worry about the question!). There are plenty of processes on a Mac, too, and the graph is even harder to read because so many of them are children of the launchd process (PID 1).
With all those processes running 'round and 'round, it's a wonder the computer isn't totally overwhelmed!
Well, here's the thing: most of these processes actually aren't running. They're just sitting there, waiting for something to happen.
This is probably a good time to talk about the lifecycle of a process.
Process Lifecycle
A process goes through several states during its lifetime. The diagram below shows the lifecycle of a process, with the transitions between states labeled with the system calls that cause them.
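One way to think about a lifecycle like this is as a small state machine. The Python sketch below assumes the usual textbook states and transitions. The state and event names are illustrative and may not match the labels in the diagram.

```python
from enum import Enum

class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    SLEEPING = "sleeping"
    ZOMBIE = "zombie"

# Allowed transitions, keyed by (current state, event).
# The event names are illustrative, not real system call names.
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,
    (State.READY, "dispatch"): State.RUNNING,
    (State.RUNNING, "preempt"): State.READY,
    (State.RUNNING, "block"): State.SLEEPING,
    (State.SLEEPING, "wakeup"): State.READY,
    (State.RUNNING, "exit"): State.ZOMBIE,
}

def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None

state = State.NEW
for event in ["admit", "dispatch", "block", "wakeup", "dispatch", "exit"]:
    state = step(state, event)
    print(f"{event:>8} -> {state.name}")
```

Note that a sleeping process cannot go straight back to running in this model: it first becomes ready and then waits for the scheduler to pick it.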
In OS/161, these correspond to the states of a thread, right?
Yes. In OS/161 we had the states S_RUN, S_READY, S_SLEEP, and S_ZOMBIE. There's no S_NEW because there is no delay between creating a thread and it being able to run. In some other systems, there might be some kind of load balancing or scheduling delay between creating a thread and it being able to run, which is why they have a NEW state.
Hay! But why is it a thread in OS/161 and a process here?
We'll get to that in a bit when we talk about threads.
Processes Are An Abstraction
When you look at something like the web browser you're using to view this page, or a program you've written for an assignment, it's natural to feel like processes are real things. But they're actually an abstraction, something the operating system “makes up”. It's little different from other coding tasks, like writing a binary-search tree, where the “nodes” of the tree are an abstraction that you've created to help you solve a problem. And in the same way you can choose what information to store in each node of a binary-search tree, when we design an operating system, we get to decide what it means to be a process, and what attributes a process has.
So, like, a process is just a figment of the operating system's imagination?
Yes; and yet from the perspective of a program running as a process, it feels real.
Isn't everything like that? I mean, like, this course or just this lesson. It feels like a real thing, but it's also something that Prof. Melissa made up.
Meh. We're not real. Got it. Now I really don't need to worry about passing the class.
I'm having an existential crisis now.
Process Attributes
Memory Space
The most obvious thing a running program needs is memory. The operating system provides each process with its own address space, which is the memory that the process can access. The address space needs room for
- Program code: The instructions that the process is executing.
- Program data: The variables and data structures that the process uses, including the heap, where dynamically allocated memory is stored.
- Program stack: The stack that the process uses for function calls and local variables.
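On Linux, the layout of a process's address space can be inspected through the file /proc/<pid>/maps. The Python sketch below parses text in that format. The sample text is made up (the addresses and the /usr/bin/example path are hypothetical), and the function names are our own.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, label) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        label = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], label))
    return regions

def summarize(regions):
    """Print one line per region: label, permissions, and size."""
    for start, end, perms, label in regions:
        print(f"{label or '(anonymous)':<20} {perms}  {(end - start) // 1024:>6} KiB")

# Hypothetical sample: code, data, heap, and stack regions.
SAMPLE = """\
00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/example
00651000-00652000 rw-p 00051000 08:02 173521      /usr/bin/example
00e03000-00e24000 rw-p 00000000 00:00 0           [heap]
7ffd5c0de000-7ffd5c0ff000 rw-p 00000000 00:00 0   [stack]
"""

summarize(parse_maps(SAMPLE))
```

In the sample, the code region is readable and executable but not writable, while the data, heap, and stack regions are writable but not executable. On a Linux machine, passing the contents of /proc/self/maps to parse_maps should show the real regions of the Python process itself.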
Execution State
A running program isn't just a collection of data in memory. It also has an execution state, which represents the current state of the CPU executing the program. The execution state includes
- Registers: The CPU register state of the process, including
  - The program counter, which points to the next instruction to execute.
  - The stack pointer, which points to the top of the stack.
I/O State
A program often interacts with the outside world through input and output operations. The operating system keeps track of the I/O state of each process, including
- File descriptors: The files that the process has open.
- Working directory: The directory that the process is currently working in.
- Root directory: The root directory of the process.
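A process can observe some of its own I/O state through ordinary system calls. This Python sketch, assuming a Unix-like system, opens a file to obtain a descriptor and changes the working directory. The file name notes.txt is arbitrary. Changing the root directory is not shown because it normally requires administrator privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    scratch_real = os.path.realpath(scratch)
    original_cwd = os.getcwd()

    # Opening a file gives the process a new file descriptor: a small
    # integer that identifies the open file within this process.
    path = os.path.join(scratch, "notes.txt")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.write(fd, b"hello\n")
    print("new descriptor:", fd)

    # The working directory is per-process state too: relative paths
    # are resolved against it.
    os.chdir(scratch)
    found_by_relative_path = os.path.exists("notes.txt")
    cwd_after_chdir = os.path.realpath(os.getcwd())
    print("working directory:", cwd_after_chdir)

    # Clean up: release the descriptor and restore the old directory.
    os.close(fd)
    os.chdir(original_cwd)
```

The same relative name, notes.txt, would refer to a different file (or none) if the working directory were different, which is why the OS has to remember it for each process.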
Scheduling Information
The operating system also keeps track of scheduling information for each process, including
- Process state: The current state of the process (e.g., running, sleeping, waiting).
- Priority: The priority of the process in the scheduler.
- Class: The scheduling class of the process (e.g., real-time, time-sharing).
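On Linux, much of this scheduling information is visible from user space. The Python sketch below is one way to look at it, assuming a Unix-like system: the state comes from the third field of a /proc/<pid>/stat line, the priority is read as the nice value, and the class is read with os.sched_getscheduler where that call is available. The helper names and the sample stat line are hypothetical.

```python
import os

def state_from_stat(stat_line):
    """Extract the one-letter state code from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split after the last ')' rather than on whitespace alone.
    return stat_line.rsplit(")", 1)[1].split()[0]

def scheduling_info(pid=0):
    """Return the nice value and, where supported, the policy (pid 0 = caller)."""
    info = {"nice": os.getpriority(os.PRIO_PROCESS, pid)}
    if hasattr(os, "sched_getscheduler"):  # not available on every platform
        policies = {
            getattr(os, name): name
            for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                         "SCHED_BATCH", "SCHED_IDLE")
            if hasattr(os, name)
        }
        policy = os.sched_getscheduler(pid)
        info["class"] = policies.get(policy, f"policy {policy}")
    return info

# Hypothetical stat line for a sleeping process whose name contains a space.
print(state_from_stat("4321 (my prog) S 1 4321 4321 0 -1"))
print(scheduling_info())
```

In the stat file, R means running or runnable, S means sleeping, and Z means zombie.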
Process Information
There are also various pieces of information about each process, including
- Process ID (PID): A unique identifier for the process.
- Parent Process ID (PPID): The PID of the process's parent.
- Start time: The time when the process was started.
- CPU time: The amount of CPU time used by the process.
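A process can ask the OS for several of these values about itself. The Python sketch below, which assumes a Unix-like system with fork, creates a child process and has it report the PPID it sees, then prints the CPU time used so far. The function name is our own.

```python
import os

def child_view_of_parent():
    """Fork a child and return (our PID, the PPID reported by the child)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send our parent's PID through the pipe, then exit at once.
        try:
            os.close(read_end)
            os.write(write_end, str(os.getppid()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's report, then wait for the child to finish.
    os.close(write_end)
    reported = int(os.read(read_end, 32).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return os.getpid(), reported

me, seen_by_child = child_view_of_parent()
print("my PID:", me, "PPID seen by child:", seen_by_child)
print("my own parent:", os.getppid())

cpu = os.times()
print(f"CPU time so far: user={cpu.user:.2f}s system={cpu.system:.2f}s")
```

The child's PPID matches the parent's PID, which is exactly the relationship drawn as an edge in a process hierarchy.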
Event Notifications
The operating system tracks event notifications for each process, including
- Signals waiting: The signals that have been sent to the process but not yet delivered (pending signals).
- Signal mask: The signals that the process has blocked.
- Time of next alarm: The time when the process's next alarm is scheduled to go off.
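All three pieces of state can be observed from inside a process. In this Python sketch, which assumes a Unix-like system, a signal is blocked with the signal mask, sent, found waiting, and then accepted without ever being delivered to a handler. It also arms and cancels an alarm. The function names and the choice of SIGUSR1 and 60 seconds are arbitrary.

```python
import signal
import threading

def demo_pending_signal(sig=signal.SIGUSR1):
    """Block sig, send it to ourselves, and watch it wait."""
    # Add sig to this thread's signal mask; the call returns the old mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        # Because the signal is blocked, sending it leaves it waiting.
        signal.pthread_kill(threading.get_ident(), sig)
        was_pending = sig in signal.sigpending()
        # Accept the waiting signal synchronously, so no handler ever runs.
        received = signal.sigwait({sig})
        still_pending = sig in signal.sigpending()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending, received, still_pending

def demo_alarm(seconds=60):
    """Arm the alarm timer, then cancel it and report the time that was left."""
    previous = signal.alarm(seconds)  # seconds left on any earlier alarm
    remaining = signal.alarm(0)       # 0 cancels and returns what was left
    if previous:
        signal.alarm(previous)        # restore an alarm that was already set
    return remaining

print(demo_pending_signal())
print(demo_alarm())
```

The OS has to store the waiting signal somewhere between the moment it is sent and the moment the process accepts it, which is why this information is part of each process's record.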
Security Information
Finally, the operating system keeps track of security and authentication information for each process, including
- User ID (UID): The user that owns the process.
- Group ID (GID): The group that owns the process.
- Access rights: The permissions that the process has.
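A process can query its own identity and ask whether that identity grants access to a given file. The Python sketch below assumes a Unix-like system. The helper names are our own, and the scratch file exists only for the demonstration.

```python
import os
import tempfile

def identity():
    """Return the IDs the OS uses when checking this process's permissions."""
    return {
        "uid": os.getuid(),
        "euid": os.geteuid(),
        "gid": os.getgid(),
        "egid": os.getegid(),
        "groups": os.getgroups(),
    }

def can_read_and_write(path):
    """Ask the OS whether this process's real IDs allow reading and writing path."""
    return os.access(path, os.R_OK), os.access(path, os.W_OK)

who = identity()
print(who)

# A temporary file is created readable and writable by its owner.
with tempfile.NamedTemporaryFile() as scratch:
    readable, writable = can_read_and_write(scratch.name)
    print(scratch.name, "readable:", readable, "writable:", writable)
```

The answer depends on who is asking: the same path checked by a process with a different UID could give a different result.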
And More…
This list isn't really exhaustive (different OSs might track additional elements), but it should give you a sense of the kinds of information that the operating system needs to keep track of for each process.
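To tie the categories together, here is a toy Python sketch of a per-process record and a process table. It is not how any real kernel lays out its data. The class name, field names, and sample values are all hypothetical, and a real OS would store far more.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessRecord:
    # Process and security information
    pid: int
    ppid: int
    uid: int
    gid: int
    # Scheduling information
    state: str = "ready"
    priority: int = 0
    # Memory space: (name, start, end) for code, data, and stack regions
    memory_regions: list = field(default_factory=list)
    # Execution state, saved whenever the process is not on a CPU
    program_counter: int = 0
    stack_pointer: int = 0
    # I/O state
    open_files: dict = field(default_factory=dict)  # descriptor -> file name
    working_directory: str = "/"
    # Event notifications
    pending_signals: set = field(default_factory=set)
    blocked_signals: set = field(default_factory=set)

def children_of(table, pid):
    """Return the PIDs in the process table whose parent is pid."""
    return sorted(p.pid for p in table.values() if p.ppid == pid)

# A tiny, made-up process table keyed by PID.
table = {}
for record in (
    ProcessRecord(pid=1, ppid=0, uid=0, gid=0, state="sleeping"),
    ProcessRecord(pid=40, ppid=1, uid=1000, gid=1000, state="running"),
    ProcessRecord(pid=41, ppid=1, uid=1000, gid=1000),
):
    table[record.pid] = record

table[41].open_files[0] = "/dev/tty"
print(children_of(table, 1))
print(table[41])
```

Just as with the nodes of a binary-search tree, the record is ordinary data: deciding which fields it has is a design decision made by whoever writes the operating system.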