# Written Component
As with HW 4, a copy of these questions is provided in the handin/written.md file in your repository. Put your answers in that file.
## A: wait4 Support
- **What is the difference between `waitpid` and `wait4`?**

  `waitpid` and `wait4` both wait for a child process to terminate, but `wait4` provides additional functionality. While `waitpid` only returns the process ID of the terminated child, `wait4` also collects and returns resource usage statistics (CPU time in user mode and kernel mode) via the `rusageptr` parameter, which points to a `struct rusage`. In fact, `waitpid` is implemented by calling `wait4` with a NULL `rusageptr`.
- **Where is `struct rusage` defined in the OS/161 codebase?**

  `struct rusage` is defined in `kern/include/resource.h`. The structure contains two fields:

  ```c
  struct rusage {
      struct timeval ru_utime;    /* user time used */
      struct timeval ru_stime;    /* system time used */
  };
  ```

  Additional note: On the first day the assignment was released, it was actually defined in two files: once in `resource.h`, which was never included by any file in the OS/161 codebase, and once in `wait.h`. In an update pushed to student repos, the code in `wait.h` was removed and `resource.h` was included where needed instead.
- **How is the data that gets put into the `struct rusage` for a process in OS/161 collected?**

  The data collection process works as follows:

  - During execution, `cpu_record_cycles()` tracks kernel cycles (`t_kcycles`) and user cycles (`t_ucycles`) for each thread.
  - When a process waits for a child to exit, `pid_wait()` adds the child's cycle counts to the parent's child statistics.
  - When a process exits, `pid_exit` accumulates the process's own cycle counts and its (terminated) children's cycle counts together to provide the final statistics for the process.
  - In `wait4`, these accumulated cycles are converted to time values using `cpu_cycles_to_timeval()` and stored in the `rusage` structure.

  Other notes: There's actually a bug in the code, because `pid_wait_any()` doesn't have the code `pid_wait()` does: it should have been abstracted into a helper function. This is a good example of why code duplication is bad.
- **The timing results that this implementation of `wait4` provides are often quite precise and repeatable. Thinking about human nature rather than system implementation, why might this be a problem?**

  The high precision and repeatability of these timing results can be problematic because:

  - It may lead developers to focus on micro-optimizations that show small, consistent improvements in timing, rather than more meaningful architectural improvements.
  - The precision might create a false sense of significance, where developers attribute meaning to small timing differences that are actually due to system-specific factors rather than their code changes.
  - It can encourage over-optimization at the microsecond level when that time might be better spent improving algorithm efficiency or code maintainability.

  In short, something can be precise and repeatable without being meaningful, explicable, or significant in the context of overall system performance.
## B: Cycle Accounting
- **Suppose you wanted to call a function `foo()` in the kernel and know exactly how many cycles it took to run. How would you do that?**

  To measure the exact number of cycles for `foo()`:

  ```c
  /* Record starting cycles */
  cpu_record_cycles(CPU_CYCLES_KERNEL);
  uint64_t start_cycles = curthread->t_kcycles;

  /* Call the function */
  foo();

  /* Record ending cycles */
  cpu_record_cycles(CPU_CYCLES_KERNEL);
  uint64_t end_cycles = curthread->t_kcycles;

  /* Calculate cycles used */
  uint64_t cycles_used = end_cycles - start_cycles;
  ```

  This approach captures the kernel cycles used specifically by `foo()` by measuring the difference in `t_kcycles` before and after the function call.

  We could also use the CPU's cycle counter, but this would give us overall elapsed time, not just the time spent in `foo()` (e.g., if a context switch occurred during `foo()`). Since the question is ambiguous about whether we want the total time or just the time spent in `foo()`, I've provided both answers.

  Additional note: Minor detail: `cpu_record_cycles()` is supposed to be called with interrupts disabled or a spinlock held to avoid the chance of any race conditions updating the cycle counts, but this is unlikely to be an issue in practice and isn't worth worrying about for this question. But if you mentioned it, good job!

  Additional context: `cpu_record_cycles` updates both `curcpu->c_cycles` and either `curthread->t_kcycles` (`CPU_CYCLES_KERNEL`), `curthread->t_ucycles` (`CPU_CYCLES_USER`), or neither (`CPU_CYCLES_GLOBAL`). In `kern/include/cpu.h`, it notes:

  ```c
  /*
   * CPU's timer and cycle counter.
   *
   * cpu_reset_quantum resets the CPU's timer interrupt to go off in
   * one quantum from now (hardclock() is called when the timer
   * interrupt goes off). On MIPS, the same counter is used to count
   * cycles.
   *
   * cpu_record_cycles records the number of cycles used since the last
   * call to cpu_record_cycles with the same target, charging them to
   * the given target. As a convenience, cpu_record_cycles returns the
   * current value of curcpu->c_cycles.
   *
   * To avoid race conditions, cpu_record_cycles should be called with
   * interrupts disabled. If you're not sure whether to use CPU_CYCLES_KERNEL
   * or CPU_CYCLES_GLOBAL, you can use curthread->t_in_interrupt to decide.
   * Most interrupts will be unrelated to the current thread.
   *
   * Conversion functions are provided to convert between cycles and
   * time (which is time since boot).
   */
  enum cpu_cycles_target {
      CPU_CYCLES_USER,    /* Charge to current thread; user mode */
      CPU_CYCLES_KERNEL,  /* Charge to current thread; kernel mode */
      CPU_CYCLES_GLOBAL   /* Don't charge to thread */
  };

  void cpu_reset_quantum(void);
  uint64_t cpu_record_cycles(enum cpu_cycles_target target);
  ```
## C: CPU Migration
- **There are two distinct reasons a thread might be sent to another CPU. What are they?**

  Threads can be migrated between CPUs for two reasons:

  - Load Balancing: When `thread_consider_migration()` is called by `hardclock()`, it checks if the current CPU has more than its fair share of threads. If so, it migrates threads to less busy CPUs to balance the workload.
  - Idle CPU Utilization: When `thread_make_runnable()` is called for a new or newly-awakened thread, it checks for idle CPUs using `cpu_find_idle()`. If an idle CPU is found, the thread is assigned to that CPU to make better use of available resources.
- **Between the code for Homework 4 and this homework, Prof. Melissa made a change to `thread_consider_migration` to fix a bug. What was the bug, and what was the thinking error (probably) that led to it?**

  The bug was in how threads were counted for each CPU. The original code:

  ```c
  total_count += c->c_runqueue.tl_count;
  ```

  was changed to:

  ```c
  total_count += c->c_runqueue.tl_count + !c->c_isidle;
  ```

  The thinking error was forgetting that the currently executing thread is not in the run queue. The original code only counted threads waiting to run, missing the active thread on non-idle CPUs. This led to incorrect load balancing decisions because the total thread count was undercounted by one for each busy CPU.
## D: The LAMEbus Timer
- **What function is called when the LAMEbus timer fires?**

  When the LAMEbus timer fires, it triggers the `ltimer_irq()` interrupt handler, which then calls `timerclock()`. The `timerclock()` function is the important one, as it's where we put code that should run when the timer fires.

  Additional context: Although the interval can be changed, the LAMEbus timer fires once per second. In contrast, the `hardclock()` function is called from the internal MIPS timer interrupt, which is used for scheduling, as it fires at a much higher rate (normally 100 times per second). If you delve deep into the code, there is an option for the LAMEbus timer to call `hardclock()` if no other timer is available, but this is not the default behavior.
- **What function sets the interval for the LAMEbus timer?**

  The function `timerclock_setinterval()` sets the interval for the LAMEbus timer.
- **What does the LAMEbus device documentation for the timer say happens when you set the interval?**

  According to the documentation, when you set the interval by writing to the countdown time register (offset 16-19), it immediately starts the countdown timer. When the countdown reaches zero, an interrupt is generated.
- **Does the documentation indicate any way to read the amount of time remaining before the timer fires?**

  No, the documentation does not indicate any way to read the remaining time in the countdown timer; it is rather vague as to whether the value can be read at all.

  Additional note: In fact, you can read the value from the hardware, but it will just give you the value you wrote to the timer, not the time remaining.
## E: Nanosleep
- **In the `nanosleep` function (in `clock.c`), it calls `gettime(&now)` twice. Could the code have equivalently used `cpu_cycles_to_timespec(curcpu->c_cycles, &now);` instead? Why or why not? And what's the difference between the two?**

  While both functions produce time values, they serve different purposes:

  - `gettime(&now)` returns the current wall-clock time since the epoch (January 1, 1970).
  - `cpu_cycles_to_timespec()` converts CPU cycles to time, representing time since system boot.

  For `nanosleep`, either would work, because we care about a time delta. It doesn't matter what the starting time is, as long as we can calculate the difference between the start and end times.

  Additional note: Calling `gettime` involves external hardware since it accesses the real-time clock on the system. This might seem to be slower than simply reading the CPU cycle counter, but in practice the difference is negligible (`cpu_cycles_to_timespec` involves division and modulo operations, which are relatively slow).
- **The `nanosleep` function works well enough to run a simple demo like `sleepyecho`, but is in fact horribly wrong in the way it operates. What is the problem with the current implementation?**

  The fundamental problem with the current implementation is that it uses a single shared system timer for all sleeping threads. This creates several issues:

  - When a thread is about to wake up, it changes the system-wide timer interval using `timerclock_setinterval()`.
  - This change affects all other sleeping threads, potentially disrupting their sleep intervals.
  - If multiple threads are sleeping, the later threads can override the timer settings of earlier threads.
  - This leads to incorrect sleep durations and potential race conditions.

  The implementation works for simple cases with a single thread but breaks down with multiple sleeping threads due to this shared resource conflict.
## More on nanosleep
Here's another version of nanosleep. It does not fix the fundamental problem with the code, but shows a different way to use the cycle-counting functions.
```c
int
nanosleep(struct timespec *remaining)
{
        uint64_t endtime;
        int64_t rem;
        uint32_t newinterval, oldinterval;
        int ret = 0;

        cpu_timespec_to_cycles(remaining, &rem);
        endtime = curcpu->c_cycles + rem;

        /* Wait until we have less than a second to go */
        spinlock_acquire(&lbolt_lock);
        while (rem >= CPU_FREQUENCY) {
                wchan_sleep(lbolt, &lbolt_lock);
                rem = endtime - curcpu->c_cycles;
        }
        spinlock_release(&lbolt_lock);

        /* Are we actually done? (or close enough?) */
        if (rem < 250) {
                return ret;
        }

        /* Wait for the last few parts of a second */
        newinterval = (rem - 125) * CPU_NSECS_PER_CYCLE;
        oldinterval = timerclock_setinterval(newinterval);
        spinlock_acquire(&lbolt_lock);
        wchan_sleep(lbolt, &lbolt_lock);
        spinlock_release(&lbolt_lock);
        timerclock_setinterval(oldinterval);

        remaining->tv_sec = 0;
        remaining->tv_nsec = 0;
        return ret;
}
```