# Written Component
As with HW 4, a copy of these questions is provided in the handin/written.md file in your repository. Put your answers in that file.
## A: wait4 Support
- **What is the difference between `waitpid` and `wait4`?**

  `waitpid` and `wait4` both wait for a child process to terminate, but `wait4` provides additional functionality. While `waitpid` only returns the process ID of the terminated child, `wait4` also collects and returns resource usage statistics (CPU time in user mode and kernel mode) via the `rusageptr` parameter, which points to a `struct rusage`. In fact, `waitpid` is implemented by calling `wait4` with a NULL `rusageptr`.
- **Where is `struct rusage` defined in the OS/161 codebase?**

  `struct rusage` is defined in `kern/include/resource.h`. The structure contains two fields:

  ```c
  struct rusage {
      struct timeval ru_utime;    /* user time used */
      struct timeval ru_stime;    /* system time used */
  };
  ```

  Additional note: On the first day the assignment was released, it was actually defined in two files: once in `resource.h`, which was never included by any file in the OS/161 codebase, and once in `wait.h`. In an update pushed to student repos, the code in `wait.h` was removed and `resource.h` was included where needed instead.
- **How is the data that gets put into the `struct rusage` for a process in OS/161 collected?**

  The data collection process works as follows:

  - During execution, `cpu_record_cycles()` tracks kernel cycles (`t_kcycles`) and user cycles (`t_ucycles`) for each thread.
  - When a process waits for a child to exit, `pid_wait()` adds the child's cycle counts to the parent's child statistics.
  - When a process exits, `pid_exit` accumulates the process's own cycle counts and its (terminated) children's cycle counts together to provide the final statistics for the process.
  - In `wait4`, these accumulated cycles are converted to time values using `cpu_cycles_to_timeval()` and stored in the `rusage` structure.

  Other notes: There's actually a bug in the code, because `pid_wait_any()` doesn't have the code `pid_wait()` does: it should have been abstracted into a helper function. This is a good example of why code duplication is bad.
- **The timing results that this implementation of `wait4` provides are often quite precise and repeatable. Thinking about human nature rather than system implementation, why might this be a problem?**

  The high precision and repeatability of these timing results can be problematic because:

  - It may lead developers to focus on micro-optimizations that show small, consistent improvements in timing, rather than more meaningful architectural improvements.
  - The precision might create a false sense of significance, where developers attribute meaning to small timing differences that are actually due to system-specific factors rather than their code changes.
  - It can encourage over-optimization at the microsecond level when that time might be better spent improving algorithm efficiency or code maintainability.

  In short, something can be precise and repeatable without being meaningful, explicable, or significant in the context of overall system performance.
## B: Cycle Accounting
- **Suppose you wanted to call a function `foo()` in the kernel and know exactly how many cycles it took to run. How would you do that?**

  To measure the exact number of cycles for `foo()`:

  ```c
  /* Record starting cycles */
  cpu_record_cycles(CPU_CYCLES_KERNEL);
  uint64_t start_cycles = curthread->t_kcycles;

  /* Call the function */
  foo();

  /* Record ending cycles */
  cpu_record_cycles(CPU_CYCLES_KERNEL);
  uint64_t end_cycles = curthread->t_kcycles;

  /* Calculate cycles used */
  uint64_t cycles_used = end_cycles - start_cycles;
  ```

  This approach captures the kernel cycles used specifically by `foo()` by measuring the difference in `t_kcycles` before and after the function call.

  We could also use the CPU's cycle counter, but this would give us overall elapsed time, not just the time spent in `foo()` (e.g., if a context switch occurred during `foo()`). Since the question is ambiguous about whether we want the total time or just the time spent in `foo()`, I've provided both answers.

  Additional note: Minor detail: `cpu_record_cycles()` is supposed to be called with interrupts disabled or a spinlock held to avoid the chance of any race conditions updating the cycle counts, but this is unlikely to be an issue in practice and isn't worth worrying about for this question. But if you mentioned it, good job!

  Additional context: `cpu_record_cycles` updates both `curcpu->c_cycles` and either `curthread->t_kcycles` (`CPU_CYCLES_KERNEL`), `curthread->t_ucycles` (`CPU_CYCLES_USER`), or neither (`CPU_CYCLES_GLOBAL`). In `kern/include/cpu.h`, it notes:

  ```c
  /*
   * CPU's timer and cycle counter.
   *
   * cpu_reset_quantum resets the CPU's timer interrupt to go off in
   * one quantum from now (hardclock() is called when the timer
   * interrupt goes off). On MIPS, the same counter is used to count
   * cycles.
   *
   * cpu_record_cycles records the number of cycles used since the last
   * call to cpu_record_cycles with the same target, charging them to
   * the given target. As a convenience, cpu_record_cycles returns the
   * current value of curcpu->c_cycles.
   *
   * To avoid race conditions, cpu_record_cycles should be called with
   * interrupts disabled. If you're not sure whether to use CPU_CYCLES_KERNEL
   * or CPU_CYCLES_GLOBAL, you can use curthread->t_in_interrupt to decide.
   * Most interrupts will be unrelated to the current thread.
   *
   * Conversion functions are provided to convert between cycles and
   * time (which is time since boot).
   */
  enum cpu_cycles_target {
      CPU_CYCLES_USER,    /* Charge to current thread; user mode */
      CPU_CYCLES_KERNEL,  /* Charge to current thread; kernel mode */
      CPU_CYCLES_GLOBAL   /* Don't charge to thread */
  };

  void cpu_reset_quantum(void);
  uint64_t cpu_record_cycles(enum cpu_cycles_target target);
  ```
## C: CPU Migration
- **There are two distinct reasons a thread might be sent to another CPU. What are they?**

  Threads can be migrated between CPUs for two reasons:

  - Load Balancing: When `thread_consider_migration()` is called by `hardclock()`, it checks if the current CPU has more than its fair share of threads. If so, it migrates threads to less busy CPUs to balance the workload.
  - Idle CPU Utilization: When `thread_make_runnable()` is called for a new or newly-awakened thread, it checks for idle CPUs using `cpu_find_idle()`. If an idle CPU is found, the thread is assigned to that CPU to make better use of available resources.
- **Between the code for Homework 4 and this homework, Prof. Melissa made a change to `thread_consider_migration` to fix a bug. What was the bug, and what was the thinking error (probably) that led to it?**

  The bug was in how threads were counted for each CPU. The original code:

  ```c
  total_count += c->c_runqueue.tl_count;
  ```

  was changed to:

  ```c
  total_count += c->c_runqueue.tl_count + !c->c_isidle;
  ```

  The thinking error was forgetting that the currently executing thread is not in the run queue. The original code only counted threads waiting to run, missing the active thread on non-idle CPUs. This led to incorrect load balancing decisions because the total thread count was undercounted by one for each busy CPU.
## D: The LAMEbus Timer
- **What function is called when the LAMEbus timer fires?**

  When the LAMEbus timer fires, it triggers the `ltimer_irq()` interrupt handler, which then calls `timerclock()`. The `timerclock()` function is the important one, as it's where we put code that should run when the timer fires.

  Additional context: Although the interval can be changed, the LAMEbus timer fires once per second. In contrast, the `hardclock()` function is called from the internal MIPS timer interrupt, which is used for scheduling, as it fires at a much higher rate (normally 100 times per second). If you delve deep into the code, there is an option for the LAMEbus timer to call `hardclock()` if no other timer is available, but this is not the default behavior.
- **What function sets the interval for the LAMEbus timer?**

  The function `timerclock_setinterval()` sets the interval for the LAMEbus timer.
- **What does the LAMEbus device documentation for the timer say happens when you set the interval?**

  According to the documentation, when you set the interval by writing to the countdown time register (offset 16-19), it immediately starts the countdown timer. When the countdown reaches zero, an interrupt is generated.
- **Does the documentation indicate any way to read the amount of time remaining before the timer fires?**

  No, the documentation does not indicate any way to read the remaining time in the countdown timer; it is rather vague as to whether the value can be read at all.

  Additional note: In fact, you can read the value from the hardware, but it will just give you the value you wrote to the timer, not the time remaining.
## E: Nanosleep
- **In the `nanosleep` function (in `clock.c`), it calls `gettime(&now)` twice. Could the code have equivalently used `cpu_cycles_to_timespec(curcpu->c_cycles, &now);` instead? Why or why not? And what's the difference between the two?**

  While both functions produce time values, they serve different purposes:

  - `gettime(&now)` returns the current wall-clock time since the epoch (January 1, 1970).
  - `cpu_cycles_to_timespec()` converts CPU cycles to time, representing time since system boot.

  For `nanosleep`, either would work, because we care about a time delta. It doesn't matter what the starting time is, as long as we can calculate the difference between the start and end times.

  Additional note: Calling `gettime` involves external hardware since it accesses the real-time clock on the system. This might seem to be slower than simply reading the CPU cycle counter, but in practice the difference is negligible (`cpu_cycles_to_timespec` involves division and modulo operations, which are relatively slow).
- **The `nanosleep` function works well enough to run a simple demo like `sleepyecho`, but is in fact horribly wrong in the way it operates. What is the problem with the current implementation?**

  The fundamental problem with the current implementation is that it uses a single shared system timer for all sleeping threads. This creates several issues:

  - When a thread is about to wake up, it changes the system-wide timer interval using `timerclock_setinterval()`.
  - This change affects all other sleeping threads, potentially disrupting their sleep intervals.
  - If multiple threads are sleeping, the later threads can override the timer settings of earlier threads.
  - This leads to incorrect sleep durations and potential race conditions.

  The implementation works for simple cases with a single thread but breaks down with multiple sleeping threads due to this shared resource conflict.
## More on nanosleep
Here's another version of nanosleep. It does not fix the fundamental problem with the code, but shows a different way to use the cycle-counting functions.
```c
int
nanosleep(struct timespec *remaining)
{
        uint64_t endtime;
        int64_t rem;
        uint32_t newinterval, oldinterval;
        int ret = 0;

        cpu_timespec_to_cycles(remaining, &rem);
        endtime = curcpu->c_cycles + rem;

        /* Wait until we have less than a second to go */
        spinlock_acquire(&lbolt_lock);
        while (rem >= CPU_FREQUENCY) {
                wchan_sleep(lbolt, &lbolt_lock);
                rem = endtime - curcpu->c_cycles;
        }
        spinlock_release(&lbolt_lock);

        /* Are we actually done? (or close enough?) */
        if (rem < 250) {
                return ret;
        }

        /* Wait for the last few parts of a second */
        newinterval = (rem - 125) * CPU_NSECS_PER_CYCLE;
        oldinterval = timerclock_setinterval(newinterval);
        spinlock_acquire(&lbolt_lock);
        wchan_sleep(lbolt, &lbolt_lock);
        spinlock_release(&lbolt_lock);
        timerclock_setinterval(oldinterval);

        remaining->tv_sec = 0;
        remaining->tv_nsec = 0;
        return ret;
}
```