DO NOT MAIL: xv6 web pages

rsc 2008-09-03 04:50:04 +00:00
parent ee3f75f229
commit f53494c28e
37 changed files with 9034 additions and 0 deletions

web/Makefile Normal file

@@ -0,0 +1,3 @@
index.html: index.txt mkhtml
	mkhtml index.txt >_$@ && mv _$@ $@

web/index.html Normal file

@@ -0,0 +1,353 @@
<!-- AUTOMATICALLY GENERATED: EDIT the .txt version, not the .html version -->
<html>
<head>
<title>Xv6, a simple Unix-like teaching operating system</title>
<style type="text/css"><!--
body {
background-color: white;
color: black;
font-size: medium;
line-height: 1.2em;
margin-left: 0.5in;
margin-right: 0.5in;
margin-top: 0;
margin-bottom: 0;
}
h1 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.4em;
}
h2 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.2em;
}
--></style>
</head>
<body bgcolor=#ffffff>
<h1>Xv6, a simple Unix-like teaching operating system</h1>
<br><br>
Xv6 is a teaching operating system developed
in the summer of 2006 for MIT's operating systems course,
&ldquo;6.828: Operating Systems Engineering.&rdquo;
We used it for 6.828 in Fall 2006 and Fall 2007
and are using it this semester (Fall 2008).
We hope that xv6 will be useful in other courses too.
This page collects resources to aid the use of xv6
in other courses.
<h2>History and Background</h2>
For many years, MIT had no operating systems course.
In the fall of 2002, Frans Kaashoek, Josh Cates, and Emil Sit
created a new, experimental course (6.097)
to teach operating systems engineering.
In the course lectures, the class worked through Sixth Edition Unix (aka V6)
using John Lions's famous commentary.
In the lab assignments, students wrote most of an exokernel operating
system, eventually named Jos, for the Intel x86.
Exposing students to multiple systems&ndash;V6 and Jos&ndash;helped
develop a sense of the spectrum of operating system designs.
In the fall of 2003, the experimental 6.097 became the
official course 6.828; the course has been offered each fall since then.
<br><br>
V6 presented pedagogic challenges from the start.
Students doubted the relevance of an obsolete 30-year-old operating system
written in an obsolete programming language (pre-K&R C)
running on obsolete hardware (the PDP-11).
Students also struggled to learn the low-level details of two different
architectures (the PDP-11 and the Intel x86) at the same time.
By the summer of 2006, we had decided to replace V6
with a new operating system, xv6, modeled on V6
but written in ANSI C and running on multiprocessor
Intel x86 machines.
Xv6's use of the x86 makes it more relevant to
students' experience than V6 was
and unifies the course around a single architecture.
Adding multiprocessor support also helps relevance
and makes it easier to discuss threads and concurrency.
(In a single processor operating system, concurrency&ndash;which only
happens because of interrupts&ndash;is too easy to view as a special case.
A multiprocessor operating system must attack the problem head on.)
Finally, writing a new system allowed us to write cleaner versions
of the rougher parts of V6, like the scheduler and file system.
<br><br>
6.828 substituted xv6 for V6 in the fall of 2006.
Based on that experience, we cleaned up rough patches
of xv6 for the course in the fall of 2007.
Since then, xv6 has stabilized, so we are making it
available in the hopes that others will find it useful too.
<br><br>
6.828 uses both xv6 and Jos.
Courses taught at UCLA, NYU, and Stanford have used
Jos without xv6; we believe other courses could use
xv6 without Jos, though we are not aware of any that have.
<h2>Xv6 sources</h2>
The latest xv6 is <a href="xv6-rev2.tar.gz">xv6-rev2.tar.gz</a>.
We distribute the sources in electronic form but also as
a printed booklet with line numbers that keep everyone
together during lectures. The booklet is available as
<a href="xv6-rev2.pdf">xv6-rev2.pdf</a>.
<br><br>
xv6 compiles using the GNU C compiler,
targeted at the x86 using ELF binaries.
On BSD and Linux systems, you can use the native compilers;
on OS X, which doesn't use ELF binaries,
you must use a cross-compiler.
Xv6 does boot on real hardware, but typically
we run it using the Bochs emulator.
Both the GCC cross compiler and Bochs
can be found on the <a href="../../2007/tools.html">6.828 tools page</a>.
<h2>Lectures</h2>
In 6.828, the lectures in the first half of the course
introduce the PC hardware, the Intel x86, and then xv6.
The lectures in the second half consider advanced topics
using research papers; for some, xv6 serves as a useful
base for making discussions concrete.
This section describes a typical 6.828 lecture schedule,
linking to lecture notes and homework.
A course using only xv6 (not Jos) will need to adapt
a few of the lectures, but we hope these are a useful
starting point.
<br><br><b><i>Lecture 1. Operating systems</i></b>
<br><br>
The first lecture introduces both the general topic of
operating systems and the specific approach of 6.828.
After defining &ldquo;operating system,&rdquo; the lecture
examines the implementation of a Unix shell
to look at the details of the traditional Unix system call interface.
This is relevant to both xv6 and Jos: in the final
Jos labs, students implement a Unix-like interface,
culminating in a Unix shell.
<br><br>
<a href="l1.html">lecture notes</a>
<br><br><b><i>Lecture 2. PC hardware and x86 programming</i></b>
<br><br>
This lecture introduces the PC architecture, the 16- and 32-bit x86,
the stack, and the GCC x86 calling conventions.
It also introduces the pieces of a typical C tool chain&ndash;compiler,
assembler, linker, loader&ndash;and the Bochs emulator.
<br><br>
Reading: PC Assembly Language
<br><br>
Homework: get familiar with Bochs
<br><br>
<a href="l2.html">lecture notes</a>
<a href="x86-intro.html">homework</a>
<br><br><b><i>Lecture 3. Operating system organization</i></b>
<br><br>
This lecture continues Lecture 1's discussion of what
an operating system does.
An operating system provides a &ldquo;virtual computer&rdquo;
interface to user space programs.
At a high level, the main job of the operating system
is to implement that interface
using the physical computer it runs on.
<br><br>
The lecture discusses four approaches to that job:
monolithic operating systems, microkernels,
virtual machines, and exokernels.
Exokernels might not be worth mentioning
except that the Jos labs are built around one.
<br><br>
Reading: Engler et al., Exokernel: An Operating System Architecture
for Application-Level Resource Management
<br><br>
<a href="l3.html">lecture notes</a>
<br><br><b><i>Lecture 4. Address spaces using segmentation</i></b>
<br><br>
This is the first lecture that uses xv6.
It introduces the idea of address spaces and the
details of the x86 segmentation hardware.
It makes the discussion concrete by reading the xv6
source code and watching xv6 execute using the Bochs emulator.
<br><br>
Reading: x86 MMU handout,
xv6: bootasm.S, bootother.S, <a href="src/bootmain.c.html">bootmain.c</a>, <a href="src/main.c.html">main.c</a>, <a href="src/init.c.html">init.c</a>, and setupsegs in <a href="src/proc.c.html">proc.c</a>.
<br><br>
Homework: Bochs stack introduction
<br><br>
<a href="l4.html">lecture notes</a>
<a href="xv6-intro.html">homework</a>
<br><br><b><i>Lecture 5. Address spaces using page tables</i></b>
<br><br>
This lecture continues the discussion of address spaces,
examining the other x86 virtual memory mechanism: page tables.
Xv6 does not use page tables, so there is no xv6 here.
Instead, the lecture uses Jos as a concrete example.
An xv6-only course might skip or shorten this discussion.
<br><br>
Reading: x86 manual excerpts
<br><br>
Homework: stuff about gdt
XXX not appropriate; should be in Lecture 4
<br><br>
<a href="l5.html">lecture notes</a>
<br><br><b><i>Lecture 6. Interrupts and exceptions</i></b>
<br><br>
How does a user program invoke the operating system kernel?
How does the kernel return to the user program?
What happens when a hardware device needs attention?
This lecture answers these questions:
interrupt and exception handling.
<br><br>
It explains the x86 trap setup mechanisms and then
examines their use in xv6's SETGATE (<a href="src/mmu.h.html">mmu.h</a>),
tvinit (<a href="src/trap.c.html">trap.c</a>), idtinit (<a href="src/trap.c.html">trap.c</a>), <a href="src/vectors.pl.html">vectors.pl</a>, and vectors.S.
<br><br>
It then traces through a call to the system call open:
<a href="src/init.c.html">init.c</a>, usys.S, vector48 and alltraps (vectors.S), trap (<a href="src/trap.c.html">trap.c</a>),
syscall (<a href="src/syscall.c.html">syscall.c</a>),
sys_open (<a href="src/sysfile.c.html">sysfile.c</a>), fetcharg, fetchint, argint, argptr, argstr (<a href="src/syscall.c.html">syscall.c</a>).
<br><br>
The interrupt controller, briefly:
pic_init and pic_enable (<a href="src/picirq.c.html">picirq.c</a>).
The timer and keyboard, briefly:
timer_init (<a href="src/timer.c.html">timer.c</a>), console_init (<a href="src/console.c.html">console.c</a>).
Enabling and disabling of interrupts.
<br><br>
Reading: x86 manual excerpts,
xv6: trapasm.S, <a href="src/trap.c.html">trap.c</a>, <a href="src/syscall.c.html">syscall.c</a>, and usys.S.
Skim <a href="src/lapic.c.html">lapic.c</a>, <a href="src/ioapic.c.html">ioapic.c</a>, <a href="src/picirq.c.html">picirq.c</a>.
<br><br>
Homework: Explain the 35 words on the top of the
stack at first invocation of <code>syscall</code>.
<br><br>
<a href="l-interrupt.html">lecture notes</a>
<a href="x86-intr.html">homework</a>
<br><br><b><i>Lecture 7. Multiprocessors and locking</i></b>
<br><br>
This lecture introduces the problems of
coordination and synchronization on a
multiprocessor
and then the solution of mutual exclusion locks.
Atomic instructions, test-and-set locks,
lock granularity, (the mistake of) recursive locks.
<br><br>
Although xv6 user programs cannot share memory,
the xv6 kernel itself is a program with multiple threads
executing concurrently and sharing memory.
Illustration: the xv6 scheduler's proc_table_lock (<a href="src/proc.c.html">proc.c</a>)
and the spin lock implementation (<a href="src/spinlock.c.html">spinlock.c</a>).
<br><br>
Reading: xv6: <a href="src/spinlock.c.html">spinlock.c</a>. Skim <a href="src/mp.c.html">mp.c</a>.
<br><br>
Homework: Interaction between locking and interrupts.
Try not disabling interrupts in the disk driver and watch xv6 break.
<br><br>
<a href="l-lock.html">lecture notes</a>
<a href="xv6-lock.html">homework</a>
<br><br><b><i>Lecture 8. Threads, processes and context switching</i></b>
<br><br>
The last lecture introduced some of the issues
in writing threaded programs, using xv6's processes
as an example.
This lecture introduces the issues in implementing
threads, continuing to use xv6 as the example.
<br><br>
The lecture defines a thread of computation as a register
set and a stack. A process is an address space plus one
or more threads of computation sharing that address space.
Thus the xv6 kernel can be viewed as a single process
with many threads (each user process) executing concurrently.
<br><br>
Illustrations: thread switching (swtch.S), scheduler (<a href="src/proc.c.html">proc.c</a>), sys_fork (<a href="src/sysproc.c.html">sysproc.c</a>)
<br><br>
Reading: <a href="src/proc.c.html">proc.c</a>, swtch.S, sys_fork (<a href="src/sysproc.c.html">sysproc.c</a>)
<br><br>
Homework: trace through stack switching.
<br><br>
<a href="l-threads.html">lecture notes (need to be updated to use swtch)</a>
<a href="xv6-sched.html">homework</a>
<br><br><b><i>Lecture 9. Processes and coordination</i></b>
<br><br>
This lecture introduces the idea of sequence coordination
and then examines the particular solution illustrated by
sleep and wakeup (<a href="src/proc.c.html">proc.c</a>).
It introduces and refines a simple
producer/consumer queue to illustrate the
need for sleep and wakeup
and then the sleep and wakeup
implementations themselves.
<br><br>
Reading: <a href="src/proc.c.html">proc.c</a>, sys_exec, sys_sbrk, sys_wait, sys_exit, sys_kill (<a href="src/sysproc.c.html">sysproc.c</a>).
<br><br>
Homework: Explain how sleep and wakeup would break
without proc_table_lock. Explain how devices would break
without the second lock argument to sleep.
<br><br>
<a href="l-coordination.html">lecture notes</a>
<a href="xv6-sleep.html">homework</a>
<br><br><b><i>Lecture 10. Files and disk I/O</i></b>
<br><br>
This is the first of three file system lectures.
This lecture introduces the basic file system interface
and then considers the on-disk layout of individual files
and the free block bitmap.
<br><br>
Reading: iread, iwrite, fileread, filewrite, wdir, mknod1, and
code related to these calls in <a href="src/fs.c.html">fs.c</a>, <a href="src/bio.c.html">bio.c</a>, <a href="src/ide.c.html">ide.c</a>, and <a href="src/file.c.html">file.c</a>.
<br><br>
Homework: Add a print statement to bwrite to trace every disk write.
Explain the disk writes caused by some simple shell commands.
<br><br>
<a href="l-fs.html">lecture notes</a>
<a href="xv6-disk.html">homework</a>
<br><br><b><i>Lecture 11. Naming</i></b>
<br><br>
The last lecture discussed on-disk file system representation.
This lecture covers the implementation of
file system paths (namei in <a href="src/fs.c.html">fs.c</a>)
and also discusses the security problems of a shared /tmp
and symbolic links.
<br><br>
Understanding exec (<a href="src/exec.c.html">exec.c</a>) is left as an exercise.
<br><br>
Reading: namei in <a href="src/fs.c.html">fs.c</a>, <a href="src/sysfile.c.html">sysfile.c</a>, <a href="src/file.c.html">file.c</a>.
<br><br>
Homework: Explain how to implement symbolic links in xv6.
<br><br>
<a href="l-name.html">lecture notes</a>
<a href="xv6-names.html">homework</a>
<br><br><b><i>Lecture 12. High-performance file systems</i></b>
<br><br>
This lecture is the first of the research paper-based lectures.
It discusses the &ldquo;soft updates&rdquo; paper,
using xv6 as a concrete example.
<h2>Feedback</h2>
If you are interested in using xv6 or have used xv6 in a course,
we would love to hear from you.
If there's anything that we can do to make xv6 easier
to adopt, we'd like to hear about it.
We'd also be interested to hear what worked well and what didn't.
<br><br>
Russ Cox (rsc@swtch.com)<br>
Frans Kaashoek (kaashoek@mit.edu)<br>
Robert Morris (rtm@mit.edu)
<br><br>
You can reach all of us at 6.828-staff@pdos.csail.mit.edu.
<br><br>
<br><br>
</body>
</html>

web/index.txt Normal file

@@ -0,0 +1,335 @@
** Xv6, a simple Unix-like teaching operating system
Xv6 is a teaching operating system developed
in the summer of 2006 for MIT's operating systems course,
``6.828: Operating Systems Engineering.''
We used it for 6.828 in Fall 2006 and Fall 2007
and are using it this semester (Fall 2008).
We hope that xv6 will be useful in other courses too.
This page collects resources to aid the use of xv6
in other courses.
* History and Background
For many years, MIT had no operating systems course.
In the fall of 2002, Frans Kaashoek, Josh Cates, and Emil Sit
created a new, experimental course (6.097)
to teach operating systems engineering.
In the course lectures, the class worked through Sixth Edition Unix (aka V6)
using John Lions's famous commentary.
In the lab assignments, students wrote most of an exokernel operating
system, eventually named Jos, for the Intel x86.
Exposing students to multiple systems--V6 and Jos--helped
develop a sense of the spectrum of operating system designs.
In the fall of 2003, the experimental 6.097 became the
official course 6.828; the course has been offered each fall since then.
V6 presented pedagogic challenges from the start.
Students doubted the relevance of an obsolete 30-year-old operating system
written in an obsolete programming language (pre-K&R C)
running on obsolete hardware (the PDP-11).
Students also struggled to learn the low-level details of two different
architectures (the PDP-11 and the Intel x86) at the same time.
By the summer of 2006, we had decided to replace V6
with a new operating system, xv6, modeled on V6
but written in ANSI C and running on multiprocessor
Intel x86 machines.
Xv6's use of the x86 makes it more relevant to
students' experience than V6 was
and unifies the course around a single architecture.
Adding multiprocessor support also helps relevance
and makes it easier to discuss threads and concurrency.
(In a single processor operating system, concurrency--which only
happens because of interrupts--is too easy to view as a special case.
A multiprocessor operating system must attack the problem head on.)
Finally, writing a new system allowed us to write cleaner versions
of the rougher parts of V6, like the scheduler and file system.
6.828 substituted xv6 for V6 in the fall of 2006.
Based on that experience, we cleaned up rough patches
of xv6 for the course in the fall of 2007.
Since then, xv6 has stabilized, so we are making it
available in the hopes that others will find it useful too.
6.828 uses both xv6 and Jos.
Courses taught at UCLA, NYU, and Stanford have used
Jos without xv6; we believe other courses could use
xv6 without Jos, though we are not aware of any that have.
* Xv6 sources
The latest xv6 is [xv6-rev2.tar.gz].
We distribute the sources in electronic form but also as
a printed booklet with line numbers that keep everyone
together during lectures. The booklet is available as
[xv6-rev2.pdf].
xv6 compiles using the GNU C compiler,
targeted at the x86 using ELF binaries.
On BSD and Linux systems, you can use the native compilers;
on OS X, which doesn't use ELF binaries,
you must use a cross-compiler.
Xv6 does boot on real hardware, but typically
we run it using the Bochs emulator.
Both the GCC cross compiler and Bochs
can be found on the [../../2007/tools.html | 6.828 tools page].
* Lectures
In 6.828, the lectures in the first half of the course
introduce the PC hardware, the Intel x86, and then xv6.
The lectures in the second half consider advanced topics
using research papers; for some, xv6 serves as a useful
base for making discussions concrete.
This section describes a typical 6.828 lecture schedule,
linking to lecture notes and homework.
A course using only xv6 (not Jos) will need to adapt
a few of the lectures, but we hope these are a useful
starting point.
Lecture 1. Operating systems
The first lecture introduces both the general topic of
operating systems and the specific approach of 6.828.
After defining ``operating system,'' the lecture
examines the implementation of a Unix shell
to look at the details of the traditional Unix system call interface.
This is relevant to both xv6 and Jos: in the final
Jos labs, students implement a Unix-like interface,
culminating in a Unix shell.
[l1.html | lecture notes]
Lecture 2. PC hardware and x86 programming
This lecture introduces the PC architecture, the 16- and 32-bit x86,
the stack, and the GCC x86 calling conventions.
It also introduces the pieces of a typical C tool chain--compiler,
assembler, linker, loader--and the Bochs emulator.
Reading: PC Assembly Language
Homework: get familiar with Bochs
[l2.html | lecture notes]
[x86-intro.html | homework]
Lecture 3. Operating system organization
This lecture continues Lecture 1's discussion of what
an operating system does.
An operating system provides a ``virtual computer''
interface to user space programs.
At a high level, the main job of the operating system
is to implement that interface
using the physical computer it runs on.
The lecture discusses four approaches to that job:
monolithic operating systems, microkernels,
virtual machines, and exokernels.
Exokernels might not be worth mentioning
except that the Jos labs are built around one.
Reading: Engler et al., Exokernel: An Operating System Architecture
for Application-Level Resource Management
[l3.html | lecture notes]
Lecture 4. Address spaces using segmentation
This is the first lecture that uses xv6.
It introduces the idea of address spaces and the
details of the x86 segmentation hardware.
It makes the discussion concrete by reading the xv6
source code and watching xv6 execute using the Bochs emulator.
Reading: x86 MMU handout,
xv6: bootasm.S, bootother.S, bootmain.c, main.c, init.c, and setupsegs in proc.c.
Homework: Bochs stack introduction
[l4.html | lecture notes]
[xv6-intro.html | homework]
Lecture 5. Address spaces using page tables
This lecture continues the discussion of address spaces,
examining the other x86 virtual memory mechanism: page tables.
Xv6 does not use page tables, so there is no xv6 here.
Instead, the lecture uses Jos as a concrete example.
An xv6-only course might skip or shorten this discussion.
Reading: x86 manual excerpts
Homework: stuff about gdt
XXX not appropriate; should be in Lecture 4
[l5.html | lecture notes]
Lecture 6. Interrupts and exceptions
How does a user program invoke the operating system kernel?
How does the kernel return to the user program?
What happens when a hardware device needs attention?
This lecture answers these questions:
interrupt and exception handling.
It explains the x86 trap setup mechanisms and then
examines their use in xv6's SETGATE (mmu.h),
tvinit (trap.c), idtinit (trap.c), vectors.pl, and vectors.S.
It then traces through a call to the system call open:
init.c, usys.S, vector48 and alltraps (vectors.S), trap (trap.c),
syscall (syscall.c),
sys_open (sysfile.c), fetcharg, fetchint, argint, argptr, argstr (syscall.c).
The interrupt controller, briefly:
pic_init and pic_enable (picirq.c).
The timer and keyboard, briefly:
timer_init (timer.c), console_init (console.c).
Enabling and disabling of interrupts.
Reading: x86 manual excerpts,
xv6: trapasm.S, trap.c, syscall.c, and usys.S.
Skim lapic.c, ioapic.c, picirq.c.
Homework: Explain the 35 words on the top of the
stack at first invocation of <code>syscall</code>.
[l-interrupt.html | lecture notes]
[x86-intr.html | homework]
Lecture 7. Multiprocessors and locking
This lecture introduces the problems of
coordination and synchronization on a
multiprocessor
and then the solution of mutual exclusion locks.
Atomic instructions, test-and-set locks,
lock granularity, (the mistake of) recursive locks.
Although xv6 user programs cannot share memory,
the xv6 kernel itself is a program with multiple threads
executing concurrently and sharing memory.
Illustration: the xv6 scheduler's proc_table_lock (proc.c)
and the spin lock implementation (spinlock.c).
Reading: xv6: spinlock.c. Skim mp.c.
Homework: Interaction between locking and interrupts.
Try not disabling interrupts in the disk driver and watch xv6 break.
[l-lock.html | lecture notes]
[xv6-lock.html | homework]
Lecture 8. Threads, processes and context switching
The last lecture introduced some of the issues
in writing threaded programs, using xv6's processes
as an example.
This lecture introduces the issues in implementing
threads, continuing to use xv6 as the example.
The lecture defines a thread of computation as a register
set and a stack. A process is an address space plus one
or more threads of computation sharing that address space.
Thus the xv6 kernel can be viewed as a single process
with many threads (each user process) executing concurrently.
Illustrations: thread switching (swtch.S), scheduler (proc.c), sys_fork (sysproc.c)
Reading: proc.c, swtch.S, sys_fork (sysproc.c)
Homework: trace through stack switching.
[l-threads.html | lecture notes (need to be updated to use swtch)]
[xv6-sched.html | homework]
Lecture 9. Processes and coordination
This lecture introduces the idea of sequence coordination
and then examines the particular solution illustrated by
sleep and wakeup (proc.c).
It introduces and refines a simple
producer/consumer queue to illustrate the
need for sleep and wakeup
and then the sleep and wakeup
implementations themselves.
Reading: proc.c, sys_exec, sys_sbrk, sys_wait, sys_exit, sys_kill (sysproc.c).
Homework: Explain how sleep and wakeup would break
without proc_table_lock. Explain how devices would break
without the second lock argument to sleep.
[l-coordination.html | lecture notes]
[xv6-sleep.html | homework]
Lecture 10. Files and disk I/O
This is the first of three file system lectures.
This lecture introduces the basic file system interface
and then considers the on-disk layout of individual files
and the free block bitmap.
Reading: iread, iwrite, fileread, filewrite, wdir, mknod1, and
code related to these calls in fs.c, bio.c, ide.c, and file.c.
Homework: Add a print statement to bwrite to trace every disk write.
Explain the disk writes caused by some simple shell commands.
[l-fs.html | lecture notes]
[xv6-disk.html | homework]
Lecture 11. Naming
The last lecture discussed on-disk file system representation.
This lecture covers the implementation of
file system paths (namei in fs.c)
and also discusses the security problems of a shared /tmp
and symbolic links.
Understanding exec (exec.c) is left as an exercise.
Reading: namei in fs.c, sysfile.c, file.c.
Homework: Explain how to implement symbolic links in xv6.
[l-name.html | lecture notes]
[xv6-names.html | homework]
Lecture 12. High-performance file systems
This lecture is the first of the research paper-based lectures.
It discusses the ``soft updates'' paper,
using xv6 as a concrete example.
* Feedback
If you are interested in using xv6 or have used xv6 in a course,
we would love to hear from you.
If there's anything that we can do to make xv6 easier
to adopt, we'd like to hear about it.
We'd also be interested to hear what worked well and what didn't.
Russ Cox (rsc@swtch.com)<br>
Frans Kaashoek (kaashoek@mit.edu)<br>
Robert Morris (rtm@mit.edu)
You can reach all of us at 6.828-staff@pdos.csail.mit.edu.

web/l-bugs.html Normal file

@@ -0,0 +1,187 @@
<title>OS Bugs</title>
<html>
<head>
</head>
<body>
<h1>OS Bugs</h1>
<p>Required reading: Bugs as deviant behavior
<h2>Overview</h2>
<p>Operating systems must obey many rules for correctness and
performance. Example rules:
<ul>
<li>Do not call blocking functions with interrupts disabled or spin
lock held
<li>Check for NULL results
<li>Do not allocate large stack variables
<li>Do not re-use already-allocated memory
<li>Check user pointers before using them in kernel mode
<li>Release acquired locks
</ul>
<p>In addition, there are standard software engineering rules, such as
using function results in consistent ways.
<p>These rules are typically not checked by a compiler, even though
a compiler could, in principle, check them. The goal of the
meta-level compilation project is to allow system implementors to
write system-specific compiler extensions that check the source code
for rule violations.
<p>The results are good: many new bugs (500-1000) were found in Linux
alone. Today's paper studies these bugs and attempts to draw
lessons from them.
<p>Are kernel errors worse than user-level errors? That is, if we get
the kernel correct, then we won't have system crashes?
<h2>Errors in JOS kernel</h2>
<p>What are unstated invariants in the JOS?
<ul>
<li>Interrupts are disabled in kernel mode
<li>Only env 1 has access to disk
<li>All registers are saved &amp; restored on context switch
<li>Application code is never executed with CPL 0
<li>Don't allocate an already-allocated physical page
<li>Propagate error messages to user applications (e.g., out of
resources)
<li>Map pipe before fd
<li>Unmap fd before pipe
<li>A spawned program should have open only file descriptors 0, 1, and 2.
<li>A given file system function is passed a size sometimes in bytes
and sometimes as a number of blocks.
<li>User pointers should be run through TRUP before being used by the kernel
</ul>
<p>Could these errors have been caught by metacompilation? Would
metacompilation have caught the pipe race condition? (Probably not;
it happens in only one place.)
<p>How confident are you that your code is correct? For example,
are you sure interrupts are always disabled in kernel mode? How would
you test?
<h2>Metacompilation</h2>
<p>A system programmer writes the rule checkers in a high-level,
state-machine language (metal). These checkers are dynamically linked
into an extensible version of g++, xg++. Xg++ applies the rule
checkers to every possible execution path of a function that is being
compiled.
<p>An example rule from
the <a
href="http://www.stanford.edu/~engler/exe-ccs-06.pdf">OSDI
paper</a>:
<pre>
sm check_interrupts {
    decl { unsigned } flags;
    pat enable = { sti(); } | { restore_flags(flags); } ;
    pat disable = { cli(); };
    is_enabled: disable ==> is_disabled
        | enable ==> { err("double enable"); }
        ;
    ...
</pre>
A more complete version found 82 errors in the Linux 2.3.99 kernel.
<p>Common mistake:
<pre>
get_free_buffer(...) {
    ....
    save_flags(flags);
    cli();
    if ((bh = sh->buffer_pool) == NULL)
        return NULL;    /* BUG: returns with interrupts still disabled */
    ....
}
</pre>
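<p>The checker's job on get_free_buffer is to follow each execution path
through the state machine. A minimal C sketch of that idea, assuming each
path has already been reduced to a list of interrupt-related events (the
<code>enum event</code> encoding and the function names are hypothetical, not
xg++'s representation). The rule above elides its remaining states with "...";
the <code>is_disabled</code> transitions below (double disable, and reaching the
end of a path with interrupts disabled) are our assumption of what they check.

```c
#include <stdio.h>

/* Hypothetical encoding of the interesting events on one execution path. */
enum event { EV_CLI, EV_STI, EV_RESTORE_FLAGS, EV_END_OF_PATH };
enum intr_state { IS_ENABLED, IS_DISABLED };

/* Runs one path through a check_interrupts-style state machine.
   Prints each error found and returns the number of errors. */
static int check_interrupts(const enum event *path, int n)
{
    enum intr_state s = IS_ENABLED;   /* first state is the initial state */
    int errors = 0;

    for (int i = 0; i < n; i++) {
        /* As in the metal rule, restore_flags counts as an enable. */
        int enable = path[i] == EV_STI || path[i] == EV_RESTORE_FLAGS;
        int disable = path[i] == EV_CLI;

        if (s == IS_ENABLED) {
            if (disable) {
                s = IS_DISABLED;
            } else if (enable) {
                printf("event %d: double enable\n", i);
                errors++;
            }
        } else {  /* IS_DISABLED (assumed transitions) */
            if (enable) {
                s = IS_ENABLED;
            } else if (disable) {
                printf("event %d: double disable\n", i);
                errors++;
            } else if (path[i] == EV_END_OF_PATH) {
                printf("event %d: exiting with interrupts disabled\n", i);
                errors++;
            }
        }
    }
    return errors;
}
```

<p>Feeding it the early-return path of get_free_buffer (<code>cli</code>, then
end of path) reports one error; the normal path (<code>cli</code>,
<code>restore_flags</code>, end of path) reports none. save_flags does not
change the state, so it is not an event here.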
<p>(Figure 2 also lists a simple metarule.)
<p>Some checkers produce false positives, because of limitations of
both static analysis and the checkers, which mostly use local
analysis.
<p>How does the <b>block</b> checker work? The first pass is a rule
that marks functions as potentially blocking. After processing a
function, the checker emits the function's flow graph to a file
(including annotations and the functions it calls). The second pass takes
the merged flow graph of all functions and produces a file listing
all functions that have a path in the control-flow graph to a blocking
function call. For the Linux kernel, this yields 3,000 functions
that could potentially call sleep. Yet another checker, like
check_interrupts, then checks whether a function calls any of these 3,000
functions with interrupts disabled, and so on.
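<p>The second pass is a reachability computation over the call graph. A
minimal C sketch, assuming a toy adjacency-matrix call graph rather than the
flow-graph files xg++ actually emits (<code>MAXF</code> and
<code>may_block</code> are hypothetical names). It starts from the functions the
first pass marked as blocking and repeatedly marks any caller of a marked
function until nothing changes.

```c
#include <stdbool.h>
#include <string.h>

#define MAXF 16   /* hypothetical limit on functions in this toy graph */

/* calls[i][j] is true if function i calls function j.
   blocks_directly[i] is true if the first pass marked function i as blocking.
   On return, result[i] is true if i has a call path to a blocking function. */
static void may_block(int n, bool calls[][MAXF],
                      const bool blocks_directly[], bool result[])
{
    memcpy(result, blocks_directly, n * sizeof(bool));

    bool changed = true;
    while (changed) {                 /* iterate to a fixed point */
        changed = false;
        for (int i = 0; i < n; i++) {
            if (result[i])
                continue;
            for (int j = 0; j < n; j++) {
                if (calls[i][j] && result[j]) {
                    result[i] = true; /* i reaches a blocking call through j */
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

<p>Iterating to a fixed point handles recursive call cycles without special
cases; a worklist over reverse call edges would do the same work faster on a
graph the size of Linux.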
<h2>This paper</h2>
<p>Writing rules is painful. First, you have to write them. Second,
how do you decide what to check? Was it easy to enumerate all
conventions for JOS?
<p>Insight: infer programmer "beliefs" from code and cross-check
for contradictions. If <i>cli</i> is always followed by <i>sti</i>,
except in one case, perhaps something is wrong. This simplifies
life because we can write generic checkers instead of checkers
that specifically check for <i>sti</i>, and perhaps we get lucky
and find other temporal ordering conventions.
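<p>A minimal C sketch of the cross-checking idea, assuming paths have already
been extracted as NULL-terminated arrays of call names (a hypothetical
representation; <code>check_pair</code> and <code>find_deviants</code> are
illustrative names). It counts the paths on which <i>cli</i> is later followed
by <i>sti</i> and records the paths on which it is not. It cannot say which
side is wrong, only which paths deviate.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `a` occurs in the NULL-terminated `path` and `b` occurs
   after it, 0 if `a` occurs with no later `b`, and -1 if `a` never occurs. */
static int check_pair(const char **path, const char *a, const char *b)
{
    int seen_a = 0;
    for (int i = 0; path[i] != NULL; i++) {
        if (!seen_a && strcmp(path[i], a) == 0)
            seen_a = 1;
        else if (seen_a && strcmp(path[i], b) == 0)
            return 1;
    }
    return seen_a ? 0 : -1;
}

/* Cross-checks the belief "a is followed by b" over all paths.
   Sets *obeying to the number of paths that follow it, stores the
   indices of deviating paths in deviants[], and returns their count. */
static int find_deviants(const char **paths[], int npaths,
                         const char *a, const char *b,
                         int *obeying, int deviants[])
{
    int ndev = 0;
    *obeying = 0;
    for (int p = 0; p < npaths; p++) {
        int r = check_pair(paths[p], a, b);
        if (r == 1)
            (*obeying)++;
        else if (r == 0)
            deviants[ndev++] = p;   /* a without a later b */
    }
    return ndev;
}
```

<p>Paths that never call <i>a</i> are neither evidence for nor against the
belief, so they are skipped rather than counted as deviants.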
<p>Do we know which case is wrong? The 999 times or the 1 time that
<i>sti</i> is absent? (No, this method cannot figure out which sequence is
correct, but it can flag that something is odd, which is useful in
practice.) The method just detects inconsistencies.
<p>Is every inconsistency an error? No, some inconsistencies don't
indicate an error. If a call to function <i>f</i> is often followed
by a call to function <i>g</i>, does that imply that f should always be
followed by g? (No!)
<p>Solution: MUST beliefs and MAYBE beliefs. MUST beliefs are
invariants that must hold; any inconsistency indicates an error. If a
pointer is dereferenced, then the programmer MUST believe that the
pointer points to something that can be dereferenced (i.e., the
pointer is definitely not zero). MUST beliefs can be checked by looking for
"internal inconsistencies".
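<p>A minimal C sketch of one internal-consistency check for the null-pointer
MUST belief, assuming each path has been reduced to a list of dereferences and
NULL comparisons (<code>struct ev</code> and <code>find_contradiction</code> are
hypothetical). A dereference implies "p is not NULL"; a later NULL check implies
"p may be NULL"; both cannot be right. The sketch ignores reassignment of the
pointer between the two events, which a real checker would have to track.

```c
#include <string.h>

/* Hypothetical event kinds for one path through a function. */
enum op { OP_DEREF, OP_NULL_CHECK };
struct ev {
    enum op op;
    const char *var;   /* name of the pointer involved */
};

/* Returns the index of the first NULL check on a pointer that the path
   has already dereferenced (a contradiction of MUST beliefs), or -1. */
static int find_contradiction(const struct ev *path, int n)
{
    for (int i = 0; i < n; i++) {
        if (path[i].op != OP_NULL_CHECK)
            continue;
        for (int j = 0; j < i; j++)
            if (path[j].op == OP_DEREF && strcmp(path[j].var, path[i].var) == 0)
                return i;   /* dereferenced earlier, yet checked for NULL now */
    }
    return -1;
}
```

<p>Checking before dereferencing is consistent and produces no report;
dereferencing and then checking is flagged no matter which of the two beliefs
turns out to be the wrong one.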
<p>As an aside, can null pointer dereferences be detected at runtime?
(Sure, unmap the page at address zero.) Why is metacompilation still
valuable? (At runtime you will find only the null pointers that your
test code dereferenced, not all possible dereferences of null
pointers.) An even more convincing example for metacompilation is
tracking user pointers that the kernel dereferences. (Is this a MUST
belief?)
<p>MAYBE beliefs are invariants that are suggested by the code, but
they may be coincidences. MAYBE beliefs are ranked by statistical
analysis, perhaps augmented with input about function names
(e.g., alloc and free are important). Is it computationally feasible
to check every MAYBE belief? Could there be much noise?
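<p>One way to rank MAYBE beliefs is a one-proportion z statistic. Let N be the
number of places a belief is checked, E the number of places where it holds, and
p<sub>0</sub> an assumed baseline probability. We believe the paper ranks
beliefs with a statistic of roughly this form; treat the formula and the choice
p<sub>0</sub> = 0.9 in the worked numbers as our assumptions.

```latex
z(N, E) = \frac{E/N - p_0}{\sqrt{p_0 (1 - p_0) / N}}

z(1000, 999) = \frac{0.999 - 0.9}{\sqrt{0.9 \cdot 0.1 / 1000}}
             \approx \frac{0.099}{0.00949} \approx 10.4

z(10, 10) = \frac{1.0 - 0.9}{\sqrt{0.9 \cdot 0.1 / 10}}
          \approx \frac{0.1}{0.0949} \approx 1.05
```

<p>A belief that holds 999 times out of 1000 (one deviation, a likely bug)
outranks one that holds 10 times out of 10, because it rests on far more
evidence. This also bears on the noise question: sorting by z pushes weakly
supported coincidences toward the bottom of the report.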
<p>What errors won't this approach catch?
<h2>Paper discussion</h2>
<p>This paper is best discussed by studying every code fragment. Most
code fragments are pieces of code from Linux distributions; these
mistakes are real!
<p>Section 3.1. what is the error? how does metacompilation catch
it?
<p>Figure 1. what is the error? is there one?
<p>Code fragments from 6.1. what is the error? how does metacompilation catch
it?
<p>Figure 3. what is the error? how does metacompilation catch
it?
<p>Section 8.3. what is the error? how does metacompilation catch
it?
</body>

web/l-coordination.html Normal file

@ -0,0 +1,354 @@
<title>L9</title>
<html>
<head>
</head>
<body>
<h1>Coordination and more processes</h1>
<p>Required reading: remainder of proc.c, sys_exec, sys_sbrk,
sys_wait, sys_exit, and sys_kill.
<h2>Overview</h2>
<p>Big picture: more programs than processors. How do we share the
limited number of processors among the programs? Last lecture
covered the basic mechanism: threads and the distinction between process
and thread. Today we expand on that: how to coordinate the interactions
between threads explicitly, and some operations on processes.
<p>Sequence coordination. This is a different type of coordination
than mutual-exclusion coordination (whose goal is to make actions
atomic so that threads don't interfere). The goal of
sequence coordination is for threads to coordinate the sequences in
which they run.
<p>For example, a thread may want to wait until another thread
terminates. One way to do so is to have the thread run periodically,
let it check if the other thread terminated, and if not give up the
processor again. This is wasteful, especially if there are many
threads.
<p>With primitives for sequence coordination one can do better. The
thread could tell the thread manager that it is waiting for an event
(e.g., another thread terminating). When the other thread
terminates, it explicitly wakes up the waiting thread. This is more
work for the programmer, but more efficient.
<p>Sequence coordination often interacts with mutual-exclusion
coordination, as we will see below.
<p>The operating system literature has a rich set of primitives for
sequence coordination. We study a very simple version of condition
variables in xv6: sleep and wakeup, with a single lock.
<h2>xv6 code examples</h2>
<h3>Sleep and wakeup - usage</h3>
Let's consider implementing a producer/consumer queue
(like a pipe) that can be used to hold a single non-null char pointer:
<pre>
struct pcq {
void *ptr;
};
void*
pcqread(struct pcq *q)
{
void *p;
while((p = q-&gt;ptr) == 0)
;
q-&gt;ptr = 0;
return p;
}
void
pcqwrite(struct pcq *q, void *p)
{
while(q-&gt;ptr != 0)
;
q-&gt;ptr = p;
}
</pre>
<p>Easy and correct, at least assuming there is at most one
reader and at most one writer at a time.
<p>Unfortunately, the while loops are inefficient.
Instead of polling, it would be great if there were
primitives saying &ldquo;wait for some event to happen&rdquo;
and &ldquo;this event happened&rdquo;.
That's what sleep and wakeup do.
<p>Second try:
<pre>
void*
pcqread(struct pcq *q)
{
void *p;
if(q-&gt;ptr == 0)
sleep(q);
p = q-&gt;ptr;
q-&gt;ptr = 0;
wakeup(q); /* wake pcqwrite */
return p;
}
void
pcqwrite(struct pcq *q, void *p)
{
if(q-&gt;ptr != 0)
sleep(q);
q-&gt;ptr = p;
wakeup(q); /* wake pcqread */
}
</pre>
That's better, but there is still a problem.
What if the wakeup happens between the check in the if
and the call to sleep?
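<p>The bad interleaving can be replayed deterministically. The sketch below is a hypothetical single-threaded model, not xv6 code: toy_sleep and toy_wakeup only record state the way this lecture's sleep and wakeup do, and the demo runs the reader's check, then the writer's store and wakeup, then the reader's sleep. The wakeup finds nobody asleep, so the reader ends up SLEEPING even though an item is waiting.

```c
enum pstate { RUNNABLE, SLEEPING };

struct toyproc {
    enum pstate state;
    void *chan;
};

/* Toy wakeup: mark every process sleeping on chan runnable. */
void
toy_wakeup(struct toyproc *procs, int n, void *chan)
{
    for(int i = 0; i < n; i++)
        if(procs[i].state == SLEEPING && procs[i].chan == chan)
            procs[i].state = RUNNABLE;
}

/* Toy sleep: record the channel and state only; a real sleep would
 * then call sched() and not return until woken. */
void
toy_sleep(struct toyproc *p, void *chan)
{
    p->chan = chan;
    p->state = SLEEPING;
}

/* Replays: reader checks q->ptr; writer stores and wakes; reader
 * sleeps. Returns the reader's final state. */
enum pstate
lost_wakeup_demo(void)
{
    static int item;
    void *ptr = 0;                  /* stands in for q->ptr */
    void *chan = &ptr;              /* stands in for the channel q */
    struct toyproc reader = { RUNNABLE, 0 };

    int empty = (ptr == 0);         /* reader: if(q->ptr == 0) is true */
    ptr = &item;                    /* writer: q->ptr = p; */
    toy_wakeup(&reader, 1, chan);   /* writer: wakeup(q), nobody asleep */
    if(empty)
        toy_sleep(&reader, chan);   /* reader: sleep(q), too late */
    return reader.state;            /* SLEEPING, though an item is ready */
}
```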
<p>Add locks:
<pre>
struct pcq {
void *ptr;
struct spinlock lock;
};
void*
pcqread(struct pcq *q)
{
void *p;
acquire(&amp;q-&gt;lock);
if(q-&gt;ptr == 0)
sleep(q, &amp;q-&gt;lock);
p = q-&gt;ptr;
q-&gt;ptr = 0;
wakeup(q); /* wake pcqwrite */
release(&amp;q-&gt;lock);
return p;
}
void
pcqwrite(struct pcq *q, void *p)
{
acquire(&amp;q-&gt;lock);
if(q-&gt;ptr != 0)
sleep(q, &amp;q-&gt;lock);
q-&gt;ptr = p;
wakeup(q); /* wake pcqread */
release(&amp;q-&gt;lock);
}
</pre>
This is okay, and now safer for multiple readers and writers,
except that wakeup wakes up every process that is asleep on chan,
not just one.
So some of the processes that wake up from sleep may find the queue
in the wrong state (for example, another reader already took the pointer).
We have to go back to looping:
<pre>
struct pcq {
void *ptr;
struct spinlock lock;
};
void*
pcqread(struct pcq *q)
{
void *p;
acquire(&amp;q-&gt;lock);
while(q-&gt;ptr == 0)
sleep(q, &amp;q-&gt;lock);
p = q-&gt;ptr;
q-&gt;ptr = 0;
wakeup(q); /* wake pcqwrite */
release(&amp;q-&gt;lock);
return p;
}
void
pcqwrite(struct pcq *q, void *p)
{
acquire(&amp;q-&gt;lock);
while(q-&gt;ptr != 0)
sleep(q, &amp;q-&gt;lock);
q-&gt;ptr = p;
wakeup(q); /* wake pcqread */
release(&amp;q-&gt;lock);
}
</pre>
The difference between this and our original is that
the body of the while loop is a much more efficient way to pause.
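<p>The same pattern exists in user space. As a hedged illustration (POSIX threads, not xv6), here is the final pcq with a pthread mutex standing in for q-&gt;lock, pthread_cond_wait standing in for sleep(q, &amp;q-&gt;lock), and pthread_cond_broadcast standing in for wakeup(q); pcqinit is a helper we add for the sketch. pthread_cond_wait atomically releases the mutex and blocks, which is the property the lock argument to sleep provides, and it may also return spuriously, one more reason for the while loop.

```c
#include <pthread.h>
#include <stddef.h>

struct pcq {
    void *ptr;
    pthread_mutex_t lock;   /* plays the role of q->lock */
    pthread_cond_t cond;    /* plays the role of the channel q */
};

void
pcqinit(struct pcq *q)
{
    q->ptr = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
}

void*
pcqread(struct pcq *q)
{
    void *p;
    pthread_mutex_lock(&q->lock);
    while(q->ptr == NULL)               /* re-check after every wakeup */
        pthread_cond_wait(&q->cond, &q->lock);
    p = q->ptr;
    q->ptr = NULL;
    pthread_cond_broadcast(&q->cond);   /* wake pcqwrite */
    pthread_mutex_unlock(&q->lock);
    return p;
}

void
pcqwrite(struct pcq *q, void *p)
{
    pthread_mutex_lock(&q->lock);
    while(q->ptr != NULL)
        pthread_cond_wait(&q->cond, &q->lock);
    q->ptr = p;
    pthread_cond_broadcast(&q->cond);   /* wake pcqread */
    pthread_mutex_unlock(&q->lock);
}
```

A single condition variable with broadcast mirrors xv6's wakeup, which wakes every sleeper on the channel; a pthreads program would more commonly use two condition variables (not empty, not full) and pthread_cond_signal.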
<p>Now we've figured out how to use it, but we
still need to figure out how to implement it.
<h3>Sleep and wakeup - implementation</h3>
<p>
Simple implementation:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
struct proc *p = curproc[cpu()];
release(lk);
p-&gt;chan = chan;
p-&gt;state = SLEEPING;
sched();
}
void
wakeup(void *chan)
{
for(each proc p) {
if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
p-&gt;state = RUNNABLE;
}
}
</pre>
<p>What's wrong? What if the wakeup runs right after
the release(lk) in sleep?
It finds no process sleeping on chan yet, so the wakeup is still lost.
<p>Move the lock down:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
struct proc *p = curproc[cpu()];
p-&gt;chan = chan;
p-&gt;state = SLEEPING;
release(lk);
sched();
}
void
wakeup(void *chan)
{
for(each proc p) {
if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
p-&gt;state = RUNNABLE;
}
}
</pre>
<p>This almost works. Recall from last lecture that we also need
to acquire the proc_table_lock before calling sched, to
protect p-&gt;jmpbuf.
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
struct proc *p = curproc[cpu()];
p-&gt;chan = chan;
p-&gt;state = SLEEPING;
acquire(&amp;proc_table_lock);
release(lk);
sched();
}
</pre>
<p>The problem is that now we're using lk to protect
access to the p-&gt;chan and p-&gt;state variables
but other routines besides sleep and wakeup
(in particular, proc_kill) will need to use them and won't
know which lock protects them.
So instead of protecting them with lk, let's use proc_table_lock:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
struct proc *p = curproc[cpu()];
acquire(&amp;proc_table_lock);
release(lk);
p-&gt;chan = chan;
p-&gt;state = SLEEPING;
sched();
}
void
wakeup(void *chan)
{
acquire(&amp;proc_table_lock);
for(each proc p) {
if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
p-&gt;state = RUNNABLE;
}
release(&amp;proc_table_lock);
}
</pre>
<p>One could probably make things work with lk as above,
but the relationship between data and locks would be
more complicated with no real benefit. Xv6 takes the easy way out
and says that elements in the proc structure are always protected
by proc_table_lock.
<h3>Use example: exit and wait</h3>
<p>If proc_wait decides there are children to be waited for,
it calls sleep at line 2462.
When a process exits, proc_exit scans the process table
to find the parent and wakes it at 2408.
<p>Which lock protects sleep and wakeup from missing each other?
Proc_table_lock. Have to tweak sleep again to avoid double-acquire:
<pre>
if(lk != &amp;proc_table_lock) {
acquire(&amp;proc_table_lock);
release(lk);
}
</pre>
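<p>The effect of the tweak can be checked with a toy model. In this hypothetical sketch the spinlock only records whether it is held, acquire fails an assertion on a double acquire (standing in for xv6's panic), and sleep_handoff is our name for just the lock hand-off at the top of sleep.

```c
#include <assert.h>

/* Toy spinlock: tracks only whether it is held. */
struct spinlock {
    int locked;
};

struct spinlock proc_table_lock;

void
acquire(struct spinlock *lk)
{
    assert(!lk->locked);    /* xv6 panics here: no recursive locks */
    lk->locked = 1;
}

void
release(struct spinlock *lk)
{
    assert(lk->locked);
    lk->locked = 0;
}

/* On return the caller holds proc_table_lock; lk has been released
 * unless lk is proc_table_lock itself. */
void
sleep_handoff(struct spinlock *lk)
{
    if(lk != &proc_table_lock){
        acquire(&proc_table_lock);
        release(lk);
    }
}
```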
<h3>New feature: kill</h3>
<p>Proc_kill marks a process as killed (line 2371).
When the process finally exits the kernel to user space,
or if a clock interrupt happens while it is in user space,
it will be destroyed (line 2886, 2890, 2912).
<p>Why wait until the process ends up in user space?
<p>What if the process is stuck in sleep? It might take a long
time to get back to user space.
Don't want to have to wait for it, so make sleep wake up early
(line 2373).
<p>This means all callers of sleep should check
whether they have been killed, but none do.
Bug in xv6.
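<p>A fix along the lines the notes suggest is for every sleep loop to re-check the killed flag each time around. This is a hypothetical sketch, not xv6 code: struct kproc and wait_for are made-up names, and sleep is replaced by a stub that returns at once, which is what a killed sleeper observes when sleep wakes it early.

```c
/* Toy process: only the field this sketch needs. */
struct kproc {
    int killed;
};

/* Stand-in for sleep(chan, lk): returns immediately, and the caller
 * re-checks its condition, as it must after any real wakeup. */
static void
toy_sleep(struct kproc *p)
{
    (void)p;
}

/* Waits until *ready becomes nonzero. Returns 0 once it does, or -1
 * as soon as the process notices it has been killed. Without the
 * killed test, a killed process waiting on a condition that never
 * becomes true would go back to sleep forever. */
int
wait_for(struct kproc *p, const int *ready)
{
    while(!*ready){
        if(p->killed)
            return -1;  /* a kernel version would also release its lock */
        toy_sleep(p);
    }
    return 0;
}
```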
<h3>System call handlers</h3>
<p>Sheet 32
<p>Fork: discussed copyproc in earlier lectures.
Sys_fork (line 3218) just calls copyproc
and marks the new proc runnable.
Does fork create a new process or a new thread?
Is there any shared context?
<p>Exec: we'll talk about exec later, when we talk about file systems.
<p>Sbrk: Saw growproc earlier. Why setupsegs before returning?

web/l-fs.html Normal file

@ -0,0 +1,222 @@
<title>L10</title>
<html>
<head>
</head>
<body>
<h1>File systems</h1>
<p>Required reading: iread, iwrite, and wdir, and code related to
these calls in fs.c, bio.c, ide.c, file.c, and sysfile.c
<h2>Overview</h2>
<p>The next 3 lectures are about file systems:
<ul>
<li>Basic file system implementation
<li>Naming
<li>Performance
</ul>
<p>Users want to store their data durably, so that it survives when
the user turns off the computer. The primary media for doing so are
magnetic disks, flash memory, and tapes. We focus on magnetic disks
(e.g., through the IDE interface in xv6).
<p>To allow users to remember where they stored a file, they can
assign a symbolic name to a file, which appears in a directory.
<p>The data in a file can be organized in a structured way or not.
The structured variant is often called a database. UNIX uses the
unstructured variant: files are streams of bytes. Any particular
structure is likely to be useful to only a small class of
applications, and other applications will have to work hard to fit
their data into one of the pre-defined structures. Besides, if you
want structure, you can easily write a user-mode library program that
imposes that format on any file. The end-to-end argument in action.
(Databases have special requirements and support an important class of
applications, and thus have a specialized plan.)
<p>The API for a minimal file system consists of: open, read, write,
seek, close, and stat. Dup duplicates a file descriptor. For example:
<pre>
char buf[512];
int fd;
fd = open("x", O_RDWR);
read(fd, buf, 100);
write(fd, buf, 512);
close(fd);
</pre>
<p>Maintaining the file offset behind the read/write interface is an
interesting design decision. The alternative is that the state of a
read operation should be maintained by the process doing the reading
(i.e., that the offset should be passed as an argument to read).
This argument is compelling in view of the UNIX fork() semantics,
which clones a process that shares the file descriptors of its
parent. A read by the parent of a shared file descriptor (e.g.,
stdin) changes the read offset seen by the child. On the other
hand, the alternative would make it difficult to get "(date; ls) > x"
right: date and ls write to x through the same shared descriptor, and
the shared offset is what makes ls's output land after date's instead
of overwriting it.
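<p>The shared offset is easy to observe on a POSIX system. The sketch below (the helper name and the file path passed to it are our own choices) opens a file, forks, and lets only the child read; afterwards the parent's offset has moved too, because both descriptors refer to the same open-file entry in the kernel.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after a forked child has read
 * 3 bytes from the inherited descriptor, or -1 on error. */
off_t
shared_offset_demo(const char *path)
{
    char buf[3];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if(fd < 0)
        return -1;
    if(write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0){
        close(fd);
        return -1;
    }
    pid_t pid = fork();
    if(pid < 0){
        close(fd);
        return -1;
    }
    if(pid == 0){                       /* child: read 3 bytes, exit */
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == 3 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    off_t off = lseek(fd, 0, SEEK_CUR); /* parent never read... */
    close(fd);
    unlink(path);
    return off;                         /* ...yet this is 3, not 0 */
}
```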
<p>The Unix API doesn't specify that the effects of a write must be on
the disk before the write returns. It is up to the implementation
of the file system, within certain bounds. Choices include (these
are not mutually exclusive):
<ul>
<li>At some point in the future, if the system stays up (e.g., after
30 seconds);
<li>Before the write returns;
<li>Before close returns;
<li>User specified (e.g., before fsync returns).
</ul>
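<p>For the last option, a program that needs its data on disk calls fsync. A minimal POSIX sketch (write_durably is a made-up helper name):

```c
#include <fcntl.h>
#include <unistd.h>

/* Writes buf to path and asks the kernel to push it to stable storage
 * before returning. Without fsync(), the data may still sit only in
 * the in-memory cache when write() returns. Returns 0 or -1. */
int
write_durably(const char *path, const void *buf, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if(fd < 0)
        return -1;
    ssize_t w = write(fd, buf, n);
    int ok = (w == (ssize_t)n) && fsync(fd) == 0;
    if(close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Strictly, a newly created file also needs an fsync of its directory before its name is durable; the sketch omits that step.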
<p>A design issue is the semantics of a file system operation that
requires multiple disk writes. In particular, what happens if the
logical update requires writing multiple disks blocks and the power
fails during the update? For example, creating a new file
requires allocating an inode (which requires updating the list of
free inodes on disk) and writing a directory entry to record the
allocated i-node under the name of the new file (which may require
allocating a new block and updating the directory inode). If the
power fails during the operation, the list of free inodes and blocks
may be inconsistent with the blocks and inodes in use. Again, it is
up to the implementation of the file system to keep on-disk data
structures consistent:
<ul>
<li>Don't worry about it much, but use a recovery program to bring
file system back into a consistent state.
<li>Journaling file system. Never let the file system get into an
inconsistent state.
</ul>
<p>Another design issue is the semantics of concurrent writes to
the same data item. What is the order of two updates that happen at
the same time? For example, two processes open the same file and write
to it. Modern Unix operating systems allow the application to lock a
file to get exclusive access. If file locking is not used and if the
file descriptor is shared, then the bytes of the two writes will get
into the file in some order (this happens often for log files). If
the file descriptor is not shared, the end result is not defined. For
example, one write may overwrite the other one (e.g., if they are
writing to the same part of the file.)
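<p>One way an application can lock a file on a POSIX system is an fcntl record lock. This sketch (locked_append is a hypothetical helper) takes an exclusive lock on the whole file around each append. The lock is advisory: it only orders writers that also take it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Appends len bytes under an exclusive whole-file lock, so that
 * cooperating writers never interleave their records. */
int
locked_append(int fd, const void *buf, size_t len)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                       /* 0 means "to end of file" */
    if(fcntl(fd, F_SETLKW, &fl) < 0)    /* block until we own it */
        return -1;
    int ok = lseek(fd, 0, SEEK_END) >= 0 &&
             write(fd, buf, len) == (ssize_t)len;
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    return ok ? 0 : -1;
}
```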
<p>An implementation issue is performance, because writing to magnetic
disk is relatively expensive compared to computing. Three primary ways
to improve performance are: careful file system layout that induces
few seeks, an in-memory cache of frequently-accessed blocks, and
overlap I/O with computation so that file operations don't have to
wait until their completion and so that the disk driver has more
data to write, which allows disk scheduling. (We will talk about
performance in detail later.)
<h2>xv6 code examples</h2>
<p>xv6 implements a minimal Unix file system interface. xv6 doesn't
pay attention to file system layout. It overlaps computation and I/O,
but doesn't do any disk scheduling. Its cache is write-through, which
simplifies keeping on-disk data structures consistent, but is bad for
performance.
<p>On-disk files are represented by an inode (struct dinode in fs.h)
and blocks. Small files have up to 12 block addresses in their inode;
large files use the last address in the inode as the disk address
of a block that holds 128 more disk addresses (512/4). The size of a file is
thus limited to 12*512 + 128*512 = 71,680 bytes. What would you change to
support larger files? (Ans: e.g., double indirect blocks.)
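<p>The limit is simple arithmetic. The constants below mirror the description above (their names follow xv6's fs.h, but treat them as assumptions of this sketch); the second function shows what one extra double-indirect address would add.

```c
#define BSIZE     512                              /* bytes per block */
#define NDIRECT   12                               /* direct addresses */
#define NINDIRECT (BSIZE / sizeof(unsigned int))   /* 512/4 = 128 */

/* Largest file with 12 direct blocks plus one indirect block. */
unsigned long
maxfile_bytes(void)
{
    return (NDIRECT + NINDIRECT) * (unsigned long)BSIZE;
}

/* Same, plus one double-indirect block: it points to 128 indirect
 * blocks of 128 addresses each. */
unsigned long
maxfile_bytes_double(void)
{
    return maxfile_bytes() + (unsigned long)NINDIRECT * NINDIRECT * BSIZE;
}
```

maxfile_bytes() is 140 blocks, or 71,680 bytes; the double-indirect block adds 16,384 blocks (8 MB).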
<p>Directories are files with a bit of structure to them. The file
consists of records of type struct dirent. The entry contains the
name for a file (or directory) and its corresponding inode number.
How many files can appear in a directory?
<p>In memory files are represented by struct inode in fsvar.h. What is
the role of the additional fields in struct inode?
<p>What is xv6's disk layout? How does xv6 keep track of free blocks
and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this
layout a good one for performance? What are other options?
<p>Let's assume that an application created a file x that
contains 512 bytes, and that the application now calls read(fd, buf,
100), that is, it is requesting to read 100 bytes into buf.
Furthermore, let's assume that the inode for x is i. Let's pick
up what happens by investigating readi(), line 4483.
<ul>
<li>4488-4492: can iread be called on other objects than files? (Yes.
For example, read from the keyboard.) Everything is a file in Unix.
<li>4495: what does bmap do?
<ul>
<li>4384: what block is being read?
</ul>
<li>4483: what does bread do? does bread always cause a read to disk?
<ul>
<li>4006: what does bget do? it implements a simple cache of
recently-read disk blocks.
<ul>
<li>How big is the cache? (see param.h)
<li>3972: look if the requested block is in the cache by walking down
a circular list.
<li>3977: we had a match.
<li>3979: some other process has "locked" the block, wait until it
releases it. The other process releases the block using brelse().
Why lock a block?
<ul>
<li>Atomic read and update. For example, allocating an inode: read
block containing inode, mark it allocated, and write it back. This
operation must be atomic.
</ul>
<li>3982: it is ours now.
<li>3987: it is not in the cache; we need to find a cache entry to
hold the block.
<li>3987: what is the cache replacement strategy? (see also brelse())
<li>3988: found an entry that we are going to use.
<li>3989: mark it ours but don't mark it valid (there is no valid data
in the entry yet).
</ul>
<li>4007: if the block was in the cache and the entry has the block's
data, return.
<li>4010: if the block wasn't in the cache, read it from disk. Are
reads synchronous or asynchronous?
<ul>
<li>3836: a bounded buffer of outstanding disk requests.
<li>3809: tell the disk to move arm and generate an interrupt.
<li>3851: go to sleep and let some other process run. Time sharing
in action.
<li>3792: interrupt: arm is in the right position; wakeup requester.
<li>3856: read block from disk.
<li>3860: remove request from bounded buffer. wakeup processes that
are waiting for a slot.
<li>3864: start next disk request, if any. xv6 can overlap I/O with
computation.
</ul>
<li>4011: mark the cache entry as holding the data.
</ul>
<li>4498: To where is the block copied? is dst a valid user address?
</ul>
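<p>The cache logic walked through above can be summarized in a single-threaded toy. This is a simplified sketch, not xv6's code: it ignores the device number, locking, and sleeping (it returns 0 where bget would have to wait for a busy buffer), and it assumes least-recently-used recycling, with brelse moving a buffer to the front of the list.

```c
#define NBUF    3       /* tiny cache so eviction is easy to see */
#define B_BUSY  0x1     /* handed out to some caller */
#define B_VALID 0x2     /* holds the block's data */

struct buf {
    int flags;
    unsigned int sector;
    struct buf *prev, *next;    /* head.next is most recently used */
    unsigned char data[512];
};

static struct buf bufs[NBUF];
static struct buf head;

void
binit(void)
{
    head.prev = head.next = &head;
    for(int i = 0; i < NBUF; i++){
        bufs[i].flags = 0;
        bufs[i].next = head.next;
        bufs[i].prev = &head;
        head.next->prev = &bufs[i];
        head.next = &bufs[i];
    }
}

/* Returns the buffer for sector, marked busy, or 0 if it would block. */
struct buf*
bget(unsigned int sector)
{
    struct buf *b;
    for(b = head.next; b != &head; b = b->next)     /* cached? */
        if(b->sector == sector && (b->flags & (B_BUSY|B_VALID))){
            if(b->flags & B_BUSY)
                return 0;           /* real bget sleeps until brelse */
            b->flags |= B_BUSY;
            return b;
        }
    for(b = head.prev; b != &head; b = b->prev)     /* recycle LRU */
        if((b->flags & B_BUSY) == 0){
            b->sector = sector;
            b->flags = B_BUSY;      /* ours, but no valid data yet */
            return b;
        }
    return 0;                       /* every buffer is busy */
}

/* Unmark b and move it to the front (most recently used). */
void
brelse(struct buf *b)
{
    b->next->prev = b->prev;
    b->prev->next = b->next;
    b->next = head.next;
    b->prev = &head;
    head.next->prev = b;
    head.next = b;
    b->flags &= ~B_BUSY;
}
```

A caller playing the role of bread sets B_VALID after filling data from disk.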
<p>Now let's suppose that the process is writing 512 bytes at the end
of the file x. How many disk writes will happen?
<ul>
<li>4567: allocate a new block
<ul>
<li>4518: allocate a block: scan block map, and write entry
<li>4523: How many disk operations if the process would have been appending
to a large file? (Answer: read indirect block, scan block map, write
block map.)
</ul>
<li>4572: read the block that the process will be writing, in case the
process writes only part of the block.
<li>4574: write it. is it synchronous or asynchronous? (Ans:
synchronous but with timesharing.)
</ul>
<p>Lots of code to implement reading and writing of files. How about
directories?
<ul>
<li>4722: look for the directory, reading directory block and see if a
directory entry is unused (inum == 0).
<li>4729: use it and update it.
<li>4735: write the modified block.
</ul>
<p>Reading and writing of directories is trivial.
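<p>A directory block is just an array of fixed-size records, so both lookup and adding an entry are linear scans. The sketch below assumes an xv6-like struct dirent (a 2-byte inode number plus a 14-byte name, so 32 entries per 512-byte block); the helper names are hypothetical.

```c
#include <string.h>

#define DIRSIZ 14
#define BSIZE  512

/* Name is NUL-padded, not necessarily NUL-terminated; inum 0 = free. */
struct dirent {
    unsigned short inum;
    char name[DIRSIZ];
};

#define NDIRENT (BSIZE / (int)sizeof(struct dirent))

/* Returns the inode number for name in this block, or 0 if absent. */
unsigned short
dirblock_lookup(const struct dirent *de, const char *name)
{
    for(int i = 0; i < NDIRENT; i++)
        if(de[i].inum != 0 && strncmp(de[i].name, name, DIRSIZ) == 0)
            return de[i].inum;
    return 0;
}

/* Puts (name, inum) in the first free slot. Returns the slot index, or
 * -1 if the block is full and the directory must grow by a block. */
int
dirblock_add(struct dirent *de, const char *name, unsigned short inum)
{
    for(int i = 0; i < NDIRENT; i++){
        if(de[i].inum == 0){
            strncpy(de[i].name, name, DIRSIZ);  /* pads with NULs */
            de[i].inum = inum;
            return i;       /* caller must now write the block back */
        }
    }
    return -1;
}
```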
</body>

web/l-interrupt.html Normal file

@ -0,0 +1,174 @@
<html>
<head><title>Lecture 6: Interrupts &amp; Exceptions</title></head>
<body>
<h1>Interrupts &amp; Exceptions</h1>
<p>
Required reading: xv6 <code>trapasm.S</code>, <code>trap.c</code>, <code>syscall.c</code>, <code>usys.S</code>.
<br>
You will need to consult
<a href="../readings/ia32/IA32-3.pdf">IA32 System
Programming Guide</a> chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).
<h2>Overview</h2>
<p>
Big picture: kernel is trusted third-party that runs the machine.
Only the kernel can execute privileged instructions (e.g.,
changing MMU state).
The processor enforces this protection through the ring bits
in the code segment.
If a user application needs to carry out a privileged operation
or other kernel-only service,
it must ask the kernel nicely.
How can a user program change to the kernel address space?
How can the kernel transfer to a user address space?
What happens when a device attached to the computer
needs attention?
These are the topics for today's lecture.
<p>
There are three kinds of events that must be handled
by the kernel, not user programs:
(1) a system call invoked by a user program,
(2) an illegal instruction or other kind of bad processor state (memory fault, etc.),
and
(3) an interrupt from a hardware device.
<p>
Although these three events are different, they all use the same
mechanism to transfer control to the kernel.
This mechanism consists of three steps that execute as one atomic unit:
(a) change the processor to kernel mode;
(b) save the old processor state somewhere (usually the kernel stack);
and (c) change the processor state to the values set up as
the &ldquo;official kernel entry values.&rdquo;
The exact implementation of this mechanism differs
from processor to processor, but the idea is the same.
<p>
We'll work through examples of these today in lecture.
You'll see all three in great detail in the labs as well.
<p>
A note on terminology: sometimes we'll
use interrupt (or trap) to mean both interrupts and exceptions.
<h2>
Setting up traps on the x86
</h2>
<p>
See handout Table 5-1, Figure 5-1, Figure 5-2.
<p>
xv6 Sheet 07: <code>struct gatedesc</code> and <code>SETGATE</code>.
<p>
xv6 Sheet 28: <code>tvinit</code> and <code>idtinit</code>.
Note setting of gate for <code>T_SYSCALL</code>
<p>
xv6 Sheet 29: <code>vectors.pl</code> (also see generated <code>vectors.S</code>).
<h2>
System calls
</h2>
<p>
xv6 Sheet 16: <code>init.c</code> calls <code>open("console")</code>.
How is that implemented?
<p>
xv6 <code>usys.S</code> (not in book).
(No saving of registers. Why?)
<p>
Breakpoint <code>0x1b:"open"</code>,
step past <code>int</code> instruction into kernel.
<p>
See handout Figure 9-4 [sic].
<p>
xv6 Sheet 28: in <code>vectors.S</code> briefly, then in <code>alltraps</code>.
Step through to <code>call trap</code>, examine registers and stack.
How will the kernel find the argument to <code>open</code>?
<p>
xv6 Sheet 29: <code>trap</code>, on to <code>syscall</code>.
<p>
xv6 Sheet 31: <code>syscall</code> looks at <code>eax</code>,
calls <code>sys_open</code>.
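<p>The dispatch step is a table lookup. The sketch below is hypothetical (the numbers and handlers are placeholders, not xv6's syscall.h): the number from %eax indexes an array of function pointers, and because it comes from user space it must be range-checked first.

```c
#define SYS_getpid 1
#define SYS_open   2

static int sys_getpid(void) { return 42; }  /* placeholder results */
static int sys_open(void)   { return 3; }

static int (*syscalls[])(void) = {
    [SYS_getpid] = sys_getpid,
    [SYS_open]   = sys_open,
};

#define NSYSCALLS ((int)(sizeof syscalls / sizeof syscalls[0]))

/* Dispatch on the untrusted number the user left in %eax. */
int
syscall_dispatch(int num)
{
    if(num > 0 && num < NSYSCALLS && syscalls[num])
        return syscalls[num]();
    return -1;      /* unknown system call */
}
```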
<p>
(Briefly)
xv6 Sheet 52: <code>sys_open</code> uses <code>argstr</code> and <code>argint</code>
to get its arguments. How do they work?
<p>
xv6 Sheet 30: <code>fetchint</code>, <code>fetcharg</code>, <code>argint</code>,
<code>argptr</code>, <code>argstr</code>.
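<p>The argument helpers boil down to bounds-checked loads from the user's stack. This sketch models user memory as a byte array and assumes, as on 32-bit x86, the layout left by a user-side call into a stub like those in usys.S: at trap time esp points at the return address, so argument n is at esp + 4 + 4n. The struct and its fields are hypothetical.

```c
#include <string.h>

/* Toy process: user memory is [0, sz) of mem; esp is the saved
 * user stack pointer from the trap frame. */
struct uproc {
    unsigned char *mem;
    unsigned int sz;
    unsigned int esp;
};

/* Fetch the int at user address addr, refusing any address whose
 * bytes would fall outside the process's memory. */
int
fetchint(struct uproc *p, unsigned int addr, int *ip)
{
    if(addr >= p->sz || addr + sizeof(int) > p->sz)
        return -1;
    memcpy(ip, p->mem + addr, sizeof *ip);
    return 0;
}

/* The n'th argument sits above the return address pushed by the call. */
int
argint(struct uproc *p, int n, int *ip)
{
    return fetchint(p, p->esp + 4 + 4*n, ip);
}
```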
<p>
What happens if a user program divides by zero
or accesses unmapped memory?
Exception. Same path as system call until <code>trap</code>.
<p>
What happens if kernel divides by zero or accesses unmapped memory?
<h2>
Interrupts
</h2>
<p>
Like system calls, except:
devices generate them at any time,
there are no arguments in CPU registers,
nothing to return to,
usually can't ignore them.
<p>
How do they get generated?
Device essentially phones up the
interrupt controller and asks to talk to the CPU.
Interrupt controller then buzzes the CPU and
tells it, &ldquo;keyboard on line 1.&rdquo;
Interrupt controller is essentially the CPU's
<strike>secretary</strike> administrative assistant,
managing the phone lines on the CPU's behalf.
<p>
Have to set up interrupt controller.
<p>
(Briefly) xv6 Sheet 63: <code>pic_init</code> sets up the interrupt controller,
<code>irq_enable</code> tells the interrupt controller to let the given
interrupt through.
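<p>The bookkeeping behind irq_enable is a bit mask. In this sketch (an assumption, modeled loosely on xv6's picirq.c) a 1 bit masks an IRQ, the low byte would go to the master PIC's data port (0x21) and the high byte to the slave's (0xA1), and IRQ 2, the line the slave cascades through, starts unmasked. The port writes themselves are left out.

```c
#define IRQ_SLAVE 2

static unsigned short irqmask = 0xFFFF & ~(1 << IRQ_SLAVE);

/* Clear the mask bit so the controller lets irq through. */
void
irq_enable_mask(int irq)
{
    irqmask &= ~(1 << irq);
}

/* Bytes that would be written to the master and slave data ports. */
unsigned char master_mask(void) { return irqmask & 0xFF; }
unsigned char slave_mask(void)  { return (irqmask >> 8) & 0xFF; }
```

Enabling IRQ 0 (timer) and IRQ 1 (keyboard) leaves the master mask at 0xF8 and the slave mask at 0xFF.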
<p>
(Briefly) xv6 Sheet 68: <code>pit8253_init</code> sets up the clock chip,
telling it to interrupt on <code>IRQ_TIMER</code> 100 times/second.
<code>console_init</code> sets up the keyboard, enabling <code>IRQ_KBD</code>.
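<p>The 100 Hz setting comes down to one division. The 8253 counts down from a 16-bit divisor driven by an input clock of about 1.193182 MHz, so the divisor for hz interrupts per second is about TIMER_FREQ / hz. The rounding helper below is a sketch, not a copy of xv6's macro.

```c
#define TIMER_FREQ 1193182  /* 8253 input clock, in Hz */

/* Divisor for hz ticks per second, rounded to nearest. hz must be
 * large enough (at least 19) for the result to fit in 16 bits. */
unsigned int
timer_divisor(unsigned int hz)
{
    return (TIMER_FREQ + hz / 2) / hz;
}
```

At 100 Hz the divisor is 11932, so a tick arrives roughly every 10 ms.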
<p>
In Bochs, set breakpoint at 0x8:"vector0"
and continue, loading kernel.
Step through clock interrupt, look at
stack, registers.
<p>
Was the processor executing in kernel or user mode
at the time of the clock interrupt?
Why? (Have any user-space instructions executed at all?)
<p>
Can the kernel get an interrupt at any time?
Why or why not? <code>cli</code> and <code>sti</code>,
<code>irq_enable</code>.
</body>
</html>

web/l-lock.html Normal file

@ -0,0 +1,322 @@
<title>L7</title>
<html>
<head>
</head>
<body>
<h1>Locking</h1>
<p>Required reading: spinlock.c
<h2>Why coordinate?</h2>
<p>Mutual-exclusion coordination is an important topic in operating
systems, because many operating systems run on
multiprocessors. Coordination techniques protect variables that are
shared among multiple threads and updated concurrently. These
techniques allow programmers to implement atomic sections so that one
thread can safely update the shared variables without having to worry
that another thread will intervene. For example, processes in xv6 may
run concurrently on different processors and in kernel-mode share
kernel data structures. We must ensure that these updates happen
correctly.
<p>List and insert example:
<pre>
#include &lt;stdlib.h&gt;
struct List {
int data;
struct List *next;
};
struct List *list = 0;
void insert(int data) {
struct List *l = malloc(sizeof *l);
l->data = data;
l->next = list; // A
list = l; // B
}
</pre>
<p>What needs to be atomic? The two statements labeled A and B should
always be executed together, as an indivisible fragment of code. If
two processors execute A and B interleaved, then we end up with an
incorrect list. To see that this is the case, draw out the list after
the sequence A1 (statement A executed by processor 1), A2 (statement A
executed by processor 2), B2, and B1.
<p>How could this erroneous sequence happen? The variable <i>list</i>
lives in physical memory shared among multiple processors, connected
by a bus. The accesses to the shared memory will be ordered in some
total order by the bus/memory system. If the programmer doesn't
coordinate the execution of the statements A and B, any order can
happen, including the erroneous one.
<p>The erroneous case is called a race condition. The problem with
races is that they are difficult to reproduce. For example, if you
put print statements in to debug the incorrect behavior, you might
change the timing and the race might not happen anymore.
<h2>Atomic instructions</h2>
<p>The programmer must be able to express that A and B should be executed
as a single atomic action. We generally use a concept like locks
to mark an atomic region, acquiring the lock at the beginning of the
section and releasing it at the end:
<pre>
void acquire(int *lock) {
while (TSL(lock) != 0) ;
}
void release (int *lock) {
*lock = 0;
}
</pre>
<p>Acquire and release, of course, need to be atomic too, which can,
for example, be done with a hardware atomic TSL (test-and-set-lock)
instruction.
<p>The semantics of TSL are:
<pre>
R <- [mem] // load content of mem into register R
[mem] <- 1 // store 1 in mem.
</pre>
<p>In a hardware implementation, the bus arbiter guarantees that both
the load and store are executed without any other load/stores coming
in between.
<p>We can use locks to implement an atomic insert, or we can use
TSL directly:
<pre>
int insert_lock = 0;
void insert(int data) {
/* acquire the lock: */
while(TSL(&amp;insert_lock) != 0)
;
/* critical section: */
struct List *l = malloc(sizeof *l);
l->data = data;
l->next = list;
list = l;
/* release the lock: */
insert_lock = 0;
}
</pre>
<p>It is the programmer's job to make sure that locks are respected. If
a programmer writes another function that manipulates the list, the
programmer must make sure that the new function acquires and
releases the appropriate locks. If the programmer doesn't, race
conditions occur.
<p>This code assumes that stores commit to memory in program order and
that all stores by other processors started before insert got the lock
are observable by this processor. That is, after the other processor
released a lock, all the previous stores are committed to memory. If
a processor executes instructions out of order, this assumption won't
hold and we must, for example, use a barrier instruction that makes the
assumption true.
<h2>Example: Locking on x86</h2>
<p>Here is one way we can implement acquire and release using the x86
xchgl instruction:
<pre>
struct Lock {
unsigned int locked;
};
static inline unsigned int
TSL(volatile unsigned int *addr)
{
unsigned int content = 1;
// xchgl content, *addr
// xchgl exchanges the values of its two operands, while
// locking the memory bus to exclude other operations.
asm volatile ("xchgl %0,%1" :
"+r" (content),
"+m" (*addr) :
:
"memory");
return content;
}
void
acquire(struct Lock *lck)
{
while(TSL(&amp;lck->locked) != 0)
;
}
void
release(struct Lock *lck)
{
// compiler barrier: keep critical-section stores before the unlock
asm volatile ("" : : : "memory");
lck->locked = 0;
}
</pre>
<p>the instruction "XCHG %eax, (content)" works as follows:
<ol>
<li> freeze other CPUs' memory activity
<li> temp := content
<li> content := %eax
<li> %eax := temp
<li> un-freeze other CPUs
</ol>
<p>Steps 1 and 5 make XCHG special: it is "locked", using special signal
lines on the inter-CPU bus and bus arbitration.
<p>This implementation doesn't scale to a large number of processors;
in a later lecture we will see how we could do better.
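<p>Today the same lock can be written portably with C11 atomics. In this sketch atomic_exchange plays the role of TSL, and the acquire and release memory orders supply the ordering guarantees discussed earlier; spin_demo and worker are hypothetical test helpers that hammer a shared counter from several POSIX threads.

```c
#include <pthread.h>
#include <stdatomic.h>

struct Lock {
    atomic_uint locked;
};

void
acquire(struct Lock *lck)
{
    while(atomic_exchange_explicit(&lck->locked, 1,
                                   memory_order_acquire) != 0)
        ;   /* spin */
}

void
release(struct Lock *lck)
{
    atomic_store_explicit(&lck->locked, 0, memory_order_release);
}

static struct Lock counter_lock;
static long counter;

static void*
worker(void *arg)
{
    int iters = *(int *)arg;
    for(int i = 0; i < iters; i++){
        acquire(&counter_lock);
        counter++;              /* critical section */
        release(&counter_lock);
    }
    return 0;
}

/* Runs up to 8 threads that each increment counter iters times. */
long
spin_demo(int nthreads, int iters)
{
    pthread_t t[8];
    int created = 0;
    if(nthreads > 8)
        nthreads = 8;
    counter = 0;
    atomic_store(&counter_lock.locked, 0);
    for(; created < nthreads; created++)
        if(pthread_create(&t[created], 0, worker, &iters) != 0)
            break;
    for(int i = 0; i < created; i++)
        pthread_join(t[i], 0);
    return counter;
}
```

Without the lock, the increments from different threads would race and the final count would usually come out short.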
<h2>Lock granularity</h2>
<p>Release/acquire is ideal for short atomic sections: increment a
counter, search in i-node cache, allocate a free buffer.
<p>What are spin locks not so great for? Long atomic sections may
waste waiters' CPU time, and it is a bad idea to sleep while holding a spin lock. In
xv6 we try to avoid long atomic sections by carefully coding (can
you find an example?). xv6 doesn't release the processor when
holding a lock, but has an additional set of coordination primitives
(sleep and wakeup), which we will study later.
<p>My list_lock protects all lists; inserts to different lists are
blocked. A lock per list would waste less time spinning so you might
want "fine-grained" locks, one for every object BUT acquire/release
are expensive (500 cycles on my 3 GHz machine) because they need to
talk off-chip.
<p>Also, "correctness" is not that simple with fine-grained locks if you
need to maintain global invariants; e.g., "every buffer must be on
exactly one of free list and device list". Per-list locks are
irrelevant for this invariant. So you might want "large-grained",
which reduces overhead but reduces concurrency.
<p>This tension is hard to get right. One often starts out with
"large-grained locks" and measures the performance of the system on
some workloads. When more concurrency is desired (to get better
performance), an implementor may switch to a more fine-grained
scheme. Operating system designers fiddle with this all the time.
<h2>Recursive locks and modularity</h2>
<p>When designing a system we desire clean abstractions and good
modularity. We would like a caller not to have to know how a callee
implements a particular function. Locks make achieving modularity
more complicated. For example, what to do when the caller holds a
lock, then calls a function, which also needs the lock to perform
its job.
<p>There are no transparent solutions that allow the caller and callee
to be unaware of which locks they use. One transparent, but
unsatisfactory option is recursive locks: If a callee asks for a
lock that its caller has, then we allow the callee to proceed.
Unfortunately, this solution is not ideal either.
<p>Consider the following. If lock x protects the internals of some
struct foo, then if the caller acquires lock x, it knows that the
internals of foo are in a sane state and it can fiddle with them.
And then the caller must restore them to a sane state before releasing
lock x, but until then anything goes.
<p>This assumption doesn't hold with recursive locking. After
acquiring lock x, the acquirer knows that either it is the first to
get this lock, in which case the internals are in a sane state, or
maybe some caller holds the lock and has messed up the internals and
didn't realize when calling the callee that it was going to try to
look at them too. So the fact that a function acquired the lock x
doesn't guarantee anything at all. In short, locks protect against
callers and callees just as much as they protect against other
threads.
<p>Since transparent solutions aren't ideal, it is better to consider
locks part of the function specification. The programmer must
arrange that a caller doesn't invoke another function while holding
a lock that the callee also needs.
<h2>Locking in xv6</h2>
<p>xv6 runs on a multiprocessor and is programmed to allow multiple
threads of computation to run concurrently. In xv6 an interrupt might
run on one processor and a process in kernel mode may run on another
processor, sharing a kernel data structure with the interrupt routine.
xv6 uses locks, implemented using an atomic instruction, to coordinate
concurrent activities.
<p>Let's check out why xv6 needs locks by following what happens when
we start a second processor:
<ul>
<li>1516: mp_init (called from main0)
<li>1606: mp_startthem (called from main0)
<li>1302: mpmain
<li>2208: scheduler.
<br>Now we have several processors invoking the scheduler
function. xv6 better ensure that multiple processors don't run the
same process! does it?
<br>Yes, if multiple schedulers run concurrently, only one will
acquire proc_table_lock and proceed looking for a runnable
process. If it finds a process, it marks it running and longjmps to
it, and the process will release proc_table_lock. The next instance
of scheduler will skip this entry, because it is marked running, and
look for another runnable process.
</ul>
<p>Why hold proc_table_lock during a context switch? It protects
p->state; the process has to hold some lock to avoid a race with
wakeup() and yield(), as we will see in the next lectures.
<p>Why not a lock per proc entry? It might be expensive in whole-table
scans (in wait, wakeup, scheduler). proc_table_lock also
protects some larger invariants, for example it might be hard to get
proc_wait() right with just per entry locks. Right now the check to
see if there are any exited children and the sleep are atomic -- but
that would be hard with per entry locks. One could have both, but
that would probably be neither clean nor fast.
<p>Of course, there is only one processor searching the proc table if
acquire is implemented correctly. Let's check out acquire in
spinlock.c:
<ul>
<li>1807: no recursive locks!
<li>1811: why disable interrupts on the current processor? (If
interrupt code itself tries to take a held lock, xv6 will deadlock;
the panic on 1808 will fire.)
<ul>
<li>can a process on a processor hold multiple locks?
</ul>
<li>1814: the (hopefully) atomic instruction.
<ul>
<li>see sheet 4, line 0468.
</ul>
<li>1819: make sure that stores issued on other processors before we
got the lock are observed by this processor. These may be stores to
the shared data structure that is protected by the lock.
</ul>
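<p>A user-space analogue of acquire and release, as a sketch assuming C11 &lt;stdatomic.h&gt; and POSIX threads (this is not spinlock.c): atomic_exchange stands in for the atomic xchg instruction at 1814, and acquire/release memory ordering stands in for the step at 1819, making the previous holder's stores visible to the new holder. It leaves out what only a kernel can do (disabling interrupts) and the recursion check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct spinlock {
    atomic_int locked;   /* 0 = free, 1 = held */
};

static void acquire(struct spinlock *lk)
{
    /* Atomically swap in 1; an old value of 1 means someone else holds
     * the lock, so spin. memory_order_acquire: stores made by the previous
     * holder before its release are visible once we own the lock. */
    while (atomic_exchange_explicit(&lk->locked, 1, memory_order_acquire) != 0)
        ;
}

static void release(struct spinlock *lk)
{
    /* memory_order_release: our stores to the protected data become
     * visible before other processors can see the lock as free. */
    atomic_store_explicit(&lk->locked, 0, memory_order_release);
}

/* Demo: threads increment a plain (non-atomic) counter under the lock. */
#define NTHREADS 4
#define NITERS   100000

static struct spinlock counter_lock;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        acquire(&counter_lock);
        counter++;               /* critical section */
        release(&counter_lock);
    }
    return NULL;
}

static long run_counter_demo(void)
{
    pthread_t t[NTHREADS];
    atomic_init(&counter_lock.locked, 0);
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Without the lock, concurrent counter++ operations can be lost; with it, the final count equals NTHREADS * NITERS.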
<p>
<h2>Locking in JOS</h2>
<p>JOS is meant to run on single-CPU machines, so the plan can be
simple: disable and enable interrupts in the
kernel (the IF flag in the EFLAGS register). Thus, in the kernel,
threads release the processor only when they want to and can ensure
that they don't release the processor during a critical section.
<p>In user mode, JOS runs with interrupts enabled, but Unix user
applications don't share data structures. The data structures that
must be protected, however, are the ones shared in the library
operating system (e.g., pipes). In JOS we will use special-case
solutions, as you will find out in lab 6. For example, to implement
pipes we will assume there is one reader and one writer. The reader
and writer never update each other's variables; they only read each
other's variables. By programming carefully according to this rule, we
can avoid races.
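<p>The one-reader/one-writer rule can be illustrated with a ring buffer. This is a sketch under stated assumptions, not the lab 6 code: the names and buffer size are hypothetical, and it uses C11 atomics so that the ordering is also correct on a multiprocessor. The writer only updates wpos, the reader only updates rpos, and each only reads the other's variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PIPEBUF 32   /* hypothetical buffer size (a power of two) */

struct pipe {
    unsigned char buf[PIPEBUF];
    atomic_size_t rpos;   /* written only by the reader */
    atomic_size_t wpos;   /* written only by the writer */
};

static void pipe_init(struct pipe *p)
{
    atomic_init(&p->rpos, 0);
    atomic_init(&p->wpos, 0);
}

/* Writer side: returns 1 if the byte was stored, 0 if the pipe is full. */
static int pipe_putc(struct pipe *p, unsigned char c)
{
    size_t w = atomic_load_explicit(&p->wpos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&p->rpos, memory_order_acquire);
    if (w - r == PIPEBUF)
        return 0;                         /* full */
    p->buf[w % PIPEBUF] = c;
    /* Publish the byte before advancing wpos. */
    atomic_store_explicit(&p->wpos, w + 1, memory_order_release);
    return 1;
}

/* Reader side: returns the byte (0..255), or -1 if the pipe is empty. */
static int pipe_getc(struct pipe *p)
{
    size_t r = atomic_load_explicit(&p->rpos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&p->wpos, memory_order_acquire);
    if (r == w)
        return -1;                        /* empty */
    unsigned char c = p->buf[r % PIPEBUF];
    /* Free the slot only after the byte has been read. */
    atomic_store_explicit(&p->rpos, r + 1, memory_order_release);
    return c;
}
```

Because neither side ever writes the other's position, no lock is needed; on a single CPU, ordinary variables and the same rule would suffice.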

262
web/l-mkernel.html Normal file

@ -0,0 +1,262 @@
<title>Microkernel lecture</title>
<html>
<head>
</head>
<body>
<h1>Microkernels</h1>
<p>Required reading: Improving IPC by kernel design
<h2>Overview</h2>
<p>This lecture looks at the microkernel organization. In a
microkernel, services that a monolithic kernel implements in the
kernel are running as user-level programs. For example, the file
system, UNIX process management, pager, and network protocols each run
in a separate user-level address space. The microkernel itself
supports only the services that are necessary to allow system services
to run well in user space; a typical microkernel has at least support
for creating address spaces, threads, and interprocess communication.
<p>The potential advantages of a microkernel are simplicity of the
kernel (small), isolation of operating system components (each runs in
its own user-level address space), and flexibility (we can have a file
server and a database server). One potential disadvantage is
performance loss, because what requires a single system call in a
monolithic kernel may require multiple system calls and context
switches in a microkernel.
<p>One way in which microkernels differ from each other is the exact
kernel API they implement. For example, Mach (a system developed at
CMU, which influenced a number of commercial operating systems) has
the following system calls: processes (create, terminate, suspend,
resume, priority, assign, info, threads), threads (fork, exit, join,
detach, yield, self), ports and messages (a port is a unidirectional
communication channel with a message queue and supporting primitives
to send, destroy, etc), and regions/memory objects (allocate,
deallocate, map, copy, inherit, read, write).
<p>Some microkernels are more "microkernel" than others. For example,
some microkernels implement the pager in user space but the basic
virtual memory abstractions in the kernel (e.g., Mach); others are
more extreme, and implement most of the virtual memory in user space
(L4). Yet others are less extreme: many servers run in their own
address space, but in kernel mode (Chorus).
<p>All microkernels support multiple threads per address space. xv6
and, until recently, Unix didn't; why? Because in Unix, system services
are typically implemented in the kernel, and those are the primary
programs that need multiple threads to handle events concurrently
(e.g., waiting for the disk and processing new I/O requests). In microkernels,
these services are implemented in user-level address spaces, so
they need a mechanism for handling operations concurrently.
(Of course, one can argue that if fork is efficient enough, there is no need
for threads.)
<h2>L3/L4</h2>
<p>L3 is a predecessor to L4. L3 provides data persistence, DOS
emulation, and the ELAN runtime system. L4 is a reimplementation of L3,
but without the data persistence. L4KA is a project at
sourceforge.net, and you can download the code for the latest
incarnation of L4 from there.
<p>L4 is a "second-generation" microkernel, with 7 calls: IPC (of
which there are several types), id_nearest (find a thread with an ID
close to the given ID), fpage_unmap (unmap pages; mapping is done as a
side-effect of IPC), thread_switch (hand processor to specified
thread), lthread_ex_regs (manipulate thread registers),
thread_schedule (set scheduling policies), task_new (create a new
address space with some default number of threads). These calls
provide address spaces, tasks, threads, interprocess communication,
and unique identifiers. An address space is a set of mappings.
Multiple threads may share mappings, and a thread may grant mappings to
another thread (through IPC). A task is the set of threads sharing an
address space.
<p>A thread is the execution abstraction; it belongs to an address
space and has a UID, a register set, a page fault handler, and an exception
handler. The UID of a thread is its task number plus the number of the
thread within that task.
<p>IPC passes data by value or by reference to another address space.
It also provides sequence coordination. It is used for
communication between clients and servers, to pass interrupts to a
user-level exception handler, and to pass page faults to an external
pager. In L4, device drivers are implemented as user-level
processes with the device mapped into their address space.
Linux runs as a user-level process.
<p>L4 provides quite a range of message types: inline-by-value,
strings, and virtual memory mappings. The send and receive descriptors
specify how many of each, if any.
<p>In addition, there is a system call for timeouts and controlling
thread scheduling.
<h2>L3/L4 paper discussion</h2>
<ul>
<li>This paper is about performance. What is a microsecond? Is 100
usec bad? Is 5 usec so much better that we care? How many instructions
does a 50-MHz x86 execute in 100 usec? What can we compute with that
number of instructions? How many disk operations in that time? How
many interrupts can we take? (The livelock paper, which we cover in a
few lectures, mentions 5,000 network packets per second, and each packet
generates two interrupts.)
<li>In performance calculations, what is the appropriate/better metric?
Microseconds or cycles?
<li>Goal: improve IPC performance by a factor of 10 by careful kernel
design that is fully aware of the hardware it is running on.
Principle: performance rules! Optimize for the common case. Because
in L3 interrupts are propagated to user-level using IPC, the system
may have to be able to support many IPCs per second (as many as the
device can generate interrupts).
<li>IPC consists of transferring control and transferring data. The
minimal cost for transferring control is 127 cycles, plus 45 cycles for
TLB misses (see table 3). What are the x86 instructions to enter and
leave the kernel? (int, iret) Why do they consume so much time?
(They flush the pipeline.) Do modern processors perform these operations
more efficiently? No, it is worse now: faster processors are optimized for
straight-line code; traps and exceptions flush a deeper pipeline, and cache
misses cost more cycles.
<li>What are the 5 TLB misses? 1) B's thread control block; loading %cr3
flushes the TLB, so 2) kernel text causes a miss; iret accesses both 3) the
stack and 4+5) user text, two pages (B's user code looks at the message).
<li>Interface:
<ul>
<li>call (threadID, send-message, receive-message, timeout);
<li>reply_and_receive (reply-message, receive-message, timeout);
</ul>
<li>Optimizations:
<ul>
<li>New system call: reply_and_receive. Effect: 2 system calls per
RPC.
<li>Complex messages: direct string, indirect strings, and memory
objects.
<li>Direct transfer by temporary mapping through a communication
window. The communication window is mapped in B's address space and in
A's kernel address space; why is this better than just mapping a page
shared between A's and B's address spaces? 1) Multi-level security: it
makes it hard to reason about information flow; 2) the receiver can't
check message legality (the message might change after the check); 3) when
a server has many clients, it could run out of virtual address space, and
the shared memory region must be established ahead of time; 4) it is not
application friendly, since data may already be at another address, i.e.,
applications would have to copy anyway, possibly making more copies.
<li>Why not use the following approach: map the region copy-on-write
(or read-only) in A's address space after send and read-only in B's
address space? Now B may have to copy data or cannot receive data in
its final destination.
<li>On the x86 this is implemented by copying B's PDEs into A's address space.
Why two PDEs? (The maximum message size is 4 MB, so it is guaranteed to work
if the message starts in the bottom 4 MB of an 8 MB mapped
region.) Why not just copy PTEs? That would be much more expensive.
<li> What does it mean for the TLB to be "window clean"? Why do we
care? It means the TLB contains no mappings within the communication window.
We care because mapping is cheap (copy a PDE), but invalidation is not; the
x86 only lets you invalidate one page at a time, or the whole TLB. Does TLB
invalidation of the communication window turn out to be a problem? Not
usually, because the kernel has to load %cr3 during IPC anyway.
<li>The thread control block holds registers, links to various doubly linked
lists, pgdir, uid, etc. The lower part of the thread UID contains the TCB
number. The kernel can also deduce the TCB address from the stack by ANDing
SP with a bitmask (the SP comes out of the TSS when just switching to kernel).
<li> The kernel stack is on the same page as the TCB. Why? 1) It minimizes TLB
misses (since accessing the kernel stack will bring in the TCB); 2) it allows
very efficient access to the TCB: just mask off the lower 12 bits of %esp;
3) with VM, the kernel can use the lower 32 bits of the thread ID to indicate
which TCB; using one page per TCB means there is no need to check if the thread
is swapped out (the kernel can simply not map that TCB if it shouldn't be accessed).
<li>Invariant on queues: queues always hold in-memory TCBs.
<li>Wakeup queue: a set of 8 unordered wakeup lists (wakeup time mod 8),
and a smart representation of time so that 32-bit integers can be used
in the common case (base + offset in msec; bump the base and recompute all
offsets every ~4 hours; the maximum timeout is ~24 days, 2^31 msec).
<li>What is the problem addressed by lazy scheduling?
Conventional approach to scheduling:
<pre>
A sends message to B:
Move A from ready queue to waiting queue
Move B from waiting queue to ready queue
This requires 58 cycles, including 4 TLB misses. What are the TLB misses?
One each for head of ready and waiting queues
One each for previous queue element during the remove
</pre>
<li> Lazy scheduling:
<pre>
Ready queue must contain all ready threads except current one
Might contain other threads that aren't actually ready, though
Each wakeup queue contains all threads waiting in that queue
Again, might contain other threads, too
Scheduler removes inappropriate queue entries when scanning
queue
</pre>
<li>Why does this help performance? There are only three situations in which a
thread gives up the CPU but stays ready: the send syscall (as opposed to
call), preemption, and hardware interrupts. So very often the kernel can IPC
into a thread without putting it on the ready list.
<li>Direct process switch. This section just says you should use
kernel threads instead of continuations.
<li>Short messages via registers.
<li>Avoiding unnecessary copies. Basically, a thread can send and receive
messages with the same vector. This makes forwarding efficient, which is
important for Clans/Chiefs model.
<li>Segment register optimization. Loading segment registers is
slow: the processor has to access the GDT, etc. But the common case is that
users don't change their segment registers. Observation: it is faster to check
a segment register than to load it. So just check that the segment
registers are okay, and only load them if user code changed them.
<li>Registers for parameter passing wherever possible: system calls
and IPC.
<li>Minimizing TLB misses. Try to cram as many things as possible onto
the same page: IPC kernel code, GDT, IDT, TSS, all on the same page. The
whole tables may not actually fit, but the important parts of the tables can
go on the same page (maybe the beginning of the TSS, IDT, or GDT only?)
<li>Coding tricks: short offsets, avoid jumps, avoid checks, pack
often-used data on same cache lines, lazily save/restore CPU state
like debug and FPU registers. Much of the kernel is written in
assembly!
<li>What are the results? Figures 7 and 8 look good.
<li>Is fast IPC enough to get good overall system performance? This
paper doesn't make a statement either way; we have to read their 1997
paper to find the answer to that question.
<li>Is the principle of optimizing for performance right? In general,
it is wrong to optimize for performance; other things matter more. Is
IPC the one exception? Maybe, perhaps not. Was Liedtke fighting a
losing battle against CPU makers? Should fast IPC be a hardware issue,
or just an OS issue?
</ul>
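<p>The lazy-scheduling invariant can be illustrated with a toy ready queue. Everything here (the array-based queue, the names, the two states) is a hypothetical sketch, not L4's data structures: the IPC fast path only flips state fields, and the scheduler drops entries that are no longer ready when it scans.

```c
#include <assert.h>
#include <stddef.h>

enum tstate { READY, WAITING };

struct thread {
    int id;
    enum tstate state;
};

#define QMAX 16

/* Ready queue: contains every ready thread except the current one,
 * but may also contain threads that are no longer ready. */
struct queue {
    struct thread *t[QMAX];
    int n;
};

static void enqueue(struct queue *q, struct thread *th)
{
    assert(q->n < QMAX);
    q->t[q->n++] = th;
}

/* IPC "call" from the current thread a to b: a blocks waiting for the
 * reply and b runs next. Only state fields change; no queue operations. */
static struct thread *lazy_ipc(struct thread *a, struct thread *b)
{
    a->state = WAITING;
    b->state = READY;
    return b;               /* direct switch to b */
}

/* Pop entries from the head until a ready one is found; stale
 * (non-ready) entries are simply dropped. NULL if nothing is ready. */
static struct thread *schedule(struct queue *q)
{
    while (q->n > 0) {
        struct thread *th = q->t[0];
        for (int j = 0; j + 1 < q->n; j++)
            q->t[j] = q->t[j + 1];
        q->n--;
        if (th->state == READY)
            return th;
    }
    return NULL;
}
```

In this sketch the cost of fixing up the queue is paid only when the scheduler actually scans it, which the notes argue is rare compared with IPCs.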
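<p>Putting the kernel stack and the TCB on one page makes finding the current TCB a single AND, as the notes say. A minimal sketch, assuming 4 KB pages (so the low 12 bits are masked) and a hypothetical layout with the TCB at the bottom of a page-aligned block and the stack growing down from the top; a uintptr_t stands in for %esp.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define PGSIZE 4096u   /* assumed page size: mask off the low 12 bits */

/* Hypothetical, heavily trimmed TCB. */
struct tcb {
    int uid;
    uintptr_t saved_sp;
};

/* One page per thread: the TCB sits at the bottom and the kernel stack
 * grows down from the top of the same page. */
union tcb_page {
    struct tcb tcb;
    unsigned char bytes[PGSIZE];
};

static alignas(PGSIZE) union tcb_page demo_page;

/* Map any address inside a thread's page, e.g. its current kernel
 * stack pointer, to that thread's TCB with a single AND. */
static struct tcb *tcb_from_sp(uintptr_t sp)
{
    return (struct tcb *)(sp & ~(uintptr_t)(PGSIZE - 1));
}
```

The same layout is also why a stack access tends to bring the TCB's page into the TLB.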
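<p>Why two PDEs suffice: on the x86 with 4 KB pages, one page-directory entry maps a 4 MB region (address bits 31..22 select the PDE). The hypothetical helper below counts how many PDEs a buffer touches. A message of at most 4 MB that starts anywhere touches at most two, so copying two of B's PDEs into an 8 MB window is enough.

```c
#include <assert.h>
#include <stdint.h>

#define PDXSHIFT 22              /* 4 MB mapped per page-directory entry */
#define MAXMSG   (4u << 20)      /* 4 MB maximum message size */

/* Page-directory index of a 32-bit linear address. */
static uint32_t pdx(uint32_t va)
{
    return va >> PDXSHIFT;
}

/* Number of PDEs covering [va, va + len); assumes len > 0 and that
 * va + len does not wrap past 4 GB. */
static uint32_t pdes_spanned(uint32_t va, uint32_t len)
{
    return pdx(va + len - 1) - pdx(va) + 1;
}
```

An aligned 4 MB message needs one PDE; shifting it by a single byte makes it straddle a 4 MB boundary and need two, but never three.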
</body>

181
web/l-name.html Normal file

@ -0,0 +1,181 @@
<title>L11</title>
<html>
<head>
</head>
<body>
<h1>Naming in file systems</h1>
<p>Required reading: namei(), and all other file system code.
<h2>Overview</h2>
<p>To help users remember where they stored their data, most
systems allow users to assign their own names to their data.
Typically the data is organized in files and users assign names to
files. To deal with many files, users can organize their files in
directories, in a hierarchical manner. Each name is a pathname, with
the components separated by "/".
<p>So that users don't have to type long absolute names (i.e., names
starting with "/" in Unix), users can change their working directory
and use relative names (i.e., names that don't start with "/").
<p>User file namespace operations include create, mkdir, mv, ln
(link), unlink, and chdir. (How is "mv a b" implemented in xv6?
Answer: "link a b"; "unlink a".) To be able to name the current
directory and the parent directory every directory includes two
entries "." and "..". Files and directories can be reclaimed if users
cannot name them anymore (i.e., after the last unlink).
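<p>The answer to the mv question can be written out with the POSIX link and unlink calls. This is a sketch of the idea only: a real mv also handles directories, existing targets, and moves across file systems (where link fails with EXDEV). The hypothetical nlinks helper reads st_nlink, the reference count that link increments and unlink decrements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* "mv from to" as link + unlink: add a second name, then drop the first.
 * Not atomic: between the two calls the file has both names.
 * Returns 0 on success, -1 on failure. */
static int mv(const char *from, const char *to)
{
    if (link(from, to) < 0) {
        perror("link");
        return -1;
    }
    if (unlink(from) < 0) {
        perror("unlink");
        return -1;
    }
    return 0;
}

/* Number of directory entries naming the file, or -1 if path is absent. */
static long nlinks(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}
```

Modern systems provide rename, which performs the same change atomically.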
<p>Recall from the last lecture that all directory entries contain a name,
followed by an inode number. The inode number names an inode of the
file system. How can we merge file systems from different disks into
a single name space?
<p>A user grafts new file systems on a name space using mount. Umount
removes a file system from the name space. (In DOS, a file system is
named by its device letter.) Mount takes the root inode of the
to-be-mounted file system and grafts it on the inode of the name space
entry where the file system is mounted (e.g., /mnt/disk1). The
in-memory inode of /mnt/disk1 records the major and minor number of
the file system mounted on it. When namei sees an inode on which a
file system is mounted, it looks up the root inode of the mounted file
system, and proceeds with that inode.
<p>Mount is not a durable operation; it doesn't survive power failures.
After a power failure, the system administrator must remount the file
system (i.e., often in a startup script that is run from init).
<p>Links are convenient, because with them users can create synonyms for
file names. But they create the potential of introducing cycles in
the naming tree. For example, consider link("a", "a/b/c"), which
makes c a synonym for a. This cycle can complicate matters; for
example:
<ul>
<li>If a user subsequently calls unlink ("a"), then the user cannot
name the directory "b" and the link "c" anymore, but how can the
file system decide that?
</ul>
<p>This problem can be solved by detecting cycles. The second problem
can be solved by computing which files are reachable from "/" and
reclaiming all the ones that aren't reachable. Unix takes a simpler
approach: avoid cycles by disallowing users to create links to
directories. If there are no cycles, then reference counts can be
used to see if a file is still referenced. In the inode, maintain a
field for counting references (nlink in xv6's dinode). link
increases the reference count, and unlink decreases the count; if
the count reaches zero, the inode and disk blocks can be reclaimed.
<p>How can we link across file systems (i.e., from one
mounted file system to another)? Since inode numbers are not unique across
file systems, we cannot create a hard link across file systems; the
directory entry only contains an inode number, not the inode number
and the name of the disk on which the inode is located. To handle
this case, Unix provides a second type of link, called
soft (symbolic) links.
<p>Soft links are a special file type (e.g., T_SYMLINK). If namei
encounters an inode of type T_SYMLINK, it resolves the name in
the symlink file to an inode, and continues from there. With
symlinks one can create cycles and they can point to non-existing
files.
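<p>Both properties of symbolic links can be observed on a POSIX system. This sketch assumes that open fails with ELOOP when resolution follows too many symlinks (the exact limit is system-specific) and with ENOENT for a dangling link, while lstat, which does not follow the final link, still finds the link itself. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to open path for reading; return 0 on success, else errno. */
static int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* 1 if the directory entry itself exists (a final symlink is not followed). */
static int entry_exists(const char *path)
{
    struct stat st;
    return lstat(path, &st) == 0;
}
```

A cycle a -> b -> a makes open fail with ELOOP, and a link to a missing name makes it fail with ENOENT even though the link entry exists.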
<p>The design of the name system can have security implications. For
example, if you test whether a name exists and then use the name,
an adversary can change the binding from name to object between the
test and the use. Such problems are called time-of-check-to-time-of-use
(TOCTTOU) races.
<p>An example of TOCTTOU is as follows. Let's say root runs a script
every night to remove files in /tmp. This gets rid of the files
that editors might have left behind but that will never be used again. An
adversary can exploit this script as follows:
<pre>
Root                              Attacker
                                  mkdir ("/tmp/etc");
                                  creat ("/tmp/etc/passwd");
readdir ("/tmp");
lstat ("/tmp/etc");
readdir ("/tmp/etc");
                                  rename ("/tmp/etc", "/tmp/x");
                                  symlink ("/etc", "/tmp/etc");
unlink ("/tmp/etc/passwd");
</pre>
Lstat checks that /tmp/etc is not a symbolic link, but by the time root
runs unlink, the attacker has had time to create a symbolic link in the
place of /tmp/etc pointing to a directory of the adversary's choice, so
root deletes that directory's password file (here /etc/passwd).
<p>This problem could have been avoided if every user or process group
had its own private /tmp, or if access to the shared one was
mediated.
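<p>Another mitigation is to bind the check and the use to one object by working relative to a directory file descriptor. This sketch assumes the POSIX openat-family calls with O_DIRECTORY and O_NOFOLLOW, which modern Unix systems have but V6 and xv6 do not; the helper names are hypothetical. Once the script has opened /tmp/etc without following a symlink, renaming that directory or replacing its name with a link to /etc no longer changes which directory the unlink operates on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a directory for fd-relative operations; fails (instead of
 * following) if the last component of path is a symbolic link. */
static int open_dir_nofollow(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
}

/* Remove `name` inside the directory that dfd refers to. The lookup
 * starts at that directory object, not at its (changeable) path name. */
static int remove_at(int dfd, const char *name)
{
    return unlinkat(dfd, name, 0);
}
```

In the test, a scratch directory "victim" plays the role of /etc, and the attacker's rename and symlink happen between the check (open) and the use (unlinkat).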
<h2>V6 code examples</h2>
<p> namei (sheet 46) is the core of the Unix naming system. namei can
be called in several ways: NAMEI_LOOKUP (resolve a name to an inode
and lock inode), NAMEI_CREATE (resolve a name, but lock parent
inode), and NAMEI_DELETE (resolve a name, lock parent inode, and
return offset in the directory). The reason namei is
complicated is that we want to atomically test whether a name exists and
then remove or create it; otherwise, two concurrent processes
could interfere with each other and the directory could end up in an
inconsistent state.
<p>Let's trace open("a", O_RDWR), focusing on namei:
<ul>
<li>5263: we will look at creating a file in a bit.
<li>5277: call namei with NAMEI_LOOKUP
<li>4629: if the path name starts with "/", look up the root inode (1).
<li>4632: otherwise, use the inode for the current working directory.
<li>4638: skip a run of "/" characters, for example in "/////a////b"
<li>4641: if we are done with NAMEI_LOOKUP, return inode (e.g.,
namei("/")).
<li>4652: if the inode we are searching for a name isn't of type
directory, give up.
<li>4657-4661: determine length of the current component of the
pathname we are resolving.
<li>4663-4681: scan the directory for the component.
<li>4682-4696: the entry wasn't found. If we are at the end of the
pathname and NAMEI_CREATE is set, lock the parent directory and return a
pointer to the start of the component. In all other cases, unlock the
inode of the directory, and return 0.
<li>4701: if NAMEI_DELETE is set, return locked parent inode and the
offset of the to-be-deleted component in the directory.
<li>4707: lookup inode of the component, and go to the top of the loop.
</ul>
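<p>The per-component loop traced above (skip a run of "/", measure the component, look it up) can be isolated into a small helper. This is a hypothetical sketch in the spirit of namei's parsing, not the xv6 source; DIRSIZ = 14 is an assumption matching the classic Unix directory-entry name length.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14   /* assumed maximum name length per directory entry */

/* Copy the next path component of path into name (NUL-terminated,
 * truncated to DIRSIZ characters) and return a pointer to the rest of
 * the path with leading slashes skipped. Returns NULL if no component
 * remains. name must have room for DIRSIZ + 1 bytes. */
static const char *next_component(const char *path, char *name)
{
    while (*path == '/')                   /* consume a run of "/" */
        path++;
    if (*path == '\0')
        return NULL;                       /* e.g. namei("/"): nothing left */
    const char *start = path;
    while (*path != '/' && *path != '\0')
        path++;
    size_t len = (size_t)(path - start);   /* length of this component */
    if (len > DIRSIZ)
        len = DIRSIZ;
    memcpy(name, start, len);
    name[len] = '\0';
    while (*path == '/')
        path++;
    return path;
}
```

A lookup loop would call this repeatedly, searching the current directory for each returned name until it returns NULL.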
<p>Now let's look at creating a file in a directory:
<ul>
<li>5264: if the last component doesn't exist, but first part of the
pathname resolved to a directory, then dp will be 0, last will point
to the beginning of the last component, and ip will be the locked
parent directory.
<li>5266: create an entry for last in the directory.
<li>4772: mknod1 allocates a new named inode and adds it to an
existing directory.
<li>4776: ialloc: scan the inode blocks, find an unused entry, and write
it. (If lucky, 1 read and 1 write.)
<li>4784: fill out the inode entry, and write it. (another write)
<li>4786: write the entry into the directory (if lucky, 1 write)
</ul>
<p>Why must the parent directory be locked? If two processes try to
create the same name in the same directory, only one should succeed
and the other should receive an error (file exists).
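<p>From user space, the same guarantee is visible through open with O_CREAT | O_EXCL on a POSIX system: the test for an existing name and the creation happen as one step inside the kernel, so of two creators exactly one succeeds and the other gets EEXIST. A small sketch; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create path only if it does not exist yet.
 * Returns 0 if we created it, EEXIST if it already existed,
 * or another errno value on other failures. */
static int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```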
<p>Link, unlink, chdir, mount, umount could have taken file
descriptors instead of their path argument. In fact, this would get
rid of some possible race conditions (some of which have security
implications, TOCTTOU). However, this would require that the current
working directory be remembered by the process, and UNIX didn't have
good ways of maintaining static state shared among all processes
belonging to a given user. The easiest way to create shared state
is to place it in the kernel.
<p>We have one piece of code in xv6 that we haven't studied: exec.
With all the ground work we have done this code can be easily
understood (see sheet 54).
</body>

249
web/l-okws.txt Normal file

@ -0,0 +1,249 @@
Security
-------------------
I. 2 Intro Examples
II. Security Overview
III. Server Security: Offense + Defense
IV. Unix Security + POLP
V. Example: OKWS
VI. How to Build a Website
I. Intro Examples
--------------------
1. Apache + OpenSSL 0.9.6a (CAN 2002-0656)
- SSL = More security!
unsigned int j;
p=(unsigned char *)s->init_buf->data;
j= *(p++);
s->session->session_id_length=j;
memcpy(s->session->session_id,p,j);
- the result: an Apache worm
2. SparkNotes.com 2000:
  - New profile feature that displays "public" information about users,
    but a bug made e-mail addresses "public" by default.
- New program for getting that data:
http://www.sparknotes.com/getprofile.cgi?id=1343
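A bounds-checked version of the OpenSSL snippet in example 1, as a sketch: the struct, the 32-byte buffer size, and the function name are hypothetical stand-ins, not OpenSSL's code. The point is that the length byte j comes from the network, so it must be checked against both the destination size and the amount of input actually received before the memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SESSION_ID 32      /* assumed size of the destination buffer */

struct session {
    unsigned int session_id_length;
    unsigned char session_id[MAX_SESSION_ID];
};

/* Parse a length-prefixed session ID from buf (buflen bytes).
 * Returns 0 on success, -1 if the input is malformed. */
static int read_session_id(struct session *s, const unsigned char *buf,
                           size_t buflen)
{
    if (buflen < 1)
        return -1;
    unsigned int j = buf[0];       /* attacker-controlled: 0..255 */
    if (j > MAX_SESSION_ID)        /* the check the original lacks */
        return -1;
    if (buflen - 1 < j)            /* don't read past the input either */
        return -1;
    memcpy(s->session_id, buf + 1, j);
    s->session_id_length = j;
    return 0;
}
```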
II. Security Overview
----------------------
What Is Security?
- Protecting your system from attack.
What's an attack?
- Stealing data
- Corrupting data
- Controlling resources
- DOS
Why attack?
- Money
- Blackmail / extortion
- Vendetta
- intellectual curiosity
- fame
Security is a big topic
- Server security -- today's focus. There's some machine sitting on the
Internet somewhere, with a certain interface exposed, and attackers
want to circumvent it.
- Why should you trust your software?
- Client security
- Clients are usually servers, so they have many of the same issues.
- Slight simplification: people across the network cannot typically
initiate connections.
- Has a "fallible operator":
- Spyware
- Drive-by-Downloads
- Client security turns out to be much harder -- GUI considerations,
look inside the browser and the applications.
- Systems community can more easily handle server security.
- We think mainly of servers.
III. Server Security: Offense and Defense
-----------------------------------------
- Show picture of a Web site.
Attacks | Defense
----------------------------------------------------------------------------
1. Break into DB from net | 1. FW it off
2. Break into WS on telnet | 2. FW it off
3. Buffer overrun in Apache   | 3. Patch Apache / use better lang?
4. Buffer overrun in our code | 4. Use better lang / isolate it
5. SQL injection | 5. Better escaping / don't interpret code.
6. Data scraping. | 6. Use a sparse UID space.
7. PW sniffing | 7. ???
8. Fetch /etc/passwd and crack | 8. Don't expose /etc/passwd
PW |
9. Root escalation from apache | 9. No setuid programs available to Apache
10. XSS |10. Filter JS and input HTML code.
11. Keystroke recorded on sys- |11. Client security
admin's desktop (planetlab) |
12. DDOS |12. ???
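A sketch of defense 6 against the SparkNotes-style scraping in example 2: instead of sequential IDs (id=1, 2, 3, ...), hand out random 64-bit IDs, so that enumerating the ID space finds almost nothing. This assumes a Unix system with /dev/urandom; the function name is hypothetical, and a real site would also check new IDs for collisions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Return a random 64-bit user ID read from /dev/urandom, or 0 on error.
 * With 2^64 possible IDs and a few million users, almost every guessed
 * ID is invalid, so scraping by enumeration stops working. */
static uint64_t new_sparse_uid(void)
{
    uint64_t id = 0;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return 0;
    if (fread(&id, sizeof id, 1, f) != 1)
        id = 0;
    fclose(f);
    return id;
}
```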
Summary:
  - That we want private data to be available to the right people makes
this problem hard in the first place. Internet servers are there
for a reason.
- Security != "just encrypt your data;" this in fact can sometimes
make the problem worse.
- Best to prevent break-ins from happening in the first place.
- If they do happen, want to limit their damage (POLP).
- Security policies are difficult to express / package up neatly.
IV. Design According to POLP (in Unix)
---------------------------------------
- Assume any piece of a system can be compromised, by either bad
programming or malicious attack.
- Try to limit the damage done by such a compromise (along the lines
of the 4 attack goals).
<Draw a picture of a server process on Unix, w/ other processes>
What's the goal on Unix?
- Keep processes from communicating that don't have to:
- limit FS, IPC, signals, ptrace
- Strip away unneeded privilege
- with respect to network, FS.
- Strip away FS access.
How on Unix?
- setuid/setgid
- system call interposition
- chroot (away from setuid executables, /etc/passwd, /etc/ssh/..)
<show Code snippet>
How do you write chroot'ed programs?
- What about shared libraries?
- /etc/resolv.conf?
- Can chroot'ed programs access the FS at all? What if they need
to write to the FS or read from the FS?
    - Fd's are *capabilities*; a privileged process can pass them to
      chroot'ed services, thereby opening new files on their behalf.
- Unforgeable - can only get them from the kernel via open/socket, etc.
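Passing a file descriptor to another process is done over a Unix-domain socket with an SCM_RIGHTS control message. The sketch below uses the standard socket API. For brevity it passes the descriptor across a socketpair within one process; a launcher would instead send it to a separate, chroot'ed process. The receiver gets a new descriptor for the same open file, one it could not have opened itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the Unix-domain socket sock. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';                    /* must send at least one byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                              /* properly aligned cmsg buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;           /* "these bytes are descriptors" */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd. Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The kernel, not the sender, creates the receiver's descriptor; that is what keeps fd's unforgeable.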
Unix Shortcomings (round 1)
- It's bad to run as root!
- Yet, need root for:
- chroot
- setuid/setgid to a lower-privileged user
- create a new user ID
- Still no guarantee that we've cut off all channels
- 200 syscalls!
- Default is to give most/all privileges.
- Can "break out" of chroot jails?
- Can still exploit race conditions in the kernel to escalate privileges.
Sidebar
- setuid / setuid misunderstanding
- root / root misunderstanding
- effective vs. real vs. saved set-user-ID
V. OKWS
-------
- Taking these principles as far as possible.
- Cf. Figure 1 from the paper.
- Discussion of which privileges are in which processes
<Table of how to hack, what you get, etc...>
- Technical details: how to launch a new service
- Within the launcher (running as root):
<on board:>
// receive FDs from logger, pubd, demux
fork ();
chroot ("/var/okws/run");
chdir ("/coredumps/51001");
setgid (51001);
setuid (51001);
exec ("login", fds ... );
- Note no chroot -- why not?
- Once launched, how does a service get new connections?
- Note the goal - minimum tampering with each other in the
case of a compromise.
Shortcoming of Unix (2)
- A lot of plumbing involved with this system. FDs flying everywhere.
- Isolation still not fine enough. If a service gets taken over,
can compromise all users of that service.
VI. Reflections on Building Websites
---------------------------------
- OKWS interesting "experiment"
- Need for speed; also, good gzip support.
- If you need compiled code, it's a good way to go.
- RPC-like system a must for backend communication
- Connection-pooling for free
Biggest difficulties:
- Finding good C++ programmers.
- Compile times.
- The DB is still always the problem.
Hard to Find good Alternatives
- Python / Perl - you might spend a lot of time writing C code /
integrating with lower level languages.
- Have to worry about DB pooling.
  - Java -- most viable, and is getting better. Scary that you can't peer
inside.
- .Net / C#-based system might be the way to go.
=======================================================================
Extra Material:
Capabilities (From the Eros Paper in SOSP 1999)
- "Unforgeable pair made up of an object ID and a set of authorized
operations (an interface) on that object."
- c.f. Dennis and van Horn. "Programming semantics for multiprogrammed
computations," Communications of the ACM 9(3):143-154, Mar 1966.
- Thus:
<object ID, set of authorized OPs on that object>
- Examples:
"Process X can write to file at inode Y"
"Process P can read from file at inode Z"
- Familiar example: Unix file descriptors
- Why are they secure?
- Capabilities are "unforgeable"
- Processes can get them only through authorized interfaces
- Capabilities are only given to processes authorized to hold them
- How do you get them?
- From the kernel (e.g., open)
- From other applications (e.g., FD passing)
- How do you use them?
- read (fd), write(fd).
- How do you revoke them once granted?
- In Unix, you do not.
- In some systems, a central authority ("reference monitor") can revoke.
- How do you store them persistently?
- Can have circular dependencies (unlike an FS).
- What happens when the system starts up?
- Revert to checkpointed state.
  - Often capability systems choose a single-level store.
- Capability systems, a historical perspective:
  - KeyKOS, Eros, Coyotos (UP research)
- Never saw any applications
- IBM Systems (System 38, later AS/400, later 'i Series')
- Commercially viable
- Problems:
- All bets are off when a capability is sent to the wrong place.
- Firewall analogy?

249
web/l-plan9.html Normal file

@ -0,0 +1,249 @@
<html>
<head>
<title>Plan 9</title>
</head>
<body>
<h1>Plan 9</h1>
<p>Required reading: Plan 9 from Bell Labs</p>
<h2>Background</h2>
<p>Had moved away from the ``one computing system'' model of
Multics and Unix.</p>
<p>Many computers (`workstations'), self-maintained, not a coherent whole.</p>
<p>Pike and Thompson had been batting around ideas about a system glued together
by a single protocol as early as 1984.
Various small experiments involving individual pieces (file server, OS, computer)
tried throughout 1980s.</p>
<p>Ordered the hardware for the ``real thing'' at the beginning of 1989,
built up WORM file server, kernel, throughout that year.</p>
<p>Some time in early fall 1989, Pike and Thompson were
trying to figure out a way to fit the window system in.
On the way home from dinner, both independently realized that they
needed to be able to mount a user-space file descriptor,
not just a network address.</p>
<p>Around Thanksgiving 1989, spent a few days rethinking the whole
thing, added bind, new mount, flush, and spent a weekend
making everything work again. The protocol at that point was
essentially identical to the 9P in the paper.</p>
<p>In May 1990, tried to use the system as self-hosting.
File server kept breaking, had to keep rewriting window system.
Dozen or so users by then, mostly using terminal windows to
connect to Unix.</p>
<p>Paper written and submitted to UKUUG in July 1990.</p>
<p>Because it was an entirely new system, could take the
time to fix problems as they arose, <i>in the right place</i>.</p>
<h2>Design Principles</h2>
<p>Three design principles:</p>
<p>
1. Everything is a file.<br>
2. There is a standard protocol for accessing files.<br>
3. Private, malleable name spaces (bind, mount).
</p>
<h3>Everything is a file.</h3>
<p>Everything is a file (more everything than Unix: networks, graphics).</p>
<pre>
% ls -l /net
% lp /dev/screen
% cat /mnt/wsys/1/text
</pre>
<h3>Standard protocol for accessing files</h3>
<p>9P is the only protocol the kernel knows: other protocols
(NFS, disk file systems, etc.) are provided by user-level translators.</p>
<p>Only one protocol, so easy to write filters and other
converters. <i>Iostats</i> puts itself between the kernel
and a command.</p>
<pre>
% iostats -xvdfdf /bin/ls
</pre>
<h3>Private, malleable name spaces</h3>
<p>Each process has its own private name space that it
can customize at will.
(Full disclosure: can arrange groups of
processes to run in a shared name space. Otherwise how do
you implement <i>mount</i> and <i>bind</i>?)</p>
<p><i>Iostats</i> remounts the root of the name space
with its own filter service.</p>
<p>The window system mounts a file system that it serves
on <tt>/mnt/wsys</tt>.</p>
<p>The network is actually a kernel device (no 9P involved)
but it still serves a file interface that other programs
use to access the network.
Easy to move out to user space (or replace) if necessary:
<i>import</i> network from another machine.</p>
<h3>Implications</h3>
<p>Everything is a file + can share files =&gt; can share everything.</p>
<p>Per-process name spaces help move toward ``each process has its own
private machine.''</p>
<p>One protocol: easy to build custom filters to add functionality
(e.g., reestablishing broken network connections).</p>
<h3>File representation for networks, graphics, etc.</h3>
<p>Unix sockets are file descriptors, but you can't use the
usual file operations on them. Also far too much detail that
the user doesn't care about.</p>
<p>In Plan 9:
<pre>dial("tcp!plan9.bell-labs.com!http");
</pre>
(Protocol-independent!)</p>
<p>Dial more or less does:</p>
<pre>
write to /net/cs: tcp!plan9.bell-labs.com!http
read back: /net/tcp/clone 204.178.31.2!80
write to /net/tcp/clone: connect 204.178.31.2!80
read connection number: 4
open /net/tcp/4/data
</pre>
<p>Details don't really matter. Two important points:
protocol-independent, and ordinary file operations
(open, read, write).</p>
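<p>A minimal sketch of the string handling behind these steps, in portable C rather than Plan 9 C. It is an illustration only, not the real <tt>dial</tt>: the helpers <tt>split_cs_reply</tt>, <tt>connect_msg</tt>, and <tt>data_file</tt> are hypothetical, no <tt>/net</tt> files are opened, and the reply string and connection number are the example values from the steps above.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a connection-server reply such as
 * "/net/tcp/clone 204.178.31.2!80" into the clone file path and the
 * address for the "connect" control message.
 * Returns 0 on success, -1 if the reply is malformed or too long. */
static int split_cs_reply(const char *reply, char *clonepath, size_t clonesz,
                          char *addr, size_t addrsz)
{
    assert(reply != NULL && clonepath != NULL && addr != NULL);
    const char *sp = strchr(reply, ' ');
    if (sp == NULL)
        return -1;
    size_t n = (size_t)(sp - reply);
    if (n >= clonesz || strlen(sp + 1) >= addrsz)
        return -1;
    memcpy(clonepath, reply, n);
    clonepath[n] = '\0';
    strcpy(addr, sp + 1);
    return 0;
}

/* Hypothetical helper: the control message written to the clone file. */
static int connect_msg(const char *addr, char *out, size_t outsz)
{
    int len = snprintf(out, outsz, "connect %s", addr);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}

/* Hypothetical helper: the clone file and the numbered connection
 * directories share a parent, so "/net/tcp/clone" with connection 4
 * gives "/net/tcp/4/data". */
static int data_file(const char *clonepath, int conn, char *out, size_t outsz)
{
    const char *slash = strrchr(clonepath, '/');
    if (slash == NULL)
        return -1;
    int dirlen = (int)(slash - clonepath);
    int len = snprintf(out, outsz, "%.*s/%d/data", dirlen, clonepath, conn);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

<p>Every step is an ordinary string written to or read from a file; nothing in the sequence is specific to TCP except the names the connection server hands back.</p>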
<p>Networks can be shared just like any other files.</p>
<p>Similar story for graphics, other resources.</p>
<h2>Conventions</h2>
<p>Per-process name spaces mean that even full path names are ambiguous
(<tt>/bin/cat</tt> means different things on different machines,
or even for different users).</p>
<p><i>Convention</i> binds everything together.
On a 386, <tt>bind /386/bin /bin</tt>.
<p>In Plan 9, always know where the resource <i>should</i> be
(e.g., <tt>/net</tt>, <tt>/dev</tt>, <tt>/proc</tt>, etc.),
but not which one is there.</p>
<p>Can break conventions: on a 386, <tt>bind /alpha/bin /bin</tt>, just won't
have usable binaries in <tt>/bin</tt> anymore.</p>
<p>Object-oriented in the sense of having objects (files) that all
present the same interface and can be substituted for one another
to arrange the system in different ways.</p>
<p>Very little ``type-checking'': <tt>bind /net /proc; ps</tt>.
Great benefit (generality) but must be careful (no safety nets).</p>
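<p>To make the ambiguity of <tt>/bin/cat</tt> concrete, here is a toy model of per-process bindings in C. It is a hypothetical sketch, not Plan 9's kernel mount table: a binding only rewrites a path prefix (as after <tt>bind new old</tt>), later binds override earlier ones, and union directories are ignored. All names are illustrative.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXBIND 8

/* Hypothetical per-process name space: each entry redirects lookups
 * under `oldname` to `newname`, as after "bind newname oldname". */
struct binding {
    const char *newname;
    const char *oldname;
};

struct nspace {
    struct binding b[MAXBIND];
    int n;
};

static int ns_bind(struct nspace *ns, const char *newname, const char *oldname)
{
    if (ns->n == MAXBIND)
        return -1;
    ns->b[ns->n].newname = newname;
    ns->b[ns->n].oldname = oldname;
    ns->n++;
    return 0;
}

/* True if `path` is `prefix` itself or lies underneath it. */
static int has_prefix(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Rewrite `path` with the most recent matching binding (later binds
 * win); paths with no matching binding are copied unchanged. */
static void ns_resolve(const struct nspace *ns, const char *path,
                       char *out, size_t outsz)
{
    assert(outsz > 0);
    for (int i = ns->n - 1; i >= 0; i--) {
        const struct binding *bp = &ns->b[i];
        if (has_prefix(path, bp->oldname)) {
            snprintf(out, outsz, "%s%s", bp->newname, path + strlen(bp->oldname));
            return;
        }
    }
    snprintf(out, outsz, "%s", path);
}
```

<p>Two processes that differ only in their bindings resolve the same name to different files, which is what the <tt>/386/bin</tt> versus <tt>/alpha/bin</tt> convention relies on.</p>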
<h2>Other Contributions</h2>
<h3>Portability</h3>
<p>Plan 9 still is the most portable operating system.
Not much machine-dependent code, no fancy features
tied to one machine's MMU, multiprocessor from the start (1989).</p>
<p>Many other systems are still struggling with converting to SMPs.</p>
<p>Has run on MIPS, Motorola 68000, Nextstation, Sparc, x86, PowerPC, Alpha, others.</p>
<p>All the world is not an x86.</p>
<h3>Alef</h3>
<p>New programming language: convenient, but difficult to maintain.
Retired when author (Winterbottom) stopped working on Plan 9.</p>
<p>Good ideas transferred to C library plus conventions.</p>
<p>All the world is not C.</p>
<h3>UTF-8</h3>
<p>Thompson invented UTF-8. Pike and Thompson
converted Plan 9 to use it over the first weekend of September 1992,
in time for X/Open to choose it as the Unicode standard byte format
at a meeting the next week.</p>
<p>UTF-8 is now the standard character encoding for Unicode on
all systems and interoperating between systems.</p>
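<p>Part of UTF-8's appeal is how little code it needs. A sketch of an encoder in C (the name <tt>utf8_encode</tt> is illustrative), following the current definition in RFC 3629, which limits UTF-8 to code points up to U+10FFFF; the original 1992 design also allowed longer sequences. ASCII encodes as itself, and every byte of a multi-byte sequence has the high bit set, so byte-oriented code that searches for <tt>'/'</tt> or NUL keeps working.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * Unicode scalar values (surrogates, or above U+10FFFF). */
static size_t utf8_encode(uint32_t c, unsigned char *buf)
{
    assert(buf != NULL);
    if (c < 0x80) {                 /* 0xxxxxxx: ASCII is unchanged */
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c >= 0xD800 && c <= 0xDFFF) /* UTF-16 surrogates are not characters */
        return 0;
    if (c < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xF0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```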
<h3>Simple, easy to modify base for experiments</h3>
<p>Whole system source code is available, simple, easy to
understand and change.
There's a reason it only took a couple days to convert to UTF-8.</p>
<pre>
49343 file server kernel
181611 main kernel
78521 ipaq port (small kernel)
20027 TCP/IP stack
15365 ipaq-specific code
43129 portable code
1326778 total lines of source code
</pre>
<h3>Dump file system</h3>
<p>Snapshot idea might well have been ``in the air'' at the time.
(<tt>OldFiles</tt> in AFS appears to be independently derived,
use of WORM media was a common research topic.)</p>
<h3>Generalized Fork</h3>
<p>Picked up by other systems: FreeBSD, Linux.</p>
<h3>Authentication</h3>
<p>No global super-user.
Newer, more Plan 9-like authentication described in later paper.</p>
<h3>New Compilers</h3>
<p>Much faster than gcc, simpler.</p>
<p>8s to build acme for Linux using gcc; 1s to build acme for Plan 9 using 8c (but running on Linux)</p>
<h3>IL Protocol</h3>
<p>Now retired.
For better or worse, TCP has all the installed base.
IL didn't work very well on asymmetric or high-latency links
(e.g., cable modems).</p>
<h2>Idea propagation</h2>
<p>Many ideas have propagated out to varying degrees.</p>
<p>Linux even has bind and user-level file servers now (FUSE),
but still not per-process name spaces.</p>
</body>

202
web/l-scalablecoord.html Normal file

@ -0,0 +1,202 @@
<title>Scalable coordination</title>
<html>
<head>
</head>
<body>
<h1>Scalable coordination</h1>
<p>Required reading: Mellor-Crummey and Scott, Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors, TOCS, Feb 1991.
<h2>Overview</h2>
<p>Shared-memory machines are a bunch of CPUs sharing physical memory.
Typically each processor also maintains a cache (for performance),
which introduces the problem of keeping caches coherent. If processor 1
writes a memory location whose value processor 2 has cached, then
processor 2's cache must be updated in some way. How?
<ul>
<li>Bus-based schemes. Any CPU can access ("dance with") any memory
equally ("dance hall" architecture). Use "snoopy" protocols: each CPU's cache
listens to the memory bus. With a write-through architecture, a cache
invalidates its copy when it sees a write. Or a write-back cache can use an
"ownership" scheme (e.g., Pentium caches have MESI bits: modified, exclusive,
shared, invalid). If the E bit is set, the CPU caches the line exclusively and
can do write-back. But the bus limits scalability.
<li>More scalability with NUMA schemes (non-uniform memory access). Each
CPU comes with fast "close" memory. Slower to access memory that is
stored with another processor. Use a directory to keep track of who is
caching what. For example, processor 0 is responsible for all memory
starting with address "000", processor 1 is responsible for all memory
starting with "001", etc.
<li>COMA - cache-only memory architecture. Each CPU has local RAM,
treated as cache. Cache lines migrate around to different nodes based
on access pattern. Data only lives in cache, no permanent memory
location. (These machines aren't too popular any more.)
</ul>
<h2>Scalable locks</h2>
<p>This paper is about cost and scalability of locking; what if you
have 10 CPUs waiting for the same lock? For example, what would
happen if xv6 runs on an SMP with many processors?
<p>What's the cost of a simple spinning acquire/release? Algorithm 1
<i>without</i> the delays, which is like xv6's implementation of acquire
and release (xv6 uses XCHG instead of test_and_set):
<pre>
each of the 10 CPUs gets the lock in turn
meanwhile, remaining CPUs spin in XCHG on lock
lock must be X in cache to run XCHG
otherwise all might read, then all might write
so bus is busy all the time with XCHGs!
can we avoid constant XCHGs while lock is held?
</pre>
<p>test-and-test-and-set
<pre>
only run expensive TSL if not locked
spin on ordinary load instruction, so cache line is S
acquire(l)
while(1){
while(l->locked != 0) { }
if(TSL(&l->locked) == 0)
return;
}
</pre>
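<p>The same loop written with C11 atomics, as a sketch: <tt>atomic_exchange</tt> stands in for XCHG/TSL, and the inner loop spins with plain loads so the line can stay Shared in the waiter's cache. The type and function names are assumptions, not xv6's.</p>

```c
#include <assert.h>
#include <stdatomic.h>

/* Test-and-test-and-set spin lock (hypothetical names). */
struct ttas_lock {
    atomic_int locked;          /* 0 = free, 1 = held */
};

static void ttas_acquire(struct ttas_lock *l)
{
    for (;;) {
        /* Spin with ordinary loads: while the holder doesn't write the
         * line, these hit in our own cache and cause no bus traffic. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
        /* Lock looked free: now try the expensive interlocked exchange. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;
    }
}

static void ttas_release(struct ttas_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```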
<p>suppose 10 CPUs are waiting, let's count cost in total bus
transactions
<pre>
CPU1 gets lock in one cycle
sets lock's cache line to I in other CPUs
9 CPUs each use bus once in XCHG
then everyone has the line S, so they spin locally
CPU1 releases the lock
CPU2 gets the lock in one cycle
8 CPUs each use bus once...
So 10 + 9 + 8 + ... + 1 = 55 transactions, O(n^2) in # of CPUs!
Look at "test-and-test-and-set" in Figure 6
</pre>
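<p>In general, a handoff while <i>k</i> CPUs are still contending costs about <i>k</i> transactions (one for the new holder, one for each of the other <i>k</i>&minus;1 waiters), so for <i>n</i> waiters the total is quadratic:</p>

```latex
\sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2),
\qquad n = 10:\quad \frac{10 \cdot 11}{2} = 55.
```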
<p> Can we have <i>n</i> CPUs acquire a lock in O(<i>n</i>) time?
<p>What is the point of the exponential backoff in Algorithm 1?
<pre>
Does it buy us O(n) time for n acquires?
Is there anything wrong with it?
may not be fair
exponential backoff may increase delay after release
</pre>
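<p>A sketch of a test-and-set lock with exponential backoff in C11. The constants <tt>BACKOFF_BASE</tt> and <tt>BACKOFF_CAP</tt> are illustrative values, not the paper's, and the delay is a plain busy loop rather than a calibrated or randomized one.</p>

```c
#include <assert.h>
#include <stdatomic.h>

#define BACKOFF_BASE 1u     /* illustrative starting delay */
#define BACKOFF_CAP  1024u  /* illustrative upper bound on the delay */

struct backoff_lock {
    atomic_int locked;      /* 0 = free, 1 = held */
};

/* Burn roughly `n` loop iterations without touching the lock. */
static void delay(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++)
        ;
}

static void backoff_acquire(struct backoff_lock *l)
{
    unsigned wait = BACKOFF_BASE;
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0) {
        delay(wait);
        /* Double the delay after each failure, up to the cap, so that
         * contending CPUs spread their retries out over time. */
        if (wait < BACKOFF_CAP)
            wait *= 2;
    }
}

static void backoff_release(struct backoff_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

<p>The cap bounds the overshoot: without it, a waiter that has backed off to a very long delay may keep waiting long after the lock was released. Nothing orders the waiters, so the lock is still not fair.</p>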
<p>What's the point of the ticket locks, Algorithm 2?
<pre>
one interlocked instruction to get my ticket number
then I spin on now_serving with ordinary load
release() just increments now_serving
</pre>
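<p>A sketch of the ticket lock in C11 (names are assumptions). <tt>ticket_acquire</tt> returns its ticket only so the example can be checked.</p>

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next_ticket;    /* next ticket to hand out */
    atomic_uint now_serving;    /* ticket currently allowed in */
};

static unsigned ticket_acquire(struct ticket_lock *l)
{
    /* The only interlocked instruction: take a ticket. */
    unsigned me = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    /* Wait for our turn using ordinary loads. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != me)
        ;
    return me;
}

static void ticket_release(struct ticket_lock *l)
{
    /* Only the holder writes now_serving, so load + store suffices. */
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    assert(cur != atomic_load_explicit(&l->next_ticket, memory_order_relaxed));
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}
```

<p>Release is a plain store because only the holder ever writes <tt>now_serving</tt>; the cost problem is that every waiter re-reads that single variable.</p>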
<p>why is that good?
<pre>
+ fair
+ no exponential backoff overshoot
+ no spinning on XCHG (waiters spin with ordinary loads)
</pre>
<p>but what's the cost, in bus transactions?
<pre>
while lock is held, now_serving is S in all caches
release makes it I in all caches
then each waiter uses a bus transaction to get the new value
so still O(n^2)
</pre>
<p>What's the point of the array-based queuing locks, Algorithm 3?
<pre>
a lock has an array of "slots"
waiter allocates a slot, spins on that slot
release wakes up just next slot
so O(n) bus transactions to get through n waiters: good!
Anderson lines in Figures 4 and 6 are flat-ish
they only go up because the lock data structures are protected by a simpler lock
but O(n) space *per lock*!
</pre>
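<p>A sketch of an Anderson-style array lock in C11. <tt>NSLOTS</tt> (here 8, a power of two so the unsigned counter wraps cleanly) is an assumption and must be at least the number of simultaneous waiters; a real version would also pad each slot to its own cache line, which this sketch omits.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 8

struct array_lock {
    atomic_bool has_lock[NSLOTS];   /* has_lock[i]: slot i may enter */
    atomic_uint next_slot;          /* next slot to hand out */
};

static void array_lock_init(struct array_lock *l)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_init(&l->has_lock[i], i == 0);   /* slot 0 starts "free" */
    atomic_init(&l->next_slot, 0);
}

/* Returns the slot the caller must pass to array_release. */
static unsigned array_acquire(struct array_lock *l)
{
    unsigned slot = atomic_fetch_add(&l->next_slot, 1) % NSLOTS;
    /* Spin only on our own slot. */
    while (!atomic_load_explicit(&l->has_lock[slot], memory_order_acquire))
        ;
    /* Reset our slot for whoever gets it on the next lap of the array. */
    atomic_store_explicit(&l->has_lock[slot], false, memory_order_relaxed);
    return slot;
}

static void array_release(struct array_lock *l, unsigned slot)
{
    assert(slot < NSLOTS);
    /* Wake only the next waiter in line. */
    atomic_store_explicit(&l->has_lock[(slot + 1) % NSLOTS], true,
                          memory_order_release);
}
```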
<p>Algorithm 5 (MCS), the new algorithm of the paper, uses
compare_and_swap:
<pre>
int compare_and_swap(addr, v1, v2) {
int ret = 0;
// stop all memory activity and ignore interrupts
if (*addr == v1) {
*addr = v2;
ret = 1;
}
// resume other memory activity and take interrupts
return ret;
}
</pre>
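<p>On real hardware the atomicity comes from a single instruction (on x86, <tt>LOCK CMPXCHG</tt>). A sketch of the same primitive using C11's <tt>atomic_compare_exchange_strong</tt>, keeping the return convention above (1 on success, 0 on failure); the wrapper itself is an assumption, not part of any library.</p>

```c
#include <assert.h>
#include <stdatomic.h>

static int compare_and_swap(atomic_int *addr, int v1, int v2)
{
    /* On failure atomic_compare_exchange_strong writes the current value
     * into `expected`; we ignore it and just report success or failure. */
    int expected = v1;
    return atomic_compare_exchange_strong(addr, &expected, v2) ? 1 : 0;
}
```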
<p>What's the point of the MCS lock, Algorithm 5?
<pre>
constant space per lock, rather than O(n)
one "qnode" per thread, used for whatever lock it's waiting for
lock holder's qnode points to start of list
lock variable points to end of list
acquire adds your qnode to end of list
then you spin on your own qnode
release wakes up next qnode
</pre>
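<p>A sketch of the MCS lock in C11. The names and memory orderings are choices made for this illustration, not the paper's pseudocode. The lock variable is the queue's tail; each waiter links itself behind its predecessor and spins only on its own <tt>qnode</tt>.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct qnode {
    _Atomic(struct qnode *) next;
    atomic_int locked;              /* 1 while this waiter must spin */
};

struct mcs_lock {
    _Atomic(struct qnode *) tail;   /* last qnode in line, or NULL */
};

static void mcs_acquire(struct mcs_lock *l, struct qnode *me)
{
    assert(me != NULL);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, 1, memory_order_relaxed);
    /* Append ourselves to the end of the queue. */
    struct qnode *pred = atomic_exchange_explicit(&l->tail, me,
                                                  memory_order_acq_rel);
    if (pred == NULL)
        return;                     /* queue was empty: lock is ours */
    atomic_store_explicit(&pred->next, me, memory_order_release);
    /* Spin on our own node until the predecessor hands over. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_release(struct mcs_lock *l, struct qnode *me)
{
    struct qnode *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the queue empty. */
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A new waiter swapped into tail but hasn't linked in yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    atomic_store_explicit(&succ->locked, 0, memory_order_release);
}
```

<p>The caller passes the same <tt>qnode</tt> to acquire and release and must keep it alive in between. The wait loop in <tt>mcs_release</tt> covers the window in which a new waiter has swapped itself into <tt>tail</tt> but has not yet set its predecessor's <tt>next</tt>.</p>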
<h2>Wait-free or non-blocking data structures</h2>
<p>The previous implementations all block threads when there is
contention for a lock. Other atomic hardware operations allow one
to build non-blocking implementations of data structures. For example, one
can insert an element into a shared list without blocking any
thread. Strictly, the compare-and-swap retry loop below is
<i>lock-free</i> (some thread always makes progress) rather than
<i>wait-free</i> (every thread finishes in a bounded number of steps).
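<p>A linked list with locks is as follows:
<pre>
struct element { int x; struct element *next; };
struct element *list;
struct spinlock list_lock;

void insert(int x) {
  struct element *n = malloc(sizeof *n);
  n->x = x;
  acquire(&list_lock);
  n->next = list;
  list = n;
  release(&list_lock);
}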
</pre>
<p>A non-blocking implementation is as follows:
<pre>
void insert(int x) {
  struct element *n = malloc(sizeof *n);
  n->x = x;
  do {
    n->next = list;
  } while (compare_and_swap(&list, n->next, n) == 0);
}
</pre>
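<p>The same insert in C11, as a sketch (the global <tt>list</tt> and the error return are assumptions). <tt>atomic_compare_exchange_weak</tt> reloads the current head into <tt>n-&gt;next</tt> on failure, so the retry loop needs no separate read.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct element {
    int x;
    struct element *next;
};

static _Atomic(struct element *) list = NULL;

/* Push x onto the head of the shared list. Returns 0 on success,
 * -1 if allocation fails. */
static int insert(int x)
{
    struct element *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->x = x;
    n->next = atomic_load(&list);
    /* If another CPU changed the head since we read it, the exchange
     * fails, stores the current head into n->next, and we retry. */
    while (!atomic_compare_exchange_weak(&list, &n->next, n))
        ;
    return 0;
}
```

<p>This handles only insertion at the head. Removing elements without a lock is harder: a node can be freed and reused while another CPU still holds a pointer to it (the ABA problem).</p>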
<p>How many bus transactions with 10 CPUs inserting one element in the
list? Could you do better?
<p><a href="http://www.cl.cam.ac.uk/netos/papers/2007-cpwl.pdf">This
paper by Fraser and Harris</a> compares lock-based implementations
versus corresponding non-blocking implementations of a number of data
structures.
<p>It is not always practical to make every operation wait-free, and there are
times we will need an implementation of acquire and release.
Research on non-blocking data structures is active; the last word
hasn't been said on this topic yet.
</body>

340
web/l-schedule.html Normal file

@ -0,0 +1,340 @@
<title>Scheduling</title>
<html>
<head>
</head>
<body>
<h1>Scheduling</h1>
<p>Required reading: Eliminating receive livelock
<p>Notes based on Prof. Morris's lecture on scheduling (6.824, fall '02).
<h2>Overview</h2>
<ul>
<li>What is scheduling? The OS policies and mechanisms to allocate
resources to entities. A good scheduling policy ensures that the most
important entity gets the resources it needs. This topic was
popular in the days of time sharing, when there was a shortage of
resources. It seemed irrelevant in the era of PCs and workstations, when
resources were plentiful. Now the topic is back from the dead to handle
massive Internet servers with paying customers. The Internet exposes
web sites to international abuse and overload, which can lead to
resource shortages. Furthermore, some customers are more important
than others (e.g., the ones that buy a lot).
<li>Key problems:
<ul>
<li>Gap between desired policy and available mechanism. The desired
policies often include elements that are not implementable with the
mechanisms available to the operating system. Furthermore, often
there are many conflicting goals (low latency, high throughput, and
fairness), and the scheduler must make a trade-off between the goals.
<li>Interaction between different schedulers. One has to take a
systems view. Just optimizing the CPU scheduler may do little for
the overall desired policy.
</ul>
<li>Resources you might want to schedule: CPU time, physical memory,
disk and network I/O, and I/O bus bandwidth.
<li>Entities that you might want to give resources to: users,
processes, threads, web requests, or MIT accounts.
<li>Many policies for resource-to-entity allocation are possible:
strict priority, divide equally, shortest job first, minimum guarantee
combined with admission control.
<li>General plan for scheduling mechanisms
<ol>
<li> Understand where scheduling is occurring.
<li> Expose scheduling decisions, allow control.
<li> Account for resource consumption, to allow intelligent control.
</ol>
<li>Simple example from 6.828 kernel. The policy for scheduling
environments is to give each one equal CPU time. The mechanism used to
implement this policy is a clock interrupt every 10 msec and then
selecting the next environment in a round-robin fashion.
<p>But this only works if processes are compute-bound. What if a
process gives up some of its 10 ms to wait for input? Do we have to
keep track of that and give it back?
<p>How long should the quantum be? Is 10 msec the right answer?
Shorter quantum will lead to better interactive performance, but
lowers overall system throughput because we will reschedule more,
which has overhead.
<p>What if the environment computes for 1 msec and sends an IPC to
the file server environment? Shouldn't the file server get more CPU
time because it operates on behalf of all other environments?
<p>Potential improvements for the 6.828 kernel: track "recent" CPU use
(e.g., over the last second) and always run environment with least
recent CPU use. (Still, if you sleep long enough you lose.) Other
solution: directed yield; specify on the yield to which environment
you are donating the remainder of the quantum (e.g., to the file
server so that it can compute on the environment's behalf).
<li>Pitfall: Priority Inversion
<pre>
Assume policy is strict priority.
Thread T1: low priority.
Thread T2: medium priority.
Thread T3: high priority.
T1: acquire(l)
context switch to T3
T3: acquire(l)... must wait for T1 to release(l)...
context switch to T2
T2 computes for a while
T3 is indefinitely delayed despite high priority.
Can solve if T3 lends its priority to holder of lock it is waiting for.
So T1 runs, not T2.
[this is really a multiple scheduler problem.]
[since locks schedule access to locked resource.]
</pre>
<li>Pitfall: Efficiency. Efficiency often conflicts with fairness (or
any other policy). Long time quantum for efficiency in CPU scheduling
versus low delay. Shortest seek versus FIFO disk scheduling.
Contiguous read-ahead vs data needed now. For example, scheduler
swaps out my idle emacs to let gcc run faster with more phys mem.
What happens when I type a key? These don't fit well into a "who gets
to go next" scheduler framework. Inefficient scheduling may make
<i>everybody</i> slower, including high priority users.
<li>Pitfall: Multiple Interacting Schedulers. Suppose you want your
emacs to have priority over everything else. Give it high CPU
priority. Does that mean nothing else will run if emacs wants to run?
Disk scheduler might not know to favor emacs's disk I/Os. Typical
UNIX disk scheduler favors disk efficiency, not process prio. Suppose
emacs needs more memory. Other processes have dirty pages; emacs must
wait. Does disk scheduler know these other processes' writes are high
prio?
<li>Pitfall: Server Processes. Suppose emacs uses X windows to
display. The X server must serve requests from many clients. Does it
know that emacs' requests should be given priority? Does the OS know
to raise X's priority when it is serving emacs? Similarly for DNS,
and NFS. Does the network know to give emacs' NFS requests priority?
</ul>
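<p>The fix suggested in the priority-inversion example, where T3 lends its priority to the lock holder, is usually called <i>priority inheritance</i>. A toy sketch in C: the types and functions are hypothetical, nothing actually blocks or runs, and each thread is assumed to hold at most one lock.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Higher numbers mean more important. */
struct thread {
    const char *name;
    int base_prio;      /* priority the thread was given */
    int eff_prio;       /* priority the scheduler actually uses */
};

struct pi_lock {
    struct thread *holder;  /* NULL when free */
};

/* Try to take the lock. If it is held, `t` must wait, and it lends its
 * priority to the holder so a medium-priority thread cannot starve it.
 * Returns 1 if acquired, 0 if `t` would have to block. */
static int pi_acquire(struct pi_lock *l, struct thread *t)
{
    if (l->holder == NULL) {
        l->holder = t;
        return 1;
    }
    if (t->eff_prio > l->holder->eff_prio)
        l->holder->eff_prio = t->eff_prio;   /* donate priority */
    return 0;
}

/* Release the lock and drop back to the base priority. */
static void pi_release(struct pi_lock *l, struct thread *t)
{
    assert(l->holder == t);
    l->holder = NULL;
    t->eff_prio = t->base_prio;
}
```

<p>While T3 waits, T1 runs at T3's priority, so the medium-priority T2 cannot keep it off the CPU.</p>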
<p>In short, scheduling is a system problem. There are many
schedulers; they interact. The CPU scheduler is usually the easy
part. The hardest part is system structure. For example, the
<i>existence</i> of interrupts is bad for scheduling. Conflicting
goals may limit effectiveness.
<h2>Case study: modern UNIX</h2>
<p>Goals:
<ul>
<li>Simplicity (e.g. avoid complex locking regimes).
<li>Quick response to device interrupts.
<li> Favor interactive response.
</ul>
<p>UNIX has a number of execution environments. We care about
scheduling transitions among them. Some transitions aren't possible,
some can't be controlled. The execution environments are:
<ul>
<li>Process, user half
<li>Process, kernel half
<li>Soft interrupts: timer, network
<li>Device interrupts
</ul>
<p>The rules are:
<ul>
<li>User is pre-emptible.
<li>Kernel half and software interrupts are not pre-emptible.
<li>Device handlers may not make blocking calls (e.g., sleep)
<li>Effective priorities: intr > soft intr > kernel half > user
</ul>
<p>Rules are implemented as follows:
<ul>
<li>UNIX: Process User Half. Runs in process address space, on
per-process stack. Interruptible. Pre-emptible: interrupt may cause
context switch. We don't trust user processes to yield CPU.
Voluntarily enters kernel half via system calls and faults.
<li>UNIX: Process Kernel Half. Runs in kernel address space, on
per-process kernel stack. Executes system calls and faults for its
process. Interruptible (but can defer interrupts in critical
sections). Not pre-emptible. Only yields voluntarily, when waiting
for an event. E.g. disk I/O done. This simplifies concurrency
control; locks often not required. No user process runs if any kernel
half wants to run. Many processes' kernel halves may be sleeping in the
kernel.
<li>UNIX: Device Interrupts. Hardware asks CPU for an interrupt to ask
for attention. Disk read/write completed, or network packet received.
Runs in kernel space, on special interrupt stack. Interrupt routine
cannot block; must return. Interrupts are interruptible. They nest
on the one interrupt stack. Interrupts are not pre-emptible, and
cannot really yield. The real-time clock is a device and interrupts
every 10ms (or whatever). Process scheduling decisions can be made
when interrupt returns (e.g. wake up the process waiting for this
event). You want interrupt processing to be fast, since it has
priority. Don't do any more work than you have to. You're blocking
processes and other interrupts. Typically, an interrupt does the
minimal work necessary to keep the device happy, and then calls wakeup
on a thread.
<li>UNIX: Soft Interrupts. (These don't exist in xv6.) Used when device
handling is expensive, but there is no obvious process context in which to
run. Examples include IP forwarding and TCP input processing. Runs in
kernel space, on the interrupt stack. Interruptible. Not pre-emptible,
can't really yield. Triggered by hardware interrupt. Called when
outermost hardware interrupt returns. Periodic scheduling decisions
are made in timer s/w interrupt. Scheduled by hardware timer
interrupt (i.e., if current process has run long enough, switch).
</ul>
<p>Is this good software structure? Let's talk about receive
livelock.
<h2>Paper discussion</h2>
<ul>
<li>What application is the paper addressing? IP forwarding.
What functionality does a network interface offer to driver?
<ul>
<li> Read packets
<li> Poke hardware to send packets
<li> Interrupts when packet received/transmit complete
<li> Buffer many input packets
</ul>
<li>What devices in the 6.828 kernel are interrupt driven? Which ones
use polling? Is this ideal?
<li>Explain Figure 6-1. Why does it go up? What determines how high
the peak is? Why does it go down? What determines how fast it goes
down? Answer:
<pre>
(fraction of packets discarded)(work invested in discarded packets)
-------------------------------------------
(total work CPU is capable of)
</pre>
<li>Suppose I wanted to test an NFS server for livelock.
<pre>
Run client with this loop:
while(1){
send NFS READ RPC;
wait for response;
}
</pre>
What would I see? Is the NFS server probably subject to livelock?
(No: the offered load is subject to feedback.)
<li>What other problems are we trying to address?
<ul>
<li>Increased latency for packet delivery and forwarding (e.g., start
disk head moving when first NFS read request comes)
<li>Transmit starvation
<li>User-level CPU starvation
</ul>
<li>Why not tell the O/S scheduler to give interrupts lower priority?
Non-preemptible.
Could you fix this by making interrupts faster? (Maybe, if coupled
with some limit on input rate.)
<li>Why not completely process each packet in the interrupt handler?
(I.e. forward it?) Other parts of kernel don't expect to run at high
interrupt-level (e.g., some packet processing code might invoke a function
that sleeps). Still might want an output queue.
<li>What about using polling instead of interrupts? Solves overload
problem, but killer for latency.
<li>What's the paper's solution?
<ul>
<li>No IP input queue.
<li>Input processing and device input polling in kernel thread.
<li>Device receive interrupt just wakes up thread. And leaves
interrupts <i>disabled</i> for that device.
<li>Thread does all input processing, then re-enables interrupts.
</ul>
<p>Why does this work? What happens when packets arrive too fast?
What happens when packets arrive slowly?
<li>Explain Figure 6-3.
<ul>
<li>Why does "Polling (no quota)" work badly? (Input still starves
xmit complete processing.)
<li>Why does it immediately fall to zero, rather than gradually decreasing?
(xmit complete processing must be very cheap compared to input.)
</ul>
<li>Explain Figure 6-4.
<ul>
<li>Why does "Polling, no feedback" behave badly? There's a queue in
front of screend. We can still give 100% to input thread, 0% to
screend.
<li>Why does "Polling w/ feedback" behave well? Input thread yields
when queue to screend fills.
<li>What if screend hangs, what about other consumers of packets?
(e.g., can you ssh to machine to fix screend?) Fortunately screend
typically is only application. Also, re-enable input after timeout.
</ul>
<li>Why are the two solutions different?
<ol>
<li> Polling thread <i>with quotas</i>.
<li> Feedback from full queue.
</ol>
(I believe they should have used #2 for both.)
<li>If we apply the proposed fixes, does the phenomenon totally go
away? (e.g. for web server, waits for disk, &c.)
<ul>
<li>Can the net device throw away packets without slowing down host?
<li>Problem: We want to drop packets for applications with big queues.
But this requires work to determine which application a packet belongs to.
Solution: NI-LRP (have network interface sort packets)
</ul>
<li>What about latency question? (Look at figure 14 p. 243.)
<ul>
<li>1st packet looks like an improvement over non-polling. But 2nd
packet transmitted later with polling. Why? (No new packets added to
xmit buffer until xmit interrupt)
<li>Why? In traditional BSD, to
amortize cost of poking device. Maybe better to poke a second time
anyway.
</ul>
<li>What if processing has more complex structure?
<ul>
<li>Chain of processing stages with queues? Does feedback work?
What happens when a late stage is slow?
<li>Split at some point, multiple parallel paths? Not so great; one
slow path blocks all paths.
</ul>
<li>Can we formulate any general principles from paper?
<ul>
<li>Don't spend time on new work before completing existing work.
<li>Or give new work lower priority than partially-completed work.
</ul>
</ul>
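<p>A single-threaded toy model of the paper's receive path as outlined in the solution above: the interrupt handler only disables receive interrupts and wakes the polling thread, which handles at most a quota of packets per pass and re-enables interrupts once it has caught up. <tt>QUOTA</tt>, the <tt>nic</tt> fields, and the function names are assumptions for illustration, not the paper's code.</p>

```c
#include <assert.h>
#include <stdbool.h>

#define QUOTA 5   /* illustrative per-pass limit */

struct nic {
    int rx_pending;       /* packets waiting in the device's buffer */
    bool intr_enabled;    /* receive interrupts on? */
    bool poll_scheduled;  /* has the polling thread been woken? */
    int delivered;        /* packets fully processed */
};

/* Receive interrupt handler: do the minimum and get out. */
static void rx_interrupt(struct nic *d)
{
    d->intr_enabled = false;   /* leave interrupts disabled ... */
    d->poll_scheduled = true;  /* ... and let the thread do the work */
}

/* One pass of the polling thread. Returns the number of packets
 * handled; each one is processed completely before taking the next. */
static int poll_once(struct nic *d)
{
    int n = 0;
    while (d->rx_pending > 0 && n < QUOTA) {
        d->rx_pending--;
        d->delivered++;
        n++;
    }
    if (d->rx_pending == 0) {
        /* Caught up: return to interrupt-driven mode. */
        d->poll_scheduled = false;
        d->intr_enabled = true;
    }
    /* Otherwise stay scheduled; other work can run between passes. */
    return n;
}
```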

316
web/l-threads.html Normal file

@ -0,0 +1,316 @@
<title>L8</title>
<html>
<head>
</head>
<body>
<h1>Threads, processes, and context switching</h1>
<p>Required reading: proc.c (focus on scheduler() and sched()),
setjmp.S, and sys_fork (in sysproc.c)
<h2>Overview</h2>
<p>Big picture: more programs than processors. How to share the
limited number of processors among the programs?
<p>Observation: most programs don't need the processor continuously,
because they frequently have to wait for input (from user, disk,
network, etc.)
<p>Idea: when one program must wait, it releases the processor, and
gives it to another program.
<p>Mechanism: thread of computation, an active computation. A
thread is an abstraction that contains the minimal state that is
necessary to stop an active computation and resume it at some point later.
What that state is depends on the processor. On x86, it is the
processor registers (see setjmp.S).
<p>Address spaces and threads: address spaces and threads are in
principle independent concepts. One can switch from one thread to
another thread in the same address space, or one can switch from one
thread to another thread in another address space. Example: in xv6,
one switches address spaces by switching segmentation registers (see
setupsegs). Does xv6 ever switch from one thread to another in the
same address space? (Answer: yes, xv6 switches, for example, from the
scheduler, proc[0], to the kernel part of init, proc[1].) In the JOS
kernel we switch from the kernel thread to a user thread, but we don't
switch kernel space necessarily.
<p>Process: one address space plus one or more threads of computation.
In xv6 all <i>user</i> programs contain one thread of computation and
one address space, and the concepts of address space and threads of
computation are not separated but bundled together in the concept of a
process. When switching from the kernel program (which has multiple
threads) to a user program, xv6 switches threads (switching from a
kernel stack to a user stack) and address spaces (the hardware uses
the kernel segment registers and the user segment registers).
<p>xv6 supports the following operations on processes:
<ul>
<li>fork: create a new process, which is a copy of the parent.
<li>exec: execute a program
<li>exit: terminate process
<li>wait: wait for a process to terminate
<li>kill: kill process
<li>sbrk: grow the address space of a process.
</ul>
This interface doesn't separate threads and address spaces. For
example, with this interface one cannot create additional threads in
the same address space. Modern Unixes provide additional primitives
(called pthreads, POSIX threads) to create additional threads in a
process and coordinate their activities.
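<p>For comparison, a sketch of fork, exit, and wait using the POSIX calls on a modern Unix; xv6's system calls play the same roles with simpler interfaces. <tt>run_child</tt> is a hypothetical helper.</p>

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`, wait for it in the
 * parent, and return its exit status (or -1 on error). */
static int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(code);                /* child: a copy of the parent; exit now */
    int status;
    if (waitpid(pid, &status, 0) != pid)   /* parent: wait for that child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```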
<p>Scheduling. The thread manager needs a method for deciding which
thread to run if multiple threads are runnable. The xv6 policy is to
run the processes round robin. Why round robin? What other methods
can you imagine?
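<p>A sketch of round-robin selection over a fixed process table. <tt>NPROC</tt>, the state names, and <tt>pick_next</tt> are assumptions for illustration, not xv6's scheduler code.</p>

```c
#include <assert.h>

#define NPROC 8

enum procstate { UNUSED, SLEEPING, RUNNABLE, RUNNING };

/* Starting just after `last` (use -1 before anything has run), return
 * the index of the next RUNNABLE process, wrapping around the table;
 * -1 if no process is runnable. */
static int pick_next(const enum procstate st[NPROC], int last)
{
    for (int i = 1; i <= NPROC; i++) {
        int p = (last + i) % NPROC;
        if (st[p] == RUNNABLE)
            return p;
    }
    return -1;
}
```

<p>Because the search starts just after the process that ran last, every runnable process gets a turn before any process gets a second one.</p>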
<p>Preemptive scheduling. To force a thread to release the processor
periodically (in case the thread never calls sleep), a thread manager
can use preemptive scheduling. The thread manager uses the clock chip
to periodically generate a hardware interrupt, which will cause
control to transfer to the thread manager, which then can decide to
run another thread (e.g., see trap.c).
<h2>xv6 code examples</h2>
<p>Thread switching is implemented in xv6 using setjmp and longjmp,
which take a jmpbuf as an argument. setjmp saves its context in a
jmpbuf for later use by longjmp. longjmp restores the context saved
by the last setjmp. It then causes execution to continue as if the
call of setjmp had just returned 1.
<ul>
<li>setjmp saves: ebx, ecx, edx, esi, edi, esp, ebp, and eip.
<li>longjmp restores them, and puts 1 in eax!
</ul>
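<p>Standard C has <tt>setjmp</tt>/<tt>longjmp</tt> with the same return-value behavior, which this sketch demonstrates (the names <tt>resume</tt> and <tt>demo</tt> are illustrative). It is not a thread switch: standard C only allows jumping back into a function that is still active on the current stack, whereas xv6's versions are used to move between stacks.</p>

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

static void resume(void)
{
    longjmp(env, 1);            /* never returns: control re-enters setjmp */
}

/* Returns how many times control came back out of setjmp. */
static int demo(void)
{
    volatile int returns = 0;   /* volatile: modified between setjmp and longjmp */
    if (setjmp(env) == 0) {
        returns++;              /* first return (0): context just saved */
        resume();
    } else {
        returns++;              /* second return (1): restored by longjmp */
    }
    return returns;
}
```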
<p> Example of thread switching: proc[0] switches to scheduler:
<ul>
<li>1359: proc[0] calls iget, which calls sleep, which calls sched.
<li>2261: The stack before the call to setjmp in sched is:
<pre>
CPU 0:
eax: 0x10a144 1089860
ecx: 0x6c65746e 1818588270
edx: 0x0 0
ebx: 0x10a0e0 1089760
esp: 0x210ea8 2166440
ebp: 0x210ebc 2166460
esi: 0x107f20 1081120
edi: 0x107740 1079104
eip: 0x1023c9
eflags 0x12
cs: 0x8
ss: 0x10
ds: 0x10
es: 0x10
fs: 0x10
gs: 0x10
00210ea8 [00210ea8] 10111e
00210eac [00210eac] 210ebc
00210eb0 [00210eb0] 10239e
00210eb4 [00210eb4] 0001
00210eb8 [00210eb8] 10a0e0
00210ebc [00210ebc] 210edc
00210ec0 [00210ec0] 1024ce
00210ec4 [00210ec4] 1010101
00210ec8 [00210ec8] 1010101
00210ecc [00210ecc] 1010101
00210ed0 [00210ed0] 107740
00210ed4 [00210ed4] 0001
00210ed8 [00210ed8] 10cd74
00210edc [00210edc] 210f1c
00210ee0 [00210ee0] 100bbc
00210ee4 [00210ee4] 107740
</pre>
<li>2517: stack at beginning of setjmp:
<pre>
CPU 0:
eax: 0x10a144 1089860
ecx: 0x6c65746e 1818588270
edx: 0x0 0
ebx: 0x10a0e0 1089760
esp: 0x210ea0 2166432
ebp: 0x210ebc 2166460
esi: 0x107f20 1081120
edi: 0x107740 1079104
eip: 0x102848
eflags 0x12
cs: 0x8
ss: 0x10
ds: 0x10
es: 0x10
fs: 0x10
gs: 0x10
00210ea0 [00210ea0] 1023cf <--- return address (sched)
00210ea4 [00210ea4] 10a144
00210ea8 [00210ea8] 10111e
00210eac [00210eac] 210ebc
00210eb0 [00210eb0] 10239e
00210eb4 [00210eb4] 0001
00210eb8 [00210eb8] 10a0e0
00210ebc [00210ebc] 210edc
00210ec0 [00210ec0] 1024ce
00210ec4 [00210ec4] 1010101
00210ec8 [00210ec8] 1010101
00210ecc [00210ecc] 1010101
00210ed0 [00210ed0] 107740
00210ed4 [00210ed4] 0001
00210ed8 [00210ed8] 10cd74
00210edc [00210edc] 210f1c
</pre>
<li>2519: What is saved in jmpbuf of proc[0]?
<li>2529: return 0!
<li>2534: What is in jmpbuf of cpu 0? The stack is as follows:
<pre>
CPU 0:
eax: 0x0 0
ecx: 0x6c65746e 1818588270
edx: 0x108aa4 1084068
ebx: 0x10a0e0 1089760
esp: 0x210ea0 2166432
ebp: 0x210ebc 2166460
esi: 0x107f20 1081120
edi: 0x107740 1079104
eip: 0x10286e
eflags 0x46
cs: 0x8
ss: 0x10
ds: 0x10
es: 0x10
fs: 0x10
gs: 0x10
00210ea0 [00210ea0] 1023fe
00210ea4 [00210ea4] 108aa4
00210ea8 [00210ea8] 10111e
00210eac [00210eac] 210ebc
00210eb0 [00210eb0] 10239e
00210eb4 [00210eb4] 0001
00210eb8 [00210eb8] 10a0e0
00210ebc [00210ebc] 210edc
00210ec0 [00210ec0] 1024ce
00210ec4 [00210ec4] 1010101
00210ec8 [00210ec8] 1010101
00210ecc [00210ecc] 1010101
00210ed0 [00210ed0] 107740
00210ed4 [00210ed4] 0001
00210ed8 [00210ed8] 10cd74
00210edc [00210edc] 210f1c
</pre>
<li>2547: return 1! stack looks as follows:
<pre>
CPU 0:
eax: 0x1 1
ecx: 0x108aa0 1084064
edx: 0x108aa4 1084068
ebx: 0x10074 65652
esp: 0x108d40 1084736
ebp: 0x108d5c 1084764
esi: 0x10074 65652
edi: 0xffde 65502
eip: 0x102892
eflags 0x6
cs: 0x8
ss: 0x10
ds: 0x10
es: 0x10
fs: 0x10
gs: 0x10
00108d40 [00108d40] 10231c
00108d44 [00108d44] 10a144
00108d48 [00108d48] 0010
00108d4c [00108d4c] 0021
00108d50 [00108d50] 0000
00108d54 [00108d54] 0000
00108d58 [00108d58] 10a0e0
00108d5c [00108d5c] 0000
00108d60 [00108d60] 0001
00108d64 [00108d64] 0000
00108d68 [00108d68] 0000
00108d6c [00108d6c] 0000
00108d70 [00108d70] 0000
00108d74 [00108d74] 0000
00108d78 [00108d78] 0000
00108d7c [00108d7c] 0000
</pre>
<li>2548: where will longjmp return? (answer: 10231c, in scheduler)
<li>2233: The scheduler on each processor selects in a round-robin fashion the
first runnable process. Which process will that be? (If we are
running with one processor.) (Ans: proc[0].)
<li>2229: what will be saved in cpu's jmpbuf?
<li>What is in proc[0]'s jmpbuf?
<li>2548: return 1. Stack looks as follows:
<pre>
CPU 0:
eax: 0x1 1
ecx: 0x6c65746e 1818588270
edx: 0x0 0
ebx: 0x10a0e0 1089760
esp: 0x210ea0 2166432
ebp: 0x210ebc 2166460
esi: 0x107f20 1081120
edi: 0x107740 1079104
eip: 0x102892
eflags 0x2
cs: 0x8
ss: 0x10
ds: 0x10
es: 0x10
fs: 0x10
gs: 0x10
00210ea0 [00210ea0] 1023cf <--- return to sleep
00210ea4 [00210ea4] 108aa4
00210ea8 [00210ea8] 10111e
00210eac [00210eac] 210ebc
00210eb0 [00210eb0] 10239e
00210eb4 [00210eb4] 0001
00210eb8 [00210eb8] 10a0e0
00210ebc [00210ebc] 210edc
00210ec0 [00210ec0] 1024ce
00210ec4 [00210ec4] 1010101
00210ec8 [00210ec8] 1010101
00210ecc [00210ecc] 1010101
00210ed0 [00210ed0] 107740
00210ed4 [00210ed4] 0001
00210ed8 [00210ed8] 10cd74
00210edc [00210edc] 210f1c
</pre>
</ul>
<p>Why switch from proc[0] to the processor stack, and then to
proc[0]'s stack? Why not instead run the scheduler on the kernel
stack of the last process that ran on that CPU?
<ul>
<li>If the scheduler wanted to use the process stack, then it couldn't
have any stack variables live across process scheduling, since
they'd be different depending on which process just stopped running.
<li>Suppose process p goes to sleep on CPU1, so CPU1 is idling in
scheduler() on p's stack. Someone wakes up p. CPU2 decides to run
p. Now p is running on its stack, and CPU1 is also running on the
same stack. They will likely scribble on each other's local
variables, return pointers, etc.
<li>The same thing happens if CPU1 tries to reuse the process's page
tables to avoid a TLB flush. If the process gets killed and cleaned
up by the other CPU, now the page tables are wrong. I think some OSes
actually do this (with appropriate ref counting).
</ul>
<p>How is preemptive scheduling implemented in xv6? Answer: see trap.c
lines 2905 through 2917, and the implementation of yield() on sheet
22.
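<p>The interaction between round-robin selection and timer-driven preemption can be simulated in a few lines. This is a simplified sketch, not xv6's scheduler: <code>struct proc</code>, <code>pick_next</code>, and <code>simulate</code> are hypothetical, and each "timer tick" is modelled as the running process calling <code>yield()</code>, which makes it RUNNABLE again so the next runnable process gets the following slice.

```c
enum procstate { UNUSED, SLEEPING, RUNNABLE, RUNNING };

#define NPROC 4

struct proc {
    enum procstate state;
    int ticks_used;                   /* timeslices consumed so far */
};

static struct proc proc[NPROC];

/* Round-robin: the first RUNNABLE process after 'last', wrapping around.
   Returns -1 if nothing is runnable. */
int pick_next(int last)
{
    for (int i = 1; i <= NPROC; i++) {
        int idx = (last + i) % NPROC;
        if (proc[idx].state == RUNNABLE)
            return idx;
    }
    return -1;
}

/* Simulate nticks timer interrupts; each one preempts the running process. */
void simulate(int nticks)
{
    int cur = NPROC - 1;              /* so the first scan starts at proc[0] */
    for (int t = 0; t < nticks; t++) {
        int next = pick_next(cur);
        if (next < 0)
            return;                   /* idle: nothing runnable */
        proc[next].state = RUNNING;
        proc[next].ticks_used++;
        proc[next].state = RUNNABLE;  /* timer tick: yield() puts it back */
        cur = next;
    }
}
```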
<p>How long is a timeslice for a user process? (Possibly very short:
a very important lock is held across the context switch!)
</body>

462
web/l-vm.html Normal file

@ -0,0 +1,462 @@
<html>
<head>
<title>Virtual Machines</title>
</head>
<body>
<h1>Virtual Machines</h1>
<p>Required reading: Disco</p>
<h2>Overview</h2>
<p>What is a virtual machine? IBM definition: a fully protected and
isolated copy of the underlying machine's hardware.</p>
<p>Another view is that it provides another example of a kernel API.
In contrast to other kernel APIs (unix, microkernel, and exokernel),
the virtual machine operating system exports as the kernel API the
processor API (e.g., the x86 interface). Thus, each program running
in user space sees the services offered by a processor, and each
program sees its own processor. Of course, we don't want to make a
system call for each instruction, and in fact one of the main
challenges in virtual machine operating systems is to design the
system in such a way that the physical processor executes the virtual
processor API directly, at processor speed.
<p>
Virtual machines can be useful for a number of reasons:
<ol>
<li>Run multiple operating systems on a single piece of hardware. For
example, in one process, you run Linux, and in another you run
Windows/XP. If the kernel API is identical to the x86 (and faithfully
emulates x86 instructions, state, protection levels, page tables),
then the virtual machine operating system can run Linux and Windows/XP
as <i>guest</i> operating systems without modifications.
<ul>
<li>Run "older" programs on the same hardware (e.g., run one x86
virtual machine in real mode to execute old DOS apps).
<li>Or run applications that require a different operating system.
</ul>
<li>Fault isolation: like processes on UNIX but more complete, because
the guest operating system runs on the virtual machine in user space.
Thus, faults in the guest OS cannot affect any other software.
<li>Customizing the apparent hardware: a virtual machine may have a
different view of the hardware than is physically present.
<li>Simplify deployment/development of software for scalable
processors (e.g., Disco).
</ol>
</p>
<p>If your operating system isn't a virtual machine operating system,
what are the alternatives? Processor simulation (e.g., bochs) or
binary emulation (WINE). Simulation runs instructions purely in
software and is slow (e.g., a 100x slowdown for bochs); virtualization
gets out of the way whenever possible and can be efficient.
<p>Simulation gives portability, whereas virtualization focuses on
performance. However, simulation means that you need to model your hardware
very carefully in software. Binary emulation focuses on just getting the
system calls of a particular operating system's interface right. Binary
emulation can be hard because it is targeted towards a particular
operating system (and even that can change between revisions).
</p>
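<p>To make "runs instructions purely in software" concrete, here is a minimal fetch-decode-execute loop for a made-up three-instruction machine. The instruction set, <code>struct simcpu</code>, and <code>sim_run</code> are hypothetical and unrelated to bochs; the point is that every guest instruction costs a fetch, a <code>switch</code>, and several host instructions, which is where simulation's slowdown comes from.

```c
#include <stdint.h>

/* A hypothetical 3-instruction machine, simulated entirely in software. */
enum { OP_LOADI = 0, OP_ADD = 1, OP_HALT = 2 };

struct simcpu {
    uint32_t reg[4];
    uint32_t pc;
};

void sim_run(struct simcpu *c, const uint8_t *mem)
{
    for (;;) {
        uint8_t op = mem[c->pc];                 /* fetch */
        switch (op) {                            /* decode */
        case OP_LOADI:                           /* LOADI rd, imm8 */
            c->reg[mem[c->pc + 1] & 3] = mem[c->pc + 2];
            c->pc += 3;
            break;
        case OP_ADD:                             /* ADD rd, rs: rd += rs */
            c->reg[mem[c->pc + 1] & 3] += c->reg[mem[c->pc + 2] & 3];
            c->pc += 3;
            break;
        case OP_HALT:
        default:
            return;
        }
    }
}
```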
<p>To provide each process with its own virtual processor that exports
the same API as the physical processor, what features must
the virtual machine operating system virtualize?
<ol>
<li>CPU: instructions -- trap all privileged instructions</li>
<li>Memory: address spaces -- map "physical" pages managed
by the guest OS to <i>machine</i> pages, handle translation, etc.</li>
<li>Devices: any I/O communication needs to be trapped and passed
through/handled appropriately.</li>
</ol>
</p>
The software that implements the virtualization is typically called
the monitor, instead of the virtual machine operating system.
<p>Virtual machine monitors (VMM) can be implemented in two ways:
<ol>
<li>Run VMM directly on hardware: like Disco.</li>
<li>Run VMM as an application (though still running as root, with
integration into OS) on top of a <i>host</i> OS: like VMware. Provides
additional hardware support at low development cost in
VMM. Intercept CPU-level I/O requests and translate them into
system calls (e.g. <code>read()</code>).</li>
</ol>
</p>
<p>The three primary functions of a virtual machine monitor are:
<ul>
<li>virtualize processor (CPU, memory, and devices)
<li>dispatch events (e.g., forward page fault trap to guest OS).
<li>allocate resources (e.g., divide real memory in some way between
the physical memory of each guest OS).
</ul>
<h2>Virtualization in detail</h2>
<h3>Memory virtualization</h3>
<p>
Understanding memory virtualization. Let's consider the MIPS example
from the paper. Ideally, we'd be able to intercept and rewrite all
memory address references. (e.g., by intercepting virtual memory
calls). Why can't we do this on the MIPS? (There are addresses that
don't go through address translation --- but we don't want the virtual
machine to directly access memory!) What does Disco do to get around
this problem? (Relink the kernel outside this address space.)
</p>
<p>
Having gotten around that problem, how do we handle things in general?
</p>
<pre>
// Disco's tlb miss handler.
// Called when a memory reference for virtual address
// 'va' is made, but there is no va->MA (virtual -> machine)
// mapping in the cpu's TLB.
void tlb_miss_handler (va)
{
  // see if we have a mapping in our "shadow" tlb (which includes
  // "main" tlb)
  tlb_entry *t = tlb_lookup (thiscpu->l2tlb, va);
  if (t && defined (thiscpu->pmap[t->pa])) // is there a MA for this PA?
    tlbwrite (va, thiscpu->pmap[t->pa], t->otherdata);
  else if (t) {
    // get a machine page, copy the physical page into it, and tlbwrite
  } else {
    // trap to the virtual CPU/OS's handler
  }
}
// Disco's procedure which emulates the MIPS
// instruction which writes to the tlb.
//
// va -- virtual address
// pa -- physical address (NOT MA machine address!)
// otherdata -- perms and stuff
void emulate_tlbwrite_instruction (va, pa, otherdata)
{
  tlb_insert (thiscpu->l2tlb, va, pa, otherdata); // cache
  if (!defined (thiscpu->pmap[pa])) { // fill in pmap dynamically
    ma = allocate_machine_page ();
    thiscpu->pmap[pa] = ma; // See 4.2.2
    thiscpu->pmapbackmap[ma] = pa;
    thiscpu->memmap[ma] = va; // See 4.2.3 (for TLB shootdowns)
  }
  tlbwrite (va, thiscpu->pmap[pa], otherdata);
}
// Disco's procedure which emulates the MIPS
// instruction which reads the tlb.
tlb_entry *emulate_tlbread_instruction (va)
{
  // Must return a TLB entry that has a "Physical" address;
  // This is recorded in our secondary TLB cache.
  // (We don't have to read from the hardware TLB since
  // all writes to the hardware TLB are mediated by Disco.
  // Thus we can always keep the l2tlb up to date.)
  return tlb_lookup (thiscpu->l2tlb, va);
}
</pre>
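<p>The same logic can be run as ordinary C if the TLBs and the pmap are modelled as small arrays indexed by page number. This is a sketch under simplifying assumptions, not Disco's code: <code>struct vcpu</code>, <code>NOMAP</code>, the array sizes, and the bump allocator <code>allocate_machine_page</code> are hypothetical, the "hardware TLB" is just an array, and the case where a physical page must first be copied into a fresh machine page is folded into simple allocation.

```c
#include <stdbool.h>

#define NVPAGES 16                  /* hypothetical: tiny address space, in pages */
#define NPPAGES 16                  /* guest "physical" pages */
#define NOMAP   (-1)

struct vcpu {
    int l2tlb[NVPAGES];             /* guest's view: VPN -> guest physical page */
    int pmap[NPPAGES];              /* guest physical page -> machine page */
    int hwtlb[NVPAGES];             /* stand-in for the real TLB: VPN -> machine page */
};

static int next_machine_page = 100; /* hypothetical machine-page allocator */
static int allocate_machine_page(void) { return next_machine_page++; }

void vcpu_init(struct vcpu *c)
{
    for (int i = 0; i < NVPAGES; i++) c->l2tlb[i] = c->hwtlb[i] = NOMAP;
    for (int i = 0; i < NPPAGES; i++) c->pmap[i] = NOMAP;
}

/* Guest executed a TLB write: remember VPN->PPN, install VPN->MPN. */
void emulate_tlbwrite(struct vcpu *c, int vpn, int ppn)
{
    c->l2tlb[vpn] = ppn;
    if (c->pmap[ppn] == NOMAP)
        c->pmap[ppn] = allocate_machine_page();
    c->hwtlb[vpn] = c->pmap[ppn];
}

/* Real TLB missed: true if refilled from the l2tlb, false if the miss
   must be reflected to the guest OS's own handler. */
bool tlb_miss(struct vcpu *c, int vpn)
{
    int ppn = c->l2tlb[vpn];
    if (ppn == NOMAP)
        return false;
    if (c->pmap[ppn] == NOMAP)
        c->pmap[ppn] = allocate_machine_page();
    c->hwtlb[vpn] = c->pmap[ppn];
    return true;
}

/* Guest reads the TLB: answer with the physical page it wrote, never the MPN. */
int emulate_tlbread(const struct vcpu *c, int vpn) { return c->l2tlb[vpn]; }
```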
<h3>CPU virtualization</h3>
<p>Requirements:
<ol>
<li>Results of executing non-privileged instructions in privileged and
user mode must be equivalent. (Why? B/c the virtual "privileged"
system will not be running in true "privileged" mode.)
<li>There must be a way to protect the VM from the real machine. (Some
sort of memory protection/address translation. For fault isolation.)</li>
<li>There must be a way to detect and transfer control to the VMM when
the VM tries to execute a sensitive instruction (e.g. a privileged
instruction, or one that could expose the "virtualness" of the
VM.) It must be possible to emulate these instructions in
software. Can be classified into completely virtualizable
(i.e. there are protection mechanisms that cause traps for all
instructions), partly (insufficient or incomplete trap
mechanisms), or not at all (e.g. no MMU).
</ol>
</p>
<p>The MIPS didn't quite meet the second criterion, as discussed
above. But, it does have a supervisor mode that is between user mode and
kernel mode where any privileged instruction will trap.</p>
<p>What might the VMM trap handler look like?</p>
<pre>
void privilege_trap_handler (addr) {
instruction, args = decode_instruction (addr)
switch (instruction) {
case foo:
emulate_foo (thiscpu, args, ...);
break;
case bar:
emulate_bar (thiscpu, args, ...);
break;
case ...:
...
}
}
</pre>
<p>The <code>emulate_foo</code> functions will have to evaluate the
state of the virtual CPU and compute the appropriate "fake" answer.
</p>
<p>What sort of state is needed in order to appropriately emulate all
of these things?
<pre>
- all user registers
- CPU specific regs (e.g. on x86, %crN, debugging, FP...)
- page tables (or tlb)
- interrupt tables
</pre>
This is needed for each virtual processor.
</p>
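<p>Putting the state list and the trap handler together, a virtual processor can be a plain struct that the emulation routines update exactly as the hardware would have. This is a hypothetical x86-flavoured sketch: <code>struct vcpu</code>, <code>enum insn</code>, and <code>emulate</code> are invented names, decoding is assumed to have happened already, and a real monitor would also switch shadow page tables on a CR3 load.

```c
#include <stdint.h>

#define FL_IF 0x200u                /* x86 EFLAGS interrupt-enable bit */

/* Hypothetical per-virtual-processor state, following the list above. */
struct vcpu {
    uint32_t regs[8];               /* general-purpose registers */
    uint32_t eflags;                /* virtual EFLAGS, including virtual IF */
    uint32_t cr3;                   /* guest page-directory base (guest "physical") */
    uint32_t idt_base;              /* guest interrupt table */
};

/* Hypothetical decoded forms of a few privileged instructions. */
enum insn { INSN_CLI, INSN_STI, INSN_LCR3, INSN_LIDT };

/* Trap-and-emulate: returns 0 on success, -1 for an unemulated instruction. */
int emulate(struct vcpu *v, enum insn op, uint32_t arg)
{
    switch (op) {
    case INSN_CLI:  v->eflags &= ~FL_IF; return 0;
    case INSN_STI:  v->eflags |= FL_IF;  return 0;
    case INSN_LCR3: v->cr3 = arg;        return 0;
    case INSN_LIDT: v->idt_base = arg;   return 0;
    }
    return -1;
}
```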
<h3>Device I/O virtualization</h3>
<p>We intercept all communication to the I/O devices: read/writes to
reserved memory addresses cause page faults into special handlers
which will emulate or pass through I/O as appropriate.
</p>
<p>
In a system like Disco, the sequence would look something like:
<ol>
<li>VM executes instruction to access I/O</li>
<li>Trap generated by CPU (based on memory or privilege protection)
transfers control to VMM.</li>
<li>VMM emulates I/O instruction, saving information about where this
came from (for demultiplexing the asynchronous reply from the hardware later).</li>
<li>VMM reschedules a VM.</li>
</ol>
</p>
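<p>Step 3 above, remembering where a request came from so that a later asynchronous completion can be routed back, amounts to a small table of outstanding requests. A minimal sketch, assuming a hypothetical <code>struct ioreq</code> table and tag scheme (Disco's actual bookkeeping is not shown in these notes):

```c
#define MAXREQ 8

/* Hypothetical record of an outstanding I/O issued on behalf of a VM. */
struct ioreq {
    int vm;                         /* which VM issued it */
    int tag;                        /* tag handed to the real device */
    int busy;
};

static struct ioreq pending[MAXREQ];
static int next_tag = 1;

/* Emulate a trapped I/O instruction: record its origin and return the
   tag passed to the device, or -1 if the table is full. */
int vmm_start_io(int vm)
{
    for (int i = 0; i < MAXREQ; i++) {
        if (!pending[i].busy) {
            pending[i] = (struct ioreq){ .vm = vm, .tag = next_tag++, .busy = 1 };
            return pending[i].tag;
        }
    }
    return -1;
}

/* Device completed 'tag': return the VM that asked, or -1 if unknown. */
int vmm_complete_io(int tag)
{
    for (int i = 0; i < MAXREQ; i++) {
        if (pending[i].busy && pending[i].tag == tag) {
            pending[i].busy = 0;
            return pending[i].vm;
        }
    }
    return -1;
}
```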
<p>
Interrupts will require some additional work:
<ol>
<li>Interrupt occurs on the real machine, transferring control to the VMM
handler.</li>
<li>VMM determines the VM that ought to receive this interrupt.</li>
<li>VMM causes a simulated interrupt to occur in the VM, and reschedules a
VM.</li>
<li>VM runs its interrupt handler, which may involve other I/O
instructions that need to be trapped.</li>
</ol>
</p>
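<p>Steps 2 and 3 can be sketched as a per-VM bitmask of pending virtual interrupts that the VMM only delivers when the guest's <i>virtual</i> interrupt flag is set. The names <code>struct vm</code>, <code>irq_owner</code>, <code>vmm_real_interrupt</code>, and <code>vmm_deliver</code> are hypothetical; clearing the virtual flag on delivery mimics what an x86 interrupt gate does.

```c
#include <stdint.h>

#define NVM 4

/* Hypothetical per-VM interrupt state kept by the VMM. */
struct vm {
    uint32_t pending;               /* virtual IRQ lines raised, not yet delivered */
    int if_enabled;                 /* guest's virtual interrupt-enable flag */
};

static struct vm vms[NVM];

/* A real interrupt arrived: mark it pending in the VM that owns the device. */
void vmm_real_interrupt(int irq, const int irq_owner[])
{
    vms[irq_owner[irq]].pending |= 1u << irq;
}

/* Before resuming a VM, simulate delivery of its lowest pending IRQ if the
   guest has interrupts enabled. Returns the IRQ delivered, or -1. */
int vmm_deliver(int vmid)
{
    struct vm *v = &vms[vmid];
    if (!v->if_enabled || v->pending == 0)
        return -1;
    for (int irq = 0; irq < 32; irq++) {
        if (v->pending & (1u << irq)) {
            v->pending &= ~(1u << irq);
            v->if_enabled = 0;      /* handler starts with interrupts off */
            return irq;
        }
    }
    return -1;
}
```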
<p>
The above can be slow! So sometimes you want the guest operating
system to be aware that it is a guest and allow it to avoid the slow
path, for example with special device drivers, or by changing instructions
that would cause traps into memory read/write instructions.
</p>
<h2>Intel x86/vmware</h2>
<p>VMware, unlike Disco, runs as an application on a host OS and
cannot modify the guest OS. Furthermore, it must virtualize the x86
instead of the MIPS processor. Both of these differences make for good design
challenges.
<p>The first challenge is that the monitor runs in user space, yet it
must dispatch traps and it must execute privileged instructions, which
both require kernel privileges. To address this challenge, the
monitor downloads a piece of code, a kernel module, into the host
OS. Most modern operating systems are constructed as a core kernel,
extended with downloadable kernel modules.
Privileged users can insert kernel modules at run-time.
<p>The monitor downloads a kernel module that reads the IDT, copies
it, and overwrites the hard-wired entries with addresses for stubs in
the just-downloaded kernel module. When a trap happens, the kernel
module inspects the PC, and either forwards the trap to the monitor
running in user space or to the host OS. If the trap is caused
because a guest OS executes a privileged instruction, the monitor can
emulate that privileged instruction by asking the kernel module to
perform that instruction (perhaps after modifying the arguments to
the instruction).
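<p>The PC check performed by the stub can be as simple as a range test. A minimal sketch, assuming (hypothetically) that guest code always executes inside one known address range; <code>struct routing</code> and <code>route_trap</code> are invented names, and a real stub would also inspect the trap number and the saved privilege level.

```c
#include <stdint.h>

/* Hypothetical: the address range in which guest code executes. */
struct routing {
    uintptr_t guest_lo, guest_hi;
};

enum dest { TO_ORIGINAL_HANDLER, TO_MONITOR };

/* Stub installed in the copied IDT: traps raised while running guest code
   go to the user-level monitor; anything else is chained to the original
   IDT entry. */
enum dest route_trap(const struct routing *r, uintptr_t pc)
{
    if (pc >= r->guest_lo && pc < r->guest_hi)
        return TO_MONITOR;
    return TO_ORIGINAL_HANDLER;
}
```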
<p>The second challenge is virtualizing the x86
instructions. Unfortunately, the x86 doesn't fully meet the three
requirements for CPU virtualization above. If you run
the CPU in ring 3, <i>most</i> x86 instructions will be fine,
because most privileged instructions will result in a trap, which
can then be forwarded to VMware for emulation. For example,
consider a guest OS loading the root of a page table in CR3. This
results in a trap (the guest OS runs in user space), which is
forwarded to the monitor, which can emulate the load to CR3 as
follows:
<pre>
// addr is a physical address
void emulate_lcr3 (thiscpu, addr)
{
  thiscpu->cr3 = addr;
  Pte *fakepdir = lookup (addr, oldcr3cache);
  if (!fakepdir) {
    fakepdir = ppage_alloc ();
    store (oldcr3cache, addr, fakepdir);
    // May wish to scan through supplied page directory to see if
    // we have to fix up anything in particular.
    // Exact settings will depend on how we want to handle
    // problem cases below and our own MM.
  }
  // load the shadow directory into the real %cr3
  // (assumes fakepdir holds its machine physical address)
  asm volatile ("movl %0,%%cr3" : : "r" (fakepdir));
  // Must make sure our page fault handler is in sync with what we do here.
}
</pre>
<p>To virtualize the x86, the monitor must intercept any modifications
to the page table and substitute appropriate responses, and it must update
things like the accessed/dirty bits. The monitor can arrange for this
to happen by making all page table pages inaccessible so that it can
emulate loads and stores to page table pages. This setup allows the
monitor to virtualize the memory interface of the x86.</p>
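<p>When a guest store to a protected page-table page traps, the monitor can perform the store on the guest's copy and mirror a translated entry into the shadow table the hardware really uses. A minimal sketch: <code>pmap</code>, <code>NGUEST_PAGES</code>, and <code>emulate_pte_store</code> are hypothetical, only 32-bit non-PAE entries are modelled, and copying accessed/dirty bits back into the guest's copy is left out.

```c
#include <stdint.h>

#define NPTE 1024
#define PTE_P 0x001u                /* present bit */
#define PTE_FLAGS 0xFFFu            /* low 12 bits: flags */
#define NGUEST_PAGES 256            /* hypothetical guest memory size, in pages */

typedef uint32_t pte_t;

/* Hypothetical: guest physical page number -> machine page number. */
static uint32_t pmap[NGUEST_PAGES];

/* Trapped guest store of 'val' into entry 'idx' of a page-table page. */
void emulate_pte_store(pte_t *guest_pt, pte_t *shadow_pt, int idx, pte_t val)
{
    guest_pt[idx] = val;                        /* what the guest reads back */
    uint32_t gppn = val >> 12;
    if (!(val & PTE_P) || gppn >= NGUEST_PAGES) {
        shadow_pt[idx] = 0;                     /* not present / outside guest memory */
        return;
    }
    shadow_pt[idx] = (pmap[gppn] << 12) | (val & PTE_FLAGS);
}
```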
<p>Unfortunately, not all instructions that must be virtualized result
in traps:
<ul>
<li><code>pushf/popf</code>: <code>FL_IF</code> is handled differently,
for example. In user mode, setting FL_IF is just ignored.</li>
<li>Anything (<code>push</code>, <code>pop</code>, <code>mov</code>)
that reads or writes from <code>%cs</code>, which contains the
privilege level.
<li>Setting the interrupt enable bit in EFLAGS has different
semantics in user space and kernel space. In user space, it
is ignored; in kernel space, the bit is set.
<li>And some others... (17 instructions in total).
</ul>
These instructions are unprivileged instructions (i.e., they don't cause a
trap when executed by a guest OS) but expose physical processor state.
These could reveal details of virtualization that should not be
revealed. For example, if guest OS sets the interrupt enable bit for
its virtual x86, the virtualized EFLAGS should reflect that the bit is
set, even though the guest OS is running in user space.
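<p>Once a <code>popf</code> or <code>pushf</code> has been caught (for example by the rewriting described next), the monitor can keep two copies of IF: the real one, which stays under its control, and the virtual one the guest believes in. A minimal sketch with hypothetical names (<code>struct vcpu</code>, <code>emulate_popf</code>, <code>emulate_pushf</code>):

```c
#include <stdint.h>

#define FL_IF 0x200u                /* EFLAGS interrupt-enable bit */

/* Hypothetical: real EFLAGS the guest runs with, plus its virtual IF. */
struct vcpu {
    uint32_t real_eflags;
    uint32_t virt_if;               /* 0 or FL_IF */
};

/* popf: accept the guest's other flags, record its IF only virtually. */
void emulate_popf(struct vcpu *v, uint32_t popped)
{
    v->virt_if = popped & FL_IF;
    v->real_eflags = (popped & ~FL_IF) | (v->real_eflags & FL_IF);
}

/* pushf: show the guest its virtual IF, not the real one. */
uint32_t emulate_pushf(const struct vcpu *v)
{
    return (v->real_eflags & ~FL_IF) | v->virt_if;
}
```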
<p>How can we virtualize these instructions? An approach is to decode
the instruction stream that is provided by the user and look for bad
instructions. When we find them, replace them with an interrupt
(<code>INT 3</code>) that will allow the VMM to handle it
correctly. This might look something like:
</p>
<pre>
void initcode () {
scan_for_nonvirtualizable (thiscpu, 0x7c00);
}
void scan_for_nonvirtualizable (thiscpu, startaddr) {
addr = startaddr;
instr = disassemble (addr);
while (instr is not branch or bad) {
addr += len (instr);
instr = disassemble (addr);
}
// remember that we wanted to execute this instruction.
replace (addr, "int 3");
record (thiscpu->rewrites, addr, instr);
}
void breakpoint_handler (tf) {
oldinstr = lookup (thiscpu->rewrites, tf->eip);
if (oldinstr is branch) {
newcs:neweip = evaluate branch
scan_for_nonvirtualizable (thiscpu, newcs:neweip)
return;
} else { // something non virtualizable
// dispatch to appropriate emulation
}
}
</pre>
<p>All pages must be scanned in this way. Fortunately, most pages
probably are okay and don't really need any special handling so after
scanning them once, we can just remember that the page is okay and let
it run natively.
</p>
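<p>The scanning step of the pseudocode can be made concrete on a toy byte stream. This sketch treats only a handful of real x86 opcodes specially (<code>pushf</code> 0x9C, <code>popf</code> 0x9D, <code>jmp rel8</code> 0xEB, <code>ret</code> 0xC3) and, as a simplifying assumption, treats every other byte as a harmless one-byte instruction; a real scanner needs a full x86 length decoder. <code>struct rewrite</code> and the log arrays are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_INT3  0xCC
#define OP_PUSHF 0x9C
#define OP_POPF  0x9D
#define OP_JMP8  0xEB               /* jmp rel8: 2 bytes */
#define OP_RET   0xC3

struct rewrite { size_t addr; uint8_t orig; };

static size_t insn_len(uint8_t op) { return op == OP_JMP8 ? 2 : 1; }

static int is_branch_or_bad(uint8_t op)
{
    return op == OP_JMP8 || op == OP_RET || op == OP_PUSHF || op == OP_POPF;
}

/* Scan from 'start' to the first branch or non-virtualizable instruction,
   patch it with int3, and log the original byte for the breakpoint handler.
   Returns the patched offset, or 'len' if none was found. */
size_t scan_for_nonvirtualizable(uint8_t *code, size_t len, size_t start,
                                 struct rewrite *log, size_t *nlog)
{
    size_t addr = start;
    while (addr < len && !is_branch_or_bad(code[addr]))
        addr += insn_len(code[addr]);
    if (addr >= len)
        return len;
    log[*nlog] = (struct rewrite){ addr, code[addr] };
    (*nlog)++;
    code[addr] = OP_INT3;
    return addr;
}
```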
<p>What if a guest OS generates instructions, writes them to memory,
and then wants to execute them? We must detect self-modifying code
(e.g., we must simulate buffer overflow attacks correctly). When a write
happens to a physical page that holds already-scanned code, the monitor must
trap the write and then rescan the affected portions of the page.</p>
<p>What about self-examining code? The monitor needs to protect it
somehow, possibly by playing tricks with instruction/data TLB caches, or by
introducing a private segment for code (%cs) that is different from
the segment used for reads/writes (%ds).
</p>
<h2>Some Disco paper notes</h2>
<p>
Disco has some I/O specific optimizations.
</p>
<ul>
<li>Disk reads only need to happen once and can be shared between
virtual machines via copy-on-write virtual memory tricks.</li>
<li>Network cards do not need to be fully virtualized: communication
between VMs on the same machine doesn't need a real network card backing it.</li>
<li>Special handling for NFS so that all VMs "share" a buffer cache.</li>
</ul>
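<p>The first optimization, sharing one copy of a disk block between VMs until someone writes it, is ordinary copy-on-write. A minimal sketch with a hypothetical reference-counted <code>struct mpage</code>; the page-table changes that make a shared page read-only (so the write fault happens at all) are assumed and not shown.

```c
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

/* Hypothetical machine page holding a cached disk block. */
struct mpage {
    int refcnt;
    char data[PGSIZE];
};

/* Another VM reads the same block: map the existing page (read-only). */
struct mpage *share_page(struct mpage *p)
{
    p->refcnt++;
    return p;
}

/* Write fault on a shared page: return the page the writer should map,
   copying it if others still share it (NULL if allocation fails). */
struct mpage *cow_fault(struct mpage *p)
{
    if (p->refcnt == 1)
        return p;                   /* sole user: just make it writable */
    struct mpage *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, p->data, PGSIZE);
    copy->refcnt = 1;
    p->refcnt--;
    return copy;
}
```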
<p>
Disco developers clearly had access to IRIX source code.
</p>
<ul>
<li>Need to deal with the KSEG0 segment of MIPS memory by relinking the kernel
at a different address.</li>
<li>Ensuring page-alignment of network writes (for the purposes of
doing memory map tricks.)</li>
</ul>
<p>Performance?</p>
<ul>
<li>Evaluated in simulation.</li>
<li>Where are the overheads? Where do they come from?</li>
<li>Does it run better than NUMA IRIX?</li>
</ul>
<p>Premise. Are virtual machines the preferred approach to extending
operating systems? Have scalable multiprocessors materialized?</p>
<h2>Related papers</h2>
<p>John Scott Robin, Cynthia E. Irvine. <a
href="http://www.cs.nps.navy.mil/people/faculty/irvine/publications/2000/VMM-usenix00-0611.pdf">Analysis of the
Intel Pentium's Ability to Support a Secure Virtual Machine
Monitor</a>.</p>
<p>Jeremy Sugerman, Ganesh Venkitachalam, Beng-Hong Lim. <a
href="http://www.vmware.com/resources/techresources/530">Virtualizing
I/O Devices on VMware Workstation's Hosted Virtual Machine
Monitor</a>. In Proceedings of the 2001 Usenix Technical Conference.</p>
<p>Kevin Lawton, Drew Northup. <a
href="http://savannah.nongnu.org/projects/plex86">Plex86 Virtual
Machine</a>.</p>
<p><a href="http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf">Xen
and the Art of Virtualization</a>, Paul Barham, Boris
Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, Andrew Warfield, SOSP 2003</p>
<p><a href="http://www.vmware.com/pdf/asplos235_adams.pdf">A comparison of
software and hardware techniques for x86 virtualization</a>, Keith Adams
and Ole Agesen, ASPLOS 2006</p>
</body>
</html>

246
web/l-xfi.html Normal file

@ -0,0 +1,246 @@
<html>
<head>
<title>XFI</title>
</head>
<body>
<h1>XFI</h1>
<p>Required reading: XFI: software guards for system address spaces.
<h2>Introduction</h2>
<p>Problem: how to use untrusted code (an "extension") in a trusted
program?
<ul>
<li>Use untrusted jpeg codec in Web browser
<li>Use an untrusted driver in the kernel
</ul>
<p>What are the dangers?
<ul>
<li>No fault isolation: extension modifies trusted code unintentionally
<li>No protection: extension causes a security hole
<ul>
<li>Extension has a buffer overrun problem
<li>Extension calls trusted program's functions
<li>Extension calls a trusted program's function that it is allowed to
call, but supplies "bad" arguments
<li>Extension calls privileged hardware instructions (when extending the
kernel)
<li>Extension reads data out of the trusted program that it shouldn't.
</ul>
</ul>
<p>Possible solution approaches:
<ul>
<li>Run extension in its own address space with minimal
privileges. Rely on hardware and operating system protection
mechanism.
<li>Restrict the language in which the extension is written:
<ul>
<li>Packet filter language. Language is limited in its capabilities,
and it is easy to guarantee "safe" execution.
<li>Type-safe language. Language runtime and compiler guarantee "safe"
execution.
</ul>
<li>Software-based sandboxing.
</ul>
<h2>Software-based sandboxing</h2>
<p>Sandboxer. A compiler or binary-rewriter sandboxes all unsafe
instructions in an extension by inserting additional instructions.
For example, every indirect store is preceded by a few instructions
that compute and check the target of the store at runtime.
<p>Verifier. When the extension is loaded in the trusted program, the
verifier checks if the extension is appropriately sandboxed (e.g.,
are all indirect stores sandboxed? does it call any privileged
instructions?). If not, the extension is rejected. If yes, the
extension is loaded, and can run. When the extension runs, the
instructions that sandbox unsafe instructions check whether the unsafe
instruction is used in a safe way.
<p>The verifier must be trusted, but the sandboxer need not be. We can do
without the verifier, if the trusted program can establish that the
extension has been sandboxed by a trusted sandboxer.
<p>The paper refers to this setup as an instance of proof-carrying code.
<h2>Software fault isolation</h2>
<p><a href="http://citeseer.ist.psu.edu/wahbe93efficient.html">SFI</a>
by Wahbe et al. explored how to use sandboxing for fault isolation of
extensions; that is, use sandboxing to ensure that stores and jumps
stay within a specified memory range (i.e., they don't overwrite or
jump into addresses in the trusted program unchecked). They
implemented SFI for a RISC processor, which simplifies things since
memory can be written only by store instructions (other instructions
modify registers). In addition, they assumed that there were plenty
of registers, so that they can dedicate a few for sandboxing code.
<p>The extension is loaded into a specific range (called a segment)
within the trusted application's address space. The segment is
identified by the upper bits of the addresses in the
segment. Separate code and data segments are necessary to prevent an
extension overwriting its code.
<p>An unsafe instruction on the MIPS is an instruction that jumps or
stores to an address that cannot be statically verified to be within
the correct segment. Most control transfer operations, such as
program-counter-relative branches, can be statically verified. Stores to
static variables often use an immediate addressing mode and can be
statically verified. Indirect jumps and indirect stores are unsafe.
<p>To sandbox those instructions the sandboxer could generate the
following code for each unsafe instruction:
<pre>
DR0 <- target address
R0 <- DR0 >> shift-register; // load in R0 segment id of target
CMP R0, segment-register; // compare target's segment id to the segment's ID
BNE fault-isolation-error // if not equal, branch to trusted error code
STORE using DR0
</pre>
In this code, DR0, shift-register, and segment register
are <i>dedicated</i>: they cannot be used by the extension code. The
verifier must check that the extension doesn't use these registers. R0
is a scratch register, but doesn't have to be dedicated. The
dedicated registers are necessary, because otherwise extension could
load DR0 and jump to the STORE instruction directly, skipping the
check.
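<p>The checking sequence translates directly into C if memory is simulated, which makes it easy to see which addresses are accepted. A minimal sketch under stated assumptions: segments are a hypothetical 4 KB (<code>SEG_SHIFT</code> 12), the segment id is an arbitrary 0x12, <code>seg_mem</code> stands in for the segment's memory, and a counter stands in for the trusted error handler. In real SFI the shift amount and segment id live in dedicated registers.

```c
#include <stdint.h>

#define SEG_SHIFT 12                     /* hypothetical: 4 KB segments */
#define SEG_ID    0x12u                  /* this extension's segment id */
#define SEG_BASE  ((uintptr_t)SEG_ID << SEG_SHIFT)

static uint8_t seg_mem[1u << SEG_SHIFT]; /* simulated segment memory */
static int fault_isolation_errors;       /* stand-in for trusted error code */

int checked_store(uintptr_t target, uint8_t value)
{
    uintptr_t id = target >> SEG_SHIFT;  /* R0 <- DR0 >> shift-register */
    if (id != SEG_ID) {                  /* CMP R0, segment-register; BNE */
        fault_isolation_errors++;
        return -1;
    }
    seg_mem[target - SEG_BASE] = value;  /* STORE using DR0 */
    return 0;
}
```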
<p>This implementation costs 4 registers, and 4 additional instructions
for each unsafe instruction. One could do better, however:
<pre>
DR0 <- target address & and-mask-register // mask segment ID from target
DR0 <- DR0 | segment register // insert this segment's ID
STORE using DR0
</pre>
This code just sets the right segment ID bits. It doesn't catch
illegal addresses; it just ensures that illegal addresses are within
the segment, harming the extension but no other code. Even if the
extension jumps to the second instruction of this sandbox sequence,
nothing bad will happen (because DR0 will already contain the correct
segment ID).
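<p>The masking variant, under the same hypothetical 4 KB segment layout (<code>SEG_SHIFT</code>, <code>SEG_ID</code>, and the simulated <code>seg_mem</code> are assumptions), needs no comparison or branch: every target is forced into the segment, so a wild address lands somewhere inside the extension's own memory.

```c
#include <stdint.h>

#define SEG_SHIFT   12
#define SEG_ID      0x12u
#define SEG_BASE    ((uintptr_t)SEG_ID << SEG_SHIFT)
#define OFFSET_MASK (((uintptr_t)1 << SEG_SHIFT) - 1)  /* and-mask-register */

static uint8_t seg_mem[1u << SEG_SHIFT];              /* simulated segment */

/* DR0 <- target & mask; DR0 <- DR0 | segment id */
uintptr_t sandbox_addr(uintptr_t target)
{
    return (target & OFFSET_MASK) | SEG_BASE;
}

void masked_store(uintptr_t target, uint8_t value)
{
    seg_mem[sandbox_addr(target) - SEG_BASE] = value;  /* STORE using DR0 */
}
```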
<p>Optimizations include:
<ul>
<li>use guard zones for <i>store value, offset(reg)</i>
<li>treat SP as dedicated register (sandbox code that initializes it)
<li>etc.
</ul>
<h2>XFI</h2>
<p>XFI extends SFI in several ways:
<ul>
<li>Handles fault isolation and protection
<li>Uses control-flow integrity (CFI) to get good performance
<li>Doesn't use dedicated registers
<li>Uses two stacks (a scoped stack and an allocation stack); only the
allocation stack can be corrupted by buffer-overrun attacks. The
scoped stack cannot be accessed via computed memory references.
<li>Uses a binary rewriter.
<li>Works for the x86
</ul>
<p>The x86 is challenging because of its limited registers and
variable-length instructions. The SFI technique won't work with the x86
instruction set. For example, if the binary contains:
<pre>
25 CD 80 00 00 # AND eax, 0x80CD
</pre>
and an adversary can arrange to jump to the second byte, then the
adversary can make a system call on Linux, because <code>int 0x80</code> has
the binary representation CD 80. Thus, XFI must control execution flow.
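<p>A byte-level scan shows the problem: decoded from its first byte, the example is one harmless <code>AND</code> (opcode 0x25 with the little-endian immediate 0x000080CD), yet the bytes CD 80 appear at offset 1. The helper <code>find_int80</code> is a hypothetical illustration, not part of XFI.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first "CD 80" (int 0x80) at ANY byte offset, or -1. */
long find_int80(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (code[i] == 0xCD && code[i + 1] == 0x80)
            return (long)i;
    return -1;
}
```

<p>A verifier that only inspects instructions at boundaries reached from the start would never see this <code>int 0x80</code>; only restricting where control may land closes the hole.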
<p>XFI policy goals:
<ul>
<li>Memory-access constraints (like SFI)
<li>Interface restrictions (extension has fixed entry and exit points)
<li>Scoped-stack integrity (calling stack is well formed)
<li>Simplified instructions semantics (remove dangerous instructions)
<li>System-environment integrity (ensure certain machine model
invariants, such as x86 flags register cannot be modified)
<li>Control-flow integrity: execution must follow a static, expected
control-flow graph. (enter at beginning of basic blocks)
<li>Program-data integrity (certain global variables in extension
cannot be accessed via computed memory addresses)
</ul>
<p>The binary rewriter inserts guards to ensure these properties. The
verifier checks that the appropriate guards are in place. The primary
mechanisms used are:
<ul>
<li>CFI guards on computed control-flow transfers (see figure 2)
<li>Two stacks
<li>Guards on computed memory accesses (see figure 3)
<li>Module header has a section that contains access permissions for
regions
<li>Binary rewriter, which performs intra-procedure analysis, and
generates guards, code for stack use, and verification hints
<li>Verifier checks specific conditions per basic block. Hints specify
the verification state for the entry to each basic block, and at
exit of basic block the verifier checks that the final state implies
the verification state at entry to all possible successor basic
blocks. (see figure 4)
</ul>
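<p>The idea behind a CFI guard on a computed call is to check a label at the target before jumping. In XFI the rewriter embeds the label in the code next to each valid target; C cannot read bytes in front of a function portably, so this analogy pairs each target with its label in a struct. <code>CFI_LABEL</code>, <code>struct cfi_target</code>, and <code>cfi_call</code> are hypothetical names.

```c
#include <stddef.h>

#define CFI_LABEL 0x12345678u       /* hypothetical id for valid call targets */

typedef int (*handler_fn)(int);

struct cfi_target {
    unsigned label;                 /* must equal CFI_LABEL */
    handler_fn fn;
};

static int double_it(int x) { return 2 * x; }

/* A valid target, "labelled" by the rewriter. */
const struct cfi_target valid_double = { CFI_LABEL, double_it };

/* Guarded computed call: returns 0 and stores the result if the label
   checks out, -1 otherwise (a real guard would stop the extension). */
int cfi_call(const struct cfi_target *t, int arg, int *result)
{
    if (t == NULL || t->label != CFI_LABEL)
        return -1;
    *result = t->fn(arg);
    return 0;
}
```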
<p>Can XFI protect against the attack discussed in last lecture?
<pre>
unsigned int j;
p=(unsigned char *)s->init_buf->data;
j= *(p++);
s->session->session_id_length=j;
memcpy(s->session->session_id,p,j);
</pre>
Where will <i>j</i> be located?
<p>How about the following one from the paper <a href="http://research.microsoft.com/users/jpincus/beyond-stack-smashing.pdf"><i>Beyond stack smashing:
recent advances in exploiting buffer overruns</i></a>?
<pre>
void f2b(void * arg, size_t len) {
char buf[100];
long val = ..;
long *ptr = ..;
extern void (*f)();
memcpy(buf, arg, len);
*ptr = val;
f();
...
return;
}
</pre>
What code can <i>(*f)()</i> call? Code that the attacker inserted?
Code in libc?
<p>How about an attack that uses <i>ptr</i> in the above code to
overwrite a method's address in a class's dispatch table with the
address of a support function?
<p>How about <a href="http://research.microsoft.com/~shuochen/papers/usenix05data_attack.pdf">data-only attacks</a>? For example, attacker
overwrites <i>pw_uid</i> in the heap with 0 before the following
code executes (when downloading /etc/passwd and then uploading it with a
modified entry).
<pre>
FILE *getdatasock( ... ) {
seteuid(0);
setsockopt ( ...);
...
seteuid(pw->pw_uid);
...
}
</pre>
<p>How much does XFI slow down applications? How many more
instructions are executed? (see Tables 1-4)
</body>

288
web/l1.html Normal file

@ -0,0 +1,288 @@
<title>L1</title>
<html>
<head>
</head>
<body>
<h1>OS overview</h1>
<h2>Overview</h2>
<ul>
<li>Goal of course:
<ul>
<li>Understand operating systems in detail by designing and
implementing a minimal OS
<li>Hands-on experience with building systems ("Applying 6.033")
</ul>
<li>What is an operating system?
<ul>
<li>a piece of software that turns the hardware into something useful
<li>layered picture: hardware, OS, applications
<li>Three main functions: fault isolate applications, abstract hardware,
manage hardware
</ul>
<li>Examples:
<ul>
<li>OS-X, Windows, Linux, *BSD, ... (desktop, server)
<li>PalmOS, Windows/CE (PDA)
<li>Symbian, JavaOS (Cell phones)
<li>VxWorks, pSOS (real-time)
<li> ...
</ul>
<li>OS Abstractions
<ul>
<li>processes: fork, wait, exec, exit, kill, getpid, brk, nice, sleep,
trace
<li>files: open, close, read, write, lseek, stat, sync
<li>directories: mkdir, rmdir, link, unlink, mount, umount
<li>users + security: chown, chmod, getuid, setuid
<li>interprocess communication: signals, pipe
<li>networking: socket, accept, send, recv, connect
<li>time: gettimeofday
<li>terminal:
</ul>
<li>Sample Unix System calls (mostly POSIX)
<ul>
<li> int read(int fd, void*, int)
<li> int write(int fd, void*, int)
<li> off_t lseek(int fd, off_t, int [012])
<li> int close(int fd)
<li> int fsync(int fd)
<li> int open(const char*, int flags [, int mode])
<ul>
<li> O_RDONLY, O_WRONLY, O_RDWR, O_CREAT
</ul>
<li> mode_t umask(mode_t cmask)
<li> int mkdir(char *path, mode_t mode);
<li> DIR *opendir(char *dirname)
<li> struct dirent *readdir(DIR *dirp)
<li> int closedir(DIR *dirp)
<li> int chdir(char *path)
<li> int link(char *existing, char *new)
<li> int unlink(char *path)
<li> int rename(const char*, const char*)
<li> int rmdir(char *path)
<li> int stat(char *path, struct stat *buf)
<li> int mknod(char *path, mode_t mode, dev_t dev)
<li> int fork()
<ul>
<li> returns childPID in parent, 0 in child; only
difference
</ul>
<li>int getpid()
<li> int waitpid(int pid, int* stat, int opt)
<ul>
<li> pid==-1: any; opt==0||WNOHANG
<li> returns pid or error
</ul>
<li> void _exit(int status)
<li> int kill(int pid, int signal)
<li> int sigaction(int sig, struct sigaction *, struct sigaction *)
<li> int sleep (int sec)
<li> int execve(char* prog, char** argv, char** envp)
<li> void *sbrk(int incr)
<li> int dup2(int oldfd, int newfd)
<li> int fcntl(int fd, F_SETFD, int val)
<li> int pipe(int fds[2])
<ul>
<li> writes on fds[1] will be read on fds[0]
<li> when last fds[1] closed, read fds[0] returns EOF
<li> when last fds[0] closed, write on fds[1] raises SIGPIPE or fails with
EPIPE
</ul>
<li> int fchown(int fd, uid_t owner, gid_t group)
<li> int fchmod(int fd, mode_t mode)
<li> int socket(int domain, int type, int protocol)
<li> int accept(int socket_fd, struct sockaddr*, int* namelen)
<ul>
<li> returns new fd
</ul>
<li> int listen(int fd, int backlog)
<li> int connect(int fd, const struct sockaddr*, int namelen)
<li> void* mmap(void* addr, size_t len, int prot, int flags, int fd,
off_t offset)
<li> int munmap(void* addr, size_t len)
<li> int gettimeofday(struct timeval*, struct timezone*)
</ul>
</ul>
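<p>The pipe rules in the list above (writes on fds[1] are read from fds[0]; a reader sees EOF only once the last write end is closed) can be checked with a short POSIX program. <code>pipe_eof_demo</code> is a hypothetical helper; the child reports how many bytes it read through its exit status, so the sketch assumes messages shorter than 256 bytes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent writes msg into a pipe and closes the write end; the child reads
   until read() returns 0 (EOF) and exits with the byte count. */
int pipe_eof_demo(const char *msg, size_t n)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                     /* child: reader */
        close(fds[1]);                  /* else read() would never see EOF */
        char buf[64];
        ssize_t r;
        int total = 0;
        while ((r = read(fds[0], buf, sizeof buf)) > 0)
            total += (int)r;
        _exit(total);
    }
    close(fds[0]);                      /* parent: writer */
    ssize_t w = write(fds[1], msg, n);
    close(fds[1]);                      /* last write end gone: child gets EOF */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || w != (ssize_t)n)
        return -1;
    return WEXITSTATUS(status);
}
```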
<p>See the <a href="../reference.html">reference page</a> for links to
the early Unix papers.
<h2>Class structure</h2>
<ul>
<li>Lab: minimal OS for x86 in an exokernel style (50%)
<ul>
<li>kernel interface: hardware + protection
<li>libOS implements fork, exec, pipe, ...
<li>applications: file system, shell, ..
<li>development environment: gcc, bochs
<li>lab 1 is out
</ul>
<li>Lecture structure (20%)
<ul>
<li>homework
<li>45min lecture
<li>45min case study
</ul>
<li>Two quizzes (30%)
<ul>
<li>mid-term
<li>final during exam week
</ul>
</ul>
<h2>Case study: the shell (simplified)</h2>
<ul>
<li>interactive command execution and a programming language
<li>Nice example that uses various OS abstractions. See <a
href="../readings/ritchie74unix.pdf">Unix
paper</a> if you are unfamiliar with the shell.
<li>Final lab is a simple shell.
<li>Basic structure:
<pre>
while (1) {
printf ("$");
readcommand (command, args); // parse user input
if ((pid = fork ()) == 0) { // child?
exec (command, args, 0);
} else if (pid > 0) { // parent?
wait (0); // wait for child to terminate
} else {
perror ("Failed to fork");
}
}
</pre>
<p>The split of creating a process with a new program in fork and exec
is mostly a historical accident. See the <a
href="../readings/ritchie79evolution.html">assigned paper</a> for today.
<li>Example:
<pre>
$ ls
</pre>
<li>why call "wait"? to wait for the child to terminate and collect
its exit status. (if child finishes, child becomes a zombie until
parent calls wait.)
<li>I/O: file descriptors. Child inherits open file descriptors
from parent. By convention:
<ul>
<li>file descriptor 0 for input (e.g., keyboard). read_command:
<pre>
read (0, buf, bufsize)
</pre>
<li>file descriptor 1 for output (e.g., terminal)
<pre>
write (1, "hello\n", strlen("hello\n"))
</pre>
<li>file descriptor 2 for error (e.g., terminal)
</ul>
<li>How does the shell implement:
<pre>
$ ls > tmp1
</pre>
just before exec insert:
<pre>
close (1);
fd = open ("tmp1", O_CREAT|O_WRONLY|O_TRUNC, 0666); // fd will be 1!
</pre>
<p>The kernel will return the first free file descriptor, 1 in this case.
<li>How does the shell implement sharing an output file:
<pre>
$ ls > tmp1 2>&1
</pre>
replace last code with:
<pre>
close (1);
close (2);
fd1 = open ("tmp1", O_CREAT|O_WRONLY|O_TRUNC, 0666); // fd will be 1!
fd2 = dup (fd1);
</pre>
both file descriptors share the same file offset
<li>how do programs communicate?
<pre>
$ sort file.txt | uniq | wc
</pre>
or
<pre>
$ sort file.txt > tmp1
$ uniq tmp1 > tmp2
$ wc tmp2
$ rm tmp1 tmp2
</pre>
or
<pre>
$ kill -9
</pre>
<li>A pipe is a one-way communication channel. Here is an example
where the parent is the writer and the child is the reader:
<pre>
int fdarray[2];
if (pipe(fdarray) < 0) panic ("error");
if ((pid = fork()) < 0) panic ("error");
else if (pid > 0) {
close(fdarray[0]);
write(fdarray[1], "hello world\n", 12);
} else {
close(fdarray[1]);
n = read (fdarray[0], buf, MAXBUF);
write (1, buf, n);
}
</pre>
<li>How does the shell implement pipelines (i.e., cmd 1 | cmd 2 |..)?
We want to arrange that the output of cmd 1 is the input of cmd 2.
The way to achieve this goal is to manipulate stdout and stdin.
<li>The shell creates a process for each command in the pipeline,
hooks up their stdin and stdout correctly, and waits for the last
process of the pipeline to exit. A sketch of the core modifications
to our shell for setting up a pipe is:
<pre>
int fdarray[2];
if (pipe(fdarray) < 0) panic ("error");
if ((pid = fork ()) == 0) {  // child (left end of pipe)
  close (1);
  tmp = dup (fdarray[1]);    // fdarray[1] is the write end, tmp will be 1
  close (fdarray[0]);        // close read end
  close (fdarray[1]);        // close original write end; fd 1 remains
  exec (command1, args1, 0);
} else if (pid > 0) {        // parent (right end of pipe)
  close (0);
  tmp = dup (fdarray[0]);    // fdarray[0] is the read end, tmp will be 0
  close (fdarray[0]);        // close original read end; fd 0 remains
  close (fdarray[1]);        // close write end
  exec (command2, args2, 0);
} else {
  printf ("Unable to fork\n");
}
</pre>
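<li>Why close the read end and the write end? There are multiple reasons:
to maintain the convention that every process starts with exactly 3
open file descriptors, and because reading from an empty pipe blocks the
reader, while reading from a pipe whose write ends are all closed returns
end of file. A reader that kept a write end open would never see end of file.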
<li>How do you run jobs in the background?
<pre>
$ compute &
</pre>
<li>How does the shell implement "&" (backgrounding)? It doesn't call
wait immediately.
<li>More details in the shell lecture later in the term.
</ul>
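<p>The redirection and pipe recipes above can be combined into one small, runnable sketch using the POSIX calls pipe, fork, close, dup, read, and waitpid. The child redirects its standard output into a pipe with the same close-then-dup trick shown above, and the parent reads what the child wrote. The helper name run_capture and the use of execvp are assumptions made for illustration. xv6's own exec takes different arguments.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected into a
 * pipe, copy up to cap-1 bytes of its output into buf (NUL-terminated),
 * and return the child's exit status, or -1 on error. */
int
run_capture(char *const argv[], char *buf, size_t cap)
{
  int p[2], status;
  pid_t pid;
  size_t len = 0;
  ssize_t n;
  char scratch[256];

  if (cap == 0 || pipe(p) < 0)
    return -1;
  if ((pid = fork()) < 0) {
    close(p[0]);
    close(p[1]);
    return -1;
  }
  if (pid == 0) {              /* child: left end of the pipe */
    close(1);                  /* free descriptor 1 ... */
    if (dup(p[1]) != 1)        /* ... so dup hands back 1 */
      _exit(127);
    close(p[0]);               /* the child never reads the pipe */
    close(p[1]);               /* fd 1 is now its only write end */
    execvp(argv[0], argv);
    _exit(127);                /* reached only if exec fails */
  }
  close(p[1]);                 /* otherwise read() would never see EOF */
  while (len < cap - 1 && (n = read(p[0], buf + len, cap - 1 - len)) > 0)
    len += (size_t)n;
  buf[len] = '\0';
  while (read(p[0], scratch, sizeof scratch) > 0)
    ;                          /* drain extra output so the child cannot block */
  close(p[0]);
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>For example, running {"echo", "hello", NULL} should leave "hello\n" in buf and return 0. Note that the parent closes its copy of the write end before reading. This is the same reason the pipeline code above closes unused pipe ends.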
</body>

245
web/l13.html Normal file

@ -0,0 +1,245 @@
<title>High-performance File Systems</title>
<html>
<head>
</head>
<body>
<h1>High-performance File Systems</h1>
<p>Required reading: soft updates.
<h2>Overview</h2>
<p>A key problem in designing file systems is how to obtain
performance on file system operations while providing consistency.
By consistency, we mean that file system invariants are maintained
on disk. These invariants include that if a file is created, it
appears in its directory, etc. If the file system data structures are
consistent, then it is possible to rebuild the file system to a
correct state after a failure.
<p>To ensure consistency of on-disk file system data structures,
modifications to the file system must respect certain rules:
<ul>
<li>Never point to a structure before it is initialized. An inode must
be initialized before a directory entry references it. A block must
be initialized before an inode references it.
<li>Never reuse a structure before nullifying all pointers to it. An
inode pointer to a disk block must be reset before the file system can
reallocate the disk block.
<li>Never reset the last pointer to a live structure before a new
pointer is set. When renaming a file, the file system should not
remove the old name for an inode until after the new name has been
written.
</ul>
The paper calls these dependencies update dependencies.
<p>xv6 ensures these rules by writing every block synchronously, and
by ordering the writes appropriately. By synchronous, we mean
that a process waits until the current disk write has been
completed before continuing with execution.
<ul>
<li>What happens if power fails after 4776 in mknod1? Did we lose the
inode forever? No, we have a separate program (called fsck), which
can rebuild the disk structures correctly and can put the inode back on
the free list.
<li>Does the order of writes in mknod1 matter? Say, what if we wrote
directory entry first and then wrote the allocated inode to disk?
This violates the update rules and it is not a good plan. If a
failure happens after the directory write, then on recovery we have
a directory entry pointing to an unallocated inode, which now may be
allocated by another process for another file!
<li>Can we turn the writes (i.e., the ones invoked by iupdate and
wdir) into delayed writes without creating problems? No, because
the cache might write them back to the disk in an incorrect order.
It has no information to decide in what order to write them.
</ul>
<p>xv6 is a nice example of the tension between consistency and
performance. To get consistency, xv6 uses synchronous writes,
but these writes are slow, because they proceed at the rate of disk
seeks rather than at the disk's maximum data transfer rate. The
bandwidth to a disk is reasonably high for large transfers (around
50Mbyte/s), but latency is high, because of the cost of moving the
disk arm(s) (the seek latency is about 10msec).
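<p>A back-of-the-envelope calculation shows how large the gap is. This sketch assumes that each synchronous write pays one full 10 ms seek plus its transfer time at 50 MB/s (the figures quoted above, taking 1 MB as 10^6 bytes). It ignores rotational delay. The 512-byte block size and the function name are assumptions.

```c
/* Effective bandwidth (bytes/s) when every block_bytes-sized write
 * pays one seek of seek_s seconds plus its transfer time. */
double
sync_write_bandwidth(double block_bytes, double seek_s, double peak_bytes_per_s)
{
  double per_write_s = seek_s + block_bytes / peak_bytes_per_s;
  return block_bytes / per_write_s;
}
```

<p>With 512-byte blocks, this works out to roughly 51 KB/s, about a thousandth of the peak rate. Batching 1 MB per seek raises it to about 33 MB/s. That difference is why delayed writes are attractive.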
<p>This tension is an implementation-dependent one. The Unix API
doesn't require that writes are synchronous. Updates don't have to
appear on disk until a sync, fsync, or open with O_SYNC. Thus, in
principle, the UNIX API allows delayed writes, which are good for
performance:
<ul>
<li>Batch many writes together in a big one, written at the disk data
rate.
<li>Absorb writes to the same block.
<li>Schedule writes to avoid seeks.
</ul>
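<p>The first and third benefits can be illustrated with a toy write-back cache. In this sketch, all names are hypothetical and the disk is simulated. A write only marks its block dirty, so repeated writes to one block are absorbed. wb_flush then writes each dirty block once, in ascending block-number order, as a crude stand-in for seek-aware scheduling.

```c
#include <stdbool.h>

#define NBLOCKS 64           /* size of the toy disk, in blocks */

struct wbcache {
  bool dirty[NBLOCKS];       /* block has changes not yet on disk */
};

/* A delayed write: just mark the block dirty in memory. */
void
wb_write(struct wbcache *c, int b)
{
  if (b >= 0 && b < NBLOCKS)
    c->dirty[b] = true;
}

/* Write back every dirty block exactly once, in ascending block order
 * (one sweep of the disk arm). Records the simulated disk writes in
 * out[] and returns how many were issued. */
int
wb_flush(struct wbcache *c, int out[NBLOCKS])
{
  int n = 0;
  for (int b = 0; b < NBLOCKS; b++) {
    if (c->dirty[b]) {
      out[n++] = b;          /* simulated disk write of block b */
      c->dirty[b] = false;
    }
  }
  return n;
}
```

<p>Three writes to block 5 and one to block 2 become two disk writes, issued as block 2 and then block 5. The sketch deliberately ignores ordering dependencies between blocks. Those dependencies are exactly what make delayed writes dangerous for consistency.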
<p>Thus the question: how to delay writes and achieve consistency?
The paper provides an answer.
<h2>This paper</h2>
<p>The paper surveys some of the existing techniques and introduces a
new one to achieve the goal of performance and consistency.
<p>Possible techniques:
<ul>
<li>Equip system with NVRAM, and put buffer cache in NVRAM.
<li>Logging. Often used in UNIX file systems for metadata updates.
LFS is an extreme version of this strategy.
<li>Flusher-enforced ordering. All writes are delayed. The flusher
is aware of dependencies between blocks, but this doesn't work well,
because circular dependencies between blocks must be broken by writing
blocks out.
</ul>
<p>Soft updates is the solution explored in this paper. It doesn't
require NVRAM, and performs as well as the naive strategy of keeping all
dirty blocks in main memory. Compared to logging, it is unclear if
soft updates is better. The default BSD file system uses soft
updates, but most Linux file systems use logging.
<p>Soft updates is a sophisticated variant of flusher-enforced
ordering. Instead of maintaining dependencies on the block-level, it
maintains dependencies on file structure level (per inode, per
directory, etc.), reducing circular dependencies. Furthermore, it
breaks any remaining circular dependencies by undoing changes before
writing the block and then redoing them to the block after writing.
<p>Pseudocode for create:
<pre>
create (f) {
  allocate inode in block i (assuming inode is available)
  add i to directory data block d (assuming d has space)
  mark d as dependent on i, and create undo/redo record
  update directory inode in block di
  mark di as dependent on d
}
</pre>
<p>Pseudocode for the flusher:
<pre>
flushblock (b)
{
  lock b;
  for all dependencies that b is relying on
    "remove" that dependency by undoing the change to b
    mark the dependency as "unrolled"
  write b
}
write_completed (b) {
  remove dependencies that depend on b
  reapply "unrolled" dependencies that b depended on
  unlock b
}
</pre>
<p>Apply flush algorithm to example:
<ul>
<li>A list of two dependencies: directory->inode, inode->directory.
<li>Let's say the syncer picks the directory block first
<li>Undo directory->inode changes (i.e., unroll <A,#4>)
<li>Write directory block
<li>Remove met dependencies (i.e., remove inode->directory dependency)
<li>Perform redo operation (i.e., redo <A,#4>)
<li>Select inode block and write it
<li>Remove met dependencies (i.e., remove directory->inode dependency)
<li>Select directory block (it is dirty again!)
<li>Write it.
</ul>
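<p>The walk-through above can be replayed in a small simulation. This is a sketch under explicit assumptions, not the paper's implementation. The directory block gains the entry &lt;A,#4&gt;, which must not reach disk before inode #4 does. For the circular dependency, the sketch assumes the directory also drops an entry &lt;B,#5&gt; whose inode #5 is being freed, and the freed inode must not reach disk before that entry's removal does. Flushing a block writes an image with its unmet changes undone. The in-memory copy keeps them, which plays the role of "redo".

```c
#include <stdbool.h>
#include <string.h>

struct fs {                  /* toy contents of the two blocks */
  int inode[8];              /* inode block: 1 = allocated, 0 = free */
  int dir[2];                /* directory block: inode numbers, 0 = empty */
};

enum { IBLK, DBLK };         /* the inode block and the directory block */

struct dep {
  int blk;                   /* block holding the change */
  int needs;                 /* block that must reach disk first */
  bool met;
};

struct sim {
  struct fs mem, disk;
  struct dep d[2];           /* d[0]: <A,#4> needs inode 4 on disk;
                                d[1]: freeing #5 needs entry B cleared on disk */
  bool dirty[2];
  bool consistent;           /* false if any write broke the invariant */
};

void
sim_init(struct sim *s)
{
  memset(s, 0, sizeof *s);
  s->disk.inode[5] = 1; s->disk.dir[1] = 5;   /* on disk: B -> #5 */
  s->mem.inode[4] = 1;  s->mem.dir[0] = 4;    /* in memory: A -> #4, B gone */
  s->d[0] = (struct dep){ DBLK, IBLK, false };
  s->d[1] = (struct dep){ IBLK, DBLK, false };
  s->dirty[IBLK] = s->dirty[DBLK] = true;
  s->consistent = true;
}

/* Invariant: every directory entry on disk names an allocated inode. */
static bool
disk_ok(const struct fs *f)
{
  for (int k = 0; k < 2; k++)
    if (f->dir[k] != 0 && !f->inode[f->dir[k]])
      return false;
  return true;
}

void
flushblock(struct sim *s, int b)
{
  struct fs img = s->mem;    /* start from the in-memory contents */
  bool unrolled = false;
  for (int i = 0; i < 2; i++) {
    if (s->d[i].blk == b && !s->d[i].met) {   /* unmet: undo in the image */
      if (i == 0) img.dir[0] = 0;             /* roll back <A,#4> */
      else        img.inode[5] = 1;           /* keep #5 allocated */
      unrolled = true;
    }
  }
  if (b == IBLK)             /* "write b": copy b's part of the image */
    memcpy(s->disk.inode, img.inode, sizeof img.inode);
  else
    memcpy(s->disk.dir, img.dir, sizeof img.dir);
  if (!disk_ok(&s->disk))
    s->consistent = false;
  for (int i = 0; i < 2; i++)                 /* write completed */
    if (s->d[i].needs == b)
      s->d[i].met = true;
  s->dirty[b] = unrolled;    /* memory still holds the undone change */
}
```

<p>Flushing the directory block, then the inode block, then the directory block again reproduces the steps in the list. The directory block is dirty again after the first write, and no write ever leaves a directory entry naming a free inode.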
<p>A file operation that is important for file-system consistency
is rename. Rename conceptually works as follows:
<pre>
rename (from, to)
unlink (to);
link (from, to);
unlink (from);
</pre>
<p>Rename is often used by programs to make a new version of a file
the current version. Committing to a new version must happen
atomically. Unfortunately, without transaction-like support,
atomicity is impossible to guarantee, so a typical file system
provides weaker semantics for rename: if to already exists, an
instance of to will always exist, even if the system should crash in
the middle of the operation. Does the above implementation of rename
guarantee this semantics? (Answer: no).
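<p>On a POSIX system, the usual way to commit a new version of a file relies on this rename guarantee. The new contents are written under a temporary name and forced to disk, and then the temporary file is renamed over the old name. Below is a minimal sketch. The helper name replace_file and the ".tmp" suffix are assumptions. Production code would also fsync the containing directory so that the rename itself is durable.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace path's contents with data via a
 * temporary file and rename. Returns 0 on success, -1 on error. */
int
replace_file(const char *path, const char *data)
{
  char tmp[4096];
  size_t len = strlen(data);
  int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
  int fd;

  if (n < 0 || (size_t)n >= sizeof tmp)
    return -1;                                   /* name too long */
  fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd < 0)
    return -1;
  if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
    close(fd);                                   /* new data not safely on disk */
    unlink(tmp);
    return -1;
  }
  if (close(fd) < 0 || rename(tmp, path) < 0) {  /* rename is the commit point */
    unlink(tmp);
    return -1;
  }
  return 0;
}
```

<p>If the system crashes before the rename reaches disk, readers see the old file, possibly alongside a stray ".tmp" file that the program can remove or rename again on restart. After the rename, they see the new file. Assuming the file system provides the rename semantics described above, readers never see a missing file in between.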
<p>If rename is implemented as unlink, link, unlink, then it is
difficult to guarantee even the weak semantics. Modern UNIXes provide
rename as a file system call:
<pre>
update dir block for to point to from's inode // write block
update dir block for from to free entry // write block
</pre>
<p>fsck may need to correct refcounts in the inode if the file
system fails during rename. For example, a crash after the first
write followed by fsck should set refcount to 2, since both from
and to are pointing at the inode.
<p>This semantics is sufficient, however, for an application to ensure
atomicity. Before the call, there is a from and perhaps a to. If the
call is successful, following the call there is only a to. If there
is a crash, there may be both a from and a to, in which case the
caller knows the previous attempt failed, and must retry. The
subtlety is that if you now follow the two links, the "to" name may
link to either the old file or the new file. If it links to the new
file, that means that there was a crash and you just detected that the
rename operation was composite. On the other hand, the retry
procedure can be the same for either case (do the rename again), so it
isn't necessary to discover how it failed. The function follows the
golden rule of recoverability, and it is idempotent, so it lays all
the needed groundwork for use as part of a true atomic action.
<p>With soft updates, rename becomes:
<pre>
rename (from, to) {
i = namei(from);
add "to" directory data block td a reference to inode i
mark td dependent on block i
update directory inode "to" tdi
mark tdi as dependent on td
remove "from" directory data block fd a reference to inode i
mark fd as dependent on tdi
update directory inode in block fdi
mark fdi as dependent on fd
}
</pre>
<p>No synchronous writes!
<p>What needs to be done on recovery? (Inspect every statement in
rename and see what inconsistencies could exist on the disk; e.g.,
an inode's refcount could be too high.) None of these inconsistencies require
fixing before the file system can operate; they can be fixed by a
background file system repairer.
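<p>One such repair, which fsck also performs, is recomputing inode reference counts. The sketch below uses a hypothetical in-memory model rather than any real on-disk format: an array of stored link counts and a flat list of the inode numbers named by directory entries. It counts how many entries name each inode and resets every stored count that disagrees.

```c
#define NINODES 16

/* Hypothetical model: nlink[i] is inode i's stored reference count;
 * entries[] holds the inode number named by each directory entry
 * (0 = empty entry). Returns the number of counts corrected. */
int
fix_refcounts(int nlink[NINODES], const int *entries, int nentries)
{
  int actual[NINODES] = {0};
  int fixed = 0;

  for (int k = 0; k < nentries; k++)
    if (entries[k] > 0 && entries[k] < NINODES)
      actual[entries[k]]++;
  for (int i = 1; i < NINODES; i++) {
    if (nlink[i] != actual[i]) {
      nlink[i] = actual[i];  /* a count of 0 means the inode can be freed */
      fixed++;
    }
  }
  return fixed;
}
```

<p>This covers the crash-during-rename case described earlier. An inode named by both from and to but stored with count 1 is corrected to 2, and a count that is too high is lowered.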
<h2>Paper discussion</h2>
<p>Do soft updates perform any useless writes? (A useless write is a
write that will be immediately overwritten.) (Answer: yes.) Fix the
syncer to be careful about which block to start with. Fix cache replacement
to select the LRU block with no pending dependencies.
<p>Can a log-structured file system implement rename better? (Answer:
yes, since it can get the refcnts right).
<p>Discuss all graphs.
</body>

247
web/l14.txt Normal file

@ -0,0 +1,247 @@
Why am I lecturing about Multics?
Origin of many ideas in today's OSes
Motivated UNIX design (often in opposition)
Motivated x86 VM design
This lecture is really "how Intel intended x86 segments to be used"
Multics background
design started in 1965
very few interactive time-shared systems then: CTSS
design first, then implementation
system stable by 1969
so pre-dates UNIX, which started in 1969
ambitious, many years, many programmers, MIT+GE+BTL
Multics high-level goals
many users on same machine: "time sharing"
perhaps commercial services sharing the machine too
remote terminal access (but no recognizable data networks: wired or phone)
persistent reliable file system
encourage interaction between users
support joint projects that share data &c
control access to data that should not be shared
Most interesting aspect of design: memory system
idea: eliminate memory / file distinction
file i/o uses LD / ST instructions
no difference between memory and disk files
just jump to start of file to run program
enhances sharing: no more copying files to private memory
this seems like a really neat simplification!
GE 645 physical memory system
24-bit phys addresses
36-bit words
so up to 75 megabytes of physical memory!!!
but no-one could afford more than about a megabyte
[per-process state]
DBR
DS, SDW (== address space)
KST
stack segment
per-segment linkage segments
[global state]
segment content pages
per-segment page tables
per-segment branch in directory segment
AST
645 segments (simplified for now, no paging or rings)
descriptor base register (DBR) holds phy addr of descriptor segment (DS)
DS is an array of segment descriptor words (SDW)
SDW: phys addr, length, r/w/x, present
CPU has pairs of registers: 18 bit offset, 18 bit segment #
five pairs (PC, arguments, base, linkage, stack)
early Multics limited each segment to 2^16 words
thus there are lots of them, intended to correspond to program modules
note: cannot directly address phys mem (18 vs 24)
645 segments are a lot like the x86!
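The segment lookup just described can be sketched in C. This is a simplified model under stated assumptions. The descriptor segment is an ordinary array indexed by segment number. An SDW holds only the fields listed above (physical base, length in words, r/w/x, present). Real 645 bit layouts are ignored, and the fault codes are made up for illustration. Addresses are word addresses.

```c
#include <stdbool.h>
#include <stdint.h>

enum access { ACC_R = 1, ACC_W = 2, ACC_X = 4 };
enum fault { FAULT_NONE = 0, FAULT_SEGNO, FAULT_MISSING, FAULT_BOUNDS, FAULT_ACCESS };

struct sdw {                /* segment descriptor word (simplified) */
  uint32_t base;            /* physical word address of the segment */
  uint32_t length;          /* length in words */
  unsigned perms;           /* ACC_R | ACC_W | ACC_X */
  bool present;
};

/* Translate (segno, offset) through descriptor segment ds (ndesc SDWs),
 * checking presence, bounds, and the requested access rights. */
enum fault
seg_translate(const struct sdw *ds, uint32_t ndesc, uint32_t segno,
              uint32_t offset, unsigned want, uint32_t *pa)
{
  if (segno >= ndesc)
    return FAULT_SEGNO;
  const struct sdw *d = &ds[segno];
  if (!d->present)
    return FAULT_MISSING;   /* o/s would bring the segment in and retry */
  if (offset >= d->length)
    return FAULT_BOUNDS;
  if ((d->perms & want) != want)
    return FAULT_ACCESS;
  *pa = d->base + offset;
  return FAULT_NONE;
}
```

As on the x86, every reference is checked against the descriptor's length and permissions before any physical address is formed.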
645 paging
DBR and SDW actually contain phy addr of 64-entry page table
each page is 1024 words
PTE holds phys addr and present flag
no permission bits, so you really need to use the segments, not like JOS
no per-process page table, only per-segment
so all processes using a segment share its page table and phys storage
makes sense assuming segments tend to be shared
paging environment doesn't change on process switch
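With paging, the SDW points at the segment's page table rather than at the segment itself. Below is a minimal sketch of the second lookup step, assuming the 1024-word pages and 64-entry page table listed above. PTEs are modeled as a struct, not as real 645 bit fields. Note that 64 x 1024 = 2^16 words, which matches the early segment size limit mentioned earlier.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_WORDS 1024     /* words per page */
#define NPTE 64             /* entries per segment page table */

struct pte {
  uint32_t frame;           /* physical word address of the page */
  bool present;
};

/* Translate a word offset within one segment using that segment's page
 * table (shared by every process using the segment). Returns false if
 * the offset is past the table or the page is not present. */
bool
page_translate(const struct pte pt[NPTE], uint32_t offset, uint32_t *pa)
{
  uint32_t vpn = offset / PAGE_WORDS;
  if (vpn >= NPTE || !pt[vpn].present)
    return false;           /* o/s would fault, page in, and restart */
  *pa = pt[vpn].frame + offset % PAGE_WORDS;
  return true;
}
```

Because the table belongs to the segment rather than the process, nothing in this function depends on which process is running.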
Multics processes
each process has its own DS
Multics switches DBR on context switch
different processes typically have different number for same segment
how to use segments to unify memory and file system?
don't want to have to use 18-bit seg numbers as file names
we want to write programs using symbolic names
names should be hierarchical (for users)
so users can have directories and sub-directories
and path names
Multics file system
tree structure, directories and files
each file and directory is a segment
dir seg holds array of "branches"
name, length, ACL, array of block #s, "active"
unique ROOT directory
path names: ROOT > A > B
note there are no inodes, thus no i-numbers
so "real name" for a file is the complete path name
o/s tables have path name where unix would have i-number
presumably makes renaming and removing active files awkward
no hard links
how does a program refer to a different segment?
inter-segment variables contain symbolic segment name
A$E refers to segment A, variable/function E
what happens when segment B calls function A$E(1, 2, 3)?
when compiling B:
compiler actually generates *two* segments
one holds B's instructions
one holds B's linkage information
initial linkage entry:
name of segment e.g. "A"
name of symbol e.g. "E"
valid flag
CALL instruction is indirect through entry i of linkage segment
compiler marks entry i invalid
[storage for strings "A" and "E" really in segment B, not linkage seg]
when a process is executing B:
two segments in DS: B and a *copy* of B's linkage segment
CPU linkage register always points to current segment's linkage segment
call A$E is really call indirect via linkage[i]
faults because linkage[i] is invalid
o/s fault handler
looks up segment name for i ("A")
search path in file system for segment "A" (cwd, library dirs)
if not already in use by some process (branch active flag and AST knows):
allocate page table and pages
read segment A into memory
if not already in use by *this* process (KST knows):
find free SDW j in process DS, make it refer to A's page table
set up r/w/x based on process's user and file ACL
also set up copy of A's linkage segment
search A's symbol table for "E"
linkage[i] := j / address(E)
restart B
now the CALL works via linkage[i]
and subsequent calls are fast
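The fault-then-patch pattern can be imitated in ordinary C. This is only a loose analogy, not the Multics mechanism. A linkage entry holds a symbolic segment and symbol name plus a function pointer that starts out NULL, standing in for the invalid flag. The first call through the entry "faults" into a resolver. The resolver searches a hypothetical symbol table and patches the entry, so later calls go straight through.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int, int, int);

struct linkage_entry {
  const char *seg;          /* symbolic segment name, e.g. "A" */
  const char *sym;          /* symbolic entry name, e.g. "E" */
  fn_t target;              /* NULL until resolved ("invalid") */
};

/* Hypothetical "segment A" exporting entry E. */
static int A_E(int a, int b, int c) { return a + b + c; }

static const struct { const char *seg, *sym; fn_t fn; } symtab[] = {
  { "A", "E", A_E },
};

static int resolutions;     /* number of linkage faults taken */

/* The "fault handler": look up seg$sym and patch the linkage entry. */
static fn_t
resolve(struct linkage_entry *e)
{
  resolutions++;
  for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
    if (strcmp(symtab[i].seg, e->seg) == 0 && strcmp(symtab[i].sym, e->sym) == 0)
      return e->target = symtab[i].fn;
  return NULL;
}

/* CALL A$E(a, b, c): an indirect call through the linkage entry. */
int
call_linked(struct linkage_entry *e, int a, int b, int c)
{
  fn_t f = e->target ? e->target : resolve(e);
  return f ? f(a, b, c) : -1;   /* -1 stands in for "segment not found" */
}
```

Calling through the same entry twice should take exactly one linkage fault.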
how does A get the correct linkage register?
the right value cannot be embedded in A, since shared among processes
so CALL actually goes to instructions in A's linkage segment
load current seg# into linkage register, jump into A
one set of these per procedure in A
all memory / file references work this way
as if pointers were really symbolic names
segment # is really a transparent optimization
linking is "dynamic"
programs contain symbolic references
resolved only as needed -- if/when executed
code is shared among processes
was program data shared?
probably most variables not shared (on stack, in private segments)
maybe a DB would share a data segment, w/ synchronization
file data:
probably one at a time (locks) for read/write
read-only is easy to share
filesystem / segment implications
programs start slowly due to dynamic linking
creat(), unlink(), &c are outside of this model
store beyond end extends a segment (== appends to a file)
no need for buffer cache! no need to copy into user space!
but no buffer cache => ad-hoc caches e.g. active segment table
when are dirty segments written back to disk?
only in page eviction algorithm, when free pages are low
database careful ordered writes? e.g. log before data blocks?
I don't know, probably separate flush system calls
how does shell work?
you type a program name
the shell just CALLs that program, as a segment!
dynamic linking finds program segment and any library segments it needs
the program eventually returns, e.g. with RET
all this happened inside the shell process's address space
no fork, no exec
buggy program can crash the shell! e.g. scribble on stack
process creation was too slow to give each program its own process
how valuable is the sharing provided by segment machinery?
is it critical to users sharing information?
or is it just there to save memory and copying?
how does the kernel fit into all this?
kernel is a bunch of code modules in segments (in file system)
a process dynamically loads in the kernel segments that it uses
so kernel segments have different numbers in different processes
a little different from separate kernel "program" in JOS or xv6
kernel shares process's segment# address space
thus easy to interpret seg #s in system call arguments
kernel segment ACLs in file system restrict write
so mapped non-writeable into processes
how to call the kernel?
very similar to the Intel x86
8 rings. users at 4. core kernel at 0.
CPU knows current execution level
SDW has max read/write/execute levels
call gate: lowers ring level, but only at designated entry
stack per ring, incoming call switches stacks
inner ring can always read arguments, write results
problem: checking validity of arguments to system calls
don't want user to trick kernel into reading/writing the wrong segment
you have this problem in JOS too
later Multics CPUs had hardware to check argument references
are Multics rings a general-purpose protected subsystem facility?
example: protected game implementation
protected so that users cannot cheat
put game's code and data in ring 3
BUT what if I don't trust the author?
or if I've already put some other subsystem in ring 3?
a ring has full power over itself and outer rings: you must trust
today: user/kernel, server processes and IPC
pro: protection among mutually suspicious subsystems
con: no convenient sharing of address spaces
UNIX vs Multics
UNIX was less ambitious (e.g. no unified mem/FS)
UNIX hardware was small
just a few programmers, all in the same room
evolved rather than pre-planned
quickly self-hosted, so they got experience earlier
What did UNIX inherit from MULTICS?
a shell at user level (not built into kernel)
a single hierarchical file system, with subdirectories
controlled sharing of files
written in high level language, self-hosted development
What did UNIX reject from MULTICS?
files look like memory
instead, unifying idea is file descriptor and read()/write()
memory is a totally separate resource
dynamic linking
instead, static linking at compile time, every binary had copy of libraries
segments and sharing
instead, single linear address space per process, like xv6
(but shared libraries brought these back, just for efficiency, in 1980s)
Hierarchical rings of protection
simpler user/kernel
for subsystems, setuid, then client/server and IPC
The most useful sources I found for late-1960s Multics VM:
1. Bensoussan, Clingen, Daley, "The Multics Virtual Memory: Concepts
and Design," CACM 1972 (segments, paging, naming segments, dynamic
linking).
2. Daley and Dennis, "Virtual Memory, Processes, and Sharing in Multics,"
SOSP 1967 (more details about dynamic linking and CPU).
3. Graham, "Protection in an Information Processing Utility,"
CACM 1968 (brief account of rings and gates).

1412
web/l19.txt Normal file

File diff suppressed because it is too large

494
web/l2.html Normal file

@ -0,0 +1,494 @@
<html>
<head>
<title>L2</title>
</head>
<body>
<h1>6.828 Lecture Notes: x86 and PC architecture</h1>
<h2>Outline</h2>
<ul>
<li>PC architecture
<li>x86 instruction set
<li>gcc calling conventions
<li>PC emulation
</ul>
<h2>PC architecture</h2>
<ul>
<li>A full PC has:
<ul>
<li>an x86 CPU with registers, execution unit, and memory management
<li>CPU chip pins include address and data signals
<li>memory
<li>disk
<li>keyboard
<li>display
<li>other resources: BIOS ROM, clock, ...
</ul>
<li>We will start with the original 16-bit 8086 CPU (1978)
<li>CPU runs instructions:
<pre>
for(;;){
run next instruction
}
</pre>
<li>Needs work space: registers
<ul>
<li>four 16-bit data registers: AX, CX, DX, BX
<li>each in two 8-bit halves, e.g. AH and AL
<li>very fast, very few
</ul>
<li>More work space: memory
<ul>
<li>CPU sends out address on address lines (wires, one bit per wire)
<li>Data comes back on data lines
<li><i>or</i> data is written to data lines
</ul>
<li>Add address registers: pointers into memory
<ul>
<li>SP - stack pointer
<li>BP - frame base pointer
<li>SI - source index
<li>DI - destination index
</ul>
<li>Instructions are in memory too!
<ul>
<li>IP - instruction pointer (PC on PDP-11, everything else)
<li>increment after running each instruction
<li>can be modified by CALL, RET, JMP, conditional jumps
</ul>
<li>Want conditional jumps
<ul>
<li>FLAGS - various condition codes
<ul>
<li>whether last arithmetic operation overflowed
<li> ... was positive/negative
<li> ... was [not] zero
<li> ... carry/borrow on add/subtract
<li> ... overflow
<li> ... etc.
<li>whether interrupts are enabled
<li>direction of data copy instructions
</ul>
<li>JP, JN, J[N]Z, J[N]C, J[N]O ...
</ul>
<li>Still not interesting - need I/O to interact with outside world
<ul>
<li>Original PC architecture: use dedicated <i>I/O space</i>
<ul>
<li>Works same as memory accesses but set I/O signal
<li>Only 1024 I/O addresses
<li>Example: write a byte to line printer:
<pre>
#define DATA_PORT    0x378
#define STATUS_PORT  0x379
#define   BUSY 0x80
#define CONTROL_PORT 0x37A
#define   STROBE 0x01
/* inb and outb wrap the IN and OUT instructions (see x86.h in xv6) */
void
lpt_putc(int c)
{
  /* wait for printer to consume previous byte */
  /* (the BUSY status bit reads as 0 while the printer is busy) */
  while((inb(STATUS_PORT) & BUSY) == 0)
    ;
  /* put the byte on the parallel lines */
  outb(DATA_PORT, c);
  /* tell the printer to look at the data */
  outb(CONTROL_PORT, STROBE);
  outb(CONTROL_PORT, 0);
}
</pre>
</ul>
<li>Memory-Mapped I/O
<ul>
<li>Use normal physical memory addresses
<ul>
<li>Gets around limited size of I/O address space
<li>No need for special instructions
<li>System controller routes to appropriate device
</ul>
<li>Works like &ldquo;magic&rdquo; memory:
<ul>
<li> <i>Addressed</i> and <i>accessed</i> like memory,
but ...
<li> ... does not <i>behave</i> like memory!
<li> Reads and writes can have &ldquo;side effects&rdquo;
<li> Read results can change due to external events
</ul>
</ul>
</ul>
<li>What if we want to use more than 2^16 bytes of memory?
<ul>
<li>8086 has 20-bit physical addresses, can have 1 Meg RAM
<li>each segment is a 2^16 byte window into physical memory
<li>virtual to physical translation: pa = va + seg*16
<li>the segment is usually implicit, from a segment register
<li>CS - code segment (for fetches via IP)
<li>SS - stack segment (for load/store via SP and BP)
<li>DS - data segment (for load/store via other registers)
<li>ES - another data segment (destination for string operations)
<li>tricky: can't use the 16-bit address of a stack variable as a pointer
<li>but a <i>far pointer</i> includes full segment:offset (16 + 16 bits)
</ul>
<li>But 8086's 16-bit addresses and data were still painfully small
<ul>
<li>80386 added support for 32-bit data and addresses (1985)
<li>boots in 16-bit mode, boot.S switches to 32-bit mode
<li>registers are 32 bits wide, called EAX rather than AX
<li>operands and addresses are also 32 bits, e.g. ADD does 32-bit arithmetic
<li>in 32-bit mode, the 0x66 operand-size prefix selects 16-bit operands: MOVW is encoded as 0x66 followed by MOVL's opcode
<li>the .code32 in boot.S tells assembler to generate 0x66 for e.g. MOVW
<li>80386 also changed segments and added paged memory...
</ul>
</ul>
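<p>The 8086 translation rule pa = va + seg*16 can be checked with a few lines of C. This is a minimal sketch, and the function name is hypothetical. It models a plain 8086 with 20 address lines, so results wrap around at 1 MB. Later CPUs can disable this wraparound through the A20 line.

```c
#include <stdint.h>

/* 8086 real-mode translation: physical = segment * 16 + offset,
 * truncated to 20 bits because the 8086 has only 20 address lines. */
uint32_t
real_mode_pa(uint16_t seg, uint16_t off)
{
  return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}
```

<p>For example, F000:FFF0 is physical 0xFFFF0, the reset address mentioned in the memory map below. 07C0:0000 and 0000:7C00 name the same byte, which shows that many segment:offset pairs alias one physical address.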
<h2>x86 Physical Memory Map</h2>
<ul>
<li>The physical address space mostly looks like ordinary RAM
<li>Except some low-memory addresses actually refer to other things
<li>Writes to VGA memory appear on the screen
<li>Reset or power-on jumps to ROM at 0x000ffff0
</ul>
<pre>
+------------------+ <- 0xFFFFFFFF (4GB)
| 32-bit |
| memory mapped |
| devices |
| |
/\/\/\/\/\/\/\/\/\/\
/\/\/\/\/\/\/\/\/\/\
| |
| Unused |
| |
+------------------+ <- depends on amount of RAM
| |
| |
| Extended Memory |
| |
| |
+------------------+ <- 0x00100000 (1MB)
| BIOS ROM |
+------------------+ <- 0x000F0000 (960KB)
| 16-bit devices, |
| expansion ROMs |
+------------------+ <- 0x000C0000 (768KB)
| VGA Display |
+------------------+ <- 0x000A0000 (640KB)
| |
| Low Memory |
| |
+------------------+ <- 0x00000000
</pre>
<h2>x86 Instruction Set</h2>
<ul>
<li>Two-operand instruction set
<ul>
<li>Intel syntax: <tt>op dst, src</tt>
<li>AT&amp;T (gcc/gas) syntax: <tt>op src, dst</tt>
<ul>
<li>uses b, w, l suffix on instructions to specify size of operands
</ul>
<li>Operands are registers, constant, memory via register, memory via constant
<li> Examples:
<table cellspacing=5>
<tr><td><u>AT&amp;T syntax</u> <td><u>"C"-ish equivalent</u>
<tr><td>movl %eax, %edx <td>edx = eax; <td><i>register mode</i>
<tr><td>movl $0x123, %edx <td>edx = 0x123; <td><i>immediate</i>
<tr><td>movl 0x123, %edx <td>edx = *(int32_t*)0x123; <td><i>direct</i>
<tr><td>movl (%ebx), %edx <td>edx = *(int32_t*)ebx; <td><i>indirect</i>
<tr><td>movl 4(%ebx), %edx <td>edx = *(int32_t*)(ebx+4); <td><i>displaced</i>
</table>
</ul>
<li>Instruction classes
<ul>
<li>data movement: MOV, PUSH, POP, ...
<li>arithmetic: TEST, SHL, ADD, AND, ...
<li>i/o: IN, OUT, ...
<li>control: JMP, JZ, JNZ, CALL, RET
<li>string: REP MOVSB, ...
<li>system: IRET, INT
</ul>
<li>Intel architecture manual Volume 2 is <i>the</i> reference
</ul>
<h2>gcc x86 calling conventions</h2>
<ul>
<li>x86 dictates that stack grows down:
<table cellspacing=5>
<tr><td><u>Example instruction</u> <td><u>What it does</u>
<tr><td>pushl %eax
<td>
subl $4, %esp <br>
movl %eax, (%esp) <br>
<tr><td>popl %eax
<td>
movl (%esp), %eax <br>
addl $4, %esp <br>
<tr><td>call $0x12345
<td>
pushl %eip <sup>(*)</sup> <br>
movl $0x12345, %eip <sup>(*)</sup> <br>
<tr><td>ret
<td>
popl %eip <sup>(*)</sup>
</table>
(*) <i>Not real instructions</i>
<li>GCC dictates how the stack is used.
Contract between caller and callee on x86:
<ul>
<li>after call instruction:
<ul>
<li>%eip points at first instruction of function
<li>%esp+4 points at first argument
<li>%esp points at return address
</ul>
<li>after ret instruction:
<ul>
<li>%eip contains return address
<li>%esp points at arguments pushed by caller
<li>called function may have trashed arguments
<li>%eax contains return value
(or trash if function is <tt>void</tt>)
<li>%ecx, %edx may be trashed
<li>%ebp, %ebx, %esi, %edi must contain contents from time of <tt>call</tt>
</ul>
<li>Terminology:
<ul>
<li>%eax, %ecx, %edx are "caller save" registers
<li>%ebp, %ebx, %esi, %edi are "callee save" registers
</ul>
</ul>
<li>Functions can do anything that doesn't violate contract.
By convention, GCC does more:
<ul>
<li>each function has a stack frame marked by %ebp, %esp
<pre>
+------------+ |
| arg 2 | \
+------------+ &gt;- previous function's stack frame
| arg 1 | /
+------------+ |
| ret %eip | /
+============+
| saved %ebp | \
%ebp-&gt; +------------+ |
| | |
| local | \
| variables, | &gt;- current function's stack frame
| etc. | /
| | |
| | |
%esp-&gt; +------------+ /
</pre>
<li>%esp can move to make stack frame bigger, smaller
<li>%ebp points at saved %ebp from previous function,
chain to walk stack
<li>function prologue:
<pre>
pushl %ebp
movl %esp, %ebp
</pre>
<li>function epilogue:
<pre>
movl %ebp, %esp
popl %ebp
</pre>
or
<pre>
leave
</pre>
</ul>
<li>Big example:
<ul>
<li>C code
<pre>
int f(int x), g(int x);   /* declare before use */
int main(void) { return f(8)+1; }
int f(int x) { return g(x); }
int g(int x) { return x+3; }
</pre>
<li>assembler
<pre>
_main:
<i>prologue</i>
pushl %ebp
movl %esp, %ebp
<i>body</i>
pushl $8
call _f
addl $1, %eax
<i>epilogue</i>
movl %ebp, %esp
popl %ebp
ret
_f:
<i>prologue</i>
pushl %ebp
movl %esp, %ebp
<i>body</i>
pushl 8(%esp)
call _g
<i>epilogue</i>
movl %ebp, %esp
popl %ebp
ret
_g:
<i>prologue</i>
pushl %ebp
movl %esp, %ebp
<i>save %ebx</i>
pushl %ebx
<i>body</i>
movl 8(%ebp), %ebx
addl $3, %ebx
movl %ebx, %eax
<i>restore %ebx</i>
popl %ebx
<i>epilogue</i>
movl %ebp, %esp
popl %ebp
ret
</pre>
</ul>
<li>Super-small <tt>_g</tt>:
<pre>
_g:
movl 4(%esp), %eax
addl $3, %eax
ret
</pre>
<li>Compiling, linking, loading:
<ul>
<li> <i>Compiler</i> takes C source code (ASCII text),
produces assembly language (also ASCII text)
<li> <i>Assembler</i> takes assembly language (ASCII text),
produces <tt>.o</tt> file (binary, machine-readable!)
<li> <i>Linker</i> takes multiple '<tt>.o</tt>'s,
produces a single <i>program image</i> (binary)
<li> <i>Loader</i> loads the program image into memory
at run-time and starts it executing
</ul>
</ul>
<h2>PC emulation</h2>
<ul>
<li> An emulator like Bochs works by
<ul>
<li> doing exactly what a real PC would do,
<li> only implemented in software rather than hardware!
</ul>
<li> Runs as a normal process in a "host" operating system (e.g., Linux)
<li> Uses normal process storage to hold emulated hardware state:
e.g.,
<ul>
<li> Hold emulated CPU registers in global variables
<pre>
int32_t regs[8];
#define REG_EAX 0   /* numbered as in x86 instruction encodings */
#define REG_ECX 1
#define REG_EDX 2
#define REG_EBX 3
...
int32_t eip;
int16_t segregs[4];
...
</pre>
<li> <tt>malloc</tt> a big chunk of (virtual) process memory
to hold emulated PC's (physical) memory
</ul>
<li> Execute instructions by simulating them in a loop:
<pre>
for (;;) {
  read_instruction();
  switch (decode_instruction_opcode()) {
  case OPCODE_ADD: {  /* braces give each case its own src/dst */
    int src = decode_src_reg();
    int dst = decode_dst_reg();
    regs[dst] = regs[dst] + regs[src];
    break;
  }
  case OPCODE_SUB: {
    int src = decode_src_reg();
    int dst = decode_dst_reg();
    regs[dst] = regs[dst] - regs[src];
    break;
  }
  ...
  }
  eip += instruction_length;
}
</pre>
<li> Simulate PC's physical memory map
by decoding emulated "physical" addresses just like a PC would:
<pre>
#define KB 1024
#define MB (1024*1024)
#define LOW_MEMORY (640*KB)
#define EXT_MEMORY (10*MB)
uint8_t low_mem[LOW_MEMORY];
uint8_t ext_mem[EXT_MEMORY];
uint8_t bios_rom[64*KB];
uint8_t read_byte(uint32_t phys_addr) {
  if (phys_addr < LOW_MEMORY)
    return low_mem[phys_addr];
  else if (phys_addr >= 960*KB && phys_addr < 1*MB)
    return bios_rom[phys_addr - 960*KB];
  else if (phys_addr >= 1*MB && phys_addr < 1*MB+EXT_MEMORY)
    return ext_mem[phys_addr-1*MB];
  else ...
}
void write_byte(uint32_t phys_addr, uint8_t val) {
  if (phys_addr < LOW_MEMORY)
    low_mem[phys_addr] = val;
  else if (phys_addr >= 960*KB && phys_addr < 1*MB)
    ; /* ignore attempted write to ROM! */
  else if (phys_addr >= 1*MB && phys_addr < 1*MB+EXT_MEMORY)
    ext_mem[phys_addr-1*MB] = val;
  else ...
}
</pre>
<li> Simulate I/O devices, etc., by detecting accesses to
"special" memory and I/O space and emulating the correct behavior:
e.g.,
<ul>
<li> Reads/writes to emulated hard disk
transformed into reads/writes of a file on the host system
<li> Writes to emulated VGA display hardware
transformed into drawing into an X window
<li> Reads from emulated PC keyboard
transformed into reads from X input event queue
</ul>
</ul>

web/l3.html Normal file

@ -0,0 +1,334 @@
<title>L3</title>
<html>
<head>
</head>
<body>
<h1>Operating system organization</h1>
<p>Required reading: Exokernel paper.
<h2>Intro: virtualizing</h2>
<p>One way to think about an operating system interface is that it
extends the hardware instructions with a set of "instructions" that
are implemented in software. These instructions are invoked using a
system call instruction (int on the x86). In this view, a task of the
operating system is to provide each application with a <i>virtual</i>
version of the interface; that is, it provides each application with a
virtual computer.
<p>One of the challenges in an operating system is multiplexing the
physical resources among the potentially many virtual computers.
What typically complicates the multiplexing is an additional
constraint: the virtual computers must be well isolated from each other. That
is,
<ul>
<li> stores shouldn't be able to overwrite other apps' data
<li> jmp shouldn't be able to enter another application
<li> one virtual computer shouldn't be able to hog the processor
</ul>
<p>In this lecture, we will explore at a high level how to build
virtual computers that meet these goals. In the rest of the term we
will work out the details.
<h2>Virtual processors</h2>
<p>To give each application its own virtual processor, we need
to virtualize the physical processors. One way to do so is to multiplex
the physical processor over time: the operating system runs one
application for a while, then runs another application for a while, etc.
We can implement this solution as follows: when an application has run
for its share of the processor, unload the state of the physical
processor, save that state to be able to resume the application later,
load in the state for the next application, and resume it.
<p>What needs to be saved and restored? That depends on the
processor, but for the x86:
<ul>
<li>IP
<li>SP
<li>The other processor registers (eax, etc.)
</ul>
<p>To enforce that a virtual processor doesn't keep a processor, the
operating system can arrange for a periodic interrupt, and switch the
processor in the interrupt routine.
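<p>The save/switch/restore cycle can be sketched in C. This is a minimal illustration under stated assumptions, not xv6 or JOS code: <tt>struct context</tt>, <tt>saved</tt>, <tt>cpu</tt>, and <tt>timer_interrupt</tt> are hypothetical names, a plain struct stands in for the physical registers, and the next application is picked round-robin.

```c
#include <stdint.h>

/* Hypothetical saved processor state for one application (x86 register names). */
struct context {
    uint32_t eip, esp;
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
};

#define NAPPS 3
static struct context saved[NAPPS];  /* saved state of each virtual processor */
static struct context cpu;           /* stands in for the physical registers */
static int current = 0;              /* application currently on the CPU */

/* Body of the periodic timer interrupt: switch to the next application. */
void timer_interrupt(void) {
    saved[current] = cpu;             /* unload and save the running app's state */
    current = (current + 1) % NAPPS;  /* round-robin choice of the next app */
    cpu = saved[current];             /* load its state; returning from the interrupt resumes it */
}
```

<p>A real kernel performs the save and restore in assembly, since C code cannot name the hardware registers directly; the struct copy here only models that step.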
<p>To separate the memories of the applications, we may also need to save
and restore the registers that define the (virtual) memory of the
application (e.g., segment and MMU registers on the x86), which is
explained next.
<h2>Separating memories</h2>
<p>Approaches to separating memories:
<ul>
<li>Force programs to be written in high-level, type-safe language
<li>Enforce separation using hardware support
</ul>
The approaches can be combined.
<p>Let's assume unlimited physical memory for a little while. We can
enforce separation then as follows:
<ul>
<li>Put a device (memory management unit) between processor and memory,
which checks each memory access against a set of domain registers.
(The domain registers are like segment registers on the x86, except
that they are not used to compute an address.)
<li>The domain register specifies a range of addresses that the
processor is allowed to access.
<li>When switching applications, switch domain registers.
</ul>
Why does this work? Loads, stores, and jumps cannot touch or enter other
applications' domains.
<p>To allow for controlled sharing and separation within an application,
extend domain registers with protection bits: read (R), write (W),
execute (X).
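<p>A minimal sketch of the check the MMU would perform on every access, assuming each domain register holds a half-open address range plus R/W/X bits. The names <tt>struct domain</tt>, <tt>mmu_check</tt>, and the <tt>PERM_*</tt> flags are hypothetical. Switching applications then amounts to pointing the check at a different set of registers.

```c
#include <stdbool.h>
#include <stdint.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Hypothetical domain register: an address range plus protection bits. */
struct domain {
    uint32_t lo, hi;   /* accessible addresses satisfy lo <= addr < hi */
    unsigned perm;     /* some combination of PERM_R, PERM_W, PERM_X */
};

/* A load needs PERM_R, a store PERM_W, an instruction fetch or jump target PERM_X.
   The access is allowed only if some domain register covers addr with that permission. */
bool mmu_check(const struct domain *regs, int nregs, uint32_t addr, unsigned need) {
    for (int i = 0; i < nregs; i++)
        if (addr >= regs[i].lo && addr < regs[i].hi && (regs[i].perm & need) == need)
            return true;
    return false;   /* the hardware would trap to the kernel here */
}
```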
<p>How to protect the domain registers? Extend the protection bits
with a kernel-only one. When in kernel mode, the processor can change
domain registers. As we will see in lecture 4, the x86 stores the U/K
information as the CPL (current privilege level) in the low two bits
of the CS segment register.
<p>To change from user to kernel, extend the hardware with special
instructions for entering a "supervisor" or "system" call, and
returning from it. On the x86, these are int and iret. The int
instruction takes an interrupt vector number as its argument; a kernel
typically reserves one vector for system calls and passes the system
call number in a register. We can then think of the kernel
interface as the set of "instructions" that augment the instructions
implemented in hardware.
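<p>On the kernel side, these software "instructions" can be sketched as a table indexed by the system call number. The sketch assumes the convention xv6 and JOS both use on the x86: the number arrives in %eax and the result goes back in %eax. The handler names, the simplified <tt>struct trapframe</tt>, and the returned pid 42 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical system-call numbers. */
enum { SYS_getpid = 0, SYS_add = 1, NSYSCALLS };

/* Simplified saved user registers: number in eax, arguments in ebx and ecx. */
struct trapframe { uint32_t eax, ebx, ecx; };

static int32_t sys_getpid(struct trapframe *tf) { (void)tf; return 42; }
static int32_t sys_add(struct trapframe *tf) { return (int32_t)(tf->ebx + tf->ecx); }

static int32_t (*const syscalls[NSYSCALLS])(struct trapframe *) = {
    [SYS_getpid] = sys_getpid,
    [SYS_add]    = sys_add,
};

/* Called after the user program executed `int` on the system-call vector. */
void syscall_dispatch(struct trapframe *tf) {
    uint32_t num = tf->eax;
    if (num < NSYSCALLS)
        tf->eax = (uint32_t)syscalls[num](tf);  /* result returned to user in eax */
    else
        tf->eax = (uint32_t)-1;                 /* unknown system call */
}
```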
<h2>Memory management</h2>
<p>We assumed unlimited physical memory and big addresses. In
practice, the operating system must support creating, shrinking, and
growing domains, while still allowing the addresses of an
application to be contiguous (for programming convenience). What if
we want to grow the domain of application 1 but the memory right below
and above it is in use by application 2?
<p>How? Virtual addresses and address spaces. Virtualize addresses and let
the kernel control the mapping from virtual to physical.
<p> Address spaces provide each application with the illusion that it has
a complete memory to itself. All the addresses it issues are its own
addresses (e.g., each application has an address 0).
<ul>
<li> How do you give each application its own address space?
<ul>
<li> MMU translates <i>virtual</i> addresses to <i>physical</i>
addresses using a translation table
<li> Implementation approaches for translation table:
<ol>
<li> for each virtual address, store its physical address: too costly.
<li> translate a set of contiguous virtual addresses at a time using
segments (segment #, base address, length)
<li> translate a fixed-size set of addresses (page) at a time using a
page map (page # -> block #) (draw hardware page table picture).
Data structures for page map: array, n-level tree, superpages, etc.
</ol>
<br>Some processors have both 2 and 3: x86! (see lecture 4)
</ul>
<li> What if two applications want to share real memory? Map the pages
into multiple address spaces and have protection bits per page.
<li> How do you give an application access to a memory-mapped-IO
device? Map the physical address for the device into the application's
address space.
<li> How do you get off the ground?
<ul>
<li> when the computer starts, the MMU is disabled.
<li> the computer starts in kernel mode, with no
translation (i.e., virtual address 0 is physical address 0, and
so on)
<li> the kernel program sets up the MMU to translate kernel addresses to physical
addresses. Often the kernel's base virtual address translates to physical address 0.
<li> enable the MMU
<br><p>Lab 2 explores this topic in detail.
</ul>
</ul>
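<p>The page-map approach (approach 3 in the list above), together with per-page sharing, can be sketched with the simplest page-map data structure, a one-level array. Everything here is hypothetical: a 16-page address space, and <tt>struct pagemap</tt>, <tt>pm_translate</tt>, and the <tt>PTE_*</tt> flags are illustrative names, not x86 formats.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    16u   /* tiny hypothetical address space of 16 pages */
#define PTE_VALID 1u
#define PTE_WRITE 2u

/* One-level page map: entry i gives the physical block # and flags for virtual page i. */
struct pagemap {
    uint32_t block[NPAGES];
    unsigned flags[NPAGES];
};

/* Translate va through pm; returning false models an MMU fault. */
bool pm_translate(const struct pagemap *pm, uint32_t va, bool write, uint32_t *pa) {
    uint32_t vpn = va / PAGE_SIZE;   /* page # */
    uint32_t off = va % PAGE_SIZE;   /* offset within the page */
    if (vpn >= NPAGES || !(pm->flags[vpn] & PTE_VALID))
        return false;
    if (write && !(pm->flags[vpn] & PTE_WRITE))
        return false;
    *pa = pm->block[vpn] * PAGE_SIZE + off;
    return true;
}
```

<p>Sharing falls out naturally: two page maps whose entries name the same physical block share that memory, and each map carries its own protection bits, so one application can have the page read-write while another sees it read-only.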
<h2>Operating system organizations</h2>
<p>A central theme in operating system design is how to organize the
operating system. It is helpful to define a couple of terms:
<ul>
<li>Kernel: the program that runs in kernel mode, in a kernel
address space.
<li>Library: code against which applications link (e.g., libc).
<li>Application: code that runs in a user-level address space.
<li>Operating system: kernel plus all user-level system code (e.g.,
servers, libraries, etc.)
</ul>
<p>Example: trace a call to printf made by an application.
<p>There are roughly 4 operating system designs:
<ul>
<li>Monolithic design. The OS interface is the kernel interface (i.e.,
the complete operating system runs in kernel mode). This has limited
flexibility (other than downloadable kernel modules) and doesn't fault
isolate individual OS modules (e.g., the file system and process
module are both in the kernel address space). xv6 has this
organization.
<li>Microkernel design. The kernel interface provides a minimal set of
abstractions (e.g., virtual memory, IPC, and threads), and the rest of
the operating system is implemented by user applications (often called
servers).
<li>Virtual machine design. The kernel implements a virtual machine
monitor. The monitor multiplexes multiple virtual machines, each of
which provides the machine platform (the instruction set, devices,
etc.) as its kernel programming interface. Each virtual machine runs its
own, perhaps simple, operating system.
<li>Exokernel design. Only used in this class and discussed below.
</ul>
<p>Although monolithic operating systems are the dominant operating
system architecture for desktop and server machines, it is worthwhile
to consider alternative architectures, even if only to understand
operating systems better. This lecture looks at exokernels, because
that is what you will be building in the lab. xv6 is organized as a
monolithic system, and we will study it in the next lectures. Later in
the term we will read papers about microkernel and virtual machine
operating systems.
<h2>Exokernels</h2>
<p>The exokernel architecture takes an end-to-end approach to
operating system design. In this design, the kernel just securely
multiplexes physical resources; any programmer can decide what the
operating system interface and its implementation are for his
application. One would expect a couple of popular APIs (e.g., UNIX)
that most applications will link against, but a programmer is always
free to replace that API, partially or completely. (Draw picture of
JOS.)
<p>Compare UNIX interface (<a href="v6.c">v6</a> or <a
href="os10.h">OSX</a>) with the JOS exokernel-like interface:
<pre>
enum
{
SYS_cputs = 0,
SYS_cgetc,
SYS_getenvid,
SYS_env_destroy,
SYS_page_alloc,
SYS_page_map,
SYS_page_unmap,
SYS_exofork,
SYS_env_set_status,
SYS_env_set_trapframe,
SYS_env_set_pgfault_upcall,
SYS_yield,
SYS_ipc_try_send,
SYS_ipc_recv,
};
</pre>
<p>To illustrate the differences between these interfaces in more
detail consider implementing the following:
<ul>
<li>User-level thread package that deals well with blocking system
calls, page faults, etc.
<li>High-performance web server performing optimizations across module
boundaries (e.g., file system and network stack).
</ul>
<p>How well can each kernel interface implement the above examples?
(Start with the UNIX interface and see where you run into problems.) (The
JOS kernel interface is not flexible enough: for example,
<i>ipc_recv</i> is blocking.)
<h2>Exokernel paper discussion</h2>
<p>The central challenge in an exokernel design is to provide
extensibility while still providing fault isolation. This challenge breaks
down into three problems:
<ul>
<li>tracking ownership of resources;
<li>ensuring fault isolation between applications;
<li>revoking access to resources.
</ul>
<ul>
<li>How is physical memory multiplexed? The kernel tracks, for each
physical page, who owns it.
<li>How is the processor multiplexed? Time slices.
<li>How is the network multiplexed? Packet filters.
<li>What is the plan for revoking resources?
<ul>
<li>Expose information so that application can do the right thing.
<li>Ask applications politely to release resources of a given type.
<li>Forcibly take resources back from applications that do not release them
</ul>
<li>What is an environment? The processor environment: it stores
sufficient information to deliver events to applications: exception
context, interrupt context, protected entry context, and addressing
context. This structure is processor specific.
<li>How does one implement a minimal protected control transfer on the
x86? Lab 4's approach to IPC has some shortcomings: what are they?
(It is essentially a polling-based solution, and the one you implement
is unfair.) What is a better way? Set up a specific handler to be
called when an environment wants to call this environment. How does
this impact scheduling of environments? (i.e., give up time slice or
not?)
<li>How does one dispatch exceptions (e.g., page fault) to user space
on the x86? Give each environment a separate exception stack in user
space, and propagate exceptions on that stack. See page-fault handling
in lab 4.
<li>How does one implement processes in user space? The thread part of
a process is easy. The difficult part is to perform the copy of the
address space efficiently; one would like to share memory between
parent and child. This property can be achieved using copy-on-write.
The child should, however, have its own exception stack. Again,
see lab 4. <i>sfork</i> is a trivial extension of user-level <i>fork</i>.
<li>What are the examples of extensibility in this paper? (RPC system
in which server saves and restores registers, different page table,
and stride scheduler.)
</ul>
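<p>The per-page ownership tracking mentioned above can be sketched as a table of owners plus a check on every mapping request. This is a hypothetical illustration, not the Aegis or JOS implementation: <tt>owner</tt>, <tt>page_alloc_to</tt>, <tt>may_map</tt>, and <tt>revoke_all</tt> are made-up names, and revocation is shown only in its forcible form.

```c
#include <stdbool.h>

#define NPHYSPAGES 1024
#define FREE (-1)

/* owner[p] is the id of the environment that owns physical page p, or FREE. */
static int owner[NPHYSPAGES];

void owners_init(void) {
    for (int p = 0; p < NPHYSPAGES; p++)
        owner[p] = FREE;
}

/* Hand a specific free physical page to env (the application names the page it wants). */
bool page_alloc_to(int env, int p) {
    if (p < 0 || p >= NPHYSPAGES || owner[p] != FREE)
        return false;
    owner[p] = env;
    return true;
}

/* The secure-multiplexing check: env may map page p only if it owns it. */
bool may_map(int env, int p) {
    return p >= 0 && p < NPHYSPAGES && owner[p] == env;
}

/* Forcible revocation: take every page back from env. */
void revoke_all(int env) {
    for (int p = 0; p < NPHYSPAGES; p++)
        if (owner[p] == env)
            owner[p] = FREE;
}
```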
</body>

web/l4.html Normal file

@ -0,0 +1,518 @@
<title>L4</title>
<html>
<head>
</head>
<body>
<h1>Address translation and sharing using segments</h1>
<p>This lecture is about virtual memory, focusing on address
spaces. It is the first in a series of lectures that use
xv6 as a case study.
<h2>Address spaces</h2>
<ul>
<li>OS: kernel program and user-level programs. For fault isolation
each program runs in a separate address space. The kernel address
space is like a user address space, except it runs in kernel mode.
The program in kernel mode can execute privileged instructions (e.g.,
writing the kernel's code segment registers).
<li>One job of kernel is to manage address spaces (creating, growing,
deleting, and switching between them)
<ul>
<li>Each address space (including kernel) consists of the binary
representation for the text of the program, the data part
of the program, and the stack area.
<li>The kernel address space runs the kernel program. In a monolithic
organization the kernel manages all hardware and provides an API
to user programs.
<li>Each user address space contains a program. A user program may ask
to shrink or grow its address space.
</ul>
<li>The main operations:
<ul>
<li>Creation. Allocate physical memory to store the program. Load the
program into physical memory. Fill the address space with references to
physical memory.
<li>Growing. Allocate physical memory and add it to address space.
<li>Shrinking. Free some of the memory in an address space.
<li>Deletion. Free all memory in an address space.
<li>Switching. Switch the processor to use another address space.
<li>Sharing. Share a part of an address space with another program.
</ul>
</ul>
<p>Two main approaches to implementing address spaces: using segments
and using page tables. Often when one uses segments, one also uses
page tables. But not the other way around; i.e., paging without
segmentation is common.
<h2>Example support for address spaces: x86</h2>
<p>For an operating system to provide address spaces and address
translation typically requires support from hardware. The translation
and checking of permissions typically must happen on each address used
by a program, and it would be too slow to do that in software (if it is
possible at all). The division of labor is: the operating system manages
address spaces, and the hardware translates addresses and checks
permissions.
<p>PC block diagram without virtual memory support:
<ul>
<li>physical address
<li>base, IO hole, extended memory
<li>Physical address == what is on CPU's address pins
</ul>
<p>The x86 starts out in real mode and translation is as follows:
<ul>
<li>segment*16+offset ==> physical address
<li>no protection: program can load anything into seg reg
</ul>
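<p>As a worked example of the real-mode rule, the boot sector's address 0x7c00 can be named as 0x07c0:0x0000 or as 0x0000:0x7c00. The function name below is hypothetical; it simply computes segment*16+offset.

```c
#include <stdint.h>

/* Real-mode translation: physical = segment*16 + offset.
   The result can reach 0x10FFEF; an 8086 has only 20 address lines,
   so there anything at or above 1MB wraps around to low memory. */
uint32_t real_mode_addr(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}
```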
<p>The operating system can switch the x86 to protected mode, which
allows the operating system to create address spaces. Translation in
protected mode is as follows:
<ul>
<li>selector:offset (logical addr) <br>
==SEGMENTATION==>
<li>linear address <br>
==PAGING ==>
<li>physical address
</ul>
<p>Next lecture covers paging; now we focus on segmentation.
<p>Protected-mode segmentation works as follows:
<ul>
<li>protected-mode segments add 32-bit addresses and protection
<ul>
<li>wait: what's the point? the point of segments in real mode was
bigger addresses, but 32-bit mode fixes that!
</ul>
<li>segment register holds segment selector
<li>selector indexes into global descriptor table (GDT)
<li>segment descriptor holds 32-bit base, limit, type, protection
<li>la = va + base ; assert(va <= limit); (the limit is the offset of the last valid byte)
<li>seg register usually implicit in instruction
<ul>
<li>DS:REG
<ul>
<li><tt>movl $0x1, _flag</tt>
</ul>
<li>SS:ESP, SS:EBP
<ul>
<li><tt>pushl %ecx, pushl $_i</tt>
<li><tt>popl %ecx</tt>
<li><tt>movl 4(%ebp),%eax</tt>
</ul>
<li>CS:EIP
<ul>
<li>instruction fetch
</ul>
<li>String instructions: read from DS:ESI, write to ES:EDI
<ul>
<li><tt>rep movsb</tt>
</ul>
<li>Exception: far addresses
<ul>
<li><tt>ljmp $selector, $offset</tt>
</ul>
</ul>
<li>LGDT instruction loads CPU's GDT register
<li>you turn on protected mode by setting PE bit in CR0 register
<li>what happens with the next instruction? CS now has different
meaning...
<li>How to transfer from one segment to another, perhaps with different
privileges?
<ul>
<li>Current privilege level (CPL) is in the low 2 bits of CS
<li>CPL=0 is privileged O/S, CPL=3 is user
<li>Within the same privilege level: ljmp.
<li>Transfer to a segment with more privilege: call gates.
<ul>
<li>a way for app to jump into a segment and acquire privs
<li>CPL must be <= descriptor's DPL in order to read or write segment
<li>call gates can change privilege <b>and</b> switch CS and SS
segment
<li>call gates are implemented using a special type of segment descriptor
in the GDT.
<li>interrupts are conceptually the same as call gates, but their
descriptor is stored in the IDT. We will use interrupts to transfer
control between user and kernel mode, both in JOS and xv6. We will
return to this in the lecture about interrupts and exceptions.
</ul>
</ul>
<li>What about protection?
<ul>
<li>can o/s limit what memory an application can read or write?
<li>app can load any selector into a seg reg...
<li>but can only mention indices into GDT
<li>app can't change GDT register (requires privilege)
<li>why can't app write the descriptors in the GDT?
<li>what about system calls? how do they transfer to kernel?
<li>app cannot <b>just</b> lower the CPL
</ul>
</ul>
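<p>The selector lookup, limit check, and privilege check above can be combined into one function. This is a simplified model under stated assumptions: real descriptors pack base, limit, and flags into 8 bytes, code segments and call gates follow different privilege rules, and the LDT is ignored. <tt>struct segdesc</tt> and <tt>seg_translate</tt> are hypothetical names; the check shown is the data-segment rule, max(CPL, RPL) &lt;= DPL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment descriptor. */
struct segdesc {
    uint32_t base;
    uint32_t limit;   /* offset of the last valid byte */
    unsigned dpl;     /* descriptor privilege level, 0..3 */
    bool present;
};

/* Translate selector:offset to a linear address through the GDT.
   Returns false where the hardware would raise a general-protection fault. */
bool seg_translate(const struct segdesc *gdt, unsigned ngdt,
                   uint16_t sel, unsigned cpl, uint32_t off, uint32_t *la) {
    unsigned index = sel >> 3;   /* bits 15..3: descriptor index */
    unsigned rpl = sel & 3;      /* bits 1..0: requested privilege level */
    /* bit 2 (TI) would select the LDT, which this sketch does not model;
       index 0 is the null selector, which faults when used */
    if ((sel & 4) || index == 0 || index >= ngdt)
        return false;
    const struct segdesc *d = &gdt[index];
    if (!d->present)
        return false;
    unsigned eff = cpl > rpl ? cpl : rpl;   /* data segment: max(CPL, RPL) <= DPL */
    if (eff > d->dpl)
        return false;
    if (off > d->limit)
        return false;
    *la = d->base + off;   /* la = va + base */
    return true;
}
```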
<h2>Case study (xv6)</h2>
<p>xv6 is a reimplementation of <a href="../v6.html">Unix 6th edition</a>.
<ul>
<li>v6 is a version of the original Unix operating system for <a href="http://www.pdp11.org/">DEC PDP11</a>
<ul>
<li>PDP-11 (1972):
<li>16-bit processor, 18-bit physical (40)
<li>UNIBUS
<li>memory-mapped I/O
<li>performance: less than 1MIPS
<li>register-to-register transfer: 0.9 usec
<li>56k-228k (40)
<li>no paging, but some segmentation support
<li>interrupts, traps
<li>about $10K
<li>rk disk with 2MByte of storage
<li>with cabinet 11/40 is 400lbs
</ul>
<li>Unix v6
<ul>
<li><a href="../reference.html">Unix papers</a>.
<li>1976; first widely available Unix outside Bell labs
<li>Thompson and Ritchie
<li>Influenced by Multics but simpler.
<li>complete (used for real work)
<li>Multi-user, time-sharing
<li>small (43 system calls)
<li>modular (composition through pipes; one had to split programs!!)
<li>compactly written (2 programmers, 9,000 lines of code)
<li>advanced UI (shell)
<li>introduced C (derived from B)
<li>distributed with source
<li>V7 was sold by Microsoft for a couple years under the name Xenix
</ul>
<li>Lions's commentary
<ul>
<li>suppressed because of copyright issues
<li>resurfaced in 1996
</ul>
<li>xv6 written for 6.828:
<ul>
<li>v6 reimplementation for x86
<li>doesn't include all features of v6 (e.g., xv6 has 20 of 43
system calls).
<li>runs on symmetric multiprocessing PCs (SMPs).
</ul>
</ul>
<p>Newer Unixes have inherited many of the conceptual ideas even though
they added paging, networking, graphics, improved performance, etc.
<p>You will need to read most of the source code multiple times. Your
goal is to explain every line to yourself.
<h3>Overview of address spaces in xv6</h3>
<p>In today's lecture we see how xv6 creates the kernel address
space and the first user address space, and switches to it. To understand
how this happens, we need to understand in detail the state on the
stack too; this may be surprising, but a thread of control and
address space are tightly bundled in xv6, in a concept
called <i>process</i>. The kernel address space is the only address
space with multiple threads of control. We will study context
switching and process management in detail in the coming weeks; creation of
the first user process (init) will give you a first flavor.
<p>xv6 uses only the segmentation hardware, and only in a limited
way. (In JOS you will use page-table hardware too, which we cover in
the next lecture.) The address space layouts are as follows:
<ul>
<li>The kernel address space is set up as follows:
<pre>
the code segment runs from 0 to 2^32 and is mapped X and R
the data segment runs from 0 to 2^32 but is mapped W (read and write).
</pre>
<li>For each process, the layout is as follows:
<pre>
text
original data and bss
fixed-size stack
expandable heap
</pre>
The text of a process is stored in its own segment and the rest in a
data segment.
</ul>
<p>xv6 makes minimal use of the segmentation hardware available on the
x86. What other plans could you envision?
<p>In xv6, each program has a user and a kernel stack; when the
user program switches to the kernel, it switches to its kernel stack.
Its kernel stack is stored in the process's proc structure. (This is
arranged through the descriptors in the IDT, which is covered later.)
<p>xv6 assumes that there is a lot of physical memory. It assumes that
segments can be stored contiguously in physical memory and has
therefore no need for page tables.
<h3>xv6 kernel address space</h3>
<p>Let's see how xv6 creates the kernel address space by tracing xv6
from when it boots, focusing on address space management:
<ul>
<li>Where does xv6 start after the PC is powered on: start (which is
loaded at physical address 0x7c00; see lab 1).
<li>1025-1033: are we in real mode?
<ul>
<li>how big are logical addresses?
<li>how big are physical addresses?
<li>how are physical addresses calculated?
<li>what segment is being used in subsequent code?
<li>what values are in that segment?
</ul>
<li>1068: what values are loaded in the GDT?
<ul>
<li>1097: gdtr points to gdt
<li>1094: entry 0 unused
<li>1095: entry 1 (X + R, base = 0, limit = 0xffffffff, DPL = 0)
<li>1096: entry 2 (W, base = 0, limit = 0xffffffff, DPL = 0)
<li>are we using segments in a sophisticated way? (i.e., controlled sharing)
<li>are P and S set?
<li>are addresses translated as in protected mode when lgdt completes?
</ul>
<li>1071: no, and not even here.
<li>1075: far jump, load 8 in CS. from now on we use segment-based translation.
<li>1081-1086: set up other segment registers
<li>1087: where is the stack which is used for procedure calls?
<li>1087: cmain in the bootloader (see lab 1), which calls main0
<li>1222: main0.
<ul>
<li>job of main0 is to set everything up so that all xv6 conventions work
<li>where is the stack? (sp = 0x7bec)
<li>what is on it?
<pre>
00007bec [00007bec] 7cda // return address in cmain
00007bf0 [00007bf0] 0080 // callee-saved ebx
00007bf4 [00007bf4] 7369 // callee-saved esi
00007bf8 [00007bf8] 0000 // callee-saved ebp
00007bfc [00007bfc] 7c49 // return address for cmain: spin
00007c00 [00007c00] c031fcfa // the instructions from 7c00 (start)
</pre>
</ul>
<li>1239-1240: switch to cpu stack (important for scheduler)
<ul>
<li>why -32?
<li>what values are in ebp and esp?
<pre>
esp: 0x108d30 1084720
ebp: 0x108d5c 1084764
</pre>
<li>what is on the stack?
<pre>
00108d30 [00108d30] 0000
00108d34 [00108d34] 0000
00108d38 [00108d38] 0000
00108d3c [00108d3c] 0000
00108d40 [00108d40] 0000
00108d44 [00108d44] 0000
00108d48 [00108d48] 0000
00108d4c [00108d4c] 0000
00108d50 [00108d50] 0000
00108d54 [00108d54] 0000
00108d58 [00108d58] 0000
00108d5c [00108d5c] 0000
00108d60 [00108d60] 0001
00108d64 [00108d64] 0001
00108d68 [00108d68] 0000
00108d6c [00108d6c] 0000
</pre>
<li>what is 1 in 0x108d60? is it on the stack?
</ul>
<li>1242: is it safe to reference bcpu? where is it allocated?
<li>1260-1270: set up proc[0]
<ul>
<li>each process has its own stack (see struct proc).
<li>where is its stack? (see the section on physical memory
management below).
<li>what is the jmpbuf? (will discuss in detail later)
<li>1267: why -4?
</ul>
<li>1270: necessary to be able to take interrupts (will discuss in
detail later)
<li>1292: what process do you think scheduler() will run? we will
study later how that happens, but let's assume it runs process0 on
process0's stack.
</ul>
<h3>xv6 user address spaces</h3>
<ul>
<li>1327: process0
<ul>
<li>process 0 sets up everything to make process conventions work out
<li>which stack is process0 running? see 1260.
<li>1334: is the convention to release the proc_table_lock after being
scheduled? (we will discuss locks later; assume there are no other
processors for now.)
<li>1336: cwd is current working directory.
<li>1348: first step in initializing a template trap frame: set
everything to zero. we are setting up process 0 as if it just
entered the kernel from user space and wants to go back to user
space. (see x86.h to see which fields have the value 0.)
<li>1349: why "|3" instead of 0?
<li>1351: why set interrupt flag in template trapframe?
<li>1352: where will the user stack be in proc[0]'s address space?
<li>1353: makes a copy of proc0. fork() calls copyproc() to implement
forking a process. This statement in essence is calling fork inside
proc0, making proc[1] a duplicate of proc[0]. proc[0], however,
has not much in its address space: just one page (see 1341).
<ul>
<li>2221: grab a lock on the proc table so that we are the only one
updating it.
<li>2116: allocate next pid.
<li>2228: we got our entry; release the lock. from now we are only
modifying our entry.
<li>2120-2127: copy proc[0]'s memory. proc[1]'s memory will be identical
to proc[0]'s.
<li>2130-2136: allocate a kernel stack. this stack is different from
the stack that proc[1] uses when running in user mode.
<li>2139-2140: copy the template trapframe that xv6 had set up in
proc[0].
<li>2147: where will proc[1] start running when the scheduler selects
it?
<li>2151-2155: Unix semantics: child inherits open file descriptors
from parent.
<li>2158: same for cwd.
</ul>
<li>1356: load a program in proc[1]'s address space. the program
loaded is the binary version of init.c (sheet 16).
<li>1374: where will proc[1] start?
<li>1377-1388: copy the binary into proc[1]'s address space. (you
will learn about the ELF format in the labs.)
<ul>
<li>can the binary for init be any size for proc[1] to work correctly?
<li>what is the layout of proc[1]'s address space? is it consistent
with the layout described on line 1950-1954?
</ul>
<li>1357: make proc[1] runnable so that the scheduler will select it
to run. everything is set up now for proc[1] to run, "return" to
user space, and execute init.
<li>1359: proc[0] gives up the processor, which calls sleep, which
calls sched, which setjmps back to scheduler. let's peek a bit in
scheduler to see what happens next. (we will return to the
scheduler in more detail later.)
</ul>
<li>2219: this test will fail for proc[1]
<li>2226: setupsegs(p) sets up the segments for proc[1]. this call is
more interesting than the previous, so let's see what happens:
<ul>
<li>2032-37: this is for traps and interrupts, which we will cover later.
<li>2039-49: set up new gdt.
<li>2040: why 0x100000 + 64*1024?
<li>2045: why 3? why is base p->mem? is p->mem physical or logical?
<li>2045-2046: how must the program for proc[1] be compiled if proc[1]
is to run successfully in user space?
<li>2052: we are still running in the kernel, but we are loading gdt.
is this ok?
<li>why have so few user-level segments? why not separate out code,
data, stack, bss, etc.?
</ul>
<li>2227: record that proc[1] is running on the cpu
<li>2228: record it is running instead of just runnable
<li>2229: setjmp to fork_ret.
<li>2282: which stack is proc[1] running on?
<li>2284: when scheduled, first release the proc_table_lock.
<li>2287: back into assembly.
<li>2782: where is the stack pointer pointing to?
<pre>
0020dfbc [0020dfbc] 0000
0020dfc0 [0020dfc0] 0000
0020dfc4 [0020dfc4] 0000
0020dfc8 [0020dfc8] 0000
0020dfcc [0020dfcc] 0000
0020dfd0 [0020dfd0] 0000
0020dfd4 [0020dfd4] 0000
0020dfd8 [0020dfd8] 0000
0020dfdc [0020dfdc] 0023
0020dfe0 [0020dfe0] 0023
0020dfe4 [0020dfe4] 0000
0020dfe8 [0020dfe8] 0000
0020dfec [0020dfec] 0000
0020dff0 [0020dff0] 001b
0020dff4 [0020dff4] 0200
0020dff8 [0020dff8] 1000
</pre>
<li>2783: why jmp instead of call?
<li>what will iret put in eip?
<li>what is 0x1b? what will iret put in cs?
<li>after iret, what will the processor be executing?
</ul>
<h3>Managing physical memory</h3>
<p>To create an address space we must allocate physical memory, which
will be freed when an address space is deleted (e.g., when a user
program terminates). xv6 implements a first-fit memory allocator
(see kalloc.c).
<p>It maintains a list of ranges of free memory. The allocator finds
the first range that is at least as large as the requested amount of memory.
It splits that range in two: one range of the size requested and one
of the remainder. It returns the first range. When memory is
freed, kfree will merge ranges that are adjacent in memory.
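<p>A minimal first-fit sketch with coalescing on free. It does not mirror kalloc.c: it manages abstract integer addresses and keeps free ranges in a small array sorted by start address. <tt>ff_alloc</tt>, <tt>ff_free</tt>, and <tt>MAXRANGES</tt> are hypothetical names.

```c
/* A free range [start, start+len) of a hypothetical physical address space. */
struct range { unsigned start, len; };

#define MAXRANGES 64
static struct range freelist[MAXRANGES];  /* kept sorted by start */
static int nfree = 0;

/* First fit: carve n units off the first free range that is big enough.
   Returns the start address, or (unsigned)-1 if nothing fits. */
unsigned ff_alloc(unsigned n) {
    for (int i = 0; i < nfree; i++) {
        if (freelist[i].len >= n) {
            unsigned addr = freelist[i].start;
            freelist[i].start += n;   /* the remainder stays on the list */
            freelist[i].len -= n;
            if (freelist[i].len == 0) {   /* exact fit: drop the empty range */
                for (int j = i; j < nfree - 1; j++)
                    freelist[j] = freelist[j + 1];
                nfree--;
            }
            return addr;
        }
    }
    return (unsigned)-1;
}

/* Free [addr, addr+n): insert in sorted order, then merge with adjacent neighbors. */
void ff_free(unsigned addr, unsigned n) {
    if (nfree == MAXRANGES)
        return;   /* table full: this sketch simply drops the range */
    int i = 0;
    while (i < nfree && freelist[i].start < addr)
        i++;
    for (int j = nfree; j > i; j--)   /* open a slot at index i */
        freelist[j] = freelist[j - 1];
    freelist[i].start = addr;
    freelist[i].len = n;
    nfree++;
    if (i + 1 < nfree && freelist[i].start + freelist[i].len == freelist[i + 1].start) {
        freelist[i].len += freelist[i + 1].len;   /* merge with the following range */
        for (int j = i + 1; j < nfree - 1; j++)
            freelist[j] = freelist[j + 1];
        nfree--;
    }
    if (i > 0 && freelist[i - 1].start + freelist[i - 1].len == freelist[i].start) {
        freelist[i - 1].len += freelist[i].len;   /* merge with the preceding range */
        for (int j = i; j < nfree - 1; j++)
            freelist[j] = freelist[j + 1];
        nfree--;
    }
}
```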
<p>Under what scenarios is a first-fit memory allocator undesirable?
<h3>Growing an address space</h3>
<p>How can a user process grow its address space? growproc.
<ul>
<li>2064: allocate a new segment of old size plus n
<li>2067: copy the old segment into the new (ouch!)
<li>2068: and zero the rest.
<li>2071: free the old physical memory
</ul>
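<p>The four growproc steps above look roughly like this in C. This sketch is not the xv6 code: <tt>malloc</tt> and <tt>free</tt> stand in for the kernel allocator, and <tt>struct seg</tt> and <tt>grow_segment</tt> are hypothetical names for a process's single contiguous segment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a process's contiguous data segment. */
struct seg { char *mem; size_t sz; };

/* Grow s by n bytes the way a contiguous-segment kernel must. */
int grow_segment(struct seg *s, size_t n) {
    char *newmem = malloc(s->sz + n);   /* new segment of old size plus n */
    if (newmem == NULL)
        return -1;
    memcpy(newmem, s->mem, s->sz);      /* copy the old segment (the "ouch!") */
    memset(newmem + s->sz, 0, n);       /* zero the rest */
    free(s->mem);                       /* free the old memory */
    s->mem = newmem;
    s->sz += n;
    return 0;
}
```

<p>The copy costs time proportional to the whole segment on every growth, which is exactly the cost that page tables avoid by mapping new pages without moving old ones.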
<p>We could do a lot better if segments didn't have to be contiguous in
physical memory. How could we arrange that? Using page tables, which
is our next topic. This is one place where page tables would be
useful, but there are others too (e.g., in fork).
</body>

web/l5.html Normal file

@ -0,0 +1,210 @@
<title>Lecture 5</title>
<html>
<head>
</head>
<body>
<h2>Address translation and sharing using page tables</h2>
<p> Reading: <a href="../readings/i386/toc.htm">80386</a> chapters 5 and 6<br>
<p> Handout: <b> x86 address translation diagram</b> -
<a href="x86_translation.ps">PS</a> -
<a href="x86_translation.eps">EPS</a> -
<a href="x86_translation.fig">xfig</a>
<br>
<p>Why do we care about x86 address translation?
<ul>
<li>It can simplify s/w structure by placing data at fixed known addresses.
<li>It can implement tricks like demand paging and copy-on-write.
<li>It can isolate programs to contain bugs.
<li>It can isolate programs to increase security.
<li>JOS uses paging a lot, and segments more than you might think.
</ul>
<p>Why aren't protected-mode segments enough?
<ul>
<li>Why did the 386 add translation using page tables as well?
<li>Isn't it enough to give each process its own segments?
</ul>
<p>Translation using page tables on x86:
<ul>
<li>paging hardware maps linear address (la) to physical address (pa)
<li>(we will often interchange "linear" and "virtual")
<li>page size is 4096 bytes, so there are 1,048,576 pages in 2^32
<li>why not just have a big array with each page #'s translation?
<ul>
<li>table[20-bit linear page #] => 20-bit phys page #
</ul>
<li>386 uses 2-level mapping structure
<li>one page directory page, with 1024 page directory entries (PDEs)
<li>up to 1024 page table pages, each with 1024 page table entries (PTEs)
<li>so la has 10 bits of directory index, 10 bits table index, 12 bits offset
<li>What's in a PDE or PTE?
<ul>
<li>20-bit phys page number, present, read/write, user/supervisor
</ul>
<li>cr3 register holds physical address of current page directory
<li>puzzle: what do PDE read/write and user/supervisor flags mean?
<li>puzzle: can supervisor read/write user pages?
<li>Here's how the MMU translates an la to a pa:
<pre>
uint
translate (uint la, bool user, bool write)
{
  uint pde, pte;
  pde = read_mem (%CR3 + 4*(la >> 22));
  access (pde, user, write);
  pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
  access (pte, user, write);
  return (pte & 0xfffff000) + (la & 0xfff);
}
// check protection. pxe is a pte or pde.
// user is true if CPL==3
void
access (uint pxe, bool user, bool write)
{
  if (!(pxe & PG_P))
     => page fault -- page not present
  if (!(pxe & PG_U) && user)
     => page fault -- no access for user
  if (write && !(pxe & PG_W)) {
    if (user)
       => page fault -- not writable
    else if (%CR0 & CR0_WP)
       => page fault -- not writable
    // supervisor write with CR0_WP clear: allowed
  }
}
</pre>
<li>CPU's TLB caches vpn => ppn mappings
<li>if you change a PDE or PTE, you must flush the TLB!
<ul>
<li>by re-loading cr3
</ul>
<li>turn on paging by setting CR0_PG bit of %cr0
</ul>
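<p>The translate/access pseudocode above can be made runnable by simulating physical memory with an array and modeling a page fault as a <tt>false</tt> return. This is a sketch under those assumptions: <tt>physmem</tt>, <tt>read_phys</tt>, and <tt>entry_ok</tt> are hypothetical names, cr3 and CR0_WP are passed as parameters, and the page tables must lie inside the simulated 64 KB. The flag values are the standard x86 bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_P 0x001u   /* present */
#define PG_W 0x002u   /* writable */
#define PG_U 0x004u   /* user-accessible */

/* Simulated physical memory (64 KB) holding the page directory and page tables. */
static uint32_t physmem[16 * 1024];

static uint32_t read_phys(uint32_t pa) { return physmem[pa / 4]; }

/* Protection check applied to both the PDE and the PTE. */
static bool entry_ok(uint32_t e, bool user, bool write, bool cr0_wp) {
    if (!(e & PG_P))
        return false;                        /* page not present */
    if (user && !(e & PG_U))
        return false;                        /* supervisor-only */
    if (write && !(e & PG_W) && (user || cr0_wp))
        return false;                        /* read-only */
    return true;
}

/* Two-level walk: 10 bits directory index, 10 bits table index, 12 bits offset. */
bool translate(uint32_t cr3, uint32_t la, bool user, bool write, bool cr0_wp, uint32_t *pa) {
    uint32_t pde = read_phys(cr3 + 4 * (la >> 22));
    if (!entry_ok(pde, user, write, cr0_wp))
        return false;
    uint32_t pte = read_phys((pde & 0xfffff000) + 4 * ((la >> 12) & 0x3ff));
    if (!entry_ok(pte, user, write, cr0_wp))
        return false;
    *pa = (pte & 0xfffff000) | (la & 0xfff);
    return true;
}
```

<p>For example, with a page directory at 0x1000 whose entry 1 points to a page table at 0x2000, and entry 3 of that table mapping physical page 0x5000 read-only for users, linear address 0x00403abc translates to 0x5abc for a user read, while a user write faults.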
Can we use paging to limit what memory an app can read/write?
<ul>
<li>user can't modify cr3 (requires privilege)
<li>is that enough?
<li>could user modify page tables? after all, they are in memory.
</ul>
<p>How we will use paging (and segments) in JOS:
<ul>
<li>use segments only to switch privilege level into/out of kernel
<li>use paging to structure process address space
<li>use paging to limit process memory access to its own address space
<li>below is the JOS virtual memory map
<li>why map both kernel and current process? why not 4GB for each?
<li>why is the kernel at the top?
<li>why map all of phys mem at the top? i.e. why multiple mappings?
<li>why map page table a second time at VPT?
<li>why map page table a third time at UVPT?
<li>how do we switch mappings for a different process?
</ul>
<pre>
    4 Gig -------->  +------------------------------+
                     |                              | RW/--
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     :              .               :
                     :              .               :
                     :              .               :
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                     |                              | RW/--
                     |   Remapped Physical Memory   | RW/--
                     |                              | RW/--
    KERNBASE ----->  +------------------------------+ 0xf0000000
                     |  Cur. Page Table (Kern. RW)  | RW/--  PTSIZE
    VPT,KSTACKTOP--> +------------------------------+ 0xefc00000      --+
                     |         Kernel Stack         | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                 PTSIZE
                     |      Invalid Memory          | --/--             |
    ULIM     ------> +------------------------------+ 0xef800000      --+
                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
    UVPT      ---->  +------------------------------+ 0xef400000
                     |          RO PAGES            | R-/R-  PTSIZE
    UPAGES    ---->  +------------------------------+ 0xef000000
                     |           RO ENVS            | R-/R-  PTSIZE
 UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                     +------------------------------+ 0xeebff000
                     |       Empty Memory           | --/--  PGSIZE
    USTACKTOP  --->  +------------------------------+ 0xeebfe000
                     |      Normal User Stack       | RW/RW  PGSIZE
                     +------------------------------+ 0xeebfd000
                     |                              |
                     |                              |
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     .                              .
                     .                              .
                     .                              .
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                     |     Program Data & Heap      |
    UTEXT -------->  +------------------------------+ 0x00800000
    PFTEMP ------->  |       Empty Memory           |        PTSIZE
                     |                              |
    UTEMP -------->  +------------------------------+ 0x00400000
                     |       Empty Memory           |        PTSIZE
    0 ------------>  +------------------------------+
</pre>
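<p>Most slots in the map are exactly one page table's worth (PTSIZE) of address space and are aligned so that a single PDE covers each one. Here is a small C check of that arithmetic, with the addresses copied from the map. PGSIZE and PTSIZE = 1024 &times; 4096 bytes (4 MB) are the standard 386 sizes, and the <code>one_pde</code> helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses copied from the JOS map above. PGSIZE and PTSIZE are the
 * standard 386 sizes: one page, and the 4 MB that one page table maps. */
#define PGSIZE    4096u
#define PTSIZE    (1024u * PGSIZE)

#define KERNBASE  0xf0000000u
#define VPT       0xefc00000u   /* also KSTACKTOP */
#define ULIM      0xef800000u
#define UVPT      0xef400000u
#define UPAGES    0xef000000u
#define UTOP      0xeec00000u   /* also UENVS and UXSTACKTOP */
#define USTACKTOP 0xeebfe000u
#define UTEXT     0x00800000u
#define UTEMP     0x00400000u

/* Page-directory index: which of the 1024 PDEs covers this address. */
static uint32_t PDX(uint32_t la) { return la >> 22; }

/* Hypothetical helper: 1 if [lo, hi) is exactly one PDE's worth of space. */
static int one_pde(uint32_t lo, uint32_t hi) {
    return hi - lo == PTSIZE && lo % PTSIZE == 0;
}
```

<p>The same arithmetic gives PDX(UVPT) = 0x3BD and PDX(VPT) = 0x3BF. These are the page-directory slots through which the page tables themselves become visible.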
<h3>The VPT </h3>
<p>Remember how the X86 translates virtual addresses into physical ones:
<p><img src=pagetables.png>
<p>CR3 points at the page directory. The PDX part of the address
indexes into the page directory to give you a page table. The
PTX part indexes into the page table to give you a page, and then
you add the low bits in.
<p>But the processor has no concept of page directories, page tables,
and pages being anything other than plain memory. So there's nothing
that says a particular page in memory can't serve as two or three of
these at once. The processor just follows pointers:
<pre>
pd = rcr3();
pt = *(pd + 4*PDX);
page = *(pt + 4*PTX);
</pre>
<p>Diagrammatically, it starts at CR3, follows three arrows, and then stops.
<p>If we put a pointer into the page directory that points back to itself at
index V, as in
<p><img src=vpt.png>
<p>then when we try to translate a virtual address with PDX and PTX
equal to V, following three arrows leaves us at the page directory.
So that virtual page translates to the page holding the page directory.
In Jos, V is 0x3BD, so the virtual address of the VPD is
(0x3BD&lt;&lt;22)|(0x3BD&lt;&lt;12) = 0xef7bd000.
<p>Now, if we try to translate a virtual address with PDX = V but an
arbitrary PTX != V, then following three arrows from CR3 ends
one level up from usual (instead of two as in the last case),
which is to say in the page tables. So the set of virtual pages
with PDX=V form a 4MB region whose page contents, as far
as the processor is concerned, are the page tables themselves.
In Jos, V is 0x3BD, so the virtual address of the VPT is (0x3BD&lt;&lt;22) = 0xef400000, the address labeled UVPT in the map above.
<p>So because of the "no-op" arrow we've cleverly inserted into
the page directory, we've mapped the pages being used as
the page directory and page table (which are normally virtually
invisible) into the virtual address space.
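<p>The self-map is easy to check with a toy model in C. The following assumptions are for illustration only: physical memory is a 64 KB array, <code>walk</code> ignores permission bits, and <code>rd</code>, <code>wr</code>, <code>vpt</code>, and <code>vpd</code> are hypothetical helpers. Once PDE 0x3BD points back at the page directory, the ordinary two-level walk returns the physical address of a PDE when given a VPD address, and the physical address of a PTE when given a VPT address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model of the VPT self-map. Assumptions: 16 pages of
 * simulated physical memory, permission bits ignored, V = 0x3BD as in JOS. */
#define PGSIZE 4096u
#define NPAGES 16u
#define PG_P   0x001u
#define V      0x3bdu

static uint8_t physmem[NPAGES * PGSIZE];

static uint32_t rd(uint32_t pa) {
    uint32_t v;
    assert(pa + 4 <= sizeof physmem);
    memcpy(&v, &physmem[pa], sizeof v);
    return v;
}

static void wr(uint32_t pa, uint32_t v) {
    assert(pa + 4 <= sizeof physmem);
    memcpy(&physmem[pa], &v, sizeof v);
}

/* The processor's walk: follow the arrows from cr3 without asking
 * whether a page "is" a directory, a table, or ordinary data. */
static uint32_t walk(uint32_t cr3, uint32_t va) {
    uint32_t pde = rd(cr3 + 4 * (va >> 22));
    uint32_t pte = rd((pde & 0xfffff000u) + 4 * ((va >> 12) & 0x3ff));
    return (pte & 0xfffff000u) | (va & 0xfff);
}

static uint32_t vpt(void) { return V << 22; }                /* page tables */
static uint32_t vpd(void) { return (V << 22) | (V << 12); }  /* page directory */
```

<p>With the self-map in place, a kernel can read the PTE for any va at VPT + 4*(va &gt;&gt; 12) and its PDE at VPD + 4*(va &gt;&gt; 22). It does not need to map the page-table pages anywhere else first.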
</body>

70
web/mkhtml Executable file

@ -0,0 +1,70 @@
#!/usr/bin/perl
my @lines = <>;
my $text = join('', @lines);
my $title;
if($text =~ /^\*\* (.*?)\n/m){
$title = $1;
$text = $` . $';
}else{
$title = "Untitled";
}
$text =~ s/[ \t]+$//mg;
$text =~ s/^$/<br><br>/mg;
$text =~ s!\b([a-z0-9]+\.(c|s|pl|h))\b!<a href="src/$1.html">$1</a>!g;
$text =~ s!^(Lecture [0-9]+\. .*?)$!<b><i>$1</i></b>!mg;
$text =~ s!^\* (.*?)$!<h2>$1</h2>!mg;
$text =~ s!((<br>)+\n)+<h2>!\n<h2>!g;
$text =~ s!</h2>\n?((<br>)+\n)+!</h2>\n!g;
$text =~ s!((<br>)+\n)+<b>!\n<br><br><b>!g;
$text =~ s!\b\s*--\s*\b!\&ndash;!g;
$text =~ s!\[([^\[\]|]+) \| ([^\[\]]+)\]!<a href="$1">$2</a>!g;
$text =~ s!\[([^ \t]+)\]!<a href="$1">$1</a>!g;
$text =~ s!``!\&ldquo;!g;
$text =~ s!''!\&rdquo;!g;
print <<EOF;
<!-- AUTOMATICALLY GENERATED: EDIT the .txt version, not the .html version -->
<html>
<head>
<title>$title</title>
<style type="text/css"><!--
body {
background-color: white;
color: black;
font-size: medium;
line-height: 1.2em;
margin-left: 0.5in;
margin-right: 0.5in;
margin-top: 0;
margin-bottom: 0;
}
h1 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.4em;
}
h2 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.2em;
}
--></style>
</head>
<body bgcolor=#ffffff>
<h1>$title</h1>
<br><br>
EOF
print $text;
print <<EOF;
</body>
</html>
EOF

53
web/x86-intr.html Normal file

@ -0,0 +1,53 @@
<title>Homework: xv6 and Interrupts and Exceptions</title>
<html>
<head>
</head>
<body>
<h1>Homework: xv6 and Interrupts and Exceptions</h1>
<p>
<b>Read</b>: xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S. Skim
lapic.c, ioapic.c, and picirq.c.
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of the lecture.
<p>
<b>Introduction</b>
<p>Try to understand
xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S, and skim
lapic.c, ioapic.c, and picirq.c. You will need to consult:
<p>Chapter 5 of <a href="../readings/ia32/IA32-3.pdf">IA-32 Intel
Architecture Software Developer's Manual, Volume 3: System programming
guide</a>; you can skip sections 5.7.1, 5.8.2, and 5.12.2. Be aware
that terms such as exceptions, traps, interrupts, faults and aborts
have no standard meaning.
<p>Chapter 9 of the 1987 <a href="../readings/i386/toc.htm">i386
Programmer's Reference Manual</a> also covers exception and interrupt
handling in IA32 processors.
<p><b>Assignment</b>:
In xv6, set a breakpoint at the beginning of <code>syscall()</code> to
catch the very first system call. What values are on the stack at
this point? Turn in the output of <code>print-stack 35</code> at that
breakpoint with each value labeled as to what it is (e.g.,
saved <code>%ebp</code> for <code>trap</code>,
<code>trapframe.eip</code>, etc.).
<p>
<b>This completes the homework.</b>
</body>

18
web/x86-intro.html Normal file

@ -0,0 +1,18 @@
<title>Homework: Intro to x86 and PC</title>
<html>
<head>
</head>
<body>
<h1>Homework: Intro to x86 and PC</h1>
<p>Today's lecture is an introduction to the x86 and the PC, the
platform for which you will write an operating system. The assigned
book is a reference for x86 assembly programming; you will write some
x86 assembly yourself.
<p><b>Assignment</b>: Make sure to do exercise 1 of lab 1 before
coming to lecture.
</body>

33
web/x86-mmu.html Normal file

@ -0,0 +1,33 @@
<title>Homework: x86 MMU</title>
<html>
<head>
</head>
<body>
<h1>Homework: x86 MMU</h1>
<p>Read chapters 5 and 6 of
<a href="../readings/i386/toc.htm">Intel 80386 Reference Manual</a>.
These chapters explain
the x86 Memory Management Unit (MMU),
which we will cover in lecture today and which you need
to understand in order to do lab 2.
<p>
<b>Read</b>: bootasm.S and setupsegs() in proc.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member by the beginning of lecture.
<p><b>Assignment</b>: Try to understand setupsegs() in proc.c.
What values are written into <code>gdt[SEG_UCODE]</code>
and <code>gdt[SEG_UDATA]</code> for init, the first user-space
process?
(You can use Bochs to answer this question.)
</body>

BIN
web/x86-mmu1.pdf Normal file

Binary file not shown.

55
web/x86-mmu2.pdf Normal file

@ -0,0 +1,55 @@
Binary file not shown (a one-page scanned PDF image).

63
web/xv6-disk.html Normal file

@@ -0,0 +1,63 @@
<html>
<head>
<title>Homework: Files and Disk I/O</title>
</head>
<body>
<h1>Homework: Files and Disk I/O</h1>
<p>
<b>Read</b>: bio.c, fd.c, fs.c, and ide.c
<p>
This homework should be turned in at the beginning of lecture.
<p>
<b>File and Disk I/O</b>
<p>Insert a print statement in bwrite so that you get a
print every time a block is written to disk:
<pre>
cprintf("bwrite sector %d\n", sector);
</pre>
<p>Build and boot a new kernel and run these four commands at the shell:
<pre>
echo &gt;a
echo &gt;a
rm a
mkdir d
</pre>
(You can try <tt>rm d</tt> if you are curious, but it should look
almost identical to <tt>rm a</tt>.)
<p>You should see a sequence of bwrite prints after running each command.
Record the list and annotate it with the calling function and
what block is being written.
For example, this is the <i>second</i> <tt>echo &gt;a</tt>:
<pre>
$ echo >a
bwrite sector 121 # writei (data block)
bwrite sector 3 # iupdate (inode block)
$
</pre>
<p>Hint: the easiest way to get the name of the
calling function is to add a string argument to bwrite,
edit all the calls to bwrite to pass the name of the
calling function, and just print it.
You should be able to reason about what kind of
block is being written just from the calling function.
<p>You need not write the following up, but try to
understand why each write is happening. This will
help your understanding of the file system layout
and the code.
<p>
<b>This completes the homework.</b>
</body>
</html>

163
web/xv6-intro.html Normal file

@@ -0,0 +1,163 @@
<html>
<head>
<title>Homework: intro to xv6</title>
</head>
<body>
<h1>Homework: intro to xv6</h1>
<p>This lecture is the introduction to xv6, our re-implementation of
Unix v6. Read the source code in the assigned files. You won't have
to understand the details yet; we will focus on how the first
user-level process comes into existence after the computer is turned
on.
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<p><b>Assignment</b>:
<br>
Fetch and un-tar the xv6 source:
<pre>
sh-3.00$ wget http://pdos.csail.mit.edu/6.828/2007/src/xv6-rev1.tar.gz
sh-3.00$ tar xzvf xv6-rev1.tar.gz
xv6/
xv6/asm.h
xv6/bio.c
xv6/bootasm.S
xv6/bootmain.c
...
$
</pre>
Build xv6:
<pre>
$ cd xv6
$ make
gcc -O -nostdinc -I. -c bootmain.c
gcc -nostdinc -I. -c bootasm.S
ld -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
objdump -S bootblock.o > bootblock.asm
objcopy -S -O binary bootblock.o bootblock
...
$
</pre>
Find the address of the <code>main</code> function by
looking in <code>kernel.asm</code>:
<pre>
% grep main kernel.asm
...
00102454 &lt;mpmain&gt;:
mpmain(void)
001024d0 &lt;main&gt;:
10250d: 79 f1 jns 102500 &lt;main+0x30&gt;
1025f3: 76 6f jbe 102664 &lt;main+0x194&gt;
102611: 74 2f je 102642 &lt;main+0x172&gt;
</pre>
In this case, the address is <code>001024d0</code>.
<p>
Run the kernel inside Bochs and set a breakpoint
at the beginning of <code>main</code> (i.e., the address
you just found), for example with <code>vb 0x8:0x001024d0</code>,
then continue to it with <code>c</code>.
<pre>
$ make bochs
if [ ! -e .bochsrc ]; then ln -s dot-bochsrc .bochsrc; fi
bochs -q
========================================================================
Bochs x86 Emulator 2.2.6
(6.828 distribution release 1)
========================================================================
00000000000i[ ] reading configuration from .bochsrc
00000000000i[ ] installing x module as the Bochs GUI
00000000000i[ ] Warning: no rc file specified.
00000000000i[ ] using log file bochsout.txt
Next at t=0
(0) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
&lt;bochs&gt;
</pre>
Look at the registers and the stack contents:
<pre>
&lt;bochs&gt; info reg
...
&lt;bochs&gt; print-stack
...
&lt;bochs&gt;
</pre>
Which part of the stack printout is actually the stack?
(Hint: not all of it.) Identify all the non-zero values
on the stack.<p>
<b>Turn in:</b> the output of print-stack with
the valid part of the stack marked. Write a short (3-5 word)
comment next to each non-zero value explaining what it is.
<p>
Now look at kernel.asm for the instructions in main that read:
<pre>
10251e: 8b 15 00 78 10 00 mov 0x107800,%edx
102524: 8d 04 92 lea (%edx,%edx,4),%eax
102527: 8d 04 42 lea (%edx,%eax,2),%eax
10252a: c1 e0 04 shl $0x4,%eax
10252d: 01 d0 add %edx,%eax
10252f: 8d 04 85 1c ad 10 00 lea 0x10ad1c(,%eax,4),%eax
102536: 89 c4 mov %eax,%esp
</pre>
(The addresses and constants might be different on your system,
and the compiler might use <code>imul</code> instead of the <code>lea,lea,shl,add,lea</code> sequence.
Look for the move into <code>%esp</code>.)
<p>
Which lines in <code>main.c</code> do these instructions correspond to?
<p>
Set a breakpoint at the first of those instructions
and let the program run until the breakpoint:
<pre>
&lt;bochs&gt; vb 0x8:0x10251e
&lt;bochs&gt; s
...
&lt;bochs&gt; c
(0) Breakpoint 2, 0x0010251e (0x0008:0x0010251e)
Next at t=1157430
(0) [0x0010251e] 0008:0x0010251e (unk. ctxt): mov edx, dword ptr ds:0x107800 ; 8b1500781000
(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
&lt;bochs&gt;
</pre>
(The first <code>s</code> command is necessary
to single-step past the breakpoint at main; otherwise <code>c</code>
will not make any progress.)
<p>
Inspect the registers and stack again
(<code>info reg</code> and <code>print-stack</code>).
Then step past those seven instructions
(<code>s 7</code>)
and inspect them again.
Convince yourself that the stack has changed correctly.
<p>
<b>Turn in:</b> answers to the following questions.
Look at the assembly for the call to
<code>lapic_init</code> that occurs after the
stack switch. Where does the
<code>bcpu</code> argument come from?
What would have happened if <code>main</code>
stored <code>bcpu</code>
on the stack before those instructions?
Would the code still work? Why or why not?
<p>
</body>
</html>

100
web/xv6-lock.html Normal file

@@ -0,0 +1,100 @@
<html>
<head>
<title>Homework: Locking</title>
</head>
<body>
<h1>Homework: Locking</h1>
<p>
<b>Read</b>: spinlock.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework at the beginning of lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Assignment</b>:
In this assignment we will explore some of the interaction
between interrupts and locking.
<p>
Make sure you understand what would happen if the kernel executed
the following code snippet:
<pre>
struct spinlock lk;
initlock(&amp;lk, "test lock");
acquire(&amp;lk);
acquire(&amp;lk);
</pre>
(Feel free to use Bochs to find out. <code>acquire</code> is in <code>spinlock.c</code>.)
<p>
An <code>acquire</code> ensures interrupts are off
on the local processor using <code>cli</code>,
and interrupts remain off until the <code>release</code>
of the last lock held by that processor
(at which point they are enabled using <code>sti</code>).
<p>
Let's see what happens if we turn on interrupts while
holding the <code>ide</code> lock.
In <code>ide_rw</code> in <code>ide.c</code>, add a call
to <code>sti()</code> after the <code>acquire()</code>.
Rebuild the kernel and boot it in Bochs.
Chances are the kernel will panic soon after boot; try booting Bochs a few times
if it doesn't.
<p>
<b>Turn in</b>: explain in a few sentences why the kernel panicked.
You may find it useful to look up the stack trace
(the sequence of <code>%eip</code> values printed by <code>panic</code>)
in the <code>kernel.asm</code> listing.
<p>
Remove the <code>sti()</code> you added,
rebuild the kernel, and make sure it works again.
<p>
Now let's see what happens if we turn on interrupts
while holding the <code>kalloc_lock</code>.
In <code>kalloc()</code> in <code>kalloc.c</code>, add
a call to <code>sti()</code> after the call to <code>acquire()</code>.
You will also need to add
<code>#include "x86.h"</code> at the top of the file after
the other <code>#include</code> lines.
Rebuild the kernel and boot it in Bochs.
It will not panic.
<p>
<b>Turn in</b>: explain in a few sentences why the kernel didn't panic.
What is different about <code>kalloc_lock</code>
as compared to <code>ide_lock</code>?
<p>
You do not need to understand anything about the details of the IDE hardware
to answer this question, but you may find it helpful to look
at which functions acquire each lock, and then at when those
functions get called.
<p>
(There is a very small but non-zero chance that the kernel will panic
with the extra <code>sti()</code> in <code>kalloc</code>.
If the kernel <i>does</i> panic, make doubly sure that
you removed the <code>sti()</code> call from
<code>ide_rw</code>. If it continues to panic and the
only extra <code>sti()</code> is in <code>kalloc.c</code>,
then mail <i>6.828-staff&#64;pdos.csail.mit.edu</i>
and think about buying a lottery ticket.)
<p>
<b>Turn in</b>: Why does <code>release()</code> clear
<code>lock-&gt;pcs[0]</code> and <code>lock-&gt;cpu</code>
<i>before</i> clearing <code>lock-&gt;locked</code>?
Why not wait until after?
</body>
</html>

78
web/xv6-names.html Normal file

@@ -0,0 +1,78 @@
<html>
<head>
<title>Homework: Naming</title>
</head>
<body>
<h1>Homework: Naming</h1>
<p>
<b>Read</b>: namei in fs.c, fd.c, sysfile.c
<p>
This homework should be turned in at the beginning of lecture.
<p>
<b>Symbolic Links</b>
<p>
As you read namei and explore its varied uses throughout xv6,
think about what steps would be required to add symbolic links
to xv6.
A symbolic link is simply a file with a special type (e.g., T_SYMLINK
instead of T_FILE or T_DIR) whose contents contain the path being
linked to.
<p>
Turn in a short writeup of how you would change xv6 to support
symlinks. List the functions that would have to be added or changed,
with short descriptions of the new functionality or changes.
<p>
<b>This completes the homework.</b>
<p>
The following is <i>not required</i>. If you want to try implementing
symbolic links in xv6, here are the files that the course staff
had to change to implement them:
<pre>
fs.c: 20 lines added, 4 modified
syscall.c: 2 lines added
syscall.h: 1 line added
sysfile.c: 15 lines added
user.h: 1 line added
usys.S: 1 line added
</pre>
Also, here is an <i>ln</i> program:
<pre>
#include "types.h"
#include "user.h"

int
main(int argc, char *argv[])
{
  int (*ln)(char*, char*);

  ln = link;                      // default: create a hard link
  if(argc &gt; 1 &amp;&amp; strcmp(argv[1], "-s") == 0){
    ln = symlink;                 // -s: create a symbolic link instead
    argc--;                       // drop "-s" so argv[1], argv[2] are old, new
    argv++;
  }
  if(argc != 3){
    printf(2, "usage: ln [-s] old new (%d)\n", argc);
    exit();
  }
  if(ln(argv[1], argv[2]) &lt; 0){
    printf(2, "%s failed\n", ln == symlink ? "symlink" : "link");
    exit();
  }
  exit();
}
</pre>
</body>
</html>

96
web/xv6-sched.html Normal file

@@ -0,0 +1,96 @@
<html>
<head>
<title>Homework: Threads and Context Switching</title>
</head>
<body>
<h1>Homework: Threads and Context Switching</h1>
<p>
<b>Read</b>: swtch.S and proc.c (focus on the code that switches
between processes, specifically <code>scheduler</code> and <code>sched</code>).
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Introduction</b>
<p>
In this homework you will investigate how the kernel switches between
two processes.
<p>
<b>Assignment</b>:
<p>
Suppose a process that is running in the kernel
calls <code>sched()</code>, which ends up jumping
into <code>scheduler()</code>.
<p>
<b>Turn in</b>:
Where is the stack that <code>sched()</code> executes on?
<p>
<b>Turn in</b>:
Where is the stack that <code>scheduler()</code> executes on?
<p>
<b>Turn in:</b>
When <code>sched()</code> calls <code>swtch()</code>,
does that call to <code>swtch()</code> ever return? If so, when?
<p>
<b>Turn in:</b>
Why does <code>swtch()</code> copy %eip from the stack into the
context structure, only to copy it from the context
structure to the same place on the stack
when the process is re-activated?
What would go wrong if <code>swtch()</code> just left the
%eip on the stack and didn't store it in the context structure?
<p>
Surround the call to <code>swtch()</code> in <code>scheduler()</code> with calls
to <code>cons_putc()</code> like this:
<pre>
cons_putc('a');
swtch(&amp;cpus[cpu()].context, &amp;p-&gt;context);
cons_putc('b');
</pre>
<p>
Similarly,
surround the call to <code>swtch()</code> in <code>sched()</code> with calls
to <code>cons_putc()</code> like this:
<pre>
cons_putc('c');
swtch(&amp;cp-&gt;context, &amp;cpus[cpu()].context);
cons_putc('d');
</pre>
<p>
Rebuild your kernel and boot it in Bochs.
With a few exceptions
you should see a regular four-character pattern repeated over and over.
<p>
<b>Turn in</b>: What is the four-character pattern?
<p>
<b>Turn in</b>: The very first characters are <code>ac</code>. Why does
this happen?
<p>
<b>Turn in</b>: Near the start of the last line you should see
<code>bc</code>. How could this happen?
<p>
<b>This completes the homework.</b>
</body>
</html>

100
web/xv6-sleep.html Normal file

@@ -0,0 +1,100 @@
<html>
<head>
<title>Homework: sleep and wakeup</title>
</head>
<body>
<h1>Homework: sleep and wakeup</h1>
<p>
<b>Read</b>: pipe.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework at the beginning of lecture. Please
write up your answers to the questions below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Introduction</b>
<p>
Recall that in lecture 7 we discussed locking in a linked list implementation.
The insert code was:
<pre>
struct list *l;
l = list_alloc();
l->next = list_head;
list_head = l;
</pre>
and if we run the insert on multiple processors simultaneously with no locking,
this ordering of instructions can cause one of the inserts to be lost:
<pre>
CPU1                          CPU2
struct list *l;
l = list_alloc();
l->next = list_head;
                              struct list *l;
                              l = list_alloc();
                              l->next = list_head;
                              list_head = l;
list_head = l;
</pre>
(Even though the instructions can happen simultaneously, we
write out orderings where only one CPU is "executing" at a time,
to avoid complicating things more than necessary.)
<p>
In this case, the list element allocated by CPU2 is lost from
the list by CPU1's update of list_head.
Adding a lock that protects the final two instructions makes
the read and write of list_head atomic, so that this
ordering is impossible.
<p>
The reading for this lecture is the implementation of sleep and wakeup,
which are used for coordination between different processes executing
in the kernel, perhaps simultaneously.
<p>
If there were no locking at all in sleep and wakeup, a sleep and its
corresponding wakeup executing simultaneously on different processors
could miss each other: the wakeup would find no process to wake up,
and yet the process calling sleep would go to sleep, never to awaken.
Obviously this is something we'd like to avoid.
<p>
Read the code with this in mind.
<p>
<br><br>
<b>Questions</b>
<p>
(Answer and hand in.)
<p>
1. How does the proc_table_lock help avoid this problem? Give an
ordering of instructions (like the above example for linked list
insertion)
that could result in a wakeup being missed if the proc_table_lock were not used.
You need only include the relevant lines of code.
<p>
2. sleep is also protected by a second lock, its second argument,
which need not be the proc_table_lock. Look at the example in ide.c,
which uses the ide_lock. Give an ordering of instructions that could
result in a wakeup being missed if the ide_lock were not being used.
(Hint: this should not be the same as your answer to question 1. The
two locks serve different purposes.)<p>
<br><br>
<b>This completes the homework.</b>
</body>