.tr _\(ul
.pl 11i
.ll 6.5i
.ps 10
.vs 11p
.br
.tl '-'''
.de pg
.sp .5
..
.de it
.ft I
\\$1
.ft R
..
.de bd
.ft B
\\$1
.ft R
..
.de he
.tl '-'''
'sp .5i
.if e  'tl '% - Unix I/O System'''
.if o 'tl '''Unix I/O System - %'
'sp .5i
'ns
..
.de fo
'bp
..
.de MS
.ne 4
.sp
.ft B
\\$1
.pg
.ft R
..
.de ms
.ne 4
.sp
.ft I
\\$1
.br
.ft R
..
.sp 1.5i
.wh 0 he
.wh -1i fo
.ps 18
.ce
The Unix I/O System
.ps 10
.sp .7i
This paper gives an overview of the workings of the Unix
I/O system.
It was written with an eye toward providing
guidance to writers of device driver routines,
and is oriented more toward describing the environment
and nature of device drivers than the implementation
of that part of the file system which deals with
ordinary files.
.pg
It is assumed that the reader has a good knowledge
of the overall structure of the file system as discussed
in the paper ``The Unix Time-sharing System.''
Moreover the present document is intended to be used in
conjunction with a copy of the system code,
since it is basically an exegesis on that code.
.MS "Device Classes"
There are two classes of device:
.it block
and
.it character.
The block interface is suitable for devices
like disks, tapes, and DECtape
which do, or can, work in 512-byte blocks
and can be used in direct-access fashion.
Ordinary magtape just barely fits in this category.
Block devices can at least potentially contain a mounted
file system.
The interface to block devices is very highly structured;
the drivers for these devices share a great many routines
as well as a pool of buffers.
.pg
Character-type devices have a much
more straightforward interface, although
more work must be done by the driver itself.
.pg
Devices of both types are named by a
.it major
and a
.it minor
device number.
Typically these numbers are stored as a word
with the minor device number
as the low byte and the major device number
as the high byte.
The major device number selects which driver will deal with
the device; the minor device number is not used
by the rest of the system but is passed to the
driver at appropriate times.
Typically the minor number
selects a subdevice attached to
a given controller, or one of
several similar hardware interfaces.
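.pg
As a minimal sketch of this packing, a device word
can be built and taken apart as shown here.
The names
.it dev_pack,
.it dev_major
and
.it dev_minor
are hypothetical and are not taken from the system code.
.pg
.in 5
.nf
```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Build a device word: major number in the high byte, minor in the low. */
static uint16_t dev_pack(unsigned major_no, unsigned minor_no)
{
	return (uint16_t)(((major_no & 0377u) << 8) | (minor_no & 0377u));
}

/* The high byte selects the driver. */
static unsigned dev_major(uint16_t dev)
{
	return (dev >> 8) & 0377u;
}

/* The low byte is handed to the driver uninterpreted. */
static unsigned dev_minor(uint16_t dev)
{
	return dev & 0377u;
}
```
.in 0
.fi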
.MS "Overview of I/O"
[To be supplied.]
.MS "Character device drivers"
The
.it cdevsw
table specifies the interface routines present for
character devices.
Each device provides five routines:
open, close, read, write, and special-function.
Any of these may be missing.
If a call on the routine
should be ignored,
(e.g.
.it open
on non-exclusive devices which require no setup)
the
.it cdevsw
entry can be given as
.it nulldev;
if it should be considered an error,
(e.g.
.it write
on read-only devices)
.it nodev
is used.
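.pg
The idea of such a switch table can be sketched roughly as follows.
This is an illustration under assumptions, not the system's declaration:
the entry layout, the single shared routine type, the error code and
every name ending in
.it _sketch
are hypothetical.
.pg
.in 5
.nf
```c
/* Simplification: every routine takes (dev, arg) so one type fits all. */
typedef int (*devfn)(int dev, int arg);

struct cdev_entry {
	devfn d_open, d_close, d_read, d_write, d_sgtty;
};

static int u_error;		/* stand-in for the per-user error cell */
enum { SKETCH_ENODEV = 19 };	/* hypothetical error code */

/* The call is to be ignored: do nothing and report success. */
static int nulldev_sketch(int dev, int arg)
{
	(void)dev; (void)arg;
	return 0;
}

/* The call is to be treated as an error. */
static int nodev_sketch(int dev, int arg)
{
	(void)dev; (void)arg;
	u_error = SKETCH_ENODEV;
	return -1;
}

/* Hypothetical read routine of a read-only device. */
static int ro_read(int dev, int arg)
{
	(void)dev; (void)arg;
	return 0;
}

static struct cdev_entry cdev_table[] = {
	/* major 0: open and close ignored, write and special-function refused */
	{ nulldev_sketch, nulldev_sketch, ro_read, nodev_sketch, nodev_sketch },
};

/* Dispatch a write through the table, indexed by major device number. */
static int cdev_write(int dev)
{
	unsigned maj = ((unsigned)dev >> 8) & 0377u;

	if (maj >= sizeof cdev_table / sizeof cdev_table[0])
		return nodev_sketch(dev, 0);
	return cdev_table[maj].d_write(dev, 0);
}
```
.in 0
.fi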
.pg
The
.it open
routine is called each time the file
is opened with the full device number as argument.
The second argument is a flag which is
non-zero only if the device is to be written upon.
.pg
The
.it close
routine is called only when the file
is closed for the last time,
that is, when the very last process in
which the file is open closes it.
This means it is not possible for the driver to
maintain its own count of its users.
.pg
When
.it write
is called, it is supplied the device
as argument.
The per-user variable
.it u.u_count
has been set to
the number of characters specified by the user;
for character devices, this number may be 0
initially.
.it u.u_base
is the address supplied by the user from which to start
taking characters.
The system may call the
.it write
routine internally, so the
flag
.it u.u_segflg
is supplied which indicates,
if
.it on,
that
.it u.u_base
refers to the system address space instead of
the user's.
.pg
The
.it write
routine
should copy up to
.it u.u_count
characters from the user's buffer to the device,
decrementing
.it u.u_count
for each character passed.
For most drivers, which work one character at a time,
the routine
.pg
.ti 5
.it "cpass( )"
.pg
is used to pick up characters.
Successive calls on it return
the characters to be written until
.it u.u_count
goes to 0 or an error occurs,
when it returns \(mi1.
.it Cpass
takes care of interrogating
.it u.u_segflg
and updating
.it u.u_count.
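.pg
The shape of such a write routine might look like the sketch below.
It is a user-level imitation, not the system routine:
.it cpass_sketch
ignores the segment flag and error handling, and the structure
.it u,
the output array and the name
.it xx_write
are hypothetical stand-ins.
.pg
.in 5
.nf
```c
#include <stddef.h>

/* Stand-in for the per-user area; only the two fields used here. */
static struct {
	const char *u_base;	/* next character to take */
	int u_count;		/* characters remaining */
} u;

/* Imitation of cpass: the next character, or -1 when none remain. */
static int cpass_sketch(void)
{
	if (u.u_count <= 0)
		return -1;
	u.u_count--;
	return (unsigned char)*u.u_base++;
}

static char dev_out[64];	/* hypothetical device: collects output */
static size_t dev_len;

/* A write routine in the one-character-at-a-time style. */
static void xx_write(int dev)
{
	int c;

	(void)dev;
	/* Check for room first so that no character is taken and lost. */
	while (dev_len < sizeof dev_out && (c = cpass_sketch()) >= 0)
		dev_out[dev_len++] = (char)c;
}
```
.in 0
.fi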
.pg
Write routines which want to transfer
a potentially large number of characters into an internal
buffer may also use the routine
.pg
.ti 5
.it "iomove(buffer, offset, count, flag)"
.pg
which is faster when many characters must be moved.
.it Iomove
transfers up to
.it count
characters into the
.it buffer
starting
.it offset
bytes from the start of the buffer;
.it flag
should be
.it B_WRITE
(which is 0) in the write case.
Caution:
the caller is responsible for making sure
the count is not too large and is non-zero.
As an efficiency note,
.it iomove
is much slower if any of
.it "buffer+offset, count"
or
.it u.u_base
is odd.
.pg
The device's
.it read
routine is called under conditions similar to
.it write,
except that
.it u.u_count
is guaranteed to be non-zero.
To return characters to the user, the routine
.pg
.ti 5
.it "passc(c)"
.pg
is available; it takes care of housekeeping
like
.it cpass
and returns \(mi1 as the last character
specified by
.it u.u_count
is returned to the user;
before that time, 0 is returned.
.it Iomove
is also usable as with
.it write;
the flag should be
.it B_READ
but the same cautions apply.
.pg
The ``special-function'' routine
is invoked by the
.it stty
and
.it gtty
system calls as follows:
.pg
.ti 5
.it "sgtty(dev, v)"
.pg
where
.it dev
is the device number
and
.it v
is a vector in the
.it gtty
case.
The device is supposed to place up to 3 words of status information
into the vector; this will be returned to the caller.
In the
.it stty
case,
.it v
is 0;
the device should take up to 3 words of
control information from
the array
.it "u.u_arg[0...2]."
.pg
Finally, each device should have appropriate interrupt-time
routines.
The interrupt-catching mechanism makes
the low-order four bits of the ``new PS'' word in the
trap vector for the interrupt available
to the interrupt handler.
This is conventionally used by drivers
which deal with multiple similar devices
to encode the minor device number.
.pg
A number of subroutines are available which are useful
to character device drivers.
Most of these handlers, for example, need a place
to buffer characters in the internal interface
between their ``top half'' (read/write)
and ``bottom half'' (interrupt) routines.
For relatively low data-rate devices, the best mechanism
is the character queue maintained by the
routines
.it getc
and
.it putc.
A queue header has the structure
.pg
.in 5
.nf
struct {
	int	c_cc;	/* character count */
	char	*c_cf;	/* first character */
	char	*c_cl;	/* last character */
}
.pg
.in 0
.fi
A character is placed on the end of a queue by
.pg
.ti 5
.it "putc(c, &queue)"
.pg
where
.it c
is the character and
.it queue
is the queue header.
The routine returns \(mi1 if there is no space
to put the character, 0 otherwise.
The first character on the queue may be retrieved
by
.pg
.ti 5
.it "getc(&queue)"
.pg
which returns either the (non-negative) character
or \(mi1 if the queue is empty.
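.pg
The return conventions just described can be imitated with a small
fixed-size ring, as in the sketch below.
This is only an assumption made for illustration:
it keeps a private array in each queue rather than drawing on
shared storage, and the names
.it q_putc,
.it q_getc
and
.it QSIZE
are hypothetical.
.pg
.in 5
.nf
```c
#define QSIZE 8		/* hypothetical capacity of one queue */

struct cqueue {
	int c_cc;		/* character count */
	int c_head;		/* index of first character */
	char c_buf[QSIZE];	/* private storage (an assumption) */
};

/* Place c on the end of the queue; -1 if there is no space, else 0. */
static int q_putc(int c, struct cqueue *q)
{
	if (q->c_cc >= QSIZE)
		return -1;
	q->c_buf[(q->c_head + q->c_cc) % QSIZE] = (char)c;
	q->c_cc++;
	return 0;
}

/* Remove the first character; -1 if the queue is empty. */
static int q_getc(struct cqueue *q)
{
	int c;

	if (q->c_cc == 0)
		return -1;
	c = (unsigned char)q->c_buf[q->c_head];
	q->c_head = (q->c_head + 1) % QSIZE;
	q->c_cc--;
	return c;
}
```
.in 0
.fi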
.pg
Notice that the space for characters in queues is
shared among all devices in the system
and in the standard system there are only some 600
character slots available.
Thus device handlers,
especially write routines, must take
care to avoid gobbling up excessive numbers of characters.
.pg
The other major help available
to device handlers is the sleep-wakeup mechanism.
The call
.pg
.ti 5
.it "sleep(event, priority)"
.pg
causes the process to wait (allowing other processes to run)
until the
.it event
occurs;
at that time, the process is marked ready-to-run
and the call will return when there is no
process with higher
.it priority.
.pg
The call
.pg
.ti 5
.it "wakeup(event)"
.pg
indicates that the
.it event
has happened, that is, causes processes sleeping
on the event to be awakened.
The
.it event
is an arbitrary quantity.
By convention, it is the address of some data area used
by the driver, which guarantees that events
are unique.
.pg
Processes sleeping on an event should not assume
that the event has really happened;
they should check that the conditions which
caused them to sleep no longer hold.
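.pg
In practice this means the call to
.it sleep
sits inside a loop that tests the condition again after every return.
The sketch below imitates the pattern at user level;
.it sleep_sketch
is a hypothetical stand-in which merely simulates one premature
awakening, and the flag
.it dev_busy
and the priority value are likewise assumptions.
.pg
.in 5
.nf
```c
/* Hypothetical driver state: nonzero while the device is busy. */
static int dev_busy = 1;
static int sleep_calls;

/*
 * Stand-in for sleep: each call simulates being awakened.
 * The first awakening is premature; the device becomes
 * free only by the time of the second.
 */
static void sleep_sketch(void *event, int priority)
{
	(void)event; (void)priority;
	if (++sleep_calls >= 2)
		dev_busy = 0;
}

/* Wait for the device, rechecking the condition after every return. */
static void wait_for_device(void)
{
	while (dev_busy)
		sleep_sketch(&dev_busy, 10);
}
```
.in 0
.fi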
.pg
Priorities can range from 127 to \(mi127;
a higher numerical value indicates a less-favored
scheduling situation.
A process sleeping at negative priority cannot
be terminated for any reason, although it
is conceivable that it may be swapped out.
Thus it is a bad idea to sleep with negative
priority on an event which might never occur.
On the other hand, calls to
.it sleep
with non-negative priority
may never return if the process is terminated by
some signal in the meantime.
Incidentally, it is a gross error to call
.it sleep
in a routine called at interrupt time, since the process
which is running is almost certainly not the
process which should go to sleep.
Also, if a device driver
wishes to wait for some event for which it is inconvenient
or impossible to supply a
.it wakeup,
(for example, a device going on-line, which does not
generally cause an interrupt),
the call
.pg
.ti 5
.it "sleep(&lbolt, priority)"
.pg
may be given.
.it Lbolt
is an external cell whose address is awakened once each second
by the clock interrupt routine.
.pg
The driver for the paper-tape
reader/punch is worth examining
as a fairly simple example of
many of the techniques used in writing
character device handlers.
