  Quake II Cinematic Specs
  Tim Ferguson.  timf@dgs.monash.edu.au
  v0.01, 4/2/98
  ____________________________________________________________

  Table of Contents


  1. Introduction

  2. The supplied `bin_nt/qdata.exe' by Id Software

  3. Video Coding

     3.1 Huffman Coding

  4. Audio Coding

  5. Coding Results

  6. Coded Cinematic Stream

  7. Versions



  ______________________________________________________________________

  1.  Introduction

  To improve the single player story side of Quake, Id Software has now
  included cut scene cinematics in Quake II.  Several people have since
  been interested in how to create their own cinematics and have
  discovered a program released by Id Software in their public source
  dump.

  This document attempts to describe the format of a Quake II cinematic
  sequence (a .cin file) and includes some source code for encoding
  (taken from Id Software's source) and decoding of the cinematic
  sequences.  I will try to keep it simple enough for non-technical
  people to follow.

  In essence, the Quake II cinematics are an interleaved audio and
  video sequence where the audio is stored in a raw PCM format, and the
  8-bit colour lookup table based video is coded using a two-pass
  lossless static Huffman coder.  I will go into more detail in the
  following sections.



     Legal note:
        Quake II is a trademark of Id Software, Inc.  All technical
        information is copyright (c) 1997 Id Software.  The document is
        not a publication of Id Software, and they probably won't answer
        questions related to it.

        The document is copyright (c) 1998 Tim Ferguson.

        Permission to use, copy and distribute unedited copies of this
        whole document is hereby granted.... etc...  If you have
        anything to add to this document, please contact me.


  2.  The supplied `bin_nt/qdata.exe' by Id Software

  The program supplied by Id Software in their public source code dump
  allows you to easily create .cin cinematic files.  Information on
  its use has been supplied by Jeff Garstecki (stecki@frag.com and
  http://www.frag.com/deconstruct), who also provided a user-made .cin
  sequence, and Paul Steed (psteed@idsoftware.com).  I will briefly
  recap their documentation and go into a little more detail.

  The cinematic sequences are stored in the `quake2/baseq2/video'
  directory where they can be played from the console using the map
  command (try typing `map end.cin' from the console).

  To create your own sequences, generate a series of individual frames
  of your animation sequence and save them as 8-bit colour PCX files.
  The file names should be numbered sequentially as [base name]000.pcx,
  or [base name]0000.pcx, (for example: hell000.pcx, hell001.pcx, ...
  hell120.pcx) although qdata can start at any frame.  These files need
  to be located in the `/bin_nt/video/[base name]' directory (in the
  example: /bin_nt/video/hell/).

  Although you can have different colour palettes for your sequences,
  there will be an improvement in video quality if the frames share a
  common colour palette, or if the colour palette is only changed during
  a black frame.  This is due to slow palette switching times.  A
  suggestion is to fade to black, switch palettes, and fade to the new
  palette.  This palette switching can be seen in the ntro.cin sequence
  where it is used several times.  When adding PCX images, qdata checks
  to see if the palette has changed and adds a change palette command to
  the sequence.

  Technically, the frames can be of any resolution; however, the
  standard resolution used is 320x240.  I have tried sequence
  resolutions of 336x240, 176x144 and 360x288 and found that frames
  which are larger or smaller than the screen are scaled to fill it.
  Animations are played at 14 frames per second, and Quake II will skip
  frames to maintain this playback rate on slow or heavily loaded
  machines (see the section on `Audio Coding' for the 14 fps
  derivation).  Frame skipping is used to prevent sound from becoming
  choppy.

  An optional sound file can be included in the animation sequence.  The
  source sound must be in .wav format, can be mono or stereo, can use
  any sample size that is a multiple of 8 bits (usually 8 or 16 bits)
  and can technically have an arbitrary sampling rate (typically 22050
  Hz or 11025 Hz).  The file must be placed in the same directory as the
  PCX files, and have the same [base name] as the PCX files (in our
  example: `/bin_nt/video/hell/hell.wav').

  Finally, a QDT script file (.qdt) needs to be created with the
  following information in it:


          $video [base name]  [no. of digits (3 or 4)] [start frame (optional)]




  In our example, we create the file hell.qdt with:


          $video hell 3




  The .qdt file is placed in the /bin_nt directory, and qdata is run
  using the .qdt file as its only argument.  (for example qdata
  hell.qdt).  After a few passes, a resulting .cin file will be created
  in the `/bin_nt/video/' directory which can be viewed using Quake II.
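
  As a small illustration of the naming scheme described above, the
  following Python sketch builds the $video line and the list of PCX
  file names that follow the numbering convention.  The function names
  qdt_line and frame_filenames are my own inventions for this example
  and are not part of qdata.

```python
def qdt_line(base_name, digits, start_frame=None):
    """Build the $video line for a .qdt script (sketch, not Id's code)."""
    if digits not in (3, 4):
        raise ValueError("the number of digits should be 3 or 4")
    line = f"$video {base_name} {digits}"
    if start_frame is not None:
        # The start frame is optional in the .qdt syntax.
        line += f" {start_frame}"
    return line


def frame_filenames(base_name, digits, count, start_frame=0):
    """List zero-padded PCX names such as hell000.pcx, hell001.pcx, ..."""
    return [
        f"{base_name}{number:0{digits}d}.pcx"
        for number in range(start_frame, start_frame + count)
    ]
```

  For the example in the text, qdt_line("hell", 3) gives the line that
  goes into hell.qdt, and frame_filenames("hell", 3, 121) lists the
  names hell000.pcx to hell120.pcx.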



  3.  Video Coding

  If you venture into the source code of `qdata.exe' distributed in Id
  Software's public source dump, you will find the file
  `utils3/qdata/video.c'.  In this file, it can be seen that Id tried
  several techniques to code their video (including a few Huffman
  techniques and an LZ technique) before settling on a two-pass static
  loss-less Huffman coder.

  In the area of image, video and audio storage, there are three
  techniques to reduce file sizes: lossless coding, lossy coding and
  sub-sampling.  Lossless techniques compress data without any loss of
  audio or visual quality, but obtain relatively low compression ratios,
  resulting in large files.  Lossy techniques sacrifice some audio or
  video quality, ideally detail not perceivable by humans, in return
  for significantly higher compression ratios.  The third way of
  reducing storage requirements is in the same vein as lossy compression
  and is done through sub-sampling.  For video, this includes pixel,
  spatial and temporal sub-sampling, in the form of quantising the
  pixel colours to produce a smaller colour palette (eg: 256 colours
  rather than 16.7 million colours), lower screen resolutions, and
  lower video frame rates (15 frames per second (fps) rather than 25 or
  30 fps) respectively.

  Id Software's cinematic video sequences use two of the three forms of
  compression: sub-sampling and lossless coding.  Video sequences are
  first sub-sampled to 8 bits per pixel (256 colours), 320x240 pixel
  frames at 14 frames per second.  The resulting sequence is then
  losslessly coded using the Huffman algorithm to achieve an
  approximately 3:1 reduction from the sub-sampled sequence.  This
  format was probably chosen because of the minimum platform
  specification on which the video is played: a PC with a 256 colour
  display, a relatively slow (P90) processor, and a cheap mass storage
  device (CD-ROM).

  If, and most probably when (point release maybe??), Id increase their
  minimum platform to 24-bit colour and a slightly faster processor,
  they could use a lossy technique with 16.7 million colours, over
  twice the frame rate and a significant improvement in compression.
  The improvement in colour, spatial and temporal resolution would
  greatly outweigh the loss through coding.  An example of this is a
  sequence converted from .cin format to MPEG.  The file idlog.cin plays
  with 8-bit colour at 14 fps and is compressed at 2.3:1.  The same
  sequence encoded using MPEG (
  ftp://ftp.cdrom.com:/pub/idgames2/quake2/graphics/movies/idlog_avi.zip)
  plays with 24-bit colour at 25 fps, and is compressed to
  approximately 13:1.  The MPEG, as expected, takes significantly more
  processing power to play back in real time than the .cin format.
  Other, less processor demanding forms of lossy compression that I
  have not experimented with include Quicktime and AVI incorporating
  codecs such as CinePak and Indeo Video.  See the results section for
  more .cin sequence compression results.



  3.1.  Huffman Coding

  As stated by Peter Gutmann in the comp.compression FAQ:

       `Huffman compression is a statistical data compression technique
       which gives a reduction in the average code length used to
       represent the symbols of an alphabet.'


  In Huffman's coding technique, stored pixel data is assigned variable
  length codes (VLC) based on the pixel's probability of occurrence.
  Input pixels that occur more often are assigned shorter codes (fewer
  bits), while infrequent input pixels are assigned longer codes (more
  bits).  A static Huffman coder achieves this by performing two passes
  over the video sequence.  The first pass creates a frequency histogram
  of the pixels in the video sequence, and uses it to generate the
  dictionary of VLCs.  The histogram is stored so that the decoder can
  reconstruct the VLC dictionary.  The second pass over the sequence
  pixels stores the VLC that corresponds to each input pixel.  You may
  need to look elsewhere for a more in-depth discussion of Huffman
  coding.
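
  To make the idea of variable length codes more concrete, here is a
  minimal Python sketch that derives Huffman code lengths from a single
  frequency histogram.  It is the usual textbook construction (merge the
  two least frequent entries until one remains) and is not taken from
  Id's video.c; the function name huffman_code_lengths is hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(histogram):
    """Return {symbol: code length in bits} for symbols that occur.

    Textbook Huffman construction, shown only to illustrate that
    frequent symbols end up with shorter codes.
    """
    tie = count()  # unique tie-breaker so the heap never compares lists
    heap = [(freq, next(tie), [sym])
            for sym, freq in enumerate(histogram) if freq > 0]
    lengths = {entry[2][0]: 0 for entry in heap}
    if len(heap) == 1:
        # A lone symbol still needs one bit in this simple scheme.
        lengths[heap[0][2][0]] = 1
        return lengths
    heapq.heapify(heap)
    while len(heap) > 1:
        freq_a, _, syms_a = heapq.heappop(heap)
        freq_b, _, syms_b = heapq.heappop(heap)
        # Every symbol under the merged node gains one bit of code length.
        for sym in syms_a + syms_b:
            lengths[sym] += 1
        heapq.heappush(heap, (freq_a + freq_b, next(tie), syms_a + syms_b))
    return lengths
```

  With a histogram in which pixel value 0 occurs 50 times, 1 occurs 25
  times, 2 occurs 15 times and 3 occurs 10 times, the most common value
  receives a 1-bit code while the two rarest receive 3-bit codes.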

  Typically, a histogram of 256 elements is used when constructing the
  VLC dictionary, one histogram entry per pixel value.  However, video
  sequence images contain a high inter-pixel correlation in the spatial
  domain (pixels next to one another are very similar or the same in
  colour), and a significant improvement in compression performance can
  be achieved if both the previous pixel and the current pixel are used
  when generating the frequency histogram.  This is the case with Id
  Software's .cin video format.  The result is 256 histograms of 256
  elements producing a 256 * 256 table.  The rows of the histogram are
  referenced by the previous pixel, and the columns of the histogram are
  referenced by the current pixel.  Since there is a high probability of
  the previous pixel being the same as, or very similar to the current
  pixel, a diagonal line from the top-left corner to the bottom-right
  corner is formed in the histogram indicating areas of high
  probability.  See the included image of the two dimensional histogram.
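
  The first pass described above can be sketched in Python as follows.
  This is only an illustration under my own assumptions: each frame is
  given as a bytes object of 8-bit pixel values, and the `previous
  pixel' is reset to zero at the start of every frame when counting,
  mirroring what the decoder does.  The name build_pair_histogram is
  hypothetical, and any scaling needed to fit the counts into the byte
  sized table stored in the file is not shown.

```python
def build_pair_histogram(frames):
    """Count (previous pixel, current pixel) pairs over all frames.

    Returns a 256 x 256 list of lists where hist[prev][cur] is the
    number of times `cur` followed `prev`.
    """
    hist = [[0] * 256 for _ in range(256)]
    for frame in frames:
        prev = 0  # assumed reset at the start of each frame
        for pixel in frame:
            hist[prev][pixel] += 1
            prev = pixel
    return hist
```

  Runs of identical pixels increment entries such as hist[5][5], which
  is what produces the bright diagonal in the two dimensional
  histogram.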

  When decoding a sequence, the previously decoded pixel is used to
  reference a row of the VLC dictionary, while the stored variable
  length code is used to find the pixel value.  This new pixel value
  then becomes the previous pixel, and the process is repeated.  The
  initial `previous pixel' value is set to zero for the start of each
  frame.
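
  The decoding loop just described can be sketched in Python as shown
  below.  Please treat it as an illustration only: the tree
  construction, the bit ordering and all function names are my own
  assumptions, the bits are kept as a plain list of 0s and 1s rather
  than packed into bytes, and trees built this way are not guaranteed
  to match the ones Quake II builds.  An encoder is included so the
  example can be checked as a round trip.

```python
import heapq
from itertools import count


def build_tree(counts):
    """Build a Huffman tree for one row of the histogram.

    Leaves are pixel values (ints); internal nodes are (zero, one) tuples.
    """
    tie = count()  # unique tie-breaker so nodes are never compared
    heap = [(c, next(tie), sym) for sym, c in enumerate(counts) if c > 0]
    if not heap:
        return None  # this previous-pixel value never occurred
    heapq.heapify(heap)
    while len(heap) > 1:
        c0, _, n0 = heapq.heappop(heap)
        c1, _, n1 = heapq.heappop(heap)
        heapq.heappush(heap, (c0 + c1, next(tie), (n0, n1)))
    return heap[0][2]


def codes_from_tree(node, prefix=()):
    """Map each leaf pixel value to its tuple of bits."""
    if not isinstance(node, tuple):
        return {node: prefix}
    table = codes_from_tree(node[0], prefix + (0,))
    table.update(codes_from_tree(node[1], prefix + (1,)))
    return table


def encode_frame(trees, pixels):
    """Emit the code for each pixel using the row of the previous pixel."""
    codes = [codes_from_tree(tree) for tree in trees]
    bits, prev = [], 0
    for pixel in pixels:
        bits.extend(codes[prev][pixel])
        prev = pixel
    return bits


def decode_frame(trees, bits, pixel_count):
    """Walk the tree selected by the previous pixel until a leaf is hit."""
    bit_iter = iter(bits)
    pixels, prev = bytearray(), 0  # previous pixel starts at zero
    for _ in range(pixel_count):
        node = trees[prev]
        while isinstance(node, tuple):
            node = node[next(bit_iter)]
        pixels.append(node)
        prev = node  # the new pixel becomes the previous pixel
    return bytes(pixels)
```

  Note that the decoder needs the number of pixels to produce, since
  the bit stream itself carries no end marker in this sketch.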

  If you are interested in the video coding of .cin files, most of what
  has been said should be clearer if you look at the supplied source
  code.



  4.  Audio Coding

  Audio data in the .cin cinematic sequences is stored in a raw PCM
  format (uncompressed).  From the sequence header, it appears that any
  sampling rate, sample size and number of channels can be used;
  however, this would depend on which combinations of parameters the
  game can play back.  From the results section below, it can be seen
  that sequences have used sampling rates of 22050 and 11025 Hz, sample
  widths of 8 or 16 bits and either mono (1 channel) or stereo (2
  channels).  Acoustically demanding sequences (speech, sound effects
  and music) such as the intro and end sequence have used higher
  quality stereo audio, while the less demanding (just speech and simple
  sound effects) cut scenes have used lower quality mono audio.

  When audio is coded into the cinematic sequence, a one second clip of
  audio data (sample rate * sample width * sample channels) is divided
  into 14 chunks.  Each of these chunks is assigned to one frame of the
  Huffman coded video.  This audio segmentation is found in the source
  code supplied by Id Software, and will result in a 14 frames per
  second video play back rate to synchronise with the audio.  More
  information on the sequence format is found below.
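
  The chunk arithmetic can be sketched in Python as below.  At 22050 Hz
  the split is exact (22050 / 14 = 1575 samples per frame), but 11025 Hz
  does not divide evenly by 14.  How the remainder is handled is not
  covered in this document, so the running integer division used here
  is only an assumption on my part, and audio_chunk_sizes is a
  hypothetical name.

```python
FRAMES_PER_SECOND = 14


def audio_chunk_sizes(rate, width_bytes, channels, n_frames):
    """Return the audio byte count attached to each of n_frames frames.

    Uses running integer division so the chunks of any 14 consecutive
    frames add up to exactly one second of audio (an assumption).
    """
    bytes_per_sample = width_bytes * channels
    sizes = []
    for frame in range(n_frames):
        start = frame * rate // FRAMES_PER_SECOND
        end = (frame + 1) * rate // FRAMES_PER_SECOND
        sizes.append((end - start) * bytes_per_sample)
    return sizes
```

  For 22050 Hz, 16-bit stereo audio every frame carries 6300 bytes.  For
  11025 Hz, 8-bit mono audio the chunks alternate between 787 and 788
  bytes under this scheme.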



  5.  Coding Results

  Some results taken from both the included cinematic sequences, and a
  user made sequence (cave.cin by Jeff Garstecki) are as follows:

       +-----------+---------+-------+-----+------+--------+-----------+-------+
       | sequence  | vid res | rate  | wid | chan | frames | file size | compr |
       +-----------+---------+-------+-----+------+--------+-----------+-------+
       | ntro.cin  | 320x240 | 22050 | 16  |  2   | 2945   | 82836235  | 3.5:1 |
       | end.cin   | 320x240 | 22050 | 16  |  2   |  726   | 19311290  | 3.7:1 |
       | idlog.cin | 320x240 | 22050 | 16  |  2   |   81   |  3159828  | 2.3:1 |
       | eou#_.cin | 320x240 | 11025 | 8   |  1   |    -   |        -  |     - |
       | cave.cin  | 320x240 | 22050 | 16  |  2   |  200   |  5453415  | 3.7:1 |
       +-----------+---------+-------+-----+------+--------+-----------+-------+




  Where `rate', `wid' and `chan' are the audio sampling rate (in Hz),
  sample width (in bits) and number of channels respectively.  The
  `compr' column is the compression obtained in the video only.  From
  these results it can be seen that sequences with smooth coloured areas
  (ntro.cin, end.cin and cave.cin) result in compression ratios of
  around 3.6:1.  However, highly textured sequences such as idlog.cin
  (its background) result in lower compression.
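
  A rough way to estimate a video-only compression ratio from the
  figures in the table is sketched below in Python.  It assumes the
  65556 byte header and a fixed audio chunk per frame, and it ignores
  the per-frame command and count fields and any palettes, so it is
  only an approximation.  It is not necessarily how the `compr' column
  was calculated; the results come out close to, though not identical
  to, the table values.  The function name is hypothetical.

```python
HEADER_BYTES = 20 + 256 * 256  # five 32-bit fields plus the Huffman table


def estimate_video_ratio(frames, file_size, width=320, height=240,
                         rate=22050, width_bytes=2, channels=2):
    """Estimate raw video size divided by coded video size."""
    raw_video = frames * width * height  # one byte per pixel
    audio_per_frame = rate * width_bytes * channels // 14
    coded_video = file_size - HEADER_BYTES - frames * audio_per_frame
    return raw_video / coded_video
```

  For example, estimate_video_ratio(2945, 82836235) evaluates the
  ntro.cin row, and estimate_video_ratio(81, 3159828) the idlog.cin
  row.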



  6.  Coded Cinematic Stream

  This section describes the very simple and application specific .cin
  file structure.  The .cin file contains a header in little endian
  format as follows:


               31 ...... 2 1 0      Field Name                    Type
              +---------------+
           0  |               |     Video width                   Unsigned long
              +---------------+
           4  |               |     Video height                  Unsigned long
              +---------------+
           8  |               |     Audio sample rate             Unsigned long
              +---------------+
          12  |               |     Audio sample width (in bytes) Unsigned long
              +---------------+
          16  |               |     Audio channels (1 or 2)       Unsigned long
              +---------------+
          20  |               |
              +-             -+
          24  |               |
              +-   . . . .   -+
              |               |     Huffman table                 Unsigned Byte
              +-             -+
       65556  |               |
              +---------------+




  This header contains information on the video and audio resolution, as
  well as a Huffman table used to code the video data.  The Huffman
  table is a 256 * 256 table of byte values (65536 bytes total).
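
  Reading this header can be sketched in Python with the standard
  struct module, as below.  The function name parse_cin_header and the
  dictionary keys are my own, and splitting the table into 256 rows
  indexed by the previous pixel assumes it is stored one row after
  another in the file.

```python
import struct

HEADER_FORMAT = "<5I"            # five little-endian unsigned 32-bit values
HUFFMAN_TABLE_SIZE = 256 * 256   # one byte per (previous, current) pair


def parse_cin_header(data):
    """Parse the fixed-size .cin header from a bytes object (sketch)."""
    fields_size = struct.calcsize(HEADER_FORMAT)  # 20 bytes
    width, height, rate, sample_width, channels = struct.unpack_from(
        HEADER_FORMAT, data, 0)
    table = data[fields_size:fields_size + HUFFMAN_TABLE_SIZE]
    if len(table) != HUFFMAN_TABLE_SIZE:
        raise ValueError("file too short to hold the Huffman table")
    # Assumed layout: row `prev` holds the 256 counts for that context.
    rows = [table[i * 256:(i + 1) * 256] for i in range(256)]
    return {
        "width": width,
        "height": height,
        "audio_rate": rate,
        "audio_width_bytes": sample_width,
        "audio_channels": channels,
        "huffman_rows": rows,
        "header_size": fields_size + HUFFMAN_TABLE_SIZE,
    }
```

  The header_size value is the offset at which the first frame begins,
  namely 20 + 65536 = 65556 bytes.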

  Following the header, and for each frame of the video, the following
  is stored in the .cin sequence:






          31 ...... 2 1 0      Field Name                    Type
         +---------------+
      0  |               |     Sequence command              Unsigned long
         +---------------+
      4  |               |
         +-             -+
      8  |               |
         +-   . . . .   -+
         |               |     OPTIONAL colour palette       Unsigned Byte
         +-             -+
         |               |
         +---------------+
    772  |               |     Huffman count (H)             Unsigned long
         +---------------+
    776  |               |     Decode count                  Unsigned long
         +---------------+
    780  |               |
         +-             -+
    784  |               |
         +-   . . . .   -+     Encoded Huffman video data    Unsigned Byte
         |               |  (contains Huffman count - 4 bytes)
         +-             -+
  H+772  |               |
         +---------------+
  H+776  |               |
         +-             -+
  H+780  |               |     Raw audio data                Unsigned Byte
         +-   . . . .   -+  (contains
         |               |    audio width * audio channels * audio rate/14
         +-             -+      bytes)
         |               |
         +---------------+

  (The offsets shown are for a frame that includes the optional colour
  palette.)




  As can be seen, the sequence stores one frame of video and one chunk
  of audio (1/14 of a second) per frame.  The above sequence command
  takes on three possible values:

  o  0x0002 - Indicates end of file and no other data follows it.

  o  0x0001 - Indicates the optional colour palette is included,
     followed by the video and audio data.

  o  0x0000 - No colour palette is included, just the video and audio
     data.

  The Huffman count indicates the number of coded bytes to follow
  (including the 4 bytes of the decode count), and as expected, the
  decode count is video width * video height.
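
  Reading one frame according to the description above can be sketched
  in Python as follows.  The names read_u32 and read_frame are
  hypothetical, the palette is taken to be 768 bytes (as implied by the
  offsets in the diagram), and the caller is assumed to supply the
  number of audio bytes for the frame, since it depends on the header
  values.

```python
import io
import struct

PALETTE_BYTES = 768  # offsets 4 to 771 in the frame layout


def read_u32(stream):
    """Read one little-endian unsigned 32-bit value."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("unexpected end of .cin data")
    return struct.unpack("<I", raw)[0]


def read_frame(stream, audio_bytes):
    """Read one frame; return None when the end-of-file command is seen."""
    command = read_u32(stream)
    if command == 2:
        return None  # end of file, nothing follows
    palette = stream.read(PALETTE_BYTES) if command == 1 else None
    huffman_count = read_u32(stream)  # coded bytes, including decode count
    decode_count = read_u32(stream)   # expected to be width * height
    video = stream.read(huffman_count - 4)
    audio = stream.read(audio_bytes)
    return {
        "palette": palette,
        "decode_count": decode_count,
        "video": video,
        "audio": audio,
    }
```

  A reader would call read_frame repeatedly on a file opened in binary
  mode (or an io.BytesIO object) positioned just after the header,
  stopping when it returns None.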


  7.  Versions


     0.01: 4th Feb 1998

     o  Initial release of the specs.







