                                ==============
                                Web2Text v1.6
              Win95/98/NT Freeware HTML to ASCII text converter
          ==========================================================

Unlike all the other such programs to be found on the net (that I know of),
this one attempts to create a text file that retains some of the layout of
the web page being converted. Most other converters merely remove HTML
tags, which can leave you with a total mess, and lots more work to do.
Web2Text also keeps URLs intact, which only one other converter makes any
attempt at (and it does it very badly, so I'll name no names).

Installation:
=============

Run SETUP.EXE. You can choose which folder to install to, and whether
to create shortcuts to Web2Text in your Start Menu or on your
Desktop. There is also an option to install Web2Text as a shell
extension; if you choose to do so (this is the default), you will
be able to right-click on HTML pages in Explorer windows and click on
a new menu option titled 'Convert to text'. Two guesses what that
will do :)

The file size of SETUP.EXE, incidentally, should be precisely
192,512 bytes - if it isn't, it may be infected with a virus so go to
http://www.jetman.dircon.co.uk/software/index.html and grab a good copy.
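
The size check above can be scripted. This is only a sketch in Python and not
part of Web2Text; the function name and the default path are made up for
illustration. A matching size does not prove the file is clean, it only
catches the obvious case.

```python
import os

EXPECTED_SIZE = 192_512  # size of SETUP.EXE in bytes, as stated above


def setup_size_ok(path="SETUP.EXE"):
    """Return True if the file at `path` has the documented size."""
    return os.path.getsize(path) == EXPECTED_SIZE
```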

Web2Text has been tested on Win95 and (briefly) on Win98. Previous
versions have worked on WinNT4 so I see no reason why this one will
fall over - please let me know if there are any problems.

Uninstalling:
=============
If you chose to create Start Menu shortcuts when installing, you'll
have a shortcut called 'Uninstall Web2Text', so use that. Failing
that you can use Start|Settings|Control Panel|Add/Remove Programs;
pick 'Web2Text' from the list there and click the 'Add/Remove...'
button.

What's new in v1.6:
===================
* Keeping URLs in the converted text is now optional.
* Installation and uninstallation support.
* Optional shell extension.

Using Web2Text:
===============

The two largest boxes list the folders and the files in them, so you can
select which files to convert. Double click on an individual file to convert
it straight away, or single click files (holding down Control or Shift as
appropriate) to select more than one file, then click the Start button (no, not
the Win95 Start button!).

Output file type: type in the extension to give the output text files.

Output line length: type in how many characters you want in each line of the
                    output text file.

Italic character: type in the character you wish to use to denote italic text
                  in the output file (or remove the character here to have no
                  special treatment for italics).

Bold character: type in the character you wish to use to denote bold text in
                the output file (or remove the character here to have no
                special treatment for bold).

Header emphasis: tick this box to have headers and titles emphasised using 
                 dashes and equals signs -==like this==-; clear the box to
                 have titles and headers treated as normal text.

Keep URLs: tick this box to keep URLs visible in the output textfile. Clear
           it to have URLs removed entirely.
           
Start: starts conversion. Strange that.

Font: changes the font used within Web2Text. This has nothing to do with the
      font the output text file uses; that's down to whatever program you use
      to display text files! This option is primarily for users who need to
      view files that have foreign (e.g. Kanji) characters in the filenames.

About: shows copyright and version information.

Exit: no, it's no good, I just can't remember what this one does.

The progress bars at the bottom... well, I'll let you figure those out.

Web2Text handles the following:
===============================

<CENTER> - centers text. However, the ALIGN= property of various tags is *not*
supported. I feel <CENTER> should be used in addition to ALIGN=CENTER because
older browsers may support the former but not the latter. Modern browsers
support both.

<I>, <EM>, <DFN> and <CITE> - text often rendered as italics; surrounded by
                              asterisks *like this*. You can change the
                              asterisk to any character you want (or no
                              character at all) using the 'Italic character'
                              setting box.

<B> or <STRONG> - text often rendered as bold; surrounded by underlines
                  _like this_. You can change the underline to any character
                  you want (or no character at all) using the 'Bold character'
                  setting box.
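
The italic and bold handling described above can be pictured with a short
Python sketch. This is not Web2Text's own code; the function name and the
regular expression approach are assumptions made for illustration, and tags
carrying attributes are not handled.

```python
import re

ITALIC_TAGS = ("i", "em", "dfn", "cite")
BOLD_TAGS = ("b", "strong")


def mark_emphasis(html, italic_char="*", bold_char="_"):
    """Replace opening and closing emphasis tags with a marker character.

    Passing an empty string for a marker simply drops the tags, which
    mirrors clearing the 'Italic character' or 'Bold character' box.
    """
    for tags, char in ((ITALIC_TAGS, italic_char), (BOLD_TAGS, bold_char)):
        # Matches <I>, </I>, <em>, </EM> and so on, but not <IMG> or <BR>
        pattern = r"</?(?:%s)\s*>" % "|".join(tags)
        html = re.sub(pattern, char, html, flags=re.IGNORECASE)
    return html
```

For example, mark_emphasis("<I>like this</I>") gives "*like this*".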

<TITLE> and <H1> thru <H4> - text within these is displayed with equals/minus
signs around it appropriate to the importance of the text. E.G. <TITLE> text
is --===like this===--

Lists <UL>, <DIR> and <OL> are supported correctly though text longer than one
line will not indent correctly. <LI> (and now <DD>) produces a number for
ordered lists or a plus sign for unordered/directory lists. Make sure you use
<UL>, <DIR> or <OL> as appropriate, as use of <LI> without doing so can
cause problems if you are already in some other type of list. Netscape does
not handle this correctly, but IE does and will display the same results you
get from Web2Text.
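
As a rough picture of the list handling above, here is a Python sketch built
on the standard html.parser module. It is not how Web2Text itself is written.
The class and function names are hypothetical, and the exact punctuation
after each number ("1. ") and plus sign is a guess.

```python
from html.parser import HTMLParser


class ListSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        # One entry per open list: next item number, or None if unordered
        self.stack = []
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.stack.append(1)
        elif tag in ("ul", "dir"):
            self.stack.append(None)
        elif tag in ("li", "dd") and self.stack:
            if self.stack[-1] is None:
                self.out.append("\n+ ")
            else:
                self.out.append("\n%d. " % self.stack[-1])
                self.stack[-1] += 1

    def handle_endtag(self, tag):
        if tag in ("ol", "ul", "dir") and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Whitespace around each run of text is dropped in this sketch
        self.out.append(data.strip())


def render_lists(html):
    parser = ListSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

An <LI> found outside any list is left unmarked here, which is one simple way
of sidestepping the problem case mentioned above.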

Tags that cause a new line: <P> </P> <BR> <TR> </CENTER> </Hx> </TITLE>

All entities are supported, in numeric or mnemonic forms. The only incorrectly
supported entity is &nbsp; which should give a non-breaking space, but
Web2Text converts it to a normal space.
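
The same behaviour is easy to reproduce in Python with the standard
html.unescape function, shown here purely as an illustration. The function
name decode_entities is made up, and html.unescape knows the full HTML5
entity list, which may well be larger than the set Web2Text recognises.

```python
import html


def decode_entities(text):
    """Decode numeric and mnemonic entities.

    &nbsp; decodes to U+00A0 (non-breaking space), which is then replaced
    with an ordinary space, as Web2Text does.
    """
    return html.unescape(text).replace("\u00a0", " ")
```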

Images are ignored. URLs, providing they are absolute ones, are retained and
held in square brackets after the descriptive text assigned to them; e.g.
<A HREF="http://blah.com">this url</A> becomes 'this url [http://blah.com]'.
Relative URLs are ignored. URL types supported are: http, gopher, telnet,
news, ftp and mailto. Mailto URLs do not keep the URL type, i.e.
<A HREF="mailto:me"> becomes [me] instead of [mailto:me]. All other URL types
keep the URL type specifier.
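
The URL rules above can be sketched in Python as follows. This is an
illustration, not Web2Text's actual code: the names are hypothetical, the
regular expression only copes with simple <A HREF="..."> tags, and it assumes
that "ignored" means the link text is kept while the relative URL is dropped.

```python
import re

SCHEMES = ("http", "gopher", "telnet", "news", "ftp", "mailto")
ANCHOR = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def convert_links(html, keep_urls=True):
    """Turn <A HREF="url">text</A> into 'text [url]'."""
    def repl(match):
        url, text = match.group(1), match.group(2)
        scheme = url.split(":", 1)[0].lower() if ":" in url else ""
        if not keep_urls or scheme not in SCHEMES:
            # Relative or unsupported URL, or 'Keep URLs' cleared
            return text
        if scheme == "mailto":
            # Mailto links lose the URL type
            url = url.split(":", 1)[1]
        return "%s [%s]" % (text, url)
    return ANCHOR.sub(repl, html)
```

The keep_urls argument stands in for the 'Keep URLs' tick box.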

You can define parts of an HTML file for Web2Text to ignore completely, e.g. to
keep navigation bars out of text-only versions of a page. You do this by
including a comment like this:

...text you want to appear in any converted text file...
<!--web2text-ignore-from-here-->
...whatever part of the web page you don't want appearing in a text file...
<!--web2text-ignore-until-here-->
...text you want to appear in any converted text file...
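
If you want to check where your ignore markers fall before converting, the
effect can be imitated with a few lines of Python. This is a sketch only; the
function name is made up and Web2Text may treat whitespace around the markers
differently.

```python
import re

IGNORE = re.compile(
    r"<!--web2text-ignore-from-here-->.*?<!--web2text-ignore-until-here-->",
    re.DOTALL,
)


def strip_ignored(html):
    """Remove everything between each pair of ignore markers, inclusive."""
    return IGNORE.sub("", html)
```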

Similarly you can define text that you don't want to appear on the web page,
but you do want to appear in the text-only version. This uses another special
comment:

...text on the web page...
<!--web2text-textonly-Text you want to appear only in the text conversion-->
...text on the web page...
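
The text-only comment can be imitated in Python in much the same way. Again
this is a sketch with a made-up function name, not the program's own code.

```python
import re

TEXT_ONLY = re.compile(r"<!--web2text-textonly-(.*?)-->", re.DOTALL)


def reveal_text_only(html):
    """Replace each text-only comment with the text it carries."""
    return TEXT_ONLY.sub(lambda m: m.group(1), html)
```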

Advanced Usage:
===============

Most users can ignore this section. Chances are if you have any questions
at all about the stuff here, you don't need to know about it :)

Web2Text accepts several command line parameters:

filename      Convert this file. Uses current configuration. As used by
              the shell extension option.
/I            Re-install shell extension option.
/U            Uninstall.
/U /S         Uninstall silently (no confirmation or completion dialog).
-A            Only allow access to drive A:

The final option above is useful if you wish to deny users any access to
the hard drive of a machine, say in a public access situation. To change a
shortcut to Web2Text so that this option is enabled, right click on the
shortcut and click 'Properties'. Now click the 'Shortcut' tab at the top of
the dialog that has appeared and click on the 'Target' box. Go to the end of
the box (after the final quote if there is one there) and add a space,
followed by -A and then click the OK button.

Problems:
=========

+ Tables are currently poorly supported. <TD> and <TH> are treated as a tab,
  so columns will not line up correctly. Lines that wrap will not indent
  correctly. <TH> when first encountered on a line turns on centering of the
  output text, and this is turned off when <TR> is found. As I don't really
  need better support than that for my own purposes, I'm unlikely to change
  this in the near future.

+ Any tags between the end of a <A HREF="..."> and </A> tag are not processed
  and will be visible in the converted text file. I may fix this at some
  point, but it requires a slight reorganisation of my code so it is not a
  priority.

+ The line-length setting works only for lines without hard tabs in them. If
  your HTML file has hard tabs, they are incorrectly counted as being a single
  space. I couldn't be bothered putting in a tab setting, as you shouldn't be
  using tabs in HTML anyway. You may, however, have used them in <PRE> blocks,
  so watch out for that.
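
To see why hard tabs upset the line-length setting, compare the two widths in
this Python sketch. The function names are made up, and the tab stop of 8 is
an assumption: the real width depends on whatever program displays the text
file.

```python
def counted_width(line):
    """Width when every character, including a hard tab, counts as one."""
    return len(line)


def displayed_width(line, tabsize=8):
    """Width once tabs are expanded to the next tab stop."""
    return len(line.expandtabs(tabsize))
```

For the line "a<TAB>b", counted_width gives 3 while displayed_width gives 9,
so a line that seems to fit can end up well past the limit on screen.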

This program is FREEWARE. I accept no responsibility for any harm or loss
caused by the use of it. I.E. if you save over your company's end of year
report, don't come crying to me.

An older 16-bit version is available; see the URL below for details. The
16-bit version is no longer being upgraded or supported because of lack of
time and demand.

Distribution:
=============

Web2Text is freeware. You may not charge money for it, with one exception:
if you are putting out a CD full of shareware/freeware/etc. and want to
include Web2Text you may do so but on the condition that you send me a copy
of the CD (and magazine if bundled). Email me to get my postal address and
formal permission.

Acknowledgements:
=================

Web2Text's executable has been reduced in size by using Alexey
Solodovnikov's excellent ASPack compression utility, available from:
http://www.alenka.spb.ru/aspack

-- 
Damien Burke
software@jetman.dircon.co.uk
http://www.jetman.dircon.co.uk/software/index.html
