[sebhc] new postings to SEBHC archives

Patrick patrick at vintagecomputermarketplace.com
Wed Apr 21 14:26:23 CDT 2004


> -----Original Message-----
> From: sebhc at sebhc.org [mailto:sebhc at sebhc.org]On Behalf Of Lee Hart
> Sent: Wednesday, April 21, 2004 12:47 PM
> To: sebhc at sebhc.org
> Subject: Re: [sebhc] new postings to SEBHC archives
>
>
> Patrick wrote:
> > Lee, reading into your email, I am assuming that by "simplest"
> > format, you mean plain text?
>
> It depends on the original document. If it is just plain typewritten
> text, then a plain text format is sufficient to reproduce it online. For
> example, all the original Digital Research CP/M manuals are simple
> typewritten documents.

Lee, for schematics and other mostly- or purely-graphical documents of
limited size, I agree that GIF or JPEG is a better choice than PDF, and I
prefer to keep them this way myself.

But I disagree on the text documents.  I think few of us have the time to
lovingly OCR most of these documents, and even among the OCR'd documents
I've been given, experience has taught me to mistrust them.  I tried to OCR
a couple of North Star manuals (typewritten or daisy wheel, clearly), and
the results were poor, even though I have one of the best-regarded OCR
products on the market.  The greatest difficulty was not the gross errors,
but the subtle ones that are sprinkled throughout the document that result
from technical detail, jargon, and abbreviation.  For example, in the parts
list for the North Star floppy controller board, the OCR program dropped
many decimal points or misread digits, turning .1uf capacitors into 1uf
capacitors, 74LS257 into 74LS251, etc.  There's just a lot of manual labor
involved in proofing and correcting the results, so for my money, I like PDF
even when the scan is bad, because at least I can provide the neural network
(albeit limited :-)) it takes to interpret it correctly.

> If the original was prepared on a word processor with a proportional
> font and a few basic tables and charts, then HTML is a good choice. It
> reproduces well, and even lets the text be reformatted for easier
> on-screen reading (which is probably all most people will do).

You're asking a lot, I think.  It takes a lot of work to reproduce documents
in this form.  Have you tried?  You have all of the problems of scanning and
OCR'ing the text for starters, and then you have to produce the markup that
reproduces the layout, scanning images separately when necessary.  It's
labor-intensive.  What you end up with may be a good result, but it's also a
complex form in that it's composed of one or more HTML files and a number
of separate graphics.  You can ZIP it into a single file, of course, but
it's still a BOM.

Short of PDF, I think ZIP archives of page scans as GIF or TIFF are probably
most useful and compatible, but I personally prefer PDF because of the
relative ease of producing paper copies from it--it's a single document
form.  For example, I have a PDF scan of the MTR-90 listing.  Because it's
all in one document, I am able to easily print it in a two-up layout
double-sided (so four "virtual" pages per physical sheet), reducing it from
144 pages printed to just 36--easy to handle/flip through.  I don't know of
any printer driver or graphics application that would allow me to take 144
separate GIF files and just merge them into a single document stream for the
printer so it can do the two-up double-sided layout.
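
The page arithmetic above can be checked with a short sketch.  This is only
an illustration in Python; the function names (sheets_needed, sheet_layout)
are made up for this example and do not come from any printer driver.  It
assumes pages are laid out in plain reading order, two per side, both sides
of each sheet.

```python
import math

def sheets_needed(pages, pages_per_side=2, double_sided=True):
    """Physical sheets needed to print `pages` logical pages."""
    sides = 2 if double_sided else 1
    return math.ceil(pages / (pages_per_side * sides))

def sheet_layout(pages, pages_per_side=2, double_sided=True):
    """Group page numbers 1..pages as sheets -> sides -> pages (reading order)."""
    sides = 2 if double_sided else 1
    per_sheet = pages_per_side * sides
    layout = []
    for start in range(1, pages + 1, per_sheet):
        sheet = []
        for s in range(sides):
            first = start + s * pages_per_side
            # Drop page numbers past the end of the document (last sheet only).
            side = [p for p in range(first, first + pages_per_side) if p <= pages]
            sheet.append(side)
        layout.append(sheet)
    return layout

print(sheets_needed(144))       # 36
print(sheet_layout(144)[0])     # [[1, 2], [3, 4]]
```

With 144 pages at four virtual pages per sheet, the result is 36 sheets,
matching the figure for the MTR-90 listing.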

As for naming conventions, I favor the use of long filenames, simply because
they are more descriptive and it's easier to figure out if what you need is
there (without reading a separate index file that may not always be kept up
to date--remember, people are volunteering their time to make this stuff
available at all).  All of the *nix FTP clients that I've ever used, as well
as the command line DOS and Windows clients, allow you to rename a file
when you request it ("GET <remotefilename> [localfilename]"), or even just
type it to the screen ("GET <remotefilename> -").  That's just standard FTP
behavior.  The two GUI FTP tools I most frequently use also have this
facility.  Even browsing an FTP site in Internet Exploder, you can
right-click on a file to save it locally with a different name.  However, I
also agree that spaces and most other punctuation should be banished from
the names used in the archive, because they are just too troublesome on the
command line, and in some cases are even the cause of tragic accidents
(local file overwrites, etc.).
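
As one possible way to get long, descriptive names without spaces or
troublesome punctuation, here is a minimal Python sketch.  The function name
archive_name and the sample title are hypothetical, and the rule it applies
(letters, digits, and hyphens kept; everything else collapsed to a single
underscore; the text after the last dot treated as the extension) is an
assumption, not an agreed convention for the archive.

```python
import re

def archive_name(title):
    """Turn a descriptive title into a long filename that is safe on the command line."""
    stem, dot, ext = title.rpartition(".")
    if not dot:                      # no dot at all: the whole title is the stem
        stem, ext = title, ""
    # Collapse each run of characters other than letters, digits, '-' into one '_'.
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", stem).strip("_")
    ext = re.sub(r"[^A-Za-z0-9]+", "", ext)
    return f"{stem}.{ext}" if ext else stem

# Hypothetical title, used only to show the transformation.
print(archive_name("North Star Floppy Controller (parts list).pdf"))
# North_Star_Floppy_Controller_parts_list.pdf
```

The name stays readable in a directory listing, and nothing in it needs
quoting or escaping when typed after GET.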

Opinions aside, I would certainly be happy to help any member get an archive
document in a form they can use.  I think it's well within the scope of the
group for people to ask that someone help out by printing a document and
mailing it to them, and I for one would actively step up to facilitate this
process.  Also, I have a server that runs ghostscript for a fax application,
and I am reasonably certain that I can somehow coax at least a
one-TIFF-per-page form out of any PDF.  I will try it and report back.  If
it works, I'll be happy to do it on request (and Jack can archive each
result for future reuse).
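
For anyone who wants to try the PDF-to-TIFF conversion themselves, a rough
Python sketch of a Ghostscript invocation follows.  It has not been run
against the archive's PDFs.  The helper names, the input file name mtr90.pdf,
and the choice of the tiffg4 device at 300 dpi are assumptions; the "%03d" in
the output file name is what makes Ghostscript write one numbered file per
page.

```python
import subprocess

def pdf_to_tiff_command(pdf_path, out_pattern="page-%03d.tif",
                        device="tiffg4", dpi=300):
    """Build a Ghostscript command line that writes one TIFF per PDF page."""
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",   # run unattended, then exit
        f"-sDEVICE={device}",                # tiffg4 = bilevel, fax-style TIFF
        f"-r{dpi}",                          # output resolution in dpi
        f"-sOutputFile={out_pattern}",       # %03d expands to the page number
        pdf_path,
    ]

def pdf_to_tiffs(pdf_path, **options):
    """Run the conversion; requires the gs executable on the PATH."""
    subprocess.run(pdf_to_tiff_command(pdf_path, **options), check=True)

# Hypothetical input file; printing the command does not run Ghostscript.
print(" ".join(pdf_to_tiff_command("mtr90.pdf")))
```

A grayscale or color scan would need a different device (for example
tiffgray or tiff24nc) in place of tiffg4.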

But above all, the person who's putting the time and effort into this is
Jack, and while he's very generous with his time, he also has a day job that
puts food on his table, and I for one am not going to be too picky about the
form of what's available, because it might otherwise not be readily
available at all.  I've spent a lot of time and had a lot of both successes
and disappointments trying to get the documents I need in any form at all.
If we all work together to help each other, in keeping with the spirit of
the group as it was originally, I think everyone will get what they need,
one way or another.  And, THANK YOU, Jack.

Patrick

--
Delivered by the SEBHC Mailing List


