[sebhc] new postings to SEBHC archives
Dave Dunfield
dave04a at dunfield.com
Wed Apr 21 15:07:58 CDT 2004
>But I disagree on the text documents. I think few of us have the time to
>lovingly OCR most of these documents, and even among the OCR'd documents
>I've been given, experience has taught me to mistrust them. I tried to OCR
>a couple of North Star manuals (typewritten or daisy wheel, clearly), and
>the results were poor, even though I have one of the best-regarded OCR
>products on the market. The greatest difficulty was not the gross errors,
>but the subtle ones that are sprinkled throughout the document that result
>from technical detail, jargon, and abbreviation. For example, in the parts
>list for the North Star floppy controller board, the OCR program dropped
>many decimal points or misread digits, turning .1uf capacitors into 1uf
>capacitors, 74LS257 into 74LS251, etc. There's just a lot of manual labor
>involved in proofing and correcting the results, so for my money, I like PDF
>even when the scan is bad, because at least I can provide the neural network
>(albeit limited :-)) it takes to interpret it correctly.
I agree - All of my original documents which were text files I continue to
maintain as text files; however, I have given up on trying to OCR and convert
paper documents back to text. Instead I use old-format PDFs at "text quality"
resolution - the result is much bigger than text, but at least it's still
pretty much manageable. What really should be avoided is 10MB+ PDFs - these are
essentially unobtainable to someone stuck with dial-up (hey, but at least the
air's clean out here!)
>> If the original was prepared on a word processor with a proportional
>> font and a few basic tables and charts, then HTML is a good choice.
>What you end up with may be a good result, but it's also a
>complex form in that it's comprised of one or more HTML files and a number
>of separate graphics. You can ZIP it into a single file, of course, but
>it's still a BOM.
I disagree with HTML - I find that documents which have been converted to HTML
rarely print at one page per page --- I don't know how many sheets I've wasted
on pages with 1-2 lines. I also don't like the multiple files and directories
which you end up with in a complete HTML document - For most documents which
are either scanned or require pictures, I use PDF.
>As for naming conventions, I favor the use of long filenames, simply because
>they are more descriptive and it's easier to figure out if what you need is
>there (without reading a separate index file that may not always be kept up
>to date--remember, people are volunteering their time to make this stuff
>available at all). All of the *nix FTP clients that I've ever used, as well
>as the command line DOS and Windows clients, allow you to rename a file
>when you request it ("GET <remotefilename> [localfilename]"), or even just
>type it to the screen ("GET <remotefilename> -"). That's just standard FTP
>behavior. The two GUI FTP tools I most frequently use also have this
>facility. Even browsing an FTP site in Internet Exploder, you can
>right-click on a file to save it locally with a different name. However, I
>also agree that spaces and most other punctuation should be banished from
>the names used in the archive, because they are just too troublesome on the
>command line, and in some cases are even the cause of tragic accidents
>(local file overwrites, etc.).
I too agree that long filenames (while I don't like them - way too much typing
for a command-line guy like me) are here to stay, and in some cases necessary -
what I really dislike is spaces in the names. As you point out, it's not too
bad when you are using FTP, but long names with embedded spaces in ZIP and other
archive formats are a real pain in the a**. When I unzip a ZIP and get "unable
to create file 'read me now'", I see red - why not use "ReadMe"? Another problem
is files that are not unique within 8 characters - when you unzip an archive
containing:
MyOldComputerFiles1
MyOldComputerFiles2
- and -
MyOldComputerFiles3
DOS ZIP will happily name all three "MyOldCom" - at least it prompts and lets
you rename the files - something I wish it did for invalid names.
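
For anyone preparing an archive, the 8-character clash described above can be
checked before zipping. This is only a rough sketch: the function name
truncation_collisions and the sample list are made up for illustration, and it
assumes names are compared without regard to case, as DOS does.

```python
from collections import defaultdict

def truncation_collisions(names, limit=8):
    """Group names whose base part is identical once cut to `limit` characters."""
    groups = defaultdict(list)
    for name in names:
        # Ignore the extension; only the base name is cut to 8 characters.
        base = name.rsplit(".", 1)[0] if "." in name else name
        # Assumption: compare in upper case, since DOS ignores case.
        groups[base[:limit].upper()].append(name)
    # Keep only the truncated names claimed by more than one file.
    return {short: full for short, full in groups.items() if len(full) > 1}

names = [
    "MyOldComputerFiles1",
    "MyOldComputerFiles2",
    "MyOldComputerFiles3",
    "ReadMe",
]
collisions = truncation_collisions(names)
for short, full in collisions.items():
    print(short, "<-", ", ".join(full))
```

Any name reported here would need a manual rename on a DOS system, so it is a
candidate for a shorter or more distinctive name in the archive.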
What I do now to access such files is unpack the archive on a Windows system,
then use the network client to connect from my DOS box - then Windows will
translate the names into valid DOS names so I can copy them off.
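
To give a feel for what that translation does, here is a simplified sketch of
an 8.3-style alias generator. It is NOT the exact scheme Windows uses (the real
one handles more invalid characters and switches strategy after several
clashes); the function name dos_alias and the "~N" numbering shown are an
approximation for illustration only.

```python
def dos_alias(long_name, taken):
    """Return a simplified 8.3-style alias that is not already in `taken`."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:                       # no extension at all
        base, ext = long_name, ""
    # Drop spaces and extra periods, and fold to upper case.
    clean = "".join(ch for ch in base if ch not in " .").upper()
    ext3 = ext.replace(" ", "").upper()[:3]
    suffix = "." + ext3 if ext3 else ""
    # A name that is already a valid 8.3 name is kept as it is.
    fits = (clean == base.upper() and len(clean) <= 8
            and len(ext) <= 3 and " " not in ext)
    if fits and clean + suffix not in taken:
        alias = clean + suffix
    else:
        # Otherwise shorten the base and add ~1, ~2, ... until the name is free.
        n = 1
        while True:
            tail = f"~{n}"
            alias = clean[:8 - len(tail)] + tail + suffix
            if alias not in taken:
                break
            n += 1
    taken.add(alias)
    return alias

taken = set()
aliases = [dos_alias(n, taken) for n in
           ["read me now", "MyOldComputerFiles1", "MyOldComputerFiles2", "ReadMe"]]
print(aliases)
```

The point of the sketch is only that every long name ends up with a distinct,
space-free short name, which is why copying across the network works where
unzipping directly under DOS does not.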
>Opinions aside, I would certainly be happy to help any member get an archive
>document in a form they can use. I think it's well within the scope of the
>group for people to ask that someone help out by printing a document and
>mailing it to them, and I for one would actively step up to facilitate this
>process. Also, I have a server that runs ghostscript for a fax application,
>and I am reasonably certain that I can somehow coax at least a
>one-TIFF-per-page form out of any PDF. I will try it and report back. If
>it works, I'll be happy to do it on request (and Jack can archive each
>result for future reuse).
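
One possible shape for the one-TIFF-per-page conversion mentioned above, as a
sketch only. It assumes Ghostscript is installed and reachable as "gs" (the
executable has a different name on Windows), and the output pattern, the 200
dpi default and the helper names are illustrative choices, not anything taken
from the poster's fax server.

```python
import subprocess

def pdf_to_tiff_command(pdf_path, out_pattern="page-%03d.tif", dpi=200):
    """Build a Ghostscript command line that writes one TIFF file per page."""
    return [
        "gs",                 # assumption: Ghostscript executable is named "gs"
        "-sDEVICE=tiffg4",    # black-and-white Group 4 (fax style) TIFF output
        f"-r{dpi}",           # rendering resolution in dots per inch
        "-o", out_pattern,    # %03d in the name gives a separate file per page
        pdf_path,
    ]

def pdf_to_tiffs(pdf_path, **options):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(pdf_to_tiff_command(pdf_path, **options), check=True)

command = pdf_to_tiff_command("manual.pdf")
print(" ".join(command))
```

Calling pdf_to_tiffs("manual.pdf") would then leave page-001.tif, page-002.tif
and so on in the current directory, provided Ghostscript can read the PDF.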
>
>But above all, the person who's putting the time and effort into this is
>Jack, and while he's very generous with his time, he also has a day job that
>puts food on his table, and I for one am not going to be too picky about the
>form of what's available, because it might otherwise not be readily
>available at all. I've spent a lot of time and had a lot of both successes
>and disappointments trying to get the documents I need in any form at all.
>If we all work together to help each other, in keeping with the spirit of
>the group as it was originally, I think everyone will get what they need,
>one way or another. And, THANK YOU, Jack.
Yes, thanks very much to Jack and everyone who is helping preserve this
material. As you say, having it available in any form is far better than
not having it at all!
Btw: I don't think this thread is in any way intended to slight anyone
for the formats they are using - just trying to point out that some choices
do make it difficult for some of us, and asking whether, where it does not
make a difference, a different choice might be selected in the future.
Regards,
--
dave04a (at)    Dave Dunfield
dunfield (dot)  Firmware development services & tools: www.dunfield.com
com             Vintage computing equipment collector.
                http://www.parse.com/~ddunfield/museum/index.html
--
Delivered by the SEBHC Mailing List