
Having gone through the gut-wrenching task of choosing which data books to throw out on more than one occasion, I fully understand the value and sentiment of this project.

As a young hobbyist and later as an engineer, I learned TONS from data books, application guides and equipment manuals. I'd spend hours paging through data books, learning about the various chips, going through the application notes, building circuits, testing them, and studying schematics back when equipment actually came with schematics.

Anyone who was "all in" in electronics did exactly the same.

To this day I've kept my National Linear Applications books and a few others. eBooks have yet to match the speed and convenience of holding a 500-page book in your hands and paging through it to explore. Worse yet, they can't replace having five or six such books spread across your workbench as you work on a design.

That said, being able to search a book or, better yet, your entire library is useful. I don't buy programming books in paper form anymore. And I still prefer PDF to any other eBook format; for me it tends to give a far better experience across platforms.

This thread has made me think about digitizing my physical books, something I find myself thinking about every few months. I have both engineering and business books that will never be available in electronic form, and I would definitely like to preserve them and make them searchable.

Is there a service or a device one could use for this purpose? The linear book scanner seems interesting, yet it is apparently known to damage books. A service could be interesting, but it would have to cost about as much as buying a book, meaning $20 per book or thereabouts, not $500 (or whatever). That means they'd need a slick, low-cost way to digitize books, or some way to monetize the process beyond charging for the digitizing itself.

Building a scanner could be interesting, of course. I'm thinking about bringing this up as a project for the FIRST FRC robotics team I mentor. You never know what the kids might come up with.

Any resources on this front?




When I looked into book scanning a few years ago, the Kirtas (mentioned elsewhere in this thread) was, as far as I could tell from Net sources, the reference method for fast, high-volume, non-destructive, high-quality scanning, and it still is. Even so, many libraries with Kirtas units still employ someone to watch the page turning and make sure only one page is flipped at a time. Perfect page turning is apparently not a completely solved problem yet.

If procuring a Kirtas (and paying nearly $10K USD per year in maintenance fees) through a hacker collective or maker space is infeasible in your area, the community at www.diybookscanner.org has a workable solution for a much smaller subset of what the Kirtas units address, so you could look into that as a modest workaround for the time being. I do wonder what results they got for dewarping by simply photographing the scanning target from all sides to synthetically construct a 3D volume, since perfect dewarping remains an open, unsolved problem.


Even so, many libraries with Kirtas units still employ someone to stand watch over the page turning and ensure only one page at a time is flipped

Most books have page numbers; couldn't they use those along with OCR to detect and retry skipped pages? Maybe even add a mode that shakes the pages more than usual in an attempt to separate ones stuck together. It doesn't sound too difficult to do (perhaps you'd have to tell it where the page number is), given what the Kirtas machine costs.
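A rough sketch of the page-number check, in Python. It assumes OCR has already produced a page number (or None when the number was unreadable) for each captured page, in capture order, and that pages are numbered consecutively with Arabic numerals. `find_missing_pages` is a hypothetical helper for illustration, not part of any Kirtas software.

```python
from __future__ import annotations

from typing import Optional


def find_missing_pages(detected: list[Optional[int]]) -> list[int]:
    """Return page numbers that appear to have been skipped.

    `detected` holds the OCR'd page number for each captured image, in
    capture order, with None where OCR could not read a number.
    Assumption: pages are numbered consecutively (1, 2, 3, ...).
    """
    missing: list[int] = []
    last_seen: Optional[int] = None
    unread_since_last = 0  # captures since the last readable page number

    for number in detected:
        if number is None:
            # Unreadable captures are assumed to fill the next expected slots.
            unread_since_last += 1
            continue
        if last_seen is not None:
            expected = last_seen + 1 + unread_since_last
            # A jump past the expected number means pages were never captured.
            missing.extend(range(expected, number))
        last_seen = number
        unread_since_last = 0

    return missing
```

For example, `find_missing_pages([1, 2, None, 4, 7, 8])` returns `[5, 6]`, which the machine could re-flip and retry. Roman-numeral front matter and unnumbered plates would need special handling.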


The challenge seems to be that the OCR takes place in a post-processing phase rather than in real time, and the goal is to catch an improper page flip before the book is put away. Perhaps with one or more gigabit pipes, the image processing could take place in the cloud in near real time.

The Kirtas units seem highly regarded by conservators, who might have lots of objections to even gentle shaking of their sometimes fragile charges. The impression I get is that the slight vacuum the Kirtas applies to pages is the most handling that is accepted. Recent developments in computer vision and robotic fingers might lead to a better robotic analog to a human page flipper in the future.

My personal hunch is that the popularization and (relative) mass adoption of the slower, lower-tech open source book scanners will eventually outstrip the dedicated scanning throughput of the high-end units and put more digitized content onto the Net, along with triggering a legal fight over content "abandoned" by publishers. When I digitize my content, it goes into my private collection, but I sure wish publishers were more aggressive about digitizing older material, or more lenient about letting that older material go into the public domain if they aren't even chasing the long-long-long tail of that material anymore.




