Library Genesis in Numbers: Mapping the Underground Flow of Knowledge (2018) [pdf] (direct.mit.edu)
158 points by simonpure on Nov 8, 2023 | 49 comments



Interesting. The authors focused a lot on competing with market vs archival purposes. For me, and I reckon a lot of other people, I used libgen because I just want a PDF or epub that is not muddied with DRM or behind a university wall. Most of the books I download I already have a paper copy of, but want to be able to use it digitally. For me it’s all about convenience. I guess that would be pretty hard to measure.


This is close to my use case as well. I mainly buy e-books these days, but they are often published as an afterthought[1] and not designed for readers – things like maths and plots get mangled in the process and vary from excellent to completely unreadable. So I want something closer to the physical book to look up the equations and visuals, but I don't want to have to go to the library[2] for each one. A PDF scan of the print book is very convenient.

[1] Sometimes they are literally an OCR of the print book! You can tell because of the symbol substitutions.

[2] Even most digital libraries I've tried are very inconvenient to "go to". Archive.org might be the exception.


> Sometimes they are literally an OCR of the print book! You can tell because of the symbol substitutions.

Not really. TEX for example does a lot of weird character substitution behind the scenes to mangle the layout.


how often does TEX substitute "rn" for "m" though?
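
For anyone who wants to check a suspect e-book for this kind of substitution, here is a minimal sketch. It assumes you already have a word list for the book's language; the confusion pairs and the function name `suggest_corrections` are illustrative choices, not taken from any particular OCR tool.

```python
# Hypothetical confusion pairs: (what the OCR emitted, what the print book likely had).
CONFUSIONS = [("rn", "m"), ("cl", "d"), ("vv", "w")]


def suggest_corrections(word, vocabulary):
    """Return vocabulary words reachable by undoing exactly one OCR confusion."""
    if word in vocabulary:
        return []  # already a known word, nothing to flag
    suggestions = []
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            # Undo the confusion at this one position only.
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in vocabulary and candidate not in suggestions:
                suggestions.append(candidate)
            start = word.find(wrong, start + 1)
    return suggestions


vocabulary = {"modern", "theorem", "clear"}
print(suggest_corrections("theorern", vocabulary))
print(suggest_corrections("rnodern", vocabulary))
```

A book where many unknown words become valid after undoing "rn" → "m" is a reasonable hint that the text came from OCR rather than from the publisher's source files.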


Comma instead of subscript i is one of my pet peeves.


my friend in college bought textbooks before the first semester when their parents took them to the bookstore. then a classmate showed them libgen and they didn't buy any more textbooks except the one weird custom textbook by some professor.

i guess they buy all their books now that they have a job, but also with generous use of used book stores and physical libraries.

oh and don't buy through amazon, especially kindle.


This pretty much mirrors my experience exactly, but these days I lean a lot more heavily on the Libby app for borrowing from my library which is amazing.

What’s the issue with Amazon/Kindle though, the fakes?


their de-facto monopoly status on basically all things books and increasingly on publishing, and of course their absolutely inane DRM.

libby is amazing, people are completely sleeping on it if you're not using it. free e-books!


Talking to some students, I'm told that people often freely pass a book on to the next semester's students.


it's actually very easy to measure. the audience here is close to the 1%, so your use case is already discarded in the grand scheme, within a 2% error margin, when measuring behavior for the larger population.


Yup, hnews is not a good place to do market research for the general population.


But it is a good place to get an insight into what could motivate enough technically minded (and connected) people to sustain such risky projects.


I used it to not have to pay.

I buy 10 books per year off Amazon and the occasional Economist magazine. I am of course not paying for academic journal access or tons of academic books outside my field.


I use it to wade through the masses.

Download 10 ebooks on a subject, scroll through, read a few excerpts, and the one that I actually start reading, I buy from amazon or a local bookstore and actually read it.

Many books look great, but once you start reading them, are total garbage. This is in the field of IT literature though, so books tend to be very hit or miss, and quickly outdated.


I've done that a lot especially with books on raising a child, education etc... Most books are bad, I don't want to pay until I know the book is good.


Great to see this here.

The author Balázs Bodó is an all around great guy with very refreshing research methods and an immensely broad portfolio of things he works on [1].

[1] https://www.uva.nl/en/profile/b/o/b.bodo/b.bodo.html#Publica...


Not to devalue the tireless work of authors, but we must also recognize that students in particular do not have enough money to pay for all these books.

At the same time, it can be argued that writing a single book, and selling it thousands of times over, is a bit too "easy" in terms of ways to make money. The hard part is to get people to actually buy your book – it does not matter that you wrote the best book in the world if nobody gives you well-deserved attention for it.

Bloggers suffer from the same problem, and it is not necessarily because their work is bad. The truth of the matter just is, nobody cares about quality information anymore (And I am guilty of this as well).

We want fast and summarized answers so we can move on to the important part: solving whatever concrete problem we are working on.

AI will probably dilute the little remaining value of information even further, and at some point, nobody will manually write comprehensive information anymore. At least not unassisted by AI, and while the quality may suffer, we must also realize that we do not really need 100% accurate information. If we get a statistically significant amount of accurate AI-provided information, then there is no need for anyone to write books anymore. It will be a completely unappreciated waste of their time, and nobody is going to buy them.

Even now that I am in a decent job, I still prefer not to buy books, instead relying on free sources on the internet (not piracy). If a given book/information is not available for free, then it is often not important enough for me to bother (note often – not always).

It is also a matter of prioritizing – reading a book takes me way too long, and the process is far from comfortable due to my slow reading, and for that reason alone I tend to avoid reading entire books. It strikes me as an antiquated way to gain information even without AI. I may open a specific chapter of interest, but reading the entire thing is painfully tedious, and probably unnecessary.


I read lots of non fiction books. I think the same.

Most of them could condense their contents into a couple of chapters.

It would be great to have modular books, like the Emacs manual. If sections were independent modules you could rearrange the book or even create books from a series of sections from different books.

That way you could choose different outlines, maybe predefined by the author, to create books tailored to your needs:

- General Ideas.

- General Ideas + observed cases

- Mixed outline (Concepts + Stories + Conclusions)

- All details about one topic
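
The outlines above could be modeled roughly like this. It is only a sketch, assuming each section carries a `kind` tag and a `topic`; the `Section` type, the tag names, and the `OUTLINES` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    title: str
    kind: str   # assumed tags: "concept", "case", "story", "conclusion"
    topic: str


# Each outline is just the set of section kinds it keeps (hypothetical names).
OUTLINES = {
    "general_ideas": {"concept"},
    "ideas_and_cases": {"concept", "case"},
    "mixed": {"concept", "story", "conclusion"},
}


def build_book(sections, outline):
    """Keep the sections whose kind belongs to the outline, in original order."""
    wanted = OUTLINES[outline]
    return [s for s in sections if s.kind in wanted]


def everything_about(sections, topic):
    """The 'all details about one topic' outline: filter by topic instead of kind."""
    return [s for s in sections if s.topic == topic]


sections = [
    Section("What sleep is for", "concept", "sleep"),
    Section("A family that tried it", "story", "sleep"),
    Section("Observed outcomes", "case", "sleep"),
    Section("How play works", "concept", "play"),
    Section("Takeaways", "conclusion", "play"),
]
print([s.title for s in build_book(sections, "general_ideas")])
print([s.title for s in everything_about(sections, "play")])
```

Because sections are plain records, combining sections from different books is just concatenating two lists before filtering.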


This is what's done with sufficiently academic books in the hard sciences. They are so modular, that books are simply topics with each chapter written by a chosen author, and those authors will treat the chapters they've written similarly to papers they've had published.

I think if you dive deeper into more rigorous nonfiction books you'll find that less time is spent on the 'pop' side of popsci literature, which might be where you're encountering that fluff.


I hate to bring it up, but AI would be perfect for that. It could create logical segues between the new chapter order. So you could basically create books on demand based on real content with only the AI providing context.


"writing a single book, and selling it thousands times over, is a bit to "easy" in terms of ways to make money"

$2 profit per book is perhaps a high figure that an author may receive from the publisher. 2 x 3000 is $6000 for maybe months or years of work. And this would be a 'successful' author. It's not all JK Rowling out there ya know!


I am curious, which books did you find important enough to bother with? So my list of (shame) books to read could grow even longer.


Why do people buy books now, instead of just reading reviews?


Obvious troll, but anyway: Reviews are there to help identify if books are worth reading/buying. Reviews cannot and are not supposed to replace books.


Well I'm certainly not trolling you but okay. Have a good day.


Why do people go to see movies when they could just watch the trailer on YouTube?


Well I'm not the one claiming people are going to stop reading because of AI, so I'm not the one to ask :)


I don't understand this question. Book reviews are not summaries. And even if they were, "why do people read books, instead of just reading summaries?" is still a ridiculous question.


It is ridiculous. That's the point. Same as suggesting people would stop reading books in favor of summaries written by AI

And indeed, 'review' was the wrong word to use, but I appreciate you understanding what I meant


people do watch movies / shows at 1.5x now.. it's common.


>but we must also recognize that students in particular do not have enough money to pay for all these books.

Sure they do. They happen to be able to pay for food, water, electricity, rent, tuition, transportation, pens, pencils, paper, etc just fine.


Academic books are terribly expensive.

When I was at university (Oxford, UK, 2009), I bought about four key books, and spent well over 100GBP. I simply couldn't afford more. I had a student loan which covered tuition and some of my living costs, as well as a bursary from the university which covered some of my other living costs (but not all!).

What was annoying was that our library didn't have enough books for all the students. We'd all be assigned the same reading list for the week, and then have to race to the library to get the books before they were all gone.


If people could download any of those things for free off the internet, they would.


by "just fine", you mean "go into crippling amounts of debt that isn't dischargeable by bankruptcy"?


Imagine someone from a 3rd world country. Let's say India.

Even books like K&R C, Tanenbaum operating systems or CLRS / Skiena Algorithms will be north of 1000 INR in India.

Count 5 - 6 such books per semester, that's at least 5k - 10k INR. But for obscure books the price often goes to 5K for a single book. Let's say 10K INR.

Which is a significant amount: for some students, a semester's books can cost more than a month's living expenses.

So they're hesitant to spend that much. Often they end up with shittily written local books.

Now imagine you want to consult some book for specialist topics, like Windows internals or something, you will have to sell an organ.

Source: I was the Indian student.
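
The arithmetic behind the 5k - 10k INR figure can be sketched as follows. The prices are the rough numbers quoted above (1000 INR treated as a floor for a standard text, 5000 INR for an obscure one); the function name and the split into "regular" and "obscure" books are assumptions made for illustration.

```python
TYPICAL_PRICE_INR = 1_000   # "north of 1000 INR", used here as a lower bound
OBSCURE_PRICE_INR = 5_000   # quoted price for a single obscure book


def semester_cost(num_books, num_obscure=0):
    """Lower-bound cost in INR for one semester's reading list."""
    if not 0 <= num_obscure <= num_books:
        raise ValueError("num_obscure must be between 0 and num_books")
    regular = num_books - num_obscure
    return regular * TYPICAL_PRICE_INR + num_obscure * OBSCURE_PRICE_INR


print(semester_cost(5))      # five standard texts
print(semester_cost(6, 1))   # six texts, one of them obscure
```

Five standard texts come to 5000 INR, and a single obscure title on a six-book list is enough to reach 10000 INR.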


“Just fine” here means riddled with debt.


Libgen fills an important gap in the paid archives/libraries. There are always some books, papers or articles that are not available through my university library.

It's a huge help to have documents available immediately and not having to pay a small fortune for a single document that might prove irrelevant.


If you are interested in Library Genesis (and shadow libraries in general) I can recommend this book by Joe Karaganis: https://boook.link/Shadow-Libraries


But which shadow library can I find this book on?


Just click on "PDF".


PDFs suck on e-readers.

But you can get an epub on at least one of the shadow libraries :)


They aren't too bad with modern reader software, you can automatically trim margins and set the viewport size and scan pattern to match the text (makes double-column papers much nicer to read).


>PDFs suck on e-readers.

They are delightful to read on a ~13-inch e-reader.


This is an incredibly valuable resource for those outside of the US and EU, where the prices for books in English (imported) are extremely high.


Interesting research. However, in practical terms, I wouldn't count Kindle as a form of digital availability, unless the book is composed of pure text.

Browsing figures or tables is usually a very bad experience. The figures are usually in low resolution. And the tables are sometimes just pictures. Even if the publisher bothered to encode the table as a table, if it spans more than one screen, navigation becomes very hard. Not to mention symbols and formulas.


I predict that Libgen/SciHub and pirate sites in general will be rendered functionally inaccessible in the developed west in the coming years. My reasoning is as follows:

Historically, IP owners have tried various strategies to increase revenues by curtailing infringement. In the early 2000s, IP owners tried to extract revenue directly from pirates via honeypot torrents and ISP subpoenas. Since about 2010, IP owners have shifted towards a more passive approach where they prioritize infringement for commercial purposes only and largely leave piracy for personal consumption alone. It just didn't make sense to chase after teenagers and students who probably wouldn't have had the money to make a legit purchase anyway.

LLMs trained on huge corpuses that include pirated content like LibGen change everything. Now, these IP holders face an existential (or at least severe) threat to their business models in the form of AI generated content. At the very least, these IP holders missed out on a massive opportunity to extract some of the wealth created by AIs by virtue of their laxity in going after easily available pirate content.

I'd expect to see a strong swing back towards very draconian enforcement against even personal infringement: domestic ISP DNS blocking, perhaps even mandatory browser or operating system level blocking of infringement.


library genesis is a god-send.

there are plenty of books that are out of print, or that you won't find at specialized retailers, but that you would find on lib-gen.


In Spain my home internet provider (Movistar) blocks access to LibGen and the like. My university, however, does not. It realizes that de facto these sources serve as the library for its professors. So I wind up having to connect through the university VPN anyway ...


They usually block via DNS, so it's pretty easy to circumvent by using alternative DNS servers such as 1.1.1.1 or 9.9.9.9, or Tor (which you sometimes need for books that have been DMCA'd)
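
To make the "alternative DNS servers" point concrete, here is a minimal sketch of asking a specific resolver directly instead of the ISP's default. It assumes plain UDP DNS on port 53 and an A-record lookup, uses only the standard library, and the function names and the `example.org` placeholder are mine. It does not cover Tor or encrypted DNS.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def query_resolver(hostname, resolver_ip="1.1.1.1", timeout=3.0):
    """Send the query to the chosen resolver and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return response


# Building the packet needs no network access.
packet = build_dns_query("example.org")
print(len(packet), packet[12:])
```

In everyday use you would simply change the DNS server in your OS or router settings; the sketch only shows that the choice of resolver is nothing more than the destination address of the query.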



