Discussion (123 Comments), originally posted on HackerNews
If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.
Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)
I can't find the post, but years ago on Reddit an author posted stats showing that when her book turned up pirated online, real sales of it collapsed.
Because of this I make a point of buying books, programming books especially. Yes, I download PDFs, but I use them as previews. This has led to my buying way more than I otherwise would have.
Anyway, I appreciate this doesn't apply if you live somewhere that these books can't be purchased. But everyone praising these sorts of sites tends to look at them from only a positive perspective.
I think that's at least a bit debatable. People thought that about (normal) libraries back in the day, but it ended up having the opposite effect.
Not to mention out-of-print books or academic books, which are a big use case for sites like these, since lots of people prefer physical books and only reach for PDFs as a last resort.
This is key for getting epubs to your Kobo.
I've been using MoonReader for many years now and settled on pretty good parameters that make the reading experience very comfortable on both my phone and my tablet.
If you mean stripping DRM, I used Calibre for that, but mostly I just avoid buying books with DRM where possible.
Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…
I can't quickly tell what all you have archived^, but I have some friends who are academic historians who might be interested in certain categories of work (and could help verify some esoteric languages) - is it possible to search by region or language?
Have you reached out to any historians about the project? It seems like some PhD students might be able to find research projects in this work.
^ when I looked at the timeline https://sourcelibrary.org/timeline, I got an error
Please share with historian friends. I’m not great at socials or fundraising but this was really designed to support humanists. It can give DOIs for the versions of the translated books, which means they can be quoted and cited in academic papers.
Tip: Try it in Claude or Claude Code (even better)! Just point it towards the source library. It can find quotes and evidence on any topic of interest. Or try the librarian, our source-grounded research agent: https://sourcelibrary.org/librarian
Thanks for the feedback, I’ll fix the timeline.
You can add it up!
I have seen Gemini costs change quite a bit lately when processing very similar books from the same series, mainly because thinking tokens have increased about 5x. Has that happened to you as well?
Edit: for OCR I am using about 15k-25k tokens per page, but I have a complex prompt.
https://redlib.catsarch.com/r/Annas_Archive/comments/1f6h74r...
https://reddit.com/r/Annas_Archive/comments/1f6h74r/im_curio...
True or not I rather like that one.
But I could be wrong.
I am more surprised to see that there are so few alternatives to it. Or perhaps I am unaware of them, but after Facebook and co declared war on LibGen, and LibGen went down, there were surprisingly few alternatives. Anna's was one of the few. I still don't know what happened with LibGen, but since the attack it really is kind of semi-gone.
Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.
Well, there is this little conflict of interest
And even fewer who are single and childless. (Google would likely go after the estate of anyone who did this.)
Financial watchdogs and international treaties make it impossible unless you are perhaps a multibillionaire who can afford to buy people at the political level.
Lying about your assets to avoid paying a lawful fine is criminal. Just because they can’t see your money doesn’t mean they can’t prove that you have it, and can’t jail you for hiding it to get out of paying a fine.
Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.
We're killing the goose that lays the eggs, for selfish gain.
Even if projects like AA didn’t have nation-level support, academics would find a way to keep as much of it as possible going. After all, we’re the ones who compiled the bulk of pre-2020 material, and we’re the ones who do all the hard work of scanning from our institutional libraries stuff that doesn’t exist anywhere in digital form.
Most of the best literature in the English language was written before modern IP law was even a thing. There's very little good literature written by authors primarily motivated by money.
I can only think of one writer offhand who wasn’t a wealthy landowner, although it is a particularly notable example: William Shakespeare.
Shakespeare wasn’t poor (his parents seem to be of upper middle class standing), he was able to get a basic (but not a university) education and then pursue an acting career (with perhaps a side hustle as a teacher). Whatever the case he certainly wasn’t independently wealthy before he started writing, he needed to earn a living.
He did seem to be in it for the money (and fame), since he wasn’t just a writer: he was an actor, theatre owner, and something of a celebrity, and he did make enough money to become a wealthy landowner by the time he died.
> best literature
What does that even mean?
In such a world, isn't it useful that governments are stupid enough to give adversaries reasons to undermine it? When the government props up a corporate tyranny domestically, and racketeering, should we make a temporary alliance with all its enemies?
(E.g., the provision to AI companies of all corporate secrets and competitive practices via prompts, eventually to be used against their capital interests and their labour interests.)
We already did that when the internet collectively agreed decades ago that everything digital should be free for anyone.
We're now 20 years downstream of ad-blocking being a virtuous good, and piracy being the ultimate show of liberty, and now suddenly everyone cares about the creator's revenue stream.
The mask slipped and unsurprisingly the internet is a bunch of selfish morally stunted children. Some of them even pushing 50 years old.
Yes, I am talking to you with the 4TB of pirated content, proud of not loading any ads in the last 15 years, and getting enraged over LLM training.
That's oddly-specific :-)
In any case, I have no pirated content that I know of, and am neither proud nor ashamed of blocking ads[1], but I still get annoyed that a bunch of VCs can use their invested-into companies to launder all the world's IP, then sell it back to them.
[1] Who feels proud of blocking ads? It's like feeling proud of tying your shoelaces: "Good job, well done, but that's the expectation, son".
The current situation feels untenable with renting. So many regular people I know have learned about VPN, NAS, etc.
Or something like thanks.dev
Spotify, Netflix, Amazon etc. provided OK value for a while, but now that enshittification is biting, this is due a massive comeback.
> Purchase all Library of Congress MARC datasets — $3,000 bounty
> English Wikipedia pages about relevant institutions — up to $100 per new page
> Internet Archive Digital Lending — $5000 per 1 million pdf files
> Text version of our full library — $20,000
...
https://software.annas-archive.gl/AnnaArchivist/annas-archiv...
It seems like there are some deep pockets funding them.
LibGen is now more or less a dead project. The servers of the original version were reportedly seized a couple of years ago already, and other sites under the LibGen name were notorious for piggybacking the original collection and just plastering it with ads. If one wants to upload stuff, better now to upload it to Z-Lib (not a perfect site, but still) and it will then get picked up by AA in a few months.
https://chatgpt.com/share/6a4970e8-7fe8-83e9-8f81-3aefd76b6b...
On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.
Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.
[edit: after thinking about it, I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]
Are you sure?
What if you distill from 10 teachers?
Not yet.
If there is a need, someone will come and fulfill it. Personally, I no longer even want to use top models. Professionally I use AI to help with coding, using the Junie agent that comes with IDEs from JetBrains. Junie is told to use Gemini Flash and works fine for what I ("I" being an emphasis here) ask it to do. I tried more advanced models and different vendors only to discover credits going down the toilet without any extra benefit.
This doesn't strike me as a symptom of a bubble, except insofar as the bubble pushes the competitors' models forward and thus they need to invest more to stay competitive.
China acts like an entire bloc, not as single companies, and they want to monetize hardware.
Lots of companies will pick them up for scrap metal prices and host them for fractions of what we are paying today.
That's the nature of bubbles.