Visualizing All ISBNs

(annas-archive.org)

393 points | by RyanShook 442 days ago

18 comments

graypegg 442 days ago
I see that bounty at the bottom, so tossing away my chances here, but this visualization is just asking to be mapped onto a Hilbert Curve. [0] When you "stripe" the data like this, points that are sorted close together could end up pretty far apart, since a distance in the Y axis skips an entire row of data as you move down, rather than a distance in the X axis which is 1-to-1 with the source data.
If you map it onto a Hilbert curve, the X and Y axes mean nothing, but points that are close together in the sorted list will be visually close together in the output image.
Since the first part of an ISBN is the country, the second part is the publisher, and the third part is the title, with a checksum at the end, I would remove the checksum and sort each one as a big number (no hyphens).
You should end up with "islands", where you see big areas covered by big publishing countries, with these "islands" having bright spots for the publisher codes.
Bonus points for labeling these areas!
I set up something a while ago [1] for an interview that does this with weather data. It makes the seasons really obvious since they're all grouped together.
[0] https://en.wikipedia.org/wiki/Hilbert_curve
[1] https://graypegg.com/hilbert (https://github.com/graypegg/hilbertcurveplayground code if anyone wants to go for the prize using this! Please at least mention me if you decide to reuse this code, but I can't stop ya lol)
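A minimal sketch of the mapping described above, assuming a square grid whose side `n` is a power of two. The names `isbn_sort_key` and `d2xy` are hypothetical and are not taken from the linked playground; `d2xy` follows the widely published iterative conversion from a position along the Hilbert curve to grid coordinates.

```python
def isbn_sort_key(isbn: str) -> int:
    """Drop hyphens and the trailing check digit, then read the rest as one number."""
    digits = isbn.replace("-", "")
    return int(digits[:-1])


def d2xy(n: int, d: int):
    """Map position d along the curve to (x, y) in an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

To place an ISBN, one would take its rank in the sorted list (or its key minus the smallest key), divide by the number of ISBNs per pixel, and pass the result to `d2xy` as `d`. That scaling step is an assumption about how the image would be built, not something the post specifies.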
- abetusk 442 days ago
  And there's a generalized Hilbert curve, the Gilbert curve, for non powers of two rectangular regions [0] (online demo [1]).
  [0] https://github.com/jakubcerveny/gilbert
  [1] https://jakubcerveny.github.io/gilbert/demo/
- n2d4 441 days ago
  What property makes the Hilbert curve desirable compared to, say, a snake pattern, with which neighbouring ISBNs are also neighbours in the visualisation?
  The worry I have with Hilbert curves is that they make the result look like there are distinct "squares" of data [0] when really this is just an artifact of how Hilbert curves work. In that sense, the current visualization is more useful, because it's straightforward to identify the location of each country in it.
  [0] https://raw.githubusercontent.com/jakubcerveny/gilbert/maste...
  - graypegg 440 days ago
    In a snake pattern, the neighbouring pixels on the left and right are related, but the ones above and below have skipped a whole row.
    And yeah that’s true! you end up with squares with Hilbert curves. But those squares are all « related » data. Then those squares are related to the squares near it. Zoom out more and that grouping of squares is related to the neighbouring macro-squares etc etc.
    Basically the square shape is a positive. Kind of like how charting the derivative lets you see how random/related information is, grouping into these squares gives you a visualization of pattern-ness, rather than any specific measurement.
    - n2d4 440 days ago
      > In a snake pattern, the neighbouring pixels on the left and right are related, but the ones above and below have skipped a whole row.
      But this is also true in Hilbert curves across the boundaries of the "squares" that I mentioned. The two center pixels in the top row are much more distant than any two pixels would be in a snake pattern.
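      The two positions in this exchange can be compared numerically. The sketch below is an illustration only, on an assumed 8x8 grid with hypothetical function names: for each layout it finds the largest jump in 1D position between any two cells that share an edge.

```python
def snake_index(n, x, y):
    """Position of cell (x, y) along a boustrophedon (snake) scan of an n x n grid."""
    return y * n + (x if y % 2 == 0 else n - 1 - x)


def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, finer level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def worst_neighbour_gap(n, index):
    """Largest difference in 1D position between any two cells sharing an edge."""
    worst = 0
    for y in range(n):
        for x in range(n):
            here = index(n, x, y)
            if x + 1 < n:
                worst = max(worst, abs(here - index(n, x + 1, y)))
            if y + 1 < n:
                worst = max(worst, abs(here - index(n, x, y + 1)))
    return worst


n = 8
snake_worst = worst_neighbour_gap(n, snake_index)      # equals 2 * n - 1
hilbert_worst = worst_neighbour_gap(n, hilbert_index)  # larger than the snake's worst case
```

      This measures only the worst case, which is the point made about the quadrant boundaries. It says nothing about how well nearby list positions stay clustered in 2D, which is the property the Hilbert layout is being recommended for.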
  - NooneAtAll3 441 days ago
    > What property makes the Hilbert curve desirable compared to, say, a snake pattern, with which neighbouring ISBNs are also neighbours in the visualisation?
    2D neighbourhood is better than 1D one
    > The worry I have with Hilbert curves is that they make the result look like there are distinct "squares" of data
    that's the point, tho? instead of distinct lines of taken ISBNs in a row, you get distinct squares of taken ISBNs in a row - much more noticeable
WillAdams 442 days ago
The thing is, ISBNs aren't hierarchical --- they are bought in blocks (or even individually at an exorbitant markup, says the guy who bought one to reprint a single book), so this doesn't show anything really interesting/useful.
A visualization using LoC or even Dewey Decimal would be far more useful, esp. if it also linked to public domain and copyright-free repositories/lists, say an interactive and visual version of John Mark Ockerbloom's:
https://onlinebooks.library.upenn.edu/
- est31 442 days ago
  ISBNs are hierarchical, what do you mean? Like Gaul, ISBNs are divided into multiple parts, where one part is for the language, another is for the publisher, and a third is for the title. The last part is a checksum. https://en.wikipedia.org/wiki/ISBN#Overview
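  As a rough illustration of that structure, here is a sketch that splits a hyphenated ISBN-13 into its parts and recomputes the check digit. The function names are hypothetical, the part names follow the usual ISBN terminology (prefix, registration group, registrant, publication), and the sample number is only an example value. The hyphens matter: where the group and registrant fields end cannot be read off the bare digits without the agency's range tables.

```python
def isbn13_check_digit(first12: str) -> int:
    """Check digit for the first 12 digits of an ISBN-13 (weights alternate 1, 3)."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return (10 - total % 10) % 10


def split_isbn13(hyphenated: str) -> dict:
    """Split a hyphenated ISBN-13 into its five parts."""
    prefix, group, registrant, publication, check = hyphenated.split("-")
    return {
        "prefix": prefix,            # 978 or 979
        "group": group,              # country or language area
        "registrant": registrant,    # publisher
        "publication": publication,  # one edition of one title
        "check": check,              # checksum digit
    }


parts = split_isbn13("978-0-306-40615-7")
payload = "".join(parts[k] for k in ("prefix", "group", "registrant", "publication"))
is_valid = isbn13_check_digit(payload) == int(parts["check"])
```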
  - WillAdams 442 days ago
    Yes, but this internal hierarchy for an issued number doesn't tell anything beyond those facts about a specific edition of a specific text.
    One can't use ISBNs alone to create a hierarchical listing of texts which is useful for anything beyond browsing by language/publisher/order in which the ISBN was generated.
    A visual and interactive representation of books by LoC or some other cataloging system would actually be useful.
    - PaulHoule 442 days ago
      I got into an argument with the manager of South End Press back in '94 about whether 'Futuresplash' (soon to be Macromedia Flash) had a future, he thought it did and he was right.
      Years later I was working at the library and got a little bit steamed because South End Press was reusing ISBNs after books went out of print, which was allowed but, I think, lame.
      One of my strategies for researching a topic is looking a few up in the OPAC, finding them in the stacks, and finding more books on the topic in those areas. (In the Library of Congress system, machine vision could be under QA56 with the rest of computer science or around TA1630, thus "areas".)
      From time to time I've thought about trying to replicate the feel of this with some kind of UI, given that our library moved a lot of the collection into deep archives and we have a very fast 'Borrow Direct' service with other peers.
    - convolvatron 442 days ago
      totally agree, but that's not in the data. however, since blocks are assigned to agencies associated with countries and publishers, you might find some utility in showing coverage by likely language and/or country of origin and date.
- MarceColl 442 days ago
  It shows what they want to show, which is mostly how much of the world's books they have. Being hierarchical has nothing to do with it.
  - Finnucane 442 days ago
    It only sort of shows that. ISBNs are issued by edition, not title, so many books would have more than one. And books published before 1970 or so might not be represented at all if they have no recent edition.
  - NoMoreNicksLeft 442 days ago
    They can't even have a tiny fraction of the world's books. Each edition of the book gets a new ISBN... if a book is released as a paperback, hardback, kindle edition, pdf, and epub then there are supposed to be five ISBNs.
    The vast, vast majority have only been released as dead-tree versions. They have none of those. The books they scan may have an ISBN, but the scans do not have them. Like all Project Gutenberg books, their books have no ISBNs at all. From a strict point of view, they've released new editions of these books.
    - nickelpro 442 days ago
      Worthless semantics in the context of the mission of the project.
      What you've described is that the archived content can be mapped to multiple ISBNs. It's clear the only element of concern here is the content itself. The failure to preserve a particular binding or printer's choice of typeface is irrelevant.
      Failing to recognize this requires an almost malicious level of pedantry
      - jameshart 442 days ago
        A successful archival of one of those ISBNs will light up; four of those ISBNs remain dark. Yet they have that content archived. It means that lighting up the entire grid is not necessary to achieve their goal.
        Indeed a bigger problem is that it’s much harder to know which areas of the grid are never going to light up because the ISBN has not been used.
        nickelpro 441 days ago
        This is a separate problem, but a notable one.
        Lighting up the entire grid is still the goal, you're describing the problem of ensuring the right set of squares is illuminated for each piece of archived content. One is a problem of archiving the content, the other is a problem of bookkeeping.
      - NoMoreNicksLeft 441 days ago
        >Worthless semantics in the context of the mission of the project.
        Hardly worthless... oftentimes, the edition of the book matters as much as the title. Stephen King wrote two books named The Stand, and one isn't anything like the other. He pulled a Lucas pretty early on.
        He's hardly the only author to ever do this. But it's not just authors either. Editors, collectors, translators all make their mark, and give you works that might seem only slightly different to you, but whose differences actually matter to the rest of us. It's not that you're ignorant that offends me, it's the arrogance about a subject you seem to know so little about that makes it difficult to tolerate.
        There is no pedantry here, just a desire to actually preserve books and to organize them.
        nickelpro 441 days ago
        > Stephen King wrote two books named The Stand, and one isn't anything like the other
        Then those two texts would map to different ISBNs, or perhaps each maps to multiple different ISBNs, it doesn't matter. That some texts exist with the same title but different content is similarly irrelevant.
        The content is all that matters. Two different bodies of content, two different entries in the archive. Each entry may map to one or more ISBN numbers.
        > the differences actually matter to the rest of us
        The only differences that matter are what matters to the archive that made the blog post. Your concerns are for entirely different things, which is fine, but don't say the OP's concerns or initiatives are impossible or ill-suited based on criteria you're projecting onto them.
    - mmooss 442 days ago
      > The books they scan may have an ISBN, but the scans do not have them. Like all Project Gutenberg books, their books have no ISBNs at all. From a strict point of view, they've released new editions of these books.
      Are you saying they actively remove ISBN numbers from scans? If I downloaded one of the books, it wouldn't have an ISBN?
      Why? That seems like a bunch of extra processing per book, makes it harder for users to specifically identify a book, and probably does nothing for legality. Also, can people search by ISBN?
      - Tomte 442 days ago
        > Are you saying they actively remove ISBN numbers from scans?
        No, he's playing the pointless "well, actually a scan of a book is a different thing from the book itself" game.
        NoMoreNicksLeft 441 days ago
        No, I'm saying that the ISBN doesn't describe titles, it describes editions, and editions matter.
        nickelpro 441 days ago
        You said:
        > From a strict point of view, they've released new editions of these books.
        And this is clearly a semantically worthless distinction from the point of view of the archive.
        When different editions have different content, archiving those differences in that content may matter (arguably not for simple typographical corrections, printing errors, etc). When different ISBNs have identical content, it is totally irrelevant to the goals of the archive.
        edflsafoiewq 441 days ago
        This is addressed somewhat in the "The critical window of shadow libraries" post
        > Until now, the only options to shrink the total size of our collection has been through more aggressive compression, or deduplication. However, to get significant enough savings, both are too lossy for our taste. Heavy compression of photos can make text barely readable. And deduplication requires high confidence of books being exactly the same, which is often too inaccurate, especially if the contents are the same but the scans are made on different occasions.
        Finnucane 441 days ago
        A text may be derived from an edition with an isbn, but the isbn wouldn’t apply to that file, it is effectively a different edition.
- omoikane 442 days ago
  One thing it shows is how ISBNs are allocated much faster than they are used, judging by the amount of black pixels.
  The image contains 1000*800 pixels at 2500 ISBNs per pixel, so it's visualizing 2e9 ISBNs. ISBN-13 contains 12 digits plus one check digit, so we might have expected the image to be 500 times bigger/denser than the current image. The fact that it's at its current size suggests that only ISBNs with 978 and 979 prefixes are included, and since the bottom half is more sparse, that probably corresponds to the new 979 range.
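  The arithmetic above can be checked in a few lines. The variable names are illustrative, and the figures are the ones quoted in the comment (image size and ISBNs per pixel), not measured from the image itself.

```python
# Figures quoted in the comment above.
pixels = 1000 * 800
isbns_per_pixel = 2500
isbns_shown = pixels * isbns_per_pixel        # 2_000_000_000

# Every possible 12-digit payload (an ISBN-13 minus its check digit).
all_payloads = 10 ** 12
scale_factor = all_payloads // isbns_shown    # 500

# Fixing the first three digits to 978 or 979 leaves 9 free digits per prefix.
two_prefix_space = 2 * 10 ** 9
```

  Since `isbns_shown` equals `two_prefix_space`, the image size is consistent with covering exactly the 978 and 979 ranges.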
skrebbel 442 days ago
I thought it was my color blindness that made me not able to distinguish between the red and green pixels as described (i only see red and black ones), but even with a browser extension that counters color blindness i can't distinguish more colors. Is this just me, or is the graph weird?
- saithound 442 days ago
  Fwiw (not color-blind) I can see red, green and black pixels. The graph doesn't look weird to the naked eye.
  Find the interactive visualiser by scrolling down, and switch it to "Files in Anna's Archive [md5]". This will highlight the location of the green pixels in grey.
- Muehe 442 days ago
  If you have red-green blindness like me try this:
  - Right-click the image and select "Inspect".
  - Add a new CSS hue-rotate filter to the element:
```
    element {
       max-width: 100%;
       margin: 0 auto;
       filter: hue-rotate(-90deg);
    }
```
  Usually I use "filter: saturate(100);", but that didn't really work well for this image. You might have to adjust the rotation degree though, -90 worked best for me.
- superzamp 442 days ago
  The graph seems to be alright, there are indeed red and (some) green pixels, looks like an issue with your extension unfortunately.
- Finnucane 442 days ago
  I am also color blind and the graph is not good.
- rendx 442 days ago
  I see green dots and a few lines of green dots. Did you try zooming in?
- thaumasiotes 442 days ago
  I see red, green, and a bit of yellow. I assume the yellow is what happens when the red and green pixels come too close to each other.
- psychoslave 442 days ago
  No idea where the issue might lie, but I can see the difference in colors.
- asfasdfasdfn 442 days ago
  The graphs are very easy to read, although they depend on your ability to distinguish between red and green.
  Can you change the green channel to blue to better view it?
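  One way to try that suggestion, sketched under the assumption that the image has already been decoded into rows of `(r, g, b)` tuples. The decoding step and both function names are hypothetical; this simply swaps the green and blue channels so that green pixels render as blue.

```python
def green_to_blue(pixel):
    """Swap the green and blue channels of one (r, g, b) pixel."""
    r, g, b = pixel
    return (r, b, g)


def recolor(image):
    """Apply the channel swap to every pixel of a row-major image."""
    return [[green_to_blue(p) for p in row] for row in image]


# One red, one green, and one black pixel.
sample = [[(255, 0, 0), (0, 255, 0), (0, 0, 0)]]
swapped = recolor(sample)
```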
glimshe 442 days ago
Anna's archive is one of the wonders of the world. If we almost destroyed our species but Anna's archive endured, there would be hope for a relatively expedient reconstruction.
- wayathr0w 440 days ago
  >relatively expedient reconstruction
  If self-destruction is a necessary premise here, is that really a good thing?
jdblair 442 days ago
It appears that the IP of the server is blocked in the EU. I get this from my ISP (Ziggo, in the Netherlands):
This website is blocked
European sanctions
The Council of Europe has decided that the websites of RT (formerly Russia Today) and Sputnik News may no longer be transmitted. The website you are trying to visit falls under this European sanction.
VodafoneZiggo is obliged to implement the sanction and has blocked the website.
- voytec 442 days ago
  Works in Poland, but here you go:
  https://web.archive.org/web/20250106112552/https://annas-arc...
- jdblair 438 days ago
  UPDATE: I updated my DNS server config (I run my own already) to use root DNS rather than forward to my ISP, problem solved.
- hk__2 442 days ago
  No issue here in France.
- manosyja 441 days ago
  Running your own recursive resolver has certain advantages…
  - jdblair 438 days ago
    And I was so close! I just disabled forwarding to my ISP DNS on my home DNS, now there is no block.
- usr1106 441 days ago
  No issue in Finland.
billpg 442 days ago
Anyone else seeing this?
"This server couldn't prove that it's annas-archive.org; its security certificate is from *.hs.llnwd.net. This may be caused by a misconfiguration or an attacker intercepting your connection."
- masfuerte 442 days ago
  Yes. A DNS request for annas-archive.org to my ISP (EE in the UK) returns an address for www.ukispcourtorders.co.uk, which also gives a security warning. If I click through the warning on either site I get an HTTP 400 error.
  According to Wikipedia, www.ukispcourtorders.co.uk used to list the blocked domains and the court orders responsible.
  https://en.wikipedia.org/wiki/List_of_websites_blocked_in_th...
- c0balt 442 days ago
  No, sounds like you are being MITM'd. Though the domain appears to be a legitimate CDN.
- usr1106 441 days ago
  I get a valid-looking cert issued by Google Trust Services. Finnish ISP's DNS.
- swores 442 days ago
  Same for me
quink 442 days ago
Kind of hard to tell what corresponds to what in these graphs, maybe if someone could point out Bookland (i.e. 978), it would be a bit easier to orient oneself?
- seszett 442 days ago
  Making it easier to visualise is the whole point of the bounty announced by this post.
greenie_beans 442 days ago
is it illegal to download and use their isbn file? like what is wrong with having that information?
- karel-3d 442 days ago
  I don't think this page, which links to libgen and sci-hub, is that concerned about copyright.
  - greenie_beans 442 days ago
    annoying non-answer to my question. i already know all about anna's archive. i'm asking if a person can download these isbns and use them to make data visualizations without fear of breaking a law? https://software.annas-archive.li/AnnaArchivist/annas-archiv...
    - qingcharles 441 days ago
      Seeing as nobody has provided a real answer: the answer is, maybe.
      Anna's Archive is currently being sued for scraping vast amounts of essentially public metadata which was being gatekept by a single organisation.
      Here's the longer and more complicated answer for you:
      https://libraries.emory.edu/research/copyright/copyright-dat...
      - greenie_beans 441 days ago
        feist is what comes up when i search around, too. the ISBNs might be poisoned if anna broke terms of service to get the ISBNs
    - karel-3d 442 days ago
      Sorry, I misunderstood your question.
    - salomonk_mur 442 days ago
      They explicitly provide that data for you to do as you wish. They are in a grey area, not you. You can download it no problem.
      - greenie_beans 442 days ago
        is there legal precedent for that?
        already asked LLMs so please don't copy/paste an LLM response.
        eemil 442 days ago
        Depends on your jurisdiction.
whataguy 442 days ago
> Each pixel represents 2,500 ISBNs. If we have a file for an ISBN, we make that pixel more green.
What do you mean by "more green"? I don't see any shaded green.
And I presume the black pixels are unregistered ISBNs?
- slyall 441 days ago
  I'd suggest you try a color blindness test. The green is very obvious, especially about 40% of the way down the whole image.
  - whataguy 438 days ago
    No, I see the green, but I don't see any shaded green. Though this probably has to do with ISBNs being distributed in blocks, so that every pixel is either red or green?
- lmm 442 days ago
  If you look closely there are definitely some brownish pixels and some dim greens.
eporomaa 442 days ago
Hm, I got:
"...
European sanctions
The Council of Europe has decided that the websites of RT (formerly Russia Today) and Sputnik News may no longer be transmitted. The website you are trying to visit falls under this European sanction.
..."
- reddalo 442 days ago
  I think the website is censored at DNS level but they chose the wrong error page.
  In Italy it just errors out with a NS_ERROR_CONNECTION_REFUSED.
  - flir 442 days ago
    You've just cleared up a minor mystery I never bothered to investigate (BT, UK). Thanks.
    Flipping DNS to 8.8.4.4 fixed it for now but I really need to move this connection to A&A.
- TonyTrapp 442 days ago
  Works fine here from a European IP.
  - jaapz 442 days ago
    It's blocked at least in the Netherlands. Weirdly it mentions it being part of the sanctions against Russia, while from a cursory search I only found a judge ordering the site to be blocked because of copyright issues (thanks Brein). They probably just show the wrong error page?
    - Cthulhu_ 442 days ago
      Must be ISP specific, I'm also in NL and can access it fine.
      - jaapz 437 days ago
        I'm on Ziggo
    - rchard2scout 442 days ago
      It's blocked by my corporate networking filter for me, in the category "Illegal downloads". So the Russian sanctions message is probably incorrect indeed.
    - rollulus 442 days ago
      I'm also in NL. Ziggo's DNS server blocks it:
      $ dig annas-archive.org @89.101.251.228
      annas-archive.org. 360 IN CNAME unavailable.for.legal.reasons.
      unavailable.for.legal.reasons. 339 IN A 213.46.185.10
      213.46.185.10 serves a generic page mentioning Russia Today and the Pirate Bay. Not sure which one applies here.
      - seszett 442 days ago
        > CNAME unavailable.for.legal.reasons.
        Not really standards compliant, but an interesting use of DNS.
      - Freak_NL 442 days ago
        Same for KPN:
        http://195.121.82.125/
        Would Tweak have blocked this? Most households in the Netherlands currently have the choice of Ziggo, KPN, and Odido. Long live VPNs…
        xp84 442 days ago
        Is that three broadband providers serving the same address?? You guys are so lucky you don’t even know. In America we generally have a choice of one if you aren’t including Starlink or legacy slow satellite. And perhaps a joke of a 1-6Mbps DSL option in some parts.
        reddalo 441 days ago
        Oh wow, don't look at Italy, then! At my current address I have coverage from at least 7 different providers (even though they're all based on only 3 different infrastructures/lines).
        xp84 438 days ago
        Three usable lines to your home??? I hope you're happy, you've made at least one American cry today.
        reddalo 437 days ago
        Yes. One of those three lines is based on the old copper phone lines; the other two are optical fiber (FTTH).
        I currently have a 1 Gbps down / 300 Mbps up unlimited connection, and I pay only 16 euros (~16 USD) per month.
        I wonder why the US is so bad on home internet connections, but maybe it's because of the scale of your country?
- powerhugs 442 days ago
  Switch DNS to like 1.1.1.1 (Cloudflare) or 8.8.8.8 (Google)
usr1106 441 days ago
What is Anna's archive and why is it blocked by law enforcement in several European countries (EU + UK)?
- nout 441 days ago
  It's the largest collection of books in easy to download formats for e-readers (often epub).
  - usr1106 441 days ago
    So blocked because of copyright issues?
ge96 441 days ago
Ooh prize money, D3 those are fun, where you can map a million things/zoom into it
friend_Fernando 441 days ago
Isn't it interesting how certain online forces affiliated with the letter Z are against copyright for Western IP in general, but are pro copyright when it comes to hamstringing Western AI?
- CaptainFever 441 days ago
  The letter Z? What does that mean?
  - aspenmayer 440 days ago
    Probably a reference to Z-Library, or as a stand-in for Russia.
    https://en.wikipedia.org/wiki/Z-Library
    https://en.wikipedia.org/wiki/Z_(military_symbol)
netman21 441 days ago
Hee, hee. "Imperial Library of Trantor."
qingcharles 441 days ago
Now do ISSNs, please.
starlite-5008 442 days ago
[dead]
Over2Chars 442 days ago
[flagged]
- michaelt 442 days ago
  Some people in the archiving / 'data hoarding' community feel it's simpler to just back up everything. This attitude is particularly prevalent in the communities that deal with data other people have already digitised.
  If you're paying $100 per book for someone to visit a major library, get the book out, scan it, check the OCR? Then you'd probably be selective, to get the most out of a limited budget.
  But if you're grabbing epubs and pdfs, and a book only needs $0.002 of space on a hard drive somewhere? Grabbing the useless 41% is probably cheaper and easier than exercising editorial control.
  - Over2Chars 442 days ago
    [flagged]
- jillesvangurp 442 days ago
  The problem with such judgments is that they are subjective and subject to biases that change over time. Almost every scrap of information from ancient civilizations is considered priceless at this point because so little is left of it. Anything from obscene graffiti, shopping lists, personal messages, etc. All of it.
  Many autocratic regimes editorialize and censor all forms of publications. But even in the US, which is nominally still a democracy, you now have states like Florida forcing changes to literature works and banning books entirely for religious and ideological reasons. And this is not just a right wing thing. There have been a few publishers that took it upon themselves to editorialize literature from the 19th and 20th century to get rid of some things that are now considered sexist, racist or otherwise offensive. The whole cancel culture is not just about canceling people, but about limiting access to their work as well.
  I was at a Christmas market in Berlin a few weeks ago near the Opera. There's a nice little monument there for the book burning that happened in the 1930s. Anything that was vaguely intellectual or Jewish in origin was burned right there in May 1933. Nice place for a Christmas market and a grim reminder that those calling for things to be deleted/cancelled aren't necessarily very nice people. And of course Hitler himself got cancelled. Possession or distribution of his books is still not allowed in Germany.
  Anyway, somebody in 5000 years who finds their way to some archive of Hacker News or some Reddit thread might look differently at the value of some of the comments than the average moderator.
  - heinrich5991 442 days ago
    > Possession or distribution of his books is still not allowed in Germany.
    AFAIK this has never been true in Germany (for the book Mein Kampf at least). AFAIK the German state of Bavaria inherited Hitler's copyright on the book, and did not republish it. This means that no one was allowed to print it for copyright reasons, but you could still own or trade existing copies of the book. After 2015, 70 years after Hitler's death, the book entered the public domain. Looking into Wikipedia, uncommented reprints have been forbidden: https://en.wikipedia.org/w/index.php?title=Mein_Kampf&oldid=..., which I didn't know before.
    - jillesvangurp 442 days ago
      It seems you are correct and I was only half right. Let's just say that quoting the man in public is still likely to get you in trouble. More than a few AfD politicians are finding that out the hard way.
      - 9dev 442 days ago
        And rightfully so. Germany has a peculiar history in this regard, and that implies a federal obligation to account for it.
  - carlosjobim 441 days ago
    > you now have states like Florida forcing changes to literature works and banning books entirely for religious and ideological reasons.
    This is not honest. They're not banning any books, they are stopping school teachers from forcing certain books on children. The difference is immense.
    - Over2Chars 441 days ago
      An excellent point.
      Though, I am not against highly directive schools (I think we need them), I am against mis-characterization.
    - jillesvangurp 441 days ago
      I think that's a pretty biased misrepresentation of the facts. Florida is denying children access to those books by actively forcing teachers to not talk about them (or educate kids about their content). Teachers in violation are at risk of being fired (several have been fired). Schools are being threatened with getting their funding cut. They are also forbidding libraries from having those books.
      And of course, the list of books is getting pretty long and what's on that list is basically determined by a very small but vocal group of christian conservatives with uptight opinions on things like science, evolution, sexuality, and other things they insist are wrong/evil/dangerous.
      So they are not technically banning anything. But they are punishing people that go against this nonetheless. Which makes it kind of a ban. It doesn't go as far as Nazi book burnings. But I have just about as much sympathy for the people that did as I have for those insisting e.g. Harry Potter must not be in a school library succeeding with that because the threats against teachers and schools are very real and these people wield a lot of power, apparently. Bullying teachers and librarians into complying with this seems to happen a lot in Florida. That's not banned at all and actively encouraged.
      What's next, Uncle Tom's Cabin? Oh wait, that actually happened as well in some schools. Because these people also include some racists and xenophobes. You might call some of these people fascists even. And they just don't like being reminded of things like slavery.
  - Over2Chars 442 days ago
    All action is "subjective and subject to biases that change over time". This would then imply I could never take any action, because it's just subjective and biased. Maybe that's an exaggeration of your position, but you do seem to be suggesting inaction or the impossibility of judgement. I reject this position 100%.
    I would suggest that judgement is a critical part of our civilization, and it's judgement that makes those bits of obscene graffiti in Pompeii priceless.
    Or else they could say "well, we can't claim ancient cave art is priceless, because we're biased and our biases will change over time. Maybe in a thousand years we'll discover that ancient cave art is worthless, so we'll do nothing".
    In fact you have judged my opinions and shared your judgement with me. Good job!
    Your characterization of regimes as autocratic is judgmental, biased and will change over time. But right now that's your judgement and I applaud it, even if I disagree.
    Gosh, book burning. Not backing up a romance novel or cookbook is definitely analogous to book burning, but I'll play along.
    It was a symbolic act to show a rejection of ideas, not an attempt to eradicate the books, much in the same way Gandhi encouraged the burning of foreign made clothing and products. He wasn't going to rid the world of British cloth nor were the Germans going to rid the world of non-German ideas.
    So yeah, when all the badly written cook books, romance novels, and children's books are in a huge bonfire, you can blame me, personally.
    - globnomulous 442 days ago
      > All action is "subjective and subject to biases that change over time".
      This is poppycock. Backing up all books -- the very action discussed by the person you're answering -- is by definition neither subjective nor subject to biases.
      > This would then imply I could never take any action, because it's just subjective and biased.
      And even if the first quoted claim were true, this, too, clearly isn't. Nowhere does the comment you're answering imply that the bias or subjective rationale of an action should, ipso facto, discourage a person from taking it.
      Your comment is replete with similar reasoning, so warped that it's difficult to characterize as anything other than in bad faith. Indeed, this is the snottiest, rudest, least constructive comment I've seen on HN in quite some time -- excepting a couple of my snotty remarks on language or the quality of someone's writing.
      I have no idea what response you expect, but the only one you deserve, I think, is one that just points out your dismissiveness, sarcasm, and breathtaking contempt. What an awful way to move through the world, let alone through HN.
      [-]
      - flir 442 days ago
        > least constructive comment I've seen on HN in quite some time
        But we should still archive it. Some day it might be useful to someone ;)
        [-]
        jillesvangurp 442 days ago
        Exactly :-)
        Over2Chars 441 days ago
        LOL, at least someone has a sense of humour here :-)
        [-]
      - jillesvangurp 442 days ago
        Thanks for this. I wasn't going to feed the trolls, but you are not wrong.
        > this is the snottiest, rudest, least constructive comment I've seen on HN in quite some time
        I wish ;-). I see a lot worse here regularly. But it's certainly not nice behavior. Luckily, I have a thick skin.
        [-]
        Over2Chars 441 days ago
        Anyone who takes a position you don't like is, presumably, a troll.
        Well, it takes one to know one. So there.
      - Over2Chars 441 days ago
        Hello Mr/Ms/Xe @globnomulous,
        1) tldr: you're wrong. I am not suggesting that "backing up books" is subjective or subject to biases. You'll need to reread my response: I'm talking about the judgement of doing so. And deciding that it is worthwhile (or not, as I joked with my made up 41%) is a judgement that is subjective and biased (if you subscribe to that sort of world view).
        2) tldr: you're wrong. I openly indicate in my reply that I am exaggerating and extending his position (a reductio ad absurdum for you Romans reading this). However, he is suggesting that my position is wrong, or should give me reason to hesitate, as it might be "biased" et al. So I think my take is quite on the money. If you think your actions are biased and subject to historical revision, are you going to march along confidently? Or will your fingers tremble at the next book burning while you wonder "will history condemn me for this?"
        3) tldr: you're wrong again. You resort to ad hominem attacks after demonstrating a complete inability to understand my position. I'd say you demonstrate both your intelligence (or lack thereof) and worth as a being (or lack thereof). I suggest you consider the unbiased subjective absence of your existence as a priority.
        4) tldr: I expected nonsensical windbaggery, and you delivered! Thank you. You advanced exactly zero of the positions involved and reduced this thread to garbage. YOU are the cancer killing the internet. Have a nice day.
      - bqmjjx0kac 442 days ago
        I wouldn't be surprised if they're an LLM-powered bot.
        [-]
        Over2Chars 441 days ago
        This seems to be the new hotness in generic replies.
        "No, I think you're an LLM-powered bot! So there!"
    - wizzwizz4 442 days ago
      I don't think you're doing it on purpose, but this is Holocaust denial. The Nazis did destroy all extant copies of several works – for example, research of the Institut für Sexualwissenschaft. (Edit: Some Judaica were sent to Prague instead of being destroyed – though apparently Hitler's planned Judaism Museum is an urban myth.) They absolutely were trying to utterly destroy – not just symbolically reject – vast swathes of culture.
      Please don't make stuff up about the Holocaust. It's the sort of mistake you shouldn't make even once.
      [-]
      - Over2Chars 441 days ago
        Hmm... Godwin's law much?
        My argument is that the symbolic rejection, not the practical destruction of books, was important. Also, I pointed out through a subtle hint (too subtle for you so I'll spell it out: this is a wholly irrelevant argument - arguing that books don't deserve to be backed up has nothing to do with book burning) - so I played along - as I noted. Get it?
        The practical destruction of all books considered non-German, or Gandhi's destruction of all British fabrics - might have been a desire, but there's no need to publicly burn books to destroy them. They can be efficiently destroyed privately. The Nazis could have destroyed all the books they liked in some private little place.
        Ergo, my argument stands! PUBLIC book burning is about PUBLIC rejection of something (non-German elements, or Foreign made goods, etc) and may in fact be less efficient than simply quietly and privately putting them in a landfill which no one will see. Get it?
        I'm not making stuff up about World War Two, I'm arguing about a ridiculous analogy to not backing up romance novels being akin to book burning. Apparently, not wanting to back up romance novels and cookbooks makes one a Holocaust denier. Go Hacker News.
        Mr. Godwin? Are you around?
        [-]
        wizzwizz4 440 days ago
        The Nazis did destroy (or order destroyed) books quietly and privately, but large collections are easier to destroy in place. The more famous Nazi book burnings occurred prior to World War Two.
        When I said "that's Holocaust denial", I wasn't saying "you're a Nazi", but pointing out an accidental mistake. If I thought it were deliberate, I wouldn't have said anything.
        Your argument about the symbolic meaning of public book burnings is irrelevant (and I'm not sure why you brought it up), but there very much is a parallel between calling for the destruction-by-bitrot of books, and calling for the destruction-by-fire of books. (Again, I am not calling you a Nazi.)
        [-]
        Over2Chars 440 days ago
        Like I pointed out, the original reply linked my suggestion that 41% of books weren't worth backing up to "book burning".
        So, in the spirit of rebuttals to random nonsense, I pointed out that there was no link at all to my comment and book burnings, but... I repeat myself.
        Suggesting that backing up garbage books is not worthwhile, is apparently analogous to wanting all garbage you selectively oppose to be burned, in public.
        So, by extension, the Nazis can be condemned for not only burning books, but refusing to make backup copies of them.
        I hope that clears this up.
- simpaticoder 442 days ago
  Sturgeon's Law (https://en.wikipedia.org/wiki/Sturgeon%27s_law) states "90% of everything is crap" so you're not too far off.
  [-]
  - lyu07282 442 days ago
    This thread should've really summoned Jason Scott, I remember him casually mentioning that he has a backup of every single 4chan post ever made (99% crap in that case, but probably invaluable for future generations of sociologists/historians who want to piece back together where it all went wrong).
    [-]
    - Over2Chars 441 days ago
      Hmm, maybe we need some "Over2Chars' Future Value Law of Garbage" that says for any pile of garbage there is someone who thinks it will be extremely valuable to someone someday.
      How plausible the argument to the value is depends on the eloquence of the person under the Law.
      This is the logic of hoarders everywhere.
  - Over2Chars 441 days ago
    I agree! 90% of all books shouldn't be backed up.
- flir 442 days ago
  Everyone's 41% is different. Long tail, innit.
  [-]
  - Over2Chars 442 days ago
    Gosh.
    [-]
    - flir 442 days ago
      You mention the example of romance novels above.
      There's a schlocky Victorian pulp novel that's of no use to anyone - except that it happens to contain a fantastically detailed description of an abandoned saltings in my hometown that nobody ever thought to record in any way. For me, those two paragraphs are gold.
      If the novel hadn't been digitised as part of Google's Books Archive Project, I wouldn't have been able to find those two paragraphs. Digitisation not only creates backups, it enables completely new ways of interacting with those texts (eg Google's Ngram Viewer).
      [-]
      - Over2Chars 442 days ago
        Well I guess your one valuable paragraph that matters only to you justifies backing up millions (billions?) of human and soon to be AI generated books, because someone, somewhere, at some time will find a line or two valuable. Maybe.
        I retract my position, let's back up everything!
        [-]
        tiagod 442 days ago
        I think that's the case. IIRC The British Library has copies of all published material in the UK, including flyers and such.
        What seems banal and useless to you, might be extremely important for future historians, and to be honest, books are pretty compressible and storage is cheap.
        [-]
        lyu07282 442 days ago
        I think it's a law in almost all nations, in fact, that forces publishers to send a copy of everything they publish to a national archive like that (the US equivalent is the Library of Congress). If you bring up the topic of preservation, most people won't understand why, or will even be opposed to the idea, which goes to show that sometimes it's a good idea to ignore the ignorant public.
        https://en.wikipedia.org/wiki/Legal_deposit
        [-]
        Over2Chars 441 days ago
        A rule that dates back to when books were rare, expensive, and useful I suspect.
        Many books are just electronic garbage at this point, and backing them all up is like going to a landfill and saying "We should make another one, exactly like this one, in case this landfill proves to be valuable to someone, someday."
        It might be useful for LLM training to produce garbage. Although many say they already do a good job at that.
        [-]
        lyu07282 441 days ago
        I don't think you seriously suggest that there aren't books worth saving published even today, so the argument left over is who determines what is worth saving? The only reasonable answer to that question is: nobody.
        [-]
        Over2Chars 440 days ago
        I think that there are books published today - especially published today - that aren't worth "saving".
        I'd start with every single AI generated book that's said to be available on Amazon (300 or so iirc).
        And people can and do judge things all the time: Nobel prizes, juried contests, review boards, movies, music, and yes - even books! - as being worthwhile or garbage. Rotten Tomatoes, Nobel prize committees, and so on.
        So yeah, I think your answer is not the only reasonable one. And maybe 41% is way too low.
        Over2Chars 441 days ago
        41% of all future historians will be AI LLMs. Hell, I'd be surprised if there are any historians in the future at all, honestly.
        As the saying goes GIGO. In case you're not familiar with the term: https://en.wikipedia.org/wiki/Garbage_in%2C_garbage_out
        Let's back up ALL the garbage. Someday it will be mined as gold, by hypothetical and probably never to exist future historians.
        xp84 442 days ago
        Let’s say there are ten billion such marginally-useful books published by the end of the next few decades. Many epub books are like a couple MB. So 30 petabytes total. That’s something you could fit in one room. One rich guy could buy enough hard drives to do that today. Why not?
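        A quick back-of-envelope check of that figure, as a rough sketch only. The comment says "a couple MB" per book; the 3 MB value below is an assumption chosen because it reproduces the 30 petabyte total (2 MB would give 20 PB). The function name is made up for illustration, and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) are assumed.

```python
def estimate_storage_petabytes(num_books: int, mb_per_book: int) -> float:
    """Estimate total storage in decimal petabytes for a pile of ebooks."""
    bytes_per_mb = 10**6   # decimal megabyte (assumption)
    bytes_per_pb = 10**15  # decimal petabyte (assumption)
    total_bytes = num_books * mb_per_book * bytes_per_mb
    return total_bytes / bytes_per_pb


ten_billion = 10_000_000_000

# 3 MB per book is an assumed average, not a measured one.
print(estimate_storage_petabytes(ten_billion, 3))
# The same pile at 2 MB per book, for comparison.
print(estimate_storage_petabytes(ten_billion, 2))
```

        Either way the total stays in the tens of petabytes, so the "one room of hard drives" point does not hinge on the exact per-book size.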
        [-]
        Over2Chars 441 days ago
        The ability to do less than worthwhile things, at reasonable cost and effort, hardly implies the necessity or justification for it.
        There are lots of worthless things we can do, quite practically. Let's not do them and say we did.
sebstefan 442 days ago
>$10,000 bounty
>There is much to explore here, so we’re announcing a bounty for improving the visualization above. Unlike most of our bounties, this one is time-bound. You have to submit your open source code by 2025-01-31 (23:59 UTC).
>The best submission will get $6,000, second place is $3,000, and third place is $1,000.
>All bounties will be awarded using Monero (XMR).
? Why are they using crypto, and, weirdly enough, specifically the crypto people use for buying drugs, to award this?
Is it some kind of scam?
[-]
- yawndex 442 days ago
  Because the efforts of Anna's Archive are unfortunately currently very much illegal, and XMR is one of the few cryptocurrencies that can actually offer some privacy to its users.
  [-]
  - sebstefan 442 days ago
    I've used XMR before. Just surprised to see it used to pay for legitimate & harmless visualization work.
    I see, that makes sense
    [-]
    - aprilnya 442 days ago
      So what you’re saying is you think XMR is just for buying drugs, and you’re also saying you’ve used XMR before.
      Hmmmmmm
      /s
- fear-anger-hate 442 days ago
  They use Monero because what they are doing (copyright infringement) will get you into big trouble anywhere in the western world. Without cryptocurrencies much of the modern large-scale archival efforts wouldn't be possible, or at the very least would significantly increase risks for the people participating in it. For me this alone is a good enough reason to admit that there are valid reasons for the existence of privacy coins.
  The harm they may cause in the short term via tax avoidance or being used to buy drugs is minimal, but the possibility that because of them archivists are able to fund servers for data that future historians wouldn't have otherwise been able to get their hands on? Priceless.
- Klaus23 442 days ago
  Because it is a book download site, which is illegal in every country that has copyright, and revealing one's identity with a bank transfer would be a stupid way to go to jail.
- akimbostrawman 442 days ago
  >Why are they using crypto, and, weirdly enough, specifically the crypto people use for buying drugs, to award this?
  You really have to ask why an illegal/grey site is using currency that is built to protect privacy and anonymity?
  is this some kind of sarcasm?
- friend_Fernando 442 days ago
  [flagged]
  [-]
  - thomasingalls 442 days ago
    Major efforts at creating "everything" libraries are usually looked upon as a positive effort that benefits all of humanity, and we generally mourn the loss of any such effort, regardless of whether the effort is against the laws of the state at the time the effort was undertaken, or even if the collection was created in a morally reprehensible way.
    See: Library of Alexandria, Library of Congress, GenBank, the Svalbard seed vault, Google Books, Internet Archive and all its efforts, ...the Louvre, and most major museums.
    In general, we collectively recognize - without having to be told - that preservation of knowledge is a noble and worthy effort that transcends the fleeting whims of a population at a point in time.
    All that to say, people probably don't need to be tricked into liking such efforts. They're popular because of what they are.
    [-]
    - friend_Fernando 442 days ago
      No one is objecting to knowledge preservation - when you just preserve it instead of abundantly replicating it with a wink.
      Reasonable people are objecting to copyright law violation, for the simple reason that it disincentivizes further knowledge creation.
      Even more reasonable people are objecting to weaponizing copyright law violation on behalf of the vilest dictatorship on the planet.
      [-]
      - myrmidon 442 days ago
        > Reasonable people are objecting to copyright law violation, for the simple reason that it disincentivizes further knowledge creation.
        Do you honestly believe that our current copyright framework is mainly aligned at maximizing incentives for knowledge creation?
        This sounds absurd to me. From my point of view, the copyright framework has been shaped (by continuous lobbying efforts) into a system to maximize extraction of profits from existing IPs.
        That is very different from incentivizing "knowledge creation", because the lion's share of income is spent on overhead or distributed to shareholders, with the "knowledge creator" (i.e. author) getting <20% of each sale. Furthermore, the mechanisms to balance income are ALSO abysmal (to maximize knowledge creation incentives, it would be necessary to "overspend" significantly on "young" writers, enabling them to feed themselves at the start of their careers).
        > weaponizing copyright law violation on behalf of the vilest dictatorship on the planet.
        How is Anna's Archive weaponizing copyright violation? How is it furthering Putin's interests?
      - greenie_beans 442 days ago
        > Reasonable people are objecting to copyright law violation, for the simple reason that it disincentivizes further knowledge creation.
        how?
  - akimbostrawman 442 days ago
    what a bunch of nonsense