Backing Up Spotify

(annas-archive.li)

230 points | by vitplister 2 hours ago

21 comments

  • Etheryte 1 hour ago
    To put this into perspective, What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth. What had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.

    [0] https://en.wikipedia.org/wiki/What.CD

    • flxy 2 minutes ago
      I think what earned what.cd that title wasn't necessarily just the amount but the quality, as you mentioned, as well as the obscurity of a lot of the offered material. I remember finding an early EP of an unknown local band on there, and I live in the middle of nowhere in Europe. There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown. It was the equivalent of vinyl crate digging without physical restrictions.

      Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.

    • VanTheBrand 26 minutes ago
      True, but What.cd had a tremendous amount of notable music not available on Spotify, because it was also sourced from CDs, bootlegs, vinyl, tape, etc., whereas Spotify only includes music explicitly licensed for streaming.
      • Etheryte 22 minutes ago
        This is true and a category of music that got hit notably hard was live recordings. What had a wide array of live recordings made by sound engineers straight from the mixer. This is something that you simply cannot find now unless you maybe know a guy.
        • qingcharles 9 minutes ago
          That's why I use YouTube Music as my streamer as they allow damned near anyone to upload any old rare record and then figure out the royalties somehow.
  • crazygringo 1 hour ago
    This is insane.

    I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

    The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

    But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

    Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?

    • Aurornis 1 hour ago
      > The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

      I wouldn’t be so sure. There are already tools that automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.

      > Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.

      The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.

      • crazygringo 1 hour ago
        > The Anna’s archive group is ideologically motivated.

        Very interesting, thank you. So using this for AI will just be a side effect.

        And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.

        • nutjob2 37 minutes ago
          I think the people seeding these are also ideologues and so would be interested in also supporting the obscure stuff, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.

          Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.

      • 5- 1 hour ago
        > The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.

        https://en.wikipedia.org/wiki/Useful_idiot

        • ronsor 16 minutes ago
          They know about AI companies and don't mind AI companies, but they're not doing it because AI companies.
    • VanTheBrand 25 minutes ago
      The metadata is arguably more useful than the music files themselves.
    • thaumasiotes 5 minutes ago
      > I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

      Do they have DRM at all? Youtube and Pandora don't.

    • IshKebab 10 minutes ago
      I dunno, if they publish like a 10 TB torrent of the most popular music I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs, which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.

      It's probably going to make the AI music generation problem worse anyway...
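
      The 10 TB and 3 million songs figures above can be sanity-checked with a little arithmetic. This is only a rough sketch: decimal units (1 TB = 10**12 bytes) and a 3.5-minute average track are assumptions, not numbers from the thread.

```python
# Back-of-the-envelope check of "10 TB is about 3 million songs".
# Assumptions (not from the thread): decimal terabytes and a
# 210-second (3.5-minute) average track length.

TB = 10**12  # bytes in one decimal terabyte


def avg_song_bytes(total_bytes, song_count):
    """Average size of one song if the songs exactly fill the disk."""
    return total_bytes / song_count


def implied_kbps(song_bytes, seconds):
    """Average bitrate in kbit/s implied by a file size and duration."""
    return song_bytes * 8 / seconds / 1000


per_song = avg_song_bytes(10 * TB, 3_000_000)
per_song_mb = per_song / 10**6
bitrate = implied_kbps(per_song, 210)

print(f"{per_song_mb:.2f} MB per song, about {bitrate:.0f} kbps")
```

      Under those assumptions each song gets a bit over 3 MB, which is in the range of ordinary lossy-compressed audio, so the estimate looks plausible.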

    • basisword 1 hour ago
      >> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

      Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.

      • VanTheBrand 21 minutes ago
        They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook's annual revenue is about twice that of the entire global recording industry. The strategy these companies took was probably correct but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief that they never will have to.
  • ikamm 4 minutes ago
    I really don't understand how focusing on source quality files is supposed to be a "major issue" with the music preservation community. It's bizarre for them to talk about these being barriers for creating a "full archive of all music that humanity has ever produced" and then have their answer be scraping Spotify to end up with a music library comprised of many AI and bulk produced songs at 75/160kbps.
  • bob1029 9 minutes ago
    I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.

    There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.

    Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?

  • WD-42 1 hour ago
    Incredible.

    > A while ago, we discovered a way to scrape Spotify at scale.

    They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!

    • bmikaili 52 minutes ago
      they're probably just using something like https://github.com/nor-dee/spotizerr-spotify
      • WD-42 36 minutes ago
        No way, that would take far too long.
      • bigyabai 32 minutes ago
        Probably not, those tools don't actually download Spotify tracks at source quality.
        • sunaookami 29 minutes ago
          There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.
  • yellow_lead 46 minutes ago
    Is the music torrent not up yet? Only see the metadata one here: https://annas-archive.li/torrents/spotify
    • artninja1988 45 minutes ago
      Yeah, in the article they write:

      The data will be released in different stages on our Torrents page:

      [X] Metadata (Dec 2025)

      [ ] Music files (releasing in order of popularity)

      [ ] Additional file metadata (torrent paths and checksums)

      [ ] Album art

      [ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)

      • yellow_lead 43 minutes ago
        Oh I see, thanks! I missed that
  • syntaxing 1 hour ago
    Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.
  • yegle 56 minutes ago
    Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, and selectively download the relevant part of the torrent in response to on-demand streaming requests from the client.
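
    A minimal sketch of the selective-download idea, assuming a classic BitTorrent layout where the files are laid end to end and cut into fixed-size pieces. The function name, offsets, and piece size below are hypothetical; a real server would read them from the torrent's metadata and hand the piece indices to a torrent client.

```python
def pieces_for_range(file_offset, range_start, range_end, piece_length):
    """Return the piece indices covering bytes [range_start, range_end)
    of a file that begins at file_offset in the torrent's payload.

    All arguments are in bytes. Names and layout are illustrative only.
    """
    if range_end <= range_start:
        return []
    first = (file_offset + range_start) // piece_length
    last = (file_offset + range_end - 1) // piece_length
    return list(range(first, last + 1))


# Hypothetical example: 4 MiB pieces, a track that starts 10,000,000 bytes
# into the torrent, and a client asking for the first 5,000,000 bytes of it.
PIECE_LENGTH = 4 * 1024 * 1024
needed = pieces_for_range(10_000_000, 0, 5_000_000, PIECE_LENGTH)
print(needed)
```

    Only those pieces would need to be fetched before playback could start, rather than the whole multi-terabyte torrent.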
  • krick 7 minutes ago
    Uh, cool, I guess? I want to applaud that, but, first off, unless you are OpenAI or Facebook, it is not exactly plausibly easy to participate in the festivities. Even if I had spare 300 TB laying around, how the fuck do I download that?

    But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.

  • frereubu 1 hour ago
    Site is down for me. Archive link: https://archive.is/jf3HW
    • mawax 1 hour ago
      Probably not down, but blocked by your ISP. Try a VPN. Same thing happens here.
    • ipsum2 1 hour ago
      Ironic. But it's working for me.
  • xnx 1 hour ago
    Merry Christmas!
  • Fizzadar 1 hour ago
    I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.
  • throwaway613745 1 hour ago
    I wonder how deep the hole they're gonna put whoever runs this site into is gonna be?
  • lelouch9099 1 hour ago
    How legal is this with regards to copyright laws?
    • luke-stanley 0 minutes ago
      Currently it says they have released metadata and album art. Is archiving and sharing the textual track metadata alone (no images, no audio) legal in the US, or Europe? By what basis?
    • Aurornis 1 hour ago
      Not legal. This group does not concern themselves with copyright law.
    • toomuchtodo 1 hour ago
      Adherence to the legal framework is a function of your risk appetite.
    • phainopepla2 1 hour ago
      Not legal
    • ronsor 43 minutes ago
      Very, if we delete copyright like we're supposed to.
    • basisword 1 hour ago
      It's not. It's awful people justifying awful behaviour. And it's why we can't have nice things. There are always assholes ready to exploit others.
      • conception 0 minutes ago
        Are you talking about Spotify here…?
      • nemomarx 1 hour ago
        There's some irony here considering Spotify used pirated mp3s at the start of their operations, I suppose.
      • jopicornell 34 minutes ago
        Monopoly is not a nice thing. Maybe it is convenient, but not nice.

        People who give money to artists are the ones going to concerts and buying music directly from artists. Spotify gives cents to artists, incentivizing awful behaviour (AI music, aggressive marketing, low effort art...).

      • poly2it 40 minutes ago
        Some people's urges to destroy all traces of human civilisation astonish me. What do you think Spotify is going to do with all its music when it ceases to exist in however many years? No, we must collectively feed Daniel Ek the Hungry.
  • ipsum2 1 hour ago
    Can someone explain why C#/Db (major/minor) is the third most popular key? Very unexpected for me, since it's relatively more difficult to play.
    • ghostie_plz 32 minutes ago
      Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.

      Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
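
      As a rough check of the black-key idea, the sketch below lists which scale degrees of Db major and C# natural minor land on white keys. It assumes 12-tone pitch classes with C = 0 and uses the natural minor scale; the helper name is made up for illustration.

```python
# Pitch classes with C = 0; the black keys are C#, D#, F#, G#, A#.
BLACK_KEYS = {1, 3, 6, 8, 10}

# Semitone offsets from the tonic for each scale degree.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]


def white_key_degrees(tonic, steps):
    """Return the 1-based scale degrees that fall on white keys."""
    return [
        degree + 1
        for degree, step in enumerate(steps)
        if (tonic + step) % 12 not in BLACK_KEYS
    ]


C_SHARP = 1  # same pitch class as Db

db_major_white = white_key_degrees(C_SHARP, MAJOR_STEPS)
cs_minor_white = white_key_degrees(C_SHARP, NATURAL_MINOR_STEPS)
print("Db major, white-key degrees:", db_major_white)
print("C# minor, white-key degrees:", cs_minor_white)
```

      Under these assumptions Db major puts five of its seven degrees on black keys (the 3rd and 7th are white), while C# natural minor puts four on black keys (the 3rd, 6th and 7th are white).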

    • kzrdude 53 minutes ago
      Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.
    • klysm 53 minutes ago
      Difficult to play in what instrument?
  • 827a 20 minutes ago
    Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.
    • rightbyte 17 minutes ago
      Wasn't all data available to users though?
  • nutjob2 46 minutes ago
    I wonder how definitive their collection is and how much ripping Google Music/YouTube would improve on this.

    A distributed ripping project to do that would be a fine thing.

  • zoklet-enjoyer 1 hour ago
    Wow. Now I just need some hard drives and a way to download that without my ISP doing something about it. That's amazing.
  • basisword 1 hour ago
    Am I understanding this wrong? Ripping the metadata I'm fine with. But it sounds like they've ripped every song from Spotify and they're going to release them?

    Edit: It seems like they are. Stealing from tens of thousands of artists, big and small, and calling it "preservation" or "archiving" is scummy.

    • Nextgrid 1 hour ago
      Music piracy is already a thing, not to mention you don't even need to torrent nowadays when music is available for free on YouTube. Those who don't want to pay already don't pay so nothing changes there.

      The value of Spotify is the convenience, and this collection does not change that in any way. Your argument would apply if someone were to make a Spotify clone with the same UX using this data.

      • montag 12 minutes ago
        I don’t understand how the parent comment is downvoted yet this is not. “Stealing is ok because stealing is already a thing”… come on, now
    • unsungNovelty 3 minutes ago
      Spotify used pirated songs initially when they started it. So...
    • prmoustache 1 hour ago
      Stealing is not the correct word.
    • nutjob2 50 minutes ago
      Don't worry, they let Spotify keep the original files.
    • Slow_Hand 1 hour ago
      While I wouldn't call this scummy I do agree with your sentiment. It is technically stealing and those copyrights should be respected.

      Full disclosure, I am a career musician AND have been known to pirate material. That said, I think this is a valuable archive to build. There are a lot of recordings that will not endure without some kind of archiving. So while it's not a perfect solution, I do think it has an important role to play in preservation for future generations.

      Perhaps it's best to have a light barrier to entry. Something like "Yes, you can listen to these records, but it should be in the spirit of requesting the material for review, and not just as a no-pay alternative to listening on Spotify." Give it just enough friction where people would rather pay the $12/month to use a streaming service.

      Also, it's not like streaming services are a lucrative source of income for most artists. I expect the small amount of revenue lost to listeners of Anna's Archive is just (fractions of) a penny in the bucket of any income that a serious artist would stand to make.

      • IgorPartola 32 minutes ago
        > It is technically stealing

        It is technically not. Stealing means you have a thing, I steal it, now I have the thing and you do not. You can’t steal a copyright (aside from something like breaking into your stuff and stealing the proof that you hold the copyright), and when a song is downloaded the original copyright holder still has their copy.

        Calling piracy theft was MPAA/RIAA propaganda. Now people say that piracy is theft without ever even questioning it, so it was quite successful.

    • efilife 1 hour ago
      Why is this stealing? You can already listen to everything that's on Spotify with a free account. You are free to also record the audio while it's playing. I suppose grabbing the actual file shouldn't matter? Or is this about releasing? And robbing people of plays they would otherwise get through Spotify?
      • basisword 1 hour ago
        If you listen to something on Spotify with a free account the artists still get paid. This isn't a case where you're ripping off some mega-corp. You're ripping off thousands of artists from major label ones to tiny indies. Take the metadata and build something cool. Stealing the files and releasing them is something else entirely.
        • prmoustache 1 hour ago
          You can record what you play from Spotify and you are already free to play the record again and again and again without the artist being paid.

          Most people do not because they find it less convenient than paying 20bucks a month or whatever is the current price in 2025 but that doesn't change the reality.

          For most people the appeal of Spotify is not the music itself but the playlists that are shared thanks to its ubiquity. This is the reason other services struggle to make a dent even if they have better quality, UI and algos.

          Spotify started by disrupting the market using pirated music by the way so you are pretty much endorsing and encouraging piracy when "paying" your favorite artists through Spotify.

    • WD-42 1 hour ago
      Nobody is gonna download a 300TB torrent just to get the latest Taylor Swift album. There are much easier avenues than that.

      What’s actually scummy is Spotify paying artists $1 per 1000 streams.

      Buy CDs. Use Bandcamp.

      • basisword 1 hour ago
        How about we let the individual artists decide?
        • WD-42 34 minutes ago
          In most cases, they couldn't make that decision even if they wanted to. Only independent artists and those that are so large as to have enough sway (Neil Young for example) would be able to. The vast majority of artists you probably listen to don't actually own the rights to their own music.

          So let the rights holders make the decision? They would never. Music rights exist for them to extract profit above all else. They don't care about preserving culture or legacy. Which is why it's important that somebody does.

    • klabb3 48 minutes ago
      The people I know who go through the trouble of pirating and downloading vast libraries of music are all musicians themselves, or at the very least total music nerds. They don’t want to lose access to their stuff, plus if they ever need to import audio into a DAW, DRM is a no-go. They are the same people who spend large amounts of money on vinyls, and support smaller independent artists through concerts, merch and (back in the day) CDs.

      It used to be more mixed, but today, piracy is often the only option to ”own” any media at all.

      • temp0826 33 minutes ago
        The musicians I know are the most inclined to actually pay for music (NOT through Spotify) and buy merch.
  • artninja1988 1 hour ago
    Wow. Anna is a godsend. Hopefully now we get some really good open source music models
  • vlaaad 26 minutes ago
    Unrelated, but I just can't stop myself from saying that I absolutely hate Spotify even though I'm a paying customer. Fuck you Spotify. You were supposed to be a convenient way to discover and listen to music. Now you are only convenient for listening to music, and absolutely terrible for any recommendations. This is sad really. Spotify had good recommendations. It's absolutely in a position where it can provide good recommendations — it has both a vast music library and a vast amount of data on user preferences. And it chooses to push procedural/ai-generated slop instead to earn more money. I thought that maybe buying $SPOT stock will make me more at peace with its greed, but it didn't work. Spotify fucking deserves to crash and burn because it sees paying customers as idiots who might not notice they are fed garbage. Fuck you Spotify, fuck you.
    • eastbound 16 minutes ago
      This is more frequent than you would assume. I’ve neither subscribed to Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.

      Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.