Yeah, those clusters are interesting. They stand out, so they were the first thing I zoomed in on, and then I realized they're all just stock resume sites. You quickly learn that the clusters are something to avoid, which turns out to make this an effective visualization method.
The thing I find interesting is where the grouping is robust to colour variations: one of the bigger groups is around 25% from the left and 20% from the bottom, all one theme but in a wide variety of colours.
I’m curious how the choice of which blog is located next to which was made. The writeup mentions “dimensionality”. I found my blog, and the eight surrounding it are interesting people, but every one of them is an AI researcher with degrees from Berkeley or similar, and the sites are predominantly CVs.
Luminous company but not my level, nor is my blog about AI, nor is it a CV. I can’t see any reason for the location.
I think it is literally by the colors of the screenshots. Nothing to do with the contents.
> I just want to encode the high level aesthetic details of webpage screenshots. Because of this, I fell back on an old friend: the triplet loss on top of a small encoder. The resulting output dimension of 64 afforded ample room for describing the visual range while maintaining a considerably smaller footprint.
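For anyone wondering what the triplet loss in that quote actually does, here's a minimal sketch in plain Python. To be clear, this is not the author's code: the Euclidean distance, the margin of 1.0, and the function names are my own assumptions, and the encoder that would produce the 64-dimensional embeddings is left out entirely. The idea is just that an anchor screenshot should land closer to a similar-looking one (the positive) than to a dissimilar one (the negative), by at least the margin.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin value is an assumption).

    Zero when the positive is closer to the anchor than the negative
    by at least `margin`; otherwise grows with the size of the violation.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 64-dimensional embeddings standing in for encoder outputs.
anchor = [0.0] * 64
similar = [0.1] * 64     # visually close to the anchor
different = [1.0] * 64   # visually far from the anchor

# Well-separated triplet: the constraint is satisfied, so the loss is 0.0.
print(triplet_loss(anchor, similar, different))

# Swapping positive and negative violates the constraint, so the loss is positive.
print(triplet_loss(anchor, different, similar))
```

If the map really is laid out from embeddings trained this way, it would explain why neighbours share a look rather than a topic: the loss only ever sees how the screenshots compare visually.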
I started by finding my own blog and scrolling north, south, east, and west to see my neighbours. I’ve already found several interesting sites and a new person to follow on mastodon.
It’s a shame there doesn’t seem to be any way to link to a particular position on the map but great stuff nevertheless.
That's a lot of fun to explore. I'm not entirely convinced by the "you can judge a book by its cover" thing, there are so many "Hi, I'm _____" pages that might have content or might just be portfolio stubs.
[0]: https://onemillionscreenshots.com/idiallo.com/screenshot
Some of them are due to many people using the same theme.
Some of them are expired or parked domains, which I reckon should be detected and excluded.
Teeming masses of sites using what probably seems to their authors like a fresh, unconventional look but ends up being Yet Another.
Recently went back [0] to the open web and feel like this inclusion alone justified that move.
Thanks for sharing. Humble and heart-warming way to end 2025 for an old Internet man.
[0]: https://frankycaron.medium.com/of-an-open-web-rebirth-and-bi...
Here’s another Christmassy alternative: https://display.archive.org/xmas
I’m one of the makers of OneMillionScreenshots.com and I’m currently working on an update to it.
Timeline: view older versions
Clock: view light/dark mode theme according to user time zone (or enable dark/light mode manually)
I'm also a bit curious, since most web pages are predominantly white, how many of them are adapted to dark mode?
My forum isn't, though. With a post every day or so and nearly 50 active users, it's probably not "small web" any more :-D
My website [1] gets perhaps as many as 200 visitors a week according to Cloudflare. And it's still there at number 399322 (first half of the pack).
[1] https://onemillionscreenshots.com/dmitriid.com
this is one of the coolest blogs i have ever read!
Nice catch, fixing :D