A few words on DS4

(antirez.com)

53 points | by caust1c 1 hour ago

4 comments

  • kamranjon 15 minutes ago
    Just want to mention that I've been pulling down and using DwarfStar locally and it's incredible. I have it running on my personal MacBook M4 Max with 128 GB of RAM, and I'm running the server and sharing it over Tailscale with my work laptop, where I just have pi running.

    The long-context reasoning is something I haven't even seen in frontier models: I was at 124k tokens of context earlier and it was still just buzzing along with no issues or fatigue.

    I am amazed at how well it works. I'm using it right now for some pretty complex frontend work, and for me it is much faster than running a dense 27B or 31B model like Qwen or Gemma (the benefit of MoE), but the long-context capabilities are what have absolutely floored me.

    Super excited about this project, and I hope antirez can keep himself from burning out. I've been following the repo pretty closely, and there are a ton of PRs flooding in; it seems like he's had to filter out a lot of slop code.
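On the MoE speed point: a common rule of thumb is that local token generation is memory-bandwidth bound, so decode speed scales roughly with how many weight bytes must be read per token, and an MoE model only reads the weights of its active experts. The sketch below is a back-of-the-envelope estimate under that assumption. The bandwidth, the 4-bit quantization, and the "10B active parameters" figure are hypothetical placeholders, not DwarfStar's actual numbers, and the estimate ignores KV-cache reads (which grow with context) and compute.

```python
# Back-of-the-envelope, memory-bandwidth-bound decode estimate.
# Assumption: each generated token reads every *active* weight once,
# so tokens/s <= bandwidth / bytes_read_per_token.
# All numbers are hypothetical, not measured DwarfStar figures.

def decode_upper_bound_tps(bandwidth_gb_s: float,
                           active_params_billion: float,
                           bytes_per_param: float) -> float:
    """Upper bound on generation tokens/s for a bandwidth-bound model."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH = 500.0  # GB/s, hypothetical unified-memory bandwidth
Q4 = 0.5           # bytes per parameter at ~4-bit quantization (no overhead)

# A dense model reads all of its parameters for every token...
dense_31b = decode_upper_bound_tps(BANDWIDTH, 31.0, Q4)
# ...while a hypothetical MoE only reads the experts the router selects.
moe_10b_active = decode_upper_bound_tps(BANDWIDTH, 10.0, Q4)

print(f"dense 31B:       {dense_31b:.1f} t/s (upper bound)")
print(f"MoE, 10B active: {moe_10b_active:.1f} t/s (upper bound)")
```

Real throughput lands well below these ceilings, but the ratio between the two is the intuition behind "the benefit of MoE".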

    • le-mark 6 minutes ago
      Is DS4 DwarfStar 4 or DeepSeek 4?
      • kamranjon 5 minutes ago
        Just updated! Sorry, I meant DwarfStar. It's the only way I've actually managed to run DeepSeek Flash on my local hardware.
      • wolttam 5 minutes ago
        DwarfStar 4 is DeepSeek 4 (check the repo)
  • simonw 32 minutes ago
    I got this running on a 128GB M5 the other day - pretty painless, model runs in about 80GB of RAM and it seemed to be very capable at writing code and tool execution.
    • perfmode 21 minutes ago
      How’s the token throughput / response time?
      • simonw 18 minutes ago
        Healthy!

          prefill: 30.91 t/s, generation: 29.58 t/s
        
        From https://gist.github.com/simonw/31127f9025845c4c9b10c3e0d8612...
        • xienze 8 minutes ago
          I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens in context, you're sitting there for 5+ minutes before the first token is generated.
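The arithmetic behind the "5+ minutes" figure: time to first token is roughly prompt tokens divided by prefill rate. This minimal sketch assumes a constant prefill rate equal to the 30.91 t/s from the quoted run; in practice prefill speed usually varies with context length, so treat the results as rough.

```python
# Time to first token ~= prompt_tokens / prefill_rate.
# Assumption: the prefill rate stays constant as the prompt grows.

def time_to_first_token_s(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent on prefill before the first generated token."""
    return prompt_tokens / prefill_tps


PREFILL_TPS = 30.91  # prefill rate from the quoted run

for tokens in (10_000, 32_000, 100_000):
    secs = time_to_first_token_s(tokens, PREFILL_TPS)
    print(f"{tokens:>7,} prompt tokens -> {secs / 60:5.1f} min before the first token")
```

At 10k tokens this comes out to a little over five minutes, which matches the estimate in the comment.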
          • aiscoming 6 minutes ago
            If it's just the coding agent's system prompt and tools, you can cache that.
            • xienze 1 minute ago
              Yeah the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
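A rough model of this exchange: in an agent loop, each request appends new tokens (tool results, file reads) to the context. The sketch compares three hypothetical server behaviours: no caching, caching only the system prompt and tool definitions, and incremental reuse of the previous request's KV cache so only newly appended tokens are prefilled. Whether DwarfStar's server reuses its cache this way is not stated in the thread; the modes, session sizes, and function name are all illustrative assumptions.

```python
# Hypothetical agent session: how many tokens must be prefilled per request?
# Both cached modes assume the system prompt was already prefilled once.

def prefill_tokens_per_request(system_tokens: int, turn_tokens: list[int],
                               mode: str) -> list[int]:
    """Tokens prefilled for each request under a hypothetical caching mode.

    "none":        re-prefill the whole context on every request
    "system":      system prompt + tool defs cached; the rest is re-prefilled
    "incremental": previous request's KV cache reused; only new tokens prefilled
    """
    per_request = []
    context = system_tokens
    for new in turn_tokens:
        context += new  # tool results, file reads, etc. appended this turn
        if mode == "none":
            per_request.append(context)
        elif mode == "system":
            per_request.append(context - system_tokens)
        elif mode == "incremental":
            per_request.append(new)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return per_request


SYSTEM = 8_000                               # hypothetical prompt + tool defs
TURNS = [2_000, 6_000, 3_000, 9_000, 4_000]  # hypothetical tokens added per turn
PREFILL_TPS = 30.91                          # prefill rate from the quoted run

for mode in ("none", "system", "incremental"):
    tokens = sum(prefill_tokens_per_request(SYSTEM, TURNS, mode))
    minutes = tokens / PREFILL_TPS / 60
    print(f"{mode:>11}: {tokens:>7,} tokens, ~{minutes:.0f} min of prefill")
```

Caching helps a lot, but even the incremental case still has to prefill every tool result and file read, which is the point being made here: at roughly 31 t/s those appended tokens alone add up to minutes over a session.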
  • bjconlan 58 minutes ago
    This is great! I feel the same way about the DeepSeek V4 architecture for commodity hardware.

    I've also enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (though it's early days for me in understanding this space).

    • kamranjon 1 minute ago
      Very cool! I had no idea that HF was doing this - I really love their small model experiments.