A few words on DS4

(antirez.com)

53 points | by caust1c 1 hour ago

4 comments

kamranjon 15 minutes ago
Just want to mention that I've been pulling down and using DwarfStar locally and it's incredible. I have it running on my personal MacBook M4 Max with 128GB of RAM, and I'm running the server to share it through Tailscale with my work laptop, where I just have pi running.
The long-context reasoning is something I haven't seen even in frontier models: I was at 124k tokens earlier and it was still buzzing along with no issues or fatigue.
I am amazed at how well it works. I'm using it right now for some pretty complex frontend work, and for me it is much, much faster than running, for example, a dense 27B or 31B model (like Qwen or Gemma); that's the benefit of MoE. But the long-context capabilities are what have really floored me.
Super excited about this project, and I hope antirez can keep himself from burning out. I've been following the repo pretty closely; there are a ton of PRs flooding in, and it seems like he's had to filter out a lot of slop code.
- le-mark 6 minutes ago
  Is DS4 Dwarf Star 4 or DeepSeek 4?
  - kamranjon 5 minutes ago
    Just updated! Sorry, I meant DwarfStar; it's the only way I've actually managed to run DeepSeek Flash on my local hardware.
  - wolttam 5 minutes ago
    DwarfStar 4 is DeepSeek 4 (check the repo)
simonw 32 minutes ago
I got this running on a 128GB M5 the other day. It was pretty painless: the model runs in about 80GB of RAM, and it seemed very capable at writing code and executing tools.
- perfmode 21 minutes ago
  How’s the token throughput / response time?
  - simonw 18 minutes ago
    Healthy!
```
  prefill: 30.91 t/s, generation: 29.58 t/s
```
    From https://gist.github.com/simonw/31127f9025845c4c9b10c3e0d8612...
    - xienze 8 minutes ago
      I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic setting. A mere 10k tokens of context and you're sitting there for 5+ minutes before the first token is generated.
      - aiscoming 6 minutes ago
        If it's just the coding agent's system prompt and tool definitions, you can cache that.
        - xienze 1 minute ago
          Yeah, the problem is that's just the start of the context. There are, you know, all the tool call results and file reads and stuff.
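The prefill arithmetic in this subthread can be checked with a back-of-the-envelope sketch. It rests on two simplifying assumptions. First, time to first token is treated as the number of uncached prompt tokens divided by the measured prefill rate, with the rate held constant (real rates often drop as context grows). Second, it assumes cached prefix tokens cost nothing to re-prefill. The function name `ttft_seconds` and the 4k-token and 2k-token sizes are hypothetical, chosen only for illustration. Only the 30.91 t/s figure comes from the gist quoted above.

```python
# Rough time-to-first-token (TTFT) estimate from a measured prefill rate.
# Assumption: prefill throughput is constant regardless of context length,
# which is optimistic; real rates often fall as the context grows.

PREFILL_TPS = 30.91  # prefill rate quoted from the gist above (tokens/s)


def ttft_seconds(context_tokens: int, cached_prefix_tokens: int = 0,
                 prefill_tps: float = PREFILL_TPS) -> float:
    """Seconds spent on prefill before the first output token appears.

    Assumption: tokens covered by a cached prefix (e.g. the agent's system
    prompt and tool definitions) are reused and need no new prefill.
    """
    if not 0 <= cached_prefix_tokens <= context_tokens:
        raise ValueError("cached prefix must be between 0 and the context size")
    uncached = context_tokens - cached_prefix_tokens
    return uncached / prefill_tps


if __name__ == "__main__":
    # xienze's example: 10k tokens of context, nothing cached.
    cold = ttft_seconds(10_000)
    print(f"10k, no cache:    {cold:6.1f} s ({cold / 60:.1f} min)")

    # aiscoming's point: cache the fixed prefix.
    # The 4k-token prefix size is hypothetical.
    warm = ttft_seconds(10_000, cached_prefix_tokens=4_000)
    print(f"10k, 4k cached:   {warm:6.1f} s ({warm / 60:.1f} min)")

    # xienze's follow-up: new tool results and file reads still need prefill.
    # A hypothetical 2k-token tool result on top of an otherwise cached turn:
    print(f"+2k tool result:  {ttft_seconds(2_000):6.1f} s")
```

With nothing cached, 10,000 tokens at 30.91 t/s is about 323 seconds (roughly 5.4 minutes), in line with xienze's estimate. Caching only removes the fixed prefix. Every new tool result or file read is still prefilled at the same rate, which is the substance of the follow-up reply.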
bjconlan 58 minutes ago
This is great! I feel the same way about the DeepSeek V4 architecture for commodity hardware.
I've also enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (though it's early days for me in understanding this space).
- kamranjon 1 minute ago
  Very cool! I had no idea HF was doing this; I really love their small-model experiments.