Open sourcing Dicer: Databricks's auto-sharder

(databricks.com)

85 points | by vivek-jain 25 days ago

6 comments

charleshn 25 days ago
> Application pods learn the current assignment through a library called the Slicelet (S for server side). The Slicelet maintains a local cache of the latest assignment by fetching it from the Dicer service and watching for updates. When it receives an updated assignment, the Slicelet notifies the application via a listener API.
For a critical control plane component like this, I tend to prefer a constant work pattern [0], to avoid metastable failures [1], e.g. periodically pull the data instead of relying on notifications.
[0] https://aws.amazon.com/builders-library/reliability-and-cons...
[1] https://brooker.co.za/blog/2021/05/24/metastable.html
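To make the constant-work idea concrete, here is a minimal Python sketch. It is not Dicer's API: `ConstantWorkPoller`, the `fetch` callable, and the dict-shaped assignment are all hypothetical names made up for illustration. The point is only that every tick does the same work (fetch the full assignment) whether or not anything changed, and a failed fetch leaves the last good assignment in place.

```python
import threading
from typing import Callable, Dict, Optional

# Hypothetical shape: key range -> pod name. Not Dicer's real data model.
Assignment = Dict[str, str]


class ConstantWorkPoller:
    """Pulls the full assignment on a fixed interval, changed or not."""

    def __init__(self, fetch: Callable[[], Assignment], interval_s: float = 1.0):
        self._fetch = fetch          # hypothetical call to the assignment service
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._current: Optional[Assignment] = None
        self._stop = threading.Event()
        self.polls = 0               # number of poll attempts, successful or not

    def poll_once(self) -> None:
        # Same work every tick: one full fetch, no dependence on notifications.
        try:
            latest: Optional[Assignment] = self._fetch()
        except Exception:
            latest = None            # on failure, keep serving the last good copy
        with self._lock:
            self.polls += 1
            if latest is not None:
                self._current = latest

    def current(self) -> Optional[Assignment]:
        with self._lock:
            return self._current

    def run(self) -> None:
        # Intended to run in a background thread until stop() is called.
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval_s)

    def stop(self) -> None:
        self._stop.set()
```

In use, you would start `run` in a daemon thread and have request handlers read `current()`. The load on the assignment service then depends only on the number of pods and the interval, not on how often assignments change.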
- jdellithorpe 24 days ago
  The Dicer Slicelet supports such a pattern; you can poll the assignment directly on the Slicelet:
  https://github.com/databricks/dicer/blob/master/dicer/extern...
  (btw the notification mechanism itself does not deliver the assignment to the application; it only notifies the application that the assignment has changed: https://github.com/databricks/dicer/blob/master/dicer/extern...)
khaki54 25 days ago
Seems weird to call it sharding since it's not sharding indexed datasets or anything like that. Is this just a tool to mitigate Databricks’ internal service-scaling challenges?
- atuladya 25 days ago
  Right - this is not about sharding data/datasets. This is for sharding in-memory state that a service might have. The problem of building services at low cost, high scale, low latency and high throughput is common in many environments including our services at Databricks, and Dicer helps with that.
ayf 25 days ago
Does anyone else have something similar?
What are some use cases where you have found it useful?
- louis-paul 25 days ago
  Sounds related to Google Slicer: https://research.google/pubs/slicer-auto-sharding-for-datace...
  - atuladya 25 days ago
    It is similar to Slicer in terms of the abstraction (I built Slicer at Google), but the architecture, implementation, and algorithms differ in many ways.
    - bigwheels 25 days ago
      Did you also work on Dicer at Databricks?
      - hiyer 24 days ago
        Yes, he did. I attended a talk he gave on it, which is how I know.
- WookieRushing 25 days ago
  These show up once you reach a scale where static sharding is either cost-inefficient or the hot spots are very dynamic. They also try to avoid added latency by being eventually consistent sidecars instead of proxies.
  I’ve seen them used for traffic routing, storage metadata systems, distributed caches, etc.
- vivek-jain 25 days ago
  Sharded in-memory caching turns out to be rather useful at scale :)
  Some of the key examples highlighted on our blog are Unity Catalog, which is essentially the metadata layer for Databricks, our Query Orchestration Engine, and our distributed remote cache. See the blog post for more!