We give every user SQL access to a shared ClickHouse cluster

(trigger.dev)

22 points | by eallam 3 days ago

6 comments

cjonas 17 minutes ago
We just create mini data "ponds" on the fly by copying tenant-isolated gold-tier data to Parquet in S3. User and agent queries are executed with DuckDB. We run this process when the user starts a session and generate an STS token scoped to their tenant bucket path. It's extremely simple and works well (at least with our data volumes).
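The commenter doesn't show how the STS token is scoped. One common approach, sketched here under assumptions, is an IAM session policy that limits the temporary credentials to the tenant's prefix. The bucket layout `s3://<bucket>/<tenant_id>/...` and the function name are hypothetical. The resulting JSON string is the kind of value you would pass as the `Policy` argument of an STS `AssumeRole` call; the effective permissions are the intersection of that policy and the role's own policy.

```python
import json


def tenant_session_policy(bucket: str, tenant_id: str) -> str:
    """Build an IAM session policy that limits reads to one tenant's prefix.

    Hypothetical layout: s3://<bucket>/<tenant_id>/ holds that tenant's
    Parquet files. Intended as the Policy argument of STS AssumeRole.
    """
    # Reject ids that could widen the resource pattern.
    if not tenant_id or any(ch in tenant_id for ch in "/*?"):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    prefix = f"{tenant_id}/"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects only under the tenant's prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                # Allow listing, but only keys starting with the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }
    return json.dumps(policy)
```

DuckDB would then read the tenant's files with those temporary credentials, and any path outside the prefix is denied by S3 itself rather than by the query layer.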
- Waterluvian 6 minutes ago
  Is that why it’s called DuckDB? Because data ponds?
zie 1 hour ago
We do the same thing: every employee can access our main financial/back-office SQL database, but we just use PostgreSQL with row-level security[0]. We never bothered to complicate it like the post does.
0: https://www.postgresql.org/docs/18/ddl-rowsecurity.html
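For readers wondering what PostgreSQL row-level security looks like when applied to tenant isolation (the question raised in the replies), here is a minimal sketch that generates the DDL. It assumes a text `tenant_id` column and a hypothetical custom setting `app.tenant_id` that the application sets per session; neither comes from the comment. If the setting is never set, `current_setting` raises an error, so the query fails closed.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def tenant_rls_ddl(table: str, column: str = "tenant_id") -> list[str]:
    """Return DDL enabling PostgreSQL row-level security on `table`.

    Assumes a text `column` holding the tenant id and that each session
    runs SET app.tenant_id = '...' (hypothetical setting name) first.
    """
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # Rows are visible only when their tenant matches the session setting.
        f"CREATE POLICY {table}_tenant_isolation ON {table} "
        f"USING ({column} = current_setting('app.tenant_id'))",
    ]
```

Note that superusers and roles with `BYPASSRLS` still skip these policies, which is part of why a single boundary worries some people below.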
- staticassertion 1 minute ago
  I'd be so uncomfortable with this. It sounds like you're placing the full burden of access on a single boundary. I mean, maybe there's more to it that you haven't spoken about here, but "everything rests on this one postgres feature" is an unacceptably unsafe state to me.
- orf 1 hour ago
  Back office, employee access is a completely different problem to what is described in the post.
  How do you enforce tenant isolation with that method, or prevent unbounded table reads?
  - weird-eye-issue 0 minutes ago
    RLS...
  - tossandthrow 56 minutes ago
    They likely don't need tenant isolation, and unbounded table reads can be mitigated using timeouts.
    We do something similar for our backoffice - just with the difference that it is Claude that has full freedom to write queries.
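In PostgreSQL the usual knob for this is `SET statement_timeout = '5s'`. To keep the example self-contained, the sketch below swaps in SQLite from the Python standard library: its progress handler is invoked periodically during execution, and returning a non-zero value interrupts the statement. The function name and the 1,000-instruction interval are illustrative choices, not anything from the thread.

```python
import sqlite3
import time


def run_with_deadline(conn: sqlite3.Connection, sql: str, seconds: float):
    """Run `sql`, aborting it if it runs longer than `seconds`.

    SQLite stand-in for a server-side statement timeout: the progress
    handler runs every 1,000 VM instructions, and a non-zero return
    value interrupts the statement with OperationalError.
    """
    deadline = time.monotonic() + seconds
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        # Remove the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 0)
```

A timeout bounds wall-clock cost but not how much data a query can touch before it fires, so row-read limits are a complementary control.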
jelder 44 minutes ago
We did this with MotherDuck, and without introducing a new language. Every tenant has their own isolated storage and compute, so it’s trivial to grant internal users access to specific tenants as needed. DuckDB’s SQL dialect is mostly just Postgres’ with some nice ergonomic additions and a host of extra functionality.
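The per-tenant model described here can be illustrated without MotherDuck itself. In the sketch below, SQLite files stand in for isolated per-tenant databases (a swapped-in technology), and the grant table, user names, and function name are all hypothetical. Because each tenant is a separate file, a connection cannot see another tenant's rows, and granting an internal user access reduces to a lookup.

```python
import sqlite3
from pathlib import Path

# Hypothetical grants: which internal users may open which tenants.
GRANTS: dict[str, set[str]] = {
    "support-alice": {"acme"},
    "analyst-bob": {"acme", "globex"},
}


def open_tenant_db(user: str, tenant: str, base_dir: Path) -> sqlite3.Connection:
    """Open one tenant's database file read-only if `user` has a grant for it."""
    if tenant not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} has no grant for tenant {tenant}")
    path = (base_dir / f"{tenant}.db").resolve()
    # mode=ro opens read-only and fails if the file does not exist.
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
```

The trade-off raised in the replies shows up directly: isolation is simple, but there is now one database per tenant to provision, migrate, and back up.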
- raw_anon_1111 40 minutes ago
  This is explicitly not the problem they are trying to solve. In a single-tenant database you don’t, by definition, have to worry about multi-tenancy.
  - DangitBobby 27 minutes ago
    I guess the question then becomes: what problem does a multi-tenancy setup solve that an isolated database setup doesn't? Are they really not solving the same problem from a user perspective, or is it only from their own engineering perspective? And how do those decisions ultimately impact the product they can surface to users?
    - raw_anon_1111 5 minutes ago
      Off the top of my head, managing 100 different database instances takes a lot more work from the business standpoint than managing 1 database with 100 users.
      The article also mentioned that they isolate by project_id. That implies one customer (assume a business) can isolate permissions more granularly.
senorrib 1 hour ago
Reasons 1-3 could very well be done with ClickHouse policies (RLS) and good data warehouse design. In fact, that’s more secure than a compiler adding a WHERE clause to a query run by an almighty user.
Reason 4 is probably an improvement, but could probably be done with CH functions.
The problem with custom DSLs like this is that they trade off a massive ecosystem for very little benefit.
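To make the row-policy alternative concrete, here is a sketch that generates a ClickHouse `CREATE ROW POLICY` statement restricting one database user to one project. It assumes, hypothetically, one ClickHouse user per tenant and a `project_id` String column; the policy naming scheme is also made up. The difference from a compiler-injected WHERE is that the server applies the filter for that user no matter what query text arrives.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def clickhouse_row_policy(database: str, table: str, user: str, project_id: str) -> str:
    """Return a CREATE ROW POLICY statement limiting `user` to one project.

    Assumes one ClickHouse user per tenant and a String `project_id` column.
    """
    for name in (database, table, user):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Escape backslashes and single quotes for a ClickHouse string literal.
    literal = project_id.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"CREATE ROW POLICY IF NOT EXISTS {table}_{user}_isolation "
        f"ON {database}.{table} FOR SELECT "
        f"USING project_id = '{literal}' TO {user}"
    )
```

This moves the boundary into the database, but it also means managing a user and a policy per tenant, which is part of the operational cost discussed above.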
- efromvt 47 minutes ago
  As long as you don't deviate too much from ANSI, I think the "light SQL DSL" approach has a lot of pros when you control the UX (so UIs, in particular, are fantastic for this approach, which is what they seem to be targeting with queries and dashboards). It's more of a product experience; tables are a terrible product surface to manage.
  Agreed with the ecosystem cons getting much heavier as you move outside the product surface area.
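One hypothetical shape of the "light DSL when you control the UX" idea: the dashboard UI emits a structured query spec, and a small compiler turns it into parameterized SQL with a mandatory project filter. This is not the post's language; `QuerySpec`, `compile_spec`, the allowlist, and the 10,000-row cap are all illustrative assumptions. Placeholders use the qmark style so the result runs directly on SQLite.

```python
import re
from dataclasses import dataclass, field

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


@dataclass
class QuerySpec:
    """Hypothetical structured query produced by a dashboard UI."""
    table: str
    columns: list[str]
    filters: dict[str, object] = field(default_factory=dict)
    limit: int = 100


def compile_spec(spec: QuerySpec, allowed_tables: set[str], project_id: str):
    """Compile a QuerySpec into (sql, params) with a forced project_id filter."""
    if spec.table not in allowed_tables:
        raise ValueError(f"unknown table: {spec.table}")
    if not spec.columns:
        raise ValueError("no columns selected")
    for name in [*spec.columns, *spec.filters]:
        if not _IDENT.match(name):
            raise ValueError(f"bad column: {name!r}")
    # The project filter is always first; user filters are ANDed after it.
    where = ["project_id = ?"] + [f"{col} = ?" for col in spec.filters]
    params = [project_id, *spec.filters.values()]
    limit = max(1, min(int(spec.limit), 10_000))  # illustrative row cap
    sql = (
        f"SELECT {', '.join(spec.columns)} FROM {spec.table} "
        f"WHERE {' AND '.join(where)} LIMIT {limit}"
    )
    return sql, params
```

Because the UI only produces specs, there is no free-form SQL to sanitize; the cost, as noted above, is that anything outside this surface loses the wider SQL ecosystem.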
elnatro 25 minutes ago
New to ClickHouse here. Would you think this kind of database has a niche compared to the usual RDBMSs like MySQL and PostgreSQL?
baalimago 17 minutes ago
The evolution of this is to use agents, and have users "chat with the data"
- mattaitken 8 minutes ago
  Yes, you can actually do this already because we expose a REST API and TypeScript SDK functions to execute the queries.