We just create mini data "ponds" on the fly by copying tenant isolated gold tier data to parquet in s3. The users/agent queries are executed with duckdb. We run this process when the user start a session and generate an STS token scoped to their tenant bucket path. Its extremely simple and works well (at least with our data volumes).
We do the same thing, every employee can access our main financial/back office SQL database, but we just use PostgreSQL with row level security[0]. We never bothered to complicate it like the post does.
I'd be so uncomfortable with this. It sounds like you're placing the full burden of access on a single boundary. I mean, maybe there's more to it that you haven't spoken about here, but "everything rests on this one postgres feature" is an unacceptably unsafe state to me.
We did this with MotherDuck, and without introducing a new language. Every tenant has their own isolated storage and compute, so it’s trivial to grant internal users access to specific tenants as needed. DuckDB’s SQL dialect is mostly just Postgres’ with some nice ergonomic additions and a host of extra functionality.
This is explicitly not the problem they are trying to solve. In a single tenant database you don’t have to by definition worry about multi tenant databases
I guess the question then becomes, what problem does a multi-tenancy setup solve that an isolated database setup doesn't? Are they really not solving the same problem for a user perspective, or is it only from their own engineering perspective? And how do those decisions ultimately impact the product they can surface to users?
Off the top of my head, managing 100 different database instances takes a lot more work from the business standpoint than managing 1 database with 100 users.
The article also mentioned that they isolate by project_id. That implies one customer (assume a business) can isolate permissions more granulary.
Reasons 1-3 could very well be done with ClickHouse policies (RLS) and good data warehouse design. In fact, that’s more secure than a compiler adding a where to a query ran by an all mighty user.
Reason 4 is probably an improvement, but could probably be done with CH functions.
The problem with custom DSLs like this is that tradeoff a massive ecosystem for very little benefit.
As long as you don't deviate too much from ANSI, I think the 'light sql DSL' approach has a lot of pros when you control the UX. (so UIs, in particular, are fantastic for this approach - what they seem to be targeting with queryies and dashboards). It's more of a product experience; tables are a terrible product surface to manage.
Agreed with the ecosystem cons getting much heavier as you move outside the product surface area.
0: https://www.postgresql.org/docs/18/ddl-rowsecurity.html
How do you enforce tenant isolation with that method, or prevent unbounded table reads?
We do something similar for our backoffice - just with the difference that it is Claude that has full freedom to write queries.
The article also mentioned that they isolate by project_id. That implies one customer (assume a business) can isolate permissions more granulary.
Reason 4 is probably an improvement, but could probably be done with CH functions.
The problem with custom DSLs like this is that tradeoff a massive ecosystem for very little benefit.
Agreed with the ecosystem cons getting much heavier as you move outside the product surface area.