CQL: Categorical Databases

(categoricaldata.net)

64 points | by noworriesnate 3 days ago

8 comments

iron_fever 2 minutes ago
I know very little about databases, but I work heavily with category theory, so fwiw: I think the main benefit is composition. The edge over SQL shows up when you combine schema mappings - a mapping is a functor, so when you migrate data along it the constraints come with it by construction, and you don't end up writing ETL and hoping integrity held.
As best as I can tell (but i really dont know much about databases) it's probably a narrow advantage - storage and everyday queries still go to Codd's model - but for stitching schemas together it seems like it could work.
I'm using similar math for automated formal verification, where this approach is what makes it tractable.
mattstir 21 minutes ago
> Reduce risk of failure through artificial intelligence. CQL contains an embedded automated theorem prover that guarantees the correctness of CQL programs.
Man, it's a rough environment right now marketing-wise. I don't know if they're contractually obligated to say the funny magic words, but the term AI is nearly entirely meaningless at this point. Akin to saying "behold my mighty calculator app: it prevents divisions by zero through artificial intelligence!"
bob1029 5 hours ago
> CQL is not a database management system: it neither stores nor updates data.
The same could be said for SQL. How does CQL differ from SQL? If I squint my eyes just a tiny amount, these ideas become really difficult to separate. I was always under the impression that the relational model is based upon many concepts studied in category theory. To my mind, all of the following things are overlapping parts of the exact same monster:
```
  Set theory
  Category theory
  Graph theory
  Type theory
  Discrete mathematics 
  Relational algebra
  Relational calculus
  Relational modeling
  An actual sql schema
```
[-]
- jeltz 1 hour ago
  The main innovation here seems to be compile time checking of that foreign keys are respected but that is a thing that can be added to SQL and there is at least one proposal for doing so. So I do not really see anything fundamentally different from SQL.
  https://keyjoin.org/
  Full disclosure: I am one of the co-authors of this paper and an associated patch implenting it in PostgreSQL that we have proposed.
  I am happy to see more people than us think this is useful.
- js8 4 hours ago
  > How does CQL differ from SQL?
  SQL is like Java, CQL is like Haskell. SQL has been around and used in production. CQL is a research language, possibly cleaner foundation but YMMV.
  The math fields you list are connected, but whether they are the same monster - again it's kinda like claiming all programming languages and implementations are the same (Turing-complete?) monster.
  [-]
  - randomNumber7 4 hours ago
    SQL is not an imperative programming language.
    [-]
    - adrian_b 3 hours ago
      DROP TABLE ?
      Most of SQL is not imperative, but it certainly also includes some imperative commands.
      Inserting a new row into an existing table is an imperative command, which may be the most frequently used of the SQL features, in certain applications concerned with recording transactions.
      Only the subset of SQL that is used for queries can be said to not be an imperative programming language.
      [-]
      - da_chicken 2 hours ago
        No. Standard DDL and DML are declarative in SQL, including DROP and INSERT. Those still don't tell the system how to accomplish the thing. Declarative doesn't mean idempotent, and it doesn't mean stateless.
        Imperative SQL is the procedural elements that mostly do not exist at the standard level. Variables, control flow, and cursors.
        [-]
        adrian_b 7 minutes ago
        Anything that causes a persistent change of the state of a system is imperative, regardless of how detailed the command is.
        printf("Hello world!");
        also does not tell anything to the system about how to accomplish this.
        Anyone who claims that an SQL command like insert a row, create a table or destroy a table is not imperative, is just plainly wrong.
        Even in real life, when humans communicate commands to each other, from where the term "imperative" comes, the commands normally do not tell anything to the recipient about how to accomplish the order, they just name the action, because it is supposed that whoever receives the command knows how to do it.
        FrustratedMonky 1 hour ago
        "how to accomplish the thing."
        I always thought this was splitting hairs.
        C#,C++, Java seems imperative. You are in control? But you aren't really telling it how to move values between registers, the compiler is making a million decisions for you on how the computer will execute that 'imperative' code. Just like SQL isn't really telling the DB how to do it either..
        [-]
        cmrdporcupine 1 hour ago
        It's really a matter of degrees though. You're waving away a big part of the big sell of a relational database as proposed by Codd, which is that the user need not "know" the structure of the data in order to formulate operations on it because there's a consistent set-oriented model that can be used with a bunch of different physical storage forms but also the very sequence of relational operations against it can be re-ordered / restructured without the user knowing. And that the same data can be accessed in N number of ways that don't require changing the underlying storage. In theory. In practice SQL databases are only sort-of there.
        Contrast that if I create a class/object/field structure/hierarchy in Java, or put a HashMap somewhere with a certain set of keys, I've written something in stone which requires significant refactoring if the data needs to be accessed from a different direction.
        cmrdporcupine 1 hour ago
        This just exposes how weak this definition of declarative is, I think. Or how it's carrying two meanings here.
        What you're really talking about is one of the things Codd wanted to emphasize: representational independence. Which actually was the primary thrust of his famous paper: the user should not need to know how the data is stored in order to use or manipulate it.
        The other thing that people are talking about with "declarative" is probably another level up in abstraction. Talking about the business logic or problem in terms that are closer to logic than a sequence of instructions, and then letting the machine sort out what those steps are.
        Consider in a Datalog users don't customarily perform DDL type operations; they declare data rules and the system decides the form of the underlying relations. That's a small step up the declarative ladder from SQL, even if it's somewhat analogous to "create view".
        So I think there's a blurry definitional line between the two. But I don't think your very blunt "No." is doing much to help clear that up?
- khana 3 minutes ago
  [dead]
srean 5 hours ago
There was a good blog post on how the category theoretic ideas behind this applies to data frames
What Category Theory Teaches Us About DataFrames https://mchav.github.io/what-category-theory-teaches-us-abou...
Discussed on HN at (67 comments)
https://news.ycombinator.com/item?id=47561426
randomNumber7 5 hours ago
Since Codd's paper showed that the relational model dominates other approaches (for data storage) I would expect a paper that shows categorical database are not affected by this and what benefit they have.
[-]
- js8 5 hours ago
  My (amateur) take. CDB model (based on functions) has three advantages over RDB model (based on relations):
  1. Easier modelling sum types (inheritance) due to duality.
  2. Better handling of null due to labelled null.
  3. Better foundation of elementary types (they're just another table ids). (Column stores often do that already, if your question is about storage.)
  [-]
  - adrian_b 3 hours ago
    While the relational model is claimed to be based on relations, the vast majority of the "relations" used in practice are functions, not general relations.
    A general relation exists only between the columns of a table that are included in a multi-column primary key.
    All columns that are not part of the primary key are functions of the primary key.
    Most tables used in practice use a single column as the primary key, which is frequently just a number or a UUID. Most databases contain only tables that are functions, without any table that contains general relations.
    The most frequently used kinds of joins are just function compositions.
flying_sheep 5 hours ago
Thanks for the sharing. It looks interesting but I did not dive deep into it. Just wonder how is it different from SQL trigger which can also ensure integrities?
[-]
- js8 5 hours ago
  It's not much really, CDBs are based on foreign key relationships as a fundamental building block, rather than on relation.
  The difference is more in theory than in practice.
mattsouth 4 hours ago
Not to be confused with https://cql.hl7.org/ which is the CQL I know of.