It would be awesome if this made it into core. Jelte did a bunch of work on improving the query protocol. Having Postgres accept OTel trace propagation would be a game changer for debugging the clients and the DB together, especially since ~80%+ of the DB-related issues I see come down to some sort of long-running open transaction that Postgres knows nothing about server-side.
This looks very useful for our database heavy teams.
Getting this information is certainly already possible, but there is a bit of a barrier in front of it. You need to realize the query is slow, then re-run it with the right EXPLAIN and/or ANALYZE incantation (the 8-9 options a query visualizer wants), paste the output into that visualizer, and only then do you get a nice, easily digested overview of what is going on.
Teams either don't know how to do that or don't do it, due to permissions or because it's a hassle. Having a slow "calculateFooReport()" trace go straight into a bunch of slow SequentialScan and NestedLoop nodes would remove one excuse from that equation.
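For reference, the "incantation" usually looks something like the sketch below. The helper name `build_explain` is hypothetical, and the exact option set is an assumption (it is the kind of set JSON-based plan visualizers tend to ask for); `SETTINGS` needs PostgreSQL 12+ and `WAL` needs 13+.

```python
# Hypothetical helper: wrap a query in the kind of EXPLAIN call a plan
# visualizer usually expects. The option list is an assumption, not taken
# from any particular tool; SETTINGS needs PostgreSQL 12+, WAL needs 13+.
DEFAULT_OPTIONS = (
    "ANALYZE",      # actually executes the query and records real timings
    "VERBOSE",      # output column lists, schema-qualified names, etc.
    "COSTS",        # planner cost estimates
    "SETTINGS",     # non-default planner settings in effect
    "BUFFERS",      # shared/local/temp buffer usage
    "WAL",          # WAL records generated (needs ANALYZE)
    "TIMING",       # per-node actual timing (needs ANALYZE)
    "SUMMARY",      # planning and execution time totals
    "FORMAT JSON",  # machine-readable output for a visualizer
)


def build_explain(query: str, options=DEFAULT_OPTIONS) -> str:
    """Return an EXPLAIN statement for `query` using the given options."""
    body = query.strip().rstrip(";")
    return f"EXPLAIN ({', '.join(options)}) {body};"


if __name__ == "__main__":
    print(build_explain("SELECT * FROM orders WHERE customer_id = 42;"))
```

Because `ANALYZE` really runs the statement, doing this for an `UPDATE` or `DELETE` means wrapping it in `BEGIN` ... `ROLLBACK`, which is one more bit of friction that server-side tracing would avoid.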
Kinda bummed that we're updating out of the supported versions starting next month.
Great idea. Currently, people have to rely on client-side spans in OpenTelemetry. It would be awesome if we could also get spans for slow SQL queries, along with their EXPLAIN plans.
I've had this experience with multiple Datadog libraries. I even offered to implement some features for the Python one for free because we were missing a feature, but they just never responded. Note that they already had this feature in their PHP library, so it was not some outrageous demand.
Distributed tracing is a term for traces that span multiple systems. With distributed tracing you can follow a request across your various services. With pg_tracing enabled in Postgres, this would extend to your database.
You have a service that talks to Postgres, probably other services that talk to that service, and a client that talks to these services through some gateway. When a user clicks a button, distributed tracing lets you see the whole request: from the button click to the API gateway, to every service it goes through, to the DB, and all the way back. So you can see exactly which path it took and where it slowed down or failed. DBs are usually treated as black boxes in such systems. We just see that the DB call took 500ms; we don't know what happened inside. This gives distributed traces visibility into the internals of the DB, so you can tell the exact joins, index scans, etc. that took place just by looking at your distributed trace. I don't know what level of visibility they've built, but that's the general idea.
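To make "extend the trace into the DB" concrete: the client has to hand its current trace context to Postgres along with the query. A minimal sketch, assuming the extension picks up a W3C Trace Context `traceparent` value from a SQLCommenter-style comment placed in front of the query (check the pg_tracing README for the exact form and placement it accepts). The helper names are hypothetical, and in a real service the trace and span IDs would come from the active OpenTelemetry span rather than being generated randomly.

```python
import secrets
from typing import Optional

HEX_DIGITS = "0123456789abcdef"


def _random_hex(nbytes: int) -> str:
    """Random lowercase hex ID; W3C Trace Context forbids all-zero IDs."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def _check_id(value: str, length: int, name: str) -> None:
    """Validate an ID: fixed length, lowercase hex, not all zeros."""
    if len(value) != length or any(c not in HEX_DIGITS for c in value):
        raise ValueError(f"{name} must be {length} lowercase hex characters")
    if int(value, 16) == 0:
        raise ValueError(f"{name} must not be all zeros")


def make_traceparent(trace_id: Optional[str] = None,
                     parent_span_id: Optional[str] = None,
                     sampled: bool = True) -> str:
    """Build a traceparent value: version-traceid-parentid-flags."""
    # Hypothetical fallback: real code would reuse the caller's current span.
    trace_id = trace_id if trace_id is not None else _random_hex(16)
    parent_span_id = parent_span_id if parent_span_id is not None else _random_hex(8)
    _check_id(trace_id, 32, "trace_id")
    _check_id(parent_span_id, 16, "parent_span_id")
    flags = "01" if sampled else "00"  # 01 = sampled flag set
    return f"00-{trace_id}-{parent_span_id}-{flags}"


def annotate_query(query: str, traceparent: str) -> str:
    """Prefix the query with a SQLCommenter-style trace context comment."""
    return f"/*traceparent='{traceparent}'*/ {query}"


if __name__ == "__main__":
    tp = make_traceparent("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331")
    print(annotate_query("SELECT 1;", tp))
```

Since the DB-side spans carry the same trace ID as the caller's span, a tracing backend can stitch them into the one end-to-end trace described above.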
PostgreSQL is already at 18.
Also, tracing support is being upstreamed into Postgres proper (https://github.com/DataDog/pg_tracing/issues/86), which would make this extension obsolete.
Warning
This extension is still in early development and may be unstable.
The database you send the traces to may be distributed, just like any database, but pg_tracing does nothing for it.
The hard part is merging one trace across storage workers; that's where the distributed part comes in. pg_tracing does nothing for that.
UPDATE: ah I believe I see what you mean -- that it passes down the trace ID.