This is huge! Congrats 🎉
This is not really related to PostgREST, but a separate project, right? Anyway, this is very interesting. It would be even better if it were related to PostgREST and played with postgres-js in a way that lets you use the real PostgREST API (or postgrest-js method chaining) to express multiple cross-referencing operations, so one can use the familiar syntax.
Be aware of the wall clock limits for Edge Functions: with around 150s on free (and 400s on paid), you are guaranteed to have some jobs start too late to finish before the function is terminated. Think about factoring your processing into smaller units, so each can be retried independently, and just schedule one from the other, or use pgflow.dev to make the process easier (disclaimer: I'm the author).
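A minimal sketch of the "smaller units" idea, assuming you can roughly estimate per-item processing time: it packs items into batches whose estimated duration stays well under the wall-clock limit, so each batch can be enqueued as its own job and retried independently. The 150s figure comes from the comment above; the 50% safety margin and the `planBatches` helper are illustrative assumptions, not a Supabase or pgflow API.

```typescript
// Hypothetical helper: pack items into batches that should each finish
// comfortably within one Edge Function invocation's wall-clock budget.
interface WorkItem {
  id: string;
  estimatedMs: number; // assumption: per-item cost can be estimated up front
}

function planBatches(
  items: WorkItem[],
  wallClockMs = 150_000, // free-tier limit mentioned above
  safetyFactor = 0.5, // assumption: only plan for half of the budget
): WorkItem[][] {
  const budget = wallClockMs * safetyFactor;
  const batches: WorkItem[][] = [];
  let current: WorkItem[] = [];
  let currentMs = 0;

  for (const item of items) {
    // An item that alone exceeds the budget must be split further upstream.
    if (item.estimatedMs > budget) {
      throw new Error(`item ${item.id} does not fit in a single invocation`);
    }
    // Close the current batch when this item would push it over budget.
    if (currentMs + item.estimatedMs > budget) {
      batches.push(current);
      current = [];
      currentMs = 0;
    }
    current.push(item);
    currentMs += item.estimatedMs;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Each batch would then be sent as a separate job (e.g. one queue message),
// so a failure only retries that batch instead of the whole workload.
const items = Array.from({ length: 10 }, (_, i) => ({ id: `doc-${i}`, estimatedMs: 20_000 }));
console.log(planBatches(items).map((b) => b.length)); // batch sizes: 3, 3, 3, 1
```

Scheduling "one from the other" then means the handler for one batch only has to enqueue the next unit, never carry the whole workload in a single invocation.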
For your background job needs, you can take it one step further and use https://pgflow.dev - it uses Supabase primitives to deliver delightful multi-step background jobs without any extra infra. Shameless plug - I'm the author.
thanks for mentioning pgflow! (author here) agree with your recommendation - pgmq is superior and really easy to use for such workloads
pgmq is better for this kind of system because it is durable, while pg_notify does not give any delivery guarantees. I've built a robust workflow system on top of pgmq, fully self-contained in Supabase. You may want to have a look, as it covers your reliability/durability needs and does not need any external infra: https://pgflow.dev
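To illustrate the durability difference, here is a tiny in-memory model of visibility-timeout semantics, the model pgmq follows: reading a message hides it for `vt` seconds, and it reappears unless the consumer deletes (or archives) it. With `pg_notify`, by contrast, a notification sent while no listener is connected is simply lost. This is a toy simulation, not the pgmq API; `ToyQueue` and its methods are hypothetical names.

```typescript
// Toy model of visibility-timeout queue semantics (as in pgmq), using an
// explicit clock (milliseconds) so the behaviour is deterministic.
interface Message<T> {
  id: number;
  payload: T;
  visibleAt: number; // message can be read once now >= visibleAt
  readCount: number;
}

class ToyQueue<T> {
  private messages = new Map<number, Message<T>>();
  private nextId = 1;

  send(payload: T, now: number): number {
    const id = this.nextId++;
    this.messages.set(id, { id, payload, visibleAt: now, readCount: 0 });
    return id;
  }

  // Returns up to `qty` visible messages and hides them for `vtSeconds`.
  read(vtSeconds: number, qty: number, now: number): Message<T>[] {
    const visible = Array.from(this.messages.values())
      .filter((m) => m.visibleAt <= now)
      .slice(0, qty);
    for (const m of visible) {
      m.visibleAt = now + vtSeconds * 1000;
      m.readCount += 1;
    }
    return visible;
  }

  // Only an explicit delete removes a message. A worker that crashes
  // mid-job never reaches this call, so the message comes back later.
  delete(id: number): boolean {
    return this.messages.delete(id);
  }
}
```

Usage: a worker reads a message at t=0 with a 30s visibility timeout and then dies without deleting it; at t=30s the same message is readable again (with `readCount === 2`), so the job is retried instead of vanishing.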
Also wondering: do you plan to support load balancing across multiple instances, e.g. using the Hetzner load balancer?
Looks super cool! Can it leverage Hetzner volumes for storage (buckets, DB, etc.)?
Good question - the main difference is that this Reddit post chunks documents in parallel and then embeds the chunks in parallel, while the Supabase guide only shows how to embed full documents. In upcoming tutorials I will show how to introduce variations, like Hypothetical Document Embedding or summary embedding, in a declarative and easy way, just by wiring additional steps together. pgflow abstracts away 200-300 lines of boilerplate code and makes it trivial to reason about how data flows. It shines in multi-step pipelines. Cheers!
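A rough sketch of the "chunk, then embed the chunks in parallel" shape in plain TypeScript, without pgflow. `chunkText` is a naive fixed-size splitter with overlap and `fakeEmbed` stands in for a real embedding API call; both are hypothetical placeholders, and in a workflow engine each stage would typically be its own step.

```typescript
// Naive fixed-size chunker with overlap (real pipelines often split on
// sentences or tokens instead).
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Placeholder for an embedding API call (hypothetical): returns a tiny
// deterministic "vector" so the example runs offline.
async function fakeEmbed(chunk: string): Promise<number[]> {
  return [chunk.length, chunk.charCodeAt(0) || 0];
}

// Chunks are independent, so they can be embedded concurrently.
async function embedDocument(
  text: string,
): Promise<{ chunk: string; embedding: number[] }[]> {
  const chunks = chunkText(text, 20, 5);
  const embeddings = await Promise.all(chunks.map((c) => fakeEmbed(c)));
  return chunks.map((chunk, i) => ({ chunk, embedding: embeddings[i] }));
}
```

Swapping in a variation such as summary embedding would mean adding a step that derives a summary per document and feeding it to the same embedding stage.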
Thanks for joining! Happy holidays to you too - enjoy the break and see you in the new year! 🎄
sounds interesting! happy to learn about your use case and help with any issues - feel free to DM me or join Discord :)
Thank you! Curious: how are you using pgflow? I saw the vector buckets, but thanks for reminding me about them - I should actually cover them in some of the upcoming tutorials.
## Links:

- Example repo: [github.com/pgflow-dev/automatic-embeddings](https://github.com/pgflow-dev/automatic-embeddings)
- Full tutorial: [pgflow.dev/tutorials/rag/automatic-embeddings/](https://pgflow.dev/tutorials/rag/automatic-embeddings/)
- pgflow docs: [pgflow.dev](https://pgflow.dev)
- GitHub: [github.com/pgflow-dev/pgflow](https://github.com/pgflow-dev/pgflow)
Glad you like it! By the way, it's already used in production by early adopters (currently in beta). Cool stuff is coming next week: monitor pgflow.dev/news/ or join the Discord at pgflow.dev/discord/
Thank you for the kind words. Let's make it happen!
It's surprisingly reliable! A timeout-based approach would be less reliable because Supabase is not very strict about the 150s/400s wall clock timer and it fluctuates, whereas onbeforeunload is triggered earlier.

If you have very high throughput of processed messages, it is possible for a worker to not finish the HTTP respawn request before being hard terminated. This happens mostly with CPU-bound workloads, as Edge Runtime is much stricter about those limits. The "Keep workers up" cron job [0] solves this issue completely, and I have a few ideas on making it even more robust.

To work around that, you should write your handlers to be retry-safe (idempotent):

- factor work into small, focused functions
- each focused on one state mutation (API call, SQL query)
- use the provided abort signal to abort gracefully (pgflow provides that, check [1])
- write upserts instead of inserts (INSERT ... ON CONFLICT ... DO UPDATE queries)

This is good practice overall, so I would not say it is a downside :-)

---

[0] [Keep Workers Up](https://www.pgflow.dev/deploy/supabase/keep-workers-running/)

[1] [shutdownSignal - Context API Reference](https://www.pgflow.dev/reference/context/#shutdownsignal)
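A minimal sketch of the retry-safe handler guidelines, with an in-memory `Map` standing in for a table so it runs anywhere. The upsert mimics `INSERT ... ON CONFLICT (article_id) DO UPDATE`, and the handler checks a standard `AbortSignal` before doing work and before its single state mutation. In pgflow you would use the `shutdownSignal` from the handler context (see the Context API reference linked in the comment); the handler shape, table, and names here are hypothetical, not pgflow's API.

```typescript
interface Summary {
  articleId: string;
  text: string;
}

// Stand-in for a table with a primary key on articleId.
const summaries = new Map<string, Summary>();

// Same effect as (hypothetical table):
//   INSERT INTO summaries (article_id, text) VALUES ($1, $2)
//   ON CONFLICT (article_id) DO UPDATE SET text = EXCLUDED.text;
function upsertSummary(row: Summary): void {
  summaries.set(row.articleId, row);
}

// One focused handler = one state mutation. Running it twice with the same
// input leaves the same end state, so a retry after a hard kill is harmless.
async function summarizeHandler(
  input: { articleId: string; body: string },
  signal: AbortSignal,
): Promise<Summary> {
  // Fail (rather than return a fake result) so the task gets retried later.
  if (signal.aborted) throw new Error("worker shutting down, task will be retried");
  const summary: Summary = {
    articleId: input.articleId,
    text: input.body.slice(0, 40), // placeholder for a real LLM/API call
  };
  // Check again right before the only write.
  if (signal.aborted) throw new Error("worker shutting down, task will be retried");
  upsertSummary(summary);
  return summary;
}
```

Throwing on abort, instead of returning an empty value, is the design choice that matters: the queue sees a failed attempt and redelivers the task, and the upsert makes that redelivery safe.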
I explored this setup when tying Supabase Queues to workers. Polling with pg_cron works, but it gets hard to manage once you need retries, multi-step tasks, or visibility into job state. I ended up building pgflow around this gap: a Supabase-native workflow engine that runs multi-step jobs on Postgres plus Edge Functions. Postgres handles orchestration and state, and an auto-respawning Edge Function worker executes handlers. Flows can start from TypeScript, RPC, triggers, or pg_cron, and you get realtime progress from the client. Sharing the approach in case it is useful for others evaluating cron polling vs a Postgres-driven workflow layer. [https://pgflow.dev](https://pgflow.dev)
You understand it correctly - the worker calls **pgmq.read_with_poll** in a loop. This long-polling approach is the most cost-effective one, as it requires only one HTTP request per worker lifetime. Cron-based polling requires more HTTP requests, and database webhooks also make HTTP requests. The only other cost is egress, but that depends on how much data you write to and read from your queues, and it affects the other approaches as well. It is also the best approach latency-wise: jobs start as fast as 100ms after being sent.

For your use case, pgflow also has a simpler mode, where it just processes queue messages in a single-step fashion, exactly what you are trying to solve. Check out https://www.pgflow.dev/get-started/background-jobs/create-worker/ and https://www.pgflow.dev/get-started/faq/#what-are-the-two-edge-worker-modes

FYI, pgflow flows can be started in various ways: cron, a DB event (like a DB webhook), RPC, or a dedicated TypeScript client. Check out the docs: https://www.pgflow.dev/build/starting-flows/
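A simplified sketch of the long-polling loop described above, to show why one long-lived invocation replaces per-job HTTP requests. The `readWithPoll` callback is a hypothetical stand-in for a database call to `pgmq.read_with_poll` (the SQL in the comment assumes the argument order queue name, visibility timeout, quantity, max poll seconds, poll interval ms; check the pgmq docs). This is not pgflow's actual worker code.

```typescript
type QueueMessage = { msgId: number; payload: unknown };

// Hypothetical stand-in for a DB call such as
//   select * from pgmq.read_with_poll('jobs', 30, 5, 5, 100);
// which waits inside Postgres until messages arrive or the poll window ends.
type ReadWithPoll = () => Promise<QueueMessage[]>;

async function workerLoop(
  readWithPoll: ReadWithPoll,
  handle: (msg: QueueMessage) => Promise<void>,
  ack: (msgId: number) => Promise<void>, // e.g. pgmq delete/archive
  signal: AbortSignal,
): Promise<number> {
  let processed = 0;
  // One long-lived invocation keeps looping; no HTTP request per job.
  while (!signal.aborted) {
    const batch = await readWithPoll(); // empty after the poll window if idle
    for (const msg of batch) {
      await handle(msg);
      await ack(msg.msgId); // acknowledge only after the handler succeeded
      processed += 1;
    }
  }
  return processed;
}
```

Acknowledging after the handler finishes gives at-least-once delivery: if the worker dies mid-batch, unacknowledged messages become visible again after their visibility timeout.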
Thanks! I was thinking about supporting Vercel, by the way; the only hard dependency is Postgres.
We talked through this use case on my Discord last week: delayed jobs + multi-step email flows.

I've been working on **pgflow** for Supabase, which handles this from the database side: pgmq for delayed messages, Edge Workers consume the queue and run handlers, and flows are DAGs with delays per step. So "run once in 5 minutes with retries" or "send welcome → wait 3 days → send tips" work out of the box. The queue + database handle timing instead of an external scheduler.

Implementation details:

- Delaying flow steps: https://www.pgflow.dev/build/delaying-steps/
- Background job worker: https://www.pgflow.dev/get-started/background-jobs/create-worker/

Different take on the same problem.
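To make "DAGs with delays per step" concrete, here is a plain-TypeScript sketch that computes when each step of a small DAG becomes eligible to run, given per-step delays and dependencies. It is not pgflow's flow DSL; the `StepDef` shape, the step slugs, and the assumed step durations are hypothetical. In a queue-based design, such a delay typically becomes a delayed queue message rather than a sleeping worker.

```typescript
interface StepDef {
  slug: string;
  dependsOn: string[];
  delaySeconds: number; // wait this long after all dependencies complete
  durationSeconds: number; // assumed run time, used only for the estimate
}

// Earliest start time (seconds from flow start) for each step, assuming a
// step starts as soon as its dependencies are done and its delay expires.
function scheduleSteps(steps: StepDef[]): Map<string, number> {
  const bySlug = new Map(steps.map((s) => [s.slug, s] as const));
  const start = new Map<string, number>();

  const visit = (slug: string, path: Set<string>): number => {
    const cached = start.get(slug);
    if (cached !== undefined) return cached;
    if (path.has(slug)) throw new Error(`cycle at ${slug}`);
    const step = bySlug.get(slug);
    if (!step) throw new Error(`unknown step ${slug}`);
    path.add(slug);
    // Ready when the slowest dependency has finished.
    const ready = Math.max(
      0,
      ...step.dependsOn.map((dep) => visit(dep, path) + bySlug.get(dep)!.durationSeconds),
    );
    path.delete(slug);
    const t = ready + step.delaySeconds;
    start.set(slug, t);
    return t;
  };

  for (const s of steps) visit(s.slug, new Set());
  return start;
}

// "send welcome → wait 3 days → send tips"
const DAY = 24 * 60 * 60;
const onboarding: StepDef[] = [
  { slug: "sendWelcome", dependsOn: [], delaySeconds: 0, durationSeconds: 2 },
  { slug: "sendTips", dependsOn: ["sendWelcome"], delaySeconds: 3 * DAY, durationSeconds: 2 },
];
```

Here `sendTips` becomes eligible 3 days after `sendWelcome` finishes; keeping the timing in data like this is what lets the queue and database, rather than an external scheduler, decide when work runs.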
give it a go and tell me what you think!
I think you took the doc too literally - "output to a user" could also mean "create content for a user based on structured output". OP hasn't mentioned tool calling, so IMO it's a safe assumption that he wants to generate JSON and not call a tool. The article I linked distinguishes between the two and advises when tool calling should be used versus structured output. Happy to help!
Using tool calling to get structured output is not optimal - you can achieve good results, but it is just better to use Structured Outputs, which AI labs are specifically training models for: https://platform.openai.com/docs/guides/structured-outputs

From this article:

> Conversely, Structured Outputs via response_format are more suitable when you want to indicate a structured schema for use when the model responds to the user, **rather than when the model calls a tool**.

> If you are connecting the model to tools, functions, data, etc. in your system, then you should use function calling - If you want to structure the model's output when it responds to the user, then you should use a structured text.format

OP never mentioned how he got the bad results, or whether he used a structured output schema or just asked the model politely to output his expected format.
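For comparison with tool calling, a sketch of a request body that asks for Structured Outputs through the Responses API's `text.format` field (the form named in the quote). The field layout follows OpenAI's structured outputs guide as I understand it, so verify it against the current docs. The product schema, the model name, and the `buildRequest`/`parseProduct` helpers are illustrative assumptions, and no network call is made.

```typescript
// JSON Schema for the desired output. Strict mode expects every property
// to be listed in `required` and additionalProperties to be false.
const productSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
    priceUsd: { type: "number" },
  },
  required: ["title", "tags", "priceUsd"],
  additionalProperties: false,
} as const;

// Builds a Responses API request body that constrains the reply to the schema.
function buildRequest(userText: string, model = "gpt-4o-2024-08-06") {
  return {
    model,
    input: [
      { role: "system", content: "Extract product data from the user's text." },
      { role: "user", content: userText },
    ],
    text: {
      format: {
        type: "json_schema",
        name: "product",
        schema: productSchema,
        strict: true,
      },
    },
  };
}

// The reply text should then parse straight into the expected shape; the
// runtime check guards against refusals or truncated output.
interface Product {
  title: string;
  tags: string[];
  priceUsd: number;
}

function parseProduct(outputText: string): Product {
  const data = JSON.parse(outputText) as Product;
  if (typeof data.title !== "string" || !Array.isArray(data.tags) || typeof data.priceUsd !== "number") {
    throw new Error("response does not match schema");
  }
  return data;
}
```

No tool definitions are involved: the schema describes the answer to the user, which is exactly the case the quoted guidance reserves for structured outputs.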
That parenthesis really made me smile:

> 📌 Ko-fi **(plain text, non-clickable – safe for Reddit)**: ko-fi.com/s/b5b4180ff1

Feels like a prompt (or inner over-explainer 😄) leaking straight into the post - the kind of thing you only catch on a second proofread.
Thanks! I'm asking because I'm constantly fighting with starting/stopping Supabase in pgflow's monorepo, both locally and on CI, and I'm fishing for any solutions folks are using. It is slow, sometimes breaks, and in a monorepo, when you need multiple instances running, it's a pain in the butt! :)
Looks really easy to use, and the landing page is great! Kudos! Curious: do you include anything for starting/stopping Supabase, or is this something the user must manage on their own?
There are a few things you should be doing to improve your chances:

- stop editing stuff in the Dashboard; use proper local dev [https://supabase.com/docs/guides/local-development](https://supabase.com/docs/guides/local-development)
- do not edit tables directly; use migrations [https://supabase.com/docs/guides/local-development/overview](https://supabase.com/docs/guides/local-development/overview)
- when ready for production, start using Preview Branches, so you can test your changes on a cloned setup before they land on real prod [https://supabase.com/docs/guides/deployment/branching](https://supabase.com/docs/guides/deployment/branching)
Funny how the site looks completely hax0red when the styles don't fully load and the fonts are huge like that!
Happy that you found it useful! I put a lot of effort into writing good, usable docs.

I don't have an ETA for the human-in-the-loop feature yet, but you can easily implement it by "splitting" your flow in half: end the first flow with the notification/DM/email that nudges a human to do the approval, create an Edge Function that serves as an HTTP endpoint/webhook, and use the run_id of the previous flow as part of the input to the new flow. This way you can track continuity.
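A sketch of the "split the flow in half" approach: the last step of the first flow builds an approval link carrying its `run_id`, and the webhook Edge Function turns the clicked link back into the input for the second flow. All names here (`buildApprovalLink`, `approvalToFlowInput`, the `parentRunId` field, the endpoint URL) are hypothetical, and actually starting the second flow, through whichever documented method you use, is left out.

```typescript
// Hypothetical input for the second ("after approval") flow. Carrying the
// first flow's run_id keeps the two runs linked for tracking.
interface ApprovalFlowInput {
  parentRunId: string;
  approvedBy: string;
}

// Last step of flow #1: build the link sent in the notification/DM/email.
// `approveEndpoint` would be your own Edge Function URL (assumption).
function buildApprovalLink(approveEndpoint: string, runId: string, approver: string): string {
  const url = new URL(approveEndpoint);
  url.searchParams.set("run_id", runId);
  url.searchParams.set("approver", approver);
  return url.toString();
}

// Webhook Edge Function: turn the clicked link into the input for flow #2.
// Returns null for malformed requests instead of starting a flow.
function approvalToFlowInput(requestUrl: string): ApprovalFlowInput | null {
  const params = new URL(requestUrl).searchParams;
  const runId = params.get("run_id");
  const approver = params.get("approver");
  if (!runId || !approver) return null;
  return { parentRunId: runId, approvedBy: approver };
}
```

A real endpoint should also verify that the approver is who they claim to be (for example with a signed, expiring token in the link), which this sketch omits.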
There are a few more examples:

- Source code for the demo: [https://github.com/pgflow-dev/pgflow/tree/main/apps/demo/supabase/functions/article_flow_worker](https://github.com/pgflow-dev/pgflow/tree/main/apps/demo/supabase/functions/article_flow_worker)
- Tutorial for building a flow like the one in the demo: [https://www.pgflow.dev/tutorials/ai-web-scraper/](https://www.pgflow.dev/tutorials/ai-web-scraper/)
- HackerNews Post Classifier [https://github.com/pgflow-dev/demo-hn-classifier](https://github.com/pgflow-dev/demo-hn-classifier) example repo, which also shows that it is easy to use historical workflow runs to test new versions of steps
- Some examples of functionalities in example flows: [https://www.pgflow.dev/build/](https://www.pgflow.dev/build/)

In the upcoming weeks I will be posting scheduled **Use cases** posts to this subreddit. What type of content would be most useful for you?
Sure! The [Get started](https://www.pgflow.dev/get-started/installation/) guide shows how to get it up and running locally first!
Happy to disrupt your plans if it helps you keep everything inside Supabase 🙂

pgflow keeps orchestration state in Postgres – runs, step states, tasks and queue messages, and the payloads you choose to persist for each step. Workers handle the actual "work" (LLM calls, HTTP, etc.), so the database mostly sees a stream of relatively small inserts/updates per step and should scale with whatever write throughput your Supabase Postgres can handle. If your payloads are very large, then IO/WAL volume becomes a factor, just like in any Postgres-heavy app.

For typical conversational flows (a few sequential steps per message), the DB load is minimal and spread out over time as handlers execute. I've put a lot of effort into keeping the SQL lean (batching, CTEs, partial indexes), and I've written up the current status + known limitations here if you want the honest version: https://www.pgflow.dev/project-status/
Thanks! It's just the beginning - much more to come!
Glad you enjoy it!
## Links:

- Demo: https://demo.pgflow.dev
- Docs: https://pgflow.dev
- GitHub: https://github.com/pgflow-dev/pgflow
- Discord: https://pgflow.dev/discord/