External replication (ETL) monitoring

Track replication status, view logs, and troubleshoot issues.

Private Alpha

External replication (ETL) is currently in private alpha. Managed pipelines run through Supabase ETL. Access is limited and features may change.

After setting up external replication (ETL), you can monitor the status and health of your replication pipelines directly from the Dashboard. The pipeline is the active Postgres replication process that continuously streams changes from your database to your destination.

Viewing pipeline status

To monitor your replication pipelines:

Navigate to the Database > Replication section of the Dashboard
You'll see a list of all your destinations with their pipeline status

Pipeline states

Each destination shows its pipeline in one of these states:

State	Description
Stopped	Pipeline is not running
Starting	Pipeline is being started
Running	Pipeline is actively replicating data
Stopping	Pipeline is being stopped
Failed	Pipeline has encountered an error (hover over the status to view error details)

Viewing detailed pipeline metrics

For detailed information about a specific pipeline, click View status on the destination. This opens the pipeline status page where you can monitor replication performance and table states.

Replication lag metrics

The status page shows replication lag metrics that help you determine how far the pipeline is behind Postgres. These metrics are loaded directly from Postgres replication slot state.

The destinations list also shows a compact lag value. This value is byte-based: it shows how much WAL the pipeline has not confirmed as flushed yet. A value of Caught up means the pipeline has confirmed every change currently available for its slot.

The detailed status page shows:

Metric	What it means	What to watch for
Waiting to sync	Bytes of WAL between the pipeline's confirmed flush position and the current Postgres WAL position. This is the main byte-based replication lag.	A value that keeps growing means the pipeline is confirming changes more slowly than Postgres produces them.
Room before pausing	How much WAL can still accumulate before Postgres can no longer safely keep all WAL needed by the replication slot. This is controlled by `max_slot_wal_keep_size`.	A small or shrinking value means the slot is getting closer to being invalidated. `Unlimited` means Postgres is not reporting a slot WAL retention limit.
Last check-in	How long it has been since the pipeline last sent replication feedback to Postgres.	An old value can mean the pipeline is stopped, disconnected, overloaded, or unable to make progress.
Connected	Whether the pipeline's replication slot is active and currently being used.	`Not connected` while the pipeline should be running usually means you should check pipeline status and logs.
Slot status	How safely Postgres is keeping the WAL files the pipeline still needs.	`Unreserved` and `Lost` require action. See Slot statuses.
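
As a rough illustration of how a byte-based lag such as Waiting to sync can be derived, the sketch below converts two Postgres LSNs into byte positions and subtracts them. The query string uses standard `pg_replication_slots` columns, but whether the Dashboard runs this exact statement is an assumption, and the helper names (`parse_lsn`, `waiting_to_sync_bytes`, `format_lag`) are hypothetical.

```python
# Assumed query: these are standard Postgres catalog columns and functions,
# but the Dashboard's actual query is not documented here.
SLOT_LAG_QUERY = """
SELECT slot_name,
       active,
       wal_status,
       safe_wal_size,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits,
    a slash, and the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def waiting_to_sync_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot has not yet confirmed as flushed (never negative)."""
    return max(0, parse_lsn(current_wal_lsn) - parse_lsn(confirmed_flush_lsn))


def format_lag(lag_bytes: int) -> str:
    """Render a lag value in binary units, or 'Caught up' when it is zero."""
    if lag_bytes == 0:
        return "Caught up"
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(lag_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"  # not reached; keeps type checkers satisfied
```

For example, `format_lag(waiting_to_sync_bytes("0/3000000", "0/2000000"))` returns `"16.0 MiB"`, because the two positions are 0x1000000 bytes apart.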

External replication (ETL) uses one main pipeline replication slot for ongoing changes. During the initial copy phase, it can also create temporary table-sync replication slots. These temporary slots let multiple tables copy in parallel, make large table copies faster, and allow individual tables to be retried or copied again without restarting the whole pipeline.

Temporary table-sync slots show the same kind of lag and slot health metrics while they are active. After a table finishes copying and catches up, its temporary slot is removed and ongoing changes continue through the main pipeline slot. For overall replication health, focus first on the main pipeline slot.

Slot statuses

Replication slot status tells you whether Postgres is still retaining the WAL that the pipeline needs to continue from its current position.

Status	Meaning
Reserved	Healthy. Postgres is keeping the WAL files this pipeline's replication slot needs, and they are within the normal WAL size limit.
Extended	Healthy, but growing. The slot is holding on to more WAL than usual, but Postgres is still keeping everything the pipeline needs.
Unreserved	At risk. Postgres is no longer reserving all WAL files this pipeline's replication slot needs. If the pipeline does not catch up soon, those files may be removed.
Lost	Broken. Some WAL files this pipeline's replication slot needs have already been removed. The pipeline can no longer continue from this slot. Recreate the pipeline, or set Invalidated slot behavior to Recreate in the pipeline's advanced settings and restart it.
Unknown	Postgres reported an unknown or unavailable state for this pipeline's replication slot.
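
If you collect slot statuses in your own tooling, the table above can be encoded as a small lookup. This is a minimal sketch: the `SlotHealth` enum and both function names are hypothetical, and the severity labels simply mirror the Meaning column.

```python
from enum import Enum
from typing import Optional


class SlotHealth(Enum):
    HEALTHY = "healthy"
    WARNING = "healthy, but growing"
    AT_RISK = "at risk"
    BROKEN = "broken"
    UNKNOWN = "unknown"


# Mirrors the status table: one entry per documented slot status.
_HEALTH_BY_STATUS = {
    "reserved": SlotHealth.HEALTHY,
    "extended": SlotHealth.WARNING,
    "unreserved": SlotHealth.AT_RISK,
    "lost": SlotHealth.BROKEN,
}


def classify_slot(status: Optional[str]) -> SlotHealth:
    """Map a slot status string to a health level; anything else is UNKNOWN."""
    if status is None:
        return SlotHealth.UNKNOWN
    return _HEALTH_BY_STATUS.get(status.strip().lower(), SlotHealth.UNKNOWN)


def requires_action(status: Optional[str]) -> bool:
    """Unreserved and Lost are the two statuses that require action."""
    return classify_slot(status) in {SlotHealth.AT_RISK, SlotHealth.BROKEN}
```

Matching is case-insensitive, so the sketch accepts both the capitalized labels shown in the Dashboard and lowercase values.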

Table states

The pipeline status page also shows the state of individual tables being replicated. Each table can be in one of these states:

State	Description
Queue	Table is getting ready to be copied
Copying	Initial snapshot of the table is being copied
Copied	Table snapshot is complete and getting ready for real-time replication
Live	Table is now replicating data in near real-time
Error	Table has experienced an error during replication

Dealing with replication lag

Replication lag means the pipeline is behind the source database. Some lag is expected during the initial copy phase, after a burst of writes, or after restarting a stopped pipeline. Lag becomes a problem when it keeps increasing, when Room before pausing is running low, or when the slot status moves to Unreserved or Lost.
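
The three warning signs in the paragraph above can be expressed as a simple check over values you record yourself. This is a sketch under stated assumptions: the function names are hypothetical, the 1 GiB `low_room_bytes` threshold is an arbitrary illustration rather than a recommended value, and `None` stands in for a Room before pausing value of `Unlimited`.

```python
from typing import Optional, Sequence

GIB = 1024 ** 3


def lag_keeps_increasing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """True when each Waiting to sync sample is larger than the one before it."""
    if len(samples) < min_samples:
        return False  # too few samples to call it a trend
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def lag_needs_attention(
    samples: Sequence[int],
    room_before_pausing: Optional[int],
    slot_status: str,
    low_room_bytes: int = 1 * GIB,  # assumed threshold, tune for your workload
) -> bool:
    """Apply the three conditions: bad slot status, low room, or growing lag."""
    if slot_status.strip().lower() in {"unreserved", "lost"}:
        return True
    if room_before_pausing is not None and room_before_pausing < low_room_bytes:
        return True
    return lag_keeps_increasing(samples)
```

A burst that rises and then falls, such as `[100, 500, 300]`, is not flagged, which matches the expectation that some lag is normal after a burst of writes.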

Lag can come from several places:

Destination throughput: The destination is slow, rate-limited, unavailable, or rejecting writes.
Pipeline throughput: The pipeline is overloaded, processing a very large transaction, or not performing as expected for the project workload.
Source database activity: Postgres is producing WAL faster than the pipeline can consume it, often during bulk writes, migrations, or backfill jobs.
Network latency: Latency or instability between the pipeline and source database can slow down WAL streaming.
Stopped or disconnected pipeline: When a pipeline is stopped, disconnected, or failed, Postgres keeps WAL for the slot until the retention limit is reached.
Slow initial table copy: A temporary table-sync slot can fall behind if a table is copied more slowly than new changes are written to that table.

Initial copy and table-sync slots

A common initial sync issue happens when a large or busy table is still in Copying while new rows keep being inserted or updated. The temporary table-sync slot needs to keep the changes that happen during the copy. If the copy is too slow compared to the table's write rate, the slot can move to Unreserved and then Lost if Postgres removes changes the copy still needs.

When a table-sync slot is lost, the affected table needs to be copied again. Tune the copy settings, then retry the table copy:

Increase Copy connections per table when one large table is the bottleneck. This lets the pipeline copy chunks of that table in parallel, up to the point where the source database or network becomes the limit.
Increase Table sync workers when several tables need to copy at the same time. Each worker can copy one table, and each worker uses an additional temporary replication slot during initial sync.
If possible, run the initial copy during a quieter write period or reduce bulk writes until the table reaches Live.

After the affected table finishes copying and catches up, the temporary slot is deleted. The table then continues through the main pipeline replication slot.
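
To get a feel for whether a copy can finish before the table-sync slot runs out of retained WAL, you can run a back-of-the-envelope estimate like the one below. It is a sketch, not a sizing tool: all function and parameter names are hypothetical, every rate is a number you would have to measure yourself, and it optimistically assumes copy throughput scales linearly with Copy connections per table.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def copy_duration_seconds(
    table_bytes: float, copy_bytes_per_second: float, copy_connections: int = 1
) -> float:
    """Estimated copy time, assuming linear scaling across connections."""
    return table_bytes / (copy_bytes_per_second * copy_connections)


def wal_during_copy(wal_bytes_per_second: float, duration_seconds: float) -> float:
    """WAL the slot must retain while the copy is still running."""
    return wal_bytes_per_second * duration_seconds


def copy_fits_in_wal_budget(
    table_bytes: float,
    copy_bytes_per_second: float,
    wal_bytes_per_second: float,
    wal_budget_bytes: float,
    copy_connections: int = 1,
) -> bool:
    """True when the WAL accumulated during the copy stays within the budget."""
    duration = copy_duration_seconds(
        table_bytes, copy_bytes_per_second, copy_connections
    )
    return wal_during_copy(wal_bytes_per_second, duration) <= wal_budget_bytes
```

With illustrative numbers (a 100 GiB table, 50 MiB/s per connection, 10 MiB/s of WAL, and a 16 GiB retention budget), one connection takes 2048 seconds and accumulates 20 GiB of WAL, which exceeds the budget. Two connections halve the copy time to 1024 seconds and 10 GiB of WAL, which fits.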

Investigate the lag

Open Database > Replication and check the destination's lag column.
Click View status and check Waiting to sync, Room before pausing, Last check-in, Connected, and Slot status.
Check table states. Tables in Copying can create temporary lag while the initial snapshot catches up to live changes. If a table-sync slot is Unreserved or Lost, tune copy parallelism and retry the affected table copy.
Open Logs > Replication and look for destination errors, retries, rate limits, schema errors, or repeated restarts.
Compare the lag trend with recent database activity, such as imports, migrations, bulk updates, or long transactions.
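
When comparing trends, it can help to turn two readings of Room before pausing into a rough time estimate. The sketch below assumes you noted the value at two points in time and that the shrink rate stays constant, which real workloads rarely guarantee. Both function names are hypothetical.

```python
from typing import Optional

GIB = 1024 ** 3


def room_shrink_rate(
    earlier_room_bytes: float, later_room_bytes: float, elapsed_seconds: float
) -> float:
    """Bytes per second by which Room before pausing shrank between two readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (earlier_room_bytes - later_room_bytes) / elapsed_seconds


def seconds_until_room_runs_out(
    room_bytes: float, shrink_rate: float
) -> Optional[float]:
    """Time left at the current rate, or None when the room is stable or growing."""
    if shrink_rate <= 0:
        return None
    return room_bytes / shrink_rate
```

For instance, if the room dropped from 10 GiB to 8 GiB over 10 minutes, the remaining 8 GiB would last roughly 2400 seconds (about 40 minutes) at that rate.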

Respond based on the slot status

Slot status	What to do
Reserved	If Waiting to sync is stable or decreasing, continue monitoring. If it keeps increasing, check destination write performance, logs, and whether the publication includes more tables or write volume than expected.
Extended	Treat it as an early warning. Confirm the pipeline is connected, check logs for retries or destination slowness, and reduce avoidable write bursts if possible until the pipeline catches up.
Unreserved	Act quickly. The slot is at risk of losing required WAL. Check whether the pipeline is connected and making progress, fix destination or pipeline errors, and contact support if the lag continues to grow.
Lost	The pipeline cannot continue from the existing slot because required WAL has been removed. Recreate the pipeline, or set Invalidated slot behavior to Recreate in the pipeline's advanced settings and restart the pipeline. This creates a new slot and starts replication from scratch for all tables.
Unknown	Check replication logs for errors or missing slot details. If the status remains unknown while the pipeline should be running, contact support with the pipeline ID and recent log details.

Reduce future lag risk

Keep publications focused on the tables and operations you need at the destination.
Avoid leaving pipelines stopped for long periods while the source database is still receiving writes.
Schedule bulk updates, imports, and migrations during lower-traffic windows when possible.
For BigQuery, verify that service account permissions, table requirements, and replica identity settings match the BigQuery destination guide.
If the initial copy is the bottleneck, review Table sync workers and Copy connections per table in the pipeline's advanced settings. Increasing them can speed up copying, but it uses more database connections and replication slots.

Handling errors

Errors can occur at two levels: per table or per pipeline.

Table errors

Table errors occur during the copy phase and affect individual tables. These errors can be retried without stopping the entire pipeline.

Viewing table error details:

Click View status on your destination
Check the table states section to identify tables in Error state
Review the error message for that specific table

Recovering from table errors:

When a table encounters an error during the copy phase, you can reset the table state. This will restart the table copy from the beginning.

Pipeline errors

Pipeline errors occur during the streaming phase (Live state) and affect the entire pipeline. If an error occurs while streaming, the pipeline stops and enters the Failed state. Stopping prevents data loss by ensuring no changes are skipped.

When a pipeline error occurs, you'll receive an email notification immediately so you can take action to resolve it.

Viewing pipeline error details:

Hover over the Failed status in the destinations list to see a quick error summary
Click View status for comprehensive error information
Navigate to the Logs > Replication section of the Dashboard for detailed error logs

Recovering from pipeline errors:

To recover from a pipeline error, you'll need to:

Investigate the root cause using the error details and logs
Fix the underlying issue (e.g., destination connectivity, schema compatibility)
Restart the pipeline from the destinations list

Viewing logs

To see detailed logs for all your replication pipelines:

Navigate to the Logs > Replication section of the Dashboard
Select Replication from the log source filter
You'll see all logs from your replication pipelines

Logs contain diagnostic information that may be too technical for most users. If you're experiencing issues with replication, contact support and include your error details.

Common monitoring scenarios

Checking if replication is healthy

Navigate to the Database > Replication section of the Dashboard
Verify your destination shows a "Running" status
Click View status to check replication lag and table states
Ensure all tables show a "Live" state
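
The checks above can be summarized as a single function over values you read from the Dashboard. This is only a sketch: `PipelineSnapshot` is a hypothetical data structure for illustration, not something the Dashboard or Supabase ETL exposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineSnapshot:
    """Hypothetical record of what the Dashboard shows for one destination."""

    pipeline_state: str  # e.g. "Running", "Failed"
    slot_status: str  # e.g. "Reserved", "Lost"
    table_states: Dict[str, str] = field(default_factory=dict)


def health_problems(snapshot: PipelineSnapshot) -> List[str]:
    """Return a list of findings; an empty list means the checks all passed."""
    problems: List[str] = []
    if snapshot.pipeline_state != "Running":
        problems.append(f"pipeline is {snapshot.pipeline_state}")
    if snapshot.slot_status in ("Unreserved", "Lost"):
        problems.append(f"slot status is {snapshot.slot_status}")
    # Sort table names so the output order is stable.
    for table, state in sorted(snapshot.table_states.items()):
        if state != "Live":
            problems.append(f"table {table} is {state}")
    return problems
```

A snapshot with a Running pipeline, a Reserved slot, and every table in Live yields an empty list. Any other combination lists what to look at first.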

Investigating errors

If you see a Failed status:

Hover over the status to see the error summary
Click View status to see detailed error information
Check table states to identify which tables are affected
Navigate to the Logs > Replication section of the Dashboard for full error details
For table errors, attempt to reset the affected tables

Monitoring performance

To ensure optimal performance:

Regularly check replication lag metrics in the pipeline status view
Monitor table states to ensure tables are staying in a "Live" state
Review logs for warnings or performance issues
If lag is consistently high, review your publication and destination configuration

Troubleshooting

If you notice issues with your replication:

Check pipeline state: Ensure the pipeline is in Running state
Review table states: Identify tables in Error state
Check logs: Navigate to the Logs > Replication section of the Dashboard for detailed error information
Verify publication: Ensure your Postgres publication is properly configured
Monitor replication lag: Lag that keeps growing may indicate that the destination or pipeline cannot keep up with writes on the source database

For more troubleshooting tips, see the external replication (ETL) FAQ.