RisingWave Cloud provides detailed alert playbooks to help you quickly diagnose and address issues. Each playbook entry includes the alert’s name, description, common triggers, diagnostic steps, and immediate remediation actions. The alerts are organized by functional category.
This guide is regularly updated to address emerging scenarios.
Streaming
Barrier pending for too long
No barrier has been committed in this project for more than 15 minutes.
Triggers
Streaming graph bottlenecks. Typical causes include join amplification, insufficient resources, and suboptimal streaming queries (e.g., OverWindow, Joins).
Compaction write stalls result in longer barrier sync duration.
Diagnosis
Check CPU and memory utilization for all nodes. If either is maxed out, the cluster likely has insufficient resources.
Run SHOW JOBS to check whether any jobs are still being created and backfilled. Backfilling can put additional pressure on the cluster.
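The backfill check can also be scripted. The following is a minimal sketch, assuming a Python DB-API connection to the cluster (connection setup is omitted and is not part of this playbook). The helper names `list_creating_jobs` and `backfill_in_progress` are hypothetical, and the sketch assumes SHOW JOBS returns one row per job that is still being created; it does not rely on any particular column layout.

```python
def list_creating_jobs(conn):
    """Run SHOW JOBS on a DB-API connection and return the raw rows."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW JOBS")
        return cur.fetchall()
    finally:
        cur.close()


def backfill_in_progress(conn):
    """Assumption: any row from SHOW JOBS means a job is still being created."""
    return len(list_creating_jobs(conn)) > 0
```

Passing in the connection rather than opening it inside the helper keeps the sketch independent of any specific driver.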
Resolution
If resources are maxed out or backfilling is in progress, scale out the cluster to alleviate the pressure.
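The resolution rule can be expressed as a small decision helper. This is only a sketch: the `NodeUsage` type, the function names, and the 90% threshold are assumptions made for illustration, since the playbook does not define a number for "maxed out".

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    name: str
    cpu_pct: float  # CPU utilization, 0-100
    mem_pct: float  # memory utilization, 0-100


def saturated_nodes(nodes, threshold_pct=90.0):
    """Return names of nodes whose CPU or memory is at or above the threshold."""
    return [
        n.name
        for n in nodes
        if n.cpu_pct >= threshold_pct or n.mem_pct >= threshold_pct
    ]


def should_scale_out(nodes, backfilling, threshold_pct=90.0):
    """Scale out if any node is saturated or a backfill is in progress."""
    return backfilling or bool(saturated_nodes(nodes, threshold_pct))


nodes = [NodeUsage("compute-0", 97.0, 62.0), NodeUsage("compute-1", 41.0, 55.0)]
print(saturated_nodes(nodes))          # ['compute-0']
print(should_scale_out(nodes, False))  # True
```

Returning the saturated node names separately makes it easier to see which nodes triggered the recommendation.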
Sink lag too large
Data for a particular sink has been pending in RisingWave’s internal log store for more than 30 minutes.
Triggers
Slow external sink processing.
Insufficient sink parallelism.
Diagnosis
Check the sink’s downstream system for abnormalities.
Compaction
Compaction back pressure
Back pressure from compaction detected in your cluster.
Triggers
Insufficient compaction resources.
Diagnosis
Check compaction CPU usage.
Check the CPU ratio of compute nodes and compactor nodes.
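As a rough illustration of the ratio check, the sketch below divides the total CPU cores allocated to compute nodes by the total allocated to compactor nodes. Interpreting the ratio as allocated cores (rather than measured usage) is an assumption, as is the helper name `cpu_ratio`. The playbook does not state a target ratio, so the sketch only computes the number.

```python
def cpu_ratio(compute_cores, compactor_cores):
    """Return total compute CPU cores divided by total compactor CPU cores."""
    total_compactor = sum(compactor_cores)
    if total_compactor == 0:
        raise ValueError("no compactor CPU allocated")
    return sum(compute_cores) / total_compactor


# Two 8-core compute nodes and one 4-core compactor.
print(cpu_ratio([8, 8], [4]))  # 4.0
```

A rising ratio after compute nodes are added without matching compactor capacity is one way the compaction side can fall behind.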
Resolution
Scale out the compactor. For more information, see Scale a project manually.