ClickHouse DB sent me 1000+ emails

Last week my phone was flooded with notification emails 📬. Thousands of delivery-failure messages — reporting recipient disk quotas exceeded — arrived at my domain’s postmaster address. I jumped onto the VPS to see what was filling the disk. A good first check is:

du -sh /*

208.0K  /root
2.3M    /run
1.2M    /sbin
4.0K    /srv
4.0K    /swap
0       /sys
12.0K   /tmp
407.9M  /usr
18.9G   /var  <-- 🧐 Is Docker responsible for this?

If /var stands out like this, docker system df is the next check: it shows how much space Docker images, containers, volumes, and build cache are using.
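To narrow things down further, it helps to drill into the big directory one level at a time. Here is a minimal sketch, assuming a POSIX shell and a du that supports -d (GNU and BusyBox both do). biggest_dirs is a hypothetical helper name, not something from my actual setup:

```shell
# Hypothetical helper: list the largest immediate subdirectories of a directory.
# Usage: biggest_dirs DIR [COUNT]
biggest_dirs() {
  dir="${1:-/var}"
  count="${2:-10}"
  # -x stays on one filesystem, -k reports sizes in KiB,
  # -d 1 limits the listing to direct children of DIR.
  # The first row of the output is the total for DIR itself.
  du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$((count + 1))"
}

# Example: drill down step by step, e.g. /var, then /var/lib, then /var/lib/docker.
# biggest_dirs /var 5
```

Repeating the call on whichever child is largest usually leads to the offending directory in a few steps.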

The culprit: ClickHouse logs 🔍

In my case /var/lib/docker/volumes/lagoss_clickhouse/ took up a large share of that space. I use ClickHouse (a columnar DB) to store logs and requests for Lagoss, a FOSS edge runtime similar to Vercel or Cloudflare Workers. My first worry was that Lagoss had started spamming request logs, but the visible tables only contained a few hundred MB.

After looking deeper I discovered that ClickHouse maintains many internal logging tables (query logs, error logs, etc.), and those system tables had grown into multiple gigabytes over time.
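One way to see which system tables are the heavy ones is to sum the on-disk size of their parts from system.parts. This is a sketch rather than the exact query I ran. system_table_sizes and the CH_CLIENT override are hypothetical names, added so the same function works when the client has to be reached through something like docker exec:

```shell
# Hypothetical helper: print the on-disk size of each table in the `system` database.
# CH_CLIENT is an assumed override for setups where the client is not on the host,
# e.g. CH_CLIENT="docker exec <container> clickhouse-client".
system_table_sizes() {
  ${CH_CLIENT:-clickhouse-client} -q "
    SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
    FROM system.parts
    WHERE database = 'system' AND active
    GROUP BY table
    ORDER BY sum(bytes_on_disk) DESC"
}
```

Tables such as query_log or trace_log showing up near the top would point at the internal logging rather than application data.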

The fix: truncate log tables and reduce logging 🛠️

Those system log tables are primarily for debugging and can be safely emptied. To truncate every table in the system database whose name contains "log", I ran the command recommended in this blog post:

clickhouse-client -q "SELECT name FROM system.tables WHERE name LIKE '%log%' AND database='system';" | xargs -I{} clickhouse-client -q "TRUNCATE TABLE system.{};"

That reclaimed several GB immediately; services on the VPS recovered and my uptime monitor (Uptime Kuma) started reporting everything as back online ✅.

To avoid a repeat, I disabled most verbose logging in ClickHouse by adding these settings to the server config (for me: /etc/clickhouse-server/config.d/logging.xml, mounted as a Docker volume):

<clickhouse>
  <logger>
    <level>warning</level>
    <console>true</console>
  </logger>
  <query_thread_log remove="remove"/>
  <query_log remove="remove"/>
  <text_log remove="remove"/>
  <trace_log remove="remove"/>
  <metric_log remove="remove"/>
  <asynchronous_metric_log remove="remove"/>
  <session_log remove="remove"/>
  <part_log remove="remove"/>
</clickhouse>

Hopefully this keeps disk usage in check going forward — but I’ll watch the volumes for a while to be sure 🔭
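For the watching part, a small check that can run from cron is probably enough. A minimal sketch, assuming a POSIX shell with df and awk available. disk_below is a hypothetical helper, and the 80 % threshold is just an example value:

```shell
# Hypothetical helper: succeed (exit 0) only while the filesystem holding
# PATH is below LIMIT percent full.
# Usage: disk_below PATH LIMIT
disk_below() {
  path="${1:-/}"
  limit="${2:-80}"
  # df -P prints stable POSIX-format output; column 5 of the second row
  # is the usage percentage (e.g. "42%"), so strip the % sign.
  used="$(df -P "$path" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')"
  [ "$used" -lt "$limit" ]
}

# Example for a cron job: print a warning once usage reaches 80 %.
# disk_below /var/lib/docker/volumes 80 || echo "disk usage high on $(hostname)"
```

Hooking the warning into whatever alerting is already in place (for me that would be the uptime monitor) avoids finding out through bounced emails again.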
