Monitoring & Alerting β
Tunecamp ships with two monitoring hooks: an HTTP health endpoint and optional Sentry crash reporting. Together with an external uptime check they cover the two questions that matter in production: "is the instance up?" and "what broke?".
Health endpoint β
GET /health returns 200 when the HTTP server is responsive. It is registered before the federation middleware, so a stuck integration cannot block it. The Docker image already uses it for its HEALTHCHECK.
System Resources panel (Admin β System) β
Root admins get a live resource dashboard at Admin Panel β System, backed by GET /api/admin/system/resources. It answers "how much is my instance consuming right now, and is something leaking?" without shelling into the container.
It polls every 5s and shows:
- Process CPU % β derived client-side from successive
process.cpuUsage()deltas, normalised across all cores. Plus system load average (Linux;0on Windows) and core count. - Heap used / total, RSS, external β
process.memoryUsage(). - Heap & RSS trend sparklines over the last ~5 min. A line that climbs and never drops after GC is the classic memory-leak signature; RSS is what the kernel watches before an OOM kill (exit 137).
- Host RAM used vs total (
os.totalmem/freemem). - SQLite DB file size β a cheap
stat, so the endpoint stays light enough to poll. (For the full on-disk music breakdown use the Storage tab β that one walks the directory and is intentionally on-demand only.) - Running background tasks β live list from the
TaskManager(scans, audits, tag-sync, etc.) with progress.
The endpoint is deliberately cheap (no disk walk) so polling it does not itself add load. It is gated to MANAGE_SYSTEM (root admin).
Crash reporting (Sentry) β
Opt-in: set SENTRY_DSN and restart. When unset (the default) the SDK is never initialized and nothing is ever sent.
# .env or your container's environment
SENTRY_DSN=https://xxxx@oXXXX.ingest.sentry.io/XXXXWorks with sentry.io (free tier is fine for a single instance) or any self-hosted Sentry-compatible backend such as GlitchTip.
What gets captured:
- Uncaught exceptions and unhandled rejections in the main process. Capture only β the existing fatal/non-fatal logic in
src/index.tsstill decides whether the process exits. - Unhandled route errors via the Express error handler (the JSON error response sent to clients is unchanged).
Optional variables:
| Variable | Default | Purpose |
|---|---|---|
SENTRY_ENVIRONMENT | NODE_ENV | Environment tag on events |
SENTRY_TRACES_SAMPLE_RATE | 0 | Performance tracing; 0 = errors only |
TUNECAMP_GIT_SHA | set by Docker build | Release tag, links errors to a deploy |
On CapRover the release tag is filled automatically from CAPROVER_GIT_COMMIT_SHA at build time.
In Sentry, enable an alert rule ("Send a notification for new issues") to get emailed on the first occurrence of each new error.
Uptime check (external) β
A crash reporter cannot tell you the whole machine or container is down β use an external pinger for that. Any of these free services works:
- UptimeRobot β HTTP(s) monitor
- healthchecks.io
- Better Stack
Configuration: monitor https://your-instance.example.com/health, interval 1β5 minutes, alert via email/Telegram. Expect 200 OK.
OOM kills on Docker Swarm / CapRover β
Swarm replaces unhealthy or OOM-killed tasks instead of restarting them, so a crashed container disappears from docker ps. To diagnose:
docker service ps <service-name> --no-trunc # exit codes of dead tasks
# "task: non-zero exit (137)" = OOM-killed
docker service logs <service-name> --since 1hExit code 137 means the kernel killed the process for exceeding its memory limit β raise the service memory limit or look in Sentry for what was allocating just before death.