A few weekends ago, I had the opportunity to work with CockroachDB at Hack the North in Waterloo. For those of you who aren’t familiar with CockroachDB, it’s a scalable and resilient SQL database that’s very easy to set up and manage. Together with my team, I worked on a tool called CockroachNest to automatically monitor all of the distributed nodes in a single CockroachDB cluster.
CockroachNest was designed to connect to any single node in a CockroachDB cluster and, using Cockroach’s built-in API, automatically monitor every node in that cluster. No changes to the monitoring tool were needed as nodes were added to or removed from the cluster.
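Here is a minimal sketch of that discovery step, not CockroachNest's actual code. It assumes an insecure cluster (no login needed for the status endpoints) and assumes the /_status/nodes JSON looks like {"nodes": [{"desc": {"nodeId": ..., "address": {"addressField": ...}}}]}. Those field names are from memory, so check them against your CockroachDB version. The helper names fetch_json, parse_nodes and discover are hypothetical.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a CockroachDB status endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_nodes(payload):
    """Return {node_id: address} from a /_status/nodes response.

    Assumed (unverified) shape of each entry:
    {"desc": {"nodeId": 1, "address": {"addressField": "host:26257"}}}
    """
    nodes = {}
    for entry in payload.get("nodes", []):
        desc = entry.get("desc", {})
        node_id = desc.get("nodeId")
        address = desc.get("address", {}).get("addressField")
        if node_id is not None:
            nodes[int(node_id)] = address
    return nodes


def discover(seed_url):
    """Ask any single node (the seed) for the full cluster membership."""
    return parse_nodes(fetch_json(f"{seed_url}/_status/nodes"))
```

Calling something like discover("http://localhost:8080") on every refresh, rather than keeping a hard-coded node list, is what lets nodes join or leave without reconfiguring the monitor.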
To collect data, we used CockroachDB’s built-in status APIs (available by default on port 8080 of any Cockroach node). In particular, we used /_status/nodes to get a list of all nodes in the cluster and /_status/problemranges to find errors on each individual node. At first, requests to the problemranges endpoint kept timing out, because response times grew very large whenever one or more nodes were down. At 1am on Sunday, an engineer at Cockroach Labs showed us the fix: query problemranges once per node, passing that node’s node_id as a GET parameter. Thanks Bram!
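The per-node workaround can be sketched as follows. This is an illustration under assumptions, not the code we wrote at the hackathon. It sends every request through one seed node's HTTP address, adds the ?node_id= query parameter, and gives each request its own timeout, so a single dead node only fails its own check. The names problem_ranges_url and check_nodes are hypothetical. The fetch argument can be swapped out, so the logic can be exercised without a live cluster.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a status endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def problem_ranges_url(base_url, node_id):
    """Scope the problemranges report to a single node via ?node_id=."""
    query = urllib.parse.urlencode({"node_id": node_id})
    return f"{base_url}/_status/problemranges?{query}"


def check_nodes(base_url, node_ids, fetch=fetch_json):
    """Query each node separately so one dead node cannot stall the rest."""
    results = {}
    for node_id in node_ids:
        try:
            report = fetch(problem_ranges_url(base_url, node_id))
            results[node_id] = {"ok": True, "report": report}
        except (OSError, ValueError) as exc:
            # OSError covers URLError and timeouts; ValueError covers bad JSON.
            results[node_id] = {"ok": False, "error": str(exc)}
    return results
```

A node whose check fails or times out is simply marked as not OK for that round, and the loop carries on with the remaining nodes.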
To present the collected data, we geo-located each node’s IP address and placed a coloured icon for each node on a map (red for dead, green for alive), showing its location, uptime, and other relevant information. These icons updated in real time as monitoring data was refreshed from the individual Cockroach nodes. For an extra layer of monitoring when users weren’t actively watching the map, we also integrated PagerDuty. Whenever a node’s status went red, we opened an incident and used the node’s name as the incident key, so repeated alerts for the same node were de-duplicated.
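The alerting side might look like the sketch below. This is an assumption, not the integration we shipped. It targets PagerDuty's Events API v2 endpoint (https://events.pagerduty.com/v2/enqueue). In v2, the de-duplication field is called dedup_key; the older v1 API called it incident_key. The routing key is a placeholder. Paging only on a green-to-red transition is an extra choice made in this sketch, and the helper names are hypothetical.

```python
import json
import urllib.request

PAGERDUTY_URL = "https://events.pagerduty.com/v2/enqueue"


def status_colour(is_live):
    """Map liveness to the map icon colour: green for alive, red for dead."""
    return "green" if is_live else "red"


def trigger_event(routing_key, node_name, detail):
    """Build an Events API v2 trigger; dedup_key groups alerts per node."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": node_name,
        "payload": {
            "summary": f"CockroachDB node {node_name} is down: {detail}",
            "source": node_name,
            "severity": "critical",
        },
    }


def on_status_change(old, new, routing_key, node_name, detail, send):
    """Send an event only when a node turns red, not on every refresh."""
    if old != "red" and new == "red":
        send(trigger_event(routing_key, node_name, detail))
        return True
    return False


def post_event(event):
    """POST an event to PagerDuty (requires a real routing key)."""
    req = urllib.request.Request(
        PAGERDUTY_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

In a live setup, post_event would be passed as the send argument. Because the dedup_key is the node name, PagerDuty keeps any repeated trigger for the same node attached to one open incident.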
Overall, we learned a lot setting up, using, and integrating CockroachDB, and I’m looking forward to using it in the real world now that it’s been added to my proverbial developer’s toolbox.