Compare commits


60 Commits
v1.0.0 ... main

Author SHA1 Message Date
e930178e9c
Merge branch 'rel/1.0.4' 2025-07-06 03:35:39 -04:00
5afc940df8
Bump release version to 1.0.4 2025-07-06 03:35:00 -04:00
3961aa3448
Add Gentoo ebuild for 1.0.4-rc1 2025-07-06 03:32:27 -04:00
8fd57032e7
Update Makefile to support rc versions, bump version
* Support packaging RC versions for deb and rpm packages

* Bump version to 1.0.4-rc1
2025-07-06 03:28:36 -04:00
5ede7dea07
Fix package-all make target
* Fix the package-all target in the Makefile

* Remove storage of raw sequence stats
2025-07-05 12:28:36 -04:00
5ea007c3f6
Merge branch 'dev/sequences' into develop 2025-07-05 01:18:01 -04:00
7cb0f7ad40
Fix typo in activity query 2025-07-05 01:17:35 -04:00
83fa12ec54
Correct types for slot LSN lag metrics 2025-07-04 02:46:25 -04:00
98b74d9aed
Add sequence metrics to Zabbix template 2025-07-04 02:45:43 -04:00
45953848e2
Remove some type casts from working around Decimal types 2025-07-03 10:16:20 -04:00
6116f4f885
Fix missing json default 2025-07-03 02:08:45 -04:00
24d1214855
Teach json how to serialize decimals 2025-07-03 01:47:06 -04:00
3c39d8aa97
Remove CIDR prefix from replication id 2025-07-03 01:13:58 -04:00
ebb084aa9d
Add initial query for sequence usage 2025-07-01 02:29:56 -04:00
86d5e8917b
Merge branch 'dev/io_stats' into develop 2025-07-01 01:30:21 -04:00
be84e2b89a
Report some I/O metrics in bytes 2025-06-30 02:00:06 -04:00
fd0a9e1230
Add backend I/O discovery items to template 2025-06-29 02:08:19 -04:00
b934a30124
Add pg_stat_io query
* Add a query for pg_stat_io metrics grouped by backend type

* Switch some unsupported values from 0 to NULL to better reflect their
  nonexistence
2025-06-28 00:30:03 -04:00
ea397ef889
Improve docker-related query test failure avoidance 2025-06-18 17:39:16 -04:00
def6bab7f4
Add a pause after deploying test db containers as a temporary kludge 2025-06-18 03:30:27 -04:00
ecb616f6d9
Allow retry in query tests 2025-06-18 03:23:43 -04:00
b7f731c6ac
Fix and reformat several queries 2025-06-18 03:03:34 -04:00
4fba81dc2c
Add initial bgwriter and activity metrics queries 2025-06-17 02:40:52 -04:00
9225e745b0
Handle division by zero possibility
* When no I/O activity has been reported in pg_statio_all_tables for a
  database, the queries resulted in a division by zero. Change this to a
  NULL instead.  Still a problem for Zabbix, but maybe a little cleaner.
2025-06-17 01:16:57 -04:00
b981a9ad36
Add slot lag metrics 2025-06-16 00:56:14 -04:00
c0185d4b86
Remove extra quotes in replication slot query 2025-06-15 02:20:45 -04:00
d34dfc5bf7
Fix replication monitoring queries 2025-06-15 02:19:10 -04:00
7c395a80fe
Add metrics for tracking cache hit ratios 2025-06-13 01:16:24 -04:00
8fe81e3ab3
Fix global variable placement 2025-06-04 12:49:35 -04:00
cfe01eb63e
Fix typo in latest version Zabbix item 2025-06-04 01:44:25 -04:00
55b9b16deb
Install to /usr/bin
* Standardize all install methods to use /usr/bin rather than some of
  them using /usr/local/bin

* Make the init script executable in install-openrc
2025-06-04 01:16:59 -04:00
6a22597f1f
Fix ID age query and template 2025-06-04 00:47:51 -04:00
295d1d6310
Fix typo in install-openrc 2025-06-03 19:38:59 -04:00
8b804f3912
Add requests as a dependency, split install target
* Add requests as a dependency on all OSs

* Split the install target into a common target and separate
  install-systemd and install-openrc targets to support installing on
  Alpine
2025-06-03 19:34:58 -04:00
bc9e039cb3
Fix Gentoo package prefix, update ebuild 2025-06-03 12:46:09 -04:00
54cf117591
Fix package contents for Gentoo
* Use git to identify only the version controlled files when creating
  the Gentoo tarball.
2025-06-03 12:06:34 -04:00
8e32e01c20
Merge branch 'dev/pg-version-lag' 2025-06-03 02:00:48 -04:00
af3bbf7515
Bump version to 1.0.3 2025-06-03 02:00:26 -04:00
180fa31d14
Add latest version metric
* Add a special handler for latest version info

* Add initial latest version items/triggers to the Zabbix template
2025-06-03 01:58:44 -04:00
39a6a9d23e
Add replication slot monitoring to Zabbix template 2025-06-03 01:42:50 -04:00
f716569aa7
Add more tests for the version check code 2025-06-02 12:39:34 -04:00
22ae634a87
Format code with black 2025-06-02 03:41:15 -04:00
375bf6a982
Restructure latest version code for testing
* Split the latest version check code into functions for unit testing

* Add initial unit tests for latest version check
2025-06-02 03:39:52 -04:00
487386a7cc
Initial work on latest version check code
* Add code to pull the latest supported versions of PostgreSQL from the
  official RSS feed.

* TODO: Split rss parsing code into separate function for unit testing.

* TODO: Test/debug

* TODO: Add metrics to return how far behind the latest version the
  current cluster is.
2025-06-01 02:44:01 -04:00
15097dcba4
Merge branch 'dev/query-tests' 2025-06-01 00:55:46 -04:00
1d642d41b2
Bump version to 1.0.2
* Modify Makefile to extract the version from the mainscript

* Bump version to 1.0.2
2025-06-01 00:23:26 -04:00
80304f40d1
Revise the versions on a few queries, improve query tests
* Add ability to specify the sslmode parameter when connecting to
  PostgreSQL

* Fix min versions for replication queries

* Add query-tests target to main Makefile
2025-06-01 00:12:31 -04:00
c0e1531083
Add query test script and test mode
* Add a mode to test all metric queries

* Add a script to run query tests against different versions of
  PostgreSQL

* Add Docker elements for query testing

* Switch to using a --config flag when specifying the config file

* Fix some metric queries

* Allow the agent address to be configured

* Allow the sslmode connection parameter to be configured
2025-05-22 14:53:25 -04:00
529bef9679
Add ability to run query tests 2025-05-18 12:52:32 -04:00
8928bba337
Format python using black 2025-05-15 02:04:50 -04:00
c872fc6b90
Start implementing metric tests 2025-05-15 02:01:20 -04:00
030afafc20
Actually fix Gentoo ebuild for v1.0.1 2025-05-14 02:42:10 -04:00
2dfc336288
Add target to build package for Gentoo 2025-05-14 01:43:44 -04:00
27e1c517bc
Add ebuild file for v1.0.1 2025-05-14 01:05:17 -04:00
e6166d1fe3
Update version to 1.0.1 2025-05-14 00:29:09 -04:00
bffabd9c8f
Reformat python code using black 2025-05-13 01:44:47 -04:00
98ac25743b
Add queries for replication slot monitoring 2025-04-19 02:33:48 -04:00
7fc23961b0
Switch template to http agent 2025-04-19 02:28:33 -04:00
8ace133c23
Dynamically specify version for RPM builds 2025-04-19 00:22:26 -04:00
2afeb827ed
Improve openrc init script, add port setting
* Ensure the log directory exists with openrc

* Add a port setting to configure the port the agent listens on

* Switch to RealDictCursor

* Fix type for connection timeout
2025-04-19 00:07:15 -04:00
22 changed files with 3011 additions and 725 deletions


@@ -3,7 +3,7 @@ Version: 1.0
 Section: utils
 Priority: optional
 Architecture: all
-Depends: logrotate, python3 (>= 3.6), python3-psycopg2, python3-yaml, systemd
+Depends: logrotate, python3 (>= 3.6), python3-psycopg2, python3-requests, python3-yaml, systemd
 Maintainer: James Campbell <james@commandprompt.com>
 Homepage: https://www.commandprompt.com
 Description: A bridge to sit between monitoring tools and PostgreSQL


@@ -1,52 +0,0 @@
-# Copyright 2024 Gentoo Authors
-# Distributed under the terms of the GNU General Public License v2
-
-EAPI=8
-
-PYTHON_COMPAT=( python3_{6..12} )
-
-inherit git-r3 python-r1
-
-DESCRIPTION="PostgreSQL monitoring bridge"
-HOMEPAGE="None"
-
-LICENSE="BSD"
-SLOT="0"
-KEYWORDS="amd64"
-
-EGIT_REPO_URI="https://code2.shh-dot-com.org/james/pgmon.git"
-#EGIT_COMMIT=""
-
-DEPEND="
-	${PYTHON_DEPS}
-	dev-python/psycopg:3
-	dev-python/pyyaml
-	acct-user/zabbix
-	acct-group/zabbix
-	agent? ( net-analyzer/zabbix[agent] )
-	agent2? ( net-analyzer/zabbix[agent2] )
-	app-admin/logrotate
-"
-RDEPEND="${DEPEND}"
-BDEPEND=""
-
-src_install() {
-	default
-
-	# Install init script
-	newinitd "${FILESDIR}/pgmon.openrc" pgmon
-
-	# Install script
-	exeinto /usr/bin
-	newexe "${S}/pgmon.py" pgmon
-
-	# Install default config
-	diropts -o root -g zabbix -m 0755
-	insinto /etc/pgmon
-	doins "${FILESDIR}/pgmon.yml"
-	doins "${S}/pgmon-metrics.yml"
-
-	# Install logrotate config
-	insinto /etc/logrotate.d
-	newins "${FILESDIR}/pgmon.logrotate" pgmon
-}


@@ -5,7 +5,7 @@ EAPI=8
 PYTHON_COMPAT=( python3_{6..13} )
 
-inherit git-r3 python-r1
+inherit python-r1
 
 DESCRIPTION="PostgreSQL monitoring bridge"
 HOMEPAGE="None"
@@ -14,7 +14,9 @@ LICENSE="BSD"
 SLOT="0"
 KEYWORDS="amd64"
 
-SRC_URI="https://code2.shh-dot-com.org/james/${PN}/archive/v${PV}.tar.gz -> ${P}.tar.gz"
+SRC_URI="https://code2.shh-dot-com.org/james/${PN}/archive/v${PV}.tar.bz2 -> ${P}.tar.bz2"
+
+IUSE="-systemd"
 
 DEPEND="
 	${PYTHON_DEPS}
@@ -25,21 +27,36 @@ DEPEND="
 RDEPEND="${DEPEND}"
 BDEPEND=""
 
-S="${WORKDIR}/${PN}"
+RESTRICT="fetch"
+#S="${WORKDIR}/${PN}"
+
+pkg_nofetch() {
+	einfo "Please download"
+	einfo " - ${P}.tar.bz2"
+	einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
+	einfo "The file should be owned by portage:portage."
+}
+
+src_compile() {
+	true
+}
 
 src_install() {
-	default
-
 	# Install init script
-	newinitd "openrc/pgmon.initd" pgmon
-	newconfd "openrc/pgmon.confd" pgmon
+	if ! use systemd ; then
+		newinitd "openrc/pgmon.initd" pgmon
+		newconfd "openrc/pgmon.confd" pgmon
+	fi
 
 	# Install systemd unit
-	systemd_dounit "systemd/pgmon.service"
+	if use systemd ; then
+		systemd_dounit "systemd/pgmon.service"
+	fi
 
 	# Install script
 	exeinto /usr/bin
-	newexe "pgmon.py" pgmon
+	newexe "src/pgmon.py" pgmon
 
 	# Install default config
 	diropts -o root -g root -m 0755

GENTOO/pgmon-1.0.2.ebuild (new file)

@@ -0,0 +1,73 @@
+# Copyright 2024 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+EAPI=8
+
+PYTHON_COMPAT=( python3_{6..13} )
+
+inherit python-r1
+
+DESCRIPTION="PostgreSQL monitoring bridge"
+HOMEPAGE="None"
+
+LICENSE="BSD"
+SLOT="0"
+KEYWORDS="amd64"
+
+SRC_URI="https://code2.shh-dot-com.org/james/${PN}/archive/v${PV}.tar.bz2 -> ${P}.tar.bz2"
+
+IUSE="-systemd"
+
+DEPEND="
+	${PYTHON_DEPS}
+	dev-python/psycopg:2
+	dev-python/pyyaml
+	app-admin/logrotate
+"
+RDEPEND="${DEPEND}"
+BDEPEND=""
+
+RESTRICT="fetch"
+#S="${WORKDIR}/${PN}"
+
+pkg_nofetch() {
+	einfo "Please download"
+	einfo " - ${P}.tar.bz2"
+	einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
+	einfo "The file should be owned by portage:portage."
+}
+
+src_compile() {
+	true
+}
+
+src_install() {
+	# Install init script
+	if ! use systemd ; then
+		newinitd "openrc/pgmon.initd" pgmon
+		newconfd "openrc/pgmon.confd" pgmon
+	fi
+
+	# Install systemd unit
+	if use systemd ; then
+		systemd_dounit "systemd/pgmon.service"
+	fi
+
+	# Install script
+	exeinto /usr/bin
+	newexe "src/pgmon.py" pgmon
+
+	# Install default config
+	diropts -o root -g root -m 0755
+	insinto /etc/pgmon
+	doins "sample-config/pgmon.yml"
+	doins "sample-config/pgmon-metrics.yml"
+
+	# Install logrotate config
+	insinto /etc/logrotate.d
+	newins "logrotate/pgmon.logrotate" pgmon
+
+	# Install man page
+	doman manpages/pgmon.1
+}

GENTOO/pgmon-1.0.3.ebuild (new file)

@@ -0,0 +1,74 @@
+# Copyright 2024 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+EAPI=8
+
+PYTHON_COMPAT=( python3_{6..13} )
+
+inherit python-r1 systemd
+
+DESCRIPTION="PostgreSQL monitoring bridge"
+HOMEPAGE="None"
+
+LICENSE="BSD"
+SLOT="0"
+KEYWORDS="amd64"
+
+SRC_URI="https://code2.shh-dot-com.org/james/${PN}/releases/download/v${PV}/${P}.tar.bz2"
+
+IUSE="-systemd"
+
+DEPEND="
+	${PYTHON_DEPS}
+	dev-python/psycopg:2
+	dev-python/pyyaml
+	dev-python/requests
+	app-admin/logrotate
+"
+RDEPEND="${DEPEND}"
+BDEPEND=""
+
+#RESTRICT="fetch"
+#S="${WORKDIR}/${PN}"
+
+#pkg_nofetch() {
+#	einfo "Please download"
+#	einfo " - ${P}.tar.bz2"
+#	einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
+#	einfo "The file should be owned by portage:portage."
+#}
+
+src_compile() {
+	true
+}
+
+src_install() {
+	# Install init script
+	if ! use systemd ; then
+		newinitd "openrc/pgmon.initd" pgmon
+		newconfd "openrc/pgmon.confd" pgmon
+	fi
+
+	# Install systemd unit
+	if use systemd ; then
+		systemd_dounit "systemd/pgmon.service"
+	fi
+
+	# Install script
+	exeinto /usr/bin
+	newexe "src/pgmon.py" pgmon
+
+	# Install default config
+	diropts -o root -g root -m 0755
+	insinto /etc/pgmon
+	doins "sample-config/pgmon.yml"
+	doins "sample-config/pgmon-metrics.yml"
+
+	# Install logrotate config
+	insinto /etc/logrotate.d
+	newins "logrotate/pgmon.logrotate" pgmon
+
+	# Install man page
+	doman manpages/pgmon.1
+}

GENTOO/pgmon-1.0.4.ebuild (new file)

@@ -0,0 +1,74 @@
+# Copyright 2024 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+EAPI=8
+
+PYTHON_COMPAT=( python3_{6..13} )
+
+inherit python-r1 systemd
+
+DESCRIPTION="PostgreSQL monitoring bridge"
+HOMEPAGE="None"
+
+LICENSE="BSD"
+SLOT="0"
+KEYWORDS="amd64"
+
+SRC_URI="https://code2.shh-dot-com.org/james/${PN}/releases/download/v${PV}/${P}.tar.bz2"
+
+IUSE="-systemd"
+
+DEPEND="
+	${PYTHON_DEPS}
+	dev-python/psycopg:2
+	dev-python/pyyaml
+	dev-python/requests
+	app-admin/logrotate
+"
+RDEPEND="${DEPEND}"
+BDEPEND=""
+
+#RESTRICT="fetch"
+#S="${WORKDIR}/${PN}"
+
+#pkg_nofetch() {
+#	einfo "Please download"
+#	einfo " - ${P}.tar.bz2"
+#	einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
+#	einfo "The file should be owned by portage:portage."
+#}
+
+src_compile() {
+	true
+}
+
+src_install() {
+	# Install init script
+	if ! use systemd ; then
+		newinitd "openrc/pgmon.initd" pgmon
+		newconfd "openrc/pgmon.confd" pgmon
+	fi
+
+	# Install systemd unit
+	if use systemd ; then
+		systemd_dounit "systemd/pgmon.service"
+	fi
+
+	# Install script
+	exeinto /usr/bin
+	newexe "src/pgmon.py" pgmon
+
+	# Install default config
+	diropts -o root -g root -m 0755
+	insinto /etc/pgmon
+	doins "sample-config/pgmon.yml"
+	doins "sample-config/pgmon-metrics.yml"
+
+	# Install logrotate config
+	insinto /etc/logrotate.d
+	newins "logrotate/pgmon.logrotate" pgmon
+
+	# Install man page
+	doman manpages/pgmon.1
+}

Makefile

@@ -1,9 +1,25 @@
 # Package details
 PACKAGE_NAME := pgmon
-VERSION := 1.0
 SCRIPT := src/$(PACKAGE_NAME).py
 
+# Figure out the version components
+# Note: The release is for RPM packages, where prerelease releases are written as 0.<release>
+FULL_VERSION := $(shell grep -m 1 '^VERSION = ' "$(SCRIPT)" | sed -ne 's/.*"\(.*\)".*/\1/p')
+VERSION := $(shell echo $(FULL_VERSION) | sed -n 's/\(.*\)\(-rc.*\|$$\)/\1/p')
+RELEASE := $(shell echo $(FULL_VERSION) | sed -n 's/.*-rc\([0-9]\+\)$$/\1/p')
+
+ifeq ($(RELEASE),)
+RPM_RELEASE := 1
+RPM_VERSION := $(VERSION)-$(RPM_RELEASE)
+DEB_VERSION := $(VERSION)
+else
+RPM_RELEASE := 0.$(RELEASE)
+RPM_VERSION := $(VERSION)-$(RPM_RELEASE)
+DEB_VERSION := $(VERSION)~rc$(RELEASE)
+endif
+
 # Where packages are built
 BUILD_DIR := build
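As a sanity check, the sed-based version split above can be mirrored in Python (a sketch, not code from this repository; the names follow the Makefile variables):

    import re

    def split_version(full_version):
        """Mimic the Makefile's FULL_VERSION -> RPM/DEB version split."""
        m = re.match(r"^(.*?)(?:-rc(\d+))?$", full_version)
        version, rc = m.group(1), m.group(2)
        if rc is None:
            # Final release: RPM release is 1, deb version is used as-is
            return {"rpm": f"{version}-1", "deb": version}
        # Prerelease: RPM uses 0.<rc> so the final release sorts newer;
        # deb uses ~rc<rc>, which dpkg sorts before the plain version
        return {"rpm": f"{version}-0.{rc}", "deb": f"{version}~rc{rc}"}

    print(split_version("1.0.4"))      # {'rpm': '1.0.4-1', 'deb': '1.0.4'}
    print(split_version("1.0.4-rc1"))  # {'rpm': '1.0.4-0.1', 'deb': '1.0.4~rc1'}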
@@ -15,18 +31,29 @@ SUPPORTED := ubuntu-20.04 \
 	debian-11 \
 	rockylinux-8 \
 	rockylinux-9 \
-	oraclelinux-7
+	oraclelinux-7 \
+	gentoo
 
 ##
 # These targets are the main ones to use for most things.
 ##
-.PHONY: all clean tgz test install
+.PHONY: all clean tgz test query-tests install-common install-openrc install-systemd
+
+all: package-all
+
+version:
+	@echo "full version=$(FULL_VERSION) version=$(VERSION) rel=$(RELEASE) rpm=$(RPM_VERSION) deb=$(DEB_VERSION)"
 
 # Build all packages
 .PHONY: package-all
-all: $(foreach distro_release, $(SUPPORTED), package-$(distro_release))
+package-all: $(foreach distro_release, $(SUPPORTED), package-$(distro_release))
+
+# Gentoo package (tar.gz) creation
+.PHONY: package-gentoo
+package-gentoo:
+	mkdir -p $(BUILD_DIR)/gentoo
+	tar --transform "s,^,$(PACKAGE_NAME)-$(FULL_VERSION)/," -acjf $(BUILD_DIR)/gentoo/$(PACKAGE_NAME)-$(FULL_VERSION).tar.bz2 --exclude .gitignore $(shell git ls-tree --full-tree --name-only -r HEAD)
 
 # Create a deb package
@@ -42,13 +69,12 @@ package-%:
 		--user $(shell id -u):$(shell id -g) \
 		"$(DISTRO)-packager:$(RELEASE)"
 
 # Create a tarball
 tgz:
 	rm -rf $(BUILD_DIR)/tgz/root
 	mkdir -p $(BUILD_DIR)/tgz/root
-	$(MAKE) install DESTDIR=$(BUILD_DIR)/tgz/root
-	tar -cz -f $(BUILD_DIR)/tgz/$(PACKAGE_NAME)-$(VERSION).tgz -C $(BUILD_DIR)/tgz/root .
+	$(MAKE) install-openrc DESTDIR=$(BUILD_DIR)/tgz/root
+	tar -cz -f $(BUILD_DIR)/tgz/$(PACKAGE_NAME)-$(FULL_VERSION).tgz -C $(BUILD_DIR)/tgz/root .
 
 # Clean up the build directory
 clean:
@@ -58,18 +84,21 @@ clean:
 test:
 	cd src ; python3 -m unittest
 
-# Install the script at the specified base directory
-install:
+# Run query tests
+query-tests:
+	cd tests ; ./run-tests.sh
+
+# Install the script at the specified base directory (common components)
+install-common:
 	# Set up directories
 	mkdir -p $(DESTDIR)/etc/$(PACKAGE_NAME)
 	mkdir -p ${DESTDIR}/etc/logrotate.d
-	mkdir -p $(DESTDIR)/lib/systemd/system
-	mkdir -p $(DESTDIR)/usr/local/bin
+	mkdir -p $(DESTDIR)/usr/bin
 	mkdir -p $(DESTDIR)/usr/share/man/man1
 
 	# Install script
-	cp $(SCRIPT) $(DESTDIR)/usr/local/bin/$(PACKAGE_NAME)
-	chmod 755 $(DESTDIR)/usr/local/bin/$(PACKAGE_NAME)
+	cp $(SCRIPT) $(DESTDIR)/usr/bin/$(PACKAGE_NAME)
+	chmod 755 $(DESTDIR)/usr/bin/$(PACKAGE_NAME)
 
 	# Install manpage
 	cp manpages/* $(DESTDIR)/usr/share/man/man1/
@@ -78,15 +107,39 @@ install:
 	# Install sample config
 	cp sample-config/* $(DESTDIR)/etc/$(PACKAGE_NAME)/
 
-	# Install systemd unit files
-	cp systemd/* $(DESTDIR)/lib/systemd/system/
-
 	# Install logrotate config
 	cp logrotate/${PACKAGE_NAME}.logrotate ${DESTDIR}/etc/logrotate.d/${PACKAGE_NAME}
 
+# Install for systemd
+install-systemd:
+	# Install the common stuff
+	$(MAKE) install-common
+
+	# Set up directories
+	mkdir -p $(DESTDIR)/lib/systemd/system
+
+	# Install systemd unit files
+	cp systemd/* $(DESTDIR)/lib/systemd/system/
+
+# Install for open-rc
+install-openrc:
+	# Install the common stuff
+	$(MAKE) install-common
+
+	# Set up directories
+	mkdir -p $(DESTDIR)/etc/init.d
+	mkdir -p $(DESTDIR)/etc/conf.d
+
+	# Install init script
+	cp openrc/pgmon.initd $(DESTDIR)/etc/init.d/pgmon
+	chmod 755 $(DESTDIR)/etc/init.d/pgmon
+
+	# Install init script config file
+	cp openrc/pgmon.confd $(DESTDIR)/etc/conf.d/pgmon
+
 # Run all of the install tests
-.PHONY: install-tests debian-%-install-test rockylinux-%-install-test ubuntu-%-install-test
+.PHONY: install-tests debian-%-install-test rockylinux-%-install-test ubuntu-%-install-test gentoo-install-test
 install-tests: $(foreach distro_release, $(SUPPORTED), $(distro_release)-install-test)
@@ -95,28 +148,33 @@ debian-%-install-test:
 	docker run --rm \
 		-v ./$(BUILD_DIR):/output \
 		debian:$* \
-		bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(VERSION)-debian-$*.deb'
+		bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(DEB_VERSION)-debian-$*.deb'
 
 # Run a RedHat install test
 rockylinux-%-install-test:
 	docker run --rm \
 		-v ./$(BUILD_DIR):/output \
 		rockylinux:$* \
-		bash -c 'dnf makecache && dnf install -y /output/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm'
+		bash -c 'dnf makecache && dnf install -y /output/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm'
 
 # Run an Ubuntu install test
 ubuntu-%-install-test:
 	docker run --rm \
 		-v ./$(BUILD_DIR):/output \
 		ubuntu:$* \
-		bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(VERSION)-ubuntu-$*.deb'
+		bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(DEB_VERSION)-ubuntu-$*.deb'
 
 # Run an OracleLinux install test (this is for EL7 since CentOS7 images no longer exist)
 oraclelinux-%-install-test:
 	docker run --rm \
 		-v ./$(BUILD_DIR):/output \
 		oraclelinux:7 \
-		bash -c 'yum makecache && yum install -y /output/$(PACKAGE_NAME)-$(VERSION)-1.el7.noarch.rpm'
+		bash -c 'yum makecache && yum install -y /output/$(PACKAGE_NAME)-$(RPM_VERSION).el7.noarch.rpm'
 
+# Run a Gentoo install test
+gentoo-install-test:
+	# May implement this in the future, but would require additional headaches to set up a repo
+	true
 
 ##
 # Container targets
@@ -151,30 +209,30 @@ package-image-%:
 
 # Debian package creation
 actually-package-debian-%:
-	$(MAKE) install DESTDIR=/output/debian-$*
+	$(MAKE) install-systemd DESTDIR=/output/debian-$*
 	cp -r --preserve=mode DEBIAN /output/debian-$*/
-	dpkg-deb -Zgzip --build /output/debian-$* "/output/$(PACKAGE_NAME)-$(VERSION)-debian-$*.deb"
+	dpkg-deb -Zgzip --build /output/debian-$* "/output/$(PACKAGE_NAME)-$(DEB_VERSION)-debian-$*.deb"
 
 # RedHat package creation
 actually-package-rockylinux-%:
 	mkdir -p /output/rockylinux-$*/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
-	cp RPM/$(PACKAGE_NAME).spec /output/rockylinux-$*/SPECS/
+	sed -e "s/@@VERSION@@/$(VERSION)/g" -e "s/@@RELEASE@@/$(RPM_RELEASE)/g" RPM/$(PACKAGE_NAME).spec > /output/rockylinux-$*/SPECS/$(PACKAGE_NAME).spec
 	rpmbuild --define '_topdir /output/rockylinux-$*' \
-		--define 'version $(VERSION)' \
+		--define 'version $(RPM_VERSION)' \
 		-bb /output/rockylinux-$*/SPECS/$(PACKAGE_NAME).spec
-	cp /output/rockylinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm /output/
+	cp /output/rockylinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm /output/
 
 # Ubuntu package creation
 actually-package-ubuntu-%:
-	$(MAKE) install DESTDIR=/output/ubuntu-$*
+	$(MAKE) install-systemd DESTDIR=/output/ubuntu-$*
 	cp -r --preserve=mode DEBIAN /output/ubuntu-$*/
-	dpkg-deb -Zgzip --build /output/ubuntu-$* "/output/$(PACKAGE_NAME)-$(VERSION)-ubuntu-$*.deb"
+	dpkg-deb -Zgzip --build /output/ubuntu-$* "/output/$(PACKAGE_NAME)-$(DEB_VERSION)-ubuntu-$*.deb"
 
 # OracleLinux package creation
 actually-package-oraclelinux-%:
 	mkdir -p /output/oraclelinux-$*/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
-	cp RPM/$(PACKAGE_NAME)-el7.spec /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
+	sed -e "s/@@VERSION@@/$(VERSION)/g" -e "s/@@RELEASE@@/$(RPM_RELEASE)/g" RPM/$(PACKAGE_NAME)-el7.spec > /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
 	rpmbuild --define '_topdir /output/oraclelinux-$*' \
-		--define 'version $(VERSION)' \
+		--define 'version $(RPM_VERSION)' \
 		-bb /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
-	cp /output/oraclelinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm /output/
+	cp /output/oraclelinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm /output/


@@ -1,13 +1,13 @@
 Name: pgmon
-Version: 1.0
-Release: 1%{?dist}
+Version: @@VERSION@@
+Release: @@RELEASE@@%{?dist}
 Summary: A bridge to sit between monitoring tools and PostgreSQL
 
 License: MIT
 URL: https://www.commandprompt.com
 BuildArch: noarch
 
-Requires: logrotate, python, python-psycopg2, PyYAML, systemd
+Requires: logrotate, python, python-psycopg2, PyYAML, python-requests, systemd
 
 %description
 A bridge to sit between monitoring tools and PostgreSQL
@@ -19,7 +19,7 @@ A bridge to sit between monitoring tools and PostgreSQL
 # Do nothing since we have nothing to build
 
 %install
-make -C /src install DESTDIR=%{buildroot}
+make -C /src install-systemd DESTDIR=%{buildroot}
 
 %files
 /etc/logrotate.d/pgmon
@@ -28,7 +28,7 @@ make -C /src install DESTDIR=%{buildroot}
 /etc/pgmon/pgmon-service.conf
 /lib/systemd/system/pgmon.service
 /lib/systemd/system/pgmon@.service
-/usr/local/bin/pgmon
+/usr/bin/pgmon
 /usr/share/man/man1/pgmon.1.gz
 
 %post


@@ -1,13 +1,13 @@
 Name: pgmon
-Version: 1.0
-Release: 1%{?dist}
+Version: @@VERSION@@
+Release: @@RELEASE@@%{?dist}
 Summary: A bridge to sit between monitoring tools and PostgreSQL
 
 License: MIT
 URL: https://www.commandprompt.com
 BuildArch: noarch
 
-Requires: logrotate, python3, python3-psycopg2, python3-pyyaml, systemd
+Requires: logrotate, python3, python3-psycopg2, python3-pyyaml, python3-requests, systemd
 
 %description
 A bridge to sit between monitoring tools and PostgreSQL
@@ -19,7 +19,7 @@ A bridge to sit between monitoring tools and PostgreSQL
 # Do nothing since we have nothing to build
 
 %install
-make -C /src install DESTDIR=%{buildroot}
+make -C /src install-systemd DESTDIR=%{buildroot}
 
 %files
 /etc/logrotate.d/pgmon
@@ -28,7 +28,7 @@ make -C /src install DESTDIR=%{buildroot}
 /etc/pgmon/pgmon-service.conf
 /lib/systemd/system/pgmon.service
 /lib/systemd/system/pgmon@.service
-/usr/local/bin/pgmon
+/usr/bin/pgmon
 /usr/share/man/man1/pgmon.1.gz
 
 %post


@@ -11,7 +11,14 @@ PGMON_USER="${PGMON_USER:-postgres}"
 PGMON_GROUP="${PGMON_GROUP:-$PGMON_USER}"
 
 CONFIG_FILE="/etc/pgmon/${agent_name}.yml"
 
+output_log=/var/log/pgmon/${SVCNAME}.log
+error_log=/var/log/pgmon/${SVCNAME}.err
+
+start_pre() {
+	checkpath -f -m 0644 -o "${PGMON_USER}:${PGMON_GROUP}" "${output_log}" "${error_log}"
+}
+
 command="/usr/bin/pgmon"
-command_args="'$CONFIG_FILE'"
+command_args="-c '$CONFIG_FILE'"
 command_background="true"
 command_user="${PGMON_USER}:${PGMON_GROUP}"

requirements-dev.yml (new file)

@@ -0,0 +1,4 @@
+-r requirements.txt
+testcontainers[postgresql]
+pytest
+black


@@ -1,45 +1,307 @@
 metrics:
+  ##
   # Discovery metrics
+  ##
   discover_dbs:
     type: set
     query:
-      0: SELECT datname AS dbname FROM pg_database
+      0: >
+        SELECT datname AS dbname
+        FROM pg_database
 
+  # Note: If the user lacks sufficient privileges, these fields will be NULL.
+  #       The WHERE clause is intended to prevent Zabbix from discovering a
+  #       connection it cannot monitor. Ideally this would generate an error
+  #       instead.
   discover_rep:
     type: set
     query:
-      0: SELECT client_addr || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') AS repid, client_addr, state FROM pg_stat_replication
+      0: >
+        SELECT host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') AS repid,
+               client_addr,
+               state
+        FROM pg_stat_replication
+        WHERE state IS NOT NULL
+
+  discover_slots:
+    type: set
+    query:
+      90400: >
+        SELECT slot_name,
+               plugin,
+               slot_type,
+               database,
+               false as temporary,
+               active
+        FROM pg_replication_slots
+      100000: >
+        SELECT slot_name,
+               plugin,
+               slot_type,
+               database,
+               temporary,
+               active
+        FROM pg_replication_slots
 
+  ##
   # cluster-wide metrics
+  ##
   version:
     type: value
     query:
       0: SHOW server_version_num
 
   max_frozen_age:
-    type: value
+    type: row
     query:
-      0: SELECT max(age(datfrozenxid)) FROM pg_database
+      0: >
+        SELECT max(age(datfrozenxid)) AS xid_age,
+               NULL AS mxid_age
+        FROM pg_database
+      90600: >
+        SELECT max(age(datfrozenxid)) AS xid_age,
+               max(mxid_age(datminmxid)) AS mxid_age
+        FROM pg_database
+
+  bgwriter:
+    type: row
+    query:
+      0: >
+        SELECT checkpoints_timed,
+               checkpoints_req,
+               checkpoint_write_time,
+               checkpoint_sync_time,
+               buffers_checkpoint,
+               buffers_clean,
+               maxwritten_clean,
+               buffers_backend,
+               buffers_backend_fsync,
+               buffers_alloc
+        FROM pg_stat_bgwriter
+      170000: >
+        SELECT cp.num_timed AS checkpoints_timed,
+               cp.num_requested AS checkpoints_req,
+               cp.write_time AS checkpoint_write_time,
+               cp.sync_time AS checkpoint_sync_time,
+               cp.buffers_written AS buffers_checkpoint,
+               bg.buffers_clean AS buffers_clean,
+               bg.maxwritten_clean AS maxwritten_clean,
+               NULL AS buffers_backend,
+               NULL AS buffers_backend_fsync,
+               bg.buffers_alloc AS buffers_alloc
+        FROM pg_stat_bgwriter bg
+        CROSS JOIN pg_stat_checkpointer cp
+
+  io_per_backend:
+    type: set
+    query:
+      160000: >
+        SELECT backend_type,
+               COALESCE(SUM(reads * op_bytes), 0) AS reads,
+               COALESCE(SUM(read_time), 0) AS read_time,
+               COALESCE(SUM(writes * op_bytes), 0) AS writes,
+               COALESCE(SUM(write_time), 0) AS write_time,
+               COALESCE(SUM(writebacks * op_bytes), 0) AS writebacks,
+               COALESCE(SUM(writeback_time), 0) AS writeback_time,
+               COALESCE(SUM(extends * op_bytes), 0) AS extends,
+               COALESCE(SUM(extend_time), 0) AS extend_time,
+               COALESCE(SUM(op_bytes), 0) AS op_bytes,
+               COALESCE(SUM(hits), 0) AS hits,
+               COALESCE(SUM(evictions), 0) AS evictions,
+               COALESCE(SUM(reuses), 0) AS reuses,
+               COALESCE(SUM(fsyncs), 0) AS fsyncs,
+               COALESCE(SUM(fsync_time), 0) AS fsync_time
+        FROM pg_stat_io
+        GROUP BY backend_type
 
+  ##
   # Per-database metrics
+  ##
   db_stats:
     type: row
     query:
-      0: SELECT numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks, blk_read_time, blk_write_time, extract('epoch' from stats_reset)::float FROM pg_stat_database WHERE datname = %(dbname)s
-      140000: SELECT numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks, COALESCE(checksum_failures, 0) AS checksum_failures, blk_read_time, blk_write_time, session_time, active_time, idle_in_transaction_time, sessions, sessions_abandoned, sessions_fatal, sessions_killed, extract('epoch' from stats_reset)::float FROM pg_stat_database WHERE datname = %(dbname)s
+      0: >
+        SELECT numbackends,
+               xact_commit,
+               xact_rollback,
+               blks_read,
+               blks_hit,
+               tup_returned,
+               tup_fetched,
+               tup_inserted,
+               tup_updated,
+               tup_deleted,
+               conflicts,
+               temp_files,
+               temp_bytes,
+               deadlocks,
+               NULL AS checksum_failures,
+               blk_read_time,
+               blk_write_time,
+               NULL AS session_time,
+               NULL AS active_time,
+               NULL AS idle_in_transaction_time,
+               NULL AS sessions,
+               NULL AS sessions_abandoned,
+               NULL AS sessions_fatal,
+               NULL AS sessions_killed,
+               extract('epoch' from stats_reset) AS stats_reset
+        FROM pg_stat_database WHERE datname = %(dbname)s
+      140000: >
+        SELECT numbackends,
+               xact_commit,
+               xact_rollback,
+               blks_read,
+               blks_hit,
+               tup_returned,
+               tup_fetched,
+               tup_inserted,
+               tup_updated,
+               tup_deleted,
+               conflicts,
+               temp_files,
+               temp_bytes,
+               deadlocks,
+               COALESCE(checksum_failures, 0) AS checksum_failures,
+               blk_read_time,
+               blk_write_time,
+               session_time,
+               active_time,
+               idle_in_transaction_time,
+               sessions,
+               sessions_abandoned,
+               sessions_fatal,
+               sessions_killed,
+               extract('epoch' from stats_reset) AS stats_reset
+        FROM pg_stat_database WHERE datname = %(dbname)s
+    test_args:
+      dbname: postgres
+
+  hit_ratios:
+    type: row
+    query:
+      0: >
+        SELECT sum(heap_blks_read)::float / NULLIF(sum(heap_blks_read + heap_blks_hit), 0) AS avg_heap_hit_ratio,
+               sum(idx_blks_hit)::float / NULLIF(sum(idx_blks_read + idx_blks_hit), 0) AS avg_idx_hit_ratio,
+               sum(toast_blks_hit)::float / NULLIF(sum(toast_blks_read + toast_blks_hit), 0) AS avg_toast_hit_ratio,
+               sum(tidx_blks_hit)::float / NULLIF(sum(tidx_blks_read + tidx_blks_hit), 0) AS avg_tidx_hit_ratio
+        FROM pg_statio_all_tables
+    test_args:
+      dbname: postgres
+
+  activity:
+    type: set
+    query:
+      0: >
+        SELECT state,
+               count(*) AS backend_count,
+               COALESCE(EXTRACT(EPOCH FROM max(now() - state_change)), 0) AS max_state_time
+        FROM pg_stat_activity
+        WHERE datname = %(dbname)s
+        GROUP BY state
+    test_args:
+      dbname: postgres
+
+  sequence_usage:
+    type: value
+    query:
+      # 9.2 lacks lateral joins, the pg_sequence_last_value function, and the pg_sequences view
+      # 0: >
+      #   SELECT COALESCE(MAX(pg_sequence_last_value(c.oid)::float / (pg_sequence_parameters(oid)).maximum_value), 0) AS max_usage
+      #   FROM pg_class c
+      #   WHERE c.relkind = 'S'
+      # 9.3 - 9.6 lacks the pg_sequence_last_value function, and pg_sequences view
+      # 90300: >
+      #   SELECT COALESCE(MAX(pg_sequence_last_value(c.oid)::float / s.maximum_value), 0) AS max_usage
+      #   FROM pg_class c
+      #   CROSS JOIN LATERAL pg_sequence_parameters(c.oid) AS s
+      #   WHERE c.relkind = 'S'
+      100000: SELECT COALESCE(MAX(last_value::float / max_value), 0) AS max_usage FROM pg_sequences;
+    test_args:
+      dbname: postgres
+
+  sequence_visibility:
+    type: row
+    query:
+      100000: >
+        SELECT COUNT(*) FILTER (WHERE has_sequence_privilege(c.oid, 'SELECT,USAGE')) AS visible_sequences,
+               COUNT(*) AS total_sequences
+        FROM pg_class AS c
+        WHERE relkind = 'S';
+
+  ##
+  # Per-replication metrics
+  ##
+  rep_stats:
+    type: row
+    query:
+      90400: >
+        SELECT pid, usename,
+               EXTRACT(EPOCH FROM backend_start) AS backend_start,
+               state,
+               pg_xlog_location_diff(pg_current_xlog_location(), sent_location) AS sent_lsn,
+               pg_xlog_location_diff(pg_current_xlog_location(), write_location) AS write_lsn,
+               pg_xlog_location_diff(pg_current_xlog_location(), flush_location) AS flush_lsn,
+               pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lsn,
+               NULL AS write_lag,
+               NULL AS flush_lag,
+               NULL AS replay_lag,
+               sync_state
+        FROM pg_stat_replication
+        WHERE host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = %(repid)s
+      100000: >
+        SELECT pid, usename,
+               EXTRACT(EPOCH FROM backend_start) AS backend_start,
+               state,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lsn,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lsn,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lsn,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lsn,
+               COALESCE(EXTRACT(EPOCH FROM write_lag), 0) AS write_lag,
+               COALESCE(EXTRACT(EPOCH FROM flush_lag), 0) AS flush_lag,
+               COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS replay_lag,
+               sync_state
+        FROM pg_stat_replication
+        WHERE host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = %(repid)s
+    test_args:
+      repid: 127.0.0.1_test_rep
+
+  ##
+  # Per-slot metrics
+  ##
+  slot_stats:
+    type: row
+    query:
+      90400: >
+        SELECT NULL as active_pid,
+               xmin,
+               pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_bytes,
+               NULL AS confirmed_flush_bytes
+        FROM pg_replication_slots WHERE slot_name = %(slot)s
+      90600: >
+        SELECT active_pid,
+               xmin,
+               pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_bytes,
+               pg_xlog_location_diff(pg_current_xlog_location(), confirmed_flush_lsn) AS confirmed_flush_bytes
+        FROM pg_replication_slots WHERE slot_name = %(slot)s
+      100000: >
+        SELECT active_pid,
+               xmin,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS restart_bytes,
+               pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirmed_flush_bytes
+        FROM pg_replication_slots WHERE slot_name = %(slot)s
+    test_args:
+      slot: test_slot
 
+  ##
   # Debugging
+  ##
   ntables:
     type: value
     query:
       0: SELECT count(*) AS ntables FROM pg_stat_user_tables
-
-  # Per-replication metrics
-  rep_stats:
-    type: row
-    query:
-      0: SELECT * FROM pg_stat_database WHERE client_addr || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = '{repid}'
-
-  # Debugging
-  sleep:
-    type: value
-    query:
-      0: SELECT now(), pg_sleep(5);
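Throughout this file, each metric's query map is keyed by the minimum server_version_num it applies to, and the agent runs the entry with the largest key not exceeding the server's version (see get_query in src/pgmon.py further down). A minimal Python sketch of that selection rule, using the discover_slots keys above:

    def pick_query(queries, server_version_num):
        """Pick the newest query whose minimum version is <= the server version."""
        for min_version in sorted(queries, reverse=True):
            if server_version_num >= min_version:
                return queries[min_version]
        raise KeyError("no query applies to this server version")

    # Keys taken from the discover_slots metric above
    discover_slots = {90400: "-- 9.4 query --", 100000: "-- 10+ query --"}
    print(pick_query(discover_slots, 90620))   # 9.4 variant (PostgreSQL 9.6.20)
    print(pick_query(discover_slots, 170000))  # 10+ variant (PostgreSQL 17)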


@@ -1,3 +1,9 @@
+# The address the agent binds to
+#address: 127.0.0.1
+
+# The port the agent listens on for requests
+#port: 5400
+
 # Min PostgreSQL connection pool size (per database)
 #min_pool_size: 0
@@ -23,6 +29,9 @@
 # Default database to connect to when none is specified for a metric
 #dbname: 'postgres'
 
+# SSL connection mode
+#ssl_mode: require
+
 # Timeout for getting a connection slot from a pool
 #pool_slot_timeout: 5
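A minimal uncommented pgmon.yml based on the settings above might look like the following; the include entry is hypothetical and relies on the include mechanism handled in read_config:

    address: 127.0.0.1
    port: 5400
    ssl_mode: require

    # Metric definitions pulled in from a separate file
    include:
      - pgmon-metrics.yml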


@@ -4,6 +4,7 @@ import yaml
 import json
 import time
 import os
+import sys
 import argparse
 import logging
@@ -11,7 +12,7 @@ import logging
 from datetime import datetime, timedelta
 
 import psycopg2
-from psycopg2.extras import DictCursor
+from psycopg2.extras import RealDictCursor
 from psycopg2.pool import ThreadedConnectionPool
 
 from contextlib import contextmanager
@@ -23,7 +24,12 @@ from http.server import BaseHTTPRequestHandler, HTTPServer
 from http.server import ThreadingHTTPServer
 from urllib.parse import urlparse, parse_qs
 
-VERSION = '0.1.0'
+import requests
+import re
+
+from decimal import Decimal
+
+VERSION = "1.0.4"
 
 # Configuration
 config = {}
@@ -42,6 +48,12 @@ cluster_version = None
 cluster_version_next_check = None
 cluster_version_lock = Lock()
 
+# PostgreSQL latest version information
+latest_version = None
+latest_version_next_check = None
+latest_version_lock = Lock()
+release_supported = None
+
 # Running state (used to gracefully shut down)
 running = True
@@ -53,64 +65,79 @@ config_file = None
 
 # Configure logging
 log = logging.getLogger(__name__)
-formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(filename)s: %(funcName)s() line %(lineno)d: %(message)s')
+formatter = logging.Formatter(
+    "%(asctime)s - %(levelname)s - %(filename)s: %(funcName)s() line %(lineno)d: %(message)s"
+)
 console_log_handler = logging.StreamHandler()
 console_log_handler.setFormatter(formatter)
 log.addHandler(console_log_handler)
 
 
 # Error types
 class ConfigError(Exception):
     pass
 
 
 class DisconnectedError(Exception):
     pass
 
 
 class UnhappyDBError(Exception):
     pass
 
 
+class UnknownMetricError(Exception):
+    pass
+
+
 class MetricVersionError(Exception):
     pass
 
 
+class LatestVersionCheckError(Exception):
+    pass
+
+
 # Default config settings
 default_config = {
+    # The address the agent binds to
+    "address": "127.0.0.1",
+    # The port the agent listens on for requests
+    "port": 5400,
     # Min PostgreSQL connection pool size (per database)
-    'min_pool_size': 0,
+    "min_pool_size": 0,
     # Max PostgreSQL connection pool size (per database)
-    'max_pool_size': 4,
+    "max_pool_size": 4,
     # How long a connection can sit idle in the pool before it's removed (seconds)
-    'max_idle_time': 30,
+    "max_idle_time": 30,
     # Log level for stderr logging
-    'log_level': 'error',
+    "log_level": "error",
     # Database user to connect as
-    'dbuser': 'postgres',
+    "dbuser": "postgres",
     # Database host
-    'dbhost': '/var/run/postgresql',
+    "dbhost": "/var/run/postgresql",
     # Database port
-    'dbport': 5432,
+    "dbport": 5432,
     # Default database to connect to when none is specified for a metric
-    'dbname': 'postgres',
+    "dbname": "postgres",
+    # SSL connection mode
+    "ssl_mode": "require",
     # Timeout for getting a connection slot from a pool
-    'pool_slot_timeout': 5,
+    "pool_slot_timeout": 5,
     # PostgreSQL connection timeout (seconds)
     # Note: It can actually be double this because of retries
-    'connect_timeout': 5,
+    "connect_timeout": 5,
     # Time to wait before trying to reconnect again after a reconnect failure (seconds)
-    'reconnect_cooldown': 30,
+    "reconnect_cooldown": 30,
     # How often to check the version of PostgreSQL (seconds)
-    'version_check_period': 300,
+    "version_check_period": 300,
+    # How often to check the latest supported version of PostgreSQL (seconds)
+    "latest_version_check_period": 86400,
     # Metrics
-    'metrics': {}
+    "metrics": {},
 }
 
 
 def update_deep(d1, d2):
     """
     Recursively update a dict, adding keys to dictionaries and appending to
@@ -124,24 +151,33 @@ def update_deep(d1, d2):
         The new d1
     """
     if not isinstance(d1, dict) or not isinstance(d2, dict):
-        raise TypeError('Both arguments to update_deep need to be dictionaries')
+        raise TypeError("Both arguments to update_deep need to be dictionaries")
 
     for k, v2 in d2.items():
         if isinstance(v2, dict):
             v1 = d1.get(k, {})
             if not isinstance(v1, dict):
-                raise TypeError('Type mismatch between dictionaries: {} is not a dict'.format(type(v1).__name__))
+                raise TypeError(
+                    "Type mismatch between dictionaries: {} is not a dict".format(
+                        type(v1).__name__
+                    )
+                )
             d1[k] = update_deep(v1, v2)
         elif isinstance(v2, list):
             v1 = d1.get(k, [])
             if not isinstance(v1, list):
-                raise TypeError('Type mismatch between dictionaries: {} is not a list'.format(type(v1).__name__))
+                raise TypeError(
+                    "Type mismatch between dictionaries: {} is not a list".format(
+                        type(v1).__name__
+                    )
+                )
             d1[k] = v1 + v2
         else:
             d1[k] = v2
     return d1
 
 
-def read_config(path, included = False):
+def read_config(path, included=False):
     """
     Read a config file.
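To make update_deep's merge semantics concrete (nested dicts merge, lists concatenate, scalars overwrite), a small illustrative example with made-up keys, assuming the function above is in scope:

    base = {"metrics": {"version": {"type": "value"}}, "include": ["a.yml"], "port": 5400}
    override = {"metrics": {"ntables": {"type": "value"}}, "include": ["b.yml"], "port": 5401}
    update_deep(base, override)
    # base is now:
    # {'metrics': {'version': {'type': 'value'}, 'ntables': {'type': 'value'}},
    #  'include': ['a.yml', 'b.yml'],
    #  'port': 5401}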
@@ -151,7 +187,7 @@ def read_config(path, included = False):
     """
     # Read config file
     log.info("Reading log file: {}".format(path))
-    with open(path, 'r') as f:
+    with open(path, "r") as f:
         try:
             cfg = yaml.safe_load(f)
         except yaml.parser.ParserError as e:
@@ -161,42 +197,53 @@ def read_config(path, included = False):
     config_base = os.path.dirname(path)
 
     # Read any external queries and validate metric definitions
-    for name, metric in cfg.get('metrics', {}).items():
+    for name, metric in cfg.get("metrics", {}).items():
         # Validate return types
         try:
-            if metric['type'] not in ['value', 'row', 'column', 'set']:
-                raise ConfigError("Invalid return type: {} for metric {} in {}".format(metric['type'], name, path))
+            if metric["type"] not in ["value", "row", "column", "set"]:
+                raise ConfigError(
+                    "Invalid return type: {} for metric {} in {}".format(
+                        metric["type"], name, path
+                    )
+                )
         except KeyError:
-            raise ConfigError("No type specified for metric {} in {}".format(name, path))
+            raise ConfigError(
+                "No type specified for metric {} in {}".format(name, path)
+            )
 
         # Ensure queries exist
-        query_dict = metric.get('query', {})
+        query_dict = metric.get("query", {})
         if type(query_dict) is not dict:
-            raise ConfigError("Query definition should be a dictionary, got: {} for metric {} in {}".format(query_dict, name, path))
+            raise ConfigError(
+                "Query definition should be a dictionary, got: {} for metric {} in {}".format(
+                    query_dict, name, path
+                )
+            )
 
         if len(query_dict) == 0:
             raise ConfigError("Missing queries for metric {} in {}".format(name, path))
 
         # Read external sql files and validate version keys
-        for vers, query in metric['query'].items():
+        for vers, query in metric["query"].items():
             try:
                 int(vers)
             except:
-                raise ConfigError("Invalid version: {} for metric {} in {}".format(vers, name, path))
+                raise ConfigError(
+                    "Invalid version: {} for metric {} in {}".format(vers, name, path)
+                )
 
-            if query.startswith('file:'):
+            if query.startswith("file:"):
                 query_path = query[5:]
-                if not query_path.startswith('/'):
+                if not query_path.startswith("/"):
                     query_path = os.path.join(config_base, query_path)
-                with open(query_path, 'r') as f:
-                    metric['query'][vers] = f.read()
+                with open(query_path, "r") as f:
+                    metric["query"][vers] = f.read()
 
     # Read any included config files
-    for inc in cfg.get('include', []):
+    for inc in cfg.get("include", []):
         # Prefix relative paths with the directory from the current config
-        if not inc.startswith('/'):
+        if not inc.startswith("/"):
             inc = os.path.join(config_base, inc)
 
         update_deep(cfg, read_config(inc, included=True))
 
     # Return the config we read if this is an include, otherwise set the final
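Per the loop above, a query string beginning with file: is replaced by the contents of that file, with relative paths resolved against the config file's directory. A hypothetical metric using this (the file name is illustrative):

    metrics:
      my_metric:
        type: value
        query:
          # Loaded from <config dir>/queries/my_metric.sql at startup/reload
          0: file:queries/my_metric.sql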
@@ -209,19 +256,26 @@ def read_config(path, included = False):
     update_deep(new_config, cfg)
 
     # Minor sanity checks
-    if len(new_config['metrics']) == 0:
+    if len(new_config["metrics"]) == 0:
         log.error("No metrics are defined")
         raise ConfigError("No metrics defined")
 
     # Validate the new log level before changing the config
-    if new_config['log_level'].upper() not in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']:
-        raise ConfigError("Invalid log level: {}".format(new_config['log_level']))
+    if new_config["log_level"].upper() not in [
+        "DEBUG",
+        "INFO",
+        "WARNING",
+        "ERROR",
+        "CRITICAL",
+    ]:
+        raise ConfigError("Invalid log level: {}".format(new_config["log_level"]))
 
     global config
     config = new_config
 
     # Apply changes to log level
-    log.setLevel(logging.getLevelName(config['log_level'].upper()))
+    log.setLevel(logging.getLevelName(config["log_level"].upper()))
 
 
 def signal_handler(sig, frame):
     """
@@ -233,7 +287,7 @@ def signal_handler(sig, frame):
     signal.signal(signal.SIGINT, signal.default_int_handler)
 
     # Signal everything to shut down
-    if sig in [ signal.SIGINT, signal.SIGTERM, signal.SIGQUIT ]:
+    if sig in [signal.SIGINT, signal.SIGTERM, signal.SIGQUIT]:
         log.info("Shutting down ...")
         global running
         running = False
@@ -245,11 +299,12 @@ def signal_handler(sig, frame):
         log.warning("Received config reload signal")
         read_config(config_file)
 
 
 class ConnectionPool(ThreadedConnectionPool):
     def __init__(self, dbname, minconn, maxconn, *args, **kwargs):
         # Make sure dbname isn't different in the kwargs
-        kwargs['dbname'] = dbname
+        kwargs["dbname"] = dbname
 
         super().__init__(minconn, maxconn, *args, **kwargs)
         self.name = dbname
@@ -270,7 +325,10 @@ class ConnectionPool(ThreadedConnectionPool):
             except psycopg2.pool.PoolError:
                 # If we failed to get the connection slot, wait a bit and try again
                 time.sleep(0.1)
-        raise TimeoutError("Timed out waiting for an available connection to {}".format(self.name))
+        raise TimeoutError(
+            "Timed out waiting for an available connection to {}".format(self.name)
+        )
 
 
 def get_pool(dbname):
     """
@@ -288,26 +346,32 @@ def get_pool(dbname):
     # lock
     if dbname not in connections:
         log.info("Creating connection pool for: {}".format(dbname))
+
+        # Actually create the connection pool
         connections[dbname] = ConnectionPool(
             dbname,
-            int(config['min_pool_size']),
-            int(config['max_pool_size']),
-            application_name='pgmon',
-            host=config['dbhost'],
-            port=config['dbport'],
-            user=config['dbuser'],
-            connect_timeout=float(config['connect_timeout']),
-            sslmode='require')
+            int(config["min_pool_size"]),
+            int(config["max_pool_size"]),
+            application_name="pgmon",
+            host=config["dbhost"],
+            port=config["dbport"],
+            user=config["dbuser"],
+            connect_timeout=int(config["connect_timeout"]),
+            sslmode=config["ssl_mode"],
+        )
+
         # Clear the unhappy indicator if present
         unhappy_cooldown.pop(dbname, None)
+
     return connections[dbname]
 
 
 def handle_connect_failure(pool):
     """
     Mark the database as being unhappy so we can leave it alone for a while
     """
     dbname = pool.name
-    unhappy_cooldown[dbname] = datetime.now() + timedelta(seconds=int(config['reconnect_cooldown']))
+    unhappy_cooldown[dbname] = datetime.now() + timedelta(
+        seconds=int(config["reconnect_cooldown"])
+    )
 
 
 def get_query(metric, version):
     """
@@ -318,42 +382,61 @@ def get_query(metric, version):
         version: The PostgreSQL version number, as given by server_version_num
     """
     # Select the correct query
-    for v in reversed(sorted(metric['query'].keys())):
+    for v in reversed(sorted(metric["query"].keys())):
         if version >= v:
-            if len(metric['query'][v].strip()) == 0:
-                raise MetricVersionError("Metric no longer applies to PostgreSQL {}".format(version))
-            return metric['query'][v]
+            if len(metric["query"][v].strip()) == 0:
+                raise MetricVersionError(
+                    "Metric no longer applies to PostgreSQL {}".format(version)
+                )
+            return metric["query"][v]
 
-    raise MetricVersionError('Missing metric query for PostgreSQL {}'.format(version))
+    raise MetricVersionError("Missing metric query for PostgreSQL {}".format(version))
+
+
+def json_encode_special(obj):
+    """
+    Encoder function to handle types the standard JSON package doesn't know what
+    to do with
+    """
+    if isinstance(obj, Decimal):
+        return float(obj)
+    raise TypeError(f'Cannot serialize object of {type(obj)}')
 
 
 def run_query_no_retry(pool, return_type, query, args):
     """
     Run the query with no explicit retry code
     """
-    with pool.connection(float(config['connect_timeout'])) as conn:
+    with pool.connection(float(config["connect_timeout"])) as conn:
         try:
-            with conn.cursor(cursor_factory=DictCursor) as curs:
+            with conn.cursor(cursor_factory=RealDictCursor) as curs:
                 curs.execute(query, args)
                 res = curs.fetchall()
 
-                if return_type == 'value':
+                if return_type == "value":
+                    if len(res) == 0:
+                        return ""
                     return str(list(res[0].values())[0])
-                elif return_type == 'row':
-                    return json.dumps(res[0])
-                elif return_type == 'column':
-                    return json.dumps([list(r.values())[0] for r in res])
-                elif return_type == 'set':
-                    return json.dumps(res)
+                elif return_type == "row":
+                    if len(res) == 0:
+                        return "[]"
+                    return json.dumps(res[0], default=json_encode_special)
+                elif return_type == "column":
+                    if len(res) == 0:
+                        return "[]"
+                    return json.dumps([list(r.values())[0] for r in res], default=json_encode_special)
+                elif return_type == "set":
+                    return json.dumps(res, default=json_encode_special)
         except:
             dbname = pool.name
             if dbname in unhappy_cooldown:
                 raise UnhappyDBError()
-            elif conn.broken:
+            elif conn.closed != 0:
                 raise DisconnectedError()
             else:
                 raise
 
 
 def run_query(pool, return_type, query, args):
     """
     Run the query, and if we find upon the first attempt that the connection
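The json_encode_special hook exists because psycopg2 hands back PostgreSQL numeric values as decimal.Decimal, which the stock json encoder rejects; the default= argument routes unknown types through it. A quick demonstration (assumes the function above is in scope):

    import json
    from decimal import Decimal

    row = {"avg_heap_hit_ratio": Decimal("0.9931")}
    print(json.dumps(row, default=json_encode_special))  # {"avg_heap_hit_ratio": 0.9931}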
@@ -384,6 +467,7 @@ def run_query(pool, return_type, query, args):
         handle_connect_failure(pool)
         raise UnhappyDBError()
 
+
 def get_cluster_version():
     """
     Get the PostgreSQL version if we don't already know it, or if it's been
@@ -395,26 +479,228 @@ def get_cluster_version():
     # If we don't know the version or it's past the recheck time, get the
     # version from the database. Only one thread needs to do this, so they all
     # try to grab the lock, and then make sure nobody else beat them to it.
-    if cluster_version is None or cluster_version_next_check is None or cluster_version_next_check < datetime.now():
+    if (
+        cluster_version is None
+        or cluster_version_next_check is None
+        or cluster_version_next_check < datetime.now()
+    ):
         with cluster_version_lock:
             # Only check if nobody already got the version before us
-            if cluster_version is None or cluster_version_next_check is None or cluster_version_next_check < datetime.now():
-                log.info('Checking PostgreSQL cluster version')
-                pool = get_pool(config['dbname'])
-                cluster_version = int(run_query(pool, 'value', 'SHOW server_version_num', None))
-                cluster_version_next_check = datetime.now() + timedelta(seconds=int(config['version_check_period']))
+            if (
+                cluster_version is None
+                or cluster_version_next_check is None
+                or cluster_version_next_check < datetime.now()
+            ):
+                log.info("Checking PostgreSQL cluster version")
+                pool = get_pool(config["dbname"])
+                cluster_version = int(
+                    run_query(pool, "value", "SHOW server_version_num", None)
+                )
+                cluster_version_next_check = datetime.now() + timedelta(
+                    seconds=int(config["version_check_period"])
+                )
                 log.info("Got PostgreSQL cluster version: {}".format(cluster_version))
-                log.debug("Next PostgreSQL cluster version check will be after: {}".format(cluster_version_next_check))
+                log.debug(
+                    "Next PostgreSQL cluster version check will be after: {}".format(
+                        cluster_version_next_check
+                    )
+                )
     return cluster_version
def version_num_to_release(version_num):
    """
    Extract the release from a version_num.

    In other words, this converts things like:
        90603 => 9.6
        130010 => 13
    """
    if version_num // 10000 < 10:
        return version_num // 10000 + (version_num % 10000 // 100 / 10)
    else:
        return version_num // 10000
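A quick sanity check of that mapping (the first two values are the docstring's own examples; the third is a hypothetical 17.x version number):

assert version_num_to_release(90603) == 9.6   # 9.6.3
assert version_num_to_release(130010) == 13   # 13.10
assert version_num_to_release(170002) == 17   # 17.2 (hypothetical)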
def parse_version_rss(raw_rss, release):
"""
Parse the raw RSS from the versions.rss feed to extract the latest version of
    PostgreSQL that's available for the cluster being monitored.
This sets these global variables:
latest_version
release_supported
It is expected that the caller already holds the latest_version_lock lock.
params:
raw_rss: The raw rss text from versions.rss
release: The PostgreSQL release we care about (ex: 9.2, 14)
"""
global latest_version
global release_supported
# Regular expressions for parsing the RSS document
version_line = re.compile(
r".*?([0-9][0-9.]+) is the latest release in the {} series.*".format(release)
)
unsupported_line = re.compile(r"^This version is unsupported")
# Loop through the RSS until we find the current release
release_found = False
for line in raw_rss.splitlines():
m = version_line.match(line)
if m:
# Note that we found the version we were looking for
release_found = True
# Convert the version to version_num format
version = m.group(1)
parts = list(map(int, version.split(".")))
if parts[0] < 10:
latest_version = int(
"{}{:02}{:02}".format(parts[0], parts[1], parts[2])
)
else:
latest_version = int("{}00{:02}".format(parts[0], parts[1]))
elif release_found:
# The next line after the version tells if the version is supported
if unsupported_line.match(line):
release_supported = False
else:
release_supported = True
break
# Make sure we actually found it
if not release_found:
raise LatestVersionCheckError("Current release ({}) not found".format(release))
log.info(
"Got latest PostgreSQL version: {} supported={}".format(
latest_version, release_supported
)
)
log.debug(
"Next latest PostgreSQL version check will be after: {}".format(
latest_version_next_check
)
)
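As a rough illustration of what the parser looks for, here is a paraphrased two-line excerpt in the style of versions.rss (the real feed wraps these sentences in RSS markup and its exact wording may differ; in the agent itself this call runs under latest_version_lock):

raw = (
    "17.2 is the latest release in the 17 series of PostgreSQL.\n"
    "This version is supported.\n"
)
parse_version_rss(raw, 17)
# -> latest_version = 170002, release_supported = True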
def get_latest_version():
"""
    Get the latest available minor release of the PostgreSQL major version
    running on the server being monitored, and whether that release is supported.
"""
global latest_version_next_check
# If we don't know the latest version or it's past the recheck time, get the
# version from the PostgreSQL RSS feed. Only one thread needs to do this, so
# they all try to grab the lock, and then make sure nobody else beat them to it.
if (
latest_version is None
or latest_version_next_check is None
or latest_version_next_check < datetime.now()
):
# Note: we get the cluster version here before grabbing the latest_version_lock
# lock so it's not held while trying to talk with the DB.
release = version_num_to_release(get_cluster_version())
with latest_version_lock:
# Only check if nobody already got the version before us
if (
latest_version is None
or latest_version_next_check is None
or latest_version_next_check < datetime.now()
):
log.info("Checking latest PostgreSQL version")
latest_version_next_check = datetime.now() + timedelta(
seconds=int(config["latest_version_check_period"])
)
                # Grab the RSS feed
                raw_rss = requests.get("https://www.postgresql.org/versions.rss")
                if raw_rss.status_code != 200:
                    raise LatestVersionCheckError(
                        "code={}".format(raw_rss.status_code)
                    )

                # Parse the RSS body and set global variables
                parse_version_rss(raw_rss.text, release)
return latest_version
def sample_metric(dbname, metric_name, args, retry=True):
"""
Run the appropriate query for the named metric against the specified database
"""
# Get the metric definition
try:
metric = config["metrics"][metric_name]
except KeyError:
raise UnknownMetricError("Unknown metric: {}".format(metric_name))
# Get the connection pool for the database, or create one if it doesn't
# already exist.
pool = get_pool(dbname)
# Identify the PostgreSQL version
version = get_cluster_version()
# Get the query version
query = get_query(metric, version)
    # Execute the query
if retry:
return run_query(pool, metric["type"], query, args)
else:
return run_query_no_retry(pool, metric["type"], query, args)
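For context, sample_metric only assumes that each entry under config["metrics"] carries a return type, version-keyed query text, and optional test arguments. A hypothetical entry, shown as the parsed-YAML structure; the real definitions live in pgmon-metrics.yml, and how get_query selects among versions is assumed here:

config["metrics"]["connection_count"] = {
    "type": "value",  # one of: value, row, column, set
    "test_args": {},  # arguments used by test_queries()
    "query": {
        # Assumed convention: get_query() picks the entry for the highest
        # version number not exceeding the cluster's server_version_num.
        0: "SELECT count(*) FROM pg_stat_activity",
    },
}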
def test_queries():
"""
Run all of the metric queries against a database and check the results
"""
# We just use the default db for tests
dbname = config["dbname"]
# Loop through all defined metrics.
for name, metric in config["metrics"].items():
# If the metric has arguments to use while testing, grab those
args = metric.get("test_args", {})
print("Testing {} [{}]".format(name, ", ".join(["{}={}".format(key, value) for key, value in args.items()])))
# When testing against a docker container, we may end up connecting
# before the service is truly up (it restarts during the initialization
# phase). To cope with this, we'll allow a few connection failures.
tries = 5
while True:
# Run the query without the ability to retry
try:
res = sample_metric(dbname, name, args, retry=False)
break
except MetricVersionError:
res = "Unsupported for this version"
break
except psycopg2.OperationalError as e:
print("Error encountered, {} tries left: {}".format(tries, e))
if tries <= 0:
raise
time.sleep(1)
tries -= 1
# Compare the result to the provided sample results
# TODO
print("{} -> {}".format(name, res))
# Return the number of errors
# TODO
return 0
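This is the entry point exercised by the test image below: tests/Dockerfile runs pgmon.py with -c test-config.yml --test, so an uncaught query failure makes the agent container exit non-zero and fails the compose run.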
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    """
    This is our request handling server. It is responsible for listening for
    requests, processing them, and responding.
    """

    def log_request(self, code="-", size="-"):
        """
        Override to suppress standard request logging
        """
@@ -436,71 +722,58 @@ class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
""" """
# Parse the URL # Parse the URL
parsed_path = urlparse(self.path) parsed_path = urlparse(self.path)
name = parsed_path.path.strip('/') metric_name = parsed_path.path.strip("/")
parsed_query = parse_qs(parsed_path.query) parsed_query = parse_qs(parsed_path.query)
if name == 'agent_version': if metric_name == "agent_version":
self._reply(200, VERSION) self._reply(200, VERSION)
return return
elif metric_name == "latest_version_info":
try:
get_latest_version()
self._reply(
200,
json.dumps(
{
"latest": latest_version,
"supported": 1 if release_supported else 0,
}
),
)
except LatestVersionCheckError as e:
log.error("Failed to retrieve latest version information: {}".format(e))
self._reply(503, "Failed to retrieve latest version info")
return
# Note: parse_qs returns the values as a list. Since we always expect # Note: parse_qs returns the values as a list. Since we always expect
# single values, just grab the first from each. # single values, just grab the first from each.
args = {key: values[0] for key, values in parsed_query.items()} args = {key: values[0] for key, values in parsed_query.items()}
# Get the metric definition
try:
metric = config['metrics'][name]
except KeyError:
log.error("Unknown metric: {}".format(name))
self._reply(404, 'Unknown metric')
return
# Get the dbname. If none was provided, use the default from the # Get the dbname. If none was provided, use the default from the
# config. # config.
dbname = args.get('dbname', config['dbname']) dbname = args.get("dbname", config["dbname"])
# Get the connection pool for the database, or create one if it doesn't # Sample the metric
# already exist.
try: try:
pool = get_pool(dbname) self._reply(200, sample_metric(dbname, metric_name, args))
except UnhappyDBError: return
except UnknownMetricError as e:
log.error("Unknown metric: {}".format(metric_name))
self._reply(404, "Unknown metric")
return
except MetricVersionError as e:
log.error(
"Failed to find a version of {} for {}".format(metric_name, version)
)
self._reply(404, "Unsupported version")
return
except UnhappyDBError as e:
log.info("Database {} is unhappy, please be patient".format(dbname)) log.info("Database {} is unhappy, please be patient".format(dbname))
self._reply(503, 'Database unavailable') self._reply(503, "Database unavailable")
return
# Identify the PostgreSQL version
try:
version = get_cluster_version()
except UnhappyDBError:
return return
except Exception as e: except Exception as e:
if dbname in unhappy_cooldown: log.error("Error running query: {}".format(e))
log.info("Database {} is unhappy, please be patient".format(dbname)) self._reply(500, "Unexpected error: {}".format(e))
self._reply(503, 'Database unavailable')
else:
log.error("Failed to get PostgreSQL version: {}".format(e))
self._reply(500, 'Error getting DB version')
return
# Get the query version
try:
query = get_query(metric, version)
except KeyError:
log.error("Failed to find a version of {} for {}".format(name, version))
self._reply(404, 'Unsupported version')
return
# Execute the quert
try:
self._reply(200, run_query(pool, metric['type'], query, args))
return
except Exception as e:
if dbname in unhappy_cooldown:
log.info("Database {} is unhappy, please be patient".format(dbname))
self._reply(503, 'Database unavailable')
else:
log.error("Error running query: {}".format(e))
self._reply(500, "Error running query")
return return
    def _reply(self, code, content):
@@ -508,19 +781,29 @@ class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
        Send a reply to the client
        """
        self.send_response(code)
        self.send_header("Content-type", "application/json")
        self.end_headers()

        self.wfile.write(bytes(content, "utf-8"))
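With the agent running, anything that can issue an HTTP GET (Zabbix, curl, a quick script) can sample metrics. A minimal sketch, assuming the agent listens on 127.0.0.1:5400 and that a metric named connection_count is defined (both hypothetical):

import requests

base = "http://127.0.0.1:5400"  # address and port are illustrative

print(requests.get(base + "/agent_version").text)        # e.g. "1.0.4"
print(requests.get(base + "/latest_version_info").text)  # {"latest": ..., "supported": ...}
print(requests.get(base + "/connection_count", params={"dbname": "postgres"}).text)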
if __name__ == "__main__":
    # Handle cli args
    parser = argparse.ArgumentParser(
        prog="pgmon", description="A PostgreSQL monitoring agent"
    )
    parser.add_argument(
        "-c",
        "--config_file",
        default="pgmon.yml",
        nargs="?",
        help="The config file to read (default: %(default)s)",
    )
    parser.add_argument(
        "-t", "--test", action="store_true", help="Run query tests and exit"
    )

    args = parser.parse_args()
@@ -530,8 +813,16 @@ if __name__ == '__main__':
    # Read the config file
    read_config(config_file)

    # Run query tests and exit if test mode is enabled
    if args.test:
        errors = test_queries()
        if errors > 0:
            sys.exit(1)
        else:
            sys.exit(0)

    # Set up the http server to receive requests
    server_address = (config["address"], config["port"])
    httpd = ThreadingHTTPServer(server_address, SimpleHTTPRequestHandler)

    # Set up the signal handler
@@ -539,7 +830,7 @@ if __name__ == '__main__':
    signal.signal(signal.SIGHUP, signal_handler)

    # Handle requests.
    log.info("Listening on port {}...".format(config["port"]))
    while running:
        httpd.handle_request()

File diff suppressed because it is too large

@@ -7,7 +7,7 @@ After=network.target
[Service]
EnvironmentFile=/etc/pgmon/%i-service.conf
User=${SERVICE_USER:-postgres}
ExecStart=/usr/bin/pgmon -c /etc/pgmon/%i.yml
ExecReload=kill -HUP $MAINPID
Restart=on-failure
Type=exec
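Because this is a templated unit, the %i instance name selects the configuration: for example, systemctl enable --now pgmon@main would read /etc/pgmon/main.yml and /etc/pgmon/main-service.conf (the instance name here is illustrative).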

tests/Dockerfile Normal file

@@ -0,0 +1,23 @@
FROM alpine:3.21
RUN apk update && \
apk add py3-psycopg2 \
py3-requests \
py3-yaml \
tini
WORKDIR /app
COPY src/pgmon.py /app/
COPY sample-config/pgmon-metrics.yml /app/
COPY tests/test-config.yml /app/
COPY --chmod=0600 --chown=postgres:postgres tests/pgpass /root/.pgpass
ENTRYPOINT ["tini", "--"]
EXPOSE 5400
CMD ["/app/pgmon.py", "-c", "/app/test-config.yml", "--test"]

tests/docker-compose.yml Normal file

@@ -0,0 +1,32 @@
---
services:
  agent:
    image: pgmon
    build:
      context: ..
      dockerfile: tests/Dockerfile
    ports:
      - :5400
    depends_on:
      db:
        condition: service_healthy

  db:
    image: "postgres:${PGTAG:-17-bookworm}"
    ports:
      - :5432
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      #test: [ "CMD", "pg_isready", "-U", "postgres" ]
      test: [ "CMD-SHELL", "pg_controldata /var/lib/postgresql/data/ | grep -q 'in production'" ]
      interval: 5s
      timeout: 2s
      retries: 40
    command: >
      postgres -c ssl=on
        -c ssl_cert_file='/etc/ssl/certs/ssl-cert-snakeoil.pem'
        -c ssl_key_file='/etc/ssl/private/ssl-cert-snakeoil.key'
        -c listen_addresses='*'

tests/pgpass Normal file

@@ -0,0 +1 @@
db:5432:*:postgres:secret

tests/run-tests.sh Executable file

@@ -0,0 +1,65 @@
#!/bin/bash

# Versions to test
versions=( "$@" )

# If we weren't given any versions, test them all
if [ ${#versions[@]} -eq 0 ]
then
    versions=( 9.2 9.4 9.6 10 11 12 13 14 15 16 17 )
fi

# Image tags to use
declare -A images=()
images["9.2"]='9.2'
images["9.3"]='9.3'
images["9.4"]='9.4'
images["9.5"]='9.5'
images["9.6"]='9.6-bullseye'
images["10"]='10-bullseye'
images["11"]='11-bookworm'
images["12"]='12-bookworm'
images["13"]='13-bookworm'
images["14"]='14-bookworm'
images["15"]='15-bookworm'
images["16"]='16-bookworm'
images["17"]='17-bookworm'

declare -A results=()

# Make sure everything's down to start with
docker compose down

# Make sure our agent container is up to date
docker compose build agent

for version in "${versions[@]}"
do
    echo
    echo "Testing: PostgreSQL ${version}"

    # Specify the version we're testing against
    export PGTAG="${images["$version"]}"

    # Start the containers
    docker compose up --exit-code-from=agent agent
    rc=$?
    results["$version"]=$rc

    # Destroy the containers
    docker compose down
done

echo
echo
for v in "${versions[@]}"
do
    case "${results["$v"]}" in
        0) msg="OK" ;;
        1) msg="Query failure detected" ;;
        18) msg="Docker image error: 18" ;;
        *) msg="Unexpected error: ${results["$v"]}" ;;
    esac
    echo "$v -> $msg"
done
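Run with no arguments, the script cycles through every version in the default list; pass versions to limit the run, e.g. ./tests/run-tests.sh 13 17 to test only PostgreSQL 13 and 17.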

tests/test-config.yml Normal file

@@ -0,0 +1,17 @@
---
# Bind to all interfaces so we can submit requests from outside the test container
address: 0.0.0.0
# We always just connect to the db container
dbhost: db
dbport: 5432
dbuser: postgres
# The SSL cipher parameters are too old in the 9.2 container, so we allow the tests
# to be run without encryption
ssl_mode: prefer
# Pull in the standard metrics
include:
- pgmon-metrics.yml

File diff suppressed because it is too large