Compare commits

..

37 Commits

Author SHA1 Message Date
e930178e9c
Merge branch 'rel/1.0.4' 2025-07-06 03:35:39 -04:00
5afc940df8
Bump release version to 1.0.4 2025-07-06 03:35:00 -04:00
3961aa3448
Add Gentoo ebuild for 1.0.4-rc1 2025-07-06 03:32:27 -04:00
8fd57032e7
Update Makefile to support rc versions, bump version
* Support packaging RC versions for deb and rpm packages

* Bump version to 1.0.4-rc1
2025-07-06 03:28:36 -04:00
5ede7dea07
Fix package-all make target
* Fix the package-all target in the Makefile

* Remove storage of raw sequence stats
2025-07-05 12:28:36 -04:00
5ea007c3f6
Merge branch 'dev/sequences' into develop 2025-07-05 01:18:01 -04:00
7cb0f7ad40
Fix typo in activity query 2025-07-05 01:17:35 -04:00
83fa12ec54
Correct types for slot LSN lag metrics 2025-07-04 02:46:25 -04:00
98b74d9aed
Add sequence metrics to Zabbix template 2025-07-04 02:45:43 -04:00
45953848e2
Remove some type casts from working around Decimal types 2025-07-03 10:16:20 -04:00
6116f4f885
Fix missing json default 2025-07-03 02:08:45 -04:00
24d1214855
Teach json how to serialize decimals 2025-07-03 01:47:06 -04:00
3c39d8aa97
Remove CIDR prefix from replication id 2025-07-03 01:13:58 -04:00
ebb084aa9d
Add initial query for sequence usage 2025-07-01 02:29:56 -04:00
86d5e8917b
Merge branch 'dev/io_stats' into develop 2025-07-01 01:30:21 -04:00
be84e2b89a
Report some I/O metrics in bytes 2025-06-30 02:00:06 -04:00
fd0a9e1230
Add backend I/O discoovery items to template 2025-06-29 02:08:19 -04:00
b934a30124
Add pg_stat_io query
* Add a query for pg_stat_io metrics grouped by backend type

* Switch some unsupported values from 0 to NULL to better reflect their
  nonexistence
2025-06-28 00:30:03 -04:00
ea397ef889
Improve docker-related query test failure avoidance 2025-06-18 17:39:16 -04:00
def6bab7f4
Add a pause after deploying test db containers as a temporary kludge 2025-06-18 03:30:27 -04:00
ecb616f6d9
Allow retry in query tests 2025-06-18 03:23:43 -04:00
b7f731c6ac
Fix and reformat several queries 2025-06-18 03:03:34 -04:00
4fba81dc2c
Add initial bgwriter and activity metrics queries 2025-06-17 02:40:52 -04:00
9225e745b0
Handle division by zero possibility
* When no I/O activity has been reported in pg_statio_all_tables for a
  database, the queries resulted in a division by zero. Cange this to a
  NULL instead.  Still a problem for Zabbix, but maybe a little cleaner.
2025-06-17 01:16:57 -04:00
b981a9ad36
Add slot lag metrics 2025-06-16 00:56:14 -04:00
c0185d4b86
Remove extra quotes in replication slot query 2025-06-15 02:20:45 -04:00
d34dfc5bf7
Fix replication monitoring queries 2025-06-15 02:19:10 -04:00
7c395a80fe
Add metrics for tracking cache hit ratios 2025-06-13 01:16:24 -04:00
8fe81e3ab3
Fix global variable placement 2025-06-04 12:49:35 -04:00
cfe01eb63e
Fix typo in latest version Zabbix item 2025-06-04 01:44:25 -04:00
55b9b16deb
Install to /usr/bin
* Standardize all install methods to use /usr/bin rather than some of
  them using /usr/local/bin

* Make the init script executable in install-openrc
2025-06-04 01:16:59 -04:00
6a22597f1f
Fix ID age query and template 2025-06-04 00:47:51 -04:00
295d1d6310
Fix typo in install-openrc 2025-06-03 19:38:59 -04:00
8b804f3912
Add requests as a dependency, split install target
* Add requests as a dependency on all OSs

* Split the install target into a common target and separate
  install-systemd and install-openrc targets to support installing on
  Alpine
2025-06-03 19:34:58 -04:00
bc9e039cb3
Fix Gentoo package prefix, update ebuild 2025-06-03 12:46:09 -04:00
54cf117591
Fix package contents for Gentoo
* Use git to identify only the version controlled files when creating
  the Gentoo tarball.
2025-06-03 12:06:34 -04:00
8e32e01c20
Merge branch 'dev/pg-version-lag' 2025-06-03 02:00:48 -04:00
14 changed files with 1456 additions and 100 deletions

View File

@ -3,7 +3,7 @@ Version: 1.0
Section: utils
Priority: optional
Architecture: all
Depends: logrotate, python3 (>= 3.6), python3-psycopg2, python3-yaml, systemd
Depends: logrotate, python3 (>= 3.6), python3-psycopg2, python3-requests, python3-yaml, systemd
Maintainer: James Campbell <james@commandprompt.com>
Homepage: https://www.commandprompt.com
Description: A bridge to sit between monitoring tools and PostgreSQL

74
GENTOO/pgmon-1.0.3.ebuild Normal file
View File

@ -0,0 +1,74 @@
# Copyright 2024 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2
EAPI=8
PYTHON_COMPAT=( python3_{6..13} )
inherit python-r1 systemd
DESCRIPTION="PostgreSQL monitoring bridge"
HOMEPAGE="None"
LICENSE="BSD"
SLOT="0"
KEYWORDS="amd64"
SRC_URI="https://code2.shh-dot-com.org/james/${PN}/releases/download/v${PV}/${P}.tar.bz2"
IUSE="-systemd"
DEPEND="
${PYTHON_DEPS}
dev-python/psycopg:2
dev-python/pyyaml
dev-python/requests
app-admin/logrotate
"
RDEPEND="${DEPEND}"
BDEPEND=""
#RESTRICT="fetch"
#S="${WORKDIR}/${PN}"
#pkg_nofetch() {
# einfo "Please download"
# einfo " - ${P}.tar.bz2"
# einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
# einfo "The file should be owned by portage:portage."
#}
src_compile() {
true
}
src_install() {
# Install init script
if ! use systemd ; then
newinitd "openrc/pgmon.initd" pgmon
newconfd "openrc/pgmon.confd" pgmon
fi
# Install systemd unit
if use systemd ; then
systemd_dounit "systemd/pgmon.service"
fi
# Install script
exeinto /usr/bin
newexe "src/pgmon.py" pgmon
# Install default config
diropts -o root -g root -m 0755
insinto /etc/pgmon
doins "sample-config/pgmon.yml"
doins "sample-config/pgmon-metrics.yml"
# Install logrotate config
insinto /etc/logrotate.d
newins "logrotate/pgmon.logrotate" pgmon
# Install man page
doman manpages/pgmon.1
}

74
GENTOO/pgmon-1.0.4.ebuild Normal file
View File

@ -0,0 +1,74 @@
# Copyright 2024 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2
EAPI=8
PYTHON_COMPAT=( python3_{6..13} )
inherit python-r1 systemd
DESCRIPTION="PostgreSQL monitoring bridge"
HOMEPAGE="None"
LICENSE="BSD"
SLOT="0"
KEYWORDS="amd64"
SRC_URI="https://code2.shh-dot-com.org/james/${PN}/releases/download/v${PV}/${P}.tar.bz2"
IUSE="-systemd"
DEPEND="
${PYTHON_DEPS}
dev-python/psycopg:2
dev-python/pyyaml
dev-python/requests
app-admin/logrotate
"
RDEPEND="${DEPEND}"
BDEPEND=""
#RESTRICT="fetch"
#S="${WORKDIR}/${PN}"
#pkg_nofetch() {
# einfo "Please download"
# einfo " - ${P}.tar.bz2"
# einfo "from ${HOMEPAGE} and place it in your DISTDIR directory."
# einfo "The file should be owned by portage:portage."
#}
src_compile() {
true
}
src_install() {
# Install init script
if ! use systemd ; then
newinitd "openrc/pgmon.initd" pgmon
newconfd "openrc/pgmon.confd" pgmon
fi
# Install systemd unit
if use systemd ; then
systemd_dounit "systemd/pgmon.service"
fi
# Install script
exeinto /usr/bin
newexe "src/pgmon.py" pgmon
# Install default config
diropts -o root -g root -m 0755
insinto /etc/pgmon
doins "sample-config/pgmon.yml"
doins "sample-config/pgmon-metrics.yml"
# Install logrotate config
insinto /etc/logrotate.d
newins "logrotate/pgmon.logrotate" pgmon
# Install man page
doman manpages/pgmon.1
}

View File

@ -3,7 +3,22 @@ PACKAGE_NAME := pgmon
SCRIPT := src/$(PACKAGE_NAME).py
VERSION := $(shell grep -m 1 '^VERSION = ' "$(SCRIPT)" | sed -ne 's/.*"\(.*\)".*/\1/p')
# Figure out the version components
# Note: The release is for RPM packages, where prerelease releases are written as 0.<release>
FULL_VERSION := $(shell grep -m 1 '^VERSION = ' "$(SCRIPT)" | sed -ne 's/.*"\(.*\)".*/\1/p')
VERSION := $(shell echo $(FULL_VERSION) | sed -n 's/\(.*\)\(-rc.*\|$$\)/\1/p')
RELEASE := $(shell echo $(FULL_VERSION) | sed -n 's/.*-rc\([0-9]\+\)$$/\1/p')
ifeq ($(RELEASE),)
RPM_RELEASE := 1
RPM_VERSION := $(VERSION)-$(RPM_RELEASE)
DEB_VERSION := $(VERSION)
else
RPM_RELEASE := 0.$(RELEASE)
RPM_VERSION := $(VERSION)-$(RPM_RELEASE)
DEB_VERSION := $(VERSION)~rc$(RELEASE)
endif
# Where packages are built
BUILD_DIR := build
@ -23,18 +38,22 @@ SUPPORTED := ubuntu-20.04 \
# These targets are the main ones to use for most things.
##
.PHONY: all clean tgz test query-tests install
.PHONY: all clean tgz test query-tests install-common install-openrc install-systemd
all: package-all
version:
@echo "full version=$(FULL_VERSION) version=$(VERSION) rel=$(RELEASE) rpm=$(RPM_VERSION) deb=$(DEB_VERSION)"
# Build all packages
.PHONY: package-all
all: $(foreach distro_release, $(SUPPORTED), package-$(distro_release))
package-all: $(foreach distro_release, $(SUPPORTED), package-$(distro_release))
# Gentoo package (tar.gz) creation
.PHONY: package-gentoo
package-gentoo:
mkdir -p $(BUILD_DIR)/gentoo
tar --transform "s,^\.,$(PACKAGE_NAME)-$(VERSION)," -acjf $(BUILD_DIR)/gentoo/$(PACKAGE_NAME)-$(VERSION).tar.bz2 --exclude $(BUILD_DIR) .
tar --transform "s,^,$(PACKAGE_NAME)-$(FULL_VERSION)/," -acjf $(BUILD_DIR)/gentoo/$(PACKAGE_NAME)-$(FULL_VERSION).tar.bz2 --exclude .gitignore $(shell git ls-tree --full-tree --name-only -r HEAD)
# Create a deb package
@ -54,8 +73,8 @@ package-%:
tgz:
rm -rf $(BUILD_DIR)/tgz/root
mkdir -p $(BUILD_DIR)/tgz/root
$(MAKE) install DESTDIR=$(BUILD_DIR)/tgz/root
tar -cz -f $(BUILD_DIR)/tgz/$(PACKAGE_NAME)-$(VERSION).tgz -C $(BUILD_DIR)/tgz/root .
$(MAKE) install-openrc DESTDIR=$(BUILD_DIR)/tgz/root
tar -cz -f $(BUILD_DIR)/tgz/$(PACKAGE_NAME)-$(FULL_VERSION).tgz -C $(BUILD_DIR)/tgz/root .
# Clean up the build directory
clean:
@ -69,18 +88,17 @@ test:
query-tests:
cd tests ; ./run-tests.sh
# Install the script at the specified base directory
install:
# Install the script at the specified base directory (common components)
install-common:
# Set up directories
mkdir -p $(DESTDIR)/etc/$(PACKAGE_NAME)
mkdir -p ${DESTDIR}/etc/logrotate.d
mkdir -p $(DESTDIR)/lib/systemd/system
mkdir -p $(DESTDIR)/usr/local/bin
mkdir -p $(DESTDIR)/usr/bin
mkdir -p $(DESTDIR)/usr/share/man/man1
# Install script
cp $(SCRIPT) $(DESTDIR)/usr/local/bin/$(PACKAGE_NAME)
chmod 755 $(DESTDIR)/usr/local/bin/$(PACKAGE_NAME)
cp $(SCRIPT) $(DESTDIR)/usr/bin/$(PACKAGE_NAME)
chmod 755 $(DESTDIR)/usr/bin/$(PACKAGE_NAME)
# Install manpage
cp manpages/* $(DESTDIR)/usr/share/man/man1/
@ -89,11 +107,35 @@ install:
# Install sample config
cp sample-config/* $(DESTDIR)/etc/$(PACKAGE_NAME)/
# Install logrotate config
cp logrotate/${PACKAGE_NAME}.logrotate ${DESTDIR}/etc/logrotate.d/${PACKAGE_NAME}
# Install for systemd
install-systemd:
# Install the common stuff
$(MAKE) install-common
# Set up directories
mkdir -p $(DESTDIR)/lib/systemd/system
# Install systemd unit files
cp systemd/* $(DESTDIR)/lib/systemd/system/
# Install logrotate config
cp logrotate/${PACKAGE_NAME}.logrotate ${DESTDIR}/etc/logrotate.d/${PACKAGE_NAME}
# Install for open-rc
install-openrc:
# Install the common stuff
$(MAKE) install-common
# Set up directories
mkdir -p $(DESTDIR)/etc/init.d
mkdir -p $(DESTDIR)/etc/conf.d
# Install init script
cp openrc/pgmon.initd $(DESTDIR)/etc/init.d/pgmon
chmod 755 $(DESTDIR)/etc/init.d/pgmon
# Install init script config file
cp openrc/pgmon.confd $(DESTDIR)/etc/conf.d/pgmon
# Run all of the install tests
@ -106,28 +148,28 @@ debian-%-install-test:
docker run --rm \
-v ./$(BUILD_DIR):/output \
debian:$* \
bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(VERSION)-debian-$*.deb'
bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(DEB_VERSION)-debian-$*.deb'
# Run a RedHat install test
rockylinux-%-install-test:
docker run --rm \
-v ./$(BUILD_DIR):/output \
rockylinux:$* \
bash -c 'dnf makecache && dnf install -y /output/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm'
bash -c 'dnf makecache && dnf install -y /output/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm'
# Run an Ubuntu install test
ubuntu-%-install-test:
docker run --rm \
-v ./$(BUILD_DIR):/output \
ubuntu:$* \
bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(VERSION)-ubuntu-$*.deb'
bash -c 'apt-get update && apt-get install -y /output/$(PACKAGE_NAME)-$(DEB_VERSION)-ubuntu-$*.deb'
# Run an OracleLinux install test (this is for EL7 since CentOS7 images no longer exist)
oraclelinux-%-install-test:
docker run --rm \
-v ./$(BUILD_DIR):/output \
oraclelinux:7 \
bash -c 'yum makecache && yum install -y /output/$(PACKAGE_NAME)-$(VERSION)-1.el7.noarch.rpm'
bash -c 'yum makecache && yum install -y /output/$(PACKAGE_NAME)-$(RPM_VERSION).el7.noarch.rpm'
# Run a Gentoo install test
gentoo-install-test:
@ -167,30 +209,30 @@ package-image-%:
# Debian package creation
actually-package-debian-%:
$(MAKE) install DESTDIR=/output/debian-$*
$(MAKE) install-systemd DESTDIR=/output/debian-$*
cp -r --preserve=mode DEBIAN /output/debian-$*/
dpkg-deb -Zgzip --build /output/debian-$* "/output/$(PACKAGE_NAME)-$(VERSION)-debian-$*.deb"
dpkg-deb -Zgzip --build /output/debian-$* "/output/$(PACKAGE_NAME)-$(DEB_VERSION)-debian-$*.deb"
# RedHat package creation
actually-package-rockylinux-%:
mkdir -p /output/rockylinux-$*/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
sed -e "s/@@VERSION@@/$(VERSION)/g" RPM/$(PACKAGE_NAME).spec > /output/rockylinux-$*/SPECS/$(PACKAGE_NAME).spec
sed -e "s/@@VERSION@@/$(VERSION)/g" -e "s/@@RELEASE@@/$(RPM_RELEASE)/g" RPM/$(PACKAGE_NAME).spec > /output/rockylinux-$*/SPECS/$(PACKAGE_NAME).spec
rpmbuild --define '_topdir /output/rockylinux-$*' \
--define 'version $(VERSION)' \
--define 'version $(RPM_VERSION)' \
-bb /output/rockylinux-$*/SPECS/$(PACKAGE_NAME).spec
cp /output/rockylinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm /output/
cp /output/rockylinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm /output/
# Ubuntu package creation
actually-package-ubuntu-%:
$(MAKE) install DESTDIR=/output/ubuntu-$*
$(MAKE) install-systemd DESTDIR=/output/ubuntu-$*
cp -r --preserve=mode DEBIAN /output/ubuntu-$*/
dpkg-deb -Zgzip --build /output/ubuntu-$* "/output/$(PACKAGE_NAME)-$(VERSION)-ubuntu-$*.deb"
dpkg-deb -Zgzip --build /output/ubuntu-$* "/output/$(PACKAGE_NAME)-$(DEB_VERSION)-ubuntu-$*.deb"
# OracleLinux package creation
actually-package-oraclelinux-%:
mkdir -p /output/oraclelinux-$*/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
sed -e "s/@@VERSION@@/$(VERSION)/g" RPM/$(PACKAGE_NAME)-el7.spec > /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
sed -e "s/@@VERSION@@/$(VERSION)/g" -e "s/@@RELEASE@@/$(RPM_RELEASE)/g" RPM/$(PACKAGE_NAME)-el7.spec > /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
rpmbuild --define '_topdir /output/oraclelinux-$*' \
--define 'version $(VERSION)' \
--define 'version $(RPM_VERSION)' \
-bb /output/oraclelinux-$*/SPECS/$(PACKAGE_NAME).spec
cp /output/oraclelinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(VERSION)-1.el$*.noarch.rpm /output/
cp /output/oraclelinux-$*/RPMS/noarch/$(PACKAGE_NAME)-$(RPM_VERSION).el$*.noarch.rpm /output/

View File

@ -1,13 +1,13 @@
Name: pgmon
Version: @@VERSION@@
Release: 1%{?dist}
Release: @@RELEASE@@%{?dist}
Summary: A bridge to sit between monitoring tools and PostgreSQL
License: MIT
URL: https://www.commandprompt.com
BuildArch: noarch
Requires: logrotate, python, python-psycopg2, PyYAML, systemd
Requires: logrotate, python, python-psycopg2, PyYAML, python-requests, systemd
%description
A bridge to sit between monitoring tools and PostgreSQL
@ -19,7 +19,7 @@ A bridge to sit between monitoring tools and PostgreSQL
# Do nothing since we have nothing to build
%install
make -C /src install DESTDIR=%{buildroot}
make -C /src install-systemd DESTDIR=%{buildroot}
%files
/etc/logrotate.d/pgmon
@ -28,7 +28,7 @@ make -C /src install DESTDIR=%{buildroot}
/etc/pgmon/pgmon-service.conf
/lib/systemd/system/pgmon.service
/lib/systemd/system/pgmon@.service
/usr/local/bin/pgmon
/usr/bin/pgmon
/usr/share/man/man1/pgmon.1.gz
%post

View File

@ -1,13 +1,13 @@
Name: pgmon
Version: @@VERSION@@
Release: 1%{?dist}
Release: @@RELEASE@@%{?dist}
Summary: A bridge to sit between monitoring tools and PostgreSQL
License: MIT
URL: https://www.commandprompt.com
BuildArch: noarch
Requires: logrotate, python3, python3-psycopg2, python3-pyyaml, systemd
Requires: logrotate, python3, python3-psycopg2, python3-pyyaml, python3-requests, systemd
%description
A bridge to sit between monitoring tools and PostgreSQL
@ -19,7 +19,7 @@ A bridge to sit between monitoring tools and PostgreSQL
# Do nothing since we have nothing to build
%install
make -C /src install DESTDIR=%{buildroot}
make -C /src install-systemd DESTDIR=%{buildroot}
%files
/etc/logrotate.d/pgmon
@ -28,7 +28,7 @@ make -C /src install DESTDIR=%{buildroot}
/etc/pgmon/pgmon-service.conf
/lib/systemd/system/pgmon.service
/lib/systemd/system/pgmon@.service
/usr/local/bin/pgmon
/usr/bin/pgmon
/usr/share/man/man1/pgmon.1.gz
%post

View File

@ -1,63 +1,307 @@
metrics:
##
# Discovery metrics
##
discover_dbs:
type: set
query:
0: SELECT datname AS dbname FROM pg_database
0: >
SELECT datname AS dbname
FROM pg_database
# Note: If the user lacks sufficient privileges, these fields will be NULL.
# The WHERE clause is intended to prevent Zabbix from discovering a
# connection it cannot monitor. Ideally this would generate an error
# instead.
discover_rep:
type: set
query:
0: SELECT client_addr || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') AS repid, client_addr, state FROM pg_stat_replication
0: >
SELECT host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') AS repid,
client_addr,
state
FROM pg_stat_replication
WHERE state IS NOT NULL
discover_slots:
type: set
query:
90400: SELECT slot_name, plugin, slot_type, database, false as temporary, active FROM pg_replication_slots
100000: SELECT slot_name, plugin, slot_type, database, temporary, active FROM pg_replication_slots
90400: >
SELECT slot_name,
plugin,
slot_type,
database,
false as temporary,
active
FROM pg_replication_slots
100000: >
SELECT slot_name,
plugin,
slot_type,
database,
temporary,
active
FROM pg_replication_slots
##
# cluster-wide metrics
##
version:
type: value
query:
0: SHOW server_version_num
max_frozen_age:
type: row
query:
0: SELECT max(age(datfrozenxid)), max(mxid_age(datminmxid)) FROM pg_database
0: >
SELECT max(age(datfrozenxid)) AS xid_age,
NULL AS mxid_age
FROM pg_database
90600: >
SELECT max(age(datfrozenxid)) AS xid_age,
max(mxid_age(datminmxid)) AS mxid_age
FROM pg_database
bgwriter:
type: row
query:
0: >
SELECT checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time,
buffers_checkpoint,
buffers_clean,
maxwritten_clean,
buffers_backend,
buffers_backend_fsync,
buffers_alloc
FROM pg_stat_bgwriter
170000: >
SELECT cp.num_timed AS checkpoints_timed,
cp.num_requested AS checkpoints_req,
cp.write_time AS checkpoint_write_time,
cp.sync_time AS checkpoint_sync_time,
cp.buffers_written AS buffers_checkpoint,
bg.buffers_clean AS buffers_clean,
bg.maxwritten_clean AS maxwritten_clean,
NULL AS buffers_backend,
NULL AS buffers_backend_fsync,
bg.buffers_alloc AS buffers_alloc
FROM pg_stat_bgwriter bg
CROSS JOIN pg_stat_checkpointer cp
io_per_backend:
type: set
query:
160000: >
SELECT backend_type,
COALESCE(SUM(reads * op_bytes), 0) AS reads,
COALESCE(SUM(read_time), 0) AS read_time,
COALESCE(SUM(writes * op_bytes), 0) AS writes,
COALESCE(SUM(write_time), 0) AS write_time,
COALESCE(SUM(writebacks * op_bytes), 0) AS writebacks,
COALESCE(SUM(writeback_time), 0) AS writeback_time,
COALESCE(SUM(extends * op_bytes), 0) AS extends,
COALESCE(SUM(extend_time), 0) AS extend_time,
COALESCE(SUM(op_bytes), 0) AS op_bytes,
COALESCE(SUM(hits), 0) AS hits,
COALESCE(SUM(evictions), 0) AS evictions,
COALESCE(SUM(reuses), 0) AS reuses,
COALESCE(SUM(fsyncs), 0) AS fsyncs,
COALESCE(SUM(fsync_time), 0) AS fsync_time
FROM pg_stat_io
GROUP BY backend_type
##
# Per-database metrics
##
db_stats:
type: row
query:
0: SELECT numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks, blk_read_time, blk_write_time, extract('epoch' from stats_reset)::float FROM pg_stat_database WHERE datname = %(dbname)s
140000: SELECT numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted, conflicts, temp_files, temp_bytes, deadlocks, COALESCE(checksum_failures, 0) AS checksum_failures, blk_read_time, blk_write_time, session_time, active_time, idle_in_transaction_time, sessions, sessions_abandoned, sessions_fatal, sessions_killed, extract('epoch' from stats_reset)::float FROM pg_stat_database WHERE datname = %(dbname)s
0: >
SELECT numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
tup_returned,
tup_fetched,
tup_inserted,
tup_updated,
tup_deleted,
conflicts,
temp_files,
temp_bytes,
deadlocks,
NULL AS checksum_failures,
blk_read_time,
blk_write_time,
NULL AS session_time,
NULL AS active_time,
NULL AS idle_in_transaction_time,
NULL AS sessions,
NULL AS sessions_abandoned,
NULL AS sessions_fatal,
NULL AS sessions_killed,
extract('epoch' from stats_reset) AS stats_reset
FROM pg_stat_database WHERE datname = %(dbname)s
140000: >
SELECT numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
tup_returned,
tup_fetched,
tup_inserted,
tup_updated,
tup_deleted,
conflicts,
temp_files,
temp_bytes,
deadlocks,
COALESCE(checksum_failures, 0) AS checksum_failures,
blk_read_time,
blk_write_time,
session_time,
active_time,
idle_in_transaction_time,
sessions,
sessions_abandoned,
sessions_fatal,
sessions_killed,
extract('epoch' from stats_reset) AS stats_reset
FROM pg_stat_database WHERE datname = %(dbname)s
test_args:
dbname: postgres
hit_ratios:
type: row
query:
0: >
SELECT sum(heap_blks_read)::float / NULLIF(sum(heap_blks_read + heap_blks_hit), 0) AS avg_heap_hit_ratio,
sum(idx_blks_hit)::float / NULLIF(sum(idx_blks_read + idx_blks_hit), 0) AS avg_idx_hit_ratio,
sum(toast_blks_hit)::float / NULLIF(sum(toast_blks_read + toast_blks_hit), 0) AS avg_toast_hit_ratio,
sum(tidx_blks_hit)::float / NULLIF(sum(tidx_blks_read + tidx_blks_hit), 0) AS avg_tidx_hit_ratio
FROM pg_statio_all_tables
test_args:
dbname: postgres
activity:
type: set
query:
0: >
SELECT state,
count(*) AS backend_count,
COALESCE(EXTRACT(EPOCH FROM max(now() - state_change)), 0) AS max_state_time
FROM pg_stat_activity
WHERE datname = %(dbname)s
GROUP BY state
test_args:
dbname: postgres
sequence_usage:
type: value
query:
# 9.2 lacks lateral joins, the pg_sequence_last_value function, and the pg_sequences view
# 0: >
# SELECT COALESCE(MAX(pg_sequence_last_value(c.oid)::float / (pg_sequence_parameters(oid)).maximum_value), 0) AS max_usage
# FROM pg_class c
# WHERE c.relkind = 'S'
# 9.3 - 9.6 lacks the pg_sequence_last_value function, and pg_sequences view
# 90300: >
# SELECT COALESCE(MAX(pg_sequence_last_value(c.oid)::float / s.maximum_value), 0) AS max_usage
# FROM pg_class c
# CROSS JOIN LATERAL pg_sequence_parameters(c.oid) AS s
# WHERE c.relkind = 'S'
100000: SELECT COALESCE(MAX(last_value::float / max_value), 0) AS max_usage FROM pg_sequences;
test_args:
dbname: postgres
sequence_visibility:
type: row
query:
100000: >
SELECT COUNT(*) FILTER (WHERE has_sequence_privilege(c.oid, 'SELECT,USAGE')) AS visible_sequences,
COUNT(*) AS total_sequences
FROM pg_class AS c
WHERE relkind = 'S';
##
# Per-replication metrics
##
rep_stats:
type: row
query:
90400: >
SELECT pid, usename,
EXTRACT(EPOCH FROM backend_start) AS backend_start,
state,
pg_xlog_location_diff(pg_current_xlog_location(), sent_location) AS sent_lsn,
pg_xlog_location_diff(pg_current_xlog_location(), write_location) AS write_lsn,
pg_xlog_location_diff(pg_current_xlog_location(), flush_location) AS flush_lsn,
pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lsn,
NULL AS write_lag,
NULL AS flush_lag,
NULL AS replay_lag,
sync_state
FROM pg_stat_replication
WHERE host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = %(repid)s
100000: >
SELECT pid, usename,
EXTRACT(EPOCH FROM backend_start) AS backend_start,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lsn,
COALESCE(EXTRACT(EPOCH FROM write_lag), 0) AS write_lag,
COALESCE(EXTRACT(EPOCH FROM flush_lag), 0) AS flush_lag,
COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS replay_lag,
sync_state
FROM pg_stat_replication
WHERE host(client_addr) || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = %(repid)s
test_args:
repid: 127.0.0.1_test_rep
##
# Per-slot metrics
##
slot_stats:
type: row
query:
90400: >
SELECT NULL as active_pid,
xmin,
pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_bytes,
NULL AS confirmed_flush_bytes
FROM pg_replication_slots WHERE slot_name = %(slot)s
90600: >
SELECT active_pid,
xmin,
pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_bytes,
pg_xlog_location_diff(pg_current_xlog_location(), confirmed_flush_lsn) AS confirmed_flush_bytes
FROM pg_replication_slots WHERE slot_name = %(slot)s
100000: >
SELECT active_pid,
xmin,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS restart_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirmed_flush_bytes
FROM pg_replication_slots WHERE slot_name = %(slot)s
test_args:
slot: test_slot
##
# Debugging
##
ntables:
type: value
query:
0: SELECT count(*) AS ntables FROM pg_stat_user_tables
# Per-replication metrics
rep_stats:
type: row
query:
90400: SELECT * FROM pg_stat_replication WHERE client_addr || '_' || regexp_replace(application_name, '[ ,]', '_', 'g') = '{repid}'
test_args:
repid: 127.0.0.1_test_rep
# Debugging
sleep:
type: value
query:
0: SELECT now(), pg_sleep(5);
# Per-slot metrics
slot_stats:
type: row
query:
90400: SELECT active_pid, xmin, pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_bytes, pg_xlog_location_diff(pg_current_xlog_location(), confirmed_flush_lsn) AS confirmed_flush_bytes FROM pg_replication_slots WHERE slot_name = '{slot}'
100000: SELECT active_pid, xmin, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS restart_bytes, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirmed_flush_bytes FROM pg_replication_slots WHERE slot_name = '{slot}'
test_args:
slot: test_slot

View File

@ -27,7 +27,9 @@ from urllib.parse import urlparse, parse_qs
import requests
import re
VERSION = "1.0.3"
from decimal import Decimal
VERSION = "1.0.4"
# Configuration
config = {}
@ -391,6 +393,16 @@ def get_query(metric, version):
raise MetricVersionError("Missing metric query for PostgreSQL {}".format(version))
def json_encode_special(obj):
"""
Encoder function to handle types the standard JSON package doesn't know what
to do with
"""
if isinstance(obj, Decimal):
return float(obj)
raise TypeError(f'Cannot serialize object of {type(obj)}')
def run_query_no_retry(pool, return_type, query, args):
"""
Run the query with no explicit retry code
@ -408,13 +420,13 @@ def run_query_no_retry(pool, return_type, query, args):
elif return_type == "row":
if len(res) == 0:
return "[]"
return json.dumps(res[0])
return json.dumps(res[0], default=json_encode_special)
elif return_type == "column":
if len(res) == 0:
return "[]"
return json.dumps([list(r.values())[0] for r in res])
return json.dumps([list(r.values())[0] for r in res], default=json_encode_special)
elif return_type == "set":
return json.dumps(res)
return json.dumps(res, default=json_encode_special)
except:
dbname = pool.name
if dbname in unhappy_cooldown:
@ -518,7 +530,6 @@ def parse_version_rss(raw_rss, release):
This sets these global variables:
latest_version
latest_version_next_check
release_supported
It is expected that the caller already holds the latest_version_lock lock.
@ -528,7 +539,6 @@ def parse_version_rss(raw_rss, release):
release: The PostgreSQL release we care about (ex: 9.2, 14)
"""
global latest_version
global latest_version_next_check
global release_supported
# Regular expressions for parsing the RSS document
@ -583,6 +593,8 @@ def get_latest_version():
Get the latest supported version of the major PostgreSQL release running on the server being monitored.
"""
global latest_version_next_check
# If we don't know the latest version or it's past the recheck time, get the
# version from the PostgreSQL RSS feed. Only one thread needs to do this, so
# they all try to grab the lock, and then make sure nobody else beat them to it.
@ -655,8 +667,25 @@ def test_queries():
for name, metric in config["metrics"].items():
# If the metric has arguments to use while testing, grab those
args = metric.get("test_args", {})
# Run the query without the ability to retry.
print("Testing {} [{}]".format(name, ", ".join(["{}={}".format(key, value) for key, value in args.items()])))
# When testing against a docker container, we may end up connecting
# before the service is truly up (it restarts during the initialization
# phase). To cope with this, we'll allow a few connection failures.
tries = 5
while True:
# Run the query without the ability to retry
try:
res = sample_metric(dbname, name, args, retry=False)
break
except MetricVersionError:
res = "Unsupported for this version"
break
except psycopg2.OperationalError as e:
print("Error encountered, {} tries left: {}".format(tries, e))
if tries <= 0:
raise
time.sleep(1)
tries -= 1
# Compare the result to the provided sample results
# TODO
print("{} -> {}".format(name, res))
@ -705,12 +734,16 @@ class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
self._reply(
200,
json.dumps(
{"latest": latest_version, "supported": 1 if release_supported else 0}
{
"latest": latest_version,
"supported": 1 if release_supported else 0,
}
),
)
except LatestVersionCheckError as e:
log.error("Failed to retrieve latest version information: {}".format(e))
self._reply(503, "Failed to retrieve latest version info")
return
# Note: parse_qs returns the values as a list. Since we always expect
# single values, just grab the first from each.

View File

@ -5,6 +5,9 @@ import tempfile
import logging
from decimal import Decimal
import json
import pgmon
# Silence most logging output
@ -789,3 +792,20 @@ metrics:
# Make sure we can pull the RSS file (we assume the 9.6 series won't be getting
# any more updates)
self.assertEqual(pgmon.get_latest_version(), 90624)
def test_json_encode_special(self):
# Confirm that we're getting the right type
self.assertFalse(isinstance(Decimal('0.5'), float))
self.assertTrue(isinstance(pgmon.json_encode_special(Decimal('0.5')), float))
# Make sure we get sane values
self.assertEqual(pgmon.json_encode_special(Decimal('0.5')), 0.5)
self.assertEqual(pgmon.json_encode_special(Decimal('12')), 12.0)
# Make sure we can still fail for other types
self.assertRaises(
TypeError, pgmon.json_encode_special, object
)
# Make sure we can actually serialize a Decimal
self.assertEqual(json.dumps(Decimal('2.5'), default=pgmon.json_encode_special), '2.5')

View File

@ -7,7 +7,7 @@ After=network.target
[Service]
EnvironmentFile=/etc/pgmon/%i-service.conf
User=${SERVICE_USER:-postgres}
ExecStart=/usr/local/bin/pgmon -c /etc/pgmon/%i.yml
ExecStart=/usr/bin/pgmon -c /etc/pgmon/%i.yml
ExecReload=kill -HUP $MAINPID
Restart=on-failure
Type=exec

View File

@ -2,6 +2,7 @@ FROM alpine:3.21
RUN apk update && \
apk add py3-psycopg2 \
py3-requests \
py3-yaml \
tini

View File

@ -23,7 +23,7 @@ services:
test: [ "CMD-SHELL", "pg_controldata /var/lib/postgresql/data/ | grep -q 'in production'" ]
interval: 5s
timeout: 2s
retries: 20
retries: 40
command: >
postgres -c ssl=on
-c ssl_cert_file='/etc/ssl/certs/ssl-cert-snakeoil.pem'

View File

@ -6,12 +6,15 @@ versions=( $@ )
# If we weren't given any versions, test them all
if [ ${#versions[@]} -eq 0 ]
then
versions=( 9.2 9.6 10 11 12 13 14 15 16 17 )
versions=( 9.2 9.4 9.6 10 11 12 13 14 15 16 17 )
fi
# Image tags to use
declare -A images=()
images["9.2"]='9.2'
images["9.3"]='9.3'
images["9.4"]='9.4'
images["9.5"]='9.5'
images["9.6"]='9.6-bullseye'
images["10"]='10-bullseye'
images["11"]='11-bookworm'

File diff suppressed because it is too large Load Diff