[02:45:47] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:34:29] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 921.97 seconds
[03:35:07] PROBLEM - puppet last run on analytics1055 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test]
[03:57:57] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.316 second response time
[04:01:09] RECOVERY - puppet last run on analytics1055 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[04:01:45] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:09:43] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 107.09 seconds
[05:38:35] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.437 second response time
[05:42:25] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:50:57] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.235 second response time
[05:57:07] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:14:03] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.697 second response time
[06:20:19] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:23:55] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.639 second response time
[06:27:43] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:31:59] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:36:15] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.040 second response time
[06:39:59] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:40:04] (PS3) ArielGlenn: convert snapshot/dumps python scripts in puppet to python3 [puppet] - https://gerrit.wikimedia.org/r/477222 (https://phabricator.wikimedia.org/T210980)
[06:42:47] (CR) ArielGlenn: [C: +2] convert snapshot/dumps python scripts in puppet to python3 [puppet] - https://gerrit.wikimedia.org/r/477222 (https://phabricator.wikimedia.org/T210980) (owner: ArielGlenn)
[06:50:49] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.128 second response time
[06:54:35] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:57:59] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:03:05] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.454 second response time
[07:10:25] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:28:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.091 second response time
[07:32:31] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:41:07] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.169 second response time
[07:44:49] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:52:05] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.982 second response time
[07:54:20] (PS3) ArielGlenn: make all snapshot hosts use php7.2 for dumps [puppet] - https://gerrit.wikimedia.org/r/481167 (https://phabricator.wikimedia.org/T211935)
[07:55:49] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:56:49] (CR) ArielGlenn: [C: +2] make all snapshot hosts use php7.2 for dumps [puppet] - https://gerrit.wikimedia.org/r/481167 (https://phabricator.wikimedia.org/T211935) (owner: ArielGlenn)
[08:27:41] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.819 second response time
[08:31:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:38:41] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.423 second response time
[08:42:27] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:51:59] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[08:53:06] !log restarted pdfrender on scb1003
[08:53:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:48] (PS1) Elukey: profile::hadoop::balancer: remove unused logrotate config [puppet] - https://gerrit.wikimedia.org/r/481259
[09:06:29] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14070/" [puppet] - https://gerrit.wikimedia.org/r/481259 (owner: Elukey)
[09:53:53] PROBLEM - MariaDB Slave SQL: s8 on db1124 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks: Cant find record in pagelinks, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1087-bin.003629, end_log_pos 782886141
[10:03:49] PROBLEM - MariaDB Slave Lag: s8 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 743.98 seconds
[10:18:03] (PS1) ArielGlenn: no itervalues in python3 [dumps] - https://gerrit.wikimedia.org/r/481260 (https://phabricator.wikimedia.org/T210989)
[10:24:07] (CR) ArielGlenn: [C: +2] no itervalues in python3 [dumps] - https://gerrit.wikimedia.org/r/481260 (https://phabricator.wikimedia.org/T210989) (owner: ArielGlenn)
[10:25:08] !log ariel@deploy1001 Started deploy [dumps/dumps@af74350]: python3 fixup for show runtimes
[10:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:14] !log ariel@deploy1001 Finished deploy [dumps/dumps@af74350]: python3 fixup for show runtimes (duration: 00m 05s)
[10:25:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:27] !log Fix replication on db1124:3318
[11:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:37] RECOVERY - MariaDB Slave SQL: s8 on db1124 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[12:08:43] RECOVERY - MariaDB Slave Lag: s8 on db1124 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[12:20:41] PROBLEM - MariaDB Slave SQL: s8 on db1124 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks: Cant find record in pagelinks, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1087-bin.003630, end_log_pos 298346008
[12:27:59] RECOVERY - MariaDB Slave SQL: s8 on db1124 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[12:54:22] (PS2) Ladsgroup: Add WikibaseQualityConstraints configs in testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/480535 (https://phabricator.wikimedia.org/T209922)
[13:09:49] (PS2) Framawiki: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581)
[13:51:34] (PS4) Framawiki: Create Cookbook NS in bnwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534)
[13:52:08] (PS5) Framawiki: Create Cookbook NS in bnwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534)
[14:36:34] (PS1) Framawiki: Remove NS 104 from wgContentNamespaces for euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/481266 (https://phabricator.wikimedia.org/T191396)
[14:39:57] (CR) Urbanecm: [C: +1] "LGTM, should carry no consequences :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/481266 (https://phabricator.wikimedia.org/T191396) (owner: Framawiki)
[14:41:33] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/458870 (https://phabricator.wikimedia.org/T203534) (owner: Framawiki)
[14:53:21] (PS1) Framawiki: Publish throttle-analyze at noc [mediawiki-config] - https://gerrit.wikimedia.org/r/481267
[14:54:03] (PS2) Framawiki: Publish throttle-analyze at noc [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894)
[14:55:52] (PS3) Framawiki: Publish throttle-analyze at noc [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894)
[15:02:54] (CR) Urbanecm: "Not saying this won't work, but i tried it in https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/414758 and it didn't, see the" [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: Framawiki)
[15:15:00] (CR) Framawiki: "> Patch Set 3:" [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: Framawiki)
[15:20:13] (CR) Reedy: [C: -1] "Yeah, probably just caching" [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: Framawiki)
[16:13:07] (PS1) Paladox: php: Add support for puppet6 [puppet] - https://gerrit.wikimedia.org/r/481269
[16:13:46] (PS2) Paladox: php: Add support for puppet6 [puppet] - https://gerrit.wikimedia.org/r/481269
[16:14:15] (CR) Paladox: "I've tested this under puppet6 and puppet4 and confirmed this does not break puppet4." [puppet] - https://gerrit.wikimedia.org/r/481269 (owner: Paladox)
[16:18:28] (PS6) Paladox: wmlib: Fix support for puppet6 in php_ini.rb, ini.rb and ordered_yaml.rb [puppet] - https://gerrit.wikimedia.org/r/481254
[16:18:35] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/481254 (owner: Paladox)
[17:02:25] Operations, ops-eqiad: frdb1001 RAID controller battery failure - https://phabricator.wikimedia.org/T212556 (Para32556677) Open→Invalid p:Unbreak!→Low a:Para32556677
[17:02:52] uh
[17:03:34] Operations, ops-eqiad: frdb1001 RAID controller battery failure - https://phabricator.wikimedia.org/T212556 (Paladox) Invalid→Open p:Low→Unbreak! a:Para32556677→None
[17:52:55] (PS1) Paladox: wmlib: Add support for puppet6 in require_package [puppet] - https://gerrit.wikimedia.org/r/481271
[17:53:26] (PS2) Paladox: wmlib: Add support for puppet6 in require_package [puppet] - https://gerrit.wikimedia.org/r/481271
[17:53:37] (PS3) Paladox: wmflib: Add support for puppet6 in require_package [puppet] - https://gerrit.wikimedia.org/r/481271
[18:38:31] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4fullscreenrefresh=1morgId=1
[18:39:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4fullscreenrefresh=1morgId=1