[00:22:44] (PS1) Brennen Bearnes: logspam-watch: add time & sortable columns, improve formatting [puppet] - https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882)
[00:28:03] (CR) Brennen Bearnes: "I hacked together an interactive/sortable/filterable version of this which drops the watch(1) dependency entirely:" [puppet] - https://gerrit.wikimedia.org/r/499761 (owner: Lucas Werkmeister (WMDE))
[00:34:35] (PS2) Brennen Bearnes: logspam-watch: add time & sortable columns, improve formatting [puppet] - https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882)
[00:41:54] (CR) Jforrester: "Huh, this is rather neat." [puppet] - https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: Brennen Bearnes)
[00:48:36] (CR) Brennen Bearnes: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: Brennen Bearnes)
[02:04:12] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:12:34] PROBLEM - Check the last execution of package_builder_Clean_up_build_directory on boron is CRITICAL: CRITICAL: Status of the systemd unit package_builder_Clean_up_build_directory https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:39:24] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 92 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200503T0700)
[07:01:52] PROBLEM - snapshot of s3 in eqiad on db1115 is CRITICAL: snapshot for s3 at eqiad taken more than 3 days ago: Most recent backup 2020-04-30 06:33:38 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[08:02:22] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:28:02] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22729 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:41:56] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 92 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[10:07:14] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:18:02] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22731 bytes in 0.271 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:44:20] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 93 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[13:10:12] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:11:52] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22726 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:25:50] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:26:50] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:27:42] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:35:52] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22736 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:31:12] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:34:44] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22749 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:03:08] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:04:50] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22735 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:03:56] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:09:18] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 22730 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:16:30] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:23:48] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22727 bytes in 7.477 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:57:52] Operations, MediaWiki-extensions-CodeReview, Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (Legoktm) >>! In T243056#6096021, @Dzahn wrote: > We have currently about 9.4GB left on those servers. So while 4GB...
[21:03:48] (PS1) Paladox: Drop ProjectCreatedListener [software/gerrit/plugins/wikimedia] (stable-3.1) - https://gerrit.wikimedia.org/r/593966
[21:05:00] (PS2) Paladox: Drop ProjectCreatedListener [software/gerrit/plugins/wikimedia] (stable-3.1) - https://gerrit.wikimedia.org/r/593966
[21:05:45] (PS3) Paladox: Drop ProjectCreatedListener [software/gerrit/plugins/wikimedia] (stable-3.1) - https://gerrit.wikimedia.org/r/593966
[21:08:34] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:09:52] (PS4) Paladox: Drop ProjectCreatedListener [software/gerrit/plugins/wikimedia] (stable-3.1) - https://gerrit.wikimedia.org/r/593966
[21:12:06] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22740 bytes in 0.274 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:19:02] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar)
[21:32:58] !log bmansurov@deploy1001 Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service
[21:33:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:20] !log bmansurov@deploy1001 Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
[21:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:20] Operations, Research, Patch-For-Review: recommendation api's test on scb nodes are flapping - https://phabricator.wikimedia.org/T247732 (bmansurov) @elukey I've deployed the fix. Let me know if you still see the issue.
[22:03:22] (PS3) Reedy: Update path to CirrusSearch maintenance scripts [puppet] - https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806)
[22:03:41] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:05:29] (CR) Krinkle: noc.wikimedia.org: highlight.php should not append .txt to dblist URLs (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:08:48] (PS10) Urbanecm: noc.wikimedia.org: highlight.php should not append .txt to dblist URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852)
[22:08:53] (CR) Urbanecm: noc.wikimedia.org: highlight.php should not append .txt to dblist URLs (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:09:10] (PS11) Krinkle: noc: Fix highlight.php to not append .txt to dblist URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:09:31] (CR) Krinkle: [C: +1] "Locally verified by running 'php -S localhost:4000' in the noc/ directory and then trying it out." [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:10:49] (PS12) Urbanecm: noc: Fix highlight.php to not append .txt to dblist URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852)
[22:13:07] (CR) Urbanecm: [C: +1] Update Phab task for elwikiversity logos [mediawiki-config] - https://gerrit.wikimedia.org/r/593902 (https://phabricator.wikimedia.org/T248391) (owner: QEDK)
[22:13:08] (CR) Urbanecm: [C: +1] "LGTM, thanks for the patch!" [mediawiki-config] - https://gerrit.wikimedia.org/r/593893 (https://phabricator.wikimedia.org/T248391) (owner: Diomidis Spinellis)
[22:15:40] (CR) Urbanecm: [C: -1] "> Patch Set 2: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: Zoranzoki21)
[22:17:24] (CR) Urbanecm: [C: -1] Remove "Create a book" link on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/561403 (https://phabricator.wikimedia.org/T241683) (owner: DannyS712)
[22:18:02] (PS1) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[22:18:07] (CR) Reedy: [C: -1] "Doesn't mean you have to commit it" [mediawiki-config] - https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: Zoranzoki21)
[22:19:34] (CR) jerkins-bot: [V: -1] dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:29:25] (PS2) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[22:30:20] (CR) jerkins-bot: [V: -1] dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:30:57] (PS3) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[22:35:52] (CR) Krinkle: [C: +1] "I've added preg_match as well. Sorry, I did not see your patch while I was writing mine. Did not mean to overwrite." [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:36:23] (CR) Urbanecm: "> Patch Set 12:" [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:39:42] (PS1) Reedy: Add cf-request-id as a cf new header. [puppet] - https://gerrit.wikimedia.org/r/593969
[22:40:45] (CR) Krinkle: [C: +2] noc: Fix highlight.php to not append .txt to dblist URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:41:45] (Merged) jenkins-bot: noc: Fix highlight.php to not append .txt to dblist URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[22:42:39] !log scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/591459
[22:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:20] (CR) Krinkle: dblists: Remove "do not modify" note from all.dblist (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:44:19] (CR) Urbanecm: dblists: Remove "do not modify" note from all.dblist (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:45:14] (PS4) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[22:48:31] (CR) Krinkle: "I think the bug here is that the YAML tree is missing 'all' as ultimate parent. That should be fixed instead. By adding inherits from 'al" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:49:00] (CR) Krinkle: dblists: Remove "do not modify" note from all.dblist (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:49:12] (CR) Urbanecm: "> Patch Set 4:" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:50:06] (PS2) Krinkle: noc: Remove broken symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/593929 (owner: Urbanecm)
[22:50:09] (CR) Krinkle: [C: +2] noc: Remove broken symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/593929 (owner: Urbanecm)
[22:50:59] (Merged) jenkins-bot: noc: Remove broken symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/593929 (owner: Urbanecm)
[22:51:18] (CR) Krinkle: "> I don't see how would that change the header of the dblist file?" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:51:43] (CR) Krinkle: "It also makes it so that when we start using YAML for config, that 'all' is working instead of being broken, for default values" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:52:35] !log scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/593929
[22:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:19] (CR) Urbanecm: "> Patch Set 4:" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[22:58:59] (CR) Krinkle: "I see. There is a bootstrapping issue there. I thought it would scan the directory, but I guess it's reasonable to have at least one expli" [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[23:00:31] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:04:47] (CR) Krinkle: dblists: Remove "do not modify" note from all.dblist (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[23:07:24] (PS5) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[23:07:33] (CR) Urbanecm: dblists: Remove "do not modify" note from all.dblist (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[23:08:39] (PS6) Urbanecm: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968
[23:41:03] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[23:41:53] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw