[00:02:25] <wikibugs>	 (PS1) CRusnov: netbox reports alerts: fix notes_url from variable rename error [puppet] - https://gerrit.wikimedia.org/r/552365
[00:02:54] <wikibugs>	 (CR) CRusnov: [C: +2] "quick fix - no breakage possible" [puppet] - https://gerrit.wikimedia.org/r/552365 (owner: CRusnov)
[00:03:02] <wikibugs>	 (PS1) Dzahn: xhgui: disable automatic rsync, keep it manual [puppet] - https://gerrit.wikimedia.org/r/552366
[00:03:59] <wikibugs>	 (CR) Dzahn: [C: +2] xhgui: disable automatic rsync, keep it manual [puppet] - https://gerrit.wikimedia.org/r/552366 (owner: Dzahn)
[00:05:17] <icinga-wm>	 PROBLEM - Check systemd state on tungsten is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:08:50] <mutante>	 ^ me, missing some IPv6 records.. fixing 
[00:09:44] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on tungsten is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn xhgui1001 needs IPv6 records for ferm https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:18:56] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/: Make non-remote titles work in RemotePageConfigurationLoader (T237301) (duration: 00m 54s)
[00:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:01] <stashbot>	 T237301: Newcomer tasks: fix and migrate JSON config pages - https://phabricator.wikimedia.org/T237301
[00:20:15] <wikibugs>	 (PS1) Dzahn: add IPv6 records for xhgui1001/xhgui2001 [dns] - https://gerrit.wikimedia.org/r/552368 (https://phabricator.wikimedia.org/T238098)
[00:20:39] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Move newcomer tasks JSON config from mw.org to local wikis (T237301) (duration: 00m 52s)
[00:20:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:25:02] <wikibugs>	 (CR) Dzahn: [C: +2] add IPv6 records for xhgui1001/xhgui2001 [dns] - https://gerrit.wikimedia.org/r/552368 (https://phabricator.wikimedia.org/T238098) (owner: Dzahn)
[00:25:06] <wikibugs>	 (PS2) Dzahn: add IPv6 records for xhgui1001/xhgui2001 [dns] - https://gerrit.wikimedia.org/r/552368 (https://phabricator.wikimedia.org/T238098)
[00:37:23] <mutante>	 !log tungsten - starting ferm service
[00:37:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:37:35] <icinga-wm>	 RECOVERY - Check systemd state on tungsten is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:46:45] <mutante>	 !log xhgui1001/xhgui2001 - rsyncing /srv/mongod from tungsten to /srv/tungsten/mongod/ on both new machines (T158837)
[00:46:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:50] <stashbot>	 T158837: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837
[00:48:42] <wikibugs>	 (PS1) Dzahn: xhgui: also copy tungsten mongodb data to xhgui2001 [puppet] - https://gerrit.wikimedia.org/r/552369
[00:49:21] <wikibugs>	 (PS2) Dzahn: xhgui: also copy tungsten mongodb data to xhgui2001 [puppet] - https://gerrit.wikimedia.org/r/552369
[00:49:23] <wikibugs>	 (CR) Dzahn: [C: +2] xhgui: also copy tungsten mongodb data to xhgui2001 [puppet] - https://gerrit.wikimedia.org/r/552369 (owner: Dzahn)
[00:57:31] <wikibugs>	 (PS2) Dzahn: xhgui: disable automatic rsync, keep it manual [puppet] - https://gerrit.wikimedia.org/r/552366
[01:38:01] <Krenair>	 Anyone from ops around?
[01:38:07] <Krenair>	 or security or cloud services
[01:39:26] * Platonides doesn't like what these questions seem to imply
[02:25:31] <wikibugs>	 (CR) Eevans: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/552257 (https://phabricator.wikimedia.org/T237143) (owner: Mobrovac)
[02:43:07] <wikibugs>	 (PS1) DannyS712: Remove `move-rootuserpages` from user on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/552374 (https://phabricator.wikimedia.org/T238842)
[02:45:10] <wikibugs>	 (PS2) DannyS712: Remove `move-rootuserpages` from user on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/552374 (https://phabricator.wikimedia.org/T238842)
[02:45:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Remove `move-rootuserpages` from user on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/552374 (https://phabricator.wikimedia.org/T238842) (owner: DannyS712)
[03:03:16] <wikibugs>	 (PS3) DannyS712: Remove `move-rootuserpages` from user on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/552374 (https://phabricator.wikimedia.org/T238842)
[03:23:27] <wikibugs>	 (CR) Vgutierrez: "looking good :) see the inline comments" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/552201 (https://phabricator.wikimedia.org/T233274) (owner: Ema)
[03:49:27] <shdubsh>	 !log restart prometheus@ops on prometheus1003 T238807
[03:49:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:49:34] <stashbot>	 T238807: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807
[03:58:07] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[04:21:59] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[04:57:08] <wikibugs>	 (PS1) Vgutierrez: ATS: Enable log rotation via logrotate [puppet] - https://gerrit.wikimedia.org/r/552379 (https://phabricator.wikimedia.org/T238724)
[05:00:55] <wikibugs>	 (CR) Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1003/19543/" [puppet] - https://gerrit.wikimedia.org/r/552379 (https://phabricator.wikimedia.org/T238724) (owner: Vgutierrez)
[06:07:35] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s8 on db2083 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86444.17 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:08:26] <marostegui>	 ^ downtime expired
[06:15:53] <wikibugs>	 (PS1) Marostegui: install_server: Do not reimage db213[2-5] [puppet] - https://gerrit.wikimedia.org/r/552380 (https://phabricator.wikimedia.org/T238183)
[06:18:25] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Do not reimage db213[2-5] [puppet] - https://gerrit.wikimedia.org/r/552380 (https://phabricator.wikimedia.org/T238183) (owner: Marostegui)
[06:24:10] <wikibugs>	 (PS1) Marostegui: mariadb: Promote db1086 to s7 primary master [puppet] - https://gerrit.wikimedia.org/r/552381 (https://phabricator.wikimedia.org/T238044)
[06:25:00] <wikibugs>	 (PS1) Marostegui: wmnet: Update s7-master CNAME [dns] - https://gerrit.wikimedia.org/r/552382 (https://phabricator.wikimedia.org/T238044)
[06:25:15] <wikibugs>	 (CR) Marostegui: [C: -2] "Wait for the failover day" [puppet] - https://gerrit.wikimedia.org/r/552381 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[06:25:30] <wikibugs>	 (CR) Marostegui: [C: -2] "Wait for the failover day" [dns] - https://gerrit.wikimedia.org/r/552382 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[06:31:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Rebalance weights on s7 in preparation for s7 failover on Tuesday T238044', diff saved to https://phabricator.wikimedia.org/P9722 and previous config saved to /var/cache/conftool/dbconfig/20191122-063145-marostegui.json
[06:31:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:51] <stashbot>	 T238044: Switchover s7 primary database master db1062 -> db1086 - 26th Nov 06:00 - 06:30 UTC - https://phabricator.wikimedia.org/T238044
[06:50:37] <icinga-wm>	 PROBLEM - snapshot of s3 in codfw on db1115 is CRITICAL: snapshot for s3 at codfw taken more than 4 days ago: Most recent backup 2019-11-18 06:38:42 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:16:57] <wikibugs>	 (PS1) MaxSem: admin: Remove myself [puppet] - https://gerrit.wikimedia.org/r/552389
[07:18:33] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[07:23:23] <_joe_>	 MaxSem: :/
[07:23:48] * MaxSem hugs _joe_ 
[07:24:36] <_joe_>	 MaxSem: I'll think of you every time I need to spell kartotherian correctly  :P
[07:25:38] <MaxSem>	 I blame Yuri, my original "kartotherion" was so much easier :P
[07:25:52] <_joe_>	 ahahah
[07:37:29] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Thanks :-) I'll merge this after your last work day" [puppet] - https://gerrit.wikimedia.org/r/552389 (owner: MaxSem)
[07:40:28] <wikibugs>	 (CR) Muehlenhoff: Add image submission mode to debmonitor client (1 comment) [software/debmonitor] - https://gerrit.wikimedia.org/r/551220 (https://phabricator.wikimedia.org/T237978) (owner: Muehlenhoff)
[07:43:39] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 54.11 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:47:03] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 71.03 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:51:35] <wikibugs>	 (PS1) Elukey: role::prometheus::analytics: remove old burrow configuration [puppet] - https://gerrit.wikimedia.org/r/552393 (https://phabricator.wikimedia.org/T238794)
[08:11:17] <wikibugs>	 (PS1) Vgutierrez: acme_chief: Add smokeping certificate [puppet] - https://gerrit.wikimedia.org/r/552398 (https://phabricator.wikimedia.org/T238900)
[08:14:32] <wikibugs>	 (CR) Ema: [C: +1] acme_chief: Add smokeping certificate [puppet] - https://gerrit.wikimedia.org/r/552398 (https://phabricator.wikimedia.org/T238900) (owner: Vgutierrez)
[08:22:01] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labweb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:22:04] <wikibugs>	 (CR) Elukey: [C: +2] role::prometheus::analytics: remove old burrow configuration [puppet] - https://gerrit.wikimedia.org/r/552393 (https://phabricator.wikimedia.org/T238794) (owner: Elukey)
[08:22:12] <wikibugs>	 (CR) Ema: [C: -1] "As discussed on irc with Valentin, there's a bit of confusion in hieradata/role/common/acme_chief.yaml when it comes to librenms, netbox, " [puppet] - https://gerrit.wikimedia.org/r/552398 (https://phabricator.wikimedia.org/T238900) (owner: Vgutierrez)
[08:22:13] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labweb1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1866 bytes in 1.500 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:23:35] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labweb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 28421 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:23:57] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labweb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 28534 bytes in 0.180 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:24:22] <wikibugs>	 (CR) Ema: [C: +1] ATS: Enable log rotation via logrotate [puppet] - https://gerrit.wikimedia.org/r/552379 (https://phabricator.wikimedia.org/T238724) (owner: Vgutierrez)
[08:30:04] <wikibugs>	 (CR) Ema: ATS: enable reload for global Lua script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/552201 (https://phabricator.wikimedia.org/T233274) (owner: Ema)
[08:32:07] <wikibugs>	 (PS2) ArielGlenn: redact possible password entries in dumps log exceptions emailer [puppet] - https://gerrit.wikimedia.org/r/552328
[08:33:04] <wikibugs>	 (CR) Vgutierrez: [C: +1] ATS: enable reload for global Lua script [puppet] - https://gerrit.wikimedia.org/r/552201 (https://phabricator.wikimedia.org/T233274) (owner: Ema)
[08:33:36] <wikibugs>	 (PS1) Muehlenhoff: Add support for PDNS 4 [debs/prometheus-pdns-exporter] - https://gerrit.wikimedia.org/r/552467 (https://phabricator.wikimedia.org/T227411)
[08:35:32] <wikibugs>	 (CR) ArielGlenn: [C: +2] redact possible password entries in dumps log exceptions emailer [puppet] - https://gerrit.wikimedia.org/r/552328 (owner: ArielGlenn)
[08:38:05] <wikibugs>	 (PS2) Vgutierrez: ATS: Enable log rotation via logrotate [puppet] - https://gerrit.wikimedia.org/r/552379 (https://phabricator.wikimedia.org/T238724)
[08:40:58] <wikibugs>	 (PS7) Muehlenhoff: Add image submission mode to debmonitor client [software/debmonitor] - https://gerrit.wikimedia.org/r/551220 (https://phabricator.wikimedia.org/T237978)
[08:41:12] <wikibugs>	 (CR) Vgutierrez: [C: +2] ATS: Enable log rotation via logrotate [puppet] - https://gerrit.wikimedia.org/r/552379 (https://phabricator.wikimedia.org/T238724) (owner: Vgutierrez)
[08:46:24] <wikibugs>	 (PS1) Ema: cache: reimage cp1081 as text_ats [puppet] - https://gerrit.wikimedia.org/r/552468 (https://phabricator.wikimedia.org/T227432)
[08:46:35] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add image submission mode to debmonitor client [software/debmonitor] - https://gerrit.wikimedia.org/r/551220 (https://phabricator.wikimedia.org/T237978) (owner: Muehlenhoff)
[08:49:06] <wikibugs>	 (CR) Vgutierrez: [C: +1] cache: reimage cp1081 as text_ats [puppet] - https://gerrit.wikimedia.org/r/552468 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:49:52] <ema>	 !log depool cp1081 and reimage as text_ats T227432
[08:49:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:58] <stashbot>	 T227432: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432
[08:50:26] <wikibugs>	 (CR) Ema: [C: +2] cache: reimage cp1081 as text_ats [puppet] - https://gerrit.wikimedia.org/r/552468 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:54:05] <gehel>	 !log restarting blazegraph and updater on edqs1007
[08:54:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:09] <gehel>	 !log restarting blazegraph and updater on wdqs1007
[08:54:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:58] <gehel>	 !log remove blazegraph 2.1.5-wmf.11 from archiva, broken upload
[09:05:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:03] <wikibugs>	 (CR) Elukey: "Just created the kerberos keytabs and uploaded them to the puppet private repo, the change can be merged anytime. Going to wait for an exp" [puppet] - https://gerrit.wikimedia.org/r/550466 (https://phabricator.wikimedia.org/T234229) (owner: Elukey)
[09:05:19] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.downtime
[09:05:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:27] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:07:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:26] <wikibugs>	 (CR) Reedy: Add PoolCounter configuration for Special:Contributions (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/552228 (https://phabricator.wikimedia.org/T234450) (owner: Reedy)
[09:17:30] <wikibugs>	 (PS2) Reedy: Add PoolCounter configuration for Special:Contributions [mediawiki-config] - https://gerrit.wikimedia.org/r/552228 (https://phabricator.wikimedia.org/T234450)
[09:17:34] <wikibugs>	 (CR) Reedy: [C: +2] Add PoolCounter configuration for Special:Contributions [mediawiki-config] - https://gerrit.wikimedia.org/r/552228 (https://phabricator.wikimedia.org/T234450) (owner: Reedy)
[09:17:45] <wikibugs>	 (CR) Elukey: "created https://phabricator.wikimedia.org/T238905 and tagged for sre-access-request since this change needs the SRE team's approval to be " [puppet] - https://gerrit.wikimedia.org/r/552304 (https://phabricator.wikimedia.org/T236180) (owner: EBernhardson)
[09:18:02] <wikibugs>	 (CR) Vgutierrez: "yeah, actually getting rid of that subprocess would be really nice." [puppet] - https://gerrit.wikimedia.org/r/552336 (https://phabricator.wikimedia.org/T98006) (owner: BBlack)
[09:18:20] <wikibugs>	 (Merged) jenkins-bot: Add PoolCounter configuration for Special:Contributions [mediawiki-config] - https://gerrit.wikimedia.org/r/552228 (https://phabricator.wikimedia.org/T234450) (owner: Reedy)
[09:19:53] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: T234450 (duration: 00m 55s)
[09:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:02] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM overall, see bikeshe^Wnit on job name" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549871 (https://phabricator.wikimedia.org/T237234) (owner: Giuseppe Lavagetto)
[09:23:10] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.35.0-wmf.5/includes/specials/pagers/ContribsPager.php: Remove live hack of limit for T234450 (duration: 00m 54s)
[09:23:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:39] <gehel>	 !log depool wdqs1007 to allow to catch up on lag - T238229
[09:27:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:43] <stashbot>	 T238229: WDQS is having high update lag for the last week - https://phabricator.wikimedia.org/T238229
[09:28:10] <ema>	 !log pool cp1081 with ATS backend T227432
[09:28:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:14] <stashbot>	 T227432: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432
[09:28:41] <addshore>	 jouncebot now
[09:28:41] <jouncebot>	 No deployments scheduled for the foreseeable future!
[09:31:11] <wikibugs>	 (PS1) Addshore: wgWikidataOrgQueryServiceMaxLagFactor 60 [mediawiki-config] - https://gerrit.wikimedia.org/r/552474 (https://phabricator.wikimedia.org/T221774)
[09:39:55] <wikibugs>	 (PS1) Ladsgroup: Add gcr and shy-latn to langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552476 (https://phabricator.wikimedia.org/T238104)
[09:40:47] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add gcr and shy-latn to langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552476 (https://phabricator.wikimedia.org/T238104) (owner: Ladsgroup)
[09:41:28] <wikibugs>	 (Merged) jenkins-bot: Add gcr and shy-latn to langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552476 (https://phabricator.wikimedia.org/T238104) (owner: Ladsgroup)
[09:44:52] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized langlist: T238104 T238104 (duration: 00m 52s)
[09:44:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:58] <stashbot>	 T238104: Create Guianan Creole Wikipedia - https://phabricator.wikimedia.org/T238104
[09:45:37] <wikibugs>	 (PS1) Ladsgroup: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552477
[09:45:39] <wikibugs>	 (CR) Ladsgroup: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552477 (owner: Ladsgroup)
[09:45:56] <_joe_>	 wikibugs: hey
[09:46:04] <_joe_>	 you're not working
[09:46:36] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552477 (owner: Ladsgroup)
[09:46:53] <Reedy>	 _joe_: anything in particular?
[09:47:08] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 56.14 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:47:10] <Reedy>	 Phab?
[09:47:15] <_joe_>	 Reedy: yeah
[09:47:23] <_joe_>	 no phab task updates
[09:47:24] <Reedy>	 probably just needs that service restarting
[09:47:25] <Reedy>	 moment
[09:47:36] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
[09:47:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:57] <Reedy>	 should be coming back
[09:51:56] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 59.91 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:53:05] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Add fix for tclap position (#9702) [debs/envoyproxy] (wikimedia-stretch) - https://gerrit.wikimedia.org/r/552311 (owner: Giuseppe Lavagetto)
[09:53:15] <wikibugs>	 (PS1) Ladsgroup: Rename shy-latn to shy in langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552478 (https://phabricator.wikimedia.org/T238105)
[09:53:24] <Amir1>	 I'm deploying some fixes for the new wikis
[09:53:29] <addshore>	 :)
[09:53:35] <addshore>	 Amir1: I have one for when you're done ;)
[09:53:35] * Reedy throws stuff at wikibugs
[09:53:40] <Reedy>	 addshore: no
[09:53:48] <addshore>	 Reedy: shhhh
[09:53:50] <wikibugs>	 (CR) Ladsgroup: [C: +2] Rename shy-latn to shy in langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552478 (https://phabricator.wikimedia.org/T238105) (owner: Ladsgroup)
[09:54:09] <Amir1>	 addshore: Sam is sitting next to me if you need me to persuade him
[09:54:31] <wikibugs>	 (Merged) jenkins-bot: Rename shy-latn to shy in langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/552478 (https://phabricator.wikimedia.org/T238105) (owner: Ladsgroup)
[09:54:52] <Reedy>	 _joe_: Should hopefully work now
[09:55:18] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 80.27 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:56:37] <addshore>	 Reedy: Amir1 https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/552474/
[09:56:46] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized langlist: T238105 (duration: 00m 51s)
[09:56:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:51] <stashbot>	 T238105: Create Shawiya Wiktionary - https://phabricator.wikimedia.org/T238105
[09:57:55] <wikibugs>	 (PS1) Ladsgroup: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552479
[09:57:57] <wikibugs>	 (CR) Ladsgroup: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552479 (owner: Ladsgroup)
[09:58:41] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/552479 (owner: Ladsgroup)
[09:59:41] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 10s)
[09:59:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:51] <wikibugs>	 (PS1) Filippo Giunchedi: profile: add esams/eqsin snmp_exporter configs [puppet] - https://gerrit.wikimedia.org/r/552480
[10:09:31] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] profile: add esams/eqsin snmp_exporter configs [puppet] - https://gerrit.wikimedia.org/r/552480 (owner: Filippo Giunchedi)
[10:09:35] <wikibugs>	 Operations, observability, Availability, Goal, Patch-For-Review: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (jcrespo) I will document the graph when it is "finished" (WIP), but for now: * Backup time: end_time - start_time of the last backup * Backup level: if...
[10:15:07] <wikibugs>	 Operations, Puppet, Patch-For-Review, User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (jbond) @Ottomata you are correct the script just needs to read the public certificates however the directory with the public certifi...
[10:17:33] <wikibugs>	 Puppet, User-jbond, cloud-services-team (Kanban): Prevent catalog breakage on cloud instances by decoupling core cloud puppetmaster from custom puppetmasters - https://phabricator.wikimedia.org/T227029 (aborrero) p:Triage→Normal
[10:18:05] <wikibugs>	 Puppet, Cloud-Services, cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (aborrero) p:Triage→Normal
[10:18:39] <wikibugs>	 (CR) Jcrespo: [C: +1] "Looks good to me: https://phab.wmfusercontent.org/file/data/h7ttcuxgdhhyh2h6orv6/PHID-FILE-n7zfiugajujgrf4zxwrc/Screenshot_20191122_111632" [puppet] - https://gerrit.wikimedia.org/r/552381 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[10:19:26] <wikibugs>	 (CR) Jcrespo: [C: +1] "Same suggestion as before." [dns] - https://gerrit.wikimedia.org/r/552382 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[10:20:46] <wikibugs>	 (PS2) Marostegui: mariadb: Promote db1086 to s7 primary master [puppet] - https://gerrit.wikimedia.org/r/552381 (https://phabricator.wikimedia.org/T238044)
[10:20:48] <wikibugs>	 (CR) Marostegui: [C: -2] "> Looks good to me: https://phab.wmfusercontent.org/file/data/h7ttcuxgdhhyh2h6orv6/PHID-FILE-n7zfiugajujgrf4zxwrc/Screenshot_20191122_1116" [puppet] - https://gerrit.wikimedia.org/r/552381 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[10:20:51] <wikibugs>	 Operations, observability: The "logstash-*" index pattern does not contain any of the following field types: ip - https://phabricator.wikimedia.org/T238795 (fgiunchedi) Looks good! I won't have time to look into this in depth but I'm happy to help if patches need review
[10:21:16] <wikibugs>	 (PS2) Marostegui: wmnet: Update s7-master CNAME [dns] - https://gerrit.wikimedia.org/r/552382 (https://phabricator.wikimedia.org/T238044)
[10:21:22] <wikibugs>	 Operations, Epic, cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (aborrero)
[10:21:29] <wikibugs>	 (CR) Marostegui: [C: -2] "> Same suggestion as before." [dns] - https://gerrit.wikimedia.org/r/552382 (https://phabricator.wikimedia.org/T238044) (owner: Marostegui)
[10:32:49] <wikibugs>	 (PS1) Jbond: icinga::cas: update bool_2_on_off function [puppet] - https://gerrit.wikimedia.org/r/552481
[10:33:36] <wikibugs>	 (PS1) Giuseppe Lavagetto: Correct debian/format to quilt [debs/envoyproxy] (wikimedia-stretch) - https://gerrit.wikimedia.org/r/552483
[10:35:41] <wikibugs>	 (CR) Jbond: [C: +2] icinga::cas: update bool_2_on_off function [puppet] - https://gerrit.wikimedia.org/r/552481 (owner: Jbond)
[10:36:39] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Correct debian/format to quilt [debs/envoyproxy] (wikimedia-stretch) - https://gerrit.wikimedia.org/r/552483 (owner: Giuseppe Lavagetto)
[10:54:25] <wikibugs>	 (PS1) Filippo Giunchedi: WIP: move to Debian packaging [software/logstash/plugins] - https://gerrit.wikimedia.org/r/552486 (https://phabricator.wikimedia.org/T217340)
[10:56:02] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 56.44 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:58:16] <wikibugs>	 (PS3) BBlack: vcl: Bump TLSv1/TLSv1.1 pageview replacement to 4% [puppet] - https://gerrit.wikimedia.org/r/550868 (https://phabricator.wikimedia.org/T238038) (owner: Vgutierrez)
[10:58:18] <wikibugs>	 (PS3) BBlack: vcl: Bump TLSv1/TLSv1.1 pageview replacement to 10% [puppet] - https://gerrit.wikimedia.org/r/550869 (https://phabricator.wikimedia.org/T238038) (owner: Vgutierrez)
[10:58:20] <wikibugs>	 (PS3) BBlack: vcl: Bump TLSv1/TLSv1.1 pageview replacement to 100% [puppet] - https://gerrit.wikimedia.org/r/550870 (https://phabricator.wikimedia.org/T238038) (owner: Vgutierrez)
[10:58:22] <wikibugs>	 (PS1) BBlack: browsersec: cover bot traffic better [puppet] - https://gerrit.wikimedia.org/r/552488 (https://phabricator.wikimedia.org/T238038)
[10:59:48] <vgutierrez>	 Creative code review
[11:05:14] <wikibugs>	 Operations, observability, Availability, Goal, Patch-For-Review: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (Marostegui) Just brainstorming here about the dashboard, feel free to ignore, I know it is WIP.  - It would be nice to include the date on the "last day...
[11:09:13] <wikibugs>	 Operations, Puppet, Patch-For-Review, User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (akosiaris) Just so that you aren't caught off guard   ` file {'/srv/private/secret/secrets/certificate': `  a form of this kind of a...
[11:11:12] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 72.94 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[11:29:44] <effie>	 !log upload wikidiff2 1.10.0-1 - T236963
[11:29:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:50] <stashbot>	 T236963: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963
[11:34:25] <effie>	 !log Roll out wikidiff2 1.10.0-1 to canaries - T236963
[11:34:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:05] <wikibugs>	 (CR) BBlack: [C: +2] browsersec: cover bot traffic better [puppet] - https://gerrit.wikimedia.org/r/552488 (https://phabricator.wikimedia.org/T238038) (owner: BBlack)
[11:41:12] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 47 probes of 490 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[11:41:45] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] vcl: Bump TLSv1/TLSv1.1 pageview replacement to 4% [puppet] - 10https://gerrit.wikimedia.org/r/550868 (https://phabricator.wikimedia.org/T238038) (owner: 10Vgutierrez)
[11:46:52] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 26 probes of 490 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[11:52:00] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Campsite, 10User-DannyS712: 502 errors on ATS/8.0.5 - https://phabricator.wikimedia.org/T237319 (10WMDE-leszek) Thanks @elukey and @Joe for translating from leet speak! I've filed T238901 about the problem in Wikibase, and we'll be looking into fixing the b...
[11:59:40] <effie>	 !log reload php7 on canaries 
[11:59:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:03] <wikibugs>	 (03PS1) 10Jcrespo: prometheus-bacula-exporter: Parallelize bconsole executions [puppet] - 10https://gerrit.wikimedia.org/r/552490 (https://phabricator.wikimedia.org/T234900)
[12:04:23] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] prometheus-bacula-exporter: Parallelize bconsole executions [puppet] - 10https://gerrit.wikimedia.org/r/552490 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[12:04:35] <wikibugs>	 (03PS2) 10Jcrespo: prometheus-bacula-exporter: Parallelize bconsole executions [puppet] - 10https://gerrit.wikimedia.org/r/552490 (https://phabricator.wikimedia.org/T234900)
[12:09:43] <wikibugs>	 10Operations, 10MediaWiki-REST-API, 10serviceops, 10wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (10jijiki) Version 1.10.0 has been deployed to the canaries, we can roll out to production on Monday
[12:10:29] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal, 10Patch-For-Review: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10jcrespo) >>! In T234900#5683712, @jcrespo wrote: > As I feared, the exported during peak hours gets too slow: https://grafana.wikimedia.org/d/413r2vbWk/...
[12:15:04] <wikibugs>	 (03PS1) 10Jcrespo: prometheus-bacula-exporter: Restart service on code change [puppet] - 10https://gerrit.wikimedia.org/r/552491 (https://phabricator.wikimedia.org/T234900)
[12:17:18] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: lower threshold for logstash indexing failures [puppet] - 10https://gerrit.wikimedia.org/r/552492 (https://phabricator.wikimedia.org/T236343)
[12:17:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: lower threshold for logstash indexing failures [puppet] - 10https://gerrit.wikimedia.org/r/552492 (https://phabricator.wikimedia.org/T236343) (owner: 10Filippo Giunchedi)
[12:18:40] <wikibugs>	 (03PS3) 10BBlack: acme-chief: parallelize gdnsd-sync [puppet] - 10https://gerrit.wikimedia.org/r/552336 (https://phabricator.wikimedia.org/T98006)
[12:18:42] <wikibugs>	 (03PS2) 10BBlack: authdns: refactor role/profile/hieradata bits [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006)
[12:21:04] <wikibugs>	 (03CR) 10BBlack: "Seems clean?" [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006) (owner: 10BBlack)
[12:26:20] <wikibugs>	 (03CR) 10BBlack: "Better run with icinga as well: https://puppet-compiler.wmflabs.org/compiler1003/19547/" [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006) (owner: 10BBlack)
[12:28:36] <addshore>	 jouncebot now
[12:28:36] <jouncebot>	 No deployments scheduled for the foreseeable future!
[12:30:24] <wikibugs>	 (03PS2) 10Addshore: wgWikidataOrgQueryServiceMaxLagFactor 60 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552474 (https://phabricator.wikimedia.org/T221774)
[12:30:26] <wikibugs>	 (03CR) 10Addshore: [C: 03+2] wgWikidataOrgQueryServiceMaxLagFactor 60 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552474 (https://phabricator.wikimedia.org/T221774) (owner: 10Addshore)
[12:30:33] <addshore>	 wikibugs is slooooow
[12:31:49] <addshore>	 heh
[12:32:27] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)
[12:32:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:33] <stashbot>	 T221774: Add Wikidata query service lag to Wikidata maxlag - https://phabricator.wikimedia.org/T221774
[12:33:30] <addshore>	 aaand resync because IS.php is lame
[12:34:11] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)
[12:34:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:17] * addshore is done
[12:45:48] <Amir1>	 I'm backporting this bug thingy
[12:46:19] <bblack>	 sounds like the story of my life
[12:46:24] <wikibugs>	 (03PS3) 10BBlack: authdns: refactor role/profile/hieradata bits [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006)
[12:46:29] <Reedy>	 I prefer you backport bug fixes
[12:46:32] <Reedy>	 But each to their own
[12:47:41] <Amir1>	 lol
[12:49:44] <wikibugs>	 (03PS1) 10Jbond: profile::idp::client: add a profile for configuering apache sites [puppet] - 10https://gerrit.wikimedia.org/r/552494
[12:52:45] <wikibugs>	 (03PS4) 10BBlack: authdns: refactor role/profile/hieradata bits [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006)
[12:54:25] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8429 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:55:35] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 4 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:55:51] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] profile::idp::client: add a profile for configuering apache sites [puppet] - 10https://gerrit.wikimedia.org/r/552494 (owner: 10Jbond)
[12:56:54] <wikibugs>	 (03PS5) 10BBlack: authdns: refactor role/profile/hieradata bits [puppet] - 10https://gerrit.wikimedia.org/r/552346 (https://phabricator.wikimedia.org/T98006)
[12:59:20] <wikibugs>	 (03CR) 10Muehlenhoff: profile::idp::client: add a profile for configuering apache sites (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/552494 (owner: 10Jbond)
[12:59:22] <wikibugs>	 10Operations, 10Traffic, 10Performance-Team (Radar): ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (10ema) >>! In T237687#5679746, @Krinkle wrote: > The issue - When `X-Wikimedia-Debug` is enabled (e.g. via the WikimediaDebug browser extension), I am no longer able to brow...
[12:59:30] <wikibugs>	 10Operations, 10Traffic, 10Performance-Team (Radar): ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (10ema) p:05High→03Normal
[13:04:16] <Amir1>	 it's in mwdebug1002, worked fine, syncing
[13:05:12] <wikibugs>	 (03PS1) 10Jbond: profile::idp::client: remove SAML validation, fix trailing slash [puppet] - 10https://gerrit.wikimedia.org/r/552496
[13:06:06] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.5/extensions/Wikibase/lib/includes/Store/Sql/SqlEntityInfoBuilder.php: T238473 (duration: 00m 52s)
[13:06:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:11] <stashbot>	 T238473: Label for unit isn't displayed correctly, just Q-number - https://phabricator.wikimedia.org/T238473
[13:06:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/552496 (owner: 10Jbond)
[13:08:02] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "PCC https://puppet-compiler.wmflabs.org/compiler1003/19550/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/552496 (owner: 10Jbond)
[13:11:02] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] prometheus-bacula-exporter: Restart service on code change [puppet] - 10https://gerrit.wikimedia.org/r/552491 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[13:11:23] <Amir1>	 !log start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T238119 T238524 T237375 T238120)
[13:11:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:32] <stashbot>	 T238119: Add Wikidata support for gcrwiki - https://phabricator.wikimedia.org/T238119
[13:11:32] <stashbot>	 T238524: Add Wikidata support for minwiktionary - https://phabricator.wikimedia.org/T238524
[13:11:33] <stashbot>	 T237375: Add Wikidata support for szywiki - https://phabricator.wikimedia.org/T237375
[13:11:33] <stashbot>	 T238120: Add Wikidata support for shywiktionary - https://phabricator.wikimedia.org/T238120
[13:13:07] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552498 (https://phabricator.wikimedia.org/T238918)
[13:18:49] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:28:09] <wikibugs>	 (03PS1) 10Kosta Harlan: Beta labs: Remove unused GrowthExperiments config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552501
[13:28:11] <wikibugs>	 (03PS1) 10Kosta Harlan: GrowthExperiments: Remove unused config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552502
[13:28:29] <wikibugs>	 (03PS1) 10Jbond: cas-puppetboard.wikimedia.org: add record [dns] - 10https://gerrit.wikimedia.org/r/552503
[13:30:24] <wikibugs>	 10Operations, 10User-jbond: Add cas authentication to puppetboard - https://phabricator.wikimedia.org/T238924 (10jbond)
[13:31:35] <wikibugs>	 10Operations, 10GLOW, 10SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T238868 (10Aklapper) Setting #SRE-Access-Requests as per https://wikitech.wikimedia.org/wiki/Google_Search_Console_access (and removing #Operations as subscriber).
[13:36:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: prometheus-bacula-exporter: Parallelize bconsole executions (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/552490 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[13:38:16] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): Wikibase (beta-only): Update wmgWikibaseClientDataBridgeHrefRegExp (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552498 (https://phabricator.wikimedia.org/T238918) (owner: 10Lucas Werkmeister (WMDE))
[13:39:16] <wikibugs>	 (03PS1) 10Ema: vcl: move XWD logic to text_common_recv/misc_recv_pass [puppet] - 10https://gerrit.wikimedia.org/r/552504 (https://phabricator.wikimedia.org/T233768)
[13:42:13] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Spam from a non-registered email address coming non-moderated to a restricted mailing list - https://phabricator.wikimedia.org/T238871 (10Aklapper) What does "restricted mailing list" mean exactly when it comes to the settings?
[13:47:34] <wikibugs>	 (03PS1) 10BBlack: dnsrecursor: modernize notrack for udp:53 [puppet] - 10https://gerrit.wikimedia.org/r/552506 (https://phabricator.wikimedia.org/T98006)
[13:49:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dnsrecursor: modernize notrack for udp:53 [puppet] - 10https://gerrit.wikimedia.org/r/552506 (https://phabricator.wikimedia.org/T98006) (owner: 10BBlack)
[13:51:50] <wikibugs>	 (03Abandoned) 10Ema: vcl: move XWD logic to text_common_recv/misc_recv_pass [puppet] - 10https://gerrit.wikimedia.org/r/552504 (https://phabricator.wikimedia.org/T233768) (owner: 10Ema)
[13:52:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] prometheus-bacula-exporter: Restart service on code change [puppet] - 10https://gerrit.wikimedia.org/r/552491 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[13:59:27] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:59:27] <icinga-wm>	 RECOVERY - snapshot of s3 in codfw on db1115 is OK: snapshot for s3 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2019-11-22 10:34:25 from db2098.codfw.wmnet:3313 (811 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[14:00:00] <wikibugs>	 (03PS1) 10Ema: Revert "vcl: move XWD pass logic to wm_common" [puppet] - 10https://gerrit.wikimedia.org/r/552507 (https://phabricator.wikimedia.org/T233768)
[14:00:02] <wikibugs>	 (03PS1) 10Ema: cache: do not cache noc.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/552508 (https://phabricator.wikimedia.org/T233768)
[14:18:37] <_joe_>	 uh what's going on with appservers
[14:20:09] <_joe_>	 hah I think it's Amir1's script
[14:20:48] <Amir1>	 oh, where is the error
[14:20:58] <Amir1>	 my script has finished (not the term store)
[14:21:01] <_joe_>	 Amir1: https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET
[14:21:15] <_joe_>	 the degradation started when you started the script
[14:21:22] <_joe_>	 only correlation I found
[14:21:31] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:21:48] <_joe_>	 ok lemme ack that alert for now
[14:22:01] <Amir1>	 but it ended an hour ago
[14:22:26] <Amir1>	 that seems worrying
[14:22:33] <_joe_>	 indeed
[14:22:46] <_joe_>	 and none of the usual suspects seems to be the issue then
[14:23:24] <_joe_>	 webpagetest agrees btw
[14:24:03] <Amir1>	 I can't find anything in https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1
[14:24:20] <elukey>	 when did it start more or less?
[14:24:26] <_joe_>	 14:11
[14:24:34] <_joe_>	 err sorry, 13:11
[14:25:02] <elukey>	 ack thanks, I see now
[14:25:12] <Amir1>	 There's a huge increase in reads in s5 but I don't know if it's related
[14:25:33] <marostegui>	 it could be dumps, let me check
[14:25:54] <_joe_>	 possibly?
[14:25:54] <elukey>	 I noticed the memcached alerts a while before, there were two spikes that auto-resolved, but then I noticed mc1021's tx bandwidth usage that was higher than the rest (https://grafana.wikimedia.org/d/000000316/memcache?panelId=56&fullscreen&orgId=1&from=now-12h&to=now). It doesn't correlate though
[14:26:44] <Amir1>	 https://grafana.wikimedia.org/d/000000548/wikibase-wb_terms?refresh=30s&orgId=1&from=now-3h&to=now nothing is going on on wikidata, otherwise this would explode
[14:27:01] <marostegui>	 It doesn't seem to be dumps, but a script running on the vslow host
[14:27:08] <marostegui>	 https://grafana.wikimedia.org/d/000000273/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1113&var-port=13315&from=now-24h&to=now
[14:27:15] <marostegui>	 The rest of the slaves do not have that increase
[14:27:30] <marostegui>	 SELECT /* SpecialGadgetUsage::reallyDoQuery
[14:27:31] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 58.49 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:28:13] <marostegui>	 yep, confirmed, just the vslow host
[14:28:26] * apergos peeks back in (sorry, I was trying to shovel food in my mouth)
[14:30:02] <_joe_>	 does this justify the current slowness?
[14:30:26] <marostegui>	 _joe_: it should not, it is just a dewiki slave with a script running, but other than that it is not causing lag or anything
[14:30:51] <_joe_>	 now lemme see if we have per-wiki data somewhere
[14:31:07] <apergos>	 is this updateSpecialPages.php? 
[14:31:20] <marostegui>	 apergos: No, see above
[14:31:28] <marostegui>	 apergos: At least on s5 vslow host
[14:31:32] <apergos>	 ah ha
[14:32:37] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 70.98 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:33:21] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:34:10] <_joe_>	 I am trying to look at perf data, but it's not clear at all where this time is spent
[14:34:26] <wikibugs>	 10Operations, 10ops-esams: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (10BBlack)
[14:36:50] <wikibugs>	 10Operations, 10ops-esams: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (10BBlack) **IMPORTANT NOTE** `ganeti3003` is temporarily repurposed as a critical authdns server and is in live production use for that role (see also: T236479 ).  Do not reimage or touch `ganeti3003`.  Th...
[14:38:54] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: record job/site availability [puppet] - 10https://gerrit.wikimedia.org/r/552511
[14:38:57] <Amir1>	 _joe_: https://grafana.wikimedia.org/d/2kP3FjAZk/webpagereplay-enwiki-alerts?orgId=1 ?
[14:39:06] <Amir1>	 Nothing is exploding on enwiki 
[14:39:13] <_joe_>	 yeah
[14:39:30] <_joe_>	 it's going to be some scraper
[14:39:35] <_joe_>	 messing with our stats
[14:39:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: record job/site availability [puppet] - 10https://gerrit.wikimedia.org/r/552511 (owner: 10Filippo Giunchedi)
[14:40:05] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:40:07] <Amir1>	 https://grafana.wikimedia.org/d/000000326/navigation-timing-alerts?refresh=5m&orgId=1&from=now-6h&to=now
[14:40:38] <Amir1>	 We are at this hackathon, maybe people are doing crazy things right now
[14:40:48] <_joe_>	 ahah
[14:41:33] <Amir1>	 why not median?
[14:41:44] <Amir1>	 average can be messed up so easily
[14:42:27] <_joe_>	 median, when you use buckets, is usually way less accurate
[14:42:36] <_joe_>	 the effect is there on the 95th percentile too
[14:42:48] <_joe_>	 75th percentile is not
[14:42:58] <_joe_>	 so I guess it's possible it's just a long tail
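The exchange above (average vs. median vs. percentiles on bucketed latency data) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the bucket bounds, the sample latencies, and the linear-interpolation estimator (similar in spirit to Prometheus-style histogram quantiles). The log does not say how the dashboard actually computes its values.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds (not taken from the log).
BOUNDS = [0.1, 0.25, 0.5, 1.0, 5.0]


def to_cumulative(samples, upper_bounds):
    """Count how many samples fall at or below each bucket upper bound."""
    return [sum(1 for s in samples if s <= ub) for ub in upper_bounds]


def bucket_quantile(q, upper_bounds, cumulative_counts):
    """Estimate the q-quantile from cumulative bucket counts.

    Finds the bucket that contains the target rank and interpolates
    linearly inside it, so the result is only as precise as the bucket
    width allows.
    """
    total = cumulative_counts[-1]
    if total == 0:
        raise ValueError("no observations")
    rank = q * total
    i = bisect_left(cumulative_counts, rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    prev = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - prev
    if in_bucket == 0:
        return upper_bounds[i]
    return lower + (upper_bounds[i] - lower) * (rank - prev) / in_bucket


def mean(samples):
    return sum(samples) / len(samples)


# Made-up latencies: same bulk of requests, but "tail" has 8 very slow ones.
baseline = [0.05] * 50 + [0.2] * 30 + [0.4] * 15 + [0.8] * 5
tail = [0.05] * 50 + [0.2] * 30 + [0.4] * 12 + [4.0] * 8

if __name__ == "__main__":
    for name, samples in (("baseline", baseline), ("tail", tail)):
        cum = to_cumulative(samples, BOUNDS)
        print(
            name,
            "mean=%.3f" % mean(samples),
            "p50=%.3f" % bucket_quantile(0.50, BOUNDS, cum),
            "p75=%.3f" % bucket_quantile(0.75, BOUNDS, cum),
            "p95=%.3f" % bucket_quantile(0.95, BOUNDS, cum),
        )
```

With this made-up data the mean and the 95th percentile estimate move when the slow requests are added, while the 50th and 75th percentile estimates stay put, which is the "long tail" pattern described in the chat. The estimated median (0.1) also differs from the true sample median (0.125), showing how bucket width limits accuracy.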
[14:44:22] <wikibugs>	 (03PS1) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[14:44:24] <wikibugs>	 (03PS1) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[14:44:45] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: otrs: Switch from X-Real-IP to X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/552514
[14:44:47] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Switch from X-Real-IP to X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/552515
[14:47:23] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "I guess I need a task, and this was done with a git grep -l | xargs sed incantation, so some review required. I also maybe wrong at the ap" [puppet] - 10https://gerrit.wikimedia.org/r/552515 (owner: 10Alexandros Kosiaris)
[14:47:30] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512 (owner: 10Jbond)
[14:47:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924) (owner: 10Jbond)
[14:48:34] <_joe_>	 !log uploaded envoyproxy 1.12.1 to {buster,stretch} T237235
[14:48:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:41] <stashbot>	 T237235: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235
[14:49:48] <_joe_>	 !log disabling puppet on restbase2018, testing envoy upgrade T238050
[14:49:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:53] <stashbot>	 T238050: envoy overwrites the server header - https://phabricator.wikimedia.org/T238050
[14:50:39] <wikibugs>	 (03PS1) 10Muehlenhoff: Add image tracking support [software/debmonitor] - 10https://gerrit.wikimedia.org/r/552517 (https://phabricator.wikimedia.org/T237978)
[14:52:41] <wikibugs>	 10Operations, 10RESTBase, 10Traffic: envoy overwrites the server header - https://phabricator.wikimedia.org/T238050 (10Joe) Confirmed the upgrade fixes the Server: header output:  ` restbase2018:~$ curl -k https://restbase2018:7443/de.wikipedia.org/v1/page/references/Der_Junge_mit_dem_gro%C3%9Fen_schwarzen_H...
[14:53:31] <wikibugs>	 10Operations, 10RESTBase, 10Traffic: envoy overwrites the server header - https://phabricator.wikimedia.org/T238050 (10Joe) @Vgutierrez I think you can just upgrade envoy across the fleet when you feel confident enough.
[14:55:02] <wikibugs>	 (03PS2) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[14:55:23] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:58:02] <wikibugs>	 (03CR) 10Effie Mouzeli: "We are missing a bit of context here. Can you please elaborate or create a task ?" [puppet] - 10https://gerrit.wikimedia.org/r/552515 (owner: 10Alexandros Kosiaris)
[14:58:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512 (owner: 10Jbond)
[15:00:25] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: alert on low job availability [puppet] - 10https://gerrit.wikimedia.org/r/552521 (https://phabricator.wikimedia.org/T187708)
[15:00:50] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review: prometheus-pdns-exporter log noise about unexpected metrics - https://phabricator.wikimedia.org/T227411 (10Andrew) >>! In T227411#5683732, @MoritzMuehlenhoff wrote: > @Andrew : I created an (untested) patch which should fix this, can you take it from here?...
[15:01:20] <wikibugs>	 (03PS3) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:02:29] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM, this is equivalent to the notrack parameter of ferm::service (which in turn relies on the NO_TRACK definition from modules/ferm/file" [puppet] - 10https://gerrit.wikimedia.org/r/552506 (https://phabricator.wikimedia.org/T98006) (owner: 10BBlack)
[15:04:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512 (owner: 10Jbond)
[15:06:24] <wikibugs>	 (03PS2) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[15:08:00] <wikibugs>	 (03PS4) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:09:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924) (owner: 10Jbond)
[15:10:10] <wikibugs>	 10Operations: Integrate Buster 10.2 point update - https://phabricator.wikimedia.org/T238519 (10MoritzMuehlenhoff)
[15:11:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512 (owner: 10Jbond)
[15:12:08] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] prometheus-bacula-exporter: Restart service on code change [puppet] - 10https://gerrit.wikimedia.org/r/552491 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[15:12:58] <wikibugs>	 (03PS5) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:14:01] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:16:47] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.5/extensions/Wikibase/repo/: Stop outputting anything in case of 304 responses in Special:EntityData (T238901) (duration: 00m 57s)
[15:16:49] <wikibugs>	 (03PS6) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:16:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:53] <stashbot>	 T238901: Wikibase's Special:EntityData should not emit when responding with HTTP code 304 - https://phabricator.wikimedia.org/T238901
[15:19:21] <wikibugs>	 (03PS7) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:27:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "I am as well, hence the lack of a task for now. I am still trying to fully figure out if we deprecated X-Real-IP or not" [puppet] - 10https://gerrit.wikimedia.org/r/552515 (owner: 10Alexandros Kosiaris)
[15:30:10] <wikibugs>	 (03PS8) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:33:41] <wikibugs>	 (03PS9) 10Jbond: profile::idp::client::httpd: refactor [puppet] - 10https://gerrit.wikimedia.org/r/552512
[15:36:10] <wikibugs>	 (03PS3) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[15:36:11] <wikibugs>	 10Operations, 10Discovery-Search, 10SRE-Access-Requests: Allow analytics-search-users members to sudo as the airflow user - https://phabricator.wikimedia.org/T238905 (10Ottomata) Since this instance is maintained by the search team, I think re-using analytics-search-users makes sense to me.  We can re-evalua...
[15:37:14] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10Ottomata) Could we make the cergen script itself modify the permissions after it creates the files? It won't ensure things are right...
[15:37:45] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:38:04] <wikibugs>	 (03PS4) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[15:39:25] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review: prometheus-pdns-exporter log noise about unexpected metrics - https://phabricator.wikimedia.org/T227411 (10Andrew) That patch seems to quiet the alerts; I'll see about building and deploying
[15:40:23] <XioNoX>	 !log renumber AS17639 sessions in eqsin
[15:40:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:40] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Add support for PDNS 4 [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552467 (https://phabricator.wikimedia.org/T227411) (owner: 10Muehlenhoff)
[15:41:05] <wikibugs>	 (03PS5) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[15:43:49] <wikibugs>	 (03PS1) 10Andrew Bogott: Bump changelog for pdns4 support [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552531 (https://phabricator.wikimedia.org/T227411)
[15:43:55] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:45:35] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Bump changelog for pdns4 support [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552531 (https://phabricator.wikimedia.org/T227411) (owner: 10Andrew Bogott)
[15:46:32] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 4.001 ge 4 Ayounsi still https://phabricator.wikimedia.org/T238018 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[15:46:45] <wikibugs>	 (03CR) 10Muehlenhoff: Bump changelog for pdns4 support (031 comment) [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552531 (https://phabricator.wikimedia.org/T227411) (owner: 10Andrew Bogott)
[15:47:55] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:51:20] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey: mcrouter does not remove a memcached shard from consistent hashing when timeouts happen - https://phabricator.wikimedia.org/T208934 (10elukey)
[15:52:01] <wikibugs>	 (03PS1) 10Andrew Bogott: reformat changelog line [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552535
[15:52:21] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] reformat changelog line [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/552535 (owner: 10Andrew Bogott)
[15:52:59] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:53:00] <wikibugs>	 (03PS6) 10Jbond: puppetboard: Add cas authentication [puppet] - 10https://gerrit.wikimedia.org/r/552513 (https://phabricator.wikimedia.org/T238924)
[15:53:41] <effie>	 ^ looking at mw requests latency
[15:58:05] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:03:52] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review: prometheus-pdns-exporter log noise about unexpected metrics - https://phabricator.wikimedia.org/T227411 (10Andrew) 05Open→03Resolved a:03Andrew done -- logs are nice and quiet now.
[16:04:31] <wikibugs>	 (03PS1) 10Jbond: cas-puppetboard.wikimedia.org: add new cas protected puppetboard site [puppet] - 10https://gerrit.wikimedia.org/r/552536 (https://phabricator.wikimedia.org/T238924)
[16:06:33] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:07:21] <wikibugs>	 (03CR) 10ArielGlenn: "Someone else needs to make the call on which header to use; if X-Client-IP turns out to be the choice, the dumps-related changes are good t" [puppet] - 10https://gerrit.wikimedia.org/r/552515 (owner: 10Alexandros Kosiaris)
[16:09:15] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:13:19] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:15:01] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:20:07] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:22:06] <shdubsh>	 !log clean tombstones on prometheus1003 - T238807
[16:22:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:11] <stashbot>	 T238807: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807
[16:22:49] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 59.84 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:23:04] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Spam from a non-registered email address coming non-moderated to a restricted mailing list - https://phabricator.wikimedia.org/T238871 (10Quiddity) 05Open→03Invalid You've got that address listed in the "List of non-member addresses whose postings should be automati...
[16:24:59] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgnt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552539
[16:25:13] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:25:22] <wikibugs>	 10Operations, 10DC-Ops, 10decommission, 10fundraising-tech-ops: decommission alnilam.frack.codfw.wmnet - https://phabricator.wikimedia.org/T238233 (10Papaul)
[16:25:35] <wikibugs>	 10Operations, 10DC-Ops, 10decommission, 10fundraising-tech-ops: decommission alnilam.frack.codfw.wmnet - https://phabricator.wikimedia.org/T238233 (10Papaul) 05Open→03Resolved Complete
[16:27:53] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 75.14 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:28:10] <wikibugs>	 (03PS1) 10Gehel: wdqs: remove the ban of Guzzle user agent. [puppet] - 10https://gerrit.wikimedia.org/r/552540
[16:28:37] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:30:56] <Amir1>	 I'm going to deploy the security thingy
[16:32:08] <Amir1>	 It's just postponed due to foooooooood
[16:33:43] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:36:33] <wikibugs>	 (03Abandoned) 10Papaul: DNS: Remove mgnt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552539 (owner: 10Papaul)
[16:37:26] <wikibugs>	 10Operations, 10Dumps-Generation, 10Patch-For-Review: Migrate dumpsdata hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224563 (10ArielGlenn) Given that the wikidata entity dumps are still finishing up the truthy gz files, and after that there will be bz2 recompression and the Lexemes, I'll be m...
[16:42:11] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:45:37] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:47:11] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552542
[16:53:34] <wikibugs>	 10Operations, 10observability: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807 (10colewhite) 05Open→03Resolved
[16:55:49] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:58:57] <Lucas_WMDE>	 FYI, someone’s asking for an IP exception config change in #wikimedia-tech (for an event starting in an hour)
[16:59:10] <Lucas_WMDE>	 I’m not going to deploy that on a Friday evening, but if anyone else feels sufficiently adventurous…
[16:59:41] <wikibugs>	 (03PS2) 10Phamhi: labmon: add compatibility in buster [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585)
[17:02:48] <wikibugs>	 (03PS3) 10Phamhi: labmon: add compatibility in buster [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585)
[17:04:17] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:05:52] <wikibugs>	 (03CR) 10Phamhi: labmon: add compatibility in buster (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585) (owner: 10Phamhi)
[17:07:43] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:07:45] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10RobH) Please note this has had all the RAM/riser/cards reseated and continues to pass all Dell ePSA tests.  @bblack: With the reseating of everything, shall we reimage and try using this sys...
[17:07:49] <wikibugs>	 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10Patch-For-Review, 10cloud-services-team (Kanban): Migrate labmon* to Stretch (or Buster, better yet!) - https://phabricator.wikimedia.org/T224585 (10Phamhi) As per suggestion, I have created different python files (no longer template) for different r...
[17:08:37] <wikibugs>	 10Operations, 10observability: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807 (10colewhite) 05Resolved→03Open
[17:09:13] <shdubsh>	 !log restart prometheus on prometheus1004 - T238807
[17:09:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:18] <stashbot>	 T238807: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807
[17:15:06] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Leighanna Mixter - https://phabricator.wikimedia.org/T238933 (10Slaporte)
[17:17:25] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[17:17:28] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts: ` ['cp3056.esams.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201911221...
[17:17:41] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp3056.esams.wmnet'] `  Of which those **FAILED**: ` ['cp3056.esams.wmnet'] `
[17:18:01] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts: ` ['cp3056.esams.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201911221...
[17:19:23] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10BBlack) a:05RobH→03BBlack Attempting reimage (see above).  If it fails like before, it won't get very far (certainly not into production use).
[17:21:19] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:23:17] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10CDanis) >>! In T238833#5684837, @Ottomata wrote: > Could we make the cergen script itself modify the permissions after it creates th...
[17:24:43] <wikibugs>	 (03PS2) 10BBlack: dnsrecursor: modernize notrack for udp:53 [puppet] - 10https://gerrit.wikimedia.org/r/552506 (https://phabricator.wikimedia.org/T98006)
[17:27:55] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10Ottomata) Hm, would running cergen --generate with --force be enough?  `     -F --force                      If not provied --force,...
[17:28:09] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:28:16] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10Ottomata) I'll try to find some time soon to make cergen chmod after creating files.
[17:30:33] <shdubsh>	 !log clean tombstones on prometheus1004 - T238807
[17:30:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:39] <stashbot>	 T238807: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807
[17:31:15] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10Ottomata) I will also update that doc for --force who wrote that!? ò_ô
[17:32:01] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "Looks better than before!" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585) (owner: 10Phamhi)
[17:32:20] <Amir1>	 back to deploying the security thingy
[17:34:10] <wikibugs>	 (03PS1) 10BBlack: late_command: remove cpNNNN mkfs stuff [puppet] - 10https://gerrit.wikimedia.org/r/552547 (https://phabricator.wikimedia.org/T227432)
[17:34:49] <logmsgbot>	 !log bblack@cumin1001 START - Cookbook sre.hosts.downtime
[17:34:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:59] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:36:50] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] late_command: remove cpNNNN mkfs stuff [puppet] - 10https://gerrit.wikimedia.org/r/552547 (https://phabricator.wikimedia.org/T227432) (owner: 10BBlack)
[17:36:54] <logmsgbot>	 !log bblack@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[17:36:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:33] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: new k8s: specify default backend for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/550347 (https://phabricator.wikimedia.org/T234032)
[17:39:37] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[17:39:42] <wikibugs>	 (03PS1) 10BBlack: cp3056: re-enable cache::nodes entry [puppet] - 10https://gerrit.wikimedia.org/r/552548 (https://phabricator.wikimedia.org/T236497)
[17:40:12] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "I think we have the service in toolsbeta? I didn't check, but I seem to remember Bryan did that. If so we can test it there :)" [puppet] - 10https://gerrit.wikimedia.org/r/550347 (https://phabricator.wikimedia.org/T234032) (owner: 10Arturo Borrero Gonzalez)
[17:40:49] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM overall, see nit inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585) (owner: 10Phamhi)
[17:42:00] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] cp3056: re-enable cache::nodes entry [puppet] - 10https://gerrit.wikimedia.org/r/552548 (https://phabricator.wikimedia.org/T236497) (owner: 10BBlack)
[17:43:31] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:45:55] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/550347 (https://phabricator.wikimedia.org/T234032) (owner: 10Arturo Borrero Gonzalez)
[17:46:15] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: specify default backend for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/550347 (https://phabricator.wikimedia.org/T234032) (owner: 10Arturo Borrero Gonzalez)
[17:49:20] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic, 10Patch-For-Review: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp3056.esams.wmnet'] `  and were **ALL** successful.
[17:51:19] <wikibugs>	 10Operations, 10serviceops: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (10Dzahn) That's true, just had one last little todo here for the one in codfw. Doing that now.
[17:52:18] <_joe_>	 !log repooling restbase2018
[17:52:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:31] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10observability, 10service-runner, 10Core Platform Team (Needs Cleaning - Services Operations): Move graphoid logging to new logging pipeline - https://phabricator.wikimedia.org/T219923 (10Pchelolo) 05Stalled→03Declined Graphoid is likely going away, so we shouldn'...
[17:52:33] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10observability, 10service-runner, and 2 others: Move service-runner to new logging infrastructure - https://phabricator.wikimedia.org/T211125 (10Pchelolo)
[17:52:34] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[17:53:49] <wikibugs>	 10Operations, 10CX-cxserver, 10Citoid, 10RESTBase, and 3 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 (10Pchelolo) Nothing to do here for the core platform team anymore.
[17:54:40] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops, 10Traffic, 10Patch-For-Review: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (10BBlack) So far so good - it has completed all the initial puppetization stuff, which is much further than it got before.  Given it's Friday and this node has a fishy hi...
[17:55:07] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "I'll be around for a bit, so please merge soon in case of issues :)" [puppet] - 10https://gerrit.wikimedia.org/r/550466 (https://phabricator.wikimedia.org/T234229) (owner: 10Elukey)
[17:56:00] <wikibugs>	 (03PS4) 10Elukey: role::dumps::distribution::server: add kerberos [puppet] - 10https://gerrit.wikimedia.org/r/550466 (https://phabricator.wikimedia.org/T234229)
[17:56:48] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:56:51] <Amir1>	 The security thingy is over now
[17:57:04] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[17:57:33] <wikibugs>	 10Operations, 10Traffic, 10fixcopyright.wikimedia.org, 10Core Platform Team Workboards (Clinic Duty Team), and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF)
[17:58:19] <wikibugs>	 10Operations, 10Analytics, 10ChangeProp, 10Core Platform Team, and 2 others: Consider the possibility of separating ChangeProp and JobQueue on Kafka level - https://phabricator.wikimedia.org/T199431 (10Pchelolo) It's still a viable idea, but I don't think we have the capacity to work on it now. Icebox.
[17:59:14] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:00:27] <wikibugs>	 10Operations, 10observability: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807 (10colewhite) 05Open→03Resolved
[18:00:50] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] role::dumps::distribution::server: add analytics refinery [puppet] - 10https://gerrit.wikimedia.org/r/550816 (https://phabricator.wikimedia.org/T234229) (owner: 10Elukey)
[18:01:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:02:06] <wikibugs>	 10Operations, 10Discovery-Search, 10SRE-Access-Requests: Allow analytics-search-users members to sudo as the airflow user - https://phabricator.wikimedia.org/T238905 (10elukey) >>! In T238905#5684836, @Ottomata wrote: > Since this instance is maintained by the search team, I think re-using analytics-search-u...
[18:02:57] <shdubsh>	 !log restore prometheus services default settings - T238807
[18:03:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:02] <stashbot>	 T238807: Clean up ORES metrics - https://phabricator.wikimedia.org/T238807
[18:03:47] <wikibugs>	 10Operations, 10serviceops: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (10jijiki)
[18:04:20] <wikibugs>	 (03PS1) 10Jforrester: Delete fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552549 (https://phabricator.wikimedia.org/T238803)
[18:04:22] <wikibugs>	 (03PS1) 10Jforrester: Drop ability to load SkinPerPage, EUCopyrightCampaign, and EUCopyrightCampaignSkin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552550 (https://phabricator.wikimedia.org/T238803)
[18:09:28] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[18:10:10] <wikibugs>	 (03PS1) 10Dzahn: wikimania_scholarships app: use codfw database when in codfw [puppet] - 10https://gerrit.wikimedia.org/r/552551 (https://phabricator.wikimedia.org/T224247)
[18:10:16] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:12:31] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004 - https://phabricator.wikimedia.org/T236321 (10CGlenn) I checked the SRE Clinic Duty. Should I assign this ticket to most recent person on the rotation roster?
[18:12:50] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1:9900 job=prometheus site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[18:15:59] <wikibugs>	 (03PS1) 10Dzahn: iegreview app: use codfw database when in codfw [puppet] - 10https://gerrit.wikimedia.org/r/552552 (https://phabricator.wikimedia.org/T224247)
[18:17:04] <wikibugs>	 (03CR) 10Cicalese: "Thank you for working on this! SkinPerPage already was available prior to the creation of the fixcopyright wiki. It was an existing extens" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552550 (https://phabricator.wikimedia.org/T238803) (owner: 10Jforrester)
[18:18:54] <wikibugs>	 (03CR) 10Jforrester: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552550 (https://phabricator.wikimedia.org/T238803) (owner: 10Jforrester)
[18:19:15] <wikibugs>	 (03PS1) 10Dzahn: racktables: use codfw database when in codfw [puppet] - 10https://gerrit.wikimedia.org/r/552553 (https://phabricator.wikimedia.org/T224247)
[18:19:49] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1:9900 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[18:23:11] <apergos>	 elukey: still around? did you see brooke's +1's on the kerb patches (plus the comment: "merge soon"?)
[18:23:26] <apergos>	 although hm then that's merge fancy stuff on a friday... eh well
[18:28:01] <wikibugs>	 (03CR) 10Dzahn: "It is following how it was done for other services on https://phabricator.wikimedia.org/T210411 to stay consistent. Then you can change th" [dns] - 10https://gerrit.wikimedia.org/r/551938 (owner: 10Dzahn)
[18:29:26] <elukey>	 apergos: o/ - yes we synced and decided to postpone to next week, changes are harmless but friday etc..
[18:29:35] <elukey>	 thanks for the ping :)
[18:29:51] <wikibugs>	 (03PS4) 10Dzahn: ATS/varnish: rename thorium director to analytics-web [puppet] - 10https://gerrit.wikimedia.org/r/551939
[18:30:04] <wikibugs>	 (03CR) 10Dzahn: ATS/varnish: rename thorium director to analytics-web (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/551939 (owner: 10Dzahn)
[18:34:33] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[18:35:13] <wikibugs>	 (03PS6) 10Dzahn: monitoring: add data types to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/551882
[18:38:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] monitoring: add data types to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/551882 (owner: 10Dzahn)
[18:38:57] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[18:39:03] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[18:40:29] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:40:35] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[18:42:04] <wikibugs>	 (03CR) 10Cicalese: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552550 (https://phabricator.wikimedia.org/T238803) (owner: 10Jforrester)
[18:42:58] <wikibugs>	 (03PS7) 10Dzahn: monitoring: add data types to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/551882
[18:44:01] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[18:45:01] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[18:45:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] monitoring: add data types to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/551882 (owner: 10Dzahn)
[18:46:08] <wikibugs>	 10Operations, 10Traffic, 10serviceops: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (10Dzahn)
[18:47:25] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create OpenGLAM mailing list - https://phabricator.wikimedia.org/T238759 (10crusnov) 05Open→03Resolved a:03crusnov Hello! I have created the mailing list as requested.  Listinfo: https://lists.wikimedia.org/mailman/listinfo/open-glam List admin page: https://lists...
[18:51:42] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (10Volans) >>! In T238833#5685019, @Ottomata wrote: > I'll try to find some time soon to make cergen chmod after creating files.  FYI W...
[18:52:04] <wikibugs>	 10Operations, 10Product-Analytics, 10SRE-Access-Requests: Search Console access for he.wikisource.org - https://phabricator.wikimedia.org/T238090 (10crusnov) a:05crusnov→03None Giving this to the next person on clinic duty. We still need to know the time limits and I believe some other information to com...
[19:06:25] <librenms-wmf>	 04Critical Alert for device asw2-esams.mgmt.esams.wmnet - Primary outbound port utilisation over 80%
[19:09:13] * Reedy squints
[19:14:51] <Nemo_bis>	 :/
[19:16:26] <librenms-wmf>	 04̶C̶r̶i̶t̶i̶c̶a̶l Device asw2-esams.mgmt.esams.wmnet recovered from Primary outbound port utilisation over 80%
[19:16:36] <wikibugs>	 (03PS2) 10CRusnov: admin: add cglenn to researchers and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:17:00] <wikibugs>	 (03CR) 10Phamhi: labmon: add compatibility in buster (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585) (owner: 10Phamhi)
[19:17:08] <wikibugs>	 (03CR) 10CRusnov: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:18:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: add cglenn to researchers and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:22:13] <wikibugs>	 (03PS2) 10Jforrester: Remove wgTorLoadNodes as it was removed in b5ccbee in 1.340-wmf.15+ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/550055 (owner: 10Reedy)
[19:22:23] <wikibugs>	 (03PS3) 10CRusnov: admin: add cglenn to researchers and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:22:34] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] "Good to deploy whenever." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/550055 (owner: 10Reedy)
[19:22:51] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] "Good to deploy whenever." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552361 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712)
[19:24:33] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "key and groups match the information on the ticket" [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:30:13] <wikibugs>	 10Operations, 10Traffic, 10serviceops: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (10CDanis) At ~18:36 there was another spike in long-tail latency, but then, latency seemed to return to 'normal': https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red...
[19:40:18] <wikibugs>	 10Operations, 10Cloud-VPS, 10Traffic, 10HTTPS, 10cloud-services-team (Kanban): add a https-only option to dynamicproxy - https://phabricator.wikimedia.org/T120486 (10bd808) >>! In T120486#5680210, @Krenair wrote: > done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/482142 ?  My guess is that @D...
[19:41:14] <wikibugs>	 10Operations, 10serviceops: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (10ayounsi) 05Open→03Resolved Confirmed!
[19:41:22] <wikibugs>	 10Operations, 10Phabricator, 10Traffic, 10serviceops: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (10ayounsi)
[19:44:07] <wikibugs>	 10Operations, 10Cloud-VPS, 10Traffic, 10HTTPS, 10cloud-services-team (Kanban): add a https-only option to dynamicproxy - https://phabricator.wikimedia.org/T120486 (10Dzahn) Yea, that's true. It's been a long time since i wrote that and i had a per-proxy feature in mind. I am ok with closing this ticket i...
[19:45:21] <wikibugs>	 (03CR) 10CRusnov: [C: 03+2] admin: add cglenn to researchers and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/545974 (https://phabricator.wikimedia.org/T236321) (owner: 10Cwhite)
[19:55:53] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting Access to Stat1004, Stat1006, Stat1007, notebook1003 and notebook1004 - https://phabricator.wikimedia.org/T236321 (10crusnov) 05Open→03Resolved Hello I have added the key above to the patch and merged it. This means that shortly (within...
[20:03:03] <wikibugs>	 10Operations, 10GLOW, 10SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T238868 (10crusnov) p:05Triage→03Normal
[20:07:42] <wikibugs>	 10Operations, 10GLOW, 10SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T238868 (10crusnov) According to the procedure for this request, end-dates for rechecking access are needed. Do you have an end-date in mind? Otherwise we should be able to a...
[20:09:11] <wikibugs>	 10Operations, 10Discovery-Search, 10SRE-Access-Requests: Allow analytics-search-users members to sudo as the airflow user - https://phabricator.wikimedia.org/T238905 (10crusnov) p:05Triage→03Normal
[20:15:54] <wikibugs>	 (03PS1) 10Daniel Kinzler: Ping XML dump schema version at 0.10 for now. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552565 (https://phabricator.wikimedia.org/T238921)
[20:16:34] <wikibugs>	 (03PS2) 10Daniel Kinzler: Pin XML dump schema version at 0.10 for now. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552565 (https://phabricator.wikimedia.org/T238921)
[20:41:32] <wikibugs>	 (03PS4) 10Phamhi: labmon: add compatibility in buster [puppet] - 10https://gerrit.wikimedia.org/r/552107 (https://phabricator.wikimedia.org/T224585)
[20:45:02] <wikibugs>	 (03PS1) 10Herron: add forwad/reverse entries for logstash 7 collector hosts [dns] - 10https://gerrit.wikimedia.org/r/552567 (https://phabricator.wikimedia.org/T234854)
[20:45:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] add forwad/reverse entries for logstash 7 collector hosts [dns] - 10https://gerrit.wikimedia.org/r/552567 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron)
[20:49:50] <wikibugs>	 (03PS2) 10Herron: add forwad/reverse entries for logstash 7 collector hosts [dns] - 10https://gerrit.wikimedia.org/r/552567 (https://phabricator.wikimedia.org/T234854)
[20:58:01] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10wiki_willy)
[20:58:22] <wikibugs>	 (03PS1) 10RLazarus: pristine-tar data for poolcounter-prometheus-exporter_0.0~git20181011.d5cca4f.orig.tar.xz [debs/poolcounter-prometheus-exporter] (pristine-tar) - 10https://gerrit.wikimedia.org/r/552568
[20:59:16] <wikibugs>	 10Operations, 10ops-eqiad: (Need By 8/15/19) rack/setup/install ms-be105[7-9].eqiad.wmnet - https://phabricator.wikimedia.org/T237438 (10wiki_willy)
[20:59:52] <wikibugs>	 (03PS1) 10RLazarus: New upstream version 0.0~git20181011.d5cca4f [debs/poolcounter-prometheus-exporter] (upstream) - 10https://gerrit.wikimedia.org/r/552569
[20:59:54] <wikibugs>	 (03PS1) 10RLazarus: Re-adding vendor directory [debs/poolcounter-prometheus-exporter] (upstream) - 10https://gerrit.wikimedia.org/r/552570
[20:59:56] <wikibugs>	 10Operations, 10ops-eqiad: (No Need By Date Provided) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T235685 (10wiki_willy)
[21:00:36] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (No Need By Date Provided) rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10wiki_willy)
[21:00:44] <wikibugs>	 (03PS1) 10RLazarus: Ignore quilt dir .pc via .gitignore [debs/poolcounter-prometheus-exporter] - 10https://gerrit.wikimedia.org/r/552571
[21:00:46] <wikibugs>	 (03PS1) 10RLazarus: Initial debianization [debs/poolcounter-prometheus-exporter] - 10https://gerrit.wikimedia.org/r/552572
[21:01:39] <wikibugs>	 10Operations, 10ops-eqiad: (No Need By Date Provided) replace scs-a8-eqiad - https://phabricator.wikimedia.org/T228919 (10wiki_willy)
[21:02:12] <wikibugs>	 10Operations, 10Parsoid-PHP, 10serviceops, 10Patch-For-Review: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10ssastry) @Joe @Dzahn can that memory bump patch ( https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/548944 )  be deployed next week? Or, are we waiting...
[21:02:35] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] pristine-tar data for poolcounter-prometheus-exporter_0.0~git20181011.d5cca4f.orig.tar.xz [debs/poolcounter-prometheus-exporter] (pristine-tar) - 10https://gerrit.wikimedia.org/r/552568 (owner: 10RLazarus)
[21:03:25] <wikibugs>	 (03CR) 10RLazarus: [V: 03+2 C: 03+2] pristine-tar data for poolcounter-prometheus-exporter_0.0~git20181011.d5cca4f.orig.tar.xz [debs/poolcounter-prometheus-exporter] (pristine-tar) - 10https://gerrit.wikimedia.org/r/552568 (owner: 10RLazarus)
[21:06:05] <wikibugs>	 (03CR) 10Herron: [C: 03+2] add forwad/reverse entries for logstash 7 collector hosts [dns] - 10https://gerrit.wikimedia.org/r/552567 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron)
[21:06:10] <wikibugs>	 (03PS1) 10Andrew Bogott: Remove puppetpanel.pp -- unused [puppet] - 10https://gerrit.wikimedia.org/r/552574
[21:07:15] <wikibugs>	 10Operations, 10GLOW, 10SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T238868 (10Iflorez) Hello @crusnov, Thank you for your feedback and help to get access.   >>! In T238868#5685490, @crusnov wrote: > According to the procedure for this reques...
[21:08:57] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Remove puppetpanel.pp -- unused [puppet] - 10https://gerrit.wikimedia.org/r/552574 (owner: 10Andrew Bogott)
[21:28:01] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: (No Need By Date Provided) rack/setup/install frban2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10wiki_willy)
[21:33:11] <wikibugs>	 (03PS2) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[21:36:09] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708) (owner: 10Andrew Bogott)
[21:43:16] <wikibugs>	 (03PS3) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[21:46:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708) (owner: 10Andrew Bogott)
[21:49:26] <wikibugs>	 (03PS4) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[21:50:59] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:51:43] <wikibugs>	 (03PS1) 10Reedy: Add webservices.picturae.com to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552579 (https://phabricator.wikimedia.org/T238955)
[21:52:43] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Add webservices.picturae.com to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552579 (https://phabricator.wikimedia.org/T238955) (owner: 10Reedy)
[21:53:35] <wikibugs>	 (03Merged) 10jenkins-bot: Add webservices.picturae.com to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/552579 (https://phabricator.wikimedia.org/T238955) (owner: 10Reedy)
[21:55:41] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T238955 (duration: 00m 53s)
[21:55:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:47] <stashbot>	 T238955: Please add webservices.picturae.com to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T238955
[21:55:53] <wikibugs>	 (03PS5) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[21:57:47] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:58:05] <wikibugs>	 (03PS6) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[22:00:27] <wikibugs>	 (03PS7) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[22:04:32] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10serviceops, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (10Dzahn)
[22:04:46] <wikibugs>	 (03PS8) 10Andrew Bogott: wmf_sink: delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[22:07:17] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10jijiki) @wiki_willy I will provide racking instructions on Monday for you, sorry we have delayed you this much.
[22:08:08] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10wiki_willy) Thanks @jijiki , much appreciated
[22:10:06] <wikibugs>	 10Operations, 10Traffic, 10serviceops: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (10jijiki)
[22:10:27] <wikibugs>	 (03PS9) 10Andrew Bogott: wmf_sink: Prepare to delete instance puppet config from git on instance deletion [puppet] - 10https://gerrit.wikimedia.org/r/552348 (https://phabricator.wikimedia.org/T238708)
[22:10:29] <wikibugs>	 (03PS1) 10Andrew Bogott: wmf_sink: remove instance-puppet git entries for deleted VMs [puppet] - 10https://gerrit.wikimedia.org/r/552583 (https://phabricator.wikimedia.org/T238708)
[22:10:48] <wikibugs>	 10Operations, 10Traffic, 10serviceops: Increased latency in appservers - 22 Nov 2019 - https://phabricator.wikimedia.org/T238939 (10jijiki)
[22:11:04] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10serviceops, 10Release-Engineering-Team (Development services): Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (10Dzahn)
[22:11:15] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10serviceops, 10Release-Engineering-Team (Development services): Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (10Dzahn) 05Open→03Resolved
[22:11:23] <wikibugs>	 10Operations, 10Phabricator, 10serviceops, 10Patch-For-Review, and 3 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (10Dzahn)
[22:11:46] <wikibugs>	 10Operations, 10hardware-requests, 10serviceops: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (10Dzahn) We will give this back in T238957.
[22:16:21] <wikibugs>	 10Operations, 10Phabricator, 10hardware-requests, 10serviceops, 10Release-Engineering-Team (Development services): The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent - https://phabricator.wikimedia.org/T232887 (10Dzahn) After further discussion with Mukunda a...
[22:16:34] <wikibugs>	 10Operations, 10Phabricator, 10hardware-requests, 10serviceops, 10Release-Engineering-Team (Development services): The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent - https://phabricator.wikimedia.org/T232887 (10Dzahn) 05Open→03Declined
[22:16:39] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10serviceops, 10Release-Engineering-Team (Development services): Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (10Dzahn)
[22:16:42] <wikibugs>	 (03CR) 10Jforrester: "Please don't write (or merge) patches that fail the requirements for gerrit changes. In particular, it should be impossible to use the sam" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546369 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712)
[22:23:23] <wikibugs>	 10Operations, 10GLOW, 10SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T238868 (10crusnov) Until september 2020 seems a reasonable timeframe (the docs say "typically aronud one year").   Listing all of the sites now would likely be easiest, yes,...
[22:29:01] <wikibugs>	 (03CR) 10Jforrester: "> Patch Set 2: -Code-Review" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542184 (https://phabricator.wikimedia.org/T235142) (owner: 10Jforrester)
[22:29:08] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] Drop HHVMRequestInit, never called [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542184 (https://phabricator.wikimedia.org/T235142) (owner: 10Jforrester)
[22:37:19] <wikibugs>	 (03PS1) 10Dzahn: phabricator/conftool: switch phab-vcs (git-ssh) service to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552589 (https://phabricator.wikimedia.org/T238956)
[22:39:31] <wikibugs>	 (03PS1) 10Dzahn: phabricator: switch "active server" from phab1003 to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552591 (https://phabricator.wikimedia.org/T238956)
[22:40:51] <wikibugs>	 (03PS1) 10Dzahn: phabricator: remove phab1003 from list of phab servers [puppet] - 10https://gerrit.wikimedia.org/r/552592 (https://phabricator.wikimedia.org/T238957)
[22:43:50] <wikibugs>	 (03PS1) 10Dzahn: dumps/phabricator: switch dumps host from phab1003 to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552593 (https://phabricator.wikimedia.org/T238956)
[22:44:11] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10Aklapper)
[22:45:00] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10Dzahn) a:03Dzahn
[22:45:56] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10Dzahn) @RStallman-legalteam Please let Max sign the volunteer NDA docs.
[22:58:27] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10Dzahn) @MaxSem Wanna sign L2 as well?
[23:00:33] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10crusnov) p:05Triage→03Normal
[23:01:00] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Conversion to volunteer NDA for MaxSem - https://phabricator.wikimedia.org/T238960 (10Krenair) As far as I know, no NDA is required for beta cluster access.
[23:07:22] <wikibugs>	 (03PS1) 10Dzahn: admins: add Max Semenik as ldap_only_admin [puppet] - 10https://gerrit.wikimedia.org/r/552594 (https://phabricator.wikimedia.org/T238960)
[23:08:59] <wikibugs>	 (03PS2) 10Dzahn: admins: add Max Semenik as ldap_only_admin [puppet] - 10https://gerrit.wikimedia.org/r/552594 (https://phabricator.wikimedia.org/T238960)
[23:13:14] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552542 (owner: 10Papaul)
[23:14:32] <wikibugs>	 (03PS1) 10Dzahn: varnish: switch phabricator backend to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552595 (https://phabricator.wikimedia.org/T238956)
[23:16:04] <wikibugs>	 (03PS1) 10Dzahn: phabricator: switch mail destination to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552597 (https://phabricator.wikimedia.org/T238956)
[23:16:33] <wikibugs>	 (03PS2) 10Papaul: DNS: Remove mgmt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552542
[23:16:51] <wikibugs>	 (03CR) 10Papaul: [V: 03+2 C: 03+2] DNS: Remove mgmt DNS for db2048 and db2061 [dns] - 10https://gerrit.wikimedia.org/r/552542 (owner: 10Papaul)
[23:17:57] <wikibugs>	 (03PS1) 10Dzahn: switch discovery record for phabricator to 1001 for ATS [dns] - 10https://gerrit.wikimedia.org/r/552598 (https://phabricator.wikimedia.org/T238956)
[23:18:26] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decommission db2061.codfw.wmnet - https://phabricator.wikimedia.org/T238526 (10Papaul)
[23:18:41] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decommission db2061.codfw.wmnet - https://phabricator.wikimedia.org/T238526 (10Papaul) 05Open→03Resolved Complete
[23:18:44] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2070 - https://phabricator.wikimedia.org/T228258 (10Papaul)
[23:19:09] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decommission db2048.codfw.wmnet - https://phabricator.wikimedia.org/T237913 (10Papaul)
[23:19:32] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decommission db2048.codfw.wmnet - https://phabricator.wikimedia.org/T237913 (10Papaul) 05Open→03Resolved Complete
[23:19:34] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2070 - https://phabricator.wikimedia.org/T228258 (10Papaul)
[23:22:25] <wikibugs>	 (03PS2) 10Dzahn: admin: Remove myself (MaxSem) [puppet] - 10https://gerrit.wikimedia.org/r/552389 (https://phabricator.wikimedia.org/T238960) (owner: 10MaxSem)
[23:22:50] <wikibugs>	 (03PS1) 10Dzahn: remove service IPs and IPv6 for phab1003 [dns] - 10https://gerrit.wikimedia.org/r/552599 (https://phabricator.wikimedia.org/T238957)
[23:24:44] <wikibugs>	 10Operations, 10Parsoid-PHP, 10serviceops, 10Patch-For-Review: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10Dzahn) For my part it is blocked on first merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/546448 because it uses that.
[23:28:16] <wikibugs>	 (03PS1) 10Dzahn: remove production IPs for phab1003 [dns] - 10https://gerrit.wikimedia.org/r/552601 (https://phabricator.wikimedia.org/T238957)
[23:30:23] <wikibugs>	 (03PS1) 10Dzahn: site: turn phab1003 into a spare::system [puppet] - 10https://gerrit.wikimedia.org/r/552603 (https://phabricator.wikimedia.org/T238957)
[23:33:04] <wikibugs>	 (03PS1) 10Dzahn: mtail: stop using phab1003 for tests, use phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552604 (https://phabricator.wikimedia.org/T238957)
[23:36:10] <wikibugs>	 (03PS1) 10Dzahn: mariadb: remove grants for users on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/552607 (https://phabricator.wikimedia.org/T238957)
[23:36:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mtail: stop using phab1003 for tests, use phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552604 (https://phabricator.wikimedia.org/T238957) (owner: 10Dzahn)
[23:37:31] <wikibugs>	 (03PS2) 10Dzahn: dumps/phabricator: switch dumps host from phab1003 to phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/552593 (https://phabricator.wikimedia.org/T238956)
[23:48:52] <wikibugs>	 (03PS1) 10Dzahn: install_server: remove phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/552609 (https://phabricator.wikimedia.org/T238957)
[23:56:45] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is CRITICAL: 56.61 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:58:27] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is OK: (C)60 le (W)70 le 71.12 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:59:12] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Leighanna Mixter - https://phabricator.wikimedia.org/T238933 (10Dzahn) 05Open→03Resolved a:03Dzahn Hi @Slaporte,  this is already the case.   LDAP user "lmixter" is already a member of the WMF group.  Let us know if something specific...