[00:32:36] <wikibugs>	 (PS1) Mstyles: Add new helm chart for rdf-streaming-updater [deployment-charts] - https://gerrit.wikimedia.org/r/640571 (https://phabricator.wikimedia.org/T265526)
[00:32:43] <ryankemper>	 !log About to begin wdqs deploy; before-deploy tests on canary `wdqs1003` are passing
[00:32:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:49] <logmsgbot>	 !log ryankemper@deploy1001 Started deploy [wdqs/wdqs@03219df]: 0.3.55
[00:34:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:26] <ryankemper>	 !log Following deploy to canary `wdqs1003`, automated tests are passing as is a manual test of an example query. Proceeding...
[00:38:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:00] <ryankemper>	 !log T222669 [Elasticsearch reindex] Began long-running reindex of cirrus elasticsearch for `codfw`, `eqiad`, and `cloudelastic`. 3 tmux sessions on `ryankemper@mwmaint1002`: `reindex_eqiad`, `reindex_codfw`, `reindex_cloudelastic`
[00:46:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:07] <stashbot>	 T222669: Normalize homoglyphs in mixed-script tokens when possible - https://phabricator.wikimedia.org/T222669
[00:46:13] <logmsgbot>	 !log ryankemper@deploy1001 Finished deploy [wdqs/wdqs@03219df]: 0.3.55 (duration: 11m 24s)
[00:46:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:00] <ryankemper>	 !log [wdqs deploy] following deploy, example query succeeds on `query.wikidata.org`, proceeding to post deploy steps
[00:47:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:30] <ryankemper>	 !log Restarted `wdqs-updater` simultaneously across all wdqs hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
[00:47:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:52] <ryankemper>	 !log Restarted `wdqs-categories` across wdqs test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
[00:47:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:12] <ryankemper>	 !log Restarting `wdqs-categories` one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
[00:48:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
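The one-host-at-a-time restart logged above (`-b 1` with `depool && sleep 60 && ... && pool`) can be pictured with a small sketch. It is only an illustration of the batching and `&&` short-circuit logic, not how cumin itself is implemented; `rolling_run` and the `run` callback are hypothetical names, and the sketch processes the hosts within a batch sequentially for simplicity.

```python
from typing import Callable, List, Sequence

# Steps copied from the logged command:
# depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool
STEPS = ["depool", "sleep 60", "systemctl restart wdqs-categories", "sleep 30", "pool"]


def rolling_run(
    hosts: Sequence[str],
    steps: Sequence[str],
    run: Callable[[str, str], bool],
    batch_size: int = 1,
) -> List[str]:
    """Run `steps` on `hosts` batch by batch and return the hosts where every step succeeded.

    `run(host, step)` is a hypothetical stand-in for remote execution;
    it returns True when the step succeeded on that host.
    """
    succeeded: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            # all() stops at the first failing step, like the shell's `&&` chain,
            # so a host whose restart fails is never repooled by this sketch.
            if all(run(host, step) for step in steps):
                succeeded.append(host)
    return succeeded
```

With `batch_size=1`, at most one host is depooled at any moment, which is the point of the `-b 1` flag in the logged command.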
[00:53:35] <wikibugs>	 (CR) Krinkle: [C: +1] arclamp: Use Python 3 [puppet] - https://gerrit.wikimedia.org/r/639885 (https://phabricator.wikimedia.org/T267269) (owner: Dave Pifke)
[01:20:53] <wikibugs>	 (CR) Krinkle: [C: +1] webperf: change navtiming to use Python 3 [puppet] - https://gerrit.wikimedia.org/r/639197 (https://phabricator.wikimedia.org/T267269) (owner: Dave Pifke)
[01:33:10] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle) @Marostegui Should we assume that it has already been determined that there is no significant benefit today to the (other) query groups? Or...
[02:06:36] <wikibugs>	 Operations: Unclean stop of jobrunner service via puppet - https://phabricator.wikimedia.org/T158288 (Krinkle) Open→Declined jobchron mediawiki/services/jobrunner are no longer used in production.
[02:18:14] <ryankemper>	 !log (WDQS deploy completed)
[02:18:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:51:49] <wikibugs>	 (PS1) Krinkle: ProductionServices: Document hostname of redis_lock hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/640576 (https://phabricator.wikimedia.org/T267581)
[07:22:38] <wikibugs>	 (CR) Effie Mouzeli: [C: +1] "that helps, thank you!" [mediawiki-config] - https://gerrit.wikimedia.org/r/640576 (https://phabricator.wikimedia.org/T267581) (owner: Krinkle)
[07:40:41] <wikibugs>	 Operations, ops-codfw, netops: ripe-atlast-codfw is down - https://phabricator.wikimedia.org/T267714 (elukey)
[07:40:58] <wikibugs>	 Operations, ops-codfw, netops: ripe-atlast-codfw is down - https://phabricator.wikimedia.org/T267714 (elukey) p:Triage→High
[07:41:37] <icinga-wm>	 ACKNOWLEDGEMENT - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 566 probes of 567 (alerts on 65) - https://atlas.ripe.net/measurements/1791212/#!map Elukey T267714 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:41:37] <icinga-wm>	 ACKNOWLEDGEMENT - Host ripe-atlas-codfw IPv6 is DOWN: CRITICAL - Destination Unreachable (2620:0:860:201:208:80:152:244) Elukey T267714
[07:41:37] <icinga-wm>	 ACKNOWLEDGEMENT - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 651 probes of 652 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map Elukey T267714 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:41:37] <icinga-wm>	 ACKNOWLEDGEMENT - Host ripe-atlas-codfw is DOWN: PING CRITICAL - Packet loss = 100% Elukey T267714
[07:48:57] <wikibugs>	 (PS1) Muehlenhoff: Extend MOU for Robert West [puppet] - https://gerrit.wikimedia.org/r/640662
[07:50:08] <wikibugs>	 (PS2) Muehlenhoff: Extend MOU for Robert West [puppet] - https://gerrit.wikimedia.org/r/640662
[07:52:45] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Extend MOU for Robert West [puppet] - https://gerrit.wikimedia.org/r/640662 (owner: Muehlenhoff)
[08:25:32] <wikibugs>	 (CR) Ayounsi: Add CSV import to ProvisionServerNetwork script (7 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849 (owner: Ayounsi)
[08:33:36] <wikibugs>	 (PS17) Ayounsi: Add CSV import to ProvisionServerNetwork script [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849
[08:33:38] <wikibugs>	 (PS18) Ayounsi: ProvisionServerNetwork, cleanup and standardize logs format [software/netbox-extras] - https://gerrit.wikimedia.org/r/635853 (https://phabricator.wikimedia.org/T265339)
[08:33:40] <wikibugs>	 (PS1) Ayounsi: Add python 3.8 to tox [software/netbox-extras] - https://gerrit.wikimedia.org/r/640664
[08:34:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] ProvisionServerNetwork, cleanup and standardize logs format [software/netbox-extras] - https://gerrit.wikimedia.org/r/635853 (https://phabricator.wikimedia.org/T265339) (owner: Ayounsi)
[08:34:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add python 3.8 to tox [software/netbox-extras] - https://gerrit.wikimedia.org/r/640664 (owner: Ayounsi)
[08:34:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add CSV import to ProvisionServerNetwork script [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849 (owner: Ayounsi)
[08:37:10] <wikibugs>	 (PS18) Ayounsi: Add CSV import to ProvisionServerNetwork script [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849
[08:37:10] <wikibugs>	 (PS19) Ayounsi: ProvisionServerNetwork, cleanup and standardize logs format [software/netbox-extras] - https://gerrit.wikimedia.org/r/635853 (https://phabricator.wikimedia.org/T265339)
[08:37:12] <wikibugs>	 (PS2) Ayounsi: Add python 3.8 to tox [software/netbox-extras] - https://gerrit.wikimedia.org/r/640664
[09:15:42] <wikibugs>	 Operations, netops: Prevent advertising invalid prefixes from customers - https://phabricator.wikimedia.org/T267719 (ayounsi) p:Triage→High
[09:17:47] <wikibugs>	 (PS1) Ayounsi: Drop special-ranges in BGP_outfilter [homer/public] - https://gerrit.wikimedia.org/r/640666 (https://phabricator.wikimedia.org/T267719)
[09:53:34] <XioNoX>	 !log prioritized DE-CIX IXP - T262681
[09:53:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:42] <wikibugs>	 (PS1) Ayounsi: Prioritize DE-CIX Dallas IXP [homer/public] - https://gerrit.wikimedia.org/r/640671 (https://phabricator.wikimedia.org/T262681)
[09:55:27] <wikibugs>	 (CR) Ayounsi: [C: +2] Prioritize DE-CIX Dallas IXP [homer/public] - https://gerrit.wikimedia.org/r/640671 (https://phabricator.wikimedia.org/T262681) (owner: Ayounsi)
[09:55:53] <wikibugs>	 (Merged) jenkins-bot: Prioritize DE-CIX Dallas IXP [homer/public] - https://gerrit.wikimedia.org/r/640671 (https://phabricator.wikimedia.org/T262681) (owner: Ayounsi)
[10:03:39] <wikibugs>	 (PS1) Bartosz Dziewoński: Fix getHeadlineNodeAndOffset() returning text nodes [extensions/DiscussionTools] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640497 (https://phabricator.wikimedia.org/T267284)
[10:05:17] <wikibugs>	 (PS1) Ayounsi: Add HE to DE-CIX Dallas [homer/public] - https://gerrit.wikimedia.org/r/640672
[10:05:57] <wikibugs>	 (PS2) Ayounsi: Add HE to DE-CIX Dallas [homer/public] - https://gerrit.wikimedia.org/r/640672
[10:06:39] <wikibugs>	 (CR) Ayounsi: [C: +2] Add HE to DE-CIX Dallas [homer/public] - https://gerrit.wikimedia.org/r/640672 (owner: Ayounsi)
[10:07:08] <wikibugs>	 (Merged) jenkins-bot: Add HE to DE-CIX Dallas [homer/public] - https://gerrit.wikimedia.org/r/640672 (owner: Ayounsi)
[10:13:50] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS6939/IPv4: Active - HE, AS6939/IPv6: Active - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:27:27] <_joe_>	 XioNoX: I guess that's a consequence of your change
[10:27:55] <XioNoX>	 ah yep, waiting on HE to configure their side
[10:28:36] <icinga-wm>	 ACKNOWLEDGEMENT - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS6939/IPv6: Active - HE, AS6939/IPv4: Active - HE ayounsi Waiting for HE to configure their side https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:28:50] <XioNoX>	 thx
[10:34:33] <XioNoX>	 !log delete unused interfaces from asw-d-codfw
[10:34:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:43] <wikibugs>	 (CR) Jbond: "thanks <3" [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396) (owner: Andrew Bogott)
[11:24:25] <wikibugs>	 (PS1) Lucas Werkmeister (WMDE): Remove propagateChangeVisibility repo setting [mediawiki-config] - https://gerrit.wikimedia.org/r/640676
[11:26:15] <Lucas_WMDE>	 jouncebot: refresh please
[11:26:16] <jouncebot>	 I refreshed my knowledge about deployments.
[11:45:39] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to deployment for jgiannelos - https://phabricator.wikimedia.org/T267585 (jijiki) Open→Resolved a:jijiki @Jgiannelos all done, please reopen or find me on irc if something is not right.
[12:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor I � Unicode. All rise for European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201111T1200).
[12:00:04] <jouncebot>	 MatmaRex and Lucas_WMDE: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:11] <Lucas_WMDE>	 o/
[12:00:26] <Urbanecm>	 Lucas_WMDE: I can deploy, or you can 🙂
[12:00:31] <Lucas_WMDE>	 I can do it :)
[12:01:04] <MatmaRex>	 hi
[12:01:09] <MatmaRex>	 jouncebot: now
[12:01:09] <jouncebot>	 For the next 0 hour(s) and 58 minute(s): European mid-day backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201111T1200)
[12:01:10] <Urbanecm>	 Lucas_WMDE: okay, leaving to you :-)
[12:02:13] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Fix getHeadlineNodeAndOffset() returning text nodes [extensions/DiscussionTools] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640497 (https://phabricator.wikimedia.org/T267284) (owner: Bartosz Dziewoński)
[12:07:23] <wikibugs>	 (Merged) jenkins-bot: Fix getHeadlineNodeAndOffset() returning text nodes [extensions/DiscussionTools] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640497 (https://phabricator.wikimedia.org/T267284) (owner: Bartosz Dziewoński)
[12:08:26] <Lucas_WMDE>	 MatmaRex: the change should be on mwdebug1001 now, can you test it?
[12:08:35] <MatmaRex>	 yeah. looking
[12:09:28] <MatmaRex>	 looks good now
[12:09:35] <MatmaRex>	 Lucas_WMDE: ^
[12:09:40] <Lucas_WMDE>	 ok!
[12:11:25] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.16/extensions/DiscussionTools/includes/CommentParser.php: Backport: [[gerrit:640497|Fix getHeadlineNodeAndOffset() returning text nodes (T267284)]] (duration: 01m 01s)
[12:11:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:33] <stashbot>	 T267284: DiscussionTools fails on beta enwiki `Wikipedia:Quick_directory` - https://phabricator.wikimedia.org/T267284
[12:12:13] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): Enable propagatePageDeletion on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/636453
[12:12:31] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Enable propagatePageDeletion on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/636453 (owner: Lucas Werkmeister (WMDE))
[12:13:25] <wikibugs>	 (Merged) jenkins-bot: Enable propagatePageDeletion on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/636453 (owner: Lucas Werkmeister (WMDE))
[12:13:58] <Lucas_WMDE>	 I can’t really test this change, I’ll just quickly check that mwdebug doesn’t explode
[12:14:59] <MatmaRex>	 (thanks for deploying)
[12:15:17] <Lucas_WMDE>	 np :)
[12:16:26] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): Remove propagateChangeVisibility repo setting [mediawiki-config] - https://gerrit.wikimedia.org/r/640676
[12:16:57] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636453|Enable propagatePageDeletion on Wikidata]] (duration: 00m 59s)
[12:17:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:21] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Remove propagateChangeVisibility repo setting [mediawiki-config] - https://gerrit.wikimedia.org/r/640676 (owner: Lucas Werkmeister (WMDE))
[12:20:12] <wikibugs>	 (Merged) jenkins-bot: Remove propagateChangeVisibility repo setting [mediawiki-config] - https://gerrit.wikimedia.org/r/640676 (owner: Lucas Werkmeister (WMDE))
[12:20:53] <Lucas_WMDE>	 and this change should be a no-op, checking on mwdebug that nothing obvious breaks
[12:21:39] <Lucas_WMDE>	 looks like everything’s working
[12:23:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:640676|Remove propagateChangeVisibility repo setting]] (duration: 00m 58s)
[12:23:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:30] <Lucas_WMDE>	 I think that’s it
[12:24:35] <Lucas_WMDE>	 any last-minute changes? :)
[12:25:41] <Lucas_WMDE>	 !log EU backport&config window done
[12:25:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:26] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 113170960 and 8 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:55:06] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 4936 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:58:56] <wikibugs>	 Operations, Commons, MediaWiki-File-management: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (CptViraj)
[13:03:43] <moritzm>	 silvia
[13:06:05] <moritzm>	 df -h
[13:45:44] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 90.28% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[13:48:11] <wikibugs>	 Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) A few more tests. the TL;DR says varnish 6 is at fault probably, but with a question mark.  Test...
[13:48:14] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640685
[13:49:55] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640686
[13:51:19] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640687
[13:52:46] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=no; selector: name=cp3054.esams.wmnet
[13:52:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:57] <wikibugs>	 (PS1) Jbond: puppet: migrate from require_package to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479)
[14:18:30] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/640666 (https://phabricator.wikimedia.org/T267719) (owner: Ayounsi)
[14:19:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppet: migrate from require_package to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[14:26:18] <wikibugs>	 (PS16) Jbond: puppet-merge: add Repository class [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[14:27:03] <wikibugs>	 (PS2) Jbond: puppet: migrate from require_package to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479)
[14:27:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppet-merge: add Repository class [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249) (owner: Jbond)
[14:28:44] <wikibugs>	 (PS17) Jbond: puppet-merge: add Repository class [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[14:28:54] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppet: migrate from require_package to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[14:29:39] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
[14:29:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:43] <wikibugs>	 (PS3) Jbond: puppet: migrate from require_package to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479)
[14:33:46] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:33:48] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:34:18] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:36:02] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:39:32] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.28 ms
[14:43:39] <wikibugs>	 (PS1) Jbond: test puppet merge [puppet] - https://gerrit.wikimedia.org/r/640689
[14:44:32] <wikibugs>	 (CR) Jbond: "PCC (still running): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26422/console" [puppet] - https://gerrit.wikimedia.org/r/640688 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[14:44:42] <wikibugs>	 (PS2) Jbond: test puppet merge [puppet] - https://gerrit.wikimedia.org/r/640689
[14:44:50] <wikibugs>	 (PS3) Jbond: test puppet merge [puppet] - https://gerrit.wikimedia.org/r/640689
[14:45:42] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:46:05] <wikibugs>	 (CR) Jbond: [C: +2] test puppet merge [puppet] - https://gerrit.wikimedia.org/r/640689 (owner: Jbond)
[14:51:03] <wikibugs>	 (PS18) Jbond: puppet-merge: add Repository class [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[14:52:49] <wikibugs>	 (PS19) Jbond: puppet-merge: add Repository class [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[14:54:56] <wikibugs>	 (PS1) Jbond: Revert "test puppet merge" [puppet] - https://gerrit.wikimedia.org/r/640499
[14:55:04] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] Revert "test puppet merge" [puppet] - https://gerrit.wikimedia.org/r/640499 (owner: Jbond)
[14:57:06] <wikibugs>	 (CR) Jbond: "Ready for review again" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249) (owner: Jbond)
[15:20:51] <wikibugs>	 (CR) Faidon Liambotis: [C: +1] Drop special-ranges in BGP_outfilter [homer/public] - https://gerrit.wikimedia.org/r/640666 (https://phabricator.wikimedia.org/T267719) (owner: Ayounsi)
[15:21:40] <wikibugs>	 (CR) Ayounsi: [C: +2] Drop special-ranges in BGP_outfilter [homer/public] - https://gerrit.wikimedia.org/r/640666 (https://phabricator.wikimedia.org/T267719) (owner: Ayounsi)
[15:22:07] <wikibugs>	 (Merged) jenkins-bot: Drop special-ranges in BGP_outfilter [homer/public] - https://gerrit.wikimedia.org/r/640666 (https://phabricator.wikimedia.org/T267719) (owner: Ayounsi)
[15:26:22] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2053 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:27:26] <wikibugs>	 Operations, netops, Patch-For-Review: Prevent advertising invalid prefixes from customers - https://phabricator.wikimedia.org/T267719 (ayounsi) Pushed to cr3-ulsfo: `name=before   Prefix    Nexthop        MED     Lclpref    AS path * 172.16.0.0/21           Self                                    I *...
[15:50:28] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2053 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:52:32] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.cf
[15:52:32] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[15:52:32] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2053 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:32] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:54:32] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:14:07] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access for Till Mletzko - https://phabricator.wikimedia.org/T267744 (tmletzko)
[16:21:00] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2053 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[16:30:09] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.cf
[16:30:12] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[16:30:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:36] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:35:42] <icinga-wm>	 PROBLEM - MD RAID on ms-be2031 is CRITICAL: CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:35:44] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on ms-be2031 is CRITICAL: CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T267746 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:36:01] <wikibugs>	 Operations, ops-codfw: Degraded RAID on ms-be2031 - https://phabricator.wikimedia.org/T267746 (ops-monitoring-bot)
[16:38:05] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.cf
[16:38:07] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[16:38:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:32] <wikibugs>	 (PS1) Ayounsi: Revert "temporarily route Italy to codfw" [dns] - https://gerrit.wikimedia.org/r/640500
[16:41:51] <wikibugs>	 (PS2) Ayounsi: Revert "temporarily route Italy to codfw" [dns] - https://gerrit.wikimedia.org/r/640500
[16:42:41] <wikibugs>	 (CR) Ayounsi: [C: +2] Revert "temporarily route Italy to codfw" [dns] - https://gerrit.wikimedia.org/r/640500 (owner: Ayounsi)
[16:44:35] <XioNoX>	 !log Revert "temporarily route Italy to codfw"
[16:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:08] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:53:13] <wikibugs>	 Operations, netops: Prevent advertising invalid prefixes from customers - https://phabricator.wikimedia.org/T267719 (ayounsi) Open→Resolved Done.
[16:56:32] <icinga-wm>	 PROBLEM - HP RAID on ms-be2031 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:2 - Failed: 2I:4:1 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:56:35] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on ms-be2031 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:2 - Failed: 2I:4:1 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T267748 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:56:39] <wikibugs>	 Operations, ops-codfw: Degraded RAID on ms-be2031 - https://phabricator.wikimedia.org/T267748 (ops-monitoring-bot)
[17:02:29] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[17:04:37] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 56.87 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:07:53] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 73.63 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:16:07] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 45.78 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:19:27] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 72.24 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:22:11] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[17:26:05] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 37.72 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:26:37] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on ms-be2031 is CRITICAL: cluster=swift device=None instance=ms-be2031 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2031&var-datasource=codfw+prometheus/ops
[17:28:53] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[17:31:01] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 77.31 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:32:09] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
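The eqsin alert lines above read as `value le threshold`: `56.87 le 60` is reported as CRITICAL, while `(C)60 le (W)70 le 73.63` is reported as OK. A minimal sketch of that comparison, assuming the check is a plain less-than-or-equal test against a critical threshold of 60 and a warning threshold of 70 (the function name `classify_le` is hypothetical, and the real check may differ in detail):

```python
def classify_le(value: float, warning: float = 70.0, critical: float = 60.0) -> str:
    """Classify a 'lower is worse' metric against le (<=) thresholds.

    Defaults are taken from the '(C)60 le (W)70' text in the alert lines.
    """
    if value <= critical:
        return "CRITICAL"
    if value <= warning:
        return "WARNING"
    return "OK"


# Values taken from the alert lines in this log.
for sample in (56.87, 73.63, 45.78, 72.24, 37.72, 77.31):
    print(sample, classify_le(sample))
```

Under this reading, the repeated PROBLEM/RECOVERY flapping between 17:04 and 17:31 corresponds to the value crossing back and forth over the 60 threshold.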
[17:53:23] <wikibugs>	 (03PS4) 10Jberkel: Enable "Cite" button in toolbar for enwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/640087 (https://phabricator.wikimedia.org/T267504)
[18:34:35] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 65, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:00:04] <jouncebot>	 Deploy window Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201111T1900)
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: May I have your attention please! Morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201111T1900)
[19:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[19:47:51] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 173 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:49:31] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 5 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:44:33] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access for Till Mletzko - https://phabricator.wikimedia.org/T267744 (10Aklapper) 05Open→03Stalled @tmletzko: Hi, please see https://phabricator.wikimedia.org/project/profile/1564/ for required information.
[21:00:04] <jouncebot>	 chrisalbon and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201111T2100).
[21:07:48] <wikibugs>	 10Operations, 10Domains, 10Traffic, 10Patch-For-Review: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (10Asaf) Some more weeks on, I repeat the request to make progress, or at least offer an ETA for this.  Thank you.
[21:25:11] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:25:27] <icinga-wm>	 PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:27:09] <icinga-wm>	 RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:28:31] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:58:06] <wikibugs>	 (03CR) 10Ladsgroup: "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[22:00:00] <wikibugs>	 (03CR) 10Ladsgroup: "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[22:00:40] <wikibugs>	 (03CR) 10Ladsgroup: "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[22:12:19] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:13:59] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:28:38] <Cyberpower678>	 Guys, are the servers experiencing an overload or something?  Wikipedia is intermittently having issues loading.  This is only happening on Wikipedia, nowhere else on the Internet.
[22:29:29] <Reedy>	 No one else has complained
[22:29:33] <Reedy>	 What sort of issues loading?
[22:32:04] <Cyberpower678>	 Reedy, requests to the server are not being answered.
[22:32:19] <Cyberpower678>	 So I'm just sitting here with indefinite loading.
[22:32:30] <Cyberpower678>	 Happening to all devices.
[22:32:39] <Cyberpower678>	 But the rest of the internet is loading fine.
[22:33:33] <Reedy>	 You've tested it all?
[22:33:34] <Cyberpower678>	 The problems appear to be intermittent, but are exacerbated when trying to get past the 2FA login.
[22:38:37] <Cyberpower678>	 Reedy, now it's working again.
[22:50:51] <apergos>	 we've had no other reports from users on irc, so hopefully it was somehow just you
[22:54:24] <Cyberpower678>	 Maybe transient DNS issue somewhere along the way.
[23:26:56] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (10ColinFine) I don't know if this is helpful, but when I tried editing [[Allan Shivers]] on en-wiki, which is showing the problem, I noticed that the image appears in "S...
[23:32:27] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (10Urbanecm) p:05Medium→03High >>! In T267668#6617648, @jijiki wrote: > @AntiCompositeNumber the feature we have been working on has been enabled in mw1276 (api) and...
[23:34:19] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Wikimedia-production-error: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (10jijiki)
[23:52:54] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Wikimedia-production-error: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (10AntiCompositeNumber) >>! In T267668#6619716, @ColinFine wrote: > I don't know if this is helpful, but when I tried editing [[Allan Shiv...
[23:54:19] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Wikimedia-production-error: Some recent Commons uploads not available on other wikis (2020-11) - https://phabricator.wikimedia.org/T267668 (10AntiCompositeNumber)