[00:50:55] <wikibugs>	 (CR) Bstorm: "Looks good https://puppet-compiler.wmflabs.org/compiler1001/17918/" [puppet] - https://gerrit.wikimedia.org/r/530405 (https://phabricator.wikimedia.org/T230562) (owner: BryanDavis)
[00:51:11] <wikibugs>	 (PS3) Bstorm: toolforge: treat all compute nodes as submit hosts [puppet] - https://gerrit.wikimedia.org/r/530405 (https://phabricator.wikimedia.org/T230562) (owner: BryanDavis)
[00:52:35] <wikibugs>	 (CR) Bstorm: [C: +2] toolforge: treat all compute nodes as submit hosts [puppet] - https://gerrit.wikimedia.org/r/530405 (https://phabricator.wikimedia.org/T230562) (owner: BryanDavis)
[02:51:23] <vgutierrez>	 !log repooling cp5002, running compress.so experiment
[02:51:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:53:31] <wikibugs>	 (PS1) Mholloway: MachineVision (Beta): Request labels targeting Beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/530460
[03:00:40] <wikibugs>	 Operations, Commons, MediaWiki-File-management, Multimedia, and 3 others: Some thumbnail images delivered with wrong application/x-www-form-urlencoded mime-type - https://phabricator.wikimedia.org/T188831 (Wang_Qiliang) >>! In T188831#5416184, @ema wrote: >>>! In T188831#5416179, @Wang_Qiliang wr...
[04:01:33] <icinga-wm>	 PROBLEM - snapshot of s7 in codfw on db1115 is CRITICAL: snapshot for s7 at codfw taken more than 4 days ago: Most recent backup 2019-08-12 03:45:11 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:03:35] <icinga-wm>	 PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:33:19] <icinga-wm>	 RECOVERY - snapshot of s7 in codfw on db1115 is OK: snapshot for s7 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2019-08-16 03:40:14 from db2100.codfw.wmnet:3317 (849 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:45:01] <wikibugs>	 (PS1) Vgutierrez: ocsp: Allow to load an existing OCSPResponse from disk [software/acme-chief] - https://gerrit.wikimedia.org/r/530464 (https://phabricator.wikimedia.org/T219765)
[05:45:03] <wikibugs>	 (PS1) Vgutierrez: acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765)
[05:47:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[06:09:01] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[06:42:58] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/530230 (owner: MarcoAurelio)
[06:43:05] <wikibugs>	 (PS4) Muehlenhoff: openldap::offboard-user.py: Adjust several renamed projects [puppet] - https://gerrit.wikimedia.org/r/530230 (owner: MarcoAurelio)
[06:45:41] <wikibugs>	 (CR) Muehlenhoff: [C: +2] openldap::offboard-user.py: Adjust several renamed projects [puppet] - https://gerrit.wikimedia.org/r/530230 (owner: MarcoAurelio)
[06:50:07] <_joe_>	 !log upgrading envoyproxy across production (http2 CVEs)
[06:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:09] <icinga-wm>	 PROBLEM - BGP status on cr2-eqdfw is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active, AS2914/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:25:27] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:25:59] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[07:27:03] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:29:25] <icinga-wm>	 RECOVERY - BGP status on cr2-eqdfw is OK: BGP OK - up: 81, down: 0, shutdown: 4 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:31:53] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:32:39] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 55, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:33:15] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:33:17] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:35:53] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqdfw is OK: OK: host 208.80.153.198, interfaces up: 57, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:36:43] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:38:07] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:38:09] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:39:03] <icinga-wm>	 PROBLEM - BGP status on cr2-eqdfw is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active, AS2914/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:43:03] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to LogStash for Abijeet Patro - https://phabricator.wikimedia.org/T230104 (abi_) Can confirm that I'm able to access this.
[07:50:12] <wikibugs>	 Operations, serviceops, Core Platform Team (Needs Cleaning - Services Operations): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (MoritzMuehlenhoff)
[07:53:33] <icinga-wm>	 RECOVERY - BGP status on cr2-eqdfw is OK: BGP OK - up: 81, down: 0, shutdown: 4 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:55:51] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:56:05] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:56:47] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 55, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:57:29] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:01:39] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqdfw is OK: OK: host 208.80.153.198, interfaces up: 57, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:02:19] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:02:19] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:02:31] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:04:47] <icinga-wm>	 PROBLEM - BGP status on cr2-eqdfw is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active, AS2914/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:06:03] <wikibugs>	 Operations, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe)
[08:08:57] <wikibugs>	 Operations, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe)
[08:10:58] <icinga-wm>	 ACKNOWLEDGEMENT - Host elastic2050.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Mathew.onipe see T230597 - The acknowledgement expires at: 2019-08-19 08:10:34.
[08:18:12] <_joe_>	 !log stopping php on phab1003, to restart it with systemd
[08:18:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:17] <icinga-wm>	 RECOVERY - BGP status on cr2-eqdfw is OK: BGP OK - up: 83, down: 0, shutdown: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:43:31] <wikibugs>	 (PS2) Vgutierrez: acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765)
[08:43:41] <wikibugs>	 (PS4) Ema: ATS: do not autostart service upon package installation [puppet] - https://gerrit.wikimedia.org/r/529402
[08:44:45] <wikibugs>	 (CR) Ema: [C: +2] ATS: do not autostart service upon package installation [puppet] - https://gerrit.wikimedia.org/r/529402 (owner: Ema)
[08:45:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[08:46:12] <wikibugs>	 (PS1) Giuseppe Lavagetto: phabricator::main: correct the php extension list [puppet] - https://gerrit.wikimedia.org/r/530538
[08:46:14] <wikibugs>	 (CR) Muehlenhoff: ATS: do not autostart service upon package installation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529402 (owner: Ema)
[08:48:27] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/17919/" [puppet] - https://gerrit.wikimedia.org/r/530538 (owner: Giuseppe Lavagetto)
[08:48:39] <wikibugs>	 (PS2) Giuseppe Lavagetto: phabricator::main: correct the php extension list [puppet] - https://gerrit.wikimedia.org/r/530538
[08:48:44] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +2 C: +2] phabricator::main: correct the php extension list [puppet] - https://gerrit.wikimedia.org/r/530538 (owner: Giuseppe Lavagetto)
[08:51:01] <wikibugs>	 (PS2) Vgutierrez: ocsp: Allow to load an existing OCSPResponse from disk [software/acme-chief] - https://gerrit.wikimedia.org/r/530464 (https://phabricator.wikimedia.org/T219765)
[08:51:03] <wikibugs>	 (PS3) Vgutierrez: acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765)
[08:52:29] <wikibugs>	 (CR) Vgutierrez: ATS: do not autostart service upon package installation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529402 (owner: Ema)
[09:04:50] <wikibugs>	 Operations, netops: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (jbond)
[09:08:12] <wikibugs>	 (PS2) Jakob: Whitelist jenkins for edit rate limits on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/530144 (https://phabricator.wikimedia.org/T230481)
[09:14:00] <wikibugs>	 Operations, DBA, Data-Services: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (bd808)
[09:17:08] <wikibugs>	 Operations, Cloud-Services, Traffic: All sites served by cloudweb2001-dev return 503 - https://phabricator.wikimedia.org/T230105 (bd808)
[09:23:33] <wikibugs>	 (PS1) BryanDavis: toolforge: provision zstd [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380)
[09:25:11] <wikibugs>	 (CR) Muehlenhoff: "Is there still Toolforge on jessie? If so, this will need an os_version guard as it's only part of Debian starting with Stretch." [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380) (owner: BryanDavis)
[09:25:31] <wikibugs>	 (PS4) Vgutierrez: acme_chief: Provide OCSP responses [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765)
[09:25:33] <wikibugs>	 (PS1) Vgutierrez: ocsp: Provide basic test coverage [software/acme-chief] - https://gerrit.wikimedia.org/r/530548 (https://phabricator.wikimedia.org/T219765)
[09:28:45] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudvirt1023.eqiad.wmn...
[09:29:27] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: provision zstd [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380) (owner: BryanDavis)
[09:31:04] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380) (owner: BryanDavis)
[09:32:44] <wikibugs>	 (CR) BryanDavis: "> Is there still Toolforge on jessie? If so, this will need an" [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380) (owner: BryanDavis)
[09:33:18] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/530547 (https://phabricator.wikimedia.org/T225380) (owner: BryanDavis)
[09:34:34] <wikibugs>	 (PS2) Giuseppe Lavagetto: envoyproxy: support debian jessie [puppet] - https://gerrit.wikimedia.org/r/529919
[09:36:05] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: toolforge: grid_environ: zstd is only available starting with startch [puppet] - https://gerrit.wikimedia.org/r/530551 (https://phabricator.wikimedia.org/T225380)
[09:37:05] <wikibugs>	 (Abandoned) Arturo Borrero Gonzalez: toolforge: grid_environ: zstd is only available starting with startch [puppet] - https://gerrit.wikimedia.org/r/530551 (https://phabricator.wikimedia.org/T225380) (owner: Arturo Borrero Gonzalez)
[09:39:29] <icinga-wm>	 PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[09:44:46] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1023.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudvirt10...
[09:46:08] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudvirt1023.eqiad.wmn...
[09:56:08] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1023.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudvirt10...
[10:04:12] <icinga-wm>	 RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:12:10] <wikibugs>	 Operations, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe)
[10:12:11] <wikibugs>	 (PS1) Elukey: profile::analytics::refinery::job::data_purge: remove shell redirect [puppet] - https://gerrit.wikimedia.org/r/530555
[10:12:20] <wikibugs>	 Operations, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe) p:Triage→High
[10:14:27] <wikibugs>	 (PS2) Elukey: profile::analytics::refinery::job::data_purge: remove shell redirect [puppet] - https://gerrit.wikimedia.org/r/530555
[10:15:53] <wikibugs>	 (CR) Elukey: [C: +2] profile::analytics::refinery::job::data_purge: remove shell redirect [puppet] - https://gerrit.wikimedia.org/r/530555 (owner: Elukey)
[10:33:51] <wikibugs>	 (PS1) Elukey: profile::analytics::cluster::packages::common: temp remove python3-tk [puppet] - https://gerrit.wikimedia.org/r/530556 (https://phabricator.wikimedia.org/T229347)
[10:35:43] <wikibugs>	 (CR) Elukey: [C: +2] profile::analytics::cluster::packages::common: temp remove python3-tk [puppet] - https://gerrit.wikimedia.org/r/530556 (https://phabricator.wikimedia.org/T229347) (owner: Elukey)
[10:51:03] <wikibugs>	 (CR) Alex Monk: "What's the reason for the python-cryptography version bump?" [software/acme-chief] - https://gerrit.wikimedia.org/r/530464 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[10:59:59] <wikibugs>	 (CR) Alex Monk: [C: +2] ocsp: Provide basic test coverage [software/acme-chief] - https://gerrit.wikimedia.org/r/530548 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[11:12:45] <wikibugs>	 (CR) Alex Monk: acme_chief: Provide OCSP responses (4 comments) [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[11:25:29] <wikibugs>	 (PS1) Muehlenhoff: Remove apt-setup/multiarch from d-i config [puppet] - https://gerrit.wikimedia.org/r/530559
[11:29:25] <wikibugs>	 (CR) Krinkle: "@Ori I like the direction and pre-building. I'm not sure how long it would take to run for 900+ wikis, but I think it is worth trying. Cou" [mediawiki-config] - https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[11:46:37] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/530559 (owner: Muehlenhoff)
[12:19:25] <wikibugs>	 (CR) Elukey: "recheck" [puppet] - https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: Cwhite)
[12:19:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] logster: add ensure parameter [puppet] - https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: Cwhite)
[12:21:04] <wikibugs>	 (PS6) Elukey: logster: add ensure parameter [puppet] - https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: Cwhite)
[12:22:00] <icinga-wm>	 PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[12:23:32] <wikibugs>	 (CR) Elukey: "There was a parent change that was causing some troubles, rebased and resent the code review :)" [puppet] - https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: Cwhite)
[12:25:50] <wikibugs>	 (CR) Elukey: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/17922/ looks fine!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: Cwhite)
[12:26:18] <wikibugs>	 (CR) Muehlenhoff: profile::kerberos::kdc: add debconf settings (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529786 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[12:29:26] <wikibugs>	 (CR) Elukey: profile::kerberos::kdc: add debconf settings (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529786 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[12:29:58] <icinga-wm>	 RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1004 is OK: OK: Less than 20.00% above the threshold [300.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[12:36:47] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Add more granularity to query/time|size buckets [software/druid_exporter] - https://gerrit.wikimedia.org/r/519365 (https://phabricator.wikimedia.org/T226035) (owner: Elukey)
[12:40:26] <wikibugs>	 (CR) Muehlenhoff: "That looks fine approach-wise, I'll have a closer look/review on Monday." [puppet] - https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[12:55:14] <wikibugs>	 Operations, netops: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (CDanis) p:Triage→Normal
[12:55:18] <wikibugs>	 Operations, vm-requests: Site: 2 VMs for puppetdb - https://phabricator.wikimedia.org/T230609 (MoritzMuehlenhoff)
[12:55:32] <wikibugs>	 Operations, DBA, Data-Services: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (CDanis) p:Triage→Normal
[12:55:42] <wikibugs>	 Operations, Puppet: offboard-user.py: do not hardcode Phabricator project names, use PHID instead - https://phabricator.wikimedia.org/T230516 (CDanis) p:Triage→Normal
[12:55:56] <wikibugs>	 Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Sundown aliases `minnan` and `zh-cfr` for `nan`/`zh-min-nan` - https://phabricator.wikimedia.org/T230382 (CDanis) p:Triage→Normal
[12:56:22] <wikibugs>	 Operations, Puppet: clean up systemd::timer::job logging basedir mess - https://phabricator.wikimedia.org/T230127 (CDanis) Open→Resolved a:CDanis
[12:56:47] <wikibugs>	 Operations, Mail, OTRS: check OTRS wiki for email addresses no longer used - https://phabricator.wikimedia.org/T230243 (CDanis) p:Triage→Normal
[12:57:10] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Set up mailing list for Santali Wikipedia - https://phabricator.wikimedia.org/T230435 (CDanis) a:CDanis
[12:57:29] <wikibugs>	 Operations, vm-requests: Site: 2 VMs for puppetdb - https://phabricator.wikimedia.org/T230609 (CDanis) p:Triage→Normal
[12:58:04] <wikibugs>	 Operations, Analytics, Discovery, Research-Backlog: Run swift-object-expirer as part of the swift cluster - https://phabricator.wikimedia.org/T229584 (CDanis) p:Triage→Normal
[12:58:41] <wikibugs>	 Operations, Release-Engineering-Team, cloud-services-team (Kanban): Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (CDanis) p:Triage→Normal
[12:58:53] <wikibugs>	 Operations, Cassandra: Create a cassandra.service which subsumes casandra-{a,b,c} services using PartsOf=cassandra.service - https://phabricator.wikimedia.org/T229916 (CDanis) p:Triage→Normal
[12:59:22] <wikibugs>	 Operations, Wiki-Setup (Delete / Redirect): Merge or delete grantswiki - https://phabricator.wikimedia.org/T229950 (CDanis) p:Triage→Normal
[12:59:37] <wikibugs>	 Operations: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail - https://phabricator.wikimedia.org/T229998 (CDanis) p:Triage→Normal
[13:00:22] <wikibugs>	 Operations, MediaWiki-Maintenance-scripts, serviceops: Stop forcing RUNNER=php for foreachwiki/foreachwikiindblist - https://phabricator.wikimedia.org/T230110 (CDanis)
[13:00:30] <wikibugs>	 Operations, MediaWiki-Maintenance-scripts, serviceops: Stop forcing RUNNER=php for foreachwiki/foreachwikiindblist - https://phabricator.wikimedia.org/T230110 (CDanis) p:Triage→Normal
[13:01:50] <wikibugs>	 Operations, Commons, MediaWiki-File-management, Multimedia, and 2 others: server-cache did neither update on uploading nor with ?action=purge - https://phabricator.wikimedia.org/T228433 (CDanis) p:Triage→Normal
[13:01:57] <wikibugs>	 Operations, ops-eqiad, DC-Ops: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) - https://phabricator.wikimedia.org/T227538 (CDanis) p:Triage→Normal
[13:02:03] <wikibugs>	 Operations, ops-eqiad, DC-Ops: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) - https://phabricator.wikimedia.org/T227536 (CDanis) p:Triage→Normal
[13:02:14] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a7-eqiad pdu refresh - https://phabricator.wikimedia.org/T227143 (CDanis) p:Triage→Normal
[13:02:19] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (CDanis) p:Triage→Normal
[13:02:24] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a8-eqiad pdu refresh (Thursday 9/19 @11am UTC) - https://phabricator.wikimedia.org/T227133 (CDanis) p:Triage→Normal
[13:02:31] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a1-eqiad pdu refresh (Thursday 9/12 @11am UTC) - https://phabricator.wikimedia.org/T226782 (CDanis) p:Triage→Normal
[13:03:07] <wikibugs>	 Operations, PDF-Rendering, Proton, Reading-Infrastructure-Team-Backlog, and 2 others: PDF renderer needs better CJK font - https://phabricator.wikimedia.org/T226633 (CDanis) p:Triage→Normal
[13:11:03] <wikibugs>	 Operations, Release-Engineering-Team: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (bd808)
[13:17:52] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure: Puppet error on deployment-logtash03 - https://phabricator.wikimedia.org/T230611 (Krenair)
[13:18:11] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure: Puppet error on deployment-logtash03 - https://phabricator.wikimedia.org/T230611 (Krenair)
[13:30:14] <icinga-wm>	 PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[13:31:42] <icinga-wm>	 RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 79058 bytes in 2.275 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[13:33:14] <wikibugs>	 (CR) Alex Monk: ocsp: Provide basic test coverage [software/acme-chief] - https://gerrit.wikimedia.org/r/530548 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[13:33:34] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure: Puppet error on deployment-logtash03 - https://phabricator.wikimedia.org/T230611 (herron) Not sure what caused the system to be in this state, but after the following steps logstash is back up and running.  ` root@deployment-logstash03:~# apt remove logstash Reading p...
[13:37:29] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Set up mailing list for Santali Wikipedia - https://phabricator.wikimedia.org/T230435 (CDanis) Hi Manik,  Happy to create this for you, but first, we'll also need a second list administrator -- can you provide someone?  Thanks!
[13:42:21] <wikibugs>	 (CR) Alex Monk: acme_chief: Provide OCSP responses (1 comment) [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[13:45:27] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure: Puppet error on deployment-logtash03 - https://phabricator.wikimedia.org/T230611 (Krenair) Interesting, okay, so the file is `/etc/systemd/system/logstash.service` (init.pp left out the `system/` part), and that doesn't seem to come from a package: `dpkg-query: no pat...
[13:46:01] <wikibugs>	 (PS1) Mholloway: Machine vision (beta): Configure Wikidata Beta item URL template [mediawiki-config] - https://gerrit.wikimedia.org/r/530575
[13:46:39] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure: Puppet error on deployment-logtash03 - https://phabricator.wikimedia.org/T230611 (Krenair) Open→Resolved a:herron
[13:49:17] <wikibugs>	 (CR) Mholloway: [C: +2] MachineVision (Beta): Request labels targeting Beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/530460 (owner: Mholloway)
[13:50:23] <wikibugs>	 (Merged) jenkins-bot: MachineVision (Beta): Request labels targeting Beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/530460 (owner: Mholloway)
[13:50:39] <wikibugs>	 (CR) jenkins-bot: MachineVision (Beta): Request labels targeting Beta Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/530460 (owner: Mholloway)
[13:52:58] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision (beta): Request labels targeting Beta Wikidata (duration: 00m 50s)
[13:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:54] <wikibugs>	 (CR) Alex Monk: "<Krenair> while a given cert_id/key_type_id combination is in the process of being renewed" [software/acme-chief] - https://gerrit.wikimedia.org/r/530465 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[14:02:27] <wikibugs>	 (PS2) Mholloway: Machine vision (beta): Configure Wikidata Beta item URL template [mediawiki-config] - https://gerrit.wikimedia.org/r/530575
[14:08:27] <wikibugs>	 (PS1) Ema: profile::tlsproxy::instance: do not autostart nginx [puppet] - https://gerrit.wikimedia.org/r/530578
[14:12:41] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good, one typo inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/530578 (owner: Ema)
[14:19:43] <wikibugs>	 (PS1) Jhedden: openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[14:20:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[14:24:32] <gehel>	 !log rolling reboot of cloudelastic
[14:24:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:50] <wikibugs>	 (PS2) Jhedden: openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[14:26:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[14:30:14] <wikibugs>	 (PS3) Jhedden: openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[14:30:23] <wikibugs>	 (CR) Herron: [C: +1] icinga: disable autocomplete.js in icinga search text input [puppet] - https://gerrit.wikimedia.org/r/528586 (owner: Cwhite)
[14:31:01] <wikibugs>	 (PS1) Elukey: Add metrics related to number of queries to Broker and Historicals [software/druid_exporter] - https://gerrit.wikimedia.org/r/530583
[14:31:56] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Add metrics related to number of queries to Broker and Historicals [software/druid_exporter] - https://gerrit.wikimedia.org/r/530583 (owner: Elukey)
[14:39:06] <onimisionipe>	 !log run `bmc-device --cold-reset; echo $?` in elastic2050 hoping it resets mgmt interface -T230597
[14:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:14] <stashbot>	 T230597: can't SSH to elastic2050.mgmt  - https://phabricator.wikimedia.org/T230597
[14:46:25] <wikibugs>	 (PS4) Jhedden: openstack: change codfw nova api and metadata port [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[14:47:56] <wikibugs>	 Operations, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (RobH) Please note this mgmt interface is still down:   ` robh@cumin2001:~$ ping elastic2050.mgmt.codfw.wmnet PING elastic2050.mgmt.codfw.wmnet (10.193.3.56) 56(84) bytes of...
[14:47:59] <wikibugs>	 (PS3) Alexandros Kosiaris: mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: Ladsgroup)
[14:48:10] <wikibugs>	 (PS4) Alexandros Kosiaris: mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: Ladsgroup)
[14:49:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: Ladsgroup)
[14:51:09] <wikibugs>	 (CR) Jhedden: "compiler results: https://puppet-compiler.wmflabs.org/compiler1001/17924/" [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[14:51:27] <wikibugs>	 (PS5) Alexandros Kosiaris: mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: Ladsgroup)
[14:53:40] <wikibugs>	 (CR) Jhedden: "Once I verify that this works as expected in codfw I'll run through the other services and submit a patch with the full haproxy configurat" [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[14:55:06] <wikibugs>	 (PS1) Elukey: Add QueryCountStatsMonitor to Druid broker/historicals [puppet] - https://gerrit.wikimedia.org/r/530588
[14:55:20] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] mediawiki: Introduce startupregistrystats.pp to record RL modules registry [puppet] - https://gerrit.wikimedia.org/r/528526 (https://phabricator.wikimedia.org/T229836) (owner: Ladsgroup)
[14:57:45] <wikibugs>	 Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban): VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853 (Cmjohnson) Dell approved my ticket.  I talked to the technician today and he will be out Monday morning to replace the mot...
[14:57:46] <wikibugs>	 (PS2) Elukey: Add QueryCountStatsMonitor to Druid broker/historicals [puppet] - https://gerrit.wikimedia.org/r/530588
[14:59:17] <wikibugs>	 (CR) Elukey: [C: +2] Add QueryCountStatsMonitor to Druid broker/historicals [puppet] - https://gerrit.wikimedia.org/r/530588 (owner: Elukey)
[15:04:03] <wikibugs>	 Operations, vm-requests, cloud-services-team (Kanban): Three small ganeti VMs to host haproxy for OpenStack endpoints - https://phabricator.wikimedia.org/T227041 (JHedden) Stalled→Resolved For this phase we're going to install haproxy directly on the openstack controllers. We will not be need...
[15:07:53] <wikibugs>	 (PS2) Ema: profile::tlsproxy::instance: do not autostart nginx [puppet] - https://gerrit.wikimedia.org/r/530578
[15:13:15] <wikibugs>	 (PS2) Bstorm: toolforge: rebranding k8s control plane to control [puppet] - https://gerrit.wikimedia.org/r/530186 (https://phabricator.wikimedia.org/T229009)
[15:15:27] <wikibugs>	 (CR) Bstorm: [C: +2] toolforge: rebranding k8s control plane to control [puppet] - https://gerrit.wikimedia.org/r/530186 (https://phabricator.wikimedia.org/T229009) (owner: Bstorm)
[15:18:50] <wikibugs>	 (CR) Ori.livneh: [C: +1] "I'm confident it could be done efficiently. IIRC, generating the configuration for a wiki with a cold cache took something like 40ms on an" [mediawiki-config] - https://gerrit.wikimedia.org/r/528447 (https://phabricator.wikimedia.org/T217830) (owner: Krinkle)
[15:21:15] <wikibugs>	 Operations, ops-codfw, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (RobH) a:Papaul IRC sync: Chatted with @Mathew.onipe, who let me know they had synced with @papaul to take this offline on Monday to reset the power/bmc.
[15:22:20] <wikibugs>	 Operations, ops-codfw, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (RobH)
[15:41:54] <wikibugs>	 Operations, ops-codfw, DC-Ops: solve mtp panel issue for row uplinks - https://phabricator.wikimedia.org/T112774 (Papaul) Open→Resolved  Resolving this task since we will not be using and we are not using the patch panels. This will not be setup anymore
[15:42:33] <elukey>	 !log roll restart of druid broker/historicals to pick up new logging/metrics settings
[15:42:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:14] <wikibugs>	 (PS1) RobH: update to dell skus [software] - https://gerrit.wikimedia.org/r/530597
[15:43:45] <wikibugs>	 Operations, MediaWiki-General, Multimedia: Segmentation fault creating thumbnail - https://phabricator.wikimedia.org/T159242 (Ebe123) Note that even though no error is shown, the image is not of Richmond City but of Richmond //County//, so this bug has not been resolved.
[15:46:29] <wikibugs>	 Operations, ops-codfw, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Papaul) @Mathew.onipe any reason why this is set to high priority ?
[15:47:56] <wikibugs>	 Operations, ops-codfw, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe) p:High→Normal
[15:48:53] <wikibugs>	 Operations, netops: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (ayounsi) Indeed, should replace bgpmon.net (going EoL soon).
[15:49:01] <wikibugs>	 Operations, ops-codfw, DC-Ops, Discovery-Search (Current work): can't SSH to elastic2050.mgmt - https://phabricator.wikimedia.org/T230597 (Mathew.onipe) @Papaul On second thought, we have other servers and losing one elastic node is Ok. So this should be set to normal
[15:52:31] <wikibugs>	 (CR) RobH: [C: +1] "I'm awaiting feedback from Dell confirming this SKU change is going to be used on all quotations going forward.  Once I have that confirma" [software] - https://gerrit.wikimedia.org/r/530597 (owner: RobH)
[15:53:24] <wikibugs>	 (PS1) Jbond: apereo_cas: bump to RC5 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/530600
[15:58:59] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] apereo_cas: bump to RC5 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/530600 (owner: Jbond)
[16:12:14] <elukey>	 !log upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia
[16:12:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:17] <wikibugs>	 (PS1) Mholloway: MachineVision (beta): Update handler services to support label lookups [mediawiki-config] - https://gerrit.wikimedia.org/r/530605
[16:32:20] <wikibugs>	 (PS1) Elukey: Fix README [software/druid_exporter] - https://gerrit.wikimedia.org/r/530607
[16:32:49] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Fix README [software/druid_exporter] - https://gerrit.wikimedia.org/r/530607 (owner: Elukey)
[16:36:36] <wikibugs>	 Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban): VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853 (wiki_willy) Thanks Chris, hopefully this will solve things.
[16:38:37] <XioNoX>	 !log add BGP sessions to Scaleway (AS12876) in esams
[16:38:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:56] <wikibugs>	 (PS1) Jbond: apereo_cas: roll back version to 6.1.0-RC4 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/530608
[16:41:26] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] apereo_cas: roll back version to 6.1.0-RC4 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/530608 (owner: Jbond)
[16:48:13] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/17920/" [puppet] - https://gerrit.wikimedia.org/r/529919 (owner: Giuseppe Lavagetto)
[16:48:22] <wikibugs>	 (PS3) Giuseppe Lavagetto: envoyproxy: support debian jessie [puppet] - https://gerrit.wikimedia.org/r/529919
[16:50:02] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Set up mailing list for Santali Wikipedia - https://phabricator.wikimedia.org/T230435 (Manik87) Dear @CDanis, Please see the below information: Name: R Ashwani Banjan Murmu E-mail: ashwani.murmu@gmail.com  Thanks again for your support.  Manik
[17:01:21] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Set up mailing list for Santali Wikipedia - https://phabricator.wikimedia.org/T230435 (CDanis) Open→Resolved List created!  @Manik87 you should have received an email with your administrator password for the mailing list.  Please also add the mailing list to t...
[17:02:06] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Set up mailing list for Santali Wikipedia - https://phabricator.wikimedia.org/T230435 (CDanis) Oh, also, please note that list administrators are not automatically subscribed to the list -- subscribe yourselves if you want to receive posts.
[17:20:16] <wikibugs>	 (PS1) Urbanecm: Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601)
[17:21:12] <wikibugs>	 (CR) jerkins-bot: [V: -1] Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[17:26:16] <wikibugs>	 (PS2) Urbanecm: Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601)
[17:27:42] <hauskatze>	 Urbanecm: that suppress thing is a flow-specific "thing" I think.
[17:27:50] <hauskatze>	 totally useless
[17:28:05] <hauskatze>	 at least for WMF wikis - its permissions should be on the oversight group indeed
[17:31:54] <wikibugs>	 (PS1) Giuseppe Lavagetto: aptrepo: add distributions-wikimedia as well, conform naming [puppet] - https://gerrit.wikimedia.org/r/530615
[17:33:16] <wikibugs>	 (PS2) Giuseppe Lavagetto: aptrepo: add distributions-wikimedia as well, conform naming [puppet] - https://gerrit.wikimedia.org/r/530615
[17:34:51] <_joe_>	 third time's the charm?
[17:34:54] <wikibugs>	 (PS3) Giuseppe Lavagetto: aptrepo: add distributions-wikimedia as well, conform naming [puppet] - https://gerrit.wikimedia.org/r/530615
[17:35:15] <_joe_>	 apparently not!
[17:36:10] <wikibugs>	 (PS4) Giuseppe Lavagetto: aptrepo: add distributions-wikimedia as well, conform naming [puppet] - https://gerrit.wikimedia.org/r/530615
[17:36:39] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] aptrepo: add distributions-wikimedia as well, conform naming [puppet] - https://gerrit.wikimedia.org/r/530615 (owner: Giuseppe Lavagetto)
[17:40:31] <wikibugs>	 (PS1) Herron: prometheus: add prometheus ipsec exporter service & config [puppet] - https://gerrit.wikimedia.org/r/530616 (https://phabricator.wikimedia.org/T230236)
[17:45:35] <wikibugs>	 (CR) JJMC89: [C: +1] Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[17:45:54] <icinga-wm>	 PROBLEM - Host mr1-codfw.oob is DOWN: PING CRITICAL - Packet loss = 100%
[17:48:39] <wikibugs>	 (CR) Daimona Eaytoy: [C: -1] Assign all rights assigned to suppress group to oversight group (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[17:49:10] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 131 probes of 492 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[17:50:48] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 50 probes of 449 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[17:51:36] <icinga-wm>	 RECOVERY - Host mr1-codfw.oob is UP: PING OK - Packet loss = 0%, RTA = 30.34 ms
[17:54:42] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 4 probes of 492 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[17:56:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 21 probes of 449 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
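The RIPE Atlas checks above compare the number of failed probes against a fixed threshold ("alerts on 35"). A minimal Python sketch of that rule follows. It assumes the check goes CRITICAL once the failed count reaches the threshold; the log does not say whether the real comparison is `>=` or `>`, and `atlas_state` is a hypothetical helper, not the actual Icinga plugin.

```python
def atlas_state(failed: int, total: int, threshold: int = 35) -> str:
    """Build an Icinga-style status line for a RIPE Atlas ping check.

    Hypothetical helper. Assumption: CRITICAL when failed >= threshold;
    the real plugin's comparison operator is not shown in the log.
    """
    state = "CRITICAL" if failed >= threshold else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {threshold})"


# Replay the (failed, total) probe counts that icinga-wm reported in this log.
for failed, total in [(131, 492), (50, 449), (4, 492), (21, 449), (38, 492), (27, 492)]:
    print(atlas_state(failed, total))
```

Every count in this log (131, 50 and 38 CRITICAL; 4, 21 and 27 OK) lands on the same side of 35 as the state icinga-wm reported, so the sketch is consistent with the observed alerts, though it cannot distinguish `>=` from `>`.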
[17:59:22] <cdanis>	 XioNoX: I don't see any maintenances that could be responsible for that codfw blip
[18:00:01] <XioNoX>	 looking
[18:00:22] <XioNoX>	 interesting, even the oob, which is a totally different network
[18:00:34] <wikibugs>	 (PS3) Urbanecm: Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601)
[18:01:07] <wikibugs>	 (CR) Urbanecm: Assign all rights assigned to suppress group to oversight group (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:01:59] <wikibugs>	 (CR) Daimona Eaytoy: [C: +1] Assign all rights assigned to suppress group to oversight group [mediawiki-config] - https://gerrit.wikimedia.org/r/530612 (https://phabricator.wikimedia.org/T230601) (owner: Urbanecm)
[18:02:52] <XioNoX>	 could be a telia network issue, as we go through telia from icinga to mr1-codfw.oob (which is a Cyrus one IP)
[18:08:49] <Urbanecm>	 hauskatze, well, so should we just change suppress to oversight in AbuseFilter extension?
[18:08:51] <Urbanecm>	 (btw, Flow uses oversight rn)
[18:11:12] <hauskatze>	 is it AF this time?
[18:11:15] <hauskatze>	 *facepalm*
[18:12:37] <Urbanecm>	 that doesn't answer my question...
[18:12:38] <Urbanecm>	 ...but there's no one to ask now
[18:14:57] <Bsadowski1>	 Hmm
[18:53:21] <wikibugs>	 (PS1) Eevans: Revert "Deploy 2019-08-14-210839-production Docker image" [deployment-charts] - https://gerrit.wikimedia.org/r/530627 (https://phabricator.wikimedia.org/T229697)
[18:53:23] <wikibugs>	 (PS1) Eevans: Revert "sessionstore: (Temporarily )use HTTP for liveness" [deployment-charts] - https://gerrit.wikimedia.org/r/530628 (https://phabricator.wikimedia.org/T229697)
[18:55:00] <wikibugs>	 (CR) Eevans: "Reverting as discussed." [deployment-charts] - https://gerrit.wikimedia.org/r/530627 (https://phabricator.wikimedia.org/T229697) (owner: Eevans)
[18:55:19] <wikibugs>	 (CR) Eevans: "Reverting as discussed." [deployment-charts] - https://gerrit.wikimedia.org/r/530628 (https://phabricator.wikimedia.org/T229697) (owner: Eevans)
[18:55:44] <wikibugs>	 (CR) Eevans: [V: +2 C: +2] Revert "Deploy 2019-08-14-210839-production Docker image" [deployment-charts] - https://gerrit.wikimedia.org/r/530627 (https://phabricator.wikimedia.org/T229697) (owner: Eevans)
[18:55:54] <wikibugs>	 (CR) Eevans: [V: +2 C: +2] Revert "sessionstore: (Temporarily )use HTTP for liveness" [deployment-charts] - https://gerrit.wikimedia.org/r/530628 (https://phabricator.wikimedia.org/T229697) (owner: Eevans)
[18:57:09] <logmsgbot>	 !log @ helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
[18:57:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:40] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 38 probes of 492 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[19:09:40] <icinga-wm>	 RECOVERY - Check the Netbox report-s- cables for fail status. on netmon1002 is OK: cables.Cables OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[19:12:18] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 27 probes of 492 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[19:13:02] <XioNoX>	 cdanis: digging a bit more, seems like Telia is having issues around the east coast
[19:14:49] <cdanis>	 oh interesting
[19:16:14] <XioNoX>	 the probes failing seem to be in europe
[19:16:49] <XioNoX>	 nothing super clear, but that's all using the measurement link above and doing some reverse mtr to the ones that failed
[19:38:12] <sbassett>	 Hey all - I'd like to deploy sec patch for T230576 (ex:MobileFrontend) now.
[19:48:24] <sbassett>	 !log Deployed security patch for T230576 (ex:MobileFrontend)
[19:48:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:04] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS for production-search-eqiad on elastic1046 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Search
[20:17:22] <icinga-wm>	 PROBLEM - Check size of conntrack table on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[20:17:22] <icinga-wm>	 PROBLEM - Check systemd state on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:17:26] <icinga-wm>	 PROBLEM - configured eth on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[20:17:32] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:17:44] <icinga-wm>	 PROBLEM - dhclient process on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[20:18:00] <icinga-wm>	 PROBLEM - SSH on elastic1046 is CRITICAL: connect to address 10.64.16.70 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
[20:18:06] <icinga-wm>	 PROBLEM - DPKG on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[20:18:08] <icinga-wm>	 PROBLEM - Disk space on elastic1046 is CRITICAL: connect to address 10.64.16.70 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1046&var-datasource=eqiad+prometheus/ops
[20:18:14] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS for production-search-psi-eqiad on elastic1046 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Search
[20:22:21] <onimisionipe>	 ^expired downtime
[20:32:36] <wikibugs>	 (PS5) Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[20:33:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[20:37:59] <wikibugs>	 (PS1) Kosta Harlan: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222)
[20:38:21] <wikibugs>	 (PS2) Kosta Harlan: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222)
[20:39:49] <wikibugs>	 (PS1) Kosta Harlan: Echo: Enable poll for updates feature on enwiki beta [mediawiki-config] - https://gerrit.wikimedia.org/r/530640 (https://phabricator.wikimedia.org/T219222)
[20:42:16] <wikibugs>	 (PS6) Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[20:55:18] <wikibugs>	 (PS7) Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907)
[21:03:48] <wikibugs>	 Operations, ops-ulsfo: refresh/replace scs-ulsfo - https://phabricator.wikimedia.org/T230077 (RobH)
[21:12:54] <wikibugs>	 Operations, observability: automation: issue reminders for about-to-expire downtimes - https://phabricator.wikimedia.org/T230633 (CDanis)
[21:13:00] <wikibugs>	 Operations, observability: automation: issue reminders for about-to-expire downtimes - https://phabricator.wikimedia.org/T230633 (CDanis) p:Triage→Normal
[21:16:33] <wikibugs>	 Operations, Jade, Scoring-platform-team, TechCom, and 4 others: Deploy Jade extension MVP to production - https://phabricator.wikimedia.org/T183381 (Halfak)
[21:30:00] <icinga-wm>	 PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 131.1 ge 130 https://phabricator.wikimedia.org/T202307 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen