[01:05:51] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] "Great idea, looks good" [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[01:19:08] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] "Looks really good!" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[02:32:18] <icinga-wm>	 PROBLEM - PHP opcache health on scandium is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[02:50:40] <icinga-wm>	 RECOVERY - PHP opcache health on scandium is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[03:15:34] <icinga-wm>	 PROBLEM - cassandra-a SSL 10.192.16.85:7001 on restbase2014 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
[03:15:40] <icinga-wm>	 PROBLEM - cassandra-c SSL 10.192.16.87:7001 on restbase2014 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
[03:15:56] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.16.86:7001 on restbase2014 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
[03:16:04] <icinga-wm>	 PROBLEM - Check systemd state on restbase2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:16:04] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.16.86:9042 on restbase2014 is CRITICAL: connect to address 10.192.16.86 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[03:16:10] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.192.16.87:9042 on restbase2014 is CRITICAL: connect to address 10.192.16.87 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[03:16:14] <icinga-wm>	 PROBLEM - cassandra-a service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:16:38] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.192.16.85:9042 on restbase2014 is CRITICAL: connect to address 10.192.16.85 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[03:16:48] <icinga-wm>	 PROBLEM - cassandra-c service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:16:52] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:25:26] <icinga-wm>	 PROBLEM - MD RAID on restbase2014 is CRITICAL: CRITICAL: State: degraded, Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[03:25:27] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on restbase2014 is CRITICAL: CRITICAL: State: degraded, Active: 6, Working: 6, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T250050 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[03:25:30] <wikibugs>	 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (10ops-monitoring-bot)
[03:49:02] <icinga-wm>	 RECOVERY - Check systemd state on restbase2014 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:54:34] <icinga-wm>	 PROBLEM - Check systemd state on restbase2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:19:06] <icinga-wm>	 RECOVERY - cassandra-c service on restbase2014 is OK: OK - cassandra-c is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:24:36] <icinga-wm>	 PROBLEM - cassandra-c service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:33:24] <icinga-wm>	 RECOVERY - snapshot of s5 in eqiad on db1115 is OK: snapshot for s5 at eqiad taken less than 3 days ago and larger than 90 GB: Last one 2020-04-13 03:18:16 from db1102.eqiad.wmnet:3315 (667 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:35:20] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on restbase2014 is CRITICAL: cluster=restbase device=sdd instance=restbase2014:9100 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=restbase2014&var-datasource=codfw+prometheus/ops
[05:18:33] <wikibugs>	 (03PS1) 10Vgutierrez: Release 8.0.7-rc0-1wm1 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588201
[05:19:18] <icinga-wm>	 RECOVERY - cassandra-c service on restbase2014 is OK: OK - cassandra-c is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:19:20] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2014 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:24:46] <icinga-wm>	 PROBLEM - cassandra-c service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:24:48] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:25:22] <vgutierrez>	 !log restart varnish-fe on cp3050
[05:25:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:57] <wikibugs>	 (03PS1) 10Marostegui: clouddb.sql.erb: Add GRANTs file [puppet] - 10https://gerrit.wikimedia.org/r/588202
[05:59:34] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Release 8.0.7-rc0-1wm1 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588201 (owner: 10Vgutierrez)
[06:02:39] <wikibugs>	 10Operations, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (10Marostegui) @Urbanecm I saw you created the database, next time please ping us on this sort of ticket "Prepare and check storage layer" s...
[06:03:02] <marostegui>	  !log Sanitize grwikimedia on db2094:3313 and db1124:3313 - T245912
[06:03:03] <stashbot>	 T245912: Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912
[06:19:40] <icinga-wm>	 RECOVERY - cassandra-c service on restbase2014 is OK: OK - cassandra-c is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:19:42] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2014 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:20:45] <vgutierrez>	 !log upload trafficserver 8.0.7-rc0-1wm1 to apt.wm.o (buster)
[06:20:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:41] <vgutierrez>	 !log upgrade to ats 8.0.7-rc0-1wm1 on cp[4026,4032,5006,5012]
[06:23:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:08] <icinga-wm>	 PROBLEM - cassandra-c service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:25:10] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2014 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:26:25] <wikibugs>	 (03PS1) 10Marostegui: pc[12]008: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/588203 (https://phabricator.wikimedia.org/T247787)
[06:27:32] <elukey>	 ah lovely - for restbase2014
[06:27:34] <elukey>	 java.nio.file.FileSystemException: /srv/sdc4/cassandra-a/data: Input/output error
[06:27:44] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] pc[12]008: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/588203 (https://phabricator.wikimedia.org/T247787) (owner: 10Marostegui)
[06:29:39] <marostegui>	 elukey: happy monday!
[06:31:24] <elukey>	 marostegui: hola :D indeed
[06:31:49] <elukey>	 so /dev/sdc failed, and it seems to be preventing the cassandra instances on restbase2014 from running
[06:34:33] <elukey>	 ah nice https://phabricator.wikimedia.org/T250050
[06:36:19] <elukey>	 !log temporarily stopped puppet on restbase2014 to avoid attempts to start cassandra on each run - T250050
[06:36:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:25] <stashbot>	 T250050: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050
[06:37:43] <elukey>	 acked the alerts to avoid spam in here
[06:37:51] <elukey>	 err in icinga
[06:38:35] <wikibugs>	 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (10elukey) @Eevans this is the weekend of broken cassandra hosts, adding you as FYI :)
[06:50:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1110 T249973', diff saved to https://phabricator.wikimedia.org/P10961 and previous config saved to /var/cache/conftool/dbconfig/20200413-065022-marostegui.json
[06:50:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:29] <stashbot>	 T249973: db1110 has 5 important database drifts that are unique to the host - https://phabricator.wikimedia.org/T249973
[06:51:31] <marostegui>	 !log Deploy schema changes on db1110 - T249973
[06:51:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1110 T249973', diff saved to https://phabricator.wikimedia.org/P10962 and previous config saved to /var/cache/conftool/dbconfig/20200413-071740-marostegui.json
[07:17:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:47] <stashbot>	 T249973: db1110 has 5 important database drifts that are unique to the host - https://phabricator.wikimedia.org/T249973
[07:23:46] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] mediawiki: Document the apache sample hosts [puppet] - 10https://gerrit.wikimedia.org/r/587289 (https://phabricator.wikimedia.org/T244472) (owner: 10Krinkle)
[07:23:59] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Enable res_track_memory in cp1085 [puppet] - 10https://gerrit.wikimedia.org/r/588366 (https://phabricator.wikimedia.org/T249335)
[07:30:15] <wikibugs>	 (03CR) 10Dzahn: icinga: Add git local changes check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588049 (owner: 10CRusnov)
[07:30:30] <wikibugs>	 (03PS2) 10Vgutierrez: ATS: Enable res_track_memory in cp1085 [puppet] - 10https://gerrit.wikimedia.org/r/588366 (https://phabricator.wikimedia.org/T249335)
[07:34:17] <wikibugs>	 (03CR) 10Vgutierrez: "https://puppet-compiler.wmflabs.org/compiler1002/21875/" [puppet] - 10https://gerrit.wikimedia.org/r/588366 (https://phabricator.wikimedia.org/T249335) (owner: 10Vgutierrez)
[07:37:37] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "tried it out locally with the puppet repo. works for me. returns 0 without changes and CRIT when adding an untracked file." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588049 (owner: 10CRusnov)
[07:39:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1092 T232446', diff saved to https://phabricator.wikimedia.org/P10963 and previous config saved to /var/cache/conftool/dbconfig/20200413-073939-marostegui.json
[07:39:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:46] <stashbot>	 T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446
[07:40:11] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "easy to revert if we end up wanting to reinstall them. cleaning up." [puppet] - 10https://gerrit.wikimedia.org/r/585185 (https://phabricator.wikimedia.org/T247780) (owner: 10Dzahn)
[07:40:22] <wikibugs>	 (03PS2) 10Dzahn: DHCP: remove mw1254-mw1258 [puppet] - 10https://gerrit.wikimedia.org/r/585185 (https://phabricator.wikimedia.org/T247780)
[07:40:40] <vgutierrez>	 !log rolling upgrade to ats 8.0.7-rc0-1wm1 in ulsfo
[07:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Temporary pool db1111 in s8 API', diff saved to https://phabricator.wikimedia.org/P10964 and previous config saved to /var/cache/conftool/dbconfig/20200413-074158-marostegui.json
[07:42:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:23] <marostegui>	 !log Compress db1092 T232446
[07:43:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:11] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ATS: Enable res_track_memory in cp1085 [puppet] - 10https://gerrit.wikimedia.org/r/588366 (https://phabricator.wikimedia.org/T249335) (owner: 10Vgutierrez)
[07:50:06] <vgutierrez>	 !log enable memory tracking in ats-tls on cp1085 - T249335
[07:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:13] <stashbot>	 T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[07:51:54] <wikibugs>	 10Operations: Netbox report accounting icinga alert - https://phabricator.wikimedia.org/T250053 (10ayounsi) p:05Triage→03Low
[07:55:58] <wikibugs>	 10Operations, 10ops-eqiad, 10ops-ulsfo: Netbox report coherence_rack Icinga alert - https://phabricator.wikimedia.org/T250054 (10ayounsi) p:05Triage→03Low
[07:59:41] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 (10Vgutierrez) res_memory_tracking should help a lot, this is an example report @ cp1085 after restarting ats-tls: `     Allocated      |        In-Use      | Type Size  |   Free List Name --...
[08:15:06] <marostegui>	 !log Remove grants for haproxy@10.64.37.15 from labsdb hosts T231280
[08:15:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:14] <stashbot>	 T231280: Remove grants for the old dbproxy hosts from the misc databases - https://phabricator.wikimedia.org/T231280
[08:18:07] <wikibugs>	 (03PS3) 10Elukey: kafkatee::instance: add types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/588086
[08:18:09] <wikibugs>	 (03PS4) 10Elukey: Enable TLS encryption between kafkatee instances and Kafka [puppet] - 10https://gerrit.wikimedia.org/r/588015
[08:22:21] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::kafkatee::webrequest::analytics: use ssl_array for Kafka Brokers [puppet] - 10https://gerrit.wikimedia.org/r/588085 (owner: 10Elukey)
[08:24:25] <wikibugs>	 (03CR) 10Elukey: kafkatee::instance: add types to parameters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588086 (owner: 10Elukey)
[09:26:03] <wikibugs>	 (03PS1) 10QChris: Add .gitreview [software/purged] - 10https://gerrit.wikimedia.org/r/588373
[09:26:05] <wikibugs>	 (03CR) 10QChris: [V: 03+2 C: 03+2] Add .gitreview [software/purged] - 10https://gerrit.wikimedia.org/r/588373 (owner: 10QChris)
[09:26:37] <mutante>	 ema: ^ there's your new repo
[09:31:30] <wikibugs>	 (03CR) 10Ayounsi: "I didn't finish the review as being able to have the same VIP on multiple devices might make the rest of my review useless." (0311 comments) [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: 10CRusnov)
[09:38:33] <wikibugs>	 (03PS1) 10Dzahn: cloud/devtools: set profile::tlsproxy::envoy::capitalize_headers [puppet] - 10https://gerrit.wikimedia.org/r/588375
[09:45:08] <wikibugs>	 10Operations, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (10Urbanecm) Upps, sorry, will do next time @Marostegui!
[09:45:52] <wikibugs>	 10Operations, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (10Marostegui) No problem! Can you check my comment at: T245911#6051001 Thank you!
[09:47:41] <Urbanecm>	 !log mwscript createAndPromote.php --wiki=grwikimedia --force Gerakiw <redacted> (T245911)
[09:47:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:48] <stashbot>	 T245911: Create a wiki for Wikimedia Community User Group Greece - https://phabricator.wikimedia.org/T245911
[09:52:19] <Urbanecm>	 !log Rename user account Gerakiw@grwikimedia to Geraki@grwikimedia (T245911)
[09:52:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:57] <wikibugs>	 10Operations, 10ops-eqiad, 10ops-ulsfo, 10DC-Ops: Netbox report coherence_rack Icinga alert - https://phabricator.wikimedia.org/T250054 (10faidon)
[09:55:59] <wikibugs>	 10Operations, 10DC-Ops: Netbox report accounting icinga alert - https://phabricator.wikimedia.org/T250053 (10faidon)
[09:56:50] <wikibugs>	 10Operations, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (10Marostegui) #cloud-services-team this is ready for the views creation on labsdb1009, 1010, 1011 and 1012. I have run this: `set session s...
[10:00:38] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] sbuild: introduce module and use it in toolforge package builder [puppet] - 10https://gerrit.wikimedia.org/r/587991 (https://phabricator.wikimedia.org/T249837) (owner: 10Arturo Borrero Gonzalez)
[10:07:34] <revi>	 Is this... normal? (I was trying to do `pwb.py claimit`) https://www.irccloud.com/pastebin/tlmsyyY3/wikidata-log.txt
[10:08:17] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "unbreak puppet on phabricator instances in cloud VPS devtools" [puppet] - 10https://gerrit.wikimedia.org/r/588375 (owner: 10Dzahn)
[10:08:46] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:10:21] <mutante>	 can't confirm. 0 units failed
[10:11:02] <mutante>	 ah, i can
[10:11:29] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: legacy URLs: use HTTP 307/308 for the redirects [puppet] - 10https://gerrit.wikimedia.org/r/588380 (https://phabricator.wikimedia.org/T249843)
[10:12:01] <mutante>	 !log mwmaint1002 - sudo systemctl status mediawiki_job_translationnotifications-mediawikiwiki.service
[10:12:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:23] <wikibugs>	 10Operations, 10Wikimedia-General-or-Unknown, 10Readers-Web-Backlog (Needs Product Owner Decisions), 10SEO: Yoruba Language Wikipedia not being indexed by search engines - https://phabricator.wikimedia.org/T236241 (10ovasileva) >>! In T236241#6047510, @Aklapper wrote: > @ovasileva: Could you please check t...
[10:12:24] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:16:40] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: legacy URLs: use HTTP 307/308 for the redirects [puppet] - 10https://gerrit.wikimedia.org/r/588380 (https://phabricator.wikimedia.org/T249843) (owner: 10Arturo Borrero Gonzalez)
[10:19:14] <marostegui>	 !log Kill updateSpecialPages.php --only=Fewestrevisions for s8 on mwmaint1002; the vslow host is lagging and creating errors
[10:19:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:58] <mutante>	 !log depooled wdqs1004 by request because of high lag
[10:25:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:23] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588383 (https://phabricator.wikimedia.org/T128546)
[10:30:04] <jouncebot>	 jan_drewniak: Your horoscope predicts another unfortunate Wikimedia Portals Update deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T1030).
[10:31:35] <wikibugs>	 (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588383 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[10:32:33] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588383 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[10:33:38] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] Make totp profile parameters optional [puppet] - 10https://gerrit.wikimedia.org/r/587714 (owner: 10Muehlenhoff)
[10:36:33] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:588383| Bumping portals to master (563985)]] (duration: 01m 00s)
[10:36:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:32] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:588383| Bumping portals to master (563985)]] (duration: 00m 58s)
[10:37:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:00] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/587726 (owner: 10Elukey)
[10:56:20] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "thanks, will merge" [puppet] - 10https://gerrit.wikimedia.org/r/587988 (owner: 10Hashar)
[10:57:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "thanks will merge" [puppet] - 10https://gerrit.wikimedia.org/r/587989 (owner: 10Hashar)
[10:58:05] <wikibugs>	 (03PS2) 10Jbond: admin: enhance test output for groups GID [puppet] - 10https://gerrit.wikimedia.org/r/587989 (owner: 10Hashar)
[10:58:38] <wikibugs>	 (03PS2) 10Jbond: admin: show gid in gid test error [puppet] - 10https://gerrit.wikimedia.org/r/587990 (owner: 10Hashar)
[10:59:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "thanks will merge" [puppet] - 10https://gerrit.wikimedia.org/r/587990 (owner: 10Hashar)
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: Dear deployers, time to do the European Mid-day SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T1100).
[11:00:04] <jouncebot>	 Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:08] <Zoranzoki21>	 Here :)
[11:02:42] <Zoranzoki21>	 P.S. I have only one patch, and it doesn't need mwdebug ;)
[11:11:11] <Urbanecm>	 Zoranzoki21: I can SWAT today!
[11:11:19] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584615 (https://phabricator.wikimedia.org/T248860) (owner: 10Zoranzoki21)
[11:12:13] <wikibugs>	 (03Merged) 10jenkins-bot: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584615 (https://phabricator.wikimedia.org/T248860) (owner: 10Zoranzoki21)
[11:13:27] <Urbanecm>	 Zoranzoki21: syncing
[11:14:21] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: efe2feb: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki (T248860) (duration: 00m 58s)
[11:14:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:28] <stashbot>	 T248860: Disable indexing user (sub)pages and drafts on Serbian Wikipedia - https://phabricator.wikimedia.org/T248860
[11:15:26] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: efe2feb: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki (T248860; take II) (duration: 00m 58s)
[11:15:29] <Zoranzoki21>	 Cool
[11:15:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:34] <Urbanecm>	 Zoranzoki21: should be all done :)
[11:15:59] <Zoranzoki21>	 Yes, thanks Urbanecm.
[11:16:03] <Urbanecm>	 yw
[11:19:41] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: admin: Deduplicate defaults.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/581507
[11:19:43] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: admin: deduplicate main helmfile.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/581656
[11:19:45] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: admin/namespace: Deduplicate all helmfile templates [deployment-charts] - 10https://gerrit.wikimedia.org/r/581657
[11:19:47] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: admin: Default to sensible values for deploUser, namespaceName [deployment-charts] - 10https://gerrit.wikimedia.org/r/581658
[11:19:49] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: admin: Remove all override files [deployment-charts] - 10https://gerrit.wikimedia.org/r/581748
[11:29:29] <wikibugs>	 (03PS1) 10Jbond: admin: show uid in uid test error [puppet] - 10https://gerrit.wikimedia.org/r/588387
[11:30:49] <wikibugs>	 (03CR) 10Cparle: [C: 03+1] MachineVision: Add MachineVisionWithholdImageList config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588053 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[11:45:11] <wikibugs>	 10Operations, 10netops, 10User-jbond: Sporatic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (10jbond) >>! In T238823#5681406, @akosiaris wrote: > Could be totally different but with @jijiki we 've seen this behavior elsewhere as well. The latest installment is T238789. Per that log...
[11:47:10] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:50:48] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:53:13] <marostegui>	 !log Deploy schema change on codfw master (lag will appear on codfw) - T250062
[11:53:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:19] <stashbot>	 T250062: ipb_parent_block_id_2 index on ipblocks table on s8 only - https://phabricator.wikimedia.org/T250062
[11:53:29] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
[11:53:35] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
[11:53:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:40] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
[11:53:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:54] <marostegui>	 !log Deploy schema change on eqiad s8 hosts - T250062
[11:57:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:24] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
[11:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:48] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] admin: Deduplicate defaults.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/581507 (owner: 10Alexandros Kosiaris)
[12:04:13] <wikibugs>	 (03Merged) 10jenkins-bot: admin: Deduplicate defaults.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/581507 (owner: 10Alexandros Kosiaris)
[12:12:05] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "Some comments. Mostly about datatypes, the [0] syntax and lookup()." (0317 comments) [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[12:12:21] <vgutierrez>	 !log rolling upgrade to ats 8.0.7-rc0-1wm1 in eqsin and codfw
[12:12:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:21] <wikibugs>	 (03PS1) 10ArielGlenn: fix listing of input files for 7z recompression [dumps] - 10https://gerrit.wikimedia.org/r/588393 (https://phabricator.wikimedia.org/T250018)
[12:31:59] <wikibugs>	 (03PS5) 10Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939)
[12:39:04] <wikibugs>	 10Operations, 10Patch-For-Review, 10User-jbond: Wikimedia theme for SSO login page - https://phabricator.wikimedia.org/T233939 (10jbond) >>! In T233939#6041881, @Volker_E wrote: > We're following WCAG 2.0 level AA color contrast ratios, so something like a placeholder text color needs to provide 4.5:1 contra...
[12:39:14] <wikibugs>	 (03CR) 10Jbond: "thanks see inline" (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond)
[13:04:24] <wikibugs>	 (03PS1) 10Vgutierrez: Release 8.0.7-rc0-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588399 (https://phabricator.wikimedia.org/T249335)
[13:04:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Release 8.0.7-rc0-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588399 (https://phabricator.wikimedia.org/T249335) (owner: 10Vgutierrez)
[13:08:25] <wikibugs>	 (03CR) 10Andrew Bogott: "Hello all!" [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[13:08:50] <wikibugs>	 (03PS2) 10Vgutierrez: Release 8.0.7-rc0-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588399 (https://phabricator.wikimedia.org/T249335)
[13:15:00] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:18:42] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:25:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Release 8.0.7-rc0-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588399 (https://phabricator.wikimedia.org/T249335) (owner: 10Vgutierrez)
[13:26:03] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Heavily amend Description: field [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588404
[13:28:17] <wikibugs>	 (03PS3) 10Vgutierrez: Release 8.0.7-rc0-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/588399 (https://phabricator.wikimedia.org/T249335)
[13:30:21] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Heavily amend Description: field [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588404 (owner: 10Alexandros Kosiaris)
[13:31:35] <wikibugs>	 (03Merged) 10jenkins-bot: Heavily amend Description: field [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588404 (owner: 10Alexandros Kosiaris)
[13:33:18] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:33:32] <wikibugs>	 (03PS1) 10Mholloway: MachineVision: Add MachineVisionWithholdImageList config (Beta) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588407 (https://phabricator.wikimedia.org/T249939)
[13:34:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] MachineVision: Add MachineVisionWithholdImageList config (Beta) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588407 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[13:35:48] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:36:56] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:37:38] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:37:47] <wikibugs>	 (03PS2) 10Mholloway: MachineVision: Add MachineVisionWithholdImageList config (Beta) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588407 (https://phabricator.wikimedia.org/T249939)
[13:39:54] <wikibugs>	 (03CR) 10Mholloway: [C: 03+2] MachineVision: Add MachineVisionWithholdImageList config (Beta) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588407 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[13:40:47] <wikibugs>	 (03Merged) 10jenkins-bot: MachineVision: Add MachineVisionWithholdImageList config (Beta) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588407 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[13:41:32] <wikibugs>	 (03CR) 10Ottomata: "COOL!  Does this mean we can access this as .Values.puppet_ca_crt?" [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan)
[13:43:09] <wikibugs>	 (03CR) 10Ottomata: "Interesting!  Does writing even work?  We might want to allow people to write results somewhere, no? Perhaps to their own Hive DBs?" [puppet] - 10https://gerrit.wikimedia.org/r/588073 (owner: 10Elukey)
[13:48:29] <wikibugs>	 (03PS1) 10Jhedden: openstack: increase labweb memcached size [puppet] - 10https://gerrit.wikimedia.org/r/588411 (https://phabricator.wikimedia.org/T145703)
[13:51:55] <wikibugs>	 (03CR) 10Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1001/21877/labweb1002.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/588411 (https://phabricator.wikimedia.org/T145703) (owner: 10Jhedden)
[13:56:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "> Interesting!  Does writing even work?  We might want to allow" [puppet] - 10https://gerrit.wikimedia.org/r/588073 (owner: 10Elukey)
[13:58:39] <wikibugs>	 (03PS2) 10ArielGlenn: fix listing of input files for 7z recompression [dumps] - 10https://gerrit.wikimedia.org/r/588393 (https://phabricator.wikimedia.org/T250018)
[14:02:36] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:04:26] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:16:51] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Revert "Ignore quilt dir .pc via .gitignore" [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588413
[14:17:20] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Fix debian/copyright [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588414
[14:17:22] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415
[14:17:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[14:18:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "recheck" [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[14:18:42] <cdanis>	 kormat: 👀
[14:21:27] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Revert "Ignore quilt dir .pc via .gitignore" [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588413 (owner: 10Alexandros Kosiaris)
[14:21:40] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Fix debian/copyright [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588414 (owner: 10Alexandros Kosiaris)
[14:21:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Fix debian/copyright [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588414 (owner: 10Alexandros Kosiaris)
[14:21:51] <marostegui>	 cdanis: XDDD
[14:29:41] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Fix debian/copyright [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588414 (owner: 10Alexandros Kosiaris)
[14:43:15] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415
[14:43:17] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Merge branch 'master' into buster-wikimedia [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588418
[14:43:53] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: Replace pykube with a custom API client (033 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis)
[14:43:55] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[14:44:27] <wikibugs>	 (03PS3) 10Jbond: profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - 10https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792)
[14:44:29] <wikibugs>	 (03PS1) 10Jbond: profile::mail::mx: add type enforcment, lookups and move defaults [puppet] - 10https://gerrit.wikimedia.org/r/588419 (https://phabricator.wikimedia.org/T244792)
[14:44:31] <wikibugs>	 (03PS1) 10Jbond: profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792)
[14:45:09] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "Harmless at worst" [puppet] - 10https://gerrit.wikimedia.org/r/588411 (https://phabricator.wikimedia.org/T145703) (owner: 10Jhedden)
[14:48:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - 10https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond)
[14:49:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond)
[14:50:30] <wikibugs>	 (03PS4) 10Jbond: profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - 10https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792)
[14:51:08] <wikibugs>	 (03PS2) 10Jbond: profile::mail::mx: add type enforcment, lookups and move defaults [puppet] - 10https://gerrit.wikimedia.org/r/588419 (https://phabricator.wikimedia.org/T244792)
[14:51:49] <wikibugs>	 (03PS2) 10Jhedden: openstack: increase labweb memcached size [puppet] - 10https://gerrit.wikimedia.org/r/588411 (https://phabricator.wikimedia.org/T145703)
[14:52:32] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415
[14:52:51] <wikibugs>	 (03PS3) 10Jbond: profile::mail::mx: add type enforcment, lookups and move defaults [puppet] - 10https://gerrit.wikimedia.org/r/588419 (https://phabricator.wikimedia.org/T244792)
[14:53:03] <wikibugs>	 (03PS2) 10Jbond: profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792)
[14:53:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[14:56:37] <wikibugs>	 (03CR) 10Jhedden: [C: 03+2] openstack: increase labweb memcached size [puppet] - 10https://gerrit.wikimedia.org/r/588411 (https://phabricator.wikimedia.org/T145703) (owner: 10Jhedden)
[14:58:16] <wikibugs>	 10Operations, 10ops-ulsfo: update rack location of decom wmf5801 - https://phabricator.wikimedia.org/T249287 (10RobH)
[14:58:18] <wikibugs>	 10Operations, 10ops-eqiad, 10ops-ulsfo, 10DC-Ops: Netbox report coherence_rack Icinga alert - https://phabricator.wikimedia.org/T250054 (10RobH)
[14:58:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond)
[15:01:27] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "thanks for the cleanup!" [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[15:02:37] <wikibugs>	 (03PS3) 10Jbond: profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792)
[15:09:52] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] designate: change api_base_uri to proper HA endpoint [puppet] - 10https://gerrit.wikimedia.org/r/588176 (owner: 10Andrew Bogott)
[15:10:06] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:10:06] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on scb1001 is CRITICAL: 10 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1001&var-datasource=eqiad+prometheus/ops
[15:10:44] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Cache, 10Performance-Team, 10Traffic: Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (10Krinkle)
[15:10:49] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Cache, 10Performance-Team, 10Traffic: Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (10Krinkle) a:05Krinkle→03None
[15:13:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:14:04] <wikibugs>	 (03CR) 10BryanDavis: Replace pykube with a custom API client (033 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis)
[15:16:13] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415
[15:16:16] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:16:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[15:17:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:20:44] <wikibugs>	 (03PS1) 10Jbond: role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425
[15:22:33] <wikibugs>	 (03PS2) 10Jbond: role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792)
[15:23:03] <wikibugs>	 (03PS23) 10Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[15:23:06] <wikibugs>	 (03PS1) 10Andrew Bogott: Designate: remove the coordination_host param [puppet] - 10https://gerrit.wikimedia.org/r/588426 (https://phabricator.wikimedia.org/T250087)
[15:23:34] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:24:17] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] "I have self -1 as this needs review from kieth or someone else familiar with exim to correct my copy/paste" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond)
[15:24:42] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:27:14] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:28:40] <wikibugs>	 (03CR) 10Krinkle: apereo_cas: update templates login page (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond)
[15:30:06] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:32:39] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:33:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:34:25] <wikibugs>	 (03PS4) 10Jbond: profile::mail::mx: add type enforcment, lookups and move defaults [puppet] - 10https://gerrit.wikimedia.org/r/588419 (https://phabricator.wikimedia.org/T244792)
[15:34:53] <wikibugs>	 10Operations, 10DC-Ops, 10cloud-services-team (Kanban): labstore1005 A PCIe link training failure error on boot - https://phabricator.wikimedia.org/T169286 (10RobH) a:03Bstorm Please note this was NOT in ops-eqiad, and was likely being overlooked by onsites in eqiad due to that reason.  (It also is not ass...
[15:35:03] <wikibugs>	 (03PS4) 10Jbond: profile::mail::mx: Add toggle to enable jumpcloud integration [puppet] - 10https://gerrit.wikimedia.org/r/588420 (https://phabricator.wikimedia.org/T244792)
[15:36:16] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:36:38] <wikibugs>	 10Operations, 10DC-Ops: document all scs connections - https://phabricator.wikimedia.org/T175876 (10RobH) 05Open→03Resolved a:03RobH quick review of the scs devices on https://netbox.wikimedia.org/dcim/devices/?q=scs&status=active&mac_address=&has_primary_ip=&local_context_data=&virtual_chassis_member=&c...
[15:36:40] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625 (10RobH)
[15:38:35] <wikibugs>	 10Operations, 10DC-Ops: Wipe of spare/replacement disks - https://phabricator.wikimedia.org/T166368 (10RobH) This task is over a year old (should we resolve/reject it?)  Please note that we no longer require all disks be wiped before decom (just reuse), as we physically destroy all disks now.  I don't think ha...
[15:39:38] <wikibugs>	 (03PS6) 10Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939)
[15:39:56] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:40:25] <wikibugs>	 10Operations, 10netops, 10User-jbond: Sporatic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (10akosiaris) >>! In T238823#6051460, @jbond wrote: >>>! In T238823#5681406, @akosiaris wrote: >> Could be totally different but with @jijiki we 've seen this behavior elsewhere as well. The...
[15:40:49] <wikibugs>	 (03CR) 10Jbond: apereo_cas: update templates login page (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond)
[15:44:06] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/588419 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond)
[15:44:42] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:45:26] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:46:22] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:47:20] <wikibugs>	 (03PS3) 10CRusnov: icinga: Add git local changes check [puppet] - 10https://gerrit.wikimedia.org/r/588049
[15:48:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:48:49] <wikibugs>	 10Operations, 10OTRS, 10User-notice: Update OTRS to the latest stable version (6.x.x) - https://phabricator.wikimedia.org/T187984 (10akosiaris) >>! In T187984#6048219, @Gryllida wrote: > @akosiaris Thank you for volunteering with this task. Are you still interested? How has the situation changed in the last...
[15:49:20] <wikibugs>	 (03CR) 10CRusnov: [C: 03+2] icinga: Add git local changes check [puppet] - 10https://gerrit.wikimedia.org/r/588049 (owner: 10CRusnov)
[15:50:08] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:50:52] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:56:12] <marostegui>	 !log Deploy schema change on s4 codfw master - T250067
[15:56:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:18] <stashbot>	 T250067: user_newtalk has two indexes not renamed in s4 - https://phabricator.wikimedia.org/T250067
[15:56:18] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:56:40] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] "LGTM overall, some questions" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/588049 (owner: 10CRusnov)
[15:57:24] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:01:04] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:01:46] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:05:22] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:07:10] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:09:19] <wikibugs>	 (03PS1) 10CDanis: WIP: add NIC saturation exporter [puppet] - 10https://gerrit.wikimedia.org/r/588431
[16:14:11] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415
[16:15:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/588415 (owner: 10Alexandros Kosiaris)
[16:16:18] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:19:12] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:19:22] <wikibugs>	 (03PS4) 10Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - 10https://gerrit.wikimedia.org/r/588176
[16:21:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:21:44] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:25:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] designate: change api_base_uri to proper HA endpoint [puppet] - 10https://gerrit.wikimedia.org/r/588176 (owner: 10Andrew Bogott)
[16:28:56] <icinga-wm>	 RECOVERY - Host ps1-c6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.80 ms
[16:29:00] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:29:50] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[16:30:06] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:30:44] <icinga-wm>	 RECOVERY - Host mw1335.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.54 ms
[16:30:56] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
[16:31:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:21] <cmjohnson1>	 !log replacing msw-c6-eqiad
[16:31:24] <icinga-wm>	 RECOVERY - Host db1134.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms
[16:31:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:41] <XioNoX>	 !log Sample all inbound v6 traffic on cr2-eqsin
[16:31:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:38] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:32:57] <wikibugs>	 (PS10) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[16:33:18] <icinga-wm>	 RECOVERY - Host mw1329.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[16:33:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[16:33:36] <icinga-wm>	 RECOVERY - Host mw1331.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[16:33:45] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[16:33:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:24] <icinga-wm>	 RECOVERY - Host mw1322.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[16:34:24] <icinga-wm>	 RECOVERY - Host mw1336.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[16:34:42] <icinga-wm>	 RECOVERY - Host mw1345.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.49 ms
[16:34:46] <icinga-wm>	 RECOVERY - Host mw1325.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.45 ms
[16:34:46] <icinga-wm>	 RECOVERY - Host mw1339.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[16:34:48] <icinga-wm>	 RECOVERY - Host mw1340.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms
[16:34:49] <icinga-wm>	 RECOVERY - Host mw1323.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.70 ms
[16:34:49] <icinga-wm>	 RECOVERY - Host mw1326.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms
[16:34:54] <icinga-wm>	 RECOVERY - Host mw1333.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[16:34:58] <icinga-wm>	 RECOVERY - Host mw1330.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[16:34:58] <icinga-wm>	 RECOVERY - Host mw1320.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[16:34:58] <icinga-wm>	 RECOVERY - Host mw1332.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[16:34:58] <icinga-wm>	 RECOVERY - Host mw1328.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.48 ms
[16:34:58] <icinga-wm>	 RECOVERY - Host mw1342.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.53 ms
[16:34:59] <icinga-wm>	 RECOVERY - Host mw1344.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms
[16:35:28] <icinga-wm>	 RECOVERY - Host mw1337.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.79 ms
[16:35:37] <XioNoX>	 wooooo
[16:35:42] <XioNoX>	 thanks cmjohnson1 !
[16:35:42] <icinga-wm>	 RECOVERY - Host mw1343.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.42 ms
[16:35:44] <icinga-wm>	 RECOVERY - Host mw1321.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.61 ms
[16:35:48] <icinga-wm>	 RECOVERY - Host mw1319.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms
[16:35:48] <icinga-wm>	 RECOVERY - Host mw1327.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[16:36:05] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
[16:36:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:12] <icinga-wm>	 RECOVERY - Host mw1324.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms
[16:36:12] <icinga-wm>	 RECOVERY - Host mw1334.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[16:36:12] <icinga-wm>	 RECOVERY - Host mw1348.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.89 ms
[16:36:12] <icinga-wm>	 RECOVERY - Host mw1346.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.73 ms
[16:36:12] <icinga-wm>	 RECOVERY - Host mw1338.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.52 ms
[16:36:13] <icinga-wm>	 RECOVERY - Host mw1347.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[16:36:19] <XioNoX>	 !log sample before any other border-in terms in eqsin
[16:36:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:16] <icinga-wm>	 RECOVERY - Host wdqs1010.mgmt is UP: PING OK - Packet loss = 0%, RTA = 32.64 ms
[16:37:23] <apergos>	 wut
[16:37:24] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:38:38] <icinga-wm>	 RECOVERY - Host bast1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.45 ms
[16:39:19] <wikibugs>	 (CR) Andrew Bogott: [C: +2] designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[16:39:26] <icinga-wm>	 RECOVERY - Host mw1341.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[16:40:36] <icinga-wm>	 RECOVERY - Host db1121.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.79 ms
[16:42:52] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:45:24] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:46:13] <XioNoX>	 !log sample before any other border-in terms in ulsfo
[16:46:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:30] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:47:27] <apergos>	 what are all the mgmt up alerts, does anyone know?
[16:48:03] <XioNoX>	 apergos: dead mgmt switch got replaced
[16:48:18] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:48:19] <apergos>	 aaahhh
[16:48:22] <apergos>	 excellent
[16:48:28] <apergos>	 tyvm!
[16:48:39] <wikibugs>	 Operations, ops-eqiad, DC-Ops: Netbox report accounting icinga alert - https://phabricator.wikimedia.org/T250053 (wiki_willy) a:Jclark-ctr Hey @Jclark-ctr - per our conversation from last Thursday, can you work on fixing these following Netbox errors for eqiad when you go onsite this week?  https...
[16:49:53] <wikibugs>	 Operations, ops-eqiad, ops-ulsfo, DC-Ops: Netbox report coherence_rack Icinga alert - https://phabricator.wikimedia.org/T250054 (wiki_willy) a:wiki_willy
[16:50:04] <wikibugs>	 (PS1) Andrew Bogott: cloudservices: add a missing ) in a ferm rule [puppet] - https://gerrit.wikimedia.org/r/588434 (https://phabricator.wikimedia.org/T249941)
[16:50:17] <wikibugs>	 Operations, ops-eqiad: msw-a2-eqiad missing from Netbox - https://phabricator.wikimedia.org/T249685 (Cmjohnson) Open→Resolved Very odd, I fixed this
[16:50:27] <XioNoX>	 !log sample before any other border-in terms in dfw
[16:50:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:52] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:53:46] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:54:28] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:55:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] cloudservices: add a missing ) in a ferm rule [puppet] - https://gerrit.wikimedia.org/r/588434 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[16:55:44] <wikibugs>	 Operations, ops-eqiad, netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (Cmjohnson) Open→Resolved replaced the management switch today and updated netbox with new information, keeping the same name. changed the old one to msw-c6-eqiad-old and set status to decomm...
[16:56:07] <wikibugs>	 (PS2) Andrew Bogott: cloudservices: add a missing ) in a ferm rule [puppet] - https://gerrit.wikimedia.org/r/588434 (https://phabricator.wikimedia.org/T249941)
[16:56:16] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:57:00] <XioNoX>	 !log sample before any other border-in terms in esams
[16:57:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:14] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:59:56] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[17:00:04] <jouncebot>	 gehel and onimisionipe: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T1700).
[17:00:37] <wikibugs>	 Operations, ops-eqiad, DC-Ops: ganeti1011.mgmt is un-configured (was: Puppet resolves wrong IP for Icinga host config) - https://phabricator.wikimedia.org/T249314 (Cmjohnson) Stalled→Resolved the new management switch fixed the issue.
[17:00:40] <wikibugs>	 Operations, ops-eqiad, netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (Cmjohnson)
[17:01:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:01:13] <XioNoX>	 !log sample before any other border-in terms in eqiad
[17:01:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:30] <wikibugs>	 (CR) Ayounsi: [C: +2] Sample all inbound traffic [homer/public] - https://gerrit.wikimedia.org/r/577316 (https://phabricator.wikimedia.org/T246618) (owner: Ayounsi)
[17:01:38] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudservices: add a missing ) in a ferm rule [puppet] - https://gerrit.wikimedia.org/r/588434 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[17:01:46] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:02:44] <icinga-wm>	 PROBLEM - Check systemd state on cescout1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:09:31] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (Cmjohnson) I swapped the DIMM A side to B side to see if error disappears or presents itself on the same slot or if it followed the DIMM
[17:10:50] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:11:22] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (Cmjohnson) I cleared the log, this is a paste of the error  Record:      17 Date/Time:   04/12/2020 06:30:30 Source:      system Severity:    Critical Description: Multi-bit memory errors detec...
[17:11:43] <wikibugs>	 (PS1) Andrew Bogott: openstack: increase labweb memcached size for codfw1dev [puppet] - https://gerrit.wikimedia.org/r/588439 (https://phabricator.wikimedia.org/T145703)
[17:11:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:15:02] <RhinosF1>	 gehel: around?
[17:15:12] <wikibugs>	 (PS2) Andrew Bogott: Designate: remove the coordination_host param [puppet] - https://gerrit.wikimedia.org/r/588426 (https://phabricator.wikimedia.org/T250087)
[17:15:14] <wikibugs>	 (PS24) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[17:15:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:16:18] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:16:42] <wikibugs>	 Operations, DC-Ops: Wipe of spare/replacement disks - https://phabricator.wikimedia.org/T166368 (faidon) If I understand it correctly, this task is specifically about a box that was returned to the spare pool and then was reallocated for a new purpose but kept its old data. We should //definitely// wipe...
[17:18:40] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:19:59] <wikibugs>	 Operations, DC-Ops, SRE-Access-Requests: access request on cumin[1-2]001 for John Clark - https://phabricator.wikimedia.org/T249916 (wiki_willy)
[17:21:46] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[17:23:54] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:24:10] <icinga-wm>	 RECOVERY - DNS on ganeti1011.mgmt is OK: DNS OK: 0.011 seconds response time. ganeti1011.mgmt.eqiad.wmnet returns 10.65.5.106 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:26:06] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:26:50] <wikibugs>	 Operations, DC-Ops: determine/process/document bios firmware tracking/updating policies - https://phabricator.wikimedia.org/T141128 (wiki_willy) a:wiki_willy
[17:27:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:28:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:29:46] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:31:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:34:18] <gehel>	 RhinosF1: public holiday here. I'll be back tomorrow
[17:34:51] <RhinosF1>	 gehel: Sorted, was about to ask about one of them patches but saw another email saying they were both deployed
[17:35:54] <gehel>	 RhinosF1: good! Ping me if there is any issue with that federation !
[17:36:31] <wikibugs>	 (PS3) Mholloway: MachineVision: Add MachineVisionWithholdImageList config [mediawiki-config] - https://gerrit.wikimedia.org/r/588053 (https://phabricator.wikimedia.org/T249939)
[17:36:36] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:36:44] <RhinosF1>	 gehel: I’ll let you know if I hear anything
[17:37:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:38:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:39:26] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:39:40] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[17:39:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:40:28] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:40:33] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (DannyS712)
[17:41:01] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (DannyS712)
[17:41:13] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (RhinosF1) Caused by a reboot to fix T246577 (again)
[17:41:16] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[17:41:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:42:38] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (RhinosF1) Open→Resolved
[17:42:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:43:58] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (Jdforrester-WMF) >>! In T250103#6052721, @RhinosF1 wrote: > Caused by a reboot to fix T246577 (again)  No? This task was *why* I restarted the instance.
[17:44:41] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (RhinosF1) >>! In T250103#6052734, @Jdforrester-WMF wrote: >>>! In T250103#6052721, @RhinosF1 wrote: >> Caused by a reboot to fix T246577 (again) >  > No? This t...
[17:45:04] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:45:22] <icinga-wm>	 RECOVERY - Check systemd state on cescout1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:47:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:48:36] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:49:59] <wikibugs>	 Operations, DC-Ops: Wipe of spare/replacement disks - https://phabricator.wikimedia.org/T166368 (RobH) >>! In T166368#6052598, @faidon wrote: > If I understand it correctly, this task is specifically about a box that was returned to the spare pool and then was reallocated for a new purpose but kept its o...
[17:50:06] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:51:50] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:52:40] <wikibugs>	 Operations, DC-Ops: Wipe of spare/replacement disks - https://phabricator.wikimedia.org/T166368 (wiki_willy) Hi @faidon - from our last conversation around this topic during the all-hands, if the onsite shredding was successful on March 20, then we could proceed with onsite shredding over drive wiping fo...
[17:54:12] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:54:48] <icinga-wm>	 RECOVERY - Host restbase1025 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:55:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[17:56:30] <RhinosF1>	 jouncebot: next
[17:56:30] <jouncebot>	 In 0 hour(s) and 3 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T1800)
[17:58:10] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:58:37] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (elukey) The host seems up, but the following is listed in dmesg:  ` [Mon Apr 13 17:54:20 2020] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [Mon Apr 13 17:54:2...
[18:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T1800). Please do the needful.
[18:00:04] <jouncebot>	 awight and niedzielski: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:31] <awight>	 I can deploy :-)
[18:00:34] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:01:05] <niedzielski>	 o/ hey all
[18:01:52] <Urbanecm>	 nice awight :-)
[18:01:58] <awight>	 mholloway has an undeployed config patch, "MachineVision: Add MachineVisionWithholdImageList config (Beta)"
[18:02:22] <awight>	 ^ if anyone knows their screen name, would be nice to ping.
[18:02:31] <niedzielski>	 mdholloway: ^
[18:02:33] <awight>	 I'll just tiptoe around that for now
[18:02:35] <awight>	 :-)
[18:02:38] <awight>	 ty
[18:02:43] <Urbanecm>	 awight: that one is beta-only, it doesn't affect production at all
[18:02:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[18:03:04] <niedzielski>	 awight: I'm actually trying to become a real SWAT deployer at some point. I have the access rights AFAIK but haven't done one yet. Would it be possible to sit in on our deployment today (or another day)?
[18:03:10] <awight>	 Urbanecm: thanks for pointing it out!
[18:03:24] <awight>	 niedzielski: Sure thing -- want to screen share perhaps?
[18:03:34] <niedzielski>	 awight: that'd be amazing!
[18:03:43] <niedzielski>	 awight: do you have a preferred service?
[18:04:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:05:28] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: increase labweb memcached size for codfw1dev [puppet] - https://gerrit.wikimedia.org/r/588439 (https://phabricator.wikimedia.org/T145703) (owner: Andrew Bogott)
[18:05:29] <mdholloway>	 awight: sorry about that, i haven't been doing manual deployments for beta-only config updates lately since they roll out automatically via jenkins job. 
[18:05:30] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:05:34] <mdholloway>	 thanks niedzielski for the ping
[18:06:36] <Urbanecm>	 mdholloway: you should pull beta-only patches onto the deployment host, to avoid further confusion :-) (no need to sync, just fetch them there)
[18:07:27] <mdholloway>	 Urbanecm: will do, thanks!
[18:08:14] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:09:08] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:09:51] <mdholloway>	 OK, should be fixed up onw
[18:09:53] <mdholloway>	 *now
[18:09:55] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Designate: remove the coordination_host param [puppet] - 10https://gerrit.wikimedia.org/r/588426 (https://phabricator.wikimedia.org/T250087) (owner: 10Andrew Bogott)
[18:13:42] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[18:14:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:14:36] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:16:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:16:09] <wikibugs>	 10Operations, 10serviceops, 10wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (10Andrew) Pinging @MoritzMuehlenhoff, any objections to this?
[18:16:15] <wikibugs>	 10Operations, 10WM-Bot: wm-bot doesn't set charset=utf-8, which breaks (amongst other things) emoji rendering - https://phabricator.wikimedia.org/T250104 (10CDanis)
[18:22:37] <wikibugs>	 10Operations, 10DC-Ops: determine/process/document bios firmware tracking/updating policies - https://phabricator.wikimedia.org/T141128 (10wiki_willy) I'll take this as action item to discuss during our next staff meeting.  I gave our Dell account rep a call today inquiring about when the latest firmware/bios...
[18:23:37] <wikibugs>	 (03PS25) 10Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[18:27:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:29:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:29:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Designate: use a list of designate hosts in hiera [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[18:30:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:32:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:33:05] <wikibugs>	 10Operations, 10CX-cxserver, 10Product-Infrastructure-Team-Backlog, 10Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (10Mholloway) Just verified that wikifeeds is now using named_le...
[18:37:42] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:38:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:39:03] <wikibugs>	 (03PS1) 10Andrew Bogott: Designate: fix ferm rules for designate_hosts list [puppet] - 10https://gerrit.wikimedia.org/r/588455 (https://phabricator.wikimedia.org/T249941)
[18:39:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:40:02] <wikibugs>	 (03PS2) 10Andrew Bogott: Designate: fix ferm rules for designate_hosts list [puppet] - 10https://gerrit.wikimedia.org/r/588455 (https://phabricator.wikimedia.org/T249941)
[18:40:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:41:04] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[18:41:06] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:41:12] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:41:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:41:40] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:42:25] <wikibugs>	 (03PS3) 10Andrew Bogott: Designate: fix ferm rules for designate_hosts list [puppet] - 10https://gerrit.wikimedia.org/r/588455 (https://phabricator.wikimedia.org/T249941)
[18:43:00] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:43:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:43:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:45:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:45:34] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:46:32] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:47:02] <icinga-wm>	 PROBLEM - Check systemd state on db2078 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:48:20] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:48:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:48:54] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:49:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:49:13] <logmsgbot>	 !log niedzielski@deploy1001 Synchronized php-1.35.0-wmf.27/extensions/TwoColConflict: SWAT: [[gerrit:588370|Fix double HTML escaping of "copytext" lines in the diff (T249986)]] (duration: 01m 01s)
[18:49:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:19] <stashbot>	 T249986: Double escaping of HTML characters in the wikitext of "copy" lines - https://phabricator.wikimedia.org/T249986
[18:49:24] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (10RLazarus)
[18:50:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:50:10] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[18:50:37] <wikibugs>	 (03PS4) 10Andrew Bogott: Designate: fix ferm rules for designate_hosts list [puppet] - 10https://gerrit.wikimedia.org/r/588455 (https://phabricator.wikimedia.org/T249941)
[18:51:06] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1396 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 1310 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[18:51:40] <icinga-wm>	 PROBLEM - Apache HTTP on mw1396 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1310 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[18:53:42] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:53:50] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[18:53:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:12] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:30] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:40] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:54:44] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:55:17] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Designate: fix ferm rules for designate_hosts list [puppet] - 10https://gerrit.wikimedia.org/r/588455 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[18:55:42] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:55:44] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:55:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:55:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:56:00] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:56:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:56:26] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:56:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:58:13] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (10wkandek)
[18:58:19] <mdholloway>	 awight: niedzielski: o/ still working? i see there's a minerva backport on the schedule that i haven't seen come through yet
[18:59:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:59:22] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[18:59:54] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:00:46] <niedzielski>	 mdholloway: yep, it's on its way. just a moment please
[19:01:01] <mdholloway>	 niedzielski: ok, no rush :)
[19:01:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:01:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:01:36] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:02:02] <awight>	 EU SWAT is going over our window, probably just a few more minutes.
[19:02:04] <wikibugs>	 (03PS1) 10Andrew Bogott: designate: fix a misplaced ) in a ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/588462 (https://phabricator.wikimedia.org/T249941)
[19:02:46] <logmsgbot>	 !log niedzielski@deploy1001 Synchronized php-1.35.0-wmf.27/skins/MinervaNeue: SWAT: [[gerrit:588405|Update the icon glyph (T249864)]] (duration: 01m 00s)
[19:02:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:52] <stashbot>	 T249864: Section edit icon not displaying in Minerva skin - https://phabricator.wikimedia.org/T249864
[19:03:02] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:03:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:03:24] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:04:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:04:52] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[19:05:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:05:48] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:06:10] <niedzielski>	 !log Morning SWAT done
[19:06:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:34] <niedzielski>	 mdholloway: thanks for waiting! all done now
[19:06:44] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:06:50] <mdholloway>	 niedzielski: great, thanks!
[19:06:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:07:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:07:44] <niedzielski>	 Thank you so much awight for the SWAT deployment lessons!
[19:07:50] <niedzielski>	 Very helpful!!
[19:08:05] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] designate: fix a misplaced ) in a ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/588462 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[19:08:08] <icinga-wm>	 PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[19:08:17] <wikibugs>	 (03CR) 10Mholloway: [C: 03+2] MachineVision: Add MachineVisionWithholdImageList config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588053 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[19:08:32] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:08:36] <awight>	 niedzielski: Great work, thanks for letting me not get my hands dirty, so I could eat cookies :-)
[19:08:59] <niedzielski>	 ;D
[19:09:00] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:09:58] <icinga-wm>	 RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[19:10:11] <wikibugs>	 (03Merged) 10jenkins-bot: MachineVision: Add MachineVisionWithholdImageList config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588053 (https://phabricator.wikimedia.org/T249939) (owner: 10Mholloway)
[19:10:50] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:11:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:12:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:12:32] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:12:44] <icinga-wm>	 RECOVERY - Check systemd state on db2078 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:12:50] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:13:00] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Add MachineVisionWithholdImageList config (T249939) (duration: 01m 03s)
[19:13:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:06] <stashbot>	 T249939: Don't include images of humans in Special:SuggestedTags - https://phabricator.wikimedia.org/T249939
[19:14:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:14:20] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:14:36] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:14:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:16:06] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:17:50] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:19:36] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[19:20:34] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:21:28] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:23:13] <wikibugs>	 (03PS1) 10Andrew Bogott: cloud puppetmaster: fix flatten() syntax when producing cert manager list [puppet] - 10https://gerrit.wikimedia.org/r/588465 (https://phabricator.wikimedia.org/T249941)
[19:23:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:23:28] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:25:04] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:25:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:25:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:25:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:25:34] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:26:00] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:27:53] <wikibugs>	 (03PS2) 10Andrew Bogott: cloud puppetmaster: fix flatten() syntax when producing cert manager list [puppet] - 10https://gerrit.wikimedia.org/r/588465 (https://phabricator.wikimedia.org/T249941)
[19:27:55] <wikibugs>	 (03PS1) 10Andrew Bogott: cloud-vps: add a dummy profile::backup::director_seed: changeme value for VMs [puppet] - 10https://gerrit.wikimedia.org/r/588469
[19:27:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:28:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:29:00] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:29:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:29:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:29:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:29:32] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:29:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:30:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:30:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:31:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:31:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloud-vps: add a dummy profile::backup::director_seed: changeme value for VMs [puppet] - 10https://gerrit.wikimedia.org/r/588469 (owner: 10Andrew Bogott)
[19:31:14] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:33:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:33:42] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloud puppetmaster: fix flatten() syntax when producing cert manager list [puppet] - 10https://gerrit.wikimedia.org/r/588465 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[19:33:56] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:34:38] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:35:16] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Traffic: Cache not being invalidated on new edits on Vaginal steaming - https://phabricator.wikimedia.org/T250108 (10AntiCompositeNumber)
[19:36:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:37:31] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add support for WITHHOLD_ALL review state (T249939) (duration: 01m 23s)
[19:37:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:37] <stashbot>	 T249939: Don't include images of humans in Special:SuggestedTags - https://phabricator.wikimedia.org/T249939
[19:38:34] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:38:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:39:58] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:41:20] <mdholloway>	 !log ran extensions/MachineVision/maintenance/withholdImages.php on testcommonswiki
[19:41:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:34] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[19:41:50] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:41:58] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:43:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:44:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:44:24] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:46:25] <wikibugs>	 10Operations, 10Services, 10Service-deployment-requests: New Service Request 'open_nsfw' - https://phabricator.wikimedia.org/T250110 (10Chtnnh)
[19:46:38] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is OK: HTTP OK: HTTP/1.0 200 OK - 22420 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:46:51] <wikibugs>	 10Operations, 10Services, 10Service-deployment-requests: New Service Request 'open_nsfw' - https://phabricator.wikimedia.org/T250110 (10Chtnnh)
[19:49:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:50:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:51:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:51:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:51:07] <mdholloway>	 !log running extensions/MachineVision/maintenance/withholdImages.php on commonswiki
[19:51:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:51:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:28] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:51:46] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:52:38] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Apr-Jun-2020): CommRel support for FY2019-2020 Q4 DC switchover - https://phabricator.wikimedia.org/T244808 (10Whatamidoing-WMF) We have a lot of requests right now, but if we get enough warning (which your team is always good about), then I think it's likely that s...
[19:53:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:53:28] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:54:36] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:54:40] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:56:43] <wikibugs>	 10Operations, 10serviceops, 10wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (10bd808) Just so everyone is on the same page about this task and its parent ({T237773}), the desired long term solution ({T161859}) will //NOT// require LDAP for Wikitech....
[19:56:44] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:56:47] <mdholloway>	 !log finished running extensions/MachineVision/maintenance/withholdImages.php on commonswiki (T249939)
[19:56:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:56:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:56:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:54] <stashbot>	 T249939: Don't include images of humans in Special:SuggestedTags - https://phabricator.wikimedia.org/T249939
[19:56:58] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:57:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:57:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:58:22] <wikibugs>	 (03PS1) 10Wolfgang Kandek: admin: add Wolfgang Kandek to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/588474 (https://phabricator.wikimedia.org/T249352)
[19:58:24] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:59:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:59:52] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:00:04] <jouncebot>	 halfak and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T2000).
[20:00:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: add Wolfgang Kandek to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/588474 (https://phabricator.wikimedia.org/T249352) (owner: 10Wolfgang Kandek)
[20:02:12] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={swagger_check_citoid_cluster_eqiad,swagger_check_cxserver_cluster_codfw} site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:02:12] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:03:05] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Traffic: Cache not being invalidated on new edits on Vaginal steaming - https://phabricator.wikimedia.org/T250108 (10AntiCompositeNumber) After waiting a bit, I decided to try again. This is 50 minutes since the edit, and 20 minutes since the last attempt.  2. Navigate to h...
[20:03:30] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:03:40] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Traffic: Cache not being invalidated on new edits on Vaginal steaming - https://phabricator.wikimedia.org/T250108 (10AntiCompositeNumber)
[20:03:50] <wikibugs>	 (03PS2) 10Ottomata: Remove now unused mediawiki/event-schemas repo [puppet] - 10https://gerrit.wikimedia.org/r/587255 (https://phabricator.wikimedia.org/T240985)
[20:03:52] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:04:00] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:04:00] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:05:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:05:42] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:07:10] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:07:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:08:00] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:08:01] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Remove now unused mediawiki/event-schemas repo [puppet] - 10https://gerrit.wikimedia.org/r/587255 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[20:08:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:09:14] <wikibugs>	 (03PS2) 10Wolfgang Kandek: admin: add Wolfgang Kandek to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/588474 (https://phabricator.wikimedia.org/T249352)
[20:09:18] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Traffic: Cache not being invalidated on new edits on Vaginal steaming - https://phabricator.wikimedia.org/T250108 (10CDanis)
[20:09:28] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Page Content Service, 10Product-Infrastructure-Team-Backlog, and 4 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (10CDanis)
[20:10:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:11:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - bad article title) timed out before a response was received https://wikitech.wikimedia.org/
[20:11:02] <icinga-wm>	 itoring/recommendation_api
[20:11:32] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:11:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - bad article title) timed out before a response was received https://wikitech.wikimedia.org/
[20:11:48] <icinga-wm>	 itoring/recommendation_api
[20:11:48] <icinga-wm>	 PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[20:12:40] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:12:42] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/metadata/{title} (retrieve extended metadata for Video article on English Wikipedia) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[20:13:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:02] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[20:13:02] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[20:13:14] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:34] <icinga-wm>	 RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[20:13:36] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-
[20:14:04] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] admin: add Wolfgang Kandek to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/588474 (https://phabricator.wikimedia.org/T249352) (owner: 10Wolfgang Kandek)
[20:14:26] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[20:14:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:14:46] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[20:14:46] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[20:14:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:14:58] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:20] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:24] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POST
[20:15:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:36] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:16:18] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:16:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:17:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:20:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:20:24] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:20:36] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:21:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:21:08] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:21:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:22:12] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:22:20] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:22:42] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:22:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:22:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:23:03] <wikibugs>	 Operations, LDAP-Access-Requests, SRE-Access-Requests, Patch-For-Review: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (wkandek)
[20:23:25] <wikibugs>	 Operations, LDAP-Access-Requests, SRE-Access-Requests, Patch-For-Review: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (wkandek)
[20:24:26] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:25:22] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:25:42] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:29:02] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:31:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:31:32] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:31:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:32:38] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:33:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:33:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:34:54] <icinga-wm>	 PROBLEM - termbox eqiad on termbox.svc.eqiad.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[20:35:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:12] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:32] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:42] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:36:42] <icinga-wm>	 RECOVERY - termbox eqiad on termbox.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[20:36:54] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:36:58] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:37:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:37:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:37:42] <wikibugs>	 Operations, ops-eqiad, Analytics: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (Cmjohnson) These are on 1G racks.  If you need 10G they will have to be moved.
[20:39:56] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:40:47] <wikibugs>	 Operations, ops-eqiad, Analytics: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (elukey) >>! In T244506#6053265, @Cmjohnson wrote: > These are on 1G racks.  If you need 10G they will have to be moved.   Yep we'd need 10G, but regardless...
[20:40:54] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:41:06] <wikibugs>	 Operations, MediaWiki-Cache, Traffic: Cache not being invalidated on new edits on Vaginal steaming - https://phabricator.wikimedia.org/T250108 (Peachey88)
[20:41:08] <wikibugs>	 (PS2) Mholloway: MachineVision label blacklist updates, 2020-04-09 [mediawiki-config] - https://gerrit.wikimedia.org/r/587994 (https://phabricator.wikimedia.org/T249895)
[20:41:27] <wikibugs>	 (CR) Hashar: "Thanks :] The error output was really confusing previously." [puppet] - https://gerrit.wikimedia.org/r/587990 (owner: Hashar)
[20:42:19] <wikibugs>	 (PS3) Mholloway: MachineVision label blocklist updates, 2020-04-13 [mediawiki-config] - https://gerrit.wikimedia.org/r/587994 (https://phabricator.wikimedia.org/T249895)
[20:44:45] <wikibugs>	 (CR) Mholloway: [C: +2] MachineVision label blocklist updates, 2020-04-13 [mediawiki-config] - https://gerrit.wikimedia.org/r/587994 (https://phabricator.wikimedia.org/T249895) (owner: Mholloway)
[20:46:01] <wikibugs>	 (Merged) jenkins-bot: MachineVision label blocklist updates, 2020-04-13 [mediawiki-config] - https://gerrit.wikimedia.org/r/587994 (https://phabricator.wikimedia.org/T249895) (owner: Mholloway)
[20:46:24] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:49:00] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision blocklist update (T249895) (duration: 00m 59s)
[20:49:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:08] <stashbot>	 T249895: CAT blocklist update, 2020-04-09 - https://phabricator.wikimedia.org/T249895
[20:49:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:53:24] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:53:53] <wikibugs>	 Operations, Internet-Archive, observability: Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (Quiddity)
[20:55:34] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:55:35] <wikibugs>	 Operations, observability: Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (Quiddity)
[20:56:26] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:00:04] <jouncebot>	 Reedy and sbassett: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T2100).
[21:02:52] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:03:44] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[21:04:28] <wikibugs>	 Operations, serviceops, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Joe)  I don't know enough about php-ldap at the moment to have an opinion. In itself, adding a php extension to production is a big deal, but it's also easy to undo.   Se...
[21:07:13] <wikibugs>	 Operations, serviceops, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (bd808) >>! In T237889#6053313, @Joe wrote: > Also: how temporary? Do you have a tentative timeline for transitioning wikitech to SUL?  Geologically short, but maybe not s...
[21:07:24] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:12:00] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:12:52] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:13:50] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:14:42] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:20:14] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:21:10] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:23:00] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:23:52] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:25:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[21:26:21] <wikibugs>	 Operations, serviceops: Request for a in-memory caching data set for caching research - https://phabricator.wikimedia.org/T240503 (leila)
[21:27:04] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[21:27:09] <wikibugs>	 Operations, serviceops: Request for a in-memory caching data set for caching research - https://phabricator.wikimedia.org/T240503 (leila) I removed the Research tag as it refers to the work of the research team in WMF. However, if I can be of any help to the SRE team with this particular request, please...
[21:32:22] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add script to apply blacklist to current labels (T249273) (duration: 00m 58s)
[21:32:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:29] <stashbot>	 T249273: [S/M]Explore ways to apply CAT blacklist updates to previously tagged images - https://phabricator.wikimedia.org/T249273
[21:32:32] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for admin of wikiwomencamp mailing list - https://phabricator.wikimedia.org/T250035 (Quiddity) Open→Resolved a:Quiddity Done. New pw sent to the address listed at https://lists.wikimedia.org/mailman/listinfo/wikiwomencamp :)
[21:33:31] <wikibugs>	 (CR) Andrew Bogott: [C: +1] clouddb.sql.erb: Add GRANTs file [puppet] - https://gerrit.wikimedia.org/r/588202 (owner: Marostegui)
[21:34:40] <mdholloway>	 !log ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on testcommonswiki
[21:34:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:21] <mdholloway>	 !log ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki (T249273)
[21:43:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:27] <stashbot>	 T249273: [S/M]Explore ways to apply CAT blacklist updates to previously tagged images - https://phabricator.wikimedia.org/T249273
[21:58:54] <wikibugs>	 (PS1) CDanis: depool codfw, connectivity issues? [dns] - https://gerrit.wikimedia.org/r/588502
[22:04:44] <wikibugs>	 Operations, Research, Traffic: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (leila) @BBlack can we close this task?
[22:08:05] <wikibugs>	 (CR) CDanis: [C: +2] depool codfw, connectivity issues? [dns] - https://gerrit.wikimedia.org/r/588502 (owner: CDanis)
[22:08:42] <cdanis>	 !log depool codfw
[22:08:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:00] <wikibugs>	 Operations, Analytics, Research, Traffic: Wikipedia Accessibility, check false positives and false negatives of traffic alarms - https://phabricator.wikimedia.org/T245166 (leila) @Nuria do you need our team's support in any way for this task? (I'm reviewing our tasks in Staged.)
[22:19:06] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 116.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[22:35:20] <ebernhardson>	 !log restart elasticsearch_6@production-search-psi-eqiad on elastic1052
[22:35:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:35:58] <ebernhardson>	 !log restart elasticsearch_6@production-search-psi-eqiad on elastic1052 for excessive old gc over last few hours
[22:36:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:53] <wikibugs>	 (PS1) CDanis: Revert "depool codfw, connectivity issues?" [dns] - https://gerrit.wikimedia.org/r/588505
[22:41:16] <wikibugs>	 (CR) CDanis: [C: +2] Revert "depool codfw, connectivity issues?" [dns] - https://gerrit.wikimedia.org/r/588505 (owner: CDanis)
[22:41:37] <cdanis>	 !log repool codfw
[22:41:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:18] <icinga-wm>	 RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[23:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200413T2300).
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:05:10] <wikibugs>	 (CR) RLazarus: "Looks good! Two structural suggestions (first and third comments) and I don't feel super strongly about either -- if you decided to keep b" (10 comments) [puppet] - https://gerrit.wikimedia.org/r/588431 (owner: CDanis)
[23:06:29] <wikibugs>	 (PS1) Mholloway: MachineVision: Withholding list additions [mediawiki-config] - https://gerrit.wikimedia.org/r/588508 (https://phabricator.wikimedia.org/T249939)
[23:09:08] <wikibugs>	 (CR) Mholloway: [C: +2] MachineVision: Withholding list additions [mediawiki-config] - https://gerrit.wikimedia.org/r/588508 (https://phabricator.wikimedia.org/T249939) (owner: Mholloway)
[23:10:19] <wikibugs>	 (Merged) jenkins-bot: MachineVision: Withholding list additions [mediawiki-config] - https://gerrit.wikimedia.org/r/588508 (https://phabricator.wikimedia.org/T249939) (owner: Mholloway)
[23:14:36] <logmsgbot>	 !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision withholding list additions (T249939) (duration: 00m 59s)
[23:14:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:44] <stashbot>	 T249939: Don't include images of humans in Special:SuggestedTags - https://phabricator.wikimedia.org/T249939
[23:24:31] <mdholloway>	 !log re-ran extensions/MachineVision/maintenance/withholdImages.php on commonswiki
[23:24:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:34] <wikibugs>	 Operations, Research, Traffic: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (bmansurov) Open→Resolved Yes.