[00:01:38] <wikibugs>	 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul)
[00:02:26] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 58.25 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:03:04] <cdanis>	 all of above expected
[00:03:10] <cdanis>	 eqiad traffic drop because of the geo-remapping involved
[00:04:39] <wikibugs>	 10Operations, 10Documentation: Improve documentation for mirrors.wikimedia.org - https://phabricator.wikimedia.org/T179856 (10Dzahn) @Quiddity I made some more edits to the page to give it more structure and add the things you listed. Why, who, where, added the links.. etc.  Good enough to resolve?
[00:08:04] <cdanis>	 !log T259621 cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1 
[00:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:11:08] <wikibugs>	 (03CR) 10BryanDavis: "PCC output: https://puppet-compiler.wmflabs.org/compiler1003/24635/" [puppet] - 10https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: 10BryanDavis)
[00:14:05] <cdanis>	 !log T259621 cdanis@re0.cr3-esams> request vmhost reboot re1 
[00:14:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:09] <cdanis>	 !log T259621 cdanis@re0.cr3-esams> request chassis routing-engine master switch 
[00:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:20:10] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:22:14] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 84, down: 4, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:22:41] <icinga-wm>	 PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:23:57] <cdanis>	 also expected
[00:24:05] <cdanis>	 !log T259621 cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0 
[00:24:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:11] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:24:40] <icinga-wm>	 RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:27:52] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 71.68 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:30:50] <cdanis>	 !log T259621 cdanis@re1.cr3-esams> request vmhost reboot re0 
[00:30:52] <wikibugs>	 10Operations, 10Documentation: Improve documentation for mirrors.wikimedia.org - https://phabricator.wikimedia.org/T179856 (10Quiddity) 05Open→03Resolved Looks great! Thank you :)
[00:30:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:45] <cdanis>	 !log T259621 cdanis@re1.cr3-esams> request chassis routing-engine master switch 
[00:36:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:16] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:43:14] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:56:58] <cdanis>	 !log T259621 ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
[00:57:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:07:15] <cdanis>	 !log cdanis@re0.cr2-esams> request system software add validate re1 /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz 
[01:07:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:06] <cdanis>	 !log T259621 wrong junos version was staged on cr2-esams, abandoning this attempt and putting back in service
[01:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:16:10] <wikibugs>	 (03PS1) 10CDanis: Revert "depool esams for router upgrades" [dns] - 10https://gerrit.wikimedia.org/r/622208
[01:16:22] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] Revert "depool esams for router upgrades" [dns] - 10https://gerrit.wikimedia.org/r/622208 (owner: 10CDanis)
[01:17:12] <cdanis>	 !log repool esams
[01:17:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:16] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 51.43 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[01:35:40] <cdanis>	 expected
[01:35:51] <cdanis>	 things look copacetic, signing off for now
[01:45:33] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] webperf: add data types to profiles [puppet] - 10https://gerrit.wikimedia.org/r/621756 (owner: 10Dzahn)
[01:58:38] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[02:07:12] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/1.36.0-wmf.6 [core] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622250
[02:13:46] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[02:15:42] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[03:47:55] <wikibugs>	 (03PS1) 10KartikMistry: Enable ContentTranslation as a default tool in Assamese and Burmese WPs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622257 (https://phabricator.wikimedia.org/T258503)
[04:37:38] <wikibugs>	 10Operations, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10tstarling) Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an...
[05:03:00] <wikibugs>	 (03PS1) 10Marostegui: db1092,db1084: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622259
[05:04:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12332 and previous config saved to /var/cache/conftool/dbconfig/20200825-050451-marostegui.json
[05:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:06] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db1092,db1084: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622259 (owner: 10Marostegui)
[05:10:02] <marostegui>	 !log Deploy MCR schema change on s1 codfw, this will create lag on s1 codfw - T238966
[05:10:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:10:07] <stashbot>	 T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966
[05:11:06] <marostegui>	 !log Remove revisions triggers from db2094:3311 T238966
[05:11:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12333 and previous config saved to /var/cache/conftool/dbconfig/20200825-051327-marostegui.json
[05:13:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:12] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review, 10good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (10MoritzMuehlenhoff) Looking at git history, this service unit is shipped via Puppet since nagios-nrpe...
[05:21:27] <moritzm>	 !log installing Java security updates on relforge*
[05:21:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12334 and previous config saved to /var/cache/conftool/dbconfig/20200825-052602-marostegui.json
[05:26:03] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:26:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:23] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 49 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:38:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12335 and previous config saved to /var/cache/conftool/dbconfig/20200825-053801-marostegui.json
[05:38:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1111, db1118 for MCR change', diff saved to https://phabricator.wikimedia.org/P12336 and previous config saved to /var/cache/conftool/dbconfig/20200825-053856-marostegui.json
[05:38:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:31] <wikibugs>	 (03PS1) 10Marostegui: db1128: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622260 (https://phabricator.wikimedia.org/T260324)
[05:42:43] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db1128: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622260 (https://phabricator.wikimedia.org/T260324) (owner: 10Marostegui)
[05:51:52] <wikibugs>	 (03PS2) 10DannyS712: Branch commit for wmf/1.36.0-wmf.6 [core] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622250 (https://phabricator.wikimedia.org/T257974) (owner: 10TrainBranchBot)
[05:53:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] toolforge: Remove jessie conditionals [puppet] - 10https://gerrit.wikimedia.org/r/617995 (owner: 10Muehlenhoff)
[05:54:11] <wikibugs>	 (03PS6) 10Muehlenhoff: Disable backports on stretch for production [puppet] - 10https://gerrit.wikimedia.org/r/613611 (https://phabricator.wikimedia.org/T256877)
[05:54:34] <wikibugs>	 (03PS1) 10Ayounsi: LibreNMS report, whitelist c2l54ce-ycmfam90 PDUs [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/622261
[05:55:31] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] LibreNMS report, whitelist c2l54ce-ycmfam90 PDUs [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/622261 (owner: 10Ayounsi)
[06:08:33] <wikibugs>	 (03CR) 10Muehlenhoff: ldap: remove jessie support (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621372 (owner: 10Dzahn)
[06:10:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 04-1] "The scb hosts still use that class on jessie, so this would lead to failing Puppet runs there." [puppet] - 10https://gerrit.wikimedia.org/r/621374 (owner: 10Dzahn)
[06:20:18] <wikibugs>	 (03PS3) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[06:20:39] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[06:20:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:43] <wikibugs>	 10Operations, 10User-MoritzMuehlenhoff, 10User-jbond: Updated java security policy in OpenJDK 8 u252 - https://phabricator.wikimedia.org/T251493 (10MoritzMuehlenhoff) I think this can be closed, given that 593467 is merged?
[06:26:39] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324)
[06:37:35] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Scap: git_fat -> git_binary_manager [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/404228 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[06:37:59] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Scap: git_fat -> git_binary_manager [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/404226 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[06:39:56] <wikibugs>	 (03PS1) 10Ayounsi: Change eqord ASN to 65020 [homer/public] - 10https://gerrit.wikimedia.org/r/622268 (https://phabricator.wikimedia.org/T259593)
[06:39:58] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: Test deployments with helmfile lint [deployment-charts] - 10https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572)
[06:41:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Test deployments with helmfile lint [deployment-charts] - 10https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto)
[06:43:18] <wikibugs>	 (03PS1) 10Ayounsi: Puppet: change eqord ASN to 65020 [puppet] - 10https://gerrit.wikimedia.org/r/622269 (https://phabricator.wikimedia.org/T259593)
[06:43:55] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Test deployments with helmfile lint (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto)
[06:45:22] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Puppet: change eqord ASN to 65020 [puppet] - 10https://gerrit.wikimedia.org/r/622269 (https://phabricator.wikimedia.org/T259593) (owner: 10Ayounsi)
[06:45:44] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:57:57] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review, 10good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (10ema) >>! In T252990#6407587, @Southparkfan wrote: > I have uploaded a new patch using /run on all se...
[07:03:28] <icinga-wm>	 PROBLEM - Check the last execution of php7.2-fpm_check_restart on mwdebug1001 is CRITICAL: CRITICAL: Status of the systemd unit php7.2-fpm_check_restart https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:03:33] <wikibugs>	 (03PS4) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:04:31] <dcausse>	 !log restarting blazegraph on wdqs1005 (T242453)
[07:04:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[07:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:36] <stashbot>	 T242453: Deadlock in blazegraph blocking all queries and updates - https://phabricator.wikimedia.org/T242453
[07:05:14] <icinga-wm>	 PROBLEM - Check systemd state on mwdebug1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:05:28] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1005 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[07:07:26] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1005 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.025 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[07:07:46] <icinga-wm>	 PROBLEM - WDQS high update lag on wdqs1005 is CRITICAL: 6.267e+04 ge 4.32e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[07:09:09] <dcausse>	 !log depooling wdqs1005 (high lag)
[07:09:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:27] <marostegui>	 !log Upgrade MySQL on dbstore1004
[07:13:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:21] <logmsgbot>	 !log ayounsi@cumin1001 END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
[07:22:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:56] <wikibugs>	 (03PS5) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:28:27] <wikibugs>	 (03PS6) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:29:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[07:31:07] <wikibugs>	 (03PS7) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:33:58] <icinga-wm>	 ACKNOWLEDGEMENT - WDQS high update lag on wdqs1005 is CRITICAL: 6.079e+04 ge 4.32e+04 Gehel Blazegraph restarted, catching up on lag https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[07:34:32] <wikibugs>	 (03PS8) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:40:10] <wikibugs>	 (03PS9) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:42:40] <wikibugs>	 (03CR) 10Volans: "Code looks ok from my point of view, a few nits/questions inline." (035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper)
[07:43:16] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) (owner: 10Effie Mouzeli)
[07:45:12] <wikibugs>	 10Operations, 10observability: Grafana link redirecting to port :3000 - https://phabricator.wikimedia.org/T261184 (10elukey)
[07:46:06] <wikibugs>	 (03CR) 10Kormat: [C: 03+1] wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324) (owner: 10Marostegui)
[07:54:13] <wikibugs>	 (03PS10) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[07:56:02] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] package_builder: add support for 'sloppy' backports (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622190 (owner: 10CDanis)
[08:01:42] <wikibugs>	 10Operations, 10serviceops, 10Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10daniel) >>! In T260330#6408193, @tstarling wrote: > Has anyone got an idea for giving the HMAC key to the server without allowing the co...
[08:05:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] grafana: remove deprecated settings [puppet] - 10https://gerrit.wikimedia.org/r/621472 (https://phabricator.wikimedia.org/T259143) (owner: 10Filippo Giunchedi)
[08:06:53] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] zuul: add data types, replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/621758 (owner: 10Dzahn)
[08:07:47] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Code looks good to me, one typo in a parameter." (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond)
[08:08:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: add #o11y tag to logstash alert descriptions [puppet] - 10https://gerrit.wikimedia.org/r/622161 (owner: 10Herron)
[08:09:27] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] pontoon: Use a symlink for /etc/puppet/hieradata/pontoon [puppet] - 10https://gerrit.wikimedia.org/r/621688 (owner: 10Kormat)
[08:10:14] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] pontoon: Use a symlink for /etc/puppet/hieradata/pontoon [puppet] - 10https://gerrit.wikimedia.org/r/621688 (owner: 10Kormat)
[08:11:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, for more information on the wikidata metric I believe Addshore might know more" [puppet] - 10https://gerrit.wikimedia.org/r/615269 (https://phabricator.wikimedia.org/T180105) (owner: 10Cwhite)
[08:13:39] <wikibugs>	 (03CR) 10Filippo Giunchedi: prometheus: add apache2 es-exporter config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621597 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[08:14:12] <wikibugs>	 10Operations, 10observability: Grafana link redirecting to port :3000 - https://phabricator.wikimedia.org/T261184 (10jijiki) p:05Triage→03Medium
[08:18:11] <XioNoX>	 !log deactivate eqord peering/transit - T259593
[08:18:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:15] <stashbot>	 T259593: Make eqord its own AS - https://phabricator.wikimedia.org/T259593
[08:19:21] <XioNoX>	 !log reconfigure eqord to be AS65020 - T259593
[08:19:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:41] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Scap: git_fat -> git_binary_manager [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/404228 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[08:21:55] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Scap: git_fat -> git_binary_manager [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/404226 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[08:22:18] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Scap: git_fat -> git_binary_manager [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/404227 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[08:23:24] <wikibugs>	 (03CR) 10Legoktm: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto)
[08:24:01] <wikibugs>	 (03PS1) 10Kormat: Remove unused sql.py and check_private_data.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516)
[08:25:17] <wikibugs>	 (03PS2) 10Kormat: Remove unused sql.py and check_private_data.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516)
[08:28:01] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: collect prometheus metrics from alertmanager in metricsinfra [puppet] - 10https://gerrit.wikimedia.org/r/620760 (owner: 10BryanDavis)
[08:31:20] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "no actual idea what this is doing, but I trust you :-)" [puppet] - 10https://gerrit.wikimedia.org/r/622238 (https://phabricator.wikimedia.org/T158216) (owner: 10BryanDavis)
[08:31:37] <wikibugs>	 10Operations, 10GrowthExperiments-NewcomerTasks, 10Product-Infrastructure-Team-Backlog, 10serviceops: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (10Joe) I have a few questions for you, before giving a refined recommendation:  - do you think you'll need to de...
[08:36:22] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "can you please split this change into smaller ones? That would make me more confident to merge and babysit, given I could easily identify " [puppet] - 10https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: 10BryanDavis)
[08:37:09] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "can't merge this until https://gerrit.wikimedia.org/r/c/operations/puppet/+/622237 is merged" [puppet] - 10https://gerrit.wikimedia.org/r/622238 (https://phabricator.wikimedia.org/T158216) (owner: 10BryanDavis)
[08:39:52] <wikibugs>	 (03CR) 10Marostegui: "How do you foresee the future development of check_private_data? So we just make changes to it on the puppet repo and ship it as it is now" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat)
[08:41:52] <icinga-wm>	 PROBLEM - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37
[08:45:54] <wikibugs>	 (03CR) 10Kormat: "> Patch Set 2:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat)
[08:48:02] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:49:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Yes a full PCC run in this case would be good to validate the change. A valid strategy would be to push the change for one/two exporter fo" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn)
[08:50:09] <XioNoX>	 !log re-activate eqord peering/transit - T259593
[08:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:13] <stashbot>	 T259593: Make eqord its own AS - https://phabricator.wikimedia.org/T259593
[08:52:36] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:53:56] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:55:50] <wikibugs>	 10Operations, 10Traffic, 10conftool, 10serviceops: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (10Joe) For the record, the problem is more general, and also affects servers connecting to etcd 2.x - the watch functiona...
[08:57:40] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Make eqord its own AS - https://phabricator.wikimedia.org/T259593 (10ayounsi)
[08:59:04] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: confd: use -interval 3 as a lower bound in all datacenters [puppet] - 10https://gerrit.wikimedia.org/r/622314 (https://phabricator.wikimedia.org/T260889)
[09:02:16] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Make eqord its own AS - https://phabricator.wikimedia.org/T259593 (10ayounsi) 05Open→03Resolved All done and checked that: 1/ internal prefixes are properly exchanged in all directions (e.g. ulsfo sees eqiad via eqord) even if not always the active path 2/ externa...
[09:03:09] <wikibugs>	 (03CR) 10Ema: [C: 03+2] cache: remove 'backend_services' hiera setting [puppet] - 10https://gerrit.wikimedia.org/r/622131 (https://phabricator.wikimedia.org/T222937) (owner: 10Ema)
[09:05:40] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] confd: use -interval 3 as a lower bound in all datacenters [puppet] - 10https://gerrit.wikimedia.org/r/622314 (https://phabricator.wikimedia.org/T260889) (owner: 10Giuseppe Lavagetto)
[09:06:38] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:07:33] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Change eqord ASN to 65020 [homer/public] - 10https://gerrit.wikimedia.org/r/622268 (https://phabricator.wikimedia.org/T259593) (owner: 10Ayounsi)
[09:07:56] <wikibugs>	 (03Merged) 10jenkins-bot: Change eqord ASN to 65020 [homer/public] - 10https://gerrit.wikimedia.org/r/622268 (https://phabricator.wikimedia.org/T259593) (owner: 10Ayounsi)
[09:18:23] <wikibugs>	 (03CR) 10Gehel: "A few more comments." (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper)
[09:18:34] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:21:20] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] elasticsearch: verify all write queues are empty (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper)
[09:22:18] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:26:12] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Test deployments with helmfile lint [deployment-charts] - https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[09:26:46] <wikibugs>	 (PS1) Filippo Giunchedi: icinga_exporter: export problems only from Icinga active_host [puppet] - https://gerrit.wikimedia.org/r/622316 (https://phabricator.wikimedia.org/T258948)
[09:28:10] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Disable google code in mailinglists - https://phabricator.wikimedia.org/T261084 (Urbanecm) >>! In T261084#6407773, @Dzahn wrote: > So to know which one you want you basically just have to answer the question if you want to re-enable it later and still have the same li...
[09:28:37] <wikibugs>	 (Merged) jenkins-bot: Test deployments with helmfile lint [deployment-charts] - https://gerrit.wikimedia.org/r/620934 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[09:29:44] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Remove the original X-Forwarded-Proto header if injecting https [deployment-charts] - https://gerrit.wikimedia.org/r/622118 (owner: Giuseppe Lavagetto)
[09:30:02] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1002/24644/" [puppet] - https://gerrit.wikimedia.org/r/622316 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[09:30:33] <godog>	 _joe_: I'll merge your change too
[09:30:42] <wikibugs>	 (PS1) Kormat: Add mypy to tox, and check in CI. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622317
[09:30:45] <_joe_>	 ouch, yes, please
[09:30:56] <godog>	 _joe_: 7f7e0554e3 that is
[09:31:10] <_joe_>	 yes
[09:31:25] <_joe_>	 I had a puppet-merge that I aborted by mistyping yes
[09:31:32] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[09:31:41] <_joe_>	 205?
[09:32:00] <wikibugs>	 (Merged) jenkins-bot: Remove the original X-Forwarded-Proto header if injecting https [deployment-charts] - https://gerrit.wikimedia.org/r/622118 (owner: Giuseppe Lavagetto)
[09:33:28] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[09:38:03] <wikibugs>	 Operations: Make bpfcc-tools available fleet-wide - https://phabricator.wikimedia.org/T261193 (MoritzMuehlenhoff)
[09:40:14] <wikibugs>	 (PS1) Ayounsi: Apply netflow group to existing fpc X statements [homer/public] - https://gerrit.wikimedia.org/r/622318 (https://phabricator.wikimedia.org/T257392)
[09:40:56] <wikibugs>	 (PS2) Ayounsi: Apply sampling group to existing fpc X statements [homer/public] - https://gerrit.wikimedia.org/r/622318 (https://phabricator.wikimedia.org/T257392)
[09:42:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:45:51] <marostegui>	 !log Create missing table cx_notification_log on x1 wikishared T261190
[09:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:55] <stashbot>	 T261190: Create notification-log table in Production (wikishared) - https://phabricator.wikimedia.org/T261190
[09:51:08] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[09:53:04] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[09:55:05] <wikibugs>	 Operations, netops, Patch-For-Review: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (ayounsi) While working on that I noticed that the `apply group` only applies to existing `fpc X` statements, for example if they are configured with `pic` sub-section...
[09:55:08] <wikibugs>	 (PS11) ZPapierski: Multiple instances of msearch_daemon [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[09:56:46] <wikibugs>	 (CR) Ayounsi: [C: -1] "See comment on the task." [homer/public] - https://gerrit.wikimedia.org/r/622318 (https://phabricator.wikimedia.org/T257392) (owner: Ayounsi)
[09:56:56] <wikibugs>	 (PS1) Giuseppe Lavagetto: termbox: bump chart to pick up changes in the envoy configuration [deployment-charts] - https://gerrit.wikimedia.org/r/622319
[09:58:23] <wikibugs>	 Operations, DBA, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
[09:58:26] <wikibugs>	 Operations, Documentation: Wikitech: update Bacula article - https://phabricator.wikimedia.org/T100954 (jcrespo) Open→Resolved Done months ago.
[09:58:41] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: update summary for IcingaServiceProblem alert [puppet] - https://gerrit.wikimedia.org/r/622320 (https://phabricator.wikimedia.org/T258948)
[10:00:26] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:00:33] <wikibugs>	 Operations: Setup an Offsite backup infrastructure - https://phabricator.wikimedia.org/T85278 (jcrespo) Stalled→Open This is being reimplemented doing parallel backup jobs into Codfw.  This has started by now only with database backups and DatabasesCodfw pool on backup2001, other pools to follow at a...
[10:00:44] <wikibugs>	 (PS4) Hnowlan: api-gateway: strip cookie headers from requests and responses. [deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296)
[10:01:44] <wikibugs>	 (PS1) Vgutierrez: vcl: Use synthetic warning for DHE-RSA-AES128-SHA pageviews [puppet] - https://gerrit.wikimedia.org/r/622321 (https://phabricator.wikimedia.org/T258405)
[10:01:45] <wikibugs>	 (CR) Marostegui: [C: +1] Remove unused sql.py and check_private_data.py [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516) (owner: Kormat)
[10:02:20] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] termbox: bump chart to pick up changes in the envoy configuration [deployment-charts] - https://gerrit.wikimedia.org/r/622319 (owner: Giuseppe Lavagetto)
[10:02:24] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: update summary for IcingaServiceProblem alert [puppet] - https://gerrit.wikimedia.org/r/622320 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[10:02:28] <wikibugs>	 (CR) Kormat: [C: +2] Remove unused sql.py and check_private_data.py [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516) (owner: Kormat)
[10:03:12] <wikibugs>	 (Merged) jenkins-bot: Remove unused sql.py and check_private_data.py [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622310 (https://phabricator.wikimedia.org/T259516) (owner: Kormat)
[10:04:16] <wikibugs>	 (Merged) jenkins-bot: termbox: bump chart to pick up changes in the envoy configuration [deployment-charts] - https://gerrit.wikimedia.org/r/622319 (owner: Giuseppe Lavagetto)
[10:04:37] <wikibugs>	 (PS1) Jcrespo: Revert "mariadb-backups: Ignore backup freshness check for dbprov1* hosts" [puppet] - https://gerrit.wikimedia.org/r/622209 (https://phabricator.wikimedia.org/T260764)
[10:04:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "mariadb-backups: Ignore backup freshness check for dbprov1* hosts" [puppet] - https://gerrit.wikimedia.org/r/622209 (https://phabricator.wikimedia.org/T260764) (owner: Jcrespo)
[10:06:13] <wikibugs>	 (PS1) Filippo Giunchedi: karma: match Icinga background colors for 'severity' and hide 'info' label [puppet] - https://gerrit.wikimedia.org/r/622322 (https://phabricator.wikimedia.org/T258948)
[10:06:19] <wikibugs>	 (CR) Hnowlan: [C: +2] api-gateway: strip cookie headers from requests and responses. [deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296) (owner: Hnowlan)
[10:06:21] <wikibugs>	 (PS2) Jcrespo: mariadb-backups: Setup dbprov1003 [puppet] - https://gerrit.wikimedia.org/r/621987 (https://phabricator.wikimedia.org/T257551)
[10:06:23] <wikibugs>	 (PS2) Jcrespo: Revert "mariadb-backups: Ignore backup freshness check for dbprov1* hosts" [puppet] - https://gerrit.wikimedia.org/r/622209 (https://phabricator.wikimedia.org/T260764)
[10:06:34] <wikibugs>	 (PS3) Jcrespo: Revert "mariadb-backups: Ignore backup freshness check for dbprov1* hosts" [puppet] - https://gerrit.wikimedia.org/r/622209 (https://phabricator.wikimedia.org/T260764)
[10:08:03] <wikibugs>	 (CR) Jcrespo: [C: +2] Revert "mariadb-backups: Ignore backup freshness check for dbprov1* hosts" [puppet] - https://gerrit.wikimedia.org/r/622209 (https://phabricator.wikimedia.org/T260764) (owner: Jcrespo)
[10:08:19] <wikibugs>	 (CR) Marostegui: [C: +1] Add mypy to tox, and check in CI. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622317 (owner: Kormat)
[10:08:25] <wikibugs>	 (Merged) jenkins-bot: api-gateway: strip cookie headers from requests and responses. [deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296) (owner: Hnowlan)
[10:08:48] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] karma: match Icinga background colors for 'severity' and hide 'info' label [puppet] - https://gerrit.wikimedia.org/r/622322 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[10:09:28] <wikibugs>	 Operations: Updated java security policy in OpenJDK 8 u265 - https://phabricator.wikimedia.org/T261196 (MoritzMuehlenhoff)
[10:10:08] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:10:13] <wikibugs>	 Operations, User-MoritzMuehlenhoff, User-jbond: Updated java security policy in OpenJDK 8 u252 - https://phabricator.wikimedia.org/T251493 (MoritzMuehlenhoff) And right in time there's new changes in u265 :-) Opened T261196 to track these.
[10:16:52] <wikibugs>	 (CR) Kormat: [C: +2] Add mypy to tox, and check in CI. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622317 (owner: Kormat)
[10:17:29] <wikibugs>	 (CR) Elukey: Multiple instances of msearch_daemon (3 comments) [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: ZPapierski)
[10:18:14] <wikibugs>	 (Merged) jenkins-bot: Add mypy to tox, and check in CI. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/622317 (owner: Kormat)
[10:23:18] <moritzm>	 !log removed fermium.wikimedia.org from debmonitor
[10:23:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:38] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
[10:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:39] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
[10:28:39] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[10:28:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:28] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[10:32:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:01] <wikibugs>	 (PS1) Muehlenhoff: Set U2F token expiry to 3650 on the production IDPs [puppet] - https://gerrit.wikimedia.org/r/622324 (https://phabricator.wikimedia.org/T258029)
[10:37:11] <arturo>	 !log import all binary packages from tesseract-ocr-lang into stretch-wikimedia/component/tesseract-410-bpo (T247422)
[10:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:15] <stashbot>	 T247422: Update Tesseract on Toolforge to  v4.1.0 - https://phabricator.wikimedia.org/T247422
[10:45:54] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:46:20] <wikibugs>	 Operations, Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (Aklapper)
[10:49:32] <wikibugs>	 (PS1) Muehlenhoff: Add component/ceph [puppet] - https://gerrit.wikimedia.org/r/622326 (https://phabricator.wikimedia.org/T256877)
[10:51:18] <NotASpy>	 is WP down for anybody else, or is it just me ?
[10:52:35] <hauskatze>	 NotASpy: I can see WP
[10:52:43] <hauskatze>	 do you get any error message?
[10:53:00] <NotASpy>	 nope, just getting time out errors, not connecting at all
[10:56:43] <Urbanecm>	 NotASpy: can you please follow https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue?
[10:57:14] <Urbanecm>	 oh, you probably can't access that as well...
[10:57:28] <Urbanecm>	 https://wikitech-static.wikimedia.org/wiki/Reporting_a_connectivity_issue should work, as it's outside the cluster :)
[10:58:50] <NotASpy>	 it's my ISP, Urbanecm 
[10:59:15] <Urbanecm>	 okay then NotASpy - just providing resources you might need when reporting :)
[10:59:47] <moritzm>	 !log installing remaining libx11 security updates
[10:59:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T1100).
[11:00:04] <jouncebot>	 kart_: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:04] <NotASpy>	 I can remote access my office PC and WP is working there (different ISP)
[11:00:16] <Urbanecm>	 I see NotASpy 
[11:00:18] <Urbanecm>	 \o/
[11:00:29] <Urbanecm>	 kart_: Hi :). I can deploy, if needed :)
[11:00:51] <kart_>	 Urbanecm: Thanks :) Please go ahead. I'll do testing.
[11:01:06] <Urbanecm>	 ack kart_ :)
[11:01:17] <kart_>	 (And, having bad network too :/)
[11:01:35] <wikibugs>	 (CR) Urbanecm: [C: +2] Enable ContentTranslation as a default tool in Assamese and Burmese WPs [mediawiki-config] - https://gerrit.wikimedia.org/r/622257 (https://phabricator.wikimedia.org/T258503) (owner: KartikMistry)
[11:01:42] <Urbanecm>	 :( kart_
[11:02:28] <wikibugs>	 (Merged) jenkins-bot: Enable ContentTranslation as a default tool in Assamese and Burmese WPs [mediawiki-config] - https://gerrit.wikimedia.org/r/622257 (https://phabricator.wikimedia.org/T258503) (owner: KartikMistry)
[11:02:54] <NotASpy>	 looks like my ISP's IPv6 has fallen over
[11:03:29] <Urbanecm>	 hehe
[11:04:01] <Urbanecm>	 kart_: ready for you to test at mwdebug1002
[11:04:14] <kart_>	 Testing.
[11:05:40] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:06:20] <kart_>	 Urbanecm: looks good. Go ahead.
[11:06:27] <Urbanecm>	 syncing kart_ 
[11:07:52] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: d869e308492ee72cb3d1998b15409aa44a4af9c7: Enable ContentTranslation as a default tool in Assamese and Burmese WPs (T258503; T258505) (duration: 01m 00s)
[11:07:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:57] <stashbot>	 T258503: Enable Content Translation in Burmese Wikipedia as a default tool - https://phabricator.wikimedia.org/T258503
[11:07:57] <stashbot>	 T258505: Enable Content Translation in Assamese Wikipedia as a default tool - https://phabricator.wikimedia.org/T258505
[11:08:12] <Urbanecm>	 kart_: should be live!
[11:08:14] <Urbanecm>	 anything else?
[11:08:44] <kart_>	 Urbanecm: thanks a lot! 
[11:08:51] <Urbanecm>	 happy to help!
[11:08:52] <kart_>	 Nothing else from me :)
[11:10:21] <Urbanecm>	 ack :)
[11:11:16] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:11:36] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:11:46] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] helmfile: add values for staging environment [deployment-charts] - https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:12:20] <NotASpy>	 and we're back, Urbanecm 
[11:12:26] <Urbanecm>	 cool!
[11:12:34] <Urbanecm>	 enjoy IPv6 again :)
[11:12:37] <NotASpy>	 can go block people now
[11:13:18] <hauskatze>	 lol
[11:13:20] <Urbanecm>	 you could've used /etc/hosts and forced IPv4 too :)
[11:14:08] <wikibugs>	 (Merged) jenkins-bot: helmfile: add values for staging environment [deployment-charts] - https://gerrit.wikimedia.org/r/621605 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:15:24] <wikibugs>	 (PS1) Hnowlan: api-gateway: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/622328
[11:16:46] <Urbanecm>	 !log EU B&C done
[11:16:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:12] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:21:52] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add component/ceph [puppet] - https://gerrit.wikimedia.org/r/622326 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[11:22:43] <wikibugs>	 (CR) Hnowlan: [C: +2] api-gateway: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/622328 (owner: Hnowlan)
[11:23:50] <wikibugs>	 (PS1) Effie Mouzeli: push-notifications: enable TLS for all environments [deployment-charts] - https://gerrit.wikimedia.org/r/622330 (https://phabricator.wikimedia.org/T256973)
[11:25:23] <marostegui>	 !log Upgrade mysql on db1118 after MCR change
[11:25:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:33] <wikibugs>	 (Merged) jenkins-bot: api-gateway: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/622328 (owner: Hnowlan)
[11:25:55] <wikibugs>	 (CR) JMeybohm: [C: +2] push-notifications: enable TLS for all environments [deployment-charts] - https://gerrit.wikimedia.org/r/622330 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:28:00] <wikibugs>	 (Merged) jenkins-bot: push-notifications: enable TLS for all environments [deployment-charts] - https://gerrit.wikimedia.org/r/622330 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:29:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12337 and previous config saved to /var/cache/conftool/dbconfig/20200825-112859-marostegui.json
[11:29:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:00] <wikibugs>	 Operations: Make bpfcc-tools available fleet-wide - https://phabricator.wikimedia.org/T261193 (ema) Thanks for opening this!  When it comes to systemtap, user-space tracing requires the linux-headers package for the currently running kernel, plus the debug symbols for whatever software is under scrutiny (eg:...
[11:31:28] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:32:03] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[11:32:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:59] <wikibugs>	 (CR) Volans: [C: +2] dns: fix corner case that should not happen [software/netbox-extras] - https://gerrit.wikimedia.org/r/619982 (owner: Volans)
[11:36:37] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[11:36:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:24] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:37:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12338 and previous config saved to /var/cache/conftool/dbconfig/20200825-113758-marostegui.json
[11:38:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:52] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1005 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.127e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:41:53] <wikibugs>	 (CR) ZPapierski: Multiple instances of msearch_daemon (3 comments) [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: ZPapierski)
[11:42:27] <wikibugs>	 (PS1) Cparle: CAT blocklist update [mediawiki-config] - https://gerrit.wikimedia.org/r/622332 (https://phabricator.wikimedia.org/T260958)
[11:45:11] <wikibugs>	 (PS1) Effie Mouzeli: push-notifications: Bump chart version [deployment-charts] - https://gerrit.wikimedia.org/r/622333 (https://phabricator.wikimedia.org/T256973)
[11:45:46] <wikibugs>	 (CR) JMeybohm: [C: +2] push-notifications: Bump chart version [deployment-charts] - https://gerrit.wikimedia.org/r/622333 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:46:14] <wikibugs>	 Operations, observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (MoritzMuehlenhoff)
[11:47:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] push-notifications: Bump chart version [deployment-charts] - https://gerrit.wikimedia.org/r/622333 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:48:48] <_joe_>	 uhm random failure from chartmuseum it seems 
[11:48:51] <_joe_>	 jayme: ^^
[11:49:16] <effie>	 we are on it 
[11:49:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12339 and previous config saved to /var/cache/conftool/dbconfig/20200825-114938-marostegui.json
[11:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:19] <wikibugs>	 (CR) Matthias Mullie: [C: +2] CAT blocklist update [mediawiki-config] - https://gerrit.wikimedia.org/r/622332 (https://phabricator.wikimedia.org/T260958) (owner: Cparle)
[11:51:04] <wikibugs>	 (Merged) jenkins-bot: CAT blocklist update [mediawiki-config] - https://gerrit.wikimedia.org/r/622332 (https://phabricator.wikimedia.org/T260958) (owner: Cparle)
[11:52:49] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:56:40] <wikibugs>	 (CR) JMeybohm: [V: +2 C: +2] "Temporary error in CI, rerun looks okay" [deployment-charts] - https://gerrit.wikimedia.org/r/622333 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[11:59:15] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[11:59:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12340 and previous config saved to /var/cache/conftool/dbconfig/20200825-120211-marostegui.json
[12:02:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1135 MCR change', diff saved to https://phabricator.wikimedia.org/P12341 and previous config saved to /var/cache/conftool/dbconfig/20200825-120708-marostegui.json
[12:07:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:54] <wikibugs>	 Operations, observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (fgiunchedi) >>! In T261198#6408841, @MoritzMuehlenhoff wrote: > That rings a bell, we've seen similar issues before: https://phabricator.wikimedia.org/T157853 >  >> The...
[12:10:19] <moritzm>	 !log installing ruby-json security updates
[12:10:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:38] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:13:01] <wikibugs>	 (PS2) Vgutierrez: vcl: Use synthetic warning for DHE-RSA-AES128-SHA pageviews [puppet] - https://gerrit.wikimedia.org/r/622321 (https://phabricator.wikimedia.org/T258405)
[12:13:51] <wikibugs>	 Operations: Integrate Stretch 9.13 point update - https://phabricator.wikimedia.org/T258407 (MoritzMuehlenhoff)
[12:19:34] <wikibugs>	 Operations, observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (fgiunchedi) >>! In T261198#6408898, @fgiunchedi wrote: >>>! In T261198#6408841, @MoritzMuehlenhoff wrote: >> That rings a bell, we've seen similar issues before: https:...
[12:19:37] <wikibugs>	 (PS12) ZPapierski: Multiple instances of msearch_daemon [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[12:20:38] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki)
[12:25:07] <wikibugs>	 (PS13) ZPapierski: Multiple instances of msearch_daemon [puppet] - https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[12:26:50] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki) Push-notifications is up and running in staging. Our next step is to perform the LVS steps and expose the ap...
[12:29:22] <wikibugs>	 Operations, Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (jijiki) p:Triage→Medium
[12:29:24] <wikibugs>	 Operations, observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (jijiki) p:Triage→Medium
[12:35:31] <moritzm>	 !log imported ceph packages from stretch-backports to component/ceph T256877
[12:35:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:35] <stashbot>	 T256877: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877
[12:35:49] <wikibugs>	 (PS1) Muehlenhoff: Switch openstack::serverpackages::rocky::stretch to component/ceph [puppet] - https://gerrit.wikimedia.org/r/622340 (https://phabricator.wikimedia.org/T256877)
[12:39:16] <godog>	 !log test nagios-nrpe-server with dh 2048 on scb2001 - T261198
[12:39:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:20] <stashbot>	 T261198: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198
[12:39:39] <marostegui>	 !log alter table sites on s6, directly on the primary master T260476
[12:39:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:43] <stashbot>	 T260476: Extend sites.site_global_key on WMF production - https://phabricator.wikimedia.org/T260476
[12:42:50] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:45:45] <wikibugs>	 10Operations, 10observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (10fgiunchedi) >>! In T261198#6409003, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/St-hJXQBv7KcG9M...
[12:45:50] <marostegui>	 !log Update MySQL on db1111 after MCR change
[12:45:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:40] <Amir1>	 Urbanecm: you can revoke the bot's flag in arwiki so it gets throttled 
[12:48:46] <Amir1>	 What do you think?
[12:48:46] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 556 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:48:58] <Urbanecm>	 Amir1: which bot do you mean? The one I locked yesterday?
[12:49:13] <Amir1>	 yup
[12:49:38] <Urbanecm>	 well, I can, but that would mean its edits would flood the RC
[12:49:59] <wikibugs>	 10Operations, 10Discovery-Search (Current work): wdqs1009 has puppet changes on each run - https://phabricator.wikimedia.org/T260123 (10Gehel) 05Open→03Resolved
[12:50:30] <Amir1>	 Urbanecm: since it's locked it shouldn't be able to edit
[12:50:38] <Amir1>	 Maybe I'm missing something obvious 
[12:50:57] <Urbanecm>	 Amir1: well, since it's locked, it can't login
[12:51:08] <Urbanecm>	 or are you saying it still tries to login, even if it's failing?
[12:51:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12343 and previous config saved to /var/cache/conftool/dbconfig/20200825-125108-marostegui.json
[12:51:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:27] <Amir1>	 Urbanecm: yup
[12:51:31] <Urbanecm>	 gotcha
[12:51:36] <Amir1>	 maybe we should check if it tries
[12:51:41] <wikibugs>	 (03PS14) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[12:51:46] <Amir1>	 let me look
[12:52:22] <Urbanecm>	 Amir1: there is an issue with the throttling though. I'm not seeing throttling in https://github.com/wikimedia/mediawiki/blob/master/includes/api/ApiLogin.php at all, but I might be blind, or looking in the wrong place
[12:52:59] <Amir1>	 maybe it's centralized in auth manager?
[12:53:02] <Urbanecm>	 maybe
[12:53:23] <wikibugs>	 (03PS1) 10DCausse: Use dedicated schedules for the various wikidata ttl dumps [puppet] - 10https://gerrit.wikimedia.org/r/622342 (https://phabricator.wikimedia.org/T261204)
[12:54:57] <Urbanecm>	 Amir1: I'm not seeing any call to pingLimiter from any code that seems to be relevant to the auth process <https://phabricator.wikimedia.org/source/mediawiki/browse/master/?grep=-%3EpingLimiter>
[12:55:14] <Amir1>	 :(
[12:56:07] <godog>	 !log upgrade nagios-nrpe-server on scb2* and mwlog* - T261198
[12:56:12] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:56:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:15] <stashbot>	 T261198: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198
[12:57:12] <Urbanecm>	 Amir1: there's https://github.com/wikimedia/mediawiki/blob/master/includes/auth/ThrottlePreAuthenticationProvider.php, but that seems to work for incorrect attempts mainly
[12:57:44] <Amir1>	 does it count if the account is blocked?
[12:58:16] <wikibugs>	 10Operations, 10observability: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (10fgiunchedi) >>! In T261198#6409081, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/z9-wJXQBv7KcG9M...
[12:58:29] <wikibugs>	 10Operations, 10observability, 10User-fgiunchedi: nagios-nrpe-server in jessie not compatibile with Buster version - https://phabricator.wikimedia.org/T261198 (10fgiunchedi)
[12:58:42] <Urbanecm>	 checking Amir1 
[13:02:08] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:06:46] <wikibugs>	 10Operations: Make bpfcc-tools available fleet-wide - https://phabricator.wikimedia.org/T261193 (10CDanis) p:05Triage→03Medium Thanks for opening this!  Really happy to see it (and was also talking to @wkandek just yesterday about making bpfcc generally available in the fleet).  +1 to the wrapper idea.  In m...
[13:08:03] <wikibugs>	 (03CR) 10Elukey: Multiple instances of msearch_daemon (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[13:13:47] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Revert "termbox/staging: rollback the configuration, it clearly doesn't work." [deployment-charts] - 10https://gerrit.wikimedia.org/r/622212
[13:15:34] <wikibugs>	 (03PS3) 10Jcrespo: mariadb-backups: Setup dbprov1003 [puppet] - 10https://gerrit.wikimedia.org/r/621987 (https://phabricator.wikimedia.org/T257551)
[13:15:54] <wikibugs>	 (03CR) 10Elukey: Multiple instances of msearch_daemon (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[13:16:56] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Setup dbprov1003 [puppet] - 10https://gerrit.wikimedia.org/r/621987 (https://phabricator.wikimedia.org/T257551) (owner: 10Jcrespo)
[13:17:13] <moritzm>	 !log installing firejail security updates on remaining mw* servers in eqiad
[13:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12344 and previous config saved to /var/cache/conftool/dbconfig/20200825-132027-marostegui.json
[13:20:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:38] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001 job=burrow partition={2,3} prometheus=ops site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=tha
[13:20:38] <icinga-wm>	 ogging-eqiad&var-topic=All&var-consumer_group=All
[13:21:22] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF) I plan to send the announcement to communities tomorrow.   At the moment, https://wikitech.wikimedia.org/wiki/Switch_Datacente...
[13:21:52] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:22:02] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "termbox/staging: rollback the configuration, it clearly doesn't work." [deployment-charts] - 10https://gerrit.wikimedia.org/r/622212 (owner: 10Giuseppe Lavagetto)
[13:24:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "termbox/staging: rollback the configuration, it clearly doesn't work." [deployment-charts] - 10https://gerrit.wikimedia.org/r/622212 (owner: 10Giuseppe Lavagetto)
[13:27:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/621779 (owner: 10Dzahn)
[13:31:36] <icinga-wm>	 RECOVERY - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37
[13:32:19] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:34:32] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM, thx" [puppet] - 10https://gerrit.wikimedia.org/r/621758 (owner: 10Dzahn)
[13:34:34] <wikibugs>	 (03PS1) 10Filippo Giunchedi: alertmanager: remove inhibit rules until we need them [puppet] - 10https://gerrit.wikimedia.org/r/622348 (https://phabricator.wikimedia.org/T258948)
[13:35:37] <wikibugs>	 (03PS1) 10Kormat: Tidy up import ordering using isort. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/622349
[13:37:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12345 and previous config saved to /var/cache/conftool/dbconfig/20200825-133734-marostegui.json
[13:37:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:42] <wikibugs>	 (03PS1) 10Muehlenhoff: Retire stub firejail code in service::uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/622350
[13:38:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] alertmanager: remove inhibit rules until we need them [puppet] - 10https://gerrit.wikimedia.org/r/622348 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[13:39:18] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review, 10User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (10Kormat)
[13:39:42] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:39:50] <wikibugs>	 (03PS3) 10Jbond: cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110)
[13:39:59] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review, 10User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (10Kormat) 05Open→03Resolved Mission accomplished.
[13:40:05] <wikibugs>	 (03CR) 10Jbond: "updated thx" (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond)
[13:42:32] <wikibugs>	 (03PS15) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[13:43:58] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: fix 'summary' annotation for IcingaServiceProblem [puppet] - 10https://gerrit.wikimedia.org/r/622352 (https://phabricator.wikimedia.org/T258948)
[13:44:14] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:45:25] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Thx, LGTM!" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond)
[13:46:09] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[13:46:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: fix 'summary' annotation for IcingaServiceProblem [puppet] - 10https://gerrit.wikimedia.org/r/622352 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[13:47:04] <wikibugs>	 (03CR) 10Jbond: "LGTM barring comments from filippo.  A general comment however, Stdlib::Host may be more preferable to Stdlib::Fqdn.  the former also allo" [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn)
[13:47:16] <wikibugs>	 (03PS1) 10JMeybohm: Use include instead of template to include defines [deployment-charts] - 10https://gerrit.wikimedia.org/r/622354
[13:47:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'fully repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12346 and previous config saved to /var/cache/conftool/dbconfig/20200825-134736-marostegui.json
[13:47:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:02] <wikibugs>	 (03PS16) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[13:49:05] <wikibugs>	 (03PS1) 10ZPapierski: Remove unnecessary daemon definitions [puppet] - 10https://gerrit.wikimedia.org/r/622355 (https://phabricator.wikimedia.org/T260305)
[13:49:47] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/622324 (https://phabricator.wikimedia.org/T258029) (owner: 10Muehlenhoff)
[13:51:52] <wikibugs>	 10Operations, 10serviceops: assess and re-evaluate 'weight' settings of appservers in codfw - https://phabricator.wikimedia.org/T261159 (10Joe) I would rather try to elaborate starting from what eqiad does with similar hardware.   The api cluster has, excluding servers to decom 65 servers, distributed as follo...
[13:52:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1114 for MCR change', diff saved to https://phabricator.wikimedia.org/P12347 and previous config saved to /var/cache/conftool/dbconfig/20200825-135248-marostegui.json
[13:52:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:53] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10RLazarus) Yeah, sorry that's later than I expected -- we're meeting today to confirm the timing details and I'll post the update immediate...
[13:54:54] <wikibugs>	 (03PS17) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[13:55:52] <wikibugs>	 (03CR) 10Jbond: cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond)
[13:55:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] cookbook sre.puppet.renew-cert: add cookbook to renew a puppet cert [cookbooks] - 10https://gerrit.wikimedia.org/r/621701 (https://phabricator.wikimedia.org/T260110) (owner: 10Jbond)
[13:57:23] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:57:45] <wikibugs>	 (03PS18) 10ZPapierski: Multiple instances of msearch_daemon [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305)
[13:58:37] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001 job=burrow partition={2,3} prometheus=ops site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=tha
[13:58:37] <icinga-wm>	 ogging-eqiad&var-topic=All&var-consumer_group=All
[14:00:47] <wikibugs>	 (03PS2) 10ZPapierski: Remove unnecessary daemon definitions [puppet] - 10https://gerrit.wikimedia.org/r/622355 (https://phabricator.wikimedia.org/T260305)
[14:01:51] <wikibugs>	 (03CR) 10ZPapierski: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/24651/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[14:02:30] <wikibugs>	 (03PS1) 10Volans: junos: colorize configuration diff [software/homer] - 10https://gerrit.wikimedia.org/r/622356 (https://phabricator.wikimedia.org/T260769)
[14:06:31] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538
[14:06:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:21] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538 (duration: 03m 50s)
[14:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:10] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review, 10good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (10Southparkfan) >>! In T252990#6408248, @ema wrote: > This still uses the legacy `/var/run` though, he...
[14:25:44] <SPF|Cloud>	 ema: ^ sorry for making stuff complicated :)
[14:26:00] <XioNoX>	 !log disable IPv6 BGP to Init7 in knams
[14:26:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:16] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/622212 (owner: 10Giuseppe Lavagetto)
[14:29:44] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "termbox/staging: rollback the configuration, it clearly doesn't work." [deployment-charts] - 10https://gerrit.wikimedia.org/r/622212 (owner: 10Giuseppe Lavagetto)
[14:32:05] <papaul>	 doing some cable work in c5, c6, c7 and c8, in case you see any mgmt interface going down
[14:32:12] <logmsgbot>	 !log volker-e@deploy1001 Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
[14:32:14] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Seve Kim - https://phabricator.wikimedia.org/T261208 (10sdkim)
[14:32:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:17] <logmsgbot>	 !log volker-e@deploy1001 Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 05s)
[14:32:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:29] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: decomission oresrdb100[12] - https://phabricator.wikimedia.org/T254238 (10Cmjohnson)
[14:34:30] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: decomission oresrdb100[12] - https://phabricator.wikimedia.org/T254238 (10Cmjohnson) 05Open→03Resolved
[14:36:44] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review, 10good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (10MoritzMuehlenhoff) >>! In T252990#6409276, @Southparkfan wrote: > A change from /var/run to /run was...
[14:36:46] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[14:36:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:41] <icinga-wm>	 PROBLEM - Host ganeti2013.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:43] <icinga-wm>	 PROBLEM - Host db2127.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:43] <icinga-wm>	 PROBLEM - Host ganeti2014.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:55:01] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:55:35] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: analytics_meta on db1108 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error executing row event: Table superset_staging.ab_user doesnt exist https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:55:41] <icinga-wm>	 RECOVERY - Host ganeti2013.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.94 ms
[14:55:43] <icinga-wm>	 RECOVERY - Host db2127.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.93 ms
[14:55:43] <icinga-wm>	 RECOVERY - Host ganeti2014.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.04 ms
[14:56:10] <jynus>	 elukey ^
[14:56:24] <moritzm>	 !log installing take security updates on stretch
[14:56:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:34] <moritzm>	 !log installing rake security updates on stretch
[14:56:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:37] <marostegui>	 elukey: around?
[14:56:45] <marostegui>	 ah, sorry didn't see jaime pinged you already
[14:56:57] <elukey>	 I am yes, but I didn't do anything to superset's db
[14:56:58] <marostegui>	 I am going to a meeting now, but I can check later if you need help
[14:57:03] <wikibugs>	 (03CR) 10ArielGlenn: "I don't have a problem with this as long as everyone bears in mind that multiple (maybe all three) of these jobs could end up running at t" [puppet] - 10https://gerrit.wikimedia.org/r/622342 (https://phabricator.wikimedia.org/T261204) (owner: 10DCausse)
[14:57:04] <elukey>	 yes yes no problem
[14:57:14] <jynus>	 I can help at least debugging
[14:57:50] <elukey>	 ah wait I think I may know what's happening
[14:58:00] <wikibugs>	 (03PS4) 10Southparkfan: nagios-nrpe-server systemd unit: use /run for PID files + add new versions for os_version [puppet] - 10https://gerrit.wikimedia.org/r/621967 (https://phabricator.wikimedia.org/T252990)
[14:58:35] <elukey>	 so we have a database called superset_staging on an-coord1001, that is not one that we need to replicate
[14:58:45] <jynus>	 elukey: ping if you need help
[14:58:53] <elukey>	 yep yep thanks :)
[14:58:57] <marostegui>	 elukey: we'd need to set a replication filter then
[14:59:22] <elukey>	 possibly yes, we can do it later when you people have time
[15:00:25] <elukey>	 so something like
[15:00:27] <jynus>	 can you do the logical work, as in, get a list of db that have to be skipped or something
[15:00:40] <jynus>	 or tables
[15:00:42] <elukey>	 stop slave; SET GLOBAL replicate_ignore_db='superset_staging'; start slave; ?
[15:00:56] <marostegui>	 yeah, but also set it to my.cnf so it doesn't get lost on restart
[15:00:58] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:01:20] <jynus>	 I strongly recommend against replicate_ignore_db, I recommend replicate_wild_ignore_table
[15:01:38] <jynus>	 I can show you examples on labsdb
[15:01:48] <marostegui>	 elukey: you can check modules/profile/templates/mariadb/mysqld_config/sanitarium_multiinstance.my.cnf.erb as an example
[15:02:13] * marostegui goes to the meeting for real
[15:02:33] <elukey>	 jynus: sure thanks!
[15:02:51] <jynus>	 so replicate-wild-ignore-table = db.%
[15:03:41] <jynus>	 if you send a patch I can review it if necessary
[15:03:56] <elukey>	 ah so replicate-wild-ignore-table = superset_staging.%
[15:04:09] <jynus>	 the difference is that it is ignored on application
[15:04:19] <jynus>	 where there is less probability of it going bad
[15:04:30] <jynus>	 binlog gets untouched
[15:04:47] <elukey>	 ah right
[15:04:52] <jynus>	 e.g. replicate_ignore_db ignores based on the current db
[15:05:15] <elukey>	 can I try to set it dynamically to see if it works or better to directly apply the my.cnf patch and restart mariadb?
[15:05:17] <jynus>	 so if there are cross-db updates, things will break
[15:05:25] <jynus>	 elukey: yes
[15:05:29] <wikibugs>	 (03CR) 10Ebernhardson: Multiple instances of msearch_daemon (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621988 (https://phabricator.wikimedia.org/T260305) (owner: 10ZPapierski)
[15:05:34] <jynus>	 although it requires stopping replication
[15:05:43] <elukey>	 yep yep no problem, trying now
[15:05:53] <jynus>	 let's move discussion to -database to not spam here
[15:05:56] <jynus>	 if needed
[15:06:16] <jynus>	 it has to be applied to the replica, not the master
[15:06:27] <jynus>	 so it is also safer because of that
[15:06:28] <wikibugs>	 (03CR) 10DCausse: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/622342 (https://phabricator.wikimedia.org/T261204) (owner: 10DCausse)
[15:06:30] <elukey>	 yep yep let's move to database
[15:10:51] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:15:03] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: analytics_meta on db1108 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:15:08] <elukey>	 \o/
[15:19:21] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 49 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:21:05] <wikibugs_>	 (03PS9) 10Bstorm: wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843)
[15:22:54] <liw>	 !log testing upcoming Scap release on beta
[15:22:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:15] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review, 10good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (10Southparkfan) @MoritzMuehlenhoff understood. Patch set 4 will use the custom unit (with /run) on sys...
[15:31:30] <wikibugs>	 (03PS1) 10Elukey: Exclude superset_staging from the db1108's meta replication [puppet] - 10https://gerrit.wikimedia.org/r/622382
[15:32:55] <wikibugs>	 (03CR) 10Elukey: "pcc looks good https://puppet-compiler.wmflabs.org/compiler1002/24652/db1108.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/622382 (owner: 10Elukey)
[15:33:52] <wikibugs_>	 (03CR) 10Jcrespo: [C: 03+1] Exclude superset_staging from the db1108's meta replication [puppet] - 10https://gerrit.wikimedia.org/r/622382 (owner: 10Elukey)
[15:34:35] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:35:09] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:36:23] <wikibugs_>	 (03PS5) 10Lucas Werkmeister (WMDE): Add new slow-bot group for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618245 (https://phabricator.wikimedia.org/T258354) (owner: 10Tobias Andersson)
[15:37:12] <wikibugs_>	 (03CR) 10Elukey: [C: 03+2] Exclude superset_staging from the db1108's meta replication [puppet] - 10https://gerrit.wikimedia.org/r/622382 (owner: 10Elukey)
[15:37:55] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): Add new slow-bot group for Wikidata (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618245 (https://phabricator.wikimedia.org/T258354) (owner: 10Tobias Andersson)
[15:39:49] <logmsgbot>	 !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import
[15:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:27] <logmsgbot>	 !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import (duration: 01m 38s)
[15:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:03] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:47:40] <elukey>	 !log restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication)
[15:47:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:57] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:57:15] <wikibugs_>	 (03CR) 10Bstorm: [C: 03+2] wikireplicas: refactor to eliminate confusing "labsdb" naming [puppet] - 10https://gerrit.wikimedia.org/r/621618 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm)
[16:00:01] <gehel>	 !log repool wdqs1005 - caught up on lag
[16:00:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:04] <jouncebot>	 godog and _joe_: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet request window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T1600).
[16:00:20] <gehel>	 ryankemper: ^^^
[16:00:31] <_joe_>	 no changes today :)
[16:01:53] <wikibugs_>	 (03CR) 10Dduvall: [C: 03+2] Branch commit for wmf/1.36.0-wmf.6 [core] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622250 (https://phabricator.wikimedia.org/T257974) (owner: 10TrainBranchBot)
[16:02:53] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:06:09] <marxarelli>	 1.36.0-wmf.6 was branched at 8c26ce9746bd57c8c7801c4c99b60cbb0cbc0703 for T257974
[16:06:10] <stashbot>	 T257974: 1.36.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T257974
[16:07:28] <logmsgbot>	 !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import
[16:07:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:22] <logmsgbot>	 !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import (duration: 00m 54s)
[16:08:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:32] <wikibugs>	 (03PS1) 10Bstorm: wikireplicas: fix path typo for the heartbeat-views file [puppet] - 10https://gerrit.wikimedia.org/r/622387 (https://phabricator.wikimedia.org/T260843)
[16:10:12] <wikibugs_>	 (03CR) 10Jcrespo: [C: 03+1] "He he, as predicted :-)" [puppet] - 10https://gerrit.wikimedia.org/r/622387 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm)
[16:10:45] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/622387 (https://phabricator.wikimedia.org/T260843) (owner: 10Bstorm)
[16:12:45] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 52 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:13:00] <wikibugs>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[16:13:33] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[16:21:20] <wikibugs>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[16:21:55] <shdubsh>	 !log restart logstash on logstash1007 -- gc duration outlier
[16:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:20] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy)
[16:24:05] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:25:07] <wikibugs_>	 (03Merged) 10jenkins-bot: Branch commit for wmf/1.36.0-wmf.6 [core] (wmf/1.36.0-wmf.6) - 10https://gerrit.wikimedia.org/r/622250 (https://phabricator.wikimedia.org/T257974) (owner: 10TrainBranchBot)
[16:26:09] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10Nuria) a:03fdans
[16:30:39] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:36:29] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[16:40:35] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:41:38] <wikibugs>	 (03PS2) 10Herron: logstash: add #o11y tag to logstash alert descriptions [puppet] - 10https://gerrit.wikimedia.org/r/622161
[16:41:57] <logmsgbot>	 !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import
[16:41:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:46] <logmsgbot>	 !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import (duration: 00m 49s)
[16:42:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:00] <wikibugs_>	 (03CR) 10Herron: [C: 03+2] logstash: add #o11y tag to logstash alert descriptions [puppet] - 10https://gerrit.wikimedia.org/r/622161 (owner: 10Herron)
[16:47:22] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:54:43] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:59:13] <wikibugs_>	 (03CR) 10Cwhite: prometheus: add apache2 es-exporter config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/621597 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[17:00:04] <jouncebot>	 halfak and accraze: May I have your attention please! Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T1700)
[17:00:50] <wikibugs>	 (03CR) 10Cwhite: profile: install and configure statsd_exporter and retarget statsv (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/615269 (https://phabricator.wikimedia.org/T180105) (owner: 10Cwhite)
[17:01:31] <herron>	 !log imported logstash, elasticsearch, and kibana 7.9.0 -oss packages into buster-wikimedia thirdparty/elastic79 
[17:01:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:48] <logmsgbot>	 !log dduvall@deploy1001 Pruned MediaWiki: 1.36.0-wmf.3 (duration: 19m 12s)
[17:01:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:22] <wikibugs_>	 (03PS1) 10Herron: logstash: set elk7 cluster elasticsearch version to 7.9 [puppet] - 10https://gerrit.wikimedia.org/r/622395
[17:05:36] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:07:06] <wikibugs>	 10Operations, 10ops-codfw, 10netops: (Need by:  ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (10Papaul)
[17:08:12] <logmsgbot>	 !log dduvall@deploy1001 Pruned MediaWiki: 1.36.0-wmf.4 (duration: 01m 40s)
[17:08:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:32] <wikibugs>	 10Operations, 10ops-codfw, 10netops: (Need by:  ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (10Papaul) @Jgreen is it okay for me to replace fmsw on the 27th (start time 9:30am end time 11:30am) CT
[17:09:59] <wikibugs_>	 10Operations, 10ops-eqiad, 10Analytics-Clusters: (Need By: TBD) upgrade ram in an-master100[12] - https://phabricator.wikimedia.org/T259162 (10elukey) @Jclark-ctr I'd say 5/10 minutes for each host to do proper failover, and the host can stay down even for half an hour but better if less of course :)
[17:10:37] <wikibugs>	 (03PS1) 10Dduvall: testwikis wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622396
[17:10:39] <wikibugs_>	 (03CR) 10Dduvall: [C: 03+2] testwikis wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622396 (owner: 10Dduvall)
[17:11:29] <wikibugs_>	 (03Merged) 10jenkins-bot: testwikis wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622396 (owner: 10Dduvall)
[17:12:38] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:13:00] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:13:29] <wikibugs_>	 (03CR) 10BryanDavis: [C: 04-1] "Will split this up" [puppet] - 10https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628) (owner: 10BryanDavis)
[17:13:31] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:17:00] <logmsgbot>	 !log dduvall@deploy1001 Started scap: testwikis wikis to 1.36.0-wmf.6
[17:17:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:58] <icinga-wm>	 PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[17:22:54] <icinga-wm>	 RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 649 bytes in 8.082 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[17:28:38] <logmsgbot>	 !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import
[17:28:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:31] <logmsgbot>	 !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import (duration: 01m 52s)
[17:30:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:12] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:32:52] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:33:18] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/622395 (owner: 10Herron)
[17:33:36] <wikibugs_>	 (03CR) 10Herron: [C: 03+2] logstash: set elk7 cluster elasticsearch version to 7.9 [puppet] - 10https://gerrit.wikimedia.org/r/622395 (owner: 10Herron)
[17:34:57] <wikibugs>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:35:31] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10RLazarus) Done: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Schedule_for_2020_switch
[17:37:45] <wikibugs>	 (03PS1) 10Herron: profile::elasticserach: add version 7.9 to enum [puppet] - 10https://gerrit.wikimedia.org/r/622402
[17:40:03] <wikibugs_>	 (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/24653/" [puppet] - 10https://gerrit.wikimedia.org/r/622402 (owner: 10Herron)
[17:40:33] <wikibugs>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[17:40:51] <wikibugs>	 (03CR) 10Herron: [C: 03+2] profile::elasticserach: add version 7.9 to enum [puppet] - 10https://gerrit.wikimedia.org/r/622402 (owner: 10Herron)
[17:41:44] <wikibugs_>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] zuul: add data types, replace hiera() with lookup() [puppet] - 10https://gerrit.wikimedia.org/r/621758 (owner: 10Dzahn)
[17:47:59] <wikibugs>	 (03Abandoned) 10Dzahn: mediawiki::fonts: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/621374 (owner: 10Dzahn)
[17:48:15] <wikibugs_>	 (03PS2) 10Dzahn: webperf: add data types to profiles [puppet] - 10https://gerrit.wikimedia.org/r/621756
[17:49:12] <wikibugs_>	 (03CR) 10Dzahn: [C: 03+2] webperf: add data types to profiles [puppet] - 10https://gerrit.wikimedia.org/r/621756 (owner: 10Dzahn)
[17:58:58] <logmsgbot>	 !log dduvall@deploy1001 Finished scap: testwikis wikis to 1.36.0-wmf.6 (duration: 41m 58s)
[17:59:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:31] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen)
[18:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T1800)
[18:00:57] <wikibugs_>	 10Operations, 10DC-Ops, 10fundraising-tech-ops: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (10Jgreen) Note that fr-tech-ops intends to schedule a firmware upgrade, but our concern is that the upgrade is likely to surface an underlying hardware issue rather...
[18:05:44] <wikibugs>	 (03CR) 10Dzahn: ldap: remove jessie support (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621372 (owner: 10Dzahn)
[18:06:05] <wikibugs_>	 (03PS2) 10Dzahn: ldap: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/621372
[18:08:22] <wikibugs_>	 (03Abandoned) 10Dzahn: service::catalog: switch ORES to encryption: true [puppet] - 10https://gerrit.wikimedia.org/r/621564 (owner: 10Dzahn)
[18:08:24] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] "Shipping this" [puppet] - 10https://gerrit.wikimedia.org/r/618954 (owner: 10DCausse)
[18:10:42] <wikibugs_>	 (03CR) 10Ryan Kemper: "`sudo puppet-merge` successful" [puppet] - 10https://gerrit.wikimedia.org/r/618954 (owner: 10DCausse)
[18:11:45] <wikibugs>	 (03PS1) 10Ahmon Dancy: Improve error message if wikiversions.php has wrong format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622408
[18:11:58] <wikibugs_>	 (03CR) 10Dzahn: prometheus: hiera() -> lookup(), add data type for prometheus_nodes (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn)
[18:12:37] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Improve error message if wikiversions.php has wrong format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622408 (owner: 10Ahmon Dancy)
[18:13:57] <wikibugs>	 (03PS2) 10Ahmon Dancy: Improve error message if wikiversions.php has wrong format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622408
[18:15:43] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF) >>! In T244808#6409998, @RLazarus wrote: > Done: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Schedule_for_2020_switc...
[18:16:10] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020), 10User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10Trizek-WMF)
[18:16:24] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] "Newest PCC looks good:" [puppet] - 10https://gerrit.wikimedia.org/r/619289 (https://phabricator.wikimedia.org/T251515) (owner: 10ZPapierski)
[18:19:34] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-reload
[18:19:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:42] <wikibugs_>	 (03CR) 10Dzahn: "@hashar How long should we wait before releases1001 is deleted?" [puppet] - 10https://gerrit.wikimedia.org/r/621090 (https://phabricator.wikimedia.org/T260742) (owner: 10Dzahn)
[18:34:23] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:34:34] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 57 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:34:36] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:34:38] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[18:40:16] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 54, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:40:28] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:40:32] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:43:38] <wikibugs_>	 (03PS3) 10CDanis: package_builder: add support for 'sloppy' backports [puppet] - 10https://gerrit.wikimedia.org/r/622190
[18:45:07] <wikibugs_>	 (03CR) 10CDanis: "https://puppet-compiler.wmflabs.org/compiler1001/24655/deneb.codfw.wmnet/index.html" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622190 (owner: 10CDanis)
[18:46:00] <cdanis>	 that is an odd set of alerts to be coincident 
[18:50:08] <wikibugs>	 (03PS4) 10CDanis: package_builder: add support for 'sloppy' backports [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193)
[18:50:40] <wikibugs_>	 (03PS6) 10Dzahn: prometheus: hiera() -> lookup(), add data type for prometheus_nodes [puppet] - 10https://gerrit.wikimedia.org/r/621759
[18:53:29] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/621372 (owner: 10Dzahn)
[18:56:24] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 53 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:00:04] <jouncebot>	 marxarelli and longma: (Dis)respected human, time to deploy Mediawiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T1900). Please do the needful.
[19:01:05] <wikibugs>	 (03CR) 10Muehlenhoff: "The patch is technically correct, but stretch-backports will be removed from the Debian mirrors  soon (along with the sloppy- counterpart)" [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193) (owner: 10CDanis)
[19:02:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:02:22] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:02:39] <moritzm>	 !log installing Java security updates on elastic* hosts
[19:02:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:09] <moritzm>	 !log installing Java security updates on cloudelastic* hosts
[19:05:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:12] <marxarelli>	 o/ deploying to group0 shortly
[19:05:43] <wikibugs>	 10Operations, 10Discovery-Search, 10Datacenter-Switchover-2018: Warn when CirrusSearch is not configured to use local DC for an extended time - https://phabricator.wikimedia.org/T204135 (10Gehel) p:05High→03Low
[19:05:55] <wikibugs_>	 (03CR) 10CDanis: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193) (owner: 10CDanis)
[19:07:02] <wikibugs>	 (03PS1) 10Dduvall: group0 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622414
[19:07:04] <wikibugs_>	 (03CR) 10Dduvall: [C: 03+2] group0 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622414 (owner: 10Dduvall)
[19:07:50] <wikibugs_>	 (03Merged) 10jenkins-bot: group0 wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622414 (owner: 10Dduvall)
[19:08:07] <wikibugs>	 (03CR) 10Muehlenhoff: "I have built a stretch-wikimedia bpfcc backport on deneb, only needs to be imported to apt.wikimedia.org along with the wrapper discussed " [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193) (owner: 10CDanis)
[19:08:18] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:09:42] <wikibugs>	 (03CR) 10CDanis: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193) (owner: 10CDanis)
[19:09:43] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.6
[19:09:44] <wikibugs_>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work), 10Patch-For-Review: (Need by: 2020-04-02) rack/setup/install relforge100[34] - https://phabricator.wikimedia.org/T241791 (10Gehel)
[19:09:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:14] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:15:20] <marxarelli>	 !log 1.36.0-wmf.6 promoted to group0 (T257974). no new errors
[19:15:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:24] <stashbot>	 T257974: 1.36.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T257974
[19:16:26] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:17:31] <hasharDinner>	 bah
[19:18:14] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:18:59] <wikibugs>	 10Operations, 10video2commons: video-redis-buster.video.eqiad.wmflabs:6379. Connection refused. - https://phabricator.wikimedia.org/T261245 (10Jidanni)
[19:20:24] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:27:39] <wikibugs>	 (03Abandoned) 10CDanis: package_builder: add support for 'sloppy' backports [puppet] - 10https://gerrit.wikimedia.org/r/622190 (https://phabricator.wikimedia.org/T261193) (owner: 10CDanis)
[19:30:10] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:39:18] <logmsgbot>	 !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import
[19:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:12] <logmsgbot>	 !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import (duration: 00m 54s)
[19:40:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:04] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:52:02] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[19:58:09] <wikibugs_>	 (03PS2) 10Ahmon Dancy: Updated some cross references in comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621589
[20:05:56] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:15:52] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:19:48] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: (Need By: TBD) rack/setup/install kubernetes1017.eqiad.wmnet - https://phabricator.wikimedia.org/T258747 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson
[20:19:54] <wikibugs_>	 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: (Need By: TBD) rack/setup/install kubernetes1017.eqiad.wmnet - https://phabricator.wikimedia.org/T258747 (10Jclark-ctr) Racked and cabled host   kubernetes1017    A5. U31. Port 31
[20:20:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-09-15) rack/setup/install db1150 (see note on hostname) - https://phabricator.wikimedia.org/T260817 (10Jclark-ctr)
[20:21:50] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:22:05] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-09-15) rack/setup/install db1150 (see note on hostname) - https://phabricator.wikimedia.org/T260817 (10Jclark-ctr)
[20:53:38] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 53 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:55:30] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 55 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:55:41] <wikibugs_>	 (03PS1) 10Andrew Bogott: cloudvirts: add ceph config to non-ceph-enabled cloudvirts [puppet] - 10https://gerrit.wikimedia.org/r/622427 (https://phabricator.wikimedia.org/T261252)
[20:59:12] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:05:10] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:05:35] <cdanis>	 the IPv6 internet seems kind of generally unhappy the last several hours / last day :/
[21:10:16] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:10:34] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:13:50] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:16:28] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:18:06] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 54, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:19:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:25:14] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:41:26] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[21:46:35] <mutante>	 !log importing xhgui 0.12.0-2-wmf1 to buster-wikimedia APT repo (T260397)
[21:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:39] <stashbot>	 T260397: XHGui is returning all results, with wrong sort order - https://phabricator.wikimedia.org/T260397
[21:49:00] <mutante>	 dpifke: on xhgui1001 i can now see that it _would_ install 0.12.0-2-wmf1 when i simulate the install with -s. would you like me to install it for real or do it yourself?
[21:49:49] <dpifke>	 Go for it.
[21:50:28] <mutante>	 !log xhgui1001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ...
[21:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:31] <mutante>	 !log xhgui1001/xhgui2001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) (T260397)
[21:51:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:45] <mutante>	 done on both
[21:52:05] <dpifke>	 Looks good, thanks! 
[21:52:12] <mutante>	 great. yw
[21:53:53] <wikibugs>	 (PS2) Andrew Bogott: cloudvirts: add ceph config to non-ceph-enabled cloudvirts [puppet] - https://gerrit.wikimedia.org/r/622427 (https://phabricator.wikimedia.org/T261252)
[21:58:15] <wikibugs>	 (PS3) Andrew Bogott: cloudvirts: add ceph config to non-ceph-enabled cloudvirts [puppet] - https://gerrit.wikimedia.org/r/622427 (https://phabricator.wikimedia.org/T261252)
[21:59:06] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:59:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:01:08] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:05:20] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:07:37] <wikibugs>	 (PS4) Andrew Bogott: cloudvirts: add ceph config to non-ceph-enabled cloudvirts [puppet] - https://gerrit.wikimedia.org/r/622427 (https://phabricator.wikimedia.org/T261252)
[22:11:00] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:14:37] <wikibugs_>	 (CR) Andrew Bogott: [C: +2] cloudvirts: add ceph config to non-ceph-enabled cloudvirts [puppet] - https://gerrit.wikimedia.org/r/622427 (https://phabricator.wikimedia.org/T261252) (owner: Andrew Bogott)
[22:16:07] <wikibugs>	 (PS3) BryanDavis: dynamicproxy: serve default /robots.txt and /favicon.ico for Toolforge [puppet] - https://gerrit.wikimedia.org/r/622237 (https://phabricator.wikimedia.org/T251628)
[22:16:09] <wikibugs_>	 (PS3) BryanDavis: dynamicproxy: allow service workers in Toolforge [puppet] - https://gerrit.wikimedia.org/r/622238 (https://phabricator.wikimedia.org/T158216)
[22:16:11] <wikibugs>	 (PS1) BryanDavis: dynamicproxy: Drop temporary file cleanup blocks [puppet] - https://gerrit.wikimedia.org/r/622434
[22:16:13] <wikibugs_>	 (PS1) BryanDavis: dynamicproxy: update Content-Security-Policy-Report-Only header [puppet] - https://gerrit.wikimedia.org/r/622435
[22:16:15] <wikibugs>	 (PS1) BryanDavis: dynamicproxy: Remove X-Wikimedia-Debug error page overrides [puppet] - https://gerrit.wikimedia.org/r/622436
[22:16:17] <wikibugs_>	 (PS1) BryanDavis: dynamicproxy: Update proxy_redirect to use $host to limit scheme rewrites [puppet] - https://gerrit.wikimedia.org/r/622437
[22:20:16] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[22:22:58] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:29:01] <wikibugs_>	 (PS1) Andrew Bogott: wmcs admin scripts: add wmcs-ceph-migrate [puppet] - https://gerrit.wikimedia.org/r/622440 (https://phabricator.wikimedia.org/T261252)
[22:29:40] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs admin scripts: add wmcs-ceph-migrate [puppet] - https://gerrit.wikimedia.org/r/622440 (https://phabricator.wikimedia.org/T261252) (owner: Andrew Bogott)
[22:30:28] <wikibugs>	 Operations, video2commons: video-redis-buster.video.eqiad.wmflabs:6379. Connection refused. - https://phabricator.wikimedia.org/T261245 (Aklapper) Open→Invalid In my understanding this needs to be reported on Github and not here; see https://phabricator.wikimedia.org/project/profile/2141/
[22:32:52] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:38:50] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:55:00] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200825T2300).
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:00:36] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 51 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:06:58] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:20:56] <icinga-wm>	 PROBLEM - Time elapsed since the last kafka event processed by purged on cp4030 is CRITICAL: cluster=cache_text instance=cp4030 job=purged site=ulsfo topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=ulsfo+prometheus/ops&var-instance=cp4030
[23:22:56] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 557 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:22:56] <icinga-wm>	 RECOVERY - Time elapsed since the last kafka event processed by purged on cp4030 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=ulsfo+prometheus/ops&var-instance=cp4030
[23:34:28] <wikibugs>	 (PS1) Bstorm: wikireplicas: create multiinstance roles and profiles [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843)
[23:35:27] <wikibugs_>	 (CR) jerkins-bot: [V: -1] wikireplicas: create multiinstance roles and profiles [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[23:36:24] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 50 probes of 555 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:43:00] <wikibugs_>	 (PS2) Bstorm: wikireplicas: create multiinstance roles and profiles [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843)
[23:43:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] wikireplicas: create multiinstance roles and profiles [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[23:46:32] <wikibugs_>	 (CR) Bstorm: "This is obviously adapted from dbstore_multiinstance with some updates to be more keyed off hiera and create all the stuff needed for labs" [puppet] - https://gerrit.wikimedia.org/r/622444 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[23:53:42] <wikibugs>	 (CR) Bstorm: [C: +2] dynamicproxy: Drop temporary file cleanup blocks [puppet] - https://gerrit.wikimedia.org/r/622434 (owner: BryanDavis)
[23:56:06] <icinga-wm>	 PROBLEM - Time elapsed since the last kafka event processed by purged on cp5009 is CRITICAL: cluster=cache_text instance=cp5009 job=purged site=eqsin topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqsin+prometheus/ops&var-instance=cp5009
[23:56:10] <icinga-wm>	 PROBLEM - Time elapsed since the last kafka event processed by purged on cp4028 is CRITICAL: cluster=cache_text instance=cp4028 job=purged site=ulsfo topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=ulsfo+prometheus/ops&var-instance=cp4028
[23:57:42] <icinga-wm>	 PROBLEM - Time elapsed since the last kafka event processed by purged on cp3058 is CRITICAL: cluster=cache_text instance=cp3058 job=purged site=esams topic=codfw.resource-purge https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058
[23:58:06] <icinga-wm>	 RECOVERY - Time elapsed since the last kafka event processed by purged on cp5009 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqsin+prometheus/ops&var-instance=cp5009
[23:58:10] <icinga-wm>	 RECOVERY - Time elapsed since the last kafka event processed by purged on cp4028 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=ulsfo+prometheus/ops&var-instance=cp4028
[23:59:40] <icinga-wm>	 RECOVERY - Time elapsed since the last kafka event processed by purged on cp3058 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3058