[00:00:31] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[00:00:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:37] <wikibugs>	 (CR) Bstorm: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[00:16:02] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[00:16:05] <wikibugs>	 Operations: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `testvm4001.ulsfo.wmnet` - testvm4001.ulsfo.wmnet (**WARN**)   - **Failed do...
[00:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:05] <wikibugs>	 (PS4) Dzahn: toolforge/grid: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/631315
[00:18:19] <wikibugs>	 (CR) Dzahn: "ah yes, rebase needed because toolforge/services/basic.pp was deleted earlier today. should be fixed now" [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[00:20:42] <wikibugs>	 (PS12) Dzahn: labstore: add data types and some other style fixes [puppet] - https://gerrit.wikimedia.org/r/622666
[00:20:45] <wikibugs>	 (CR) Dzahn: labstore: add data types and some other style fixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[00:22:39] <wikibugs>	 (CR) Bstorm: "It seems...upset https://puppet-compiler.wmflabs.org/compiler1003/25802/" [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[00:28:45] <wikibugs>	 (PS13) Dzahn: labstore: add data types and some other style fixes [puppet] - https://gerrit.wikimedia.org/r/622666
[00:29:30] <wikibugs>	 (CR) Dzahn: "also have to use " Hash[String, Hash[String, Variant[Integer,String]]] $drbd_resource_config " now because of the port as actual number" [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
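Editor's note: the Puppet type quoted in the review comment above, Hash[String, Hash[String, Variant[Integer,String]]], describes a two-level map whose inner values may be either integers or strings, which is what allows the port to be a real number. A minimal Python sketch of the same shape check follows. The function name and the example keys (`test`, `port`, `device`) are hypothetical and are not taken from the labstore patch.

```python
def matches_drbd_config_type(value):
    """Return True if value is shaped like
    Hash[String, Hash[String, Variant[Integer, String]]]."""
    if not isinstance(value, dict):
        return False
    for name, settings in value.items():
        # Outer keys must be strings and outer values must be hashes.
        if not isinstance(name, str) or not isinstance(settings, dict):
            return False
        for key, setting in settings.items():
            if not isinstance(key, str):
                return False
            # bool is a subclass of int in Python, but a Puppet Boolean
            # is not an Integer, so reject it explicitly.
            if isinstance(setting, bool) or not isinstance(setting, (int, str)):
                return False
    return True


# Hypothetical data: the port is an actual number, the device a string.
example = {"test": {"port": 7790, "device": "/dev/drbd1"}}
print(matches_drbd_config_type(example))
```

With a plain Hash[String, Hash[String, String]] the numeric port would be rejected; the Variant is what admits both forms.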
[00:31:48] <wikibugs>	 (CR) Dzahn: [C: -1] "arr.. still not there :( https://puppet-compiler.wmflabs.org/compiler1003/25804/labstore1004.eqiad.wmnet/change.labstore1004.eqiad.wmnet.e" [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[00:35:15] <wikibugs>	 (PS5) Dzahn: toolforge/grid: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/631315
[00:36:32] <wikibugs>	 (CR) Dzahn: "> Patch Set 4:" [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[00:39:38] <wikibugs>	 (PS6) Dzahn: toolforge/grid: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/631315
[00:42:28] <wikibugs>	 (PS7) Dzahn: toolforge/grid: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/631315
[00:45:27] <wikibugs>	 (CR) Dzahn: [V: +1] "after fixing multiple issues, finally looking good: https://puppet-compiler.wmflabs.org/compiler1002/25807/" [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[02:38:10] <wikibugs>	 (PS1) Andrew Bogott: wmcs-backup-instances: add missing argument [puppet] - https://gerrit.wikimedia.org/r/633049 (https://phabricator.wikimedia.org/T260692)
[02:39:20] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-backup-instances: add missing argument [puppet] - https://gerrit.wikimedia.org/r/633049 (https://phabricator.wikimedia.org/T260692) (owner: Andrew Bogott)
[05:18:09] <wikibugs>	 (PS1) Marostegui: control-mariadb*: Bump version [software] - https://gerrit.wikimedia.org/r/633053
[05:18:48] <wikibugs>	 (CR) Marostegui: [C: +2] control-mariadb*: Bump version [software] - https://gerrit.wikimedia.org/r/633053 (owner: Marostegui)
[06:07:20] <icinga-wm>	 PROBLEM - Host lvs3005 is DOWN: PING CRITICAL - Packet loss = 100%
[06:07:36] <icinga-wm>	 PROBLEM - Host ncredir-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[06:08:00] <marostegui>	 checking
[06:08:02] <icinga-wm>	 RECOVERY - Host ncredir-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 92.63 ms
[06:08:06] <icinga-wm>	 RECOVERY - Host lvs3005 is UP: PING OK - Packet loss = 0%, RTA = 83.37 ms
[06:08:31] <marostegui>	 the host didn't reboot
[06:08:42] <vgutierrez>	 Uh
[06:08:52] <marostegui>	 I guess network glitch?
[06:09:15] <vgutierrez>	 kern.log reports changes on network link status?
[06:09:30] <marostegui>	 nope
[06:14:27] <XioNoX>	 don't see anything wrong on the network either
[06:49:20] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:50:29] <wikibugs>	 (PS1) Marostegui: es*: Unify new es hosts config [puppet] - https://gerrit.wikimedia.org/r/633133 (https://phabricator.wikimedia.org/T261717)
[06:56:44] <wikibugs>	 (CR) Marostegui: [C: +2] es*: Unify new es hosts config [puppet] - https://gerrit.wikimedia.org/r/633133 (https://phabricator.wikimedia.org/T261717) (owner: Marostegui)
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201009T0700)
[07:01:12] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 130, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:01:12] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:01:38] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS6461/IPv4: Idle - Zayo, AS6461/IPv6: Idle - Zayo https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:02:20] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 132, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:04:12] <icinga-wm>	 PROBLEM - Check systemd state on elastic1063 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:07:47] <dcausse>	 hm: elasticsearch_6@production-search-eqiad.service: Main process exited, code=killed, status=11/SEGV ^ :/
[07:09:43] <elukey>	 ouch
[07:09:46] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 132, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:09:46] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:10:08] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 53, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:10:52] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:11:55] <wikibugs>	 Operations, Traffic, Wikipedia-iOS-App-Backlog, iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (Joe) I think some form of ratelimiting for that should be present in restbas...
[07:12:51] <dcausse>	 elukey: seeing "Killing elasticsearch[e:3361 due to hardware memory corruption fault at 7ff100d68000" should we depool this machine?
[07:13:27] <elukey>	 checking
[07:13:30] <wikibugs>	 (PS1) Elukey: admin: add user lexnasser back to active state [puppet] - https://gerrit.wikimedia.org/r/633135 (https://phabricator.wikimedia.org/T265071)
[07:14:56] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:16:06] <icinga-wm>	 RECOVERY - Check systemd state on elastic1063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:16:13] <elukey>	 dcausse: I checked the DELL's DRAC and I don't see any memory errors reported (like DIMM bank on fire etc.. :D)
[07:16:52] <elukey>	 so we have two options
[07:17:09] <elukey>	 1) we leave the host running, and if it fails again we completely depool it (relying on {1}[Hardware Error]: It has been corrected by h/w and requires no further action)
[07:17:21] <elukey>	 2) we depool it directly asking for a memory test from dcops
[07:17:39] <elukey>	 likely 2 is better anyway, but for the time being we could see if the host keeps running
[07:17:55] <dcausse>	 elukey: ok I'm filing a task
[07:18:00] <elukey>	 ack
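Editor's note: the kernel message dcausse quotes above names the killed process, its PID, and the faulting address. A minimal Python sketch for pulling those fields out of such a line follows, assuming the message keeps the wording quoted in the channel. The regular expression and the function name are illustrative, not an existing Wikimedia tool.

```python
import re

# Matches the wording quoted in the channel:
#   Killing <comm>:<pid> due to hardware memory corruption fault at <hex addr>
MEMORY_FAILURE = re.compile(
    r"Killing (?P<comm>.+?):(?P<pid>\d+) due to hardware memory corruption "
    r"fault at (?P<addr>[0-9a-fA-F]+)"
)


def parse_memory_failure(line):
    """Return comm, pid and address from a memory-failure line, or None."""
    match = MEMORY_FAILURE.search(line)
    if match is None:
        return None
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        # The address is printed in hexadecimal without a 0x prefix.
        "addr": int(match.group("addr"), 16),
    }


quoted = ("Killing elasticsearch[e:3361 due to hardware memory corruption "
          "fault at 7ff100d68000")
print(parse_memory_failure(quoted))
```

Counting such matches per host over time is one way to decide between the two options elukey lists: a single event may justify waiting, while repeated events point toward depooling and a memory test.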
[07:21:06] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/633135 (https://phabricator.wikimedia.org/T265071) (owner: Elukey)
[07:21:44] <wikibugs>	 (CR) Elukey: "If restoring the ssh key is ok I'll also take care of the other perms (ldap/kerberos). Lex already started working so he'd need access to " [puppet] - https://gerrit.wikimedia.org/r/633135 (https://phabricator.wikimedia.org/T265071) (owner: Elukey)
[07:24:06] <wikibugs>	 Operations, Discovery-Search: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113 (dcausse)
[07:25:33] <wikibugs>	 (CR) Elukey: [C: +2] admin: add user lexnasser back to active state [puppet] - https://gerrit.wikimedia.org/r/633135 (https://phabricator.wikimedia.org/T265071) (owner: Elukey)
[07:26:30] <wikibugs>	 Operations, ops-eqiad, Discovery-Search: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113 (elukey)
[07:26:35] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to researchers and analytics-privatedata-users for Leila Zia - https://phabricator.wikimedia.org/T264472 (MoritzMuehlenhoff) The "leila" account also needs to be removed from the wmf LDAP group.
[07:31:54] <wikibugs>	 Operations, netops, Patch-For-Review, Security, User-jbond: Review default ferm INPUT policy - https://phabricator.wikimedia.org/T264888 (MoritzMuehlenhoff) >>! In T264888#6529767, @jbond wrote: >however i, like @BBlack, prefer reject to drop if possible.  As such it would be nice to be good...
[07:32:56] <wikibugs>	 Operations, Analytics, SRE-Access-Requests, Patch-For-Review: Renable SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T265071 (elukey) @lexnasser you should now be able to ssh to the stat100x hosts (notebooks are not there anymore, deprecated, we copied your things...
[07:34:01] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.downtime
[07:34:01] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:34:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:34] <moritzm>	 !log installing xen security updates for buster (libs only)
[07:36:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:38] <wikibugs>	 Operations, DBA, Patch-For-Review, User-Kormat, User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (Marostegui)
[07:40:49] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to researchers and analytics-privatedata-users for Leila Zia - https://phabricator.wikimedia.org/T264472 (Kormat) >>! In T264472#6531651, @MoritzMuehlenhoff wrote: > The "leila" account also needs to be removed from the wmf LDAP group.  Good catch, done.
[07:41:02] <wikibugs>	 (CR) Elukey: [C: +1] "Looks good, before merging it is better to try the hdfs command to verify that it works (it should but it changed a little bit, so better " [puppet] - https://gerrit.wikimedia.org/r/631896 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[07:45:49] <wikibugs>	 (PS1) Muehlenhoff: Update cloudvirt Cumin alias [puppet] - https://gerrit.wikimedia.org/r/633137
[07:46:59] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Update cloudvirt Cumin alias [puppet] - https://gerrit.wikimedia.org/r/633137 (owner: Muehlenhoff)
[07:56:51] <wikibugs>	 (PS1) Elukey: Decommission analytics1044 from Hadoop [puppet] - https://gerrit.wikimedia.org/r/633140 (https://phabricator.wikimedia.org/T255140)
[07:58:20] <wikibugs>	 (CR) Elukey: [C: +2] Decommission analytics1044 from Hadoop [puppet] - https://gerrit.wikimedia.org/r/633140 (https://phabricator.wikimedia.org/T255140) (owner: Elukey)
[08:11:05] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.downtime
[08:11:06] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[08:11:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:00] <wikibugs>	 (PS1) Marostegui: dbstore1005: Decrease buffer pool size [puppet] - https://gerrit.wikimedia.org/r/633142
[08:17:53] <wikibugs>	 (CR) Elukey: [C: +1] dbstore1005: Decrease buffer pool size [puppet] - https://gerrit.wikimedia.org/r/633142 (owner: Marostegui)
[08:19:17] <wikibugs>	 (PS4) ArielGlenn: new util to display info about revisions for one or more pages from XML input [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/630267 (https://phabricator.wikimedia.org/T263319)
[08:20:01] <wikibugs>	 (CR) Marostegui: [C: +2] dbstore1005: Decrease buffer pool size [puppet] - https://gerrit.wikimedia.org/r/633142 (owner: Marostegui)
[08:22:32] <marostegui>	 !log Restart dbstore1005 mysql to pick up new buffer pool sizes
[08:22:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:09] <wikibugs>	 (PS1) Muehlenhoff: Remove access for rush [puppet] - https://gerrit.wikimedia.org/r/633144
[08:30:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] Remove access for rush [puppet] - https://gerrit.wikimedia.org/r/633144 (owner: Muehlenhoff)
[08:33:36] <wikibugs>	 Operations, Traffic, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (Gilles) I would like to know what the Traffic team is doing or planning on doing about this investigation at the moment. Now that I've narrow...
[08:34:04] <wikibugs>	 (PS2) Muehlenhoff: Remove access for rush [puppet] - https://gerrit.wikimedia.org/r/633144
[08:43:01] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: cloudgw: refresh CIDR for vlan 2107 - cloud-gw-transport-codfw [puppet] - https://gerrit.wikimedia.org/r/633147 (https://phabricator.wikimedia.org/T263622)
[08:43:01] <wikibugs>	 Operations, Wikidata, serviceops: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (LSobanski) Removing #DBA as there's nothing specific for us to do right now, do add us back if anything comes up.
[08:45:36] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: refresh CIDR for vlan 2107 - cloud-gw-transport-codfw [puppet] - https://gerrit.wikimedia.org/r/633147 (https://phabricator.wikimedia.org/T263622) (owner: Arturo Borrero Gonzalez)
[08:46:40] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:49:43] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove access for rush [puppet] - https://gerrit.wikimedia.org/r/633144 (owner: Muehlenhoff)
[08:49:50] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:52:04] <wikibugs>	 (PS1) Elukey: cdh: increase retention for the RFA log appender [puppet] - https://gerrit.wikimedia.org/r/633148
[08:52:33] <wikibugs>	 (CR) Elukey: [C: +2] cdh: increase retention for the RFA log appender [puppet] - https://gerrit.wikimedia.org/r/633148 (owner: Elukey)
[08:57:23] <wikibugs>	 (PS1) Ayounsi: Remove user rush [homer/public] - https://gerrit.wikimedia.org/r/633149
[08:58:48] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/633149 (owner: Ayounsi)
[08:59:53] <wikibugs>	 (CR) Ayounsi: [C: +2] Remove user rush [homer/public] - https://gerrit.wikimedia.org/r/633149 (owner: Ayounsi)
[09:00:20] <wikibugs>	 (Merged) jenkins-bot: Remove user rush [homer/public] - https://gerrit.wikimedia.org/r/633149 (owner: Ayounsi)
[09:02:02] <wikibugs>	 (PS1) Muehlenhoff: Remove Chase from Icinga config [puppet] - https://gerrit.wikimedia.org/r/633150
[09:06:54] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove Chase from Icinga config [puppet] - https://gerrit.wikimedia.org/r/633150 (owner: Muehlenhoff)
[09:06:56] <wikibugs>	 (PS1) Elukey: Allow the hdfs user to run Yarn jobs in Hadoop clusters [puppet] - https://gerrit.wikimedia.org/r/633151
[09:07:03] <wikibugs>	 Operations, Performance-Team, Traffic, Wikimedia-Incident: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494 (Gilles) @bblack when we last discussed the subject of this task in a meeting recently, you mentioned that replacing ats-tls (the "p...
[09:07:18] <XioNoX>	 !log remove user from all network devices
[09:07:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:32] <wikibugs>	 (PS2) JMeybohm: eventgate-analytics: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632712 (https://phabricator.wikimedia.org/T264157)
[09:08:10] <wikibugs>	 (CR) Joal: "LGTM - Thanks elukey :)" [puppet] - https://gerrit.wikimedia.org/r/633151 (owner: Elukey)
[09:08:23] <wikibugs>	 (CR) Elukey: [C: +2] Allow the hdfs user to run Yarn jobs in Hadoop clusters [puppet] - https://gerrit.wikimedia.org/r/633151 (owner: Elukey)
[09:11:09] <wikibugs>	 (PS1) Elukey: Remove incorrect banned user (hdfs) from Hadoop Yarn container's settings [puppet] - https://gerrit.wikimedia.org/r/633154
[09:11:31] <wikibugs>	 (CR) Elukey: [C: +2] Remove incorrect banned user (hdfs) from Hadoop Yarn container's settings [puppet] - https://gerrit.wikimedia.org/r/633154 (owner: Elukey)
[09:16:16] <wikibugs>	 (CR) Jbond: [C: +1] toolforge/grid: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/631315 (owner: Dzahn)
[09:17:53] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: codfw1dev: refresh external connection for neutron [puppet] - https://gerrit.wikimedia.org/r/633155 (https://phabricator.wikimedia.org/T261724)
[09:18:05] <wikibugs>	 (PS10) Gehel: Introduce an interface for progress bars. [software/cumin] - https://gerrit.wikimedia.org/r/631702 (https://phabricator.wikimedia.org/T212783)
[09:18:12] <wikibugs>	 (CR) Gehel: Introduce an interface for progress bars. (3 comments) [software/cumin] - https://gerrit.wikimedia.org/r/631702 (https://phabricator.wikimedia.org/T212783) (owner: Gehel)
[09:18:35] <wikibugs>	 (Abandoned) Gehel: extract reporting from BaseEventHandler [software/cumin] - https://gerrit.wikimedia.org/r/451080 (owner: Gehel)
[09:18:42] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: codfw1dev: refresh external connection for neutron [puppet] - https://gerrit.wikimedia.org/r/633155 (https://phabricator.wikimedia.org/T261724) (owner: Arturo Borrero Gonzalez)
[09:19:41] <wikibugs>	 (CR) jerkins-bot: [V: -1] Introduce an interface for progress bars. [software/cumin] - https://gerrit.wikimedia.org/r/631702 (https://phabricator.wikimedia.org/T212783) (owner: Gehel)
[09:24:52] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1079 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:25:06] <elukey>	 this is me --^
[09:25:24] <wikibugs>	 Operations, LDAP-Access-Requests: Access to the Logstash for John Bolorinos - https://phabricator.wikimedia.org/T264918 (Kormat) It is, yep, thanks.  I just need the staff contact + contract end date now.
[09:25:33] <kormat>	 PROBLEM==elukey. it is known.
[09:25:38] <elukey>	 yeah correct
[09:26:08] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1079 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:26:22] <wikibugs>	 Operations, DBA, Data-Persistence, Blocked-on-schema-change, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[09:27:01] <wikibugs>	 (PS1) Jbond: Firewall: Change the default firewall rule fleet wide [puppet] - https://gerrit.wikimedia.org/r/633156 (https://phabricator.wikimedia.org/T264888)
[09:27:03] <wikibugs>	 (PS1) Jbond: Firewall: Change the default firewall rule cloud environment [puppet] - https://gerrit.wikimedia.org/r/633157 (https://phabricator.wikimedia.org/T264888)
[09:29:59] <wikibugs>	 (CR) JMeybohm: [C: +2] eventgate-analytics: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632712 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:30:09] <wikibugs>	 (PS2) JMeybohm: eventgate-main: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632713 (https://phabricator.wikimedia.org/T264157)
[09:30:58] <wikibugs>	 (PS2) JMeybohm: mathoid: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632714 (https://phabricator.wikimedia.org/T264157)
[09:31:10] <wikibugs>	 (PS2) JMeybohm: mobileapps: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632715 (https://phabricator.wikimedia.org/T264157)
[09:31:14] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:31:17] <wikibugs>	 (PS2) JMeybohm: proton: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632716 (https://phabricator.wikimedia.org/T264157)
[09:31:43] <wikibugs>	 (PS2) JMeybohm: push-notifications: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632717 (https://phabricator.wikimedia.org/T264157)
[09:31:52] <wikibugs>	 (PS2) JMeybohm: termbox: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632718 (https://phabricator.wikimedia.org/T264157)
[09:31:57] <wikibugs>	 (PS2) JMeybohm: wikifeeds: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632719 (https://phabricator.wikimedia.org/T264157)
[09:32:03] <wikibugs>	 (PS2) JMeybohm: zotero: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632720 (https://phabricator.wikimedia.org/T264157)
[09:32:30] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime
[09:32:31] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:32:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:42] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:32:54] <wikibugs>	 (Merged) jenkins-bot: eventgate-analytics: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632712 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:33:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] proton: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632716 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:35:46] <wikibugs>	 (CR) JMeybohm: [C: +2] eventgate-main: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632713 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:35:48] <wikibugs>	 (CR) JMeybohm: [C: +2] mathoid: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632714 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:37:45] <wikibugs>	 (Merged) jenkins-bot: eventgate-main: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632713 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:37:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] mathoid: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - https://gerrit.wikimedia.org/r/632714 (https://phabricator.wikimedia.org/T264157) (owner: JMeybohm)
[09:38:15] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
[09:38:15] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
[09:38:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:33] <wikibugs>	 (CR) Kormat: [C: +1] pontoon: use hiera.output [puppet] - https://gerrit.wikimedia.org/r/632921 (owner: Filippo Giunchedi)
[09:40:49] <wikibugs>	 (CR) Kormat: [C: +1] "Nice :)" [puppet] - https://gerrit.wikimedia.org/r/632918 (owner: Filippo Giunchedi)
[09:43:03] <wikibugs>	 (CR) Kormat: [C: +1] pontoon: read stack from stack.file [puppet] - https://gerrit.wikimedia.org/r/632919 (owner: Filippo Giunchedi)
[09:43:58] <wikibugs>	 (CR) Kormat: [C: +1] pontoon: configure hiera based on the stack found on the filesystem [puppet] - https://gerrit.wikimedia.org/r/632920 (owner: Filippo Giunchedi)
[09:45:58] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: set labs_tld and labs_site globally [puppet] - https://gerrit.wikimedia.org/r/633158
[09:47:31] <elukey>	 !log roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
[09:47:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:09] <wikibugs>	 (CR) Kormat: [C: +1] pontoon: set labs_tld and labs_site globally [puppet] - https://gerrit.wikimedia.org/r/633158 (owner: Filippo Giunchedi)
[09:48:37] <wikibugs>	 (PS1) JMeybohm: admin: jayme dotfiles: Add helmfile aliases [puppet] - https://gerrit.wikimedia.org/r/633159
[09:49:46] <wikibugs>	 (CR) JMeybohm: [C: +2] admin: jayme dotfiles: Add helmfile aliases [puppet] - https://gerrit.wikimedia.org/r/633159 (owner: JMeybohm)
[09:52:53] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: set labs_tld and labs_site globally [puppet] - https://gerrit.wikimedia.org/r/633158 (owner: Filippo Giunchedi)
[09:53:56] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
[09:54:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:47] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[09:55:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:57] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: use hiera.output [puppet] - https://gerrit.wikimedia.org/r/632921 (owner: Filippo Giunchedi)
[09:55:59] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: write the stack name once to the filesystem [puppet] - https://gerrit.wikimedia.org/r/632918 (owner: Filippo Giunchedi)
[09:56:01] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: read stack from stack.file [puppet] - https://gerrit.wikimedia.org/r/632919 (owner: Filippo Giunchedi)
[09:56:03] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: configure hiera based on the stack found on the filesystem [puppet] - https://gerrit.wikimedia.org/r/632920 (owner: Filippo Giunchedi)
[10:09:23] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "The change (as discussed in the phab task) looks good to me overall." [puppet] - https://gerrit.wikimedia.org/r/633157 (https://phabricator.wikimedia.org/T264888) (owner: Jbond)
[10:11:49] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[10:11:49] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
[10:11:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:12] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
[10:16:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:18] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[10:17:18] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
[10:17:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:37] <wikibugs>	 (03PS2) 10Muehlenhoff: Install ldap-replica200[34] as additional LDAP replicas [puppet] - 10https://gerrit.wikimedia.org/r/632648 (https://phabricator.wikimedia.org/T264388)
[10:39:27] <wikibugs>	 (03PS1) 10Elukey: Set up the new Analytics Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/633162 (https://phabricator.wikimedia.org/T255139)
[10:41:10] <logmsgbot>	 !log gehel@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
[10:41:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:16] <gehel>	 dcausse: ^ I'll have a look when back from lunch
[10:47:09] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+1] Use ubuntu 16.04 as buildsystem to be compatible with stretch [debs/envoyproxy] - 10https://gerrit.wikimedia.org/r/632479 (owner: 10JMeybohm)
[10:49:03] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/632714 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[10:49:56] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Use ubuntu 16.04 as buildsystem to be compatible with stretch [debs/envoyproxy] - 10https://gerrit.wikimedia.org/r/632479 (owner: 10JMeybohm)
[10:51:18] <wikibugs>	 (03Merged) 10jenkins-bot: mathoid: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632714 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[10:52:02] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] mobileapps: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632715 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[10:52:19] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
[10:52:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:15] <wikibugs>	 (03Merged) 10jenkins-bot: mobileapps: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632715 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[10:58:51] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Set up the new Analytics Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/633162 (https://phabricator.wikimedia.org/T255139) (owner: 10Elukey)
[11:02:22] <wikibugs>	 (03PS1) 10Elukey: Remove min disk available constraint from Hadoop test workers' settings [puppet] - 10https://gerrit.wikimedia.org/r/633164 (https://phabricator.wikimedia.org/T255139)
[11:04:24] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Remove min disk available constraint from Hadoop test workers' settings [puppet] - 10https://gerrit.wikimedia.org/r/633164 (https://phabricator.wikimedia.org/T255139) (owner: 10Elukey)
[11:08:38] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] map::postgresql_common: make maps-admin chgrp toggle [puppet] - 10https://gerrit.wikimedia.org/r/632935 (https://phabricator.wikimedia.org/T263726) (owner: 10Hnowlan)
[11:13:41] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
[11:13:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:47] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
[11:13:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:14] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[11:16:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:43] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/25808/" [puppet] - 10https://gerrit.wikimedia.org/r/632648 (https://phabricator.wikimedia.org/T264388) (owner: 10Muehlenhoff)
[11:24:07] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:26:07] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:30:21] <wikibugs>	 (03PS5) 10ArielGlenn: new util to display info about revisions for one or more pages from XML input [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/630267 (https://phabricator.wikimedia.org/T263319)
[11:30:23] <wikibugs>	 (03PS1) 10ArielGlenn: bump version to 0.0.10 [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/633166
[11:30:38] <wikibugs>	 (03PS1) 10ArielGlenn: version 0.0.10 [debs/mwbzutils] - 10https://gerrit.wikimedia.org/r/633167
[11:38:06] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[11:38:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:07] <wikibugs>	 (03PS1) 10Elukey: Avoid the analytics keytab for the Analytics Hadoop test master [puppet] - 10https://gerrit.wikimedia.org/r/633168 (https://phabricator.wikimedia.org/T255139)
[11:43:45] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Avoid the analytics keytab for the Analytics Hadoop test master [puppet] - 10https://gerrit.wikimedia.org/r/633168 (https://phabricator.wikimedia.org/T255139) (owner: 10Elukey)
[11:48:02] <wikibugs>	 (03PS1) 10Elukey: Revert "Avoid the analytics keytab for the Analytics Hadoop test master" [puppet] - 10https://gerrit.wikimedia.org/r/633079
[11:49:17] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Revert "Avoid the analytics keytab for the Analytics Hadoop test master" [puppet] - 10https://gerrit.wikimedia.org/r/633079 (owner: 10Elukey)
[11:51:32] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: puppet master command will be removed in puppet 6 - https://phabricator.wikimedia.org/T236373 (10jbond) Also related https://tickets.puppetlabs.com/browse/PE-24280
[12:02:18] <wikibugs>	 (03CR) 10JMeybohm: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/632716 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:02:33] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] push-notifications: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632717 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:06:17] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] proton: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632716 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:08:45] <wikibugs>	 (03Merged) 10jenkins-bot: proton: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632716 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:08:47] <wikibugs>	 (03Merged) 10jenkins-bot: push-notifications: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632717 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:13:40] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
[12:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:23] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[12:15:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:23] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
[12:16:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:24] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
[12:20:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:42] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[12:20:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:52] <wikibugs>	 (03PS11) 10Gehel: Introduce an interface for progress bars. [software/cumin] - 10https://gerrit.wikimedia.org/r/631702 (https://phabricator.wikimedia.org/T212783)
[12:26:39] <wikibugs>	 10Operations, 10serviceops, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (10kostajh) >>! In T252391#6387745, @MMiller_WMF wrote: > @kostajh -- maybe we should do that, but I would like to hear from @ne...
[12:27:21] <wikibugs>	 (03PS1) 10Muehlenhoff: Add an apt proxy config for deb.debian.org [puppet] - 10https://gerrit.wikimedia.org/r/633172 (https://phabricator.wikimedia.org/T262647)
[12:30:53] <wikibugs>	 (03CR) 10Gehel: Introduce an interface for progress bars. (031 comment) [software/cumin] - 10https://gerrit.wikimedia.org/r/631702 (https://phabricator.wikimedia.org/T212783) (owner: 10Gehel)
[12:33:23] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
[12:33:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:36] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: puppet master command will be removed in puppet 6 - https://phabricator.wikimedia.org/T236373 (10jbond) Need to check this further but it may be possible to switch to [[ https://tickets.puppetlabs.com/browse/PUP-9055  | puppet catalogue compile ]]
[12:46:08] <icinga-wm>	 PROBLEM - Thanos compact has disappeared from Prometheus discovery on alert1001 is CRITICAL: 1 ge 1 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview
[12:47:26] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on alert1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[12:47:50] <icinga-wm>	 RECOVERY - Thanos compact has disappeared from Prometheus discovery on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview
[12:49:10] <icinga-wm>	 RECOVERY - Thanos query has high gRPC client errors on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[12:49:14] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] termbox: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632718 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:49:29] <godog>	 investigating what's up with the alerts above
[12:50:19] <godog>	 hah, heavy query on prometheus codfw
[12:51:22] <jayme>	 I'm probably loading a bit of data while checking deployments. But nothing really out of normal I would guess
[12:51:33] <wikibugs>	 (03Merged) 10jenkins-bot: termbox: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632718 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[12:52:47] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[12:52:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:05] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
[12:55:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:14] <godog>	 jayme: ack, thanks! yeah ATM not easy to know what could be causing it
[12:57:42] <godog>	 also generally prometheus has safeguards to try and avoid oom, but they don't always work
[12:59:04] <wikibugs>	 10Operations: wb_terms has been removed - https://phabricator.wikimedia.org/T265137 (10toan)
[13:04:30] <wikibugs>	 10Operations, 10Mail, 10Security: Don't get a mail to confirm my email address (mx2001 is blacklisted by abusix blacklist) - https://phabricator.wikimedia.org/T264504 (10Xqt) I tried to get a confirmation mail again and got it. It worked as expected now. Thanks a lot.
[13:06:09] <wikibugs>	 (03PS1) 10Filippo Giunchedi: pontoon: add ssl symlink unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/633179
[13:06:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] pontoon: add ssl symlink unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/633179 (owner: 10Filippo Giunchedi)
[13:06:51] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Worked required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond) p:05Triage→03Medium
[13:07:17] <wikibugs>	 (03PS2) 10Filippo Giunchedi: pontoon: add ssl symlink unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/633179
[13:07:43] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Worked required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond)
[13:07:46] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Investigate using the rich_data option to support Binary and binary_file for binary data - https://phabricator.wikimedia.org/T236481 (10jbond)
[13:07:48] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: puppet master command will be removed in puppet 6 - https://phabricator.wikimedia.org/T236373 (10jbond)
[13:07:50] <wikibugs>	 10Operations, 10Puppet, 10User-jbond: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981 (10jbond)
[13:09:12] <wikibugs>	 (03CR) 10Kormat: [C: 03+1] pontoon: add ssl symlink unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/633179 (owner: 10Filippo Giunchedi)
[13:09:21] <wikibugs>	 10Operations, 10Puppet, 10User-jbond: Update puppet infrastructure latest 5.5 version - https://phabricator.wikimedia.org/T265139 (10jbond)
[13:10:51] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: add ssl symlink unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/633179 (owner: 10Filippo Giunchedi)
[13:12:14] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Worked required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond)
[13:12:26] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
[13:12:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:18] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] wikifeeds: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632719 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[13:19:25] <wikibugs>	 (03Merged) 10jenkins-bot: wikifeeds: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632719 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[13:20:39] <wikibugs>	 (03PS1) 10Gehel: wdqs: don't fail if journal does not exist before data reload [cookbooks] - 10https://gerrit.wikimedia.org/r/633182 (https://phabricator.wikimedia.org/T255399)
[13:20:51] <wikibugs>	 (03PS2) 10Gehel: wdqs: don't fail if journal does not exist before data reload [cookbooks] - 10https://gerrit.wikimedia.org/r/633182 (https://phabricator.wikimedia.org/T255399)
[13:23:35] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
[13:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:44] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] wdqs: don't fail if journal does not exist before data reload [cookbooks] - 10https://gerrit.wikimedia.org/r/633182 (https://phabricator.wikimedia.org/T255399) (owner: 10Gehel)
[13:24:07] <wikibugs>	 (03PS1) 10Andrew Bogott: nova-fullstack monitoring: turn on debug logging [puppet] - 10https://gerrit.wikimedia.org/r/633183 (https://phabricator.wikimedia.org/T265140)
[13:25:03] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova-fullstack monitoring: turn on debug logging [puppet] - 10https://gerrit.wikimedia.org/r/633183 (https://phabricator.wikimedia.org/T265140) (owner: 10Andrew Bogott)
[13:25:44] <wikibugs>	 10Operations, 10Mail, 10Security: Don't get a mail to confirm my email address (mx2001 is blacklisted by abusix blacklist) - https://phabricator.wikimedia.org/T264504 (10herron) >>! In T264504#6532332, @Xqt wrote: > I tried to get a confirmation mail again and got it. It worked as expected now. Thanks a lot....
[13:27:33] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: (Need By: 2020-09-30) rack/setup/install frdb1004.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T260379 (10Cmjohnson) 05Open→03Invalid This is an old ticket, @jgreen just made a new task for the same server. Killing this off
[13:29:15] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[13:29:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:44] <wikibugs>	 (03PS1) 10Kormat: puppetmaster: Make self-master-post-receive more general. [puppet] - 10https://gerrit.wikimedia.org/r/633184
[13:30:43] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] wdqs: don't fail if journal does not exist before data reload [cookbooks] - 10https://gerrit.wikimedia.org/r/633182 (https://phabricator.wikimedia.org/T255399) (owner: 10Gehel)
[13:31:44] <logmsgbot>	 !log gehel@cumin1001 START - Cookbook sre.wdqs.data-reload
[13:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:50] <gehel>	 dcausse: ^ restarted, we'll see on Monday how it went!
[13:32:55] <dcausse>	 gehel: thanks!
[13:33:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] puppetmaster: Make self-master-post-receive more general. [puppet] - 10https://gerrit.wikimedia.org/r/633184 (owner: 10Kormat)
[13:34:20] <wikibugs>	 10Operations, 10serviceops: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (10MoritzMuehlenhoff) After a lot of fist shaking and head scratching I think I've found a workable solution, to the problem that PHP build depends on ICU 63 (for intl) and indirectly...
[13:34:40] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] puppetmaster: Make self-master-post-receive more general. [puppet] - 10https://gerrit.wikimedia.org/r/633184 (owner: 10Kormat)
[13:35:11] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Worked required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond)
[13:35:37] <wikibugs>	 10Operations, 10Mail: exim should log the reason for defer with disconnect after HELO/EHLO - https://phabricator.wikimedia.org/T265142 (10herron) p:05Triage→03Medium
[13:36:32] <wikibugs>	 10Operations, 10Mail: exim should log the reason for defer with disconnect after HELO/EHLO - https://phabricator.wikimedia.org/T265142 (10herron) This would have been helpful in troubleshooting T264504
[13:37:47] <wikibugs>	 10Operations, 10Mail, 10Security: Don't get a mail to confirm my email address (mx2001 is blacklisted by abusix blacklist) - https://phabricator.wikimedia.org/T264504 (10herron) 05Open→03Resolved a:03herron I think we're in good shape here now.  Related exim logging improvements can be coordinated via...
[13:40:07] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: in puppet 6 some core types have been moved to external modules.  check and confirm our exposure - https://phabricator.wikimedia.org/T265143 (10jbond)
[13:43:38] <wikibugs>	 (03PS1) 10Elukey: Fix typos for Hadoop test cluster's hostnames [puppet] - 10https://gerrit.wikimedia.org/r/633187 (https://phabricator.wikimedia.org/T255139)
[13:44:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Fix typos for Hadoop test cluster's hostnames [puppet] - 10https://gerrit.wikimedia.org/r/633187 (https://phabricator.wikimedia.org/T255139) (owner: 10Elukey)
[13:45:52] <jayme>	 !log helm rollback push-notification in eqiad to revision 8
[13:45:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:24] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[13:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:55] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10Reedy)
[14:02:07] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: OKR: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond) > will move to puppet6 untill at least bullseye  Its worth noting that bullseye currently has puppet 5.5.19 (with sid on 5.5.21) its not clear if bullsey...
[14:08:04] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] zotero: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632720 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[14:08:07] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10mforns) Hi @ayounsi, can you help me? I have some more questions:  * What is the field that we want to extract the AS name for? I see as_src, as_d...
[14:12:35] <wikibugs>	 (03Merged) 10jenkins-bot: zotero: Update envoy to 1.15.1-2 See: Id8dfd7c5002cfd2c71b7f0aac4f21902035cc150 [deployment-charts] - 10https://gerrit.wikimedia.org/r/632720 (https://phabricator.wikimedia.org/T264157) (owner: 10JMeybohm)
[14:17:52] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:18:19] <logmsgbot>	 !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
[14:18:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:49] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:22:49] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:24:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:28:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/633172 (https://phabricator.wikimedia.org/T262647) (owner: 10Muehlenhoff)
[14:32:34] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
[14:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:06] <wikibugs_>	 10Operations, 10GrowthExperiments-NewcomerTasks, 10Product-Infrastructure-Team-Backlog, 10serviceops: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (10Joe) Adding some notes after yesterday's meeting:  - the current script is using `sqlitedict` right now, and t...
[14:36:19] <wikibugs_>	 (03CR) 10Muehlenhoff: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/633172 (https://phabricator.wikimedia.org/T262647) (owner: 10Muehlenhoff)
[14:37:25] <wikibugs_>	 (03PS1) 10Klausman: amd_rocm: Ensure linux-headers-amd64 is installed [puppet] - 10https://gerrit.wikimedia.org/r/633194
[14:38:05] <wikibugs_>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10ayounsi) > * What is the field that we want to extract the AS name for? I see as_src, as_dst, peer_as_src, peer_as_dst? Ideally all of them, but a...
[14:41:05] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
[14:41:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:57] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/633172 (https://phabricator.wikimedia.org/T262647) (owner: 10Muehlenhoff)
[15:13:16] <wikibugs>	 10Operations, 10Traffic, 10Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (10ema) With [[ https://github.com/rakyll/hey | hey ]] on cp3052 (Varnish 5.1.3-1wm15) and cp3054 (6.0.6-1wm1) I obtained the following two late...
[15:15:59] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={federate-ops,prometheus} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:16:39] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:17:11] <icinga-wm>	 PROBLEM - ping-offload grafana alert on alert1001 is CRITICAL: CRITICAL: Ping offload ( https://grafana.wikimedia.org/d/000000513/ping-offload ) is alerting: target IP missing on hosts loopback. https://wikitech.wikimedia.org/wiki/Ping_offload%23InAddrErrors_alert https://grafana.wikimedia.org/d/000000513/
[15:21:44] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:22:42] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[15:24:20] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[15:25:38] <wikibugs>	 (03PS1) 10JMeybohm: service_proxy: add node.js keepalive to push-notifications [puppet] - 10https://gerrit.wikimedia.org/r/633199
[15:25:42] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[15:32:50] <herron>	 hmm prometheus1004 prometheus@ops[9562]: fatal error: runtime: out of memory
[15:33:14] <godog>	 :( sad_trombone.wav
[15:33:16] <herron>	 restarted by systemd
[15:34:43] <godog>	 yeah I'm guessing a heavy query
[15:35:40] <icinga-wm>	 PROBLEM - Stale file for node-exporter textfile in eqiad on alert1001 is CRITICAL: cluster=misc file=smartmon.prom instance=relforge1004 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[15:37:14] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:39:58] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 243, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:40:02] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[15:40:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:24] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:42:50] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[15:42:53] <wikibugs>	 10Operations: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `testvm3001.esams.wmnet` - testvm3001.esams.wmnet (**WARN**)   - **Failed do...
[15:42:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:56] <volans|off>	 mutante: I'm in a meeting but I can look in a bit at the failure
[15:46:10] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[15:47:34] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[15:48:08] <mutante>	 volans|off: thank you, alright. this was a VM that existed in ganeti but was not in puppetdb. maybe that is causing something
[15:48:33] <mutante>	 did not have that issue last night removing one like it.. but the difference was that that one was in site.pp for a short time
[15:48:34] <icinga-wm>	 RECOVERY - ping-offload grafana alert on alert1001 is OK: OK: Ping offload ( https://grafana.wikimedia.org/d/000000513/ping-offload ) is not alerting. https://wikitech.wikimedia.org/wiki/Ping_offload%23InAddrErrors_alert https://grafana.wikimedia.org/d/000000513/
[15:48:55] <mutante>	 i'll try now with the 3rd and last one
[15:49:07] <volans|off>	 mutante: nah that's not the issue, as you can see in the phab update
[15:49:15] <volans|off>	 the dns step failed, that's why I want to look at it
[15:49:23] <volans|off>	 the icinga downtime is just a warning and not a factor
[15:50:19] <mutante>	 ack, actual failure to run dns cookbook
[15:54:54] <volans|off>	 mutante: I think it was a failed netbox API call, I want to add retry logic that depends on updating pynetbox
[15:55:04] <volans|off>	 with the next host it should show you the diff for both
[15:55:09] <volans|off>	 in the DNS part
[15:55:13] <mutante>	 volans|off: should i try repeating the exact command one more time?
[15:55:18] <volans|off>	 nah, no need
[15:55:20] <mutante>	 ok
[15:55:25] <volans|off>	 in case you didn't have another to decom
[15:55:33] <volans|off>	 I would have asked you to run the sre.dns.netbox cookbook
[15:55:37] <volans|off>	 to sync the dns part
[15:55:37] <mutante>	 i have one more. i am just not sure if this one is in a zombie state now
[15:55:45] <mutante>	 checks netbox
[15:55:55] <volans|off>	 shouldn't matter
[15:55:59] <mutante>	 ok
[15:56:07] <volans|off>	 but feel free to ask if it's in some very weird state
[15:56:24] <mutante>	 ack. netbox looks ok. doing the last one
[15:56:35] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[15:56:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:04] <wikibugs>	 (03PS1) 10Ayounsi: Prioritize SG-IX [homer/public] - 10https://gerrit.wikimedia.org/r/633200 (https://phabricator.wikimedia.org/T260991)
[15:58:58] <wikibugs>	 (03PS2) 10Dzahn: install_server: remove testvm[345]001 [puppet] - 10https://gerrit.wikimedia.org/r/632590
[16:06:02] <mutante>	 volans|off: it's showing me the DNS diff and that has both VMs, the previous one and this one and just said "done" now
[16:06:10] <mutante>	 so seems all good
[16:06:14] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[16:06:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:19] <wikibugs>	 10Operations: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `testvm5001.eqsin.wmnet` - testvm5001.eqsin.wmnet (**WARN**)   - **Failed do...
[16:06:57] <volans|off>	 mutante: ack, as expected
[16:07:54] <mutante>	 yep, thanks
[16:08:04] <wikibugs>	 (03PS1) 10Dave Pifke: [WIP] Start puppetizing WebPageTest [puppet] - 10https://gerrit.wikimedia.org/r/633202 (https://phabricator.wikimedia.org/T262962)
[16:10:23] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[16:10:53] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Renable SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T265071 (10lexnasser) @elukey  Yep, that's the correct email. I also confirm that I'm now able to access Turnilo and Stat1007. Thanks for your help!
[16:11:59] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "all 3 have been decom'ed with cookbook" [puppet] - 10https://gerrit.wikimedia.org/r/632590 (owner: 10Dzahn)
[16:12:24] <wikibugs>	 10Operations, 10User-jbond: Proposal: create a framework to build containerized incident management  protects - https://phabricator.wikimedia.org/T265153 (10jbond) p:05Triage→03Low
[16:14:49] <wikibugs>	 10Operations, 10Traffic, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (10Tsevener) @CDanis we didn't remove fetches against `/api/rest_v1/page/random...
[16:18:40] <wikibugs>	 10Operations, 10User-jbond: Proposal: create a framework to build containerized incident management  protects - https://phabricator.wikimedia.org/T265153 (10jbond)
[16:22:50] <wikibugs>	 10Operations, 10Traffic, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (10CDanis) >>! In T264881#6532868, @Tsevener wrote: > @CDanis we didn't remove...
[16:35:46] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/25811/" [puppet] - 10https://gerrit.wikimedia.org/r/633020 (owner: 10Dzahn)
[16:36:05] <wikibugs>	 (03PS3) 10Dzahn: ci: replace hiera with lookup, jenkins, shipyard, pipeline, k8s [puppet] - 10https://gerrit.wikimedia.org/r/633017
[16:36:39] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:38:21] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:39:59] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] "all of these are on ci::master, so here's the noop:" [puppet] - 10https://gerrit.wikimedia.org/r/633017 (owner: 10Dzahn)
[16:40:37] <wikibugs>	 (03CR) 10Dzahn: "> Patch Set 2:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/633017 (owner: 10Dzahn)
[16:42:16] <wikibugs>	 (03CR) 10Dzahn: "noop confirmed on contint1001" [puppet] - 10https://gerrit.wikimedia.org/r/633017 (owner: 10Dzahn)
[16:52:12] <wikibugs>	 (03PS1) 10Gergő Tisza: Enable session-ip log channel on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799)
[16:54:20] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10mforns) > > What is the field that we want to extract the AS name for? I see as_src, as_dst, peer_as_src, peer_as_dst? > Ideally all of them, but...
[17:03:44] <wikibugs>	 (03CR) 10Dzahn: "noop confirmed wtp2015" [puppet] - 10https://gerrit.wikimedia.org/r/633020 (owner: 10Dzahn)
[17:04:33] <wikibugs>	 (03PS2) 10Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - 10https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[17:10:49] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "LGTM, but perhaps this should be everywhere, at least until T264369 is resolved?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[17:15:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] amd_rocm: Ensure linux-headers-amd64 is installed (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/633194 (owner: 10Klausman)
[17:15:23] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+1] local_dev::docker_publish: hiera->lookup, data types [puppet] - 10https://gerrit.wikimedia.org/r/633027 (owner: 10Dzahn)
[17:16:39] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "thanks, that was quick 😊" [puppet] - 10https://gerrit.wikimedia.org/r/633027 (owner: 10Dzahn)
[17:18:11] <wikibugs>	 (03CR) 10Gergő Tisza: "Eventually, yeah. I want to make sure it's not overloading Logstash or Kask." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[17:20:58] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10ayounsi) >  Or maybe I misunderstood what needs to be done here... I assumed we want to determine whether the given IP is v4 or v6. But which IP w...
[17:24:30] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "> Patch Set 5: Verified+1" [puppet] - 10https://gerrit.wikimedia.org/r/631315 (owner: 10Dzahn)
[17:29:29] <wikibugs>	 10Operations, 10Traffic, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (10Tsevener) @CDanis Thanks! That response is working fine in my testing, feel...
[17:32:58] <wikibugs>	 10Operations, 10Traffic, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (10CDanis) Great! Thanks @Tsevener !
[17:44:37] <icinga-wm>	 RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:49:31] <wikibugs>	 (03PS8) 10Dzahn: toolforge/grid: hiera()->lookup(), add data types [puppet] - 10https://gerrit.wikimedia.org/r/631315
[17:51:05] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] toolforge/grid: hiera()->lookup(), add data types [puppet] - 10https://gerrit.wikimedia.org/r/631315 (owner: 10Dzahn)
[17:58:02] <wikibugs>	 (03CR) 10Ori.livneh: "Thanks for this" [puppet] - 10https://gerrit.wikimedia.org/r/631895 (https://phabricator.wikimedia.org/T95064) (owner: 10Dzahn)
[18:01:16] <wikibugs>	 (03CR) 10Dzahn: "thank you Ori. looks like I failed to even add you to reviewers. that was an accident." [puppet] - 10https://gerrit.wikimedia.org/r/631895 (https://phabricator.wikimedia.org/T95064) (owner: 10Dzahn)
[18:05:35] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review, 10Security, 10User-jbond: Review default ferm INPUT policy - https://phabricator.wikimedia.org/T264888 (10jbond) no sure why but some host take a looong time  ` Completed SYN Stealth Scan against 208.80.153.45 in 50099.78s (63 hosts left) Completed SYN Stealth...
[18:08:14] <wikibugs>	 (03PS1) 10Andrew Bogott: Cloud puppetmasters: Rename an argument in Profile::Pupetmaster::Backend [puppet] - 10https://gerrit.wikimedia.org/r/633215
[18:09:19] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Cloud puppetmasters: Rename an argument in Profile::Pupetmaster::Backend [puppet] - 10https://gerrit.wikimedia.org/r/633215 (owner: 10Andrew Bogott)
[18:14:17] <wikibugs>	 (03CR) 10Dzahn: "confirmed no issue on tools-sgeexec-0901. thank you as well." [puppet] - 10https://gerrit.wikimedia.org/r/631315 (owner: 10Dzahn)
[18:27:21] <wikibugs>	 10Operations, 10Technical-blog-posts, 10Traffic: Blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T264729 (10srodlund) It looks good to me! I am copying it over to the blog for publication next week. (Tuesday 13 Oct)
[18:44:08] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm
[18:44:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:56] <wikibugs>	 10Operations, 10serviceops, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (10nettrom_WMF) @kostajh : Thanks for picking this up and pinging me about it. I think we should switch off EditorJourney since...
[18:52:57] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:56:19] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:00:47] <wikibugs>	 (03PS1) 10Herron: logstash: add field checks to filter throttle [puppet] - 10https://gerrit.wikimedia.org/r/633224
[19:02:51] <wikibugs>	 (03PS1) 10Dzahn: add testvm1001.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/633225
[19:04:01] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] add testvm1001.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/633225 (owner: 10Dzahn)
[19:06:40] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[19:08:17] <wikibugs>	 (03PS1) 10Dzahn: site: add testvm1001 with appserver role for a test [puppet] - 10https://gerrit.wikimedia.org/r/633226
[19:10:50] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[19:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:32] <wikibugs>	 (03PS2) 10Dzahn: site/DHCP: add testvm1001 with appserver role for a test [puppet] - 10https://gerrit.wikimedia.org/r/633226
[19:14:20] <wikibugs>	 (03PS1) 10Razzi: turnilo: switch from nginx to envoy for tls termination [puppet] - 10https://gerrit.wikimedia.org/r/633227 (https://phabricator.wikimedia.org/T240439)
[19:22:09] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:22:48] <wikibugs>	 (03PS1) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/633229
[19:23:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:25:01] <wikibugs>	 (03PS2) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/633229
[19:25:11] <wikibugs>	 (03CR) 10Razzi: "I figured it'd be easiest to make this change for a single host, iterate on that, and then roll it out to the others. I can see there's so" [puppet] - 10https://gerrit.wikimedia.org/r/633227 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[19:34:16] <wikibugs>	 10Operations, 10Machine Learning Platform, 10SRE-Access-Requests: Requesting adding to ores-admin for Ladsgroup - https://phabricator.wikimedia.org/T265172 (10Ladsgroup)
[19:46:04] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in exim - https://phabricator.wikimedia.org/T265175 (10sbassett)
[19:46:26] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in exim - https://phabricator.wikimedia.org/T265175 (10sbassett)
[19:46:28] <wikibugs>	 (03PS3) 10CDanis: VCL: temp. ratelimit iOS app fetches of random page summary [puppet] - 10https://gerrit.wikimedia.org/r/633229 (https://phabricator.wikimedia.org/T264881)
[19:47:06] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in exim - https://phabricator.wikimedia.org/T265175 (10sbassett)
[19:49:01] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Enable session-ip log channel on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[19:49:07] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in exim - https://phabricator.wikimedia.org/T265175 (10sbassett)
[19:49:51] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in exim - https://phabricator.wikimedia.org/T265175 (10Dzahn) Hi @sbassett security@ isn't in exim anymore nowadays. It's in Google, you'll have to ask OIT via Zendesk please:   ` [mx1001:~] $ sudo exim4 -bt security@wikimedia.org security...
[19:50:51] <wikibugs>	 10Operations, 10Security-Team: Remove Chase Pettet from security@ alias in Google - https://phabricator.wikimedia.org/T265175 (10sbassett)
[19:52:07] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "If we don't want to roll it this Friday we can have:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[19:53:38] <wikibugs>	 (03CR) 10CDanis: "PCC looks correct: https://puppet-compiler.wmflabs.org/compiler1001/25816/" [puppet] - 10https://gerrit.wikimedia.org/r/633229 (https://phabricator.wikimedia.org/T264881) (owner: 10CDanis)
[19:54:05] <wikibugs>	 (03CR) 10Gergő Tisza: "Per the recent discussion in #mediawiki_security, I think it's OK to roll this out today." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[20:08:09] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:09:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:15:50] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "Looks good to me :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[20:16:43] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:18:27] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:40:19] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:42:01] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:44:04] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] VCL: temp. ratelimit iOS app fetches of random page summary [puppet] - 10https://gerrit.wikimedia.org/r/633229 (https://phabricator.wikimedia.org/T264881) (owner: 10CDanis)
[20:46:30] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] "0 tests failed, 0 tests skipped, 22 tests passed" [puppet] - 10https://gerrit.wikimedia.org/r/633229 (https://phabricator.wikimedia.org/T264881) (owner: 10CDanis)
[20:55:28] <wikibugs>	 (03PS1) 10Gergő Tisza: Log IP/device changes within the same session [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633252 (https://phabricator.wikimedia.org/T264799)
[20:55:55] <wikibugs>	 (03PS1) 10Gergő Tisza: Log IP/device changes within the same session [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633253 (https://phabricator.wikimedia.org/T264799)
[20:56:53] <wikibugs>	 (03PS1) 10Gergő Tisza: SessionManager: Always log IP/UA in session-ip [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633254 (https://phabricator.wikimedia.org/T264799)
[20:57:53] <wikibugs>	 (03PS1) 10Gergő Tisza: SessionManager: Always log IP/UA in session-ip [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633255 (https://phabricator.wikimedia.org/T264799)
[20:58:54] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] "deploying" [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633252 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[20:59:00] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] "deploying" [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633253 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[20:59:25] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] "deploying" [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633254 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[20:59:42] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] "deploying" [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633255 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[21:06:15] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] site/DHCP: add testvm1001 with appserver role for a test [puppet] - 10https://gerrit.wikimedia.org/r/633226 (owner: 10Dzahn)
[21:11:46] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (10wiki_willy) a:05wiki_willy→03Cmjohnson PDUs were shipped out today and should arrive next week.  Assigning back to @Cmjohnson to complete...
[21:28:43] <wikibugs>	 (03Merged) 10jenkins-bot: Log IP/device changes within the same session [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633252 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[21:32:13] <wikibugs>	 (03Merged) 10jenkins-bot: Log IP/device changes within the same session [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633253 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[21:33:15] <wikibugs>	 (03Merged) 10jenkins-bot: SessionManager: Always log IP/UA in session-ip [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/633254 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[21:33:18] <wikibugs>	 (03Merged) 10jenkins-bot: SessionManager: Always log IP/UA in session-ip [core] (wmf/1.36.0-wmf.11) - 10https://gerrit.wikimedia.org/r/633255 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[21:53:27] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:53:41] <Urbanecm>	 !log [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # T263935
[21:53:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:47] <stashbot>	 T263935: Local account not attached after SUL-Finalization - https://phabricator.wikimedia.org/T263935
[21:54:59] <Urbanecm>	 tgr_: you seem to have unsynced patches?
[21:55:50] <wikibugs>	 (03PS1) 10Dzahn: partman: add testvm to use standard flat/virtual recipe [puppet] - 10https://gerrit.wikimedia.org/r/633269
[21:56:05] <tgr_>	 Urbanecm: do I? I haven't pulled anything to the deploy host yet
[21:56:26] <Urbanecm>	 tgr_: yes, but you merged patches
[21:57:33] <Urbanecm>	 (I don't plan to roll anything on a Friday evening, I just noticed that when I logged, and saw that some patches were merged 20 minutes ago, and not synced, so I pinged you)
[21:58:29] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:58:30] <tgr_>	 yeah, I went for lunch while CI was working
[21:59:19] <Urbanecm>	 i see :). 
[22:01:19] <tgr_>	 !log rolling out T264799#6533622
[22:01:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:24] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[22:09:43] <logmsgbot>	 !log tgr@deploy1001 Synchronized php-1.36.0-wmf.10/includes/: Backport: [[gerrit:633252|Log IP/device changes within the same session (T264799)]] & [[gerrit:633254|SessionManager: Always log IP/UA in session-ip]] (duration: 01m 06s)
[22:09:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:50] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[22:12:50] <tgr_>	 that caused a bit of an error spike. I should have synced file by file, probably.
[22:13:19] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Enable session-ip log channel on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[22:14:03] <wikibugs>	 (03Merged) 10jenkins-bot: Enable session-ip log channel on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633210 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[22:14:07] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] partman: add testvm to use standard flat/virtual recipe [puppet] - 10https://gerrit.wikimedia.org/r/633269 (owner: 10Dzahn)
[22:20:13] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633210|Enable session-ip log channel on group0 (T264799)]] (duration: 00m 59s)
[22:20:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:20] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[22:23:28] <logmsgbot>	 !log tgr@deploy1001 Synchronized php-1.36.0-wmf.11/includes/: Backport: [[gerrit:633252|Log IP/device changes within the same session (T264799)]] & [[gerrit:633254|SessionManager: Always log IP/UA in session-ip]] (duration: 01m 04s)
[22:23:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:38] <wikibugs>	 (03PS2) 10Dzahn: wmcs::instance: remove diamond removal remnants [puppet] - 10https://gerrit.wikimedia.org/r/632570 (https://phabricator.wikimedia.org/T210993)
[22:26:58] <wikibugs>	 (03PS1) 10Gergő Tisza: Enable session-ip log channel on group1, except Commons/Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633271 (https://phabricator.wikimedia.org/T264799)
[22:34:13] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:35:55] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:45:30] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Enable session-ip log channel on group1, except Commons/Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633271 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[22:46:37] <wikibugs>	 (03Merged) 10jenkins-bot: Enable session-ip log channel on group1, except Commons/Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633271 (https://phabricator.wikimedia.org/T264799) (owner: 10Gergő Tisza)
[22:52:15] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633271|Enable session-ip log channel on group1, except Commons/Wikidata (T264799)]] (duration: 00m 57s)
[22:52:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:22] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[22:57:19] <wikibugs>	 (03PS1) 10Gergő Tisza: Enable session-ip log channel on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633272 (https://phabricator.wikimedia.org/T264799)
[23:02:11] <icinga-wm>	 PROBLEM - PHP7 rendering on testvm1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[23:03:38] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[23:03:38] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[23:03:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:04:16] <icinga-wm>	 ACKNOWLEDGEMENT - PHP7 rendering on testvm1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn test https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[23:05:01] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (10Dzahn) maps2010 is reported as down since about 3 days
[23:05:24] <wikibugs>	 (03PS1) 10Urbanecm: Allow testwiki bureaucrats to grant and revoke (transwiki) importer rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633273
[23:06:08] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[23:06:08] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[23:06:12] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[23:06:12] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[23:06:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:16] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[23:06:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:17] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[23:06:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:06:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:07:35] <mutante>	 !log maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
[23:07:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:35] <wikibugs>	 (CR) DannyS712: Allow testwiki bureaucrats to grant and revoke (transwiki) importer rights (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/633273 (owner: Urbanecm)
[23:10:15] <wikibugs>	 (PS2) Urbanecm: [testwiki, test2wiki] Allow bureaucrats to grant import rights [mediawiki-config] - https://gerrit.wikimedia.org/r/633273
[23:11:24] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (Dzahn) Is there a ticket for moving these into production?
[23:11:57] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Enable session-ip log channel on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/633272 (https://phabricator.wikimedia.org/T264799) (owner: Gergő Tisza)
[23:13:06] <wikibugs>	 (Merged) jenkins-bot: Enable session-ip log channel on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/633272 (https://phabricator.wikimedia.org/T264799) (owner: Gergő Tisza)
[23:13:49] <mutante>	 !log maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role (T260271)
[23:13:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:13:55] <stashbot>	 T260271: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271
[23:16:17] <icinga-wm>	 PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[23:17:49] <icinga-wm>	 RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[23:24:51] <wikibugs>	 Operations, serviceops: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (Dzahn) There was already T245757 with dependency tickets.
[23:25:12] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633272|Enable session-ip log channel on Commons (T264799)]] (duration: 00m 59s)
[23:25:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:18] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[23:26:07] <wikibugs>	 Operations, serviceops: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (Dzahn) As well as T250515 for the PHP packages.
[23:31:13] <wikibugs>	 (PS1) Gergő Tisza: Enable session-ip log channel on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/633274 (https://phabricator.wikimedia.org/T264799)
[23:31:15] <wikibugs>	 Operations, serviceops: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (Dzahn) > Fix all of our puppet code for MediaWiki for incompatibilities with buster  I applied the puppet role on a buster test VM in eqiad and the following packages are missing:...
[23:40:46] <wikibugs>	 (CR) Gergő Tisza: [C: +2] Enable session-ip log channel on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/633274 (https://phabricator.wikimedia.org/T264799) (owner: Gergő Tisza)
[23:41:39] <wikibugs>	 (Merged) jenkins-bot: Enable session-ip log channel on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/633274 (https://phabricator.wikimedia.org/T264799) (owner: Gergő Tisza)
[23:44:58] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633274|Enable session-ip log channel on Wikidata (T264799)]] (duration: 00m 59s)
[23:45:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:45:04] <stashbot>	 T264799: Log when a request with the same user session comes from a different IP - https://phabricator.wikimedia.org/T264799
[23:46:06] <wikibugs>	 (PS1) Dzahn: mediawiki: replace font package ttf-alee with fonts-alee [puppet] - https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991)
[23:49:35] <wikibugs>	 (PS1) Gergő Tisza: Enable session-ip log channel on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/633276 (https://phabricator.wikimedia.org/T264799)
[23:51:30] <wikibugs>	 (PS2) Dzahn: mediawiki: replace font package ttf-alee with fonts-alee [puppet] - https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991)
[23:51:39] <wikibugs>	 Operations, serviceops, Patch-For-Review: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (Legoktm) >>! In T264991#6533968, @Dzahn wrote: > - ploticus  {T253377}  > - php7.2-opcache > - php7.2-common  These should be php7.3 now.
[23:53:53] <wikibugs>	 Operations, serviceops, Patch-For-Review: Ugrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991 (Dzahn)
[23:54:32] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (Peachey88)