[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Evening SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T0000).
[00:00:04] <jouncebot>	 NicholasG04: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:01:23] <NicholasG04>	 :)
[00:02:57] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] delete unused fake SSL keys [labs/private] - https://gerrit.wikimedia.org/r/561909 (owner: Dzahn)
[00:12:54] <NicholasG04>	 Urbanecm should I be doing something?
[00:13:18] <Reedy>	 It kinda looks like no one has shown up to do the deploy
[00:14:24] <wikibugs>	 (PS2) Dzahn: gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151)
[00:16:24] <wikibugs>	 (PS6) Reedy: Added throttle rule for University of Derby mini-editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/559612 (https://phabricator.wikimedia.org/T240845) (owner: NicholasG04)
[00:16:56] <wikibugs>	 (CR) Reedy: [C: +2] Added throttle rule for University of Derby mini-editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/559612 (https://phabricator.wikimedia.org/T240845) (owner: NicholasG04)
[00:17:27] <NicholasG04>	 Reedy literally ._.
[00:17:35] <Reedy>	 ?
[00:18:11] <wikibugs>	 (Merged) jenkins-bot: Added throttle rule for University of Derby mini-editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/559612 (https://phabricator.wikimedia.org/T240845) (owner: NicholasG04)
[00:18:17] <NicholasG04>	 The three scheduled aren't here lol
[00:18:38] <Reedy>	 It's mostly a best effort thing
[00:19:02] <NicholasG04>	 Thank you for sorting it!
[00:21:39] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/throttle.php: T240845 (duration: 01m 04s)
[00:21:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:43] <stashbot>	 T240845: Temporary lift of IP cap on en.wikipedia for 21 Jan 2020 - https://phabricator.wikimedia.org/T240845
[00:23:49] <wikibugs>	 (PS1) Dzahn: admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152)
[00:25:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) (owner: Dzahn)
[00:29:04] <icinga-wm>	 PROBLEM - Check systemd state on netflow5001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:21] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108)
[00:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:38:03] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108) (duration: 00m 42s)
[00:38:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:43:52] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (Dzahn) a:Dzahn
[00:45:10] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (Dzahn)
[00:45:42] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (Dzahn)
[00:48:31] <wikibugs>	 Operations, Analytics, Product-Analytics, SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (Dzahn)
[00:53:21] <wikibugs>	 Operations, Analytics, Product-Analytics, SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (Dzahn) I see on https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue " If you already have cluster access, but can't log into Hue, it...
[00:54:23] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (Dzahn)
[00:56:40] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (Dzahn) a:Silvan_WMDE Hi Silvan, assigning this ticket to you for signing the NDA. Once that is done please assign it back to "nobody' or me or just leave...
[01:09:21] <wikibugs>	 (PS2) Dzahn: admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152)
[01:11:53] <wikibugs>	 (PS2) Dzahn: phabricator: Remove comment about bans being superseded by now non existent WP0 bans [puppet] - https://gerrit.wikimedia.org/r/543138 (owner: Reedy)
[01:12:08] <wikibugs>	 (CR) Dzahn: [C: +2] "comment-only" [puppet] - https://gerrit.wikimedia.org/r/543138 (owner: Reedy)
[01:14:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] phabricator: Remove comment about bans being superseded by now non existent WP0 bans [puppet] - https://gerrit.wikimedia.org/r/543138 (owner: Reedy)
[01:16:59] <mutante>	 jerkins ...
[01:17:47] <wikibugs>	 (PS3) Dzahn: phabricator: Remove comment about bans being superseded by WP0 bans [puppet] - https://gerrit.wikimedia.org/r/543138 (owner: Reedy)
[01:19:20] <Reedy>	 mutante: rebase it?
[01:21:40] <mutante>	 Reedy: it was "commit message too long" 
[01:21:46] <mutante>	 submits
[01:23:00] <wikibugs>	 (PS1) EBernhardson: airflow: Provide wrapper script to invoke airflow [puppet] - https://gerrit.wikimedia.org/r/562666
[01:25:39] <wikibugs>	 (PS2) Dzahn: Adapt auto restart for Buster [puppet] - https://gerrit.wikimedia.org/r/562473 (owner: Muehlenhoff)
[01:25:59] <wikibugs>	 (PS3) Dzahn: url_downloader: Adapt auto restart for Buster [puppet] - https://gerrit.wikimedia.org/r/562473 (owner: Muehlenhoff)
[01:26:39] <wikibugs>	 (CR) Dzahn: [C: +2] url_downloader: Adapt auto restart for Buster [puppet] - https://gerrit.wikimedia.org/r/562473 (owner: Muehlenhoff)
[01:33:13] <wikibugs>	 Operations, vm-requests: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979 (Dzahn) Open→Resolved VMs have been created. looks like from here on it will continue on T224551
[01:33:15] <wikibugs>	 Operations: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (Dzahn) merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/562473  commented on https://gerrit.wikimedia.org/r/c/operations/puppet/+/562472
[01:34:36] <wikibugs>	 Operations, Analytics, Product-Analytics, SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (SNowick_WMF) Thanks, yes it is a manual sync process:  The ticket attached to this one says "Currently, Hue users are manually synced from...
[01:39:19] <wikibugs>	 Operations, Analytics, Product-Analytics, SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (Dzahn) a:elukey
[02:12:02] <wikibugs>	 (CR) CDanis: fastnetmon: add UDP/ICMP bw limits, greatly increase pps limits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/562387 (owner: CDanis)
[03:02:36] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:04:24] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:54:49] <logmsgbot>	 !log volker-e@deploy1001 Started deploy [design/style-guide@ad595d5]: Deploy design/style-guide:
[03:54:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:54:57] <logmsgbot>	 !log volker-e@deploy1001 Finished deploy [design/style-guide@ad595d5]: Deploy design/style-guide:  (duration: 00m 08s)
[03:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:14:04] <wikibugs>	 (CR) Ayounsi: [C: +2] Set port 443 (was 8190) for term schema in analytics-in4 [homer/public] - https://gerrit.wikimedia.org/r/562543 (owner: Elukey)
[05:27:07] <wikibugs>	 (PS3) Ammarpad: Enable lead paragraph in user namespace on nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030)
[05:35:28] <icinga-wm>	 RECOVERY - Check systemd state on netflow5001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:38:32] <wikibugs>	 (PS1) Ayounsi: Enable netflow in eqsin [homer/public] - https://gerrit.wikimedia.org/r/562692
[05:40:52] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Enable netflow in eqsin [homer/public] - https://gerrit.wikimedia.org/r/562692 (owner: Ayounsi)
[05:41:14] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] "Diff for 2 devices: ['cr1-eqsin.wikimedia.org', 'cr2-eqsin.wikimedia.org']" [homer/public] - https://gerrit.wikimedia.org/r/562692 (owner: Ayounsi)
[05:41:43] <XioNoX>	 !log enable netflow in eqsin
[05:41:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:08] <wikibugs>	 (PS6) Ammarpad: Add initial configuration for ng.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771) (owner: IAmNetx)
[05:58:16] <wikibugs>	 (CR) Ammarpad: [C: +1] Add initial configuration for ng.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771) (owner: IAmNetx)
[06:06:11] <wikibugs>	 (PS1) Ayounsi: Remove sampling: true for eqsin as it's true by default [homer/public] - https://gerrit.wikimedia.org/r/562697
[06:07:06] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Remove sampling: true for eqsin as it's true by default [homer/public] - https://gerrit.wikimedia.org/r/562697 (owner: Ayounsi)
[06:11:19] <wikibugs>	 (PS1) Ayounsi: Enable netflow sampling in knams [homer/public] - https://gerrit.wikimedia.org/r/562698
[06:15:57] <wikibugs>	 (CR) Ayounsi: "Faidon for the administrative stamp, Chris for the technical one." [homer/public] - https://gerrit.wikimedia.org/r/562698 (owner: Ayounsi)
[06:19:38] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to production servers in perf-team group for dpifke - https://phabricator.wikimedia.org/T242189 (dpifke)
[06:24:12] <DannyS712>	 Please see https://phabricator.wikimedia.org/T242188 - PHP fatal error on beta cluster
[06:25:03] <wikibugs>	 Operations, DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (Marostegui) Thanks for clarifying Papaul. Jaime is off and will be back online the 9th of January
[06:26:38] <wikibugs>	 Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui) Thanks for the clarification. My thoughts were that we upgraded also BIOS. Let's start with that indeed.
[06:33:46] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712)
[06:35:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10077 and previous config saved to /var/cache/conftool/dbconfig/20200108-063550-marostegui.json
[06:35:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:55] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:41:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1096:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10078 and previous config saved to /var/cache/conftool/dbconfig/20200108-064144-marostegui.json
[06:41:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:49] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:42:27] <marostegui>	 !log Remove partitions from revision table on s6 for db1096:3316 - T239453
[06:42:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P10079 and previous config saved to /var/cache/conftool/dbconfig/20200108-064404-marostegui.json
[06:44:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:14] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (DannyS712) @Reedy can I ask why https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseLexeme/+/562646/ was abandoned?...
[06:49:41] <wikibugs>	 (PS1) Marostegui: db1114: Change package to reflect the current one [puppet] - https://gerrit.wikimedia.org/r/562702
[06:55:28] <wikibugs>	 (CR) Marostegui: [C: +2] db1114: Change package to reflect the current one [puppet] - https://gerrit.wikimedia.org/r/562702 (owner: Marostegui)
[06:56:43] <marostegui>	 !log Upgrade db1079
[06:56:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:23] <wikibugs>	 Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[07:00:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10080 and previous config saved to /var/cache/conftool/dbconfig/20200108-070009-marostegui.json
[07:00:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10081 and previous config saved to /var/cache/conftool/dbconfig/20200108-070614-marostegui.json
[07:06:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10082 and previous config saved to /var/cache/conftool/dbconfig/20200108-070712-marostegui.json
[07:07:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:34] <marostegui>	 !log Remove partitions from dewiki.revision on db1097:3315 T239453
[07:07:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:37] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[07:13:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10083 and previous config saved to /var/cache/conftool/dbconfig/20200108-071312-marostegui.json
[07:13:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P10084 and previous config saved to /var/cache/conftool/dbconfig/20200108-072017-marostegui.json
[07:20:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:02] <icinga-wm>	 RECOVERY - snapshot of s7 in eqiad on db1115 is OK: snapshot for s7 at eqiad taken less than 4 days ago and larger than 90 GB: Last one 2020-01-08 05:14:57 from db1116.eqiad.wmnet:3317 (899 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[07:32:19] <marostegui>	 XioNoX: ^ :)
[07:36:25] <XioNoX>	 nice!
[07:41:52] <wikibugs>	 Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (ema) >>! In T238305#5784511, @Papaul wrote: > sometimes when the IDRAC version is not up to date we might not see and log at system crash  Interesting!  > so i think let us start by getting all tho...
[07:50:25] <wikibugs>	 (CR) Urbanecm: [C: +1] Add initial configuration for ng.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771) (owner: IAmNetx)
[07:57:59] <marostegui>	 !log Deploy schema change on clouddb2001-dev.labtestwiki - T234052
[07:58:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:02] <stashbot>	 T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[08:02:15] <wikibugs>	 (CR) Elukey: [C: +2] Enable hive kerberos connections from search/airflow instances [puppet] - https://gerrit.wikimedia.org/r/562589 (owner: EBernhardson)
[08:05:34] <wikibugs>	 Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Kris_Litson_WMDE) Got it! Thanks everyone!
[08:07:11] <marostegui>	 !log Deploy schema change on s1 codfw, there will be lag on s1 codfw - T234052
[08:07:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:14] <stashbot>	 T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[08:08:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P10085 and previous config saved to /var/cache/conftool/dbconfig/20200108-080853-marostegui.json
[08:08:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:01] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] caching-proxy: squid vs squid3 paths [puppet] - https://gerrit.wikimedia.org/r/562560 (owner: Filippo Giunchedi)
[08:09:15] <marostegui>	 !log Upgrade db1085
[08:09:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:40] <wikibugs>	 Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[08:19:48] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (Silvan_WMDE) a:Silvan_WMDE→Dzahn Thanks everyone, I just signed the NDA.
[08:20:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10086 and previous config saved to /var/cache/conftool/dbconfig/20200108-082050-marostegui.json
[08:20:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10087 and previous config saved to /var/cache/conftool/dbconfig/20200108-082930-marostegui.json
[08:29:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:26] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (Reedy) >>! In T242188#5784973, @DannyS712 wrote: > @Reedy can I ask why https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions...
[08:41:41] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (Reedy) Open→Resolved a:Reedy
[08:49:42] <wikibugs>	 (PS1) Alexandros Kosiaris: DNM: Append -http to eventgate-analytics [puppet] - https://gerrit.wikimedia.org/r/562767
[08:50:12] <wikibugs>	 (CR) Alexandros Kosiaris: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/562767 (owner: Alexandros Kosiaris)
[08:54:34] <wikibugs>	 Operations, Maps, Discovery-Search (Current work): Re-import OSM data at eqiad and codfw to temporarily fix current OSM replication issues. - https://phabricator.wikimedia.org/T239728 (Pikne) I'm not sure if this should be considered part of or related to this task, but no new tiles have been generat...
[08:58:36] <wikibugs>	 Operations, netops: Routinator RSYNC errors - https://phabricator.wikimedia.org/T240817 (ayounsi) Open→Stalled p:Normal→Low
[08:58:40] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] admins: add clarakosi to deploy-service for RESTBase deployment [puppet] - https://gerrit.wikimedia.org/r/562661 (https://phabricator.wikimedia.org/T242152) (owner: Dzahn)
[09:00:26] <moritzm>	 !log installing urldownloader1001 T241979
[09:00:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:29] <stashbot>	 T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979
[09:01:56] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] profile::url_downloader: Add types and switch to lookup() (1 comment) [puppet] - https://gerrit.wikimedia.org/r/562472 (owner: Muehlenhoff)
[09:11:22] <wikibugs>	 Operations, netops: Upgrade routinator to 0.6.4 - https://phabricator.wikimedia.org/T242197 (ayounsi) p:Triage→Low
[09:11:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1085', diff saved to https://phabricator.wikimedia.org/P10088 and previous config saved to /var/cache/conftool/dbconfig/20200108-091124-marostegui.json
[09:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:28] <wikibugs>	 Operations, Traffic: Docker registry needs cache to vary on Accept header value - https://phabricator.wikimedia.org/T242200 (Joe)
[09:27:09] <moritzm>	 !log installing urldownloader1002 T241979
[09:27:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:12] <stashbot>	 T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979
[09:33:52] <wikibugs>	 (PS2) Alexandros Kosiaris: DNM: Append -http to eventgate-analytics [puppet] - https://gerrit.wikimedia.org/r/562767
[09:37:49] <wikibugs>	 (PS3) Alexandros Kosiaris: DNM: Append -http to eventgate-analytics [puppet] - https://gerrit.wikimedia.org/r/562767
[09:40:59] <wikibugs>	 (PS1) Tarrow: Enable tainted references on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621)
[09:41:46] <wikibugs>	 (PS1) Muehlenhoff: Switch rdb* to standardised Partman layout [puppet] - https://gerrit.wikimedia.org/r/562778 (https://phabricator.wikimedia.org/T156955)
[09:41:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] Enable tainted references on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621) (owner: Tarrow)
[09:44:23] <wikibugs>	 (CR) Alexandros Kosiaris: "Did a full PCC for the whole fleet, identified the hosts that failed and fixed those in the followup patches. Now at https://puppet-compil" [puppet] - https://gerrit.wikimedia.org/r/562767 (owner: Alexandros Kosiaris)
[09:45:11] <wikibugs>	 (PS2) Tarrow: Enable tainted references on test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621)
[09:45:11] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "I think I 'll tentatively merge this and shepherd it through production to make sure this won't bite us." [puppet] - 10https://gerrit.wikimedia.org/r/562767 (owner: 10Alexandros Kosiaris)
[09:54:11] <wikibugs>	 (03CR) 10Tarrow: "Added some extra camp reviewers. I wanted to know what you think about adding the 'default' line. You'll see this results in many more ent" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621) (owner: 10Tarrow)
[09:57:02] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Disable TLSv1.0/1.1 support on the caching layer [puppet] - 10https://gerrit.wikimedia.org/r/562779 (https://phabricator.wikimedia.org/T238038)
[09:57:31] <wikibugs>	 (03CR) 10Addshore: [C: 03+1] Enable tainted references on test.wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621) (owner: 10Tarrow)
[09:59:04] <wikibugs>	 (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/20264/" [puppet] - 10https://gerrit.wikimedia.org/r/562779 (https://phabricator.wikimedia.org/T238038) (owner: 10Vgutierrez)
[10:01:35] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend Netbox Ganeti sync for eqsin [puppet] - 10https://gerrit.wikimedia.org/r/562780 (https://phabricator.wikimedia.org/T228099)
[10:05:58] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562780 (https://phabricator.wikimedia.org/T228099) (owner: 10Muehlenhoff)
[10:08:03] <moritzm>	 !log enabling spec-ctr, ssbd. md-clear passthrough for new eqsin cluster T228099
[10:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:08] <stashbot>	 T228099: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099
[10:18:43] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] Switch rdb* to standardised Partman layout [puppet] - https://gerrit.wikimedia.org/r/562778 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[10:18:50] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 85468512 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:19:11] <volans>	 wut?
[10:20:36] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 22216 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:21:11] <akosiaris>	 22216?
[10:21:17] <wikibugs>	 (PS4) Alexandros Kosiaris: Append -http to eventgate-analytics [puppet] - https://gerrit.wikimedia.org/r/562767
[10:22:52] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Append -http to eventgate-analytics [puppet] - https://gerrit.wikimedia.org/r/562767 (owner: Alexandros Kosiaris)
[10:31:52] <wikibugs>	 (PS1) Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787
[10:32:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787 (owner: Muehlenhoff)
[10:38:12] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=pdu_sentry4 site=ulsfo https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:39:38] <wikibugs>	 (PS6) Alexandros Kosiaris: Set up new LVS service eventgate-analytics-https [puppet] - https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: Ottomata)
[10:39:58] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:40:48] <wikibugs>	 (CR) Alexandros Kosiaris: "Check experimental" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: Ottomata)
[10:41:31] <moritzm>	 !log rebooting netflow5001 to pick up microcode
[10:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:35] <wikibugs>	 (PS2) Ema: ATS: add webrequest logging for atskafka [puppet] - https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[10:44:18] <wikibugs>	 (PS1) Ayounsi: Routinator: add proxy for RRDP protocol [puppet] - https://gerrit.wikimedia.org/r/562788
[10:46:12] <icinga-wm>	 PROBLEM - Check systemd state on netflow5001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:48:19] <wikibugs>	 (CR) Ema: [C: +1] 5.1.3-1wm12: Bump version and target buster [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[10:52:20] <wikibugs>	 (PS1) Jbond: apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/562789
[10:53:05] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[10:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:15] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:53:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:25] <wikibugs>	 (PS1) Muehlenhoff: Initially assing spare role to gerrit-test.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/562790 (https://phabricator.wikimedia.org/T239151)
[10:54:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] apt:::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/562789 (owner: Jbond)
[10:54:55] <icinga-wm>	 RECOVERY - Check systemd state on netflow5001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:56:55] <wikibugs>	 (PS2) Jbond: apt::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/562789
[10:59:08] <wikibugs>	 (CR) jerkins-bot: [V: -1] apt::pin: allow callers to override the notify resource [puppet] - https://gerrit.wikimedia.org/r/562789 (owner: Jbond)
[11:00:16] <moritzm>	 !log drain ganeti5003 to test new Ganeti setup in eqsin T228099
[11:00:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:18] <stashbot>	 T228099: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099
[11:06:27] <wikibugs>	 (03CR) 10Vgutierrez: [V: 03+2 C: 03+2] 5.1.3-1wm12: Bump version and target buster [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[11:07:08] <moritzm>	 !log test failover of Ganeti master in eqsin T228099
[11:07:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:11] <stashbot>	 T228099: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099
[11:11:35] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Set up new LVS service eventgate-analytics-https [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[11:11:35] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: lvs: Remove unused eventgate-analytics-http service [puppet] - 10https://gerrit.wikimedia.org/r/562792 (https://phabricator.wikimedia.org/T241073)
[11:14:18] <wikibugs>	 (03PS3) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[11:16:40] <wikibugs>	 (03PS1) 10Muehlenhoff: Re-enable notifications for ganeti5*, setup is done [puppet] - 10https://gerrit.wikimedia.org/r/562793 (https://phabricator.wikimedia.org/T228099)
[11:19:24] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Re-enable notifications for ganeti5*, setup is done [puppet] - 10https://gerrit.wikimedia.org/r/562793 (https://phabricator.wikimedia.org/T228099) (owner: 10Muehlenhoff)
[11:25:02] <wikibugs>	 10Operations, 10Patch-For-Review: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (10MoritzMuehlenhoff) 05Open→03Resolved I tested a failover and an instance migration successfully. I also changed the cluster setting so that CPU vulnerability flags are passed throu...
[11:26:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Initially assing spare role to gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562790 (https://phabricator.wikimedia.org/T239151) (owner: 10Muehlenhoff)
[11:36:13] <akosiaris>	 moritzm: assing? 
[11:36:19] <wikibugs>	 (03PS8) 10Alexandros Kosiaris: Set up new LVS service eventgate-analytics-https [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[11:36:20] <akosiaris>	 probably adding, right?
[11:36:21] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: lvs: Remove unused eventgate-analytics-http service [puppet] - 10https://gerrit.wikimedia.org/r/562792 (https://phabricator.wikimedia.org/T241073)
[11:36:31] <akosiaris>	 although it was a nice typo
[11:36:40] <moritzm>	 oh, I actually meant assign :-)
[11:36:51] <akosiaris>	 ahaha
[11:36:53] <akosiaris>	 even better
[11:37:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Set up new LVS service eventgate-analytics-https [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[11:38:31] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "This is a bit of a weird one as it's a migration of an existing service to a different port. So, we reuse a lot of the things (e.g. IP, di" [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[11:44:24] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=yes; selector: name=kubernetes1001.*
[11:44:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:38] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/weight=10; selector: name=kubernetes1001.*
[11:44:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:39] <vgutierrez>	 !log uploaded varnish 5.1.3-1wm12 to apt.wikimedia.org (buster) - T242093
[11:44:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:42] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[11:45:15] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.42:4192]) https://wikitech.wikimedia.org/wiki/PyBal
[11:45:46] <vgutierrez>	 I guess that's expected akosiaris 
[11:45:50] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/weight=10; selector: service=echostore
[11:45:51] <akosiaris>	 yup
[11:45:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:55] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs2006 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.42:4192]) https://wikitech.wikimedia.org/wiki/PyBal
[11:46:06] <akosiaris>	 pybal has already been restarted on the mains to avoid pages
[11:46:09] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1016 is CRITICAL: CRITICAL: 95 connections established with conf1004.eqiad.wmnet:4001 (min=96) https://wikitech.wikimedia.org/wiki/PyBal
[11:46:24] <akosiaris>	 those are the backups. I'll wait a couple more mins in case BGP hasn't converged yet
[11:46:47] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs2006 is CRITICAL: CRITICAL: 52 connections established with conf2001.codfw.wmnet:2379 (min=53) https://wikitech.wikimedia.org/wiki/PyBal
[11:47:08] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=yes; selector: name=kubernetes2001.*
[11:47:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:33] <wikibugs>	 (03CR) 10Vgutierrez: "recheck" [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/562515 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[11:49:57] <akosiaris>	 at least it has worked pretty well up to now. It does look like we'll get 0 pages
[11:50:25] <vgutierrez>	 0 pages <3
[11:50:56] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2006 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[11:51:14] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1016 is OK: OK: 96 connections established with conf1004.eqiad.wmnet:4001 (min=96) https://wikitech.wikimedia.org/wiki/PyBal
[11:51:50] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs2006 is OK: OK: 53 connections established with conf2001.codfw.wmnet:2379 (min=53) https://wikitech.wikimedia.org/wiki/PyBal
[11:52:43] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2020-01-06-070550-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/562799 (https://phabricator.wikimedia.org/T233405)
[11:53:00] <akosiaris>	 cool, everything went according to plan. 
[11:53:32] <akosiaris>	 icinga is happy for both services (the TLS one and the now-temporary non-TLS one), we got 0 pages, so cool
[11:54:34] <wikibugs>	 (03PS3) 10Jbond: apt:::pin: allow callers to override the notify resource [puppet] - 10https://gerrit.wikimedia.org/r/562789
[11:55:24] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[11:55:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] apt:::pin: allow callers to override the notify resource [puppet] - 10https://gerrit.wikimedia.org/r/562789 (owner: 10Jbond)
[11:55:46] <kart_>	 akosiaris: updating cxserver soon. 
[11:56:10] <akosiaris>	 kart_: ok, thanks for the heads up
[11:56:14] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: cleanup metrics manifests and files [puppet] - 10https://gerrit.wikimedia.org/r/562800 (https://phabricator.wikimedia.org/T237643)
[11:57:21] <wikibugs>	 (03CR) 10KartikMistry: [C: 03+2] Update cxserver to 2020-01-06-070550-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/562799 (https://phabricator.wikimedia.org/T233405) (owner: 10KartikMistry)
[11:57:42] <wikibugs>	 (03Merged) 10jenkins-bot: Update cxserver to 2020-01-06-070550-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/562799 (https://phabricator.wikimedia.org/T233405) (owner: 10KartikMistry)
[11:57:43] <Lucas_WMDE>	 jouncebot: refresh
[11:57:43] <jouncebot>	 I refreshed my knowledge about deployments.
[11:58:36] <_joe_>	 I will finish my patches for moving to a better abstraction of lvs configurations
[11:58:39] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: cleanup metrics manifests and files [puppet] - 10https://gerrit.wikimedia.org/r/562800 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez)
[11:58:49] <_joe_>	 so that no pages will be the norm, not an exception
[11:59:51] <wikibugs>	 (03PS1) 10Vgutierrez: 1.3.1-3 Rebuild for buster [software/varnish/libvmod-re2] (debian) - 10https://gerrit.wikimedia.org/r/562801 (https://phabricator.wikimedia.org/T242093)
[12:00:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] 1.3.1-3 Rebuild for buster [software/varnish/libvmod-re2] (debian) - 10https://gerrit.wikimedia.org/r/562801 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[12:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for European Mid-day SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T1200).
[12:00:04] <jouncebot>	 tarrow, CFisch_WMDE, and Lucas_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:09] <Lucas_WMDE>	 o/
[12:00:22] <Lucas_WMDE>	 I can SWAT
[12:00:23] <logmsgbot>	 !log kartik@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
[12:00:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:29] <Lucas_WMDE>	 tarrow: do you want to start by deploying your change?
[12:00:57] <tarrow>	 I would be happy to! Unless CFisch_WMDE wants to go first?
[12:01:54] <logmsgbot>	 !log kartik@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[12:01:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:59] <Lucas_WMDE>	 I’ll already +2 the backport, since it’ll take a while to go through CI anyways
[12:02:04] <Lucas_WMDE>	 but I think you can go first
[12:02:16] <tarrow>	 Thanks! I'm doing it now :)
[12:03:37] <wikibugs>	 (03CR) 10Tarrow: [C: 03+2] Enable tainted references on test.wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621) (owner: 10Tarrow)
[12:04:16] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: fix metrics directory [puppet] - 10https://gerrit.wikimedia.org/r/562802 (https://phabricator.wikimedia.org/T237643)
[12:04:43] <wikibugs>	 (03Merged) 10jenkins-bot: Enable tainted references on test.wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562777 (https://phabricator.wikimedia.org/T239621) (owner: 10Tarrow)
[12:04:48] <logmsgbot>	 !log kartik@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[12:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:09] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: fix metrics directory [puppet] - 10https://gerrit.wikimedia.org/r/562802 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez)
[12:06:16] <CFisch_WMDE>	 tarrow: I'm in a meeting, would be happy if someone does the deployment of my backport
[12:06:21] <CFisch_WMDE>	 I can check
[12:06:39] <Lucas_WMDE>	 I can deploy it
[12:08:17] <tarrow>	 Lucas_WMDE: silly question time: Did mwdebug1002 change ssh key in the last month or so?
[12:08:46] <kart_>	 !log Updated cxserver to 2020-01-06-070550-production (T233405)
[12:08:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:49] <stashbot>	 T233405: Reference shown duplicated in the source document - https://phabricator.wikimedia.org/T233405
[12:08:56] <Lucas_WMDE>	 it might have
[12:09:02] <Lucas_WMDE>	 I think at the moment you’re supposed to use mwdebug1001 anyways
[12:09:23] <tarrow>	 ah! yes that's where I've confused myself
[12:10:06] <moritzm>	 tarrow: yeah, it was reimaged some time in December IIRC
[12:10:52] <moritzm>	 or rather November: https://phabricator.wikimedia.org/T236806
[12:11:06] <Lucas_WMDE>	 not sure if mwdebug1002 is still verboten actually
[12:11:12] <Lucas_WMDE>	 the motd warning about it was reverted, apparently: https://gerrit.wikimedia.org/r/c/operations/puppet/+/559088
[12:14:26] <Urbanecm>	 Lucas_WMDE: it's still broken, but also the other mwdebug1001 started to behave the same way. It doesn't matter which host you use
[12:14:50] <Lucas_WMDE>	 great
[12:14:58] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: lvs: Append -http to eventgate-main [puppet] - 10https://gerrit.wikimedia.org/r/562805 (https://phabricator.wikimedia.org/T241073)
[12:17:24] <tarrow>	 syncing now :)
[12:17:48] <logmsgbot>	 !log tarrow@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:562777|Enable tainted references on test.wikidata.org (T239621)]] (duration: 01m 19s)
[12:17:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:53] <stashbot>	 T239621: Enable Tainted Refs on test.wikidata.org - https://phabricator.wikimedia.org/T239621
[12:17:54] <wikibugs>	 (03PS4) 10Jbond: apt:::pin: allow callers to override the notify resource [puppet] - 10https://gerrit.wikimedia.org/r/562789
[12:18:36] <Lucas_WMDE>	 \o/
[12:18:38] <tarrow>	 Lucas_WMDE: I'm all done :)
[12:18:49] <Lucas_WMDE>	 great, thanks!
[12:19:03] <tarrow>	 Thanks
[12:19:57] <Lucas_WMDE>	 hm, /srv/mediawiki-staging/php-1.35.0-wmf.11 is one commit ahead of upstream…
[12:19:59] <wikibugs>	 (03PS5) 10Jbond: apt:::pin: allow callers to override the notify resource [puppet] - 10https://gerrit.wikimedia.org/r/562789
[12:20:46] <Lucas_WMDE>	 anomie: in case you’re online – do you remember if that’s a security commit?
[12:22:57] <wikibugs>	 (03CR) 10Jbond: "I think it would be better to update the apt::pin resource to be a bit more flexible.  see https://gerrit.wikimedia.org/r/c/operations/pup" [puppet] - 10https://gerrit.wikimedia.org/r/562544 (owner: 10Muehlenhoff)
[12:23:28] <Lucas_WMDE>	 strange thing, the commit isn’t on Gerrit but the Phabricator task has been public for a while
[12:23:31] <Lucas_WMDE>	 :shrug:
[12:23:34] <Lucas_WMDE>	 I’ll just rebase it, I guess
[12:24:19] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Switch eventgate-main LVS to use TLS port 4292 [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[12:24:22] <Lucas_WMDE>	 CFisch_WMDE: your change is on mwdebug1001, can you test it?
[12:25:14] <CFisch_WMDE>	 jepp
[12:25:56] <wikibugs>	 (03CR) 10Muehlenhoff: "Sure thing! I'll look into your patch in a bit." [puppet] - 10https://gerrit.wikimedia.org/r/562544 (owner: 10Muehlenhoff)
[12:28:32] <CFisch_WMDE>	 Lucas_WMDE: Seems to work thanks.
[12:28:37] <Lucas_WMDE>	 great!
[12:28:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "PCC at https://puppet-compiler.wmflabs.org/compiler1002/20270, LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562792 (https://phabricator.wikimedia.org/T241073) (owner: 10Alexandros Kosiaris)
[12:29:24] <Lucas_WMDE>	 syncing
[12:29:56] <wikibugs>	 (03PS3) 10Lucas Werkmeister (WMDE): Update Skolt Sami language name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/510875 (https://phabricator.wikimedia.org/T223544)
[12:30:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/Cite: SWAT: [[gerrit:561169|Fix handling of `<references responsive="" />` (T241303)]] (duration: 01m 06s)
[12:30:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:30] <stashbot>	 T241303: <references responsive /> does not work anymore - https://phabricator.wikimedia.org/T241303
[12:31:04] <wikibugs>	 (03PS1) 10Elukey: admin: add kerberos flag for gbirke [puppet] - 10https://gerrit.wikimedia.org/r/562809 (https://phabricator.wikimedia.org/T242215)
[12:31:05] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/510875 (https://phabricator.wikimedia.org/T223544) (owner: 10Lucas Werkmeister (WMDE))
[12:31:17] <Lucas_WMDE>	 deploying my own config change now
[12:32:14] <wikibugs>	 (03Merged) 10jenkins-bot: Update Skolt Sami language name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/510875 (https://phabricator.wikimedia.org/T223544) (owner: 10Lucas Werkmeister (WMDE))
[12:32:54] <Lucas_WMDE>	 testing on mwdebug1001
[12:33:08] <Lucas_WMDE>	 looks great, syncing
[12:34:36] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:510875|Update Skolt Sami language name (T223544)]] (duration: 01m 06s)
[12:34:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:39] <stashbot>	 T223544: WMHack19: The native language name for [sms] Skolt Sami should be changed from "sää´mǩiõll" and "sääʹmǩiõll" to "nuõrttsääʹmǩiõll" - https://phabricator.wikimedia.org/T223544
[12:35:11] <Lucas_WMDE>	 anything else to SWAT?
[12:36:15] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: lvs: Remove unused eventgate-main-http service [puppet] - 10https://gerrit.wikimedia.org/r/562810 (https://phabricator.wikimedia.org/T241073)
[12:36:20] <Lucas_WMDE>	 !log EU SWAT done
[12:36:20] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] lvs: Append -http to eventgate-main [puppet] - 10https://gerrit.wikimedia.org/r/562805 (https://phabricator.wikimedia.org/T241073) (owner: 10Alexandros Kosiaris)
[12:36:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:23] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Same thing as for eventgate-analytics-http. Merging, it's a NOOP effectively" [puppet] - 10https://gerrit.wikimedia.org/r/562805 (https://phabricator.wikimedia.org/T241073) (owner: 10Alexandros Kosiaris)
[12:37:06] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] admin: add kerberos flag for gbirke [puppet] - 10https://gerrit.wikimedia.org/r/562809 (https://phabricator.wikimedia.org/T242215) (owner: 10Elukey)
[12:42:24] <wikibugs_>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562666 (owner: 10EBernhardson)
[12:46:24] <_joe_>	 !log deleting releng/composer-php55:0.1.0 from the docker registry
[12:46:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:32] <wikibugs>	 (03PS1) 10Ema: ATS: add X-Analytics-TLS [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993)
[12:57:08] <wikibugs>	 (03PS2) 10Ema: ATS: add X-Analytics-TLS [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993)
[12:57:09] <wikibugs>	 (03PS4) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[13:15:58] <wikibugs>	 (03PS1) 10Gehel: wdqs: enable async_imports by default [puppet] - 10https://gerrit.wikimedia.org/r/562817
[13:16:00] <wikibugs>	 10Operations, 10Traffic: Docker registry needs cache to vary on Accept header value - https://phabricator.wikimedia.org/T242200 (10BBlack) So long as the registry's responses do all the standards-based things correctly (they contain `Vary: Accept`, and the matching `Accept` values also match the `Content-Type`...
[13:20:56] <wikibugs_>	 (03CR) 10Gehel: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1002/20274/" [puppet] - 10https://gerrit.wikimedia.org/r/562817 (owner: 10Gehel)
[13:36:54] <wikibugs>	 (03PS1) 10Elukey: admin: add kerberos flag to user cohi [puppet] - 10https://gerrit.wikimedia.org/r/562824 (https://phabricator.wikimedia.org/T242217)
[13:36:57] <wikibugs>	 (03PS5) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[13:41:23] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "It looks good, but ideally X-Analytics-TLS shouldn't reach varnish-fe" [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993) (owner: 10Ema)
[13:44:17] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] wdqs: enable async_imports by default [puppet] - 10https://gerrit.wikimedia.org/r/562817 (owner: 10Gehel)
[13:45:35] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] wdqs: enable async_imports by default [puppet] - 10https://gerrit.wikimedia.org/r/562817 (owner: 10Gehel)
[13:47:33] <wikibugs>	 (03PS1) 10ArielGlenn: Generate only missing 7z files when doing recompression job [dumps] - 10https://gerrit.wikimedia.org/r/562828 (https://phabricator.wikimedia.org/T242221)
[13:50:41] <wikibugs>	 (03PS3) 10Ema: ATS: add X-Analytics-TLS [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993)
[13:51:20] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Switch eventgate-main LVS to use TLS port 4292 [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[13:52:18] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Same as with eventgate-analytics, merging and shepherding to production" [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[13:53:54] <wikibugs>	 (03PS7) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[13:55:53] <ottomata>	 akosiaris:  phew for a second there i thought you just merged the main one...i hadn't submitted a patch for that
[13:55:55] <ottomata>	 thanks for doing that!
[13:56:25] <akosiaris>	 ottomata: yeah I worked on that. It should be good to use in a few
[13:56:49] <akosiaris>	 up to now everything has gone perfect
[13:56:57] <ottomata>	 awesome
[13:57:05] <ottomata>	 thanks for renaming those too
[13:58:10] <akosiaris>	 it turned out to be less complicated than I feared
[13:59:22] <wikibugs_>	 (03CR) 10ArielGlenn: [C: 03+2] Generate only missing 7z files when doing recompression job [dumps] - 10https://gerrit.wikimedia.org/r/562828 (https://phabricator.wikimedia.org/T242221) (owner: 10ArielGlenn)
[13:59:24] <wikibugs_>	 (03PS4) 10Ema: ATS: add X-Analytics-TLS [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993)
[13:59:26] <wikibugs_>	 (03PS8) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[13:59:51] <ottomata>	 akosiaris:  am confused tho
[13:59:53] <ottomata>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/559167/8/hieradata/common/discovery.yaml
[14:00:04] <jouncebot>	 longma and liw: Your horoscope predicts another unfortunate Mediawiki train - American+European Version (secondary timeslot) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T1400).
[14:00:12] <logmsgbot>	 !log ariel@deploy1001 Started deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job
[14:00:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:17] <logmsgbot>	 !log ariel@deploy1001 Finished deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job (duration: 00m 05s)
[14:00:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:44] <ottomata>	 oh the discovery doesn't matter, since it is just dns?
[14:00:48] <akosiaris>	 yes
[14:01:00] <akosiaris>	 I had the epiphany as well today
[14:01:23] <ottomata>	 ok, so now, all we have to do is change the ports on the client services
[14:01:24] <ottomata>	 ?
[14:01:36] <akosiaris>	 which is just mediawiki, right?
[14:01:43] <ottomata>	 no there are a few
[14:01:48] <akosiaris>	 ah, please do tell
[14:01:53] <ottomata>	 change prop, job queue, analytics stuff, wdqs
[14:01:53] <akosiaris>	 or RTFM me 
[14:02:05] <ottomata>	 i don't think we have a good collection of all users
[14:02:06] <ottomata>	 we should though eh
[14:02:16] <ottomata>	 doc with collection of all users*
[14:02:31] <ottomata>	 this is a good time to make one!
[14:02:35] <akosiaris>	 :)
[14:03:36] <akosiaris>	 pybal restarted everywhere, monitoring has been updated, everything looks peachy. Ball's in your court now
[14:05:02] <ottomata>	 awesome thank you so much!
[14:06:53] <akosiaris>	 yw, thanks as well!
[14:07:06] <XioNoX>	 !log add routinator 0.6.4 to reprepro stretch-wikimedia - T242197
[14:07:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:09] <stashbot>	 T242197: Upgrade routinator to 0.6.4 - https://phabricator.wikimedia.org/T242197
[14:07:48] <wikibugs>	 (03PS9) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993)
[14:08:15] <anomie>	 Lucas_WMDE: https://phabricator.wikimedia.org/T234450#5698503 seems most relevant to that. It's being applied as a security patch, yes, even though the task and patch are public.
[14:09:51] <ottomata>	 ema: ! am very interested to understand how ^^ works :)
[14:10:30] <Lucas_WMDE>	 ok, thanks
[14:12:24] <ema>	 ottomata: hey! It doesn't :)
[14:13:02] <ottomata>	 hah wow sounds easy
[14:13:10] <ema>	 ottomata: jk, that's the first step: we're configuring a named pipe to which ATS logs all requests with the format above
[14:13:35] <icinga-wm>	 RECOVERY - Disk space on notebook1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1004&var-datasource=eqiad+prometheus/ops
[14:14:13] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:14:20] <ema>	 ottomata: then we're gonna write atskafka, which reads from there and sends to kafka
[14:14:54] <ema>	 (plus metrics and all the nice things that elukey wrote here: T237993)
[14:14:54] <stashbot>	 T237993: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993
[14:15:01] <ottomata>	 ah k
[14:15:16] <ottomata>	 nice keeping them separate.  so that is just regular ats logging stuff
[14:15:28] <ottomata>	 no json formatting or anything, that will be done by atskafka?
[14:15:37] <ema>	 that's the plan, yes
[14:15:39] <ottomata>	 aye cool
[14:16:42] <ema>	 I like the idea of keeping logging and kafkaing separate too :)
[14:20:21] <wikibugs>	 (03CR) 10Ema: [C: 03+2] ATS: add X-Analytics-TLS [puppet] - 10https://gerrit.wikimedia.org/r/562811 (https://phabricator.wikimedia.org/T237993) (owner: 10Ema)
[14:20:22] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993) (owner: 10Ema)
[14:22:20] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:23:28] <ema>	 !log depool cp4028 to test X-Analytics-TLS patch T237993
[14:23:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:32] <stashbot>	 T237993: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993
[14:30:00] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: prometheus: fix regex for cadvisor in the new k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/562837 (https://phabricator.wikimedia.org/T237643)
[14:30:32] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:30:56] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: prometheus: scrape metrics from each individual ingress pod [puppet] - 10https://gerrit.wikimedia.org/r/562838 (https://phabricator.wikimedia.org/T237643)
[14:30:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, thanks! One comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562789 (owner: 10Jbond)
[14:33:34] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:35:34] <ema>	 !log repool cp4028 after successful X-Analytics-TLS patch test T237993
[14:35:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:37] <stashbot>	 T237993: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993
[14:36:24] <wikibugs>	 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (10Ottomata) Manual syncing is still needed for Hue (users are in MySQL, not SQLite, syncing is still needed).
[14:37:21] <wikibugs>	 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (10Ottomata) Done.  Use your shell username and ldap password to login.
[14:40:34] <wikibugs>	 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (10Ottomata) Also hi and welcome! :D
[14:42:29] <wikibugs>	 (03Abandoned) 10Muehlenhoff: Inline a variant of apt::pin to package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 (owner: 10Muehlenhoff)
[14:42:34] <wikibugs>	 (03PS2) 10Muehlenhoff: Deprecate raid1.cfg [puppet] - 10https://gerrit.wikimedia.org/r/562483 (https://phabricator.wikimedia.org/T156955)
[14:42:36] <wikibugs>	 (03PS2) 10Gehel: airflow: Provide wrapper script to invoke airflow [puppet] - 10https://gerrit.wikimedia.org/r/562666 (owner: 10EBernhardson)
[14:45:03] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] airflow: Provide wrapper script to invoke airflow [puppet] - 10https://gerrit.wikimedia.org/r/562666 (owner: 10EBernhardson)
[14:50:10] <wikibugs>	 (03PS1) 10Ottomata: Use new TLS port for eventgate-analytics [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562840 (https://phabricator.wikimedia.org/T242224)
[14:53:02] <wikibugs_>	 (03PS1) 10Ema: ATS: escape hyphens in X-Analytics-TLS patterns [puppet] - 10https://gerrit.wikimedia.org/r/562841 (https://phabricator.wikimedia.org/T237993)
[14:53:41] <wikibugs_>	 (03CR) 10Elukey: [V: 03+2] "Is it ok to submit right?" [homer/public] - 10https://gerrit.wikimedia.org/r/562543 (owner: 10Elukey)
[14:53:54] <moritzm>	 XioNoX: shall I puppet-merge your routinator patch along?
[14:54:05] <XioNoX>	 moritzm: was about to ping you, yep
[14:54:15] <moritzm>	 ok :-)
[14:54:44] <moritzm>	 done
[14:56:33] <wikibugs>	 (CR) Vgutierrez: [C: +1] ATS: escape hyphens in X-Analytics-TLS patterns [puppet] - https://gerrit.wikimedia.org/r/562841 (https://phabricator.wikimedia.org/T237993) (owner: Ema)
[14:58:45] <wikibugs>	 (PS3) Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787
[14:59:20] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:59:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787 (owner: Muehlenhoff)
[14:59:50] <wikibugs>	 (PS3) Vgutierrez: 1.3.1-3 Rebuild for buster [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/562801 (https://phabricator.wikimedia.org/T242093)
[15:00:35] <wikibugs>	 (CR) Ottomata: [C: +2] Use new TLS port for eventgate-analytics [mediawiki-config] - https://gerrit.wikimedia.org/r/562840 (https://phabricator.wikimedia.org/T242224) (owner: Ottomata)
[15:00:39] <ottomata>	 !log deploying change to use new TLS port for eventgate-analytics - T242224
[15:00:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:42] <stashbot>	 T242224: Switch all eventgate clients to use new TLS port - https://phabricator.wikimedia.org/T242224
[15:01:10] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:02:51] <XioNoX>	 !log Routinator 0.6.4 looking good on rpki2001, upgrading rpki1001 - T242197
[15:02:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:54] <stashbot>	 T242197: Upgrade routinator to 0.6.4 - https://phabricator.wikimedia.org/T242197
[15:03:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] 1.3.1-3 Rebuild for buster [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/562801 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[15:03:56] <wikibugs>	 (CR) Ema: [C: +2] ATS: escape hyphens in X-Analytics-TLS patterns [puppet] - https://gerrit.wikimedia.org/r/562841 (https://phabricator.wikimedia.org/T237993) (owner: Ema)
[15:04:01] <wikibugs>	 (PS4) Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787
[15:04:54] <wikibugs>	 (CR) jerkins-bot: [V: -1] Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787 (owner: Muehlenhoff)
[15:05:46] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Refactor, preparatory to testing multiple hosts in parallel. [software/httpbb] - https://gerrit.wikimedia.org/r/555515 (owner: RLazarus)
[15:08:00] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:02] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:02] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:02] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:02] <icinga-wm>	 PROBLEM - Apache HTTP on mw1314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:02] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:02] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:03] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:04] <icinga-wm>	 PROBLEM - Apache HTTP on mw1343 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:12] <icinga-wm>	 PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:12] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:14] <icinga-wm>	 PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:14] <icinga-wm>	 PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:14] <icinga-wm>	 PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:16] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:18] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Test multiple hosts in parallel. [software/httpbb] - https://gerrit.wikimedia.org/r/559952 (owner: RLazarus)
[15:08:18] <icinga-wm>	 PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:18] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:18] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1343 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:20] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:08:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:22] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:22] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:22] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:26] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:28] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:30] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:34] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:34] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:36] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:36] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:36] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:38] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:38] <icinga-wm>	 PROBLEM - Apache HTTP on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:38] <icinga-wm>	 PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:38] <icinga-wm>	 PROBLEM - Apache HTTP on mw1341 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:40] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:40] <icinga-wm>	 PROBLEM - Apache HTTP on mw1316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:42] <icinga-wm>	 PROBLEM - Apache HTTP on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:42] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:42] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1340 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:42] <icinga-wm>	 PROBLEM - Apache HTTP on mw1312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1342 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:44] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:44] <icinga-wm>	 PROBLEM - Apache HTTP on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:45] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:45] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1342 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:46] <icinga-wm>	 PROBLEM - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Graphoid
[15:08:47] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out be
[15:08:47] <icinga-wm>	 as received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:08:48] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:08:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1340 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:49] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:50] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:08:50] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out be
[15:08:51] <icinga-wm>	 as received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:08:51] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received: /{domain}/v1/page/metadata/{title} (retrieve extended metadata for Video article on English Wikipedia) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve test page via mobile-sections) timed out before a response was received: /{
[15:08:52] <icinga-wm>	 obile-html/{title} (Get page content HTML for test page) timed out before a response was received: /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) timed out before a response was received: /{domain}/v1/page/random/title (retrieve a random article title) timed out before a response was 
[15:08:52] <icinga-wm>	 n}/v1/page/media-list/{title} (Get media list from test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[15:10:08] <rlazarus>	 uh oh
[15:10:09] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/ProductionServices.php: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 10s)
[15:10:10] <vgutierrez>	 wow
[15:10:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:13] <stashbot>	 T242224: Switch all eventgate clients to use new TLS port - https://phabricator.wikimedia.org/T242224
[15:10:32] <wikibugs>	 (PS5) Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787
[15:10:59] <_joe_>	 what
[15:11:02] <herron>	 hmm
[15:11:13] <_joe_>	 ottomata: revert please
[15:11:19] <_joe_>	 like now
[15:11:21] <ottomata>	 am
[15:11:26] <cdanis>	 o/
[15:11:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] Don't install the Postgres contrib package on Buster [puppet] - https://gerrit.wikimedia.org/r/562787 (owner: Muehlenhoff)
[15:11:29] * apergos looks in
[15:11:30] <wikibugs>	 (PS1) Ottomata: Revert "Use new TLS port for eventgate-analytics" [mediawiki-config] - https://gerrit.wikimedia.org/r/562844
[15:11:30] <icinga-wm>	 PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 23.76 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[15:11:31] <_joe_>	 I could've told you it would not work if I noticed the change
[15:11:32] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Revert "Use new TLS port for eventgate-analytics" [mediawiki-config] - https://gerrit.wikimedia.org/r/562844 (owner: Ottomata)
[15:11:43] <logmsgbot>	 !log otto@deploy1001 sync-file aborted: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 00s)
[15:11:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:50] <_joe_>	 php-fpm and TLS don't like each other
[15:11:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs3007 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:12:03] <ottomata>	 uh huhh...
[15:12:12] <_joe_>	 lemme see the impact
[15:12:14] <ottomata>	 _joe_:  it worked on mwdebug1001
[15:12:20] <icinga-wm>	 PROBLEM - ATS TLS has reduced HTTP availability #page on icinga1001 is CRITICAL: cluster=cache_text layer=tls https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[15:12:23] <_joe_>	 ottomata: without traffic, sure
[15:12:24] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1284.eqiad.wmnet, mw1346.eqiad.wmnet, mw1348.eqiad.wmnet, mw1232.eqiad.wmnet, mw1344.eqiad.wmnet, mw1227.eqiad.wmnet, mw1229.eqiad.wmnet, mw1314.eqiad.wmnet, mw1279.eqiad.wmnet, mw1226.eqiad.wmnet, mw1317.eqiad.wmnet, mw1233.eqiad.wmnet, mw1222.eqiad.wmnet, mw1283.eqiad.wmnet, mw1340.eqiad.wmnet, mw1343.eqiad.wmnet, mw1225
[15:12:24] <icinga-wm>	 281.eqiad.wmnet, mw1347.eqiad.wmnet, mw1345.eqiad.wmnet, mw1223.eqiad.wmnet, mw1286.eqiad.wmnet, mw1282.eqiad.wmnet, mw1276.eqiad.wmnet, mw1221.eqiad.wmnet, mw1230.eqiad.wmnet, mw1235.eqiad.wmnet, mw1234.eqiad.wmnet, mw1278.eqiad.wmnet, mw1224.eqiad.wmnet, mw1316.eqiad.wmnet, mw1231.eqiad.wmnet, mw1312.eqiad.wmnet, mw1228.eqiad.wmnet, mw1297.eqiad.wmnet, mw1342.eqiad.wmnet, mw1289.eqiad.wmnet, mw1315.eqiad.wmnet, mw1341.eqiad.wmn
[15:12:24] <icinga-wm>	 wmnet, mw1277.eqiad.wmnet, mw1313.eqiad.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[15:12:26] <ottomata>	 ah
[15:12:31] <XioNoX>	 I'm around if needed
[15:12:37] <logmsgbot>	 !log otto@deploy1001 Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
[15:12:38] <jbond42>	 im also here
[15:12:44] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 57.15 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:12:50] <ottomata>	 having trouble syncing
[15:12:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:53] <ottomata>	 timeouts
[15:12:54] <_joe_>	 ottomata: --force?
[15:13:01] * volans ofc around but I see enough people already
[15:13:14] <ottomata>	 scap sync-file --force
[15:13:15] <ottomata>	 ?
[15:13:22] <_joe_>	 ottomata: IIRC, yes
[15:13:31] <_joe_>	 I can't get data from grafana btw
[15:13:36] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1284.eqiad.wmnet, mw1346.eqiad.wmnet, mw1280.eqiad.wmnet, mw1348.eqiad.wmnet, mw1232.eqiad.wmnet, mw1344.eqiad.wmnet, mw1287.eqiad.wmnet, mw1227.eqiad.wmnet, mw1288.eqiad.wmnet, mw1229.eqiad.wmnet, mw1314.eqiad.wmnet, mw1279.eqiad.wmnet, mw1226.eqiad.wmnet, mw1317.eqiad.wmnet, mw1233.eqiad.wmnet, mw1222.eqiad.wmnet, mw1283
[15:13:36] <icinga-wm>	 340.eqiad.wmnet, mw1343.eqiad.wmnet, mw1225.eqiad.wmnet, mw1281.eqiad.wmnet, mw1228.eqiad.wmnet, mw1345.eqiad.wmnet, mw1339.eqiad.wmnet, mw1286.eqiad.wmnet, mw1282.eqiad.wmnet, mw1276.eqiad.wmnet, mw1221.eqiad.wmnet, mw1230.eqiad.wmnet, mw1347.eqiad.wmnet, mw1235.eqiad.wmnet, mw1234.eqiad.wmnet, mw1278.eqiad.wmnet, mw1224.eqiad.wmnet, mw1290.eqiad.wmnet, mw1316.eqiad.wmnet, mw1231.eqiad.wmnet, mw1312.eqiad.wmnet, mw1223.eqiad.wmn
[15:13:36] <icinga-wm>	 wmnet, mw1342.eqiad.wmnet, mw1289.eqiad.wmnet, mw1315.eqiad.wmnet, mw1341.eqiad.wmnet, mw1 https://wikitech.wikimedia.org/wiki/PyBal
[15:13:49] <cdanis>	 _joe_: likely if you use another text-lb it will work
[15:13:50] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/71/
[15:13:58] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:14:12] <godog>	 wfm on esams btw
[15:14:23] <wikibugs>	 (CR) Vgutierrez: "builds as expected on boron" [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/562801 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[15:14:28] <_joe_>	 now it does godog
[15:14:32] <ottomata>	 revert sync in progress
[15:14:41] <_joe_>	 ottomata: aye
[15:14:59] <_joe_>	 apis are completely down
[15:15:26] <ottomata>	 oof
[15:16:29] <_joe_>	 ottomata: lmk when the sync is done
[15:16:41] <wikibugs>	 (PS1) Clarakosi: Update parsoid_uri to use Parsoid-PHP [puppet] - https://gerrit.wikimedia.org/r/562845 (https://phabricator.wikimedia.org/T241756)
[15:16:49] <ottomata>	 15:13:58 Check php-fpm cache...
[15:17:00] <ottomata>	 it does say
[15:17:02] <ottomata>	 sync-apaches: 100% (ok: 325; fail: 0; left: 0)
[15:17:02] <ottomata>	 15:13:58 Finished sync-apaches (duration: 00m 06s)
[15:17:05] <_joe_>	 which will never work
[15:17:15] <cdanis>	 do we need to rolling restart apiservers by hand?
[15:17:16] <ottomata>	 it is hanging on the check php-fpm cache
[15:17:18] <cdanis>	 are all the threads wedged?
[15:17:19] <_joe_>	 ok things are definitely NOT back to normal
[15:17:24] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:19:08] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15276 bytes in 6.335 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:19:19] <ottomata>	 yeah i still see some https configs on an app server
[15:19:34] <ottomata>	 i could try and fix manually via cumin?
[15:19:48] <cdanis>	 ottomata: sync-file again
[15:19:49] <_joe_>	 ottomata: can you do another sync?
[15:19:54] <ottomata>	 ko
[15:19:56] <ottomata>	 ok
[15:19:57] <cdanis>	 maybe this is the thing where old configurations get wedged in the cache
[15:19:59] <logmsgbot>	 !log otto@deploy1001 sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 33s)
[15:20:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:02] <stashbot>	 T242224: Switch all eventgate clients to use new TLS port - https://phabricator.wikimedia.org/T242224
[15:20:02] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 59.62 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:20:14] <volans>	 can I help in any way?
[15:20:15] <ottomata>	 syncing again
[15:20:28] <icinga-wm>	 RECOVERY - Apache HTTP on mw1317 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:28] <ottomata>	 volans:  i made a config change that made eventbus use https
[15:20:30] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1315 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 3.126 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:31] <ottomata>	 in ProductionServices.php
[15:20:32] <icinga-wm>	 RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[15:20:32] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1315 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.056 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:33] <ottomata>	 we need to revert it
[15:20:35] <icinga-wm>	 RECOVERY - Apache HTTP on mw1345 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.896 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:36] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1345 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.988 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:38] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1312 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.223 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:38] <ottomata>	 i'm scap syncing again
[15:20:40] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1339 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.694 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:40] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1347 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 8.131 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:42] <ottomata>	 but i'm not sure it is working
[15:20:42] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1348 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 1.701 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:43] <volans>	 ottomata: yes I'm aware
[15:20:44] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1341 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.158 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:50] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:50] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.159 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:52] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.772 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:52] <icinga-wm>	 RECOVERY - Apache HTTP on mw1346 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.095 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:54] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1297 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.538 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:20:54] <icinga-wm>	 RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.410 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:20:56] <ottomata>	 if we can manually revert the config line
[15:21:00] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 9.304 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:21:00] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 9.987 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:21:01] <ottomata>	 ...unless it is working?
[15:21:08] <icinga-wm>	 RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 6.549 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:21:12] <icinga-wm>	 RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:21:14] <icinga-wm>	 RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.465 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:21:26] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.062 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:21:31] <_joe_>	 ok this time it worked
[15:21:52] <ottomata>	 still hanging on checking php-fpm cache
[15:21:58] <_joe_>	 I still see the https url on some servers though
[15:22:01] <ottomata>	 and i still see bad config ...yeah
[15:22:07] <ottomata>	 on the one i'm looking at
[15:22:10] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.146 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:22:30] <_joe_>	 ottomata: ok the problem is we don't timeout on the php cache check
[15:22:42] <_joe_>	 it should be possible to disable it though, lemme see
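The hang described above (a cache check that never times out) can be illustrated with a minimal sketch. This is not scap's actual code: the function name `run_check_with_timeout` and its parameters are hypothetical, and the check is assumed to be an external command. It only shows the general idea of bounding a check so a hung host counts as a failure instead of blocking the sync.

```python
import subprocess
import sys


def run_check_with_timeout(cmd, timeout_s):
    """Run a check command; treat a hang as a failure instead of blocking.

    `cmd` is an argument list for a hypothetical check command and
    `timeout_s` is the maximum number of seconds to wait for it.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # A stand-in "check" that sleeps far longer than the allowed time.
    slow_check = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_check_with_timeout(slow_check, timeout_s=0.5))
```

With a bound like this, a deploy tool could report the hosts whose check timed out and carry on, rather than waiting indefinitely on them.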
[15:22:57] <cdanis>	 would it be faster to cumin a full scap pull on each apiserver?
[15:23:17] <_joe_>	 cdanis: that would kill the network, but maybe
[15:23:27] <thcipriani>	 for scap? --no-php-restart?
[15:23:39] <_joe_>	 yes, that
[15:23:41] <_joe_>	 thanks thcipriani
[15:23:42] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 84.45 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:23:44] <thcipriani>	 probably ought to make --force do that, too
[15:23:46] <_joe_>	 I was looking at the code
[15:23:48] <ottomata>	 ok i should run sync-file with that?
[15:23:52] <_joe_>	 thcipriani: yep
[15:23:53] <ottomata>	 doing
[15:23:55] <_joe_>	 ottomata: and --force
[15:23:57] <ottomata>	 yes
[15:23:58] <logmsgbot>	 !log otto@deploy1001 sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 03m 56s)
[15:24:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:22] <ottomata>	 scap: error: extra arguments found: --no-php-restart
[15:24:34] <ottomata>	 scap sync-file --no-php-restart --force wmf-config/ProductionServices.php 'REVERT Make EventBus use TLS for eventgate-analytics - T242224'
[15:24:54] <ottomata>	 thcipriani:  ^
[15:25:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1317 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:22] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:24] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:26] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1339 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:28] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:30] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1341 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:32] <_joe_>	 ottomata: try now please
[15:25:34] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:35] <icinga-wm>	 PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:38] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:38] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:38] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:38] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:40] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:25:44] <icinga-wm>	 PROBLEM - Ensure traffic_manager binds on 443 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:25:47] <_joe_>	 without the --no-php-restart
[15:25:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:54] <icinga-wm>	 PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:25:58] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:26:07] <_joe_>	 ottomata: try a --force please
[15:26:10] <icinga-wm>	 PROBLEM - Varnish has reduced HTTP availability #page on icinga1001 is CRITICAL: job=varnish-text https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[15:26:10] <icinga-wm>	 PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 504 (expectin
[15:26:10] <icinga-wm>	 ikitech.wikimedia.org/wiki/Wikifeeds
[15:26:25] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/ProductionServices.php: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 34s)
[15:26:28] <ottomata>	 worked
[15:26:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:30] <stashbot>	 T242224: Switch all eventgate clients to use new TLS port - https://phabricator.wikimedia.org/T242224
[15:26:30] <icinga-wm>	 RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.716 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:34] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.954 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:38] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 1.557 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:38] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1314 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 5.650 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:38] <icinga-wm>	 RECOVERY - Apache HTTP on mw1313 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.791 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:38] <icinga-wm>	 RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.958 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:38] <icinga-wm>	 RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.261 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:40] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 3.419 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:42] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 8.950 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:44] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 5.904 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:44] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1297 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:44] <icinga-wm>	 RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 3.967 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:44] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1343 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 4.161 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:44] <icinga-wm>	 RECOVERY - Apache HTTP on mw1347 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.376 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:45] <icinga-wm>	 RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.162 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:45] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.422 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:46] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.802 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:46] <icinga-wm>	 RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:26:47] <icinga-wm>	 RECOVERY - Apache HTTP on mw1221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:47] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.126 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:48] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 2.929 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:48] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[15:26:50] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:26:50] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:50] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:26:50] <wikibugs_>	 (03CR) 10Jdlrobson: [C: 03+1] Enable lead paragraph in user namespace on nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030) (owner: 10Ammarpad)
[15:26:52] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1344 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.145 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:52] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[15:26:54] <icinga-wm>	 RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:54] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:54] <icinga-wm>	 RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:54] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1315 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.833 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:58] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:58] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:58] <icinga-wm>	 RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:58] <icinga-wm>	 RECOVERY - Apache HTTP on mw1315 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:26:58] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1228 is OK: HTTP OK: HTTP/1.1 200 OK - 76981 bytes in 0.112 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:59] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.131 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:26:59] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.120 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:00] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.165 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:00] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 #page on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 23764 bytes in 0.318 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:27:02] <icinga-wm>	 RECOVERY - Apache HTTP on mw1341 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:27:04] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.121 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:04] <icinga-wm>	 RECOVERY - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[15:27:04] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:27:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:05] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1316 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:05] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1347 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.040 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:06] <icinga-wm>	 RECOVERY - Apache HTTP on mw1348 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:06] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1340 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.052 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:07] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:07] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1313 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:08] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:08] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:09] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1342 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.148 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:09] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:27:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1317 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.528 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:10] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[15:27:11] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1315 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.162 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:11] <icinga-wm>	 RECOVERY - Apache HTTP on mw1344 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:12] <icinga-wm>	 RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:12] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1342 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:12] <icinga-wm>	 RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:13] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.071 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:14] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.126 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:14] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1313 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.152 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:15] <icinga-wm>	 RECOVERY - Apache HTTP on mw1340 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:15] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1228 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:16] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:16] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:16] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1346 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.131 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.127 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:18] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:18] <icinga-wm>	 RECOVERY - Apache HTTP on mw1312 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.479 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:18] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1339 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:27:19] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:27:19] <icinga-wm>	 RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[15:27:20] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1348 is OK: HTTP OK: HTTP/1.1 200 OK - 76982 bytes in 0.146 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:27:28] <wikibugs_>	 (03PS1) 10Ema: Revert "ATS: assign 8G instead of 2G to RAM caches on ats-be" [puppet] - 10https://gerrit.wikimedia.org/r/562849 (https://phabricator.wikimedia.org/T241593)
[15:27:31] <wikibugs_>	 (03PS2) 10Ema: Revert "ATS: assign 8G instead of 2G to RAM caches on ats-be" [puppet] - 10https://gerrit.wikimedia.org/r/562849 (https://phabricator.wikimedia.org/T241593)
[15:27:35] <ottomata>	 um.  and. um.  it didn't work before because I didn't merge in the revert... :( sorry.  I often forget that: I do the fetch && diff steps, but then forget to merge after it looks good.
[15:27:43] <ottomata>	 yar i'm sorry all.
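The slip described above (fetching and diffing, then syncing without merging) is the kind of thing a small pre-sync guard could catch. The sketch below is only an illustration under stated assumptions: the function name `unmerged_commit_count`, the `run` parameter, and the upstream ref `origin/master` are hypothetical and not part of scap or the deployment docs. It relies on `git rev-list --count HEAD..<upstream>`, which counts commits that were fetched but are not yet in the local checkout.

```python
import subprocess


def unmerged_commit_count(repo_dir, upstream="origin/master", run=subprocess.run):
    """Return how many fetched commits on `upstream` are missing from HEAD.

    `repo_dir` is the path to the local checkout. `run` is injectable so the
    function can be exercised without a real repository.
    """
    result = run(
        ["git", "-C", repo_dir, "rev-list", "--count", f"HEAD..{upstream}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def safe_to_sync(repo_dir, upstream="origin/master", run=subprocess.run):
    """True only when everything that was fetched has also been merged."""
    return unmerged_commit_count(repo_dir, upstream, run) == 0
```

A wrapper could call `safe_to_sync(...)` before a sync and refuse to continue, or at least warn, when a fetched revert is still waiting to be merged.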
[15:28:16] <Urbanecm>	 ottomata: it's also possible to revert locally at deploy1001 and care 'bout Gerrit later
[15:28:40] <ottomata>	 gerrit revert was fine
[15:28:52] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[15:29:05] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:10] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:29:12] <icinga-wm>	 RECOVERY - Restbase edge codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:29:14] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:29:18] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 47.52 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:29:20] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:29:20] <icinga-wm>	 RECOVERY - Restrouter LVS codfw on restrouter.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:29:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:28] <thcipriani>	 ottomata: sorry you have to know so much about deploying. In an emergency it should be easier than that.
[15:29:30] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:29:45] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:29:52] <icinga-wm>	 RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[15:29:52] <ottomata>	 i probably should have let someone else do the revert emergency sync
[15:29:54] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[15:29:56] <icinga-wm>	 RECOVERY - Varnish has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[15:30:00] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POST
[15:30:04] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: testlb6_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3064.e
[15:30:05] <icinga-wm>	 6.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:30:16] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 76983 bytes in 7.418 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:30:18] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[15:30:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:30:23] <_joe_>	 something's not right in esams right now
[15:30:24] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:30:28] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:30:35] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:30:48] <icinga-wm>	 RECOVERY - ATS TLS has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[15:31:08] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[15:31:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:31:16] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1978 bytes in 5.775 second response time https://phabricator.wikimedia.org/project/view/71/
[15:31:30] <icinga-wm>	 RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:31:32] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:31:34] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:31:36] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1345 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 5.002 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:31:46] <tassu>	 Hey folks, I am in Europe and nothing opens at all but VPN via US works just fine
[15:31:48] <icinga-wm>	 RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.8917 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[15:32:18] <apergos>	 thanks for the report. folks are looking into it.
[15:32:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:32:30] <icinga-wm>	 PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[15:32:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:32:48] <icinga-wm>	 RECOVERY - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Graphoid
[15:32:48] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:33:02] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:33:10] <icinga-wm>	 RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:33:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:33:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:33:26] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:33:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:34:01] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "ATS: assign 8G instead of 2G to RAM caches on ats-be" [puppet] - 10https://gerrit.wikimedia.org/r/562849 (https://phabricator.wikimedia.org/T241593) (owner: 10Ema)
[15:34:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:34:22] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) is CRITICAL: Test Machine translate an HTML fragment using TestClient, adapt the links to target language wiki. returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/CX
[15:34:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:34:42] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:34:46] <icinga-wm>	 RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:34:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:35:02] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:35:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:35:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:35:44] <icinga-wm>	 PROBLEM - Apache HTTP on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:35:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[15:36:14] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[15:36:34] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ReadTimeoutError(HTTPSConnectionPool(host=text-lb.esams.wikimedia.org, port=443): Read timed out. (read timeout=15),): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase
[15:37:09] <wikibugs>	 (03PS1) 10BBlack: Depool esams temporarily [dns] - 10https://gerrit.wikimedia.org/r/562850
[15:37:09] <wikibugs>	 (03PS1) 10CDanis: depool esams text [dns] - 10https://gerrit.wikimedia.org/r/562851
[15:37:09] <dcausse>	 VE on mw.org seems to have trouble?
[15:37:12] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] depool esams text [dns] - 10https://gerrit.wikimedia.org/r/562851 (owner: 10CDanis)
[15:37:22] <icinga-wm>	 RECOVERY - Apache HTTP on mw1345 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.065 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:37:30] <apergos>	 dcausse: let's assume it's related to the ongoing outage
[15:37:40] <apergos>	 if it's not cleared up when this does, we can look then
[15:37:40] <icinga-wm>	 PROBLEM - HTTPS Unified ECDSA on cp3050 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:37:40] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp3050 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:37:40] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org ECDSA on cp3050 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:37:40] <icinga-wm>	 PROBLEM - HTTPS Unified RSA on cp3050 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:37:41] <bblack>	 !log authdns-update to depool esams
[15:37:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:51] <ema>	 !log cumin -s10 -b1 'A:cp-text_esams' 'run-puppet-agent -q ; ats-backend-restart'
[15:37:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:56] <icinga-wm>	 RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:38:08] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 20453 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:38:34] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:38:42] <icinga-wm>	 RECOVERY - Ensure traffic_manager binds on 443 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.1 200 Ok - 31770 bytes in 0.663 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:38:44] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp3050 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 547769 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:38:44] <icinga-wm>	 RECOVERY - HTTPS Unified RSA on cp3050 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 547768 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:38:54] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org ECDSA on cp3050 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 559279 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:39:02] <icinga-wm>	 RECOVERY - HTTPS Unified ECDSA on cp3050 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 559270 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:40:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:40:06] <vgutierrez>	 !log restarting ats-tls on esams text nodes
[15:40:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:18] <icinga-wm>	 PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:42:00] <icinga-wm>	 RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 2.198 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:42:26] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ReadTimeoutError(HTTPSConnectionPool(host=text-lb.esams.wikimedia.org, port=443): Read timed out. (read timeout=15),): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase
[15:42:34] <wikibugs>	 (03Abandoned) 10BBlack: Depool esams temporarily [dns] - 10https://gerrit.wikimedia.org/r/562850 (owner: 10BBlack)
[15:43:04] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15275 bytes in 7.527 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:43:38] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:43:56] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 0.544 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:44:38] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs3005 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:45:00] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs3007 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:45:34] <ema>	 !log cumin -s10 -b1 'A:cp-text_eqiad' 'run-puppet-agent -q ; ats-backend-restart'
[15:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:16] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[15:46:42] <icinga-wm>	 PROBLEM - HTTPS Unified ECDSA on cp3058 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:46:42] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp3058 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:47:28] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 81.4 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:47:34] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp3058 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 547238 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:47:34] <icinga-wm>	 RECOVERY - HTTPS Unified ECDSA on cp3058 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 558758 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:47:38] <icinga-wm>	 RECOVERY - Ensure traffic_manager binds on 443 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.1 200 Ok - 31683 bytes in 0.430 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:47:52] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 20462 bytes in 0.268 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:47:58] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp3062 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:47:58] <icinga-wm>	 PROBLEM - HTTPS Unified RSA on cp3062 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:47:58] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org ECDSA on cp3062 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[15:48:32] <icinga-wm>	 RECOVERY - Ensure traffic_manager binds on 443 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.1 200 Ok - 31603 bytes in 0.453 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:48:48] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org ECDSA on cp3062 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 558685 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:48:48] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp3062 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 547165 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:49:18] <icinga-wm>	 RECOVERY - HTTPS Unified RSA on cp3062 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 547135 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2020-10-06 12:00:00 +0000 (expires in 271 days) https://wikitech.wikimedia.org/wiki/HTTPS
[15:49:30] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 20457 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:49:39] <wikibugs_>	 (03PS1) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[15:50:58] <wikibugs>	 (03PS2) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[15:51:00] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 37.49 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:52:30] <wikibugs>	 (03CR) 10Ppchelko: [C: 04-1] "LGTM. -1 until I095ed9b4cf2afd2e933738246d49fa416d151d6e is fully deployed." [puppet] - 10https://gerrit.wikimedia.org/r/562845 (https://phabricator.wikimedia.org/T241756) (owner: 10Clarakosi)
[15:52:45] <wikibugs>	 (03PS3) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[15:52:46] <wikibugs>	 (03PS4) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[15:54:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852 (owner: 10Jbond)
[15:56:21] <wikibugs>	 (03PS5) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[15:56:59] <wikibugs>	 (03PS1) 10Herron: apply profile::base::firewall to default nodes [puppet] - 10https://gerrit.wikimedia.org/r/562856
[15:57:42] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[15:58:10] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[15:58:24] <icinga-wm>	 PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received: /api/rest_v1/page/references/{title} (Get references from storage) timed out before a response was received: /api/rest_v1/transform/wikitext/to/html/{title} (Transform wikitext to html) timed out before a response was received https://wikitech.wikimedia
[16:00:29] <wikibugs>	 (03PS1) 10BBlack: Revert "depool esams text" [dns] - 10https://gerrit.wikimedia.org/r/562858
[16:00:30] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Revert "depool esams text" [dns] - 10https://gerrit.wikimedia.org/r/562858 (owner: 10BBlack)
[16:00:41] <bblack>	 !log re-pooling esams text traffic in DNS
[16:01:10] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[16:01:48] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:01:56] <_joe_>	 what's all this ^^
[16:01:59] <stashbot>	 bblack: Failed to log message to wiki. Somebody should check the error logs.
[16:02:02] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[16:02:30] <bblack>	 _joe_: we suspect text-lb overload with all the esams traffic
[16:02:30] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[16:02:32] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - textlb_443: Servers cp1077.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[16:03:00] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[16:03:16] <icinga-wm>	 PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[16:03:23] <wikibugs>	 (03PS6) 10Jbond: ldap - idp:  add ldap helper script for enabling u2f on cas [puppet] - 10https://gerrit.wikimedia.org/r/562852
[16:03:48] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[16:04:12] <icinga-wm>	 RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[16:04:20] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:04:42] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[16:04:50] <icinga-wm>	 PROBLEM - WDQS high update lag on wdqs1010 is CRITICAL: 3653 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[16:05:00] <icinga-wm>	 RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[16:05:20] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[16:05:26] <icinga-wm>	 RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[16:09:12] <wikibugs>	 10Operations, 10Traffic, 10Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (10Gestumblindi)
[16:12:02] <wikibugs>	 10Operations, 10ops-eqiad, 10vm-requests: rack/setup/install ganeti10([09]|1[0-8[).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (10herron) The row_A ganeti group is running low on memory capacity (please see T239151#5707691) .  Should we allocate a few of these new hosts to expand the existing row...
[16:16:20] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 48.13 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:19:08] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1010 is OK: (C)3600 ge (W)1200 ge 904.7 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[16:19:19] <_joe_>	 the eqiad alert is expected
[16:20:50] <ema>	 !log rolling ats-be restart on !text@eqiad, !text@esams to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/562849/
[16:20:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:14] <apergos>	 I guess it clears itself in another 30 mins?
[16:25:14] <_joe_>	 !log running puppet on deploy1001 to remove my hot-patch to scap.cfg
[16:25:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:57] <wikibugs>	 10Operations, 10Traffic, 10Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (10Joe) 05Open→03Resolved a:03Joe Hi, thanks for your report!  We were already aware of the issues, and were at work to solve them. Everything should be fine now though.
[16:29:53] <wikibugs>	 10Operations, 10Traffic, 10Performance Issue: Current performance issues - https://phabricator.wikimedia.org/T242228 (10Joe) An incident report will be published later on wikitech at https://wikitech.wikimedia.org/wiki/Incident_documentation
[16:43:12] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 70.48 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:44:08] <apergos>	 yep 30 mins like clockwork
[16:53:39] <wikibugs>	 (03PS6) 10Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - 10https://gerrit.wikimedia.org/r/562787
[16:54:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Don't install the Postgres contrib package on Buster [puppet] - 10https://gerrit.wikimedia.org/r/562787 (owner: 10Muehlenhoff)
[16:58:38] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] "\o/" [homer/public] - 10https://gerrit.wikimedia.org/r/562692 (owner: 10Ayounsi)
[17:11:14] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 52275664 and 5 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:13:02] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 42328 and 57 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:15:32] <Amir1>	 since when did we start having postgres in production?
[17:16:07] <Reedy>	 For maps and shizz
[17:16:09] <Reedy>	 (a while)
[17:17:01] <moritzm>	 also puppetdb
[17:17:45] <Amir1>	 I thought we didn't have postgres at all
[17:22:21] <bd808>	 OpenStreetMap doesn't work with any other database as far as I know
[17:25:56] <wikibugs>	 (03PS7) 10Muehlenhoff: Don't install the Postgres contrib package on Buster [puppet] - 10https://gerrit.wikimedia.org/r/562787
[17:27:54] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: hw troubleshooting: hardware RAID predictive failure for bellatrix.frack.codfw.wmnet - https://phabricator.wikimedia.org/T240876 (10Papaul) @Jgreen  this server is out of warranty since 2017 and we have a replacement server already on site that w...
[17:28:49] <wikibugs>	 (03PS1) 10Elukey: admin: add kerberos flag for user dsharpe [puppet] - 10https://gerrit.wikimedia.org/r/562882 (https://phabricator.wikimedia.org/T242244)
[17:29:20] <icinga-wm>	 RECOVERY - HP RAID on ms-be2035 is OK: OK: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[17:30:06] <wikibugs>	 10Operations, 10DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (10Papaul) @Marostegui  thanks will wait tomorrow the 9th so he can take the server down for the FW upgrade.
[17:32:05] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Rearrange of wmgEnableGeoData [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561658 (owner: 10Zoranzoki21)
[17:33:09] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] admin: add kerberos flag for user dsharpe [puppet] - 10https://gerrit.wikimedia.org/r/562882 (https://phabricator.wikimedia.org/T242244) (owner: 10Elukey)
[17:36:26] <wikibugs>	 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Access to analytics infrastructure for SNowick_WMF - https://phabricator.wikimedia.org/T242026 (10Dzahn) 05Open→03Resolved Cool, thanks, Ottomata. Closing ticket.
[17:51:41] <wikibugs>	 (03PS1) 10Joal: Bump aqs druid snapshot to 2019-12 [puppet] - 10https://gerrit.wikimedia.org/r/562887
[17:51:46] <hauskatze>	 sigh, I was updating [[m:System administrators]]
[17:52:08] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:52:36] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add tables to analytics regular sqoop list [puppet] - 10https://gerrit.wikimedia.org/r/562322 (https://phabricator.wikimedia.org/T242015) (owner: 10Joal)
[17:52:41] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Bump aqs druid snapshot to 2019-12 [puppet] - 10https://gerrit.wikimedia.org/r/562887 (owner: 10Joal)
[17:53:54] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:57:50] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Dzahn) @RStallman-legalteam good to go?
[17:59:38] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on ms-be2035 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2035&var-datasource=codfw+prometheus/ops
[18:03:37] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.aqs.roll-restart
[18:03:37] <logmsgbot>	 !log elukey@cumin1001 END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
[18:03:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:00] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.aqs.roll-restart
[18:04:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:10] <elukey>	 (was not using tmux)
[18:04:41] <wikibugs>	 (03CR) 10Volans: "Do you have a compiler result by any chance?" [puppet] - 10https://gerrit.wikimedia.org/r/562787 (owner: 10Muehlenhoff)
[18:07:25] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
[18:07:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:42] <volans>	 where is wikibugs gone? killed for excess flood apparently
[18:14:30] <chaomodus>	 naughty old bots
[18:17:31] <wikibugs>	 (03PS1) 10RobH: setting new eqsin PDUs dns entries [dns] - 10https://gerrit.wikimedia.org/r/562894 (https://phabricator.wikimedia.org/T242250)
[18:18:36] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756
[18:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:39] <stashbot>	 T241756: Clean-up Parsoid-PHP transition code from RESTBase - https://phabricator.wikimedia.org/T241756
[18:19:35] <wikibugs>	 (03CR) 10RobH: [C: 03+2] setting new eqsin PDUs dns entries [dns] - 10https://gerrit.wikimedia.org/r/562894 (https://phabricator.wikimedia.org/T242250) (owner: 10RobH)
[18:21:16] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756 (duration: 02m 41s)
[18:21:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:03] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756
[18:22:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:08] <hauskatze>	 moritzm: hi, can I have a quick word?
[18:30:35] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: nagios_common: contacgroups: arturo in email paging for prod servers [puppet] - 10https://gerrit.wikimedia.org/r/562902
[18:33:08] <volans>	 !log restarted wikibugs
[18:33:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:31] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+1] "Currently this variable is temporarily not used in RESTBase. Once this is merged and deployed, we will switch RESTBase to using the variable" [puppet] - 10https://gerrit.wikimedia.org/r/562845 (https://phabricator.wikimedia.org/T241756) (owner: 10Clarakosi)
[18:34:26] <wikibugs>	 (03PS3) 10Jforrester: logspam.pl: Shorten paths and include fatals [puppet] - 10https://gerrit.wikimedia.org/r/559246 (https://phabricator.wikimedia.org/T242252) (owner: 10Brennen Bearnes)
[18:34:50] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] logspam.pl: Shorten paths and include fatals [puppet] - 10https://gerrit.wikimedia.org/r/559246 (https://phabricator.wikimedia.org/T242252) (owner: 10Brennen Bearnes)
[18:36:30] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756 (duration: 14m 27s)
[18:36:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:35] <stashbot>	 T241756: Clean-up Parsoid-PHP transition code from RESTBase - https://phabricator.wikimedia.org/T241756
[18:39:05] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[18:39:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:31] <wikibugs>	 (03PS1) 10Ottomata: staging/eventgate-logging-external - fix name of client error schema to precache [deployment-charts] - 10https://gerrit.wikimedia.org/r/562906 (https://phabricator.wikimedia.org/T240985)
[18:43:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10089 and previous config saved to /var/cache/conftool/dbconfig/20200108-184350-marostegui.json
[18:43:52] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] staging/eventgate-logging-external - fix name of client error schema to precache [deployment-charts] - 10https://gerrit.wikimedia.org/r/562906 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[18:43:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P10090 and previous config saved to /var/cache/conftool/dbconfig/20200108-184510-marostegui.json
[18:45:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:08] <marostegui>	 !log Remove partitions from dewiki.revision on db1096:3315 T239453
[18:46:09] <wikibugs>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (10CDanis) 05Resolved→03Open boldly re-opening this, now that the POPs have Ganeti clusters available.  Today I learned that text-lb.esams receives something like 60k+ PP...
[18:46:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:10] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[18:46:32] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[18:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:43] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10RStallman-legalteam) Yes, the NDA is signed and filed. Thanks all!
[18:50:16] <wikibugs>	 (03PS27) 10Cwhite: lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[18:52:16] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[18:53:25] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Dzahn)
[18:56:00] <wikibugs>	 (03PS28) 10Cwhite: lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[18:58:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Morning SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T1900).
[19:00:04] <jouncebot>	 Ammarpad: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:36] <wikibugs>	 (03PS1) 10Ottomata: eventgate - use new primary schema repository by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/562908 (https://phabricator.wikimedia.org/T240985)
[19:01:33] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate - use new primary schema repository by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/562908 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[19:02:38] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frdb1003.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T239139 (10Jgreen)
[19:02:50] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frdb1003.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T239139 (10Jgreen)
[19:03:25] <wikibugs>	 (03PS1) 10Ottomata: Add missing eventgate-0.0.17.tgz [deployment-charts] - 10https://gerrit.wikimedia.org/r/562909 (https://phabricator.wikimedia.org/T240985)
[19:03:48] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frdb1003.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T239139 (10Jgreen)
[19:04:25] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train
[19:04:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:04:51] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Add missing eventgate-0.0.17.tgz [deployment-charts] - 10https://gerrit.wikimedia.org/r/562909 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[19:06:45] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frdb2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T239733 (10Jgreen)
[19:07:00] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frdb2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T239733 (10Jgreen)
[19:07:50] <wikibugs>	 (03PS29) 10Cwhite: lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[19:09:06] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frdb2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T239733 (10Jgreen)
[19:09:41] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frdb2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T239733 (10Jgreen)
[19:11:43] <wikibugs>	 (03PS1) 10Dzahn: admins: add Silvan Heintze to ldap_only_admins (WMDE) [puppet] - 10https://gerrit.wikimedia.org/r/562910 (https://phabricator.wikimedia.org/T242080)
[19:13:00] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train (duration: 08m 36s)
[19:13:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:20] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] admins: add Silvan Heintze to ldap_only_admins (WMDE) [puppet] - 10https://gerrit.wikimedia.org/r/562910 (https://phabricator.wikimedia.org/T242080) (owner: 10Dzahn)
[19:13:25] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin]
[19:13:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:32] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin] (duration: 00m 07s)
[19:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:29] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[19:15:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:09] <mutante>	 !log LDAP - added 'sihe' to 'wmde' and 'nda' (T242080)
[19:16:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:11] <stashbot>	 T242080: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080
[19:16:33] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10Patch-For-Review: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Dzahn)
[19:17:55] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10Patch-For-Review: Request to add Silvan Heintze to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Dzahn) 05Open→03Resolved @Silvan_WMDE Done! Things should work as expected now. You are in the LDAP group(s).
[19:22:12] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] Refactor, preparatory to testing multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/555515 (owner: 10RLazarus)
[19:22:19] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] Test multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 (owner: 10RLazarus)
[19:23:14] <wikibugs>	 (03PS2) 10Ammarpad: Add ipblock-exempt and extendedconfirmed to bot group on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562040 (https://phabricator.wikimedia.org/T241904)
[19:25:10] <wikibugs>	 (03PS3) 10Ammarpad: Set $wgArticleCountMethod to 'any' for minwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561572 (https://phabricator.wikimedia.org/T241694)
[19:26:37] <wikibugs>	 (03PS11) 10Ammarpad: Add minerva custom log for la.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728)
[19:28:36] <wikibugs>	 (03PS1) 10Ottomata: eventgate-logging-external - use proper schema_title name for mediawiki.client.error stream [deployment-charts] - 10https://gerrit.wikimedia.org/r/562930 (https://phabricator.wikimedia.org/T240985)
[19:28:52] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10Jgreen)
[19:29:17] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frban1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen)
[19:29:22] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (10Jgreen)
[19:29:41] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-logging-external - use proper schema_title name for mediawiki.client.error stream [deployment-charts] - 10https://gerrit.wikimedia.org/r/562930 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[19:29:58] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frban1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen)
[19:30:10] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frban1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen)
[19:30:57] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10Jgreen)
[19:31:26] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[19:31:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:30] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10Jgreen)
[19:38:40] <wikibugs>	 (03PS4) 10Brennen Bearnes: logspam.pl: Shorten paths and include fatals [puppet] - 10https://gerrit.wikimedia.org/r/559246 (https://phabricator.wikimedia.org/T242252)
[19:40:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] logspam.pl: Shorten paths and include fatals [puppet] - 10https://gerrit.wikimedia.org/r/559246 (https://phabricator.wikimedia.org/T242252) (owner: 10Brennen Bearnes)
[19:43:01] <wikibugs>	 (03PS5) 10Brennen Bearnes: logspam.pl: Shorten paths and include fatals [puppet] - 10https://gerrit.wikimedia.org/r/559246 (https://phabricator.wikimedia.org/T242252)
[19:49:08] <wikibugs>	 (03PS30) 10Cwhite: lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[19:54:03] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[19:54:09] <wikibugs>	 (03PS31) 10Cwhite: lvs, prometheus, profile: add blackbox job helper and enable openapi scrapes [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[19:54:19] <icinga-wm>	 PROBLEM - configured eth on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[19:54:41] <icinga-wm>	 PROBLEM - dhclient process on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[19:54:43] <icinga-wm>	 PROBLEM - DPKG on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[19:55:11] <icinga-wm>	 PROBLEM - Check size of conntrack table on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[19:55:21] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Swift
[19:55:37] <icinga-wm>	 PROBLEM - Disk space on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2035&var-datasource=codfw+prometheus/ops
[19:55:43] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:55:53] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on ms-be2035 is CRITICAL: connect to address 10.192.32.165 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[19:57:40] <godog>	 downtimed ^ known, I'll look tomorrow
[19:58:11] <wikibugs>	 (03PS1) 10Dzahn: admins: add Kai Nissen to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/562940 (https://phabricator.wikimedia.org/T241838)
[20:00:04] <jouncebot>	 longma and liw: Your horoscope predicts another unfortunate Mediawiki train - American+European Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T2000).
[20:00:20] <wikibugs>	 (03CR) 10Cwhite: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1001/20280/" [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[20:00:53] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
[20:00:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:06:16] <wikibugs>	 (03PS1) 10Jeena Huneidi: group1 wikis to 1.35.0-wmf.14  refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946
[20:06:19] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] group1 wikis to 1.35.0-wmf.14  refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:09:08] <wikibugs>	 (03PS1) 10Dzahn: admins: add Dave Pifke to perf-team admins [puppet] - 10https://gerrit.wikimedia.org/r/562947 (https://phabricator.wikimedia.org/T242189)
[20:09:28] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production servers in perf-team group for dpifke - https://phabricator.wikimedia.org/T242189 (10Dzahn) a:03Dzahn
[20:11:50] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Peter said "great"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561649 (owner: 10Bartosz Dziewoński)
[20:13:17] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 48124368 and 8 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:13:23] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200109T0000" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561649 (owner: 10Bartosz Dziewoński)
[20:13:29] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: Remove 2017 wikitext editor as default on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561649
[20:13:40] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "it's just about the allowed data types. to allow changing it to 7.3 in cloud" [puppet] - 10https://gerrit.wikimedia.org/r/561931 (owner: 10Dzahn)
[20:15:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 43016 and 30 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:17:29] <wikibugs>	 (03PS3) 10Dzahn: gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151)
[20:17:44] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn)
[20:31:29] <wikibugs>	 (03CR) 10Dzahn: [V: 03+2 C: 03+2] gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn)
[20:34:32] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:36:09] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:40:15] <mutante>	 !log contint1001 - restarting zuul service
[20:40:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:34] <wikibugs>	 (03CR) 10Dzahn: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:45:02] <wikibugs>	 (03CR) 10Dzahn: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:45:17] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:46:12] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.35.0-wmf.14  refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562946 (owner: 10Jeena Huneidi)
[20:49:11] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:50:20] <wikibugs>	 10Operations, 10Discovery, 10Traffic, 10Wikidata, 10Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (10Mstyles)
[20:50:23] <logmsgbot>	 !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.14  refs T233862
[20:50:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:26] <stashbot>	 T233862: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862
[20:50:35] <wikibugs>	 (03PS1) 10Joal: Update turnilo configuration [puppet] - 10https://gerrit.wikimedia.org/r/562958 (https://phabricator.wikimedia.org/T240681)
[20:51:28] <logmsgbot>	 !log jhuneidi@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.14  refs T233862 (duration: 01m 04s)
[20:51:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:55] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:54:48] <cdanis>	 mutante: everything ok?
[20:56:28] <wikibugs>	 (03PS1) 10Ottomata: eventgate - Bump staging services image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/562962 (https://phabricator.wikimedia.org/T240985)
[20:56:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] eventgate - Bump staging services image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/562962 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[20:56:57] <James_F>	 longma: So far LGTM.
[20:57:32] <longma>	 👍
[20:57:40] <wikibugs>	 (03CR) 10Ottomata: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/562962 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[20:58:19] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:58:28] <mutante>	 cdanis: oh.. yes, all ok. merged now
[20:58:39] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate - Bump staging services image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/562962 (https://phabricator.wikimedia.org/T240985) (owner: 10Ottomata)
[21:00:01] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: Your horoscope predicts another unfortunate Services – Graphoid / Parsoid / Citoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T2100).
[21:00:25] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
[21:00:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:40] <halfak>	 deploying ORES
[21:02:28] <wikibugs>	 (03CR) 10Umherirrender: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561649 (owner: 10Bartosz Dziewoński)
[21:03:08] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[21:03:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:47] <logmsgbot>	 !log halfak@deploy1001 Started deploy [ores/deploy@039251f]: T242035
[21:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:50] <stashbot>	 T242035: First deployment of the new decade! - https://phabricator.wikimedia.org/T242035
[21:07:20] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
[21:07:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:50] <halfak>	 Canary looks good.  Continuing
[21:12:27] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (10Jgreen)
[21:20:18] <wikibugs>	 (03PS1) 10Dzahn: ferm_misc/db: allow connections from gerrit-test in ferm [puppet] - 10https://gerrit.wikimedia.org/r/562965 (https://phabricator.wikimedia.org/T239151)
[21:21:17] <logmsgbot>	 !log halfak@deploy1001 Finished deploy [ores/deploy@039251f]: T242035 (duration: 16m 32s)
[21:21:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:20] <stashbot>	 T242035: First deployment of the new decade! - https://phabricator.wikimedia.org/T242035
[21:23:01] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (10Jclark-ctr) Drive was ordered and should arrive shortly; will update when it arrives
[21:23:14] <halfak>	 Everything looks good. 
[21:23:58] <wikibugs>	 (03CR) 10Dzahn: "Hi Manuel, so we would like to let gerrit-test connect to the Gerrit DB (m2-master / dbproxy1007) but ideally we don't want it to have UPD" [puppet] - 10https://gerrit.wikimedia.org/r/562965 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn)
[21:26:48] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (10Dzahn) p:05Triage→03Normal
[21:28:30] <logmsgbot>	 !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: Revert "commonswiki to 1.35.0-wmf.11"
[21:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:16] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:29:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:39] <James_F>	 longma: Thanks!
[21:29:43] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1016 crash - https://phabricator.wikimedia.org/T241882 (10Jclark-ctr) Confirmed: Service Request 1009577756 was successfully submitted.
[21:30:32] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:30:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:55] <mutante>	 !log phab1003 - running decom cookbook - shutdown host, removed from puppetmaster, debmonitor etc (T238957)
[21:30:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:58] <stashbot>	 T238957: decommission phab1003.eqiad.wmnet - https://phabricator.wikimedia.org/T238957
[21:31:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudvirt1016 crash - https://phabricator.wikimedia.org/T241882 (10Jclark-ctr) Confirmed: Service Request 1009577756 was successfully submitted.
[21:35:03] <arlolra>	 halfak: all done?
[21:35:11] <wikibugs>	 (03PS2) 10Dzahn: remove service IPs and IPv6 for phab1003 [dns] - 10https://gerrit.wikimedia.org/r/552599 (https://phabricator.wikimedia.org/T238957)
[21:35:12] <halfak>	 yes :) 
[21:35:22] <halfak>	 Sorry I wasn't clear 
[21:35:46] <arlolra>	 all good
[21:36:03] <arlolra>	 just being cautious
[21:36:04] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "host has been shut down" [dns] - 10https://gerrit.wikimedia.org/r/552599 (https://phabricator.wikimedia.org/T238957) (owner: 10Dzahn)
[21:37:11] <wikibugs>	 10Operations, 10ops-codfw: (Need By: Jan 15) codfw: rack/setup/install  mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (10Papaul)
[21:38:03] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "This host has been shut down today (by the decom script)" [puppet] - 10https://gerrit.wikimedia.org/r/552607 (https://phabricator.wikimedia.org/T238957) (owner: 10Dzahn)
[21:38:18] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51
[21:38:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:05] <wikibugs>	 10Operations, 10ops-codfw, 10DBA: (Needed By 31st January) codfw: rack/setup/install  es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (10Papaul)
[21:46:17] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51 (duration: 08m 00s)
[21:46:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:01] <wikibugs>	 (CR) EBernhardson: [C: -1] "not needed anymore, we ended up getting it working to talk to an-coord1001 sql" [puppet] - https://gerrit.wikimedia.org/r/554215 (https://phabricator.wikimedia.org/T236180) (owner: Dzahn)
[21:49:15] <wikibugs>	 (PS2) EBernhardson: airflow: remove config settings for Celery Executor and Flower [puppet] - https://gerrit.wikimedia.org/r/553413 (owner: Dzahn)
[21:49:33] <wikibugs>	 (CR) EBernhardson: [C: +1] "seems reasonable to reduce confusion" [puppet] - https://gerrit.wikimedia.org/r/553413 (owner: Dzahn)
[21:50:15] <wikibugs>	 (Abandoned) Dzahn: airflow: add a local mariadb server [puppet] - https://gerrit.wikimedia.org/r/554215 (https://phabricator.wikimedia.org/T236180) (owner: Dzahn)
[21:51:09] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Jgreen)
[21:51:34] <wikibugs>	 (CR) Dzahn: [C: +2] airflow: remove config settings for Celery Executor and Flower [puppet] - https://gerrit.wikimedia.org/r/553413 (owner: Dzahn)
[21:55:58] <arlolra>	 !log Updated Parsoid to f963e51 (T238934, T237318, T238022, T228217)
[21:56:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:14] <stashbot>	 T238934: Call to a member function getContent() on null - https://phabricator.wikimedia.org/T238934
[21:56:15] <stashbot>	 T237318: Invariant failed: Bad UTF-8 at end of string (2 byte sequence) - https://phabricator.wikimedia.org/T237318
[21:56:15] <stashbot>	 T228217: Ensure all the features of parse.js are covered by parse.php - https://phabricator.wikimedia.org/T228217
[21:56:15] <stashbot>	 T238022: Parsoid/JS use of \w \s \b etc is inconsistent with PHP's behavior when the 'u' regexp modifier is used, which leads to selective serializer output differences between Parsoid/PHP & Parsoid/JS in some scenarios - https://phabricator.wikimedia.org/T238022
[21:58:54] <wikibugs>	 (PS6) Dzahn: gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704
[22:00:11] <wikibugs>	 (PS2) CDanis: fastnetmon: remove UDP and ICMP limits [puppet] - https://gerrit.wikimedia.org/r/562387 (https://phabricator.wikimedia.org/T241374)
[22:00:34] <wikibugs>	 (PS7) Dzahn: gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704
[22:01:25] <icinga-wm>	 PROBLEM - PHP opcache health on wtp1027 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[22:01:52] <wikibugs>	 (PS1) Jforrester: Revert commonswiki to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/562973
[22:02:05] <wikibugs>	 (CR) Dzahn: "amended to keep the "user and key name can be changed in Hiera" while removing the "user/group creation"-part of it. That probably needs t" [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:03:36] <wikibugs>	 (CR) Dzahn: [C: -1] "https://puppet-compiler.wmflabs.org/compiler1001/20282/gerrit1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:03:56] <wikibugs>	 (CR) Jeena Huneidi: [C: +2] Revert commonswiki to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/562973 (owner: Jforrester)
[22:04:13] <icinga-wm>	 RECOVERY - PHP opcache health on wtp1027 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[22:04:38] <wikibugs>	 (CR) CDanis: fastnetmon: remove UDP and ICMP limits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/562387 (https://phabricator.wikimedia.org/T241374) (owner: CDanis)
[22:05:09] <wikibugs>	 (Merged) jenkins-bot: Revert commonswiki to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/562973 (owner: Jforrester)
[22:08:17] <wikibugs>	 (PS8) Dzahn: gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704
[22:10:38] <wikibugs>	 (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/20283/gerrit1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:11:40] <wikibugs>	 (CR) Paladox: gerrit: make scap user configurable in Hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:18:13] <wikibugs>	 (CR) Dzahn: [C: +1] gerrit: make scap user configurable in Hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:18:22] <wikibugs>	 (PS9) Dzahn: gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704
[22:19:44] <wikibugs>	 (CR) Paladox: [C: +1] gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:22:43] <wikibugs>	 Operations, ops-codfw, DBA: (Needed By 31st January) codfw: rack/setup/install  es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Papaul)
[22:23:01] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[22:25:01] <mutante>	 looking what interface that is
[22:25:17] <mutante>	 BFD neighbor fe80::7a4f:9b00:174e:8004 down
[22:25:47] <wikibugs>	 Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Mstyles) from inside any of the WDQS machines ( 'wdqs1004.eqiad.wmnet','wdqs1005.eqiad.wmnet', 'wdqs1006.eqiad.wmnet','wdqs1007.eqi...
[22:26:15] <mutante>	 well, i can't follow the docs after that. dont have access
[22:26:38] <James_F>	 mutante: Zuul seems to have frozen again, BTW.
[22:29:23] <wikibugs>	 Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (wiki_willy) a:Papaul
[22:33:35] <mutante>	 James_F: logs say it's doing something (now)
[22:34:11] <James_F>	 Yeah, but new gerrit events aren't coming in? Zuul dashboard is very quiet.
[22:34:29] <wikibugs>	 (PS1) Paladox: Gerrit: Remove nocanon from apache template [puppet] - https://gerrit.wikimedia.org/r/562977
[22:34:31] <wikibugs>	 (PS2) Paladox: Gerrit: Remove nocanon from apache template [puppet] - https://gerrit.wikimedia.org/r/562977
[22:37:42] <wikibugs>	 (PS1) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:38:18] <wikibugs>	 (Abandoned) Jhedden: lvs: update cloudceph proxy check url [puppet] - https://gerrit.wikimedia.org/r/562637 (https://phabricator.wikimedia.org/T240715) (owner: Jhedden)
[22:38:20] <wikibugs>	 (Abandoned) Paladox: Gerrit: Remove nocanon from apache template [puppet] - https://gerrit.wikimedia.org/r/562977 (owner: Paladox)
[22:39:39] <mutante>	 https://grafana.wikimedia.org/d/000000322/zuul-gearman?panelId=10&fullscreen&orgId=1
[22:39:48] <mutante>	 !log restarted zuul on contint1001
[22:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:41:34] <wikibugs>	 (PS2) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:42:19] <wikibugs>	 (PS3) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:46:32] <wikibugs>	 (PS4) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:47:40] <wikibugs>	 (PS5) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:50:20] <wikibugs>	 (PS3) Jdlrobson: Drop beta setting. [mediawiki-config] - https://gerrit.wikimedia.org/r/562607 (https://phabricator.wikimedia.org/T237290)
[22:51:10] <wikibugs>	 (CR) Jforrester: "Please re-fix the commit message before merging." [mediawiki-config] - https://gerrit.wikimedia.org/r/562607 (https://phabricator.wikimedia.org/T237290) (owner: Jdlrobson)
[22:51:21] <wikibugs>	 (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:54:07] <wikibugs>	 (CR) Dzahn: [C: +2] gerrit: make scap user configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/536704 (owner: Dzahn)
[22:55:52] <wikibugs>	 (PS6) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[22:56:20] <wikibugs>	 (PS7) Jhedden: ceph: add prometheus scrape config [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715)
[23:00:10] <wikibugs>	 (PS1) Dzahn: admins: add Moushira Elamrawy to ldap_only_admins (WMF-ctr) [puppet] - https://gerrit.wikimedia.org/r/562981 (https://phabricator.wikimedia.org/T242000)
[23:08:59] <mutante>	 !log LDAP - added moushirael to 'wmf' (T242000)
[23:09:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:02] <stashbot>	 T242000: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000
[23:10:14] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (Dzahn) Open→Resolved Hi @Moushira you have been added to the "wmf" group with the Moushirael user. I confirm it has the wikimedia.org...
[23:12:08] <wikibugs>	 (CR) Dzahn: "We are not using the old "moushira" user which is absented and was the wmf employee user, we are using the separate user with the contract" [puppet] - https://gerrit.wikimedia.org/r/562981 (https://phabricator.wikimedia.org/T242000) (owner: Dzahn)
[23:14:56] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (Dzahn) @Moushira @MeganHernandez_WMF  Just one more question. Contractor access usually has an associated "expiry date".  Is there a date whe...
[23:15:28] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (Dzahn) Resolved→Open
[23:15:55] <wikibugs>	 (CR) Dzahn: [C: +2] admins: add Moushira Elamrawy to ldap_only_admins (WMF-ctr) [puppet] - https://gerrit.wikimedia.org/r/562981 (https://phabricator.wikimedia.org/T242000) (owner: Dzahn)
[23:17:53] <wikibugs>	 Operations, Puppet, Patch-For-Review, User-jbond: Clean up SSL configuration - https://phabricator.wikimedia.org/T240941 (Dzahn)
[23:18:20] <mutante>	 clever nickname halafk :)
[23:18:33] <halAFK>	 ^_^ 
[23:19:16] <wikibugs>	 (PS2) Dzahn: remove production IPs for phab1003 [dns] - https://gerrit.wikimedia.org/r/552601 (https://phabricator.wikimedia.org/T238957)
[23:19:56] <wikibugs>	 (CR) Dzahn: [C: -1] "hmm..i'll wait with this until we removed the IP from mysql GRANTS.. before something else recycles them" [dns] - https://gerrit.wikimedia.org/r/552601 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[23:25:58] <wikibugs>	 (CR) Jhedden: "PCC results https://puppet-compiler.wmflabs.org/compiler1003/20284/" [puppet] - https://gerrit.wikimedia.org/r/562979 (https://phabricator.wikimedia.org/T240715) (owner: Jhedden)
[23:30:32] <wikibugs>	 (PS7) Paladox: Gerrit: Rename ssh_host_key to ssh_host_rsa_key [puppet] - https://gerrit.wikimedia.org/r/556265
[23:30:34] <wikibugs>	 (PS8) Paladox: Gerrit: Add ed25519 and ecdsa ssh host keys [puppet] - https://gerrit.wikimedia.org/r/556270
[23:31:52] <wikibugs>	 (PS9) Paladox: Gerrit: Add ed25519 and ecdsa ssh host keys [puppet] - https://gerrit.wikimedia.org/r/556270
[23:31:59] <wikibugs>	 (CR) Paladox: Gerrit: Add ed25519 and ecdsa ssh host keys (1 comment) [puppet] - https://gerrit.wikimedia.org/r/556270 (owner: Paladox)
[23:32:40] <James_F>	 Grabbing the prod conch.
[23:34:19] <wikibugs>	 (Abandoned) Paladox: Gerrit: Rename ssh_host_key to ssh_host_rsa_key [labs/private] - https://gerrit.wikimedia.org/r/556268 (owner: Paladox)
[23:34:38] <wikibugs>	 (PS8) Paladox: Gerrit: Rename ssh_host_key to ssh_host_rsa_key [puppet] - https://gerrit.wikimedia.org/r/556265
[23:34:57] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.14/extensions/WikibaseMediaInfo/resources/statements/StatementWidget.js: T242286 Update StatementWidget initialization logic (duration: 01m 05s)
[23:34:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:00] <stashbot>	 T242286: Unable to to add Structured Data to files that don't have any; "mainSnak.getValue is not a function" thrown in console - https://phabricator.wikimedia.org/T242286
[23:35:08] <wikibugs>	 (PS10) Paladox: Gerrit: Add ed25519 and ecdsa ssh host keys [puppet] - https://gerrit.wikimedia.org/r/556270
[23:35:15] <James_F>	 longma: wmf.14 now fixed for Commons, if you want to roll the train forwards there?
[23:35:26] <longma>	 okay
[23:36:58] <wikibugs>	 (PS1) Catrope: GrowthExperiments: Set newcomer tasks config title ahead of deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/562987 (https://phabricator.wikimedia.org/T233465)
[23:42:38] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (Moushira) Thanks @Dzahn, yes it works now.  I am in the process of contract extension, not sure about the new expiry date yet, and yes Megan i...
[23:44:13] <logmsgbot>	 !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: Roll commonswiki forward to 1.35.0-wmf.14
[23:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:23] <wikibugs>	 (PS1) Jeena Huneidi: Roll commonswiki forward to 1.35.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/562989
[23:46:44] <wikibugs>	 (CR) Jeena Huneidi: [C: +2] Roll commonswiki forward to 1.35.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/562989 (owner: Jeena Huneidi)
[23:47:34] <wikibugs>	 (Merged) jenkins-bot: Roll commonswiki forward to 1.35.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/562989 (owner: Jeena Huneidi)
[23:57:29] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops