[00:08:15] <wikibugs>	 Operations, Dumps-Generation, SDC General, Wikidata: Capacity planning for Commons Structured Data - https://phabricator.wikimedia.org/T226093 (Addshore) So there are some details on https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/Growth, but I haven't written too much about media info yet.  Det...
[00:09:42] <Reedy>	 "we'll need capacity"
[00:45:40] <RoanKattouw>	 !log Running FlowReserializeRevisionContent.php on testwiki
[00:45:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:31:34] <wikibugs>	 Operations, DC-Ops, Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (RobH) Ok, as it is now peak hours (according to @bblack) for eqiad, I'm re-pulling all the power data now.  Please note that I'll update the task description AFTER this post (and...
[03:23:53] <wikibugs>	 (PS1) Bmansurov: Labs: enable QuickSurveys on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/518186 (https://phabricator.wikimedia.org/T225819)
[03:26:26] <bmansurov>	 Hi, can anyone please deploy a tiny labs config for me? I'd appreciate it: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/518186
[04:26:29] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:30:47] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:35:26] <wikibugs>	 (PS1) ArielGlenn: svwiki officially 'big', 6 dumps jobs in parallel like the others [puppet] - https://gerrit.wikimedia.org/r/518189 (https://phabricator.wikimedia.org/T226200)
[04:57:16] <wikibugs>	 Operations, DBA: db2084 temporary correctable hardware errors - https://phabricator.wikimedia.org/T225884 (Marostegui) Open→Resolved And it finally cleared up ` 23:38:30 <+icinga-wm> RECOVERY - EDAC syslog messages on db2084 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dash...
[04:57:52] <wikibugs>	 Operations, Performance-Team, Traffic, Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (ema) >>! In T226048#5270181, @CDanis wrote: > My guess is that the beginning of this problem correlates...
[05:00:26] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1018.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906...
[05:02:11] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Pool db2051 into s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/518193 (https://phabricator.wikimedia.org/T221533)
[05:05:11] <wikibugs>	 (CR) Marostegui: [C: +2] db-codfw.php: Pool db2051 into s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/518193 (https://phabricator.wikimedia.org/T221533) (owner: Marostegui)
[05:06:05] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Pool db2051 into s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/518193 (https://phabricator.wikimedia.org/T221533) (owner: Marostegui)
[05:06:32] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Pool db2051 into s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/518193 (https://phabricator.wikimedia.org/T221533) (owner: Marostegui)
[05:07:42] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Pool db2051 into s2 to replace db2035 as a master (duration: 01m 00s)
[05:07:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:10:05] <wikibugs>	 (CR) Elukey: "> Hm, I'm trying to remember why we made the cdh:exec be in the cdh" [puppet/cdh] - https://gerrit.wikimedia.org/r/518097 (https://phabricator.wikimedia.org/T212259) (owner: Elukey)
[05:26:12] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1020.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906...
[05:34:13] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1019.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906...
[05:34:37] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1021.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906...
[05:35:16] <wikibugs>	 Operations, Performance-Team, Traffic, Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Wurgl) @ema It was not a single timeout. A chunk of data (the start of the page) was rendered by the br...
[05:36:15] <wikibugs>	 (CR) Elukey: "Sorry for the delay! I like the cookbook, but I have to admit that I have rarely used repair in the past with the AQS cassandra cluster. I" (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/517377 (https://phabricator.wikimedia.org/T225694) (owner: Mathew.onipe)
[05:38:18] <wikibugs>	 Operations, SRE-Access-Requests: Request access to analytics cluster for Alaa Sarhan - https://phabricator.wikimedia.org/T223697 (jijiki) @alaa_wmde If turnilo is enough for analysis, should we mark this as resolved?
[05:41:06] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts) - https://phabricator.wikimedia.org/T220860 (jijiki) Stalled→Resolved @andrew @jrbs This looks like resolved, please ping if it is not :)
[05:53:24] <wikibugs>	 (PS1) Marostegui: install_server: Add MAC address for dbproxy1018 [puppet] - https://gerrit.wikimedia.org/r/518197 (https://phabricator.wikimedia.org/T225704)
[05:55:00] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Add MAC address for dbproxy1018 [puppet] - https://gerrit.wikimedia.org/r/518197 (https://phabricator.wikimedia.org/T225704) (owner: Marostegui)
[06:00:43] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1018.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1018.eqiad.wmnet'] `
[06:05:34] <wikibugs>	 (CR) Muehlenhoff: "cdh::exec was added to the cdh submodule as necessary base infrastructure so that anyone using the submodule can use Hadoop with Kerberos " [puppet/cdh] - https://gerrit.wikimedia.org/r/518097 (https://phabricator.wikimedia.org/T212259) (owner: Elukey)
[06:17:03] <wikibugs>	 (PS1) Elukey: hadoop: lower down the min.user.id's yarn config [puppet] - https://gerrit.wikimedia.org/r/518200
[06:17:35] <wikibugs>	 (CR) Elukey: [C: +2] hadoop: lower down the min.user.id's yarn config [puppet] - https://gerrit.wikimedia.org/r/518200 (owner: Elukey)
[06:19:41] <wikibugs>	 (PS1) Marostegui: install_server: Add MAC for dbproxies [puppet] - https://gerrit.wikimedia.org/r/518203 (https://phabricator.wikimedia.org/T225704)
[06:20:12] <wikibugs>	 (PS2) Marostegui: install_server: Add MAC for dbproxies [puppet] - https://gerrit.wikimedia.org/r/518203 (https://phabricator.wikimedia.org/T225704)
[06:21:04] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Add MAC for dbproxies [puppet] - https://gerrit.wikimedia.org/r/518203 (https://phabricator.wikimedia.org/T225704) (owner: Marostegui)
[06:23:26] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1018.eqiad.wmnet'] ` The log can be found in `/var/log/w...
[06:26:33] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1020.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1020.eqiad.wmnet'] `
[06:30:21] <icinga-wm>	 PROBLEM - puppet last run on mw1307 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:30:25] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1020.eqiad.wmnet'] ` The log can be found in `/var/log/w...
[06:30:57] <icinga-wm>	 PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ImageMagick-6/policy.xml]
[06:32:35] <wikibugs>	 Operations, DNS, Matrix, Traffic, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Tgr) https://wikimedia.org/.well-known/matrix/server works correctly. https://wikimedia.org/.well-known/matrix/client is loaded via AJAX a...
[06:33:25] <icinga-wm>	 PROBLEM - puppet last run on mw1308 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/20-mcrouter.conf]
[06:34:37] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1019.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1019.eqiad.wmnet'] `
[06:35:05] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1021.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1021.eqiad.wmnet'] `
[06:40:18] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1021.eqiad.wmnet'] ` The log can be found in `/var/log/w...
[06:44:21] <moritzm>	 !log installed python-opencv on stat1005 (T220811)
[06:44:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:26] <stashbot>	 T220811: Test Thumbor OpenCL smart cropping on stat1005 - https://phabricator.wikimedia.org/T220811
[06:45:01] <gilles>	 moritzm: thanks!
[06:47:07] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1020.eqiad.wmnet'] `  and were **ALL** successful.
[06:47:44] <wikibugs>	 (PS1) Gergő Tisza: Add permissive CORS headers for wikimedia.org/.well-known/matrix [puppet] - https://gerrit.wikimedia.org/r/518209 (https://phabricator.wikimedia.org/T223835)
[06:48:34] <wikibugs>	 Operations, Wikibase-Containers, Wikidata, serviceops, and 2 others: Create a wmf production ready nginx image - https://phabricator.wikimedia.org/T209292 (hashar) Open→Declined This was for #serviceops!  But I decline the task based on @Ladsgroup comment which it would be better to have...
[06:48:41] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui)
[06:54:44] <moritzm>	 !log installed radeontop on stat1005 to diagnose GPU usage (T220811)
[06:54:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:49] <stashbot>	 T220811: Test Thumbor OpenCL smart cropping on stat1005 - https://phabricator.wikimedia.org/T220811
[06:56:18] <icinga-wm>	 RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:56:56] <icinga-wm>	 RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:34] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1021.eqiad.wmnet'] `  and were **ALL** successful.
[06:59:12] <icinga-wm>	 RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:34] <wikibugs>	 Operations, serviceops, PHP 7.2 support, Patch-For-Review: Socket Errors on PHP7 - https://phabricator.wikimedia.org/T224538 (Joe) a:jijiki
[07:03:45] <wikibugs>	 (PS1) Muehlenhoff: Allow gpu-testers to run radeontop [puppet] - https://gerrit.wikimedia.org/r/518210 (https://phabricator.wikimedia.org/T220811)
[07:07:41] <wikibugs>	 Operations, MediaWiki-General-or-Unknown, serviceops, Core Platform Team (PHP7 (TEC4)), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Joe) a:Joe→None
[07:10:26] <icinga-wm>	 PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[07:23:34] <wikibugs>	 (CR) DCausse: "not entirely sure but I wonder if you should not split this patch in 2 steps so that you ship InitialiseSettings.php first then ship the w" [mediawiki-config] - https://gerrit.wikimedia.org/r/517871 (https://phabricator.wikimedia.org/T222268) (owner: Ottomata)
[07:23:44] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1018.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1018.eqiad.wmnet'] `
[07:24:19] <moritzm>	 !log installing python-thumbor-wikimedia, python-opencv on stat1006
[07:24:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:56] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui)
[07:30:16] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) a:Marostegui→Cmjohnson @Cmjohnson @ayounsi is there anything special with dbproxy1018 and dbproxy1019 VLANs and PXE? None of them seems to be booting up from PXE, despite that...
[07:35:36] <wikibugs>	 Operations, serviceops, HHVM, Performance-Team (Radar), User-Marostegui: Increased instability in MediaWiki backends (according to load balancers) - https://phabricator.wikimedia.org/T223952 (Joe) Open→Resolved I've looked back in the last week or so and we don't see those kind of ins...
[07:35:54] <wikibugs>	 (CR) DCausse: "same here I'd suggest to split this patch into multiple steps that are all valid. I think that the deployment order is non-trivial enough" (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/517874 (https://phabricator.wikimedia.org/T222268) (owner: Ottomata)
[07:36:35] <icinga-wm>	 RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[07:38:06] <logmsgbot>	 !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
[07:38:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:03] <logmsgbot>	 !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
[07:39:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:44] <wikibugs>	 Operations, DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1018.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906...
[07:58:00] <wikibugs>	 (PS1) Ema: Revert "cache::upload: temporarily prevent abuses" [puppet] - https://gerrit.wikimedia.org/r/518215
[07:58:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "cache::upload: temporarily prevent abuses" [puppet] - https://gerrit.wikimedia.org/r/518215 (owner: Ema)
[08:02:56] <wikibugs>	 (PS1) Marostegui: wmnet: Change dbproxy1018,dbproxy1019 IPs to be in cloud [dns] - https://gerrit.wikimedia.org/r/518216 (https://phabricator.wikimedia.org/T225704)
[08:03:41] <wikibugs>	 (PS10) Ema: Normalize thumbnail URLs to avoid cachebusting [puppet] - https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: Gilles)
[08:04:29] <wikibugs>	 (CR) Ayounsi: [C: +2] wmnet: Change dbproxy1018,dbproxy1019 IPs to be in cloud [dns] - https://gerrit.wikimedia.org/r/518216 (https://phabricator.wikimedia.org/T225704) (owner: Marostegui)
[08:08:14] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) a:Cmjohnson→Marostegui While debugging with Arzhel we have noticed that the DNS entries for dbproxy1018 and dbproxy1019 didn't belong to the cloud network,...
[08:09:10] <wikibugs>	 (CR) Ema: [C: +2] Normalize thumbnail URLs to avoid cachebusting [puppet] - https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: Gilles)
[08:14:14] <wikibugs>	 (PS2) Ema: Revert "cache::upload: temporarily prevent abuses" [puppet] - https://gerrit.wikimedia.org/r/518215
[08:22:13] <wikibugs>	 (CR) Ema: [C: +2] Revert "cache::upload: temporarily prevent abuses" [puppet] - https://gerrit.wikimedia.org/r/518215 (owner: Ema)
[08:22:48] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles)
[08:22:52] <wikibugs>	 Operations, Performance-Team, Traffic, Patch-For-Review: Normalize thumbnail request URLs in Varnish to avoid cachebusting - https://phabricator.wikimedia.org/T216339 (Gilles) Open→Resolved
[08:23:00] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) Stalled→Open
[08:23:59] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) @aaron your concern has been addressed now, the Varnish-level thumbnail URL normalization is live. We can now proceed...
[08:25:39] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) @jijiki @fgiunchedi Have Swift proxies been restarted since https://gerrit.wikimedia.org/r/#/c/mediawiki/vagrant/+/489...
[08:27:25] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[08:27:41] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (jijiki) @Gilles can it wait till next week when we will all be back, unless it is urgent, in which case we will figure it out
[08:28:36] <wikibugs>	 (PS1) Elukey: profile::hue: add a parameter to selectively enable oozie security [puppet] - https://gerrit.wikimedia.org/r/518220 (https://phabricator.wikimedia.org/T212259)
[08:30:18] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) It can wait. Basically I want to figure out where we're at in regards to that patch, what's actually deployed and runn...
[08:30:53] <wikibugs>	 (PS2) Ema: Add debian/patches/0034-r02135.vtc-fixes.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518065
[08:30:55] <wikibugs>	 (PS2) Ema: Add debian/patches/0032-vbe_dir_finish-no-VBT_Wait.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518063
[08:30:57] <wikibugs>	 (PS2) Ema: Add debian/patches/0033-recycled-honor-first_byte_timeout.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518064
[08:31:02] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, and 2 others: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (jijiki)
[08:31:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add debian/patches/0034-r02135.vtc-fixes.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518065 (owner: Ema)
[08:31:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add debian/patches/0032-vbe_dir_finish-no-VBT_Wait.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518063 (owner: Ema)
[08:31:30] <wikibugs>	 (PS1) Elukey: hue: move $oozie_security_enabled to parameters [puppet/cdh] - https://gerrit.wikimedia.org/r/518221
[08:33:09] <wikibugs>	 Operations, serviceops, wikitech.wikimedia.org, PHP 7.2 support, Patch-For-Review: switch wikitech to PHP 7.2 - https://phabricator.wikimedia.org/T223393 (jijiki) p:Triage→Low
[08:34:29] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team-TODO, serviceops: Gerrit Hardware Upgrade - https://phabricator.wikimedia.org/T222391 (jijiki) p:Triage→Low
[08:35:11] <wikibugs>	 Operations, Performance-Team, Traffic, media-storage, and 2 others: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) Nevermind, this was the Vagrant patch... I'm going to make the production one now
[08:35:43] <wikibugs>	 (CR) Elukey: [C: +2] hue: move $oozie_security_enabled to parameters [puppet/cdh] - https://gerrit.wikimedia.org/r/518221 (owner: Elukey)
[08:36:01] <icinga-wm>	 PROBLEM - puppet last run on schema1002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[08:36:31] <icinga-wm>	 PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[08:36:54] <wikibugs>	 (PS2) Elukey: profile::hue: add a parameter to selectively enable oozie security [puppet] - https://gerrit.wikimedia.org/r/518220 (https://phabricator.wikimedia.org/T212259)
[08:37:43] <wikibugs>	 Operations, Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (hashar)
[08:37:48] <wikibugs>	 Operations, Wikimedia-Mailing-lists: gmail users being suspended from mediawiki-l due to excessive bounces due to DMARC - https://phabricator.wikimedia.org/T225553 (Aklapper)
[08:38:31] <wikibugs>	 (PS3) Ema: Honor first_byte_timeout for recycled backend connections [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518064
[08:40:33] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[08:40:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:36] <wikibugs>	 (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/17054/" [puppet] - https://gerrit.wikimedia.org/r/518220 (https://phabricator.wikimedia.org/T212259) (owner: Elukey)
[08:42:01] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[08:42:01] <wikibugs>	 Operations, Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (hashar) And maybe we could use some `reprepro` configuration to ease further upgrades?
[08:42:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:33] <wikibugs>	 (CR) Effie Mouzeli: [V: +1] "LGTM https://puppet-compiler.wmflabs.org/compiler1002/17049/" [puppet] - https://gerrit.wikimedia.org/r/517755 (https://phabricator.wikimedia.org/T225284) (owner: Effie Mouzeli)
[08:43:40] <wikibugs>	 (Abandoned) Ema: Add debian/patches/0031-vbt-close-stolen.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518062 (owner: Ema)
[08:43:55] <wikibugs>	 (Abandoned) Ema: Add debian/patches/0034-r02135.vtc-fixes.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518065 (owner: Ema)
[08:44:05] <wikibugs>	 (Abandoned) Ema: Add debian/patches/0032-vbe_dir_finish-no-VBT_Wait.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518063 (owner: Ema)
[08:46:24] <wikibugs>	 (PS1) Hashar: contint: remove zuul-cloner from Docker agent [puppet] - https://gerrit.wikimedia.org/r/518222 (https://phabricator.wikimedia.org/T226233)
[08:46:44] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[08:46:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:32] <wikibugs>	 Operations, MediaWiki-Cache, serviceops-radar, Core Platform Team (Security, stability, performance and scalability (TEC1)), and 5 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (Joe)
[08:48:01] <wikibugs>	 Operations, Scap, serviceops-radar, User-jijiki: Introduce state to Scap - https://phabricator.wikimedia.org/T209881 (Joe)
[08:48:25] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[08:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:35] <wikibugs>	 Operations, CX-cxserver, Citoid, Graphoid, and 10 others: Make services swagger specs standard compliant - https://phabricator.wikimedia.org/T218217 (Joe)
[08:50:17] <wikibugs>	 Operations, Operations-Software-Development, serviceops, Patch-For-Review, and 3 others: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (akosiaris) Should we close this? Is there anything left to be done?
[08:50:47] <wikibugs>	 Operations, Analytics, Research, serviceops-radar, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Joe)
[08:51:08] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1018.eqiad.wmnet'] `  Of which those **FAILED**: ` ['dbproxy1018.eqiad.wmnet'] `
[08:52:20] <wikibugs>	 Operations, Operations-Software-Development, serviceops-radar, Patch-For-Review, and 3 others: Convert makevm to spicerack cookbook - https://phabricator.wikimedia.org/T203963 (akosiaris)
[08:52:35] <wikibugs>	 Operations, Proton, Reading-Infrastructure-Team-Backlog, Traffic, and 3 others: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Joe)
[08:55:44] <wikibugs>	 Operations, DBA, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1018.eqiad.wmnet'] ` The log can be found in `/var/log/w...
[09:01:44] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:01:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:15] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[09:03:15] <icinga-wm>	 RECOVERY - puppet last run on schema1002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[09:03:20] <wikibugs>	 (PS1) Ema: varnish (5.1.3-1wm11) stretch-wikimedia; urgency=medium [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/518224
[09:03:41] <icinga-wm>	 RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:04:23] <elukey>	 ema: o/ - are the above 50x due to upload upgrades?
[09:04:39] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] Remove kafka1018 from ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/513033 (https://phabricator.wikimedia.org/T224538) (owner: Effie Mouzeli)
[09:05:39] <wikibugs>	 (Merged) jenkins-bot: Remove kafka1018 from ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/513033 (https://phabricator.wikimedia.org/T224538) (owner: Effie Mouzeli)
[09:06:32] <wikibugs>	 (CR) jenkins-bot: Remove kafka1018 from ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/513033 (https://phabricator.wikimedia.org/T224538) (owner: Effie Mouzeli)
[09:07:11] <ema>	 elukey: not sure, see ~ema/upload-503.log on weblog1001 if you wanna help find out
[09:08:19] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:08:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:25] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:08:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:04] <ema>	 elukey: likely not, various eqiad/esams nodes affected
[09:09:44] <logmsgbot>	 !log jiji@deploy1001 Synchronized wmf-config/ProductionServices.php: Remove kafka1018 from ProductionServices - T224538 (duration: 00m 56s)
[09:09:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:49] <stashbot>	 T224538: Socket Errors on PHP7  - https://phabricator.wikimedia.org/T224538
[09:11:18] <ema>	 looks like swift is in trouble https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?orgId=1&from=now-24h&to=now-1m
[09:11:47] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbproxy1019.eqiad.wmnet'] ` The log can be found in `/var/log/w...
[09:11:53] <ema>	 the vast majority of the 503 errors had ttfb 20s
[09:12:31] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1018.eqiad.wmnet'] `  and were **ALL** successful.
[09:13:04] <icinga-wm>	 PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[09:14:01] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10Marostegui)
[09:15:46] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:15:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:06] <wikibugs>	 10Operations, 10service-runner, 10serviceops-radar, 10Patch-For-Review, 10Services (later): Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats - https://phabricator.wikimedia.org/T222795 (10akosiaris)
[09:17:21] <wikibugs>	 (03PS1) 10Gilles: Have the Swift rewrite proxy renew expiry headers [puppet] - 10https://gerrit.wikimedia.org/r/518226 (https://phabricator.wikimedia.org/T211661)
[09:17:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Have the Swift rewrite proxy renew expiry headers [puppet] - 10https://gerrit.wikimedia.org/r/518226 (https://phabricator.wikimedia.org/T211661) (owner: 10Gilles)
[09:18:57] <wikibugs>	 (03PS2) 10Gilles: Have the Swift rewrite proxy renew expiry headers [puppet] - 10https://gerrit.wikimedia.org/r/518226 (https://phabricator.wikimedia.org/T211661)
[09:19:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Have the Swift rewrite proxy renew expiry headers [puppet] - 10https://gerrit.wikimedia.org/r/518226 (https://phabricator.wikimedia.org/T211661) (owner: 10Gilles)
[09:23:19] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:23:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:31] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/518210 (https://phabricator.wikimedia.org/T220811) (owner: 10Muehlenhoff)
[09:24:46] <wikibugs>	 (03PS3) 10Gilles: Have the Swift rewrite proxy renew expiry headers [puppet] - 10https://gerrit.wikimedia.org/r/518226 (https://phabricator.wikimedia.org/T211661)
[09:25:16] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks papaul, merging!" [dns] - 10https://gerrit.wikimedia.org/r/515111 (owner: 10Papaul)
[09:25:20] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8] [dns] - 10https://gerrit.wikimedia.org/r/515111 (owner: 10Papaul)
[09:25:37] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] DNS: Add mgmt and production DNS for ganeti2009, ganeti201[0-8] [dns] - 10https://gerrit.wikimedia.org/r/515111 (owner: 10Papaul)
[09:28:11] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: use domain names instead of IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/518075 (https://phabricator.wikimedia.org/T226098)
[09:28:30] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbproxy1019.eqiad.wmnet'] `  and were **ALL** successful.
[09:30:03] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10Marostegui)
[09:30:14] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10Marostegui) 05Open→03Resolved All hosts installed
[09:30:31] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:00] <elukey>	 ema: sorry just seen the ping
[09:33:57] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: etcd: use domain names instead of IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/518075 (https://phabricator.wikimedia.org/T226098) (owner: 10Arturo Borrero Gonzalez)
[09:35:46] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:35:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:45] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188 (10faidon) >>! In T224188#5271528, @Bstorm wrote: > Ceph is capable of saturating 10G links under heavy load > [...] > Rate-limiting traffic is...
[09:38:50] <icinga-wm>	 RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[09:42:28] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:42:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:23] <moritzm>	 !log rebooting cp1008
[09:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:58] <wikibugs>	 (03PS1) 10Ema: Revert "Normalize thumbnail URLs to avoid cachebusting" [puppet] - 10https://gerrit.wikimedia.org/r/518230
[09:44:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "Normalize thumbnail URLs to avoid cachebusting" [puppet] - 10https://gerrit.wikimedia.org/r/518230 (owner: 10Ema)
[09:45:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/511686 (owner: 10Jbond)
[09:45:32] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:40] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10media-storage, and 2 others: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10Gilles)
[09:45:44] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: Normalize thumbnail request URLs in Varnish to avoid cachebusting - https://phabricator.wikimedia.org/T216339 (10Gilles) 05Resolved→03Open
[09:46:01] <wikibugs>	 (03PS2) 10Ema: Revert "Normalize thumbnail URLs to avoid cachebusting" [puppet] - 10https://gerrit.wikimedia.org/r/518230
[09:50:16] <wikibugs>	 (03PS1) 10Ema: Revert "Normalize thumbnail URLs to avoid cachebusting" [puppet] - 10https://gerrit.wikimedia.org/r/518231
[09:50:30] <wikibugs>	 (03Abandoned) 10Ema: Revert "Normalize thumbnail URLs to avoid cachebusting" [puppet] - 10https://gerrit.wikimedia.org/r/518230 (owner: 10Ema)
[09:51:47] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:51:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:36] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: enable 2379/tcp for peers as well [puppet] - 10https://gerrit.wikimedia.org/r/518235 (https://phabricator.wikimedia.org/T226098)
[10:01:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: etcd: enable 2379/tcp for peers as well [puppet] - 10https://gerrit.wikimedia.org/r/518235 (https://phabricator.wikimedia.org/T226098) (owner: 10Arturo Borrero Gonzalez)
[10:06:48] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[10:06:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:54] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops, 10Patch-For-Review, and 2 others: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (10Joe) p:05Triage→03Low
[10:09:03] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[10:09:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:05] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[10:13:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:26] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[10:15:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:55] <wikibugs>	 10Operations, 10observability, 10serviceops, 10PHP 7.2 support, and 2 others: [Regression] fatal-errors.php action=segfault results in a 503 error under php7-fpm. - https://phabricator.wikimedia.org/T223336 (10Joe)
[10:17:58] <wikibugs>	 10Operations, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10wmerrors, and 7 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (10Joe)
[10:19:14] <wikibugs>	 10Operations, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10serviceops, and 8 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (10Joe)
[10:25:15] <wikibugs>	 10Operations, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10serviceops, and 8 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (10Joe) @Legoktm kindly did my job and created an [[https://salsa.debian.org/mediawiki-team/php-wmerrors | upstream package ]] for...
[10:29:30] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:41:37] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Jheald) ^^ I saw exactly what Wurgl reports, browsing long pages on en-wiki (eg the Fram case) in Londo...
[10:45:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Add ferm rules for kpasswd [puppet] - 10https://gerrit.wikimedia.org/r/518237
[11:23:59] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: refresh server with certs changes [puppet] - 10https://gerrit.wikimedia.org/r/518238 (https://phabricator.wikimedia.org/T169287)
[11:28:32] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: refresh server with certs changes [puppet] - 10https://gerrit.wikimedia.org/r/518238 (https://phabricator.wikimedia.org/T169287)
[11:30:33] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: refresh server with certs changes [puppet] - 10https://gerrit.wikimedia.org/r/518238 (https://phabricator.wikimedia.org/T169287)
[11:32:30] <wikibugs>	 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10jijiki) @Iflorez is this working out for you? You could reach out on the -sre channel on irc to further debug this, but if this is not an access request task anymore, I would really appreciate if we mark i...
[11:39:32] <wikibugs>	 (03Abandoned) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: restart etcd service when certs change [puppet] - 10https://gerrit.wikimedia.org/r/518020 (https://phabricator.wikimedia.org/T226098) (owner: 10Arturo Borrero Gonzalez)
[11:39:34] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Specify $wgWBRepoSettings['conceptBaseUri'] again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518239 (https://phabricator.wikimedia.org/T225212)
[11:41:15] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: etcd: refresh server with certs changes [puppet] - 10https://gerrit.wikimedia.org/r/518238 (https://phabricator.wikimedia.org/T169287) (owner: 10Arturo Borrero Gonzalez)
[11:47:39] <hauskatze>	 Did you make any change to 'dologmsg'? It ain't working anymore
[11:50:45] <Lucas_WMDE>	 in Toolforge or in production?
[11:53:28] <Lucas_WMDE>	 hauskatze: ^
[11:54:50] <hauskatze>	 Lucas_WMDE: toolforge
[11:54:57] <Lucas_WMDE>	 any error message?
[11:55:07] <hauskatze>	 I saw some puppet patch a couple of days ago but I can't remember
[11:55:12] <hauskatze>	 yes Lucas_WMDE , let me fetch
[11:55:30] <hauskatze>	 tools.stewardbots@tools-sgebastion-07:~/public_html$ dologmsg Updated stewardbots to 5099d2e.
[11:55:41] <hauskatze>	 tools.stewardbots@tools-sgebastion-07:~/public_html$ dologmsg Updated stewardbots to 5099d2e.
[11:55:43] <hauskatze>	 sigh
[11:56:18] <hauskatze>	 https://pastebin.com/06NVhkHb
[11:56:30] <hauskatze>	 bbl
[11:56:56] <Lucas_WMDE>	 ahah
[11:57:02] <Lucas_WMDE>	 there’s a space missing between "tools" and ]]
[11:57:05] <Lucas_WMDE>	 bstorm_: ^
[11:57:08] <Lucas_WMDE>	 I’ll upload a patch
[11:57:37] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+1] "LGTM." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518171 (https://phabricator.wikimedia.org/T226217) (owner: 10DannyS712)
[12:00:16] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): dologmsg: fix missing space in conditional [puppet] - 10https://gerrit.wikimedia.org/r/518242
[12:06:17] <wikibugs>	 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (10jijiki) p:05Normal→03Low
[12:06:20] <wikibugs>	 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (10jijiki) p:05Low→03Normal
[12:06:23] <wikibugs>	 10Operations, 10Thumbor, 10Wikimedia-Logstash, 10serviceops, 10User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (10jijiki) p:05Normal→03Low
[12:06:47] <jijiki>	 grrr
[12:15:57] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: also create /etc/etcd [puppet] - 10https://gerrit.wikimedia.org/r/518247 (https://phabricator.wikimedia.org/T226098)
[12:30:29] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[12:30:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:29] <wikibugs>	 (03PS1) 10Marostegui: mariadb: WIP Provision dbproxy2001 into m1 [puppet] - 10https://gerrit.wikimedia.org/r/518251
[12:36:05] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: etcd: also create /etc/etcd [puppet] - 10https://gerrit.wikimedia.org/r/518247 (https://phabricator.wikimedia.org/T226098) (owner: 10Arturo Borrero Gonzalez)
[12:36:44] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[12:36:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:00] <wikibugs>	 10Operations, 10Phabricator, 10Traffic, 10Release-Engineering-Team (Kanban): Set up a subdomain for Phame to enable caching - https://phabricator.wikimedia.org/T226044 (10mmodell) This seems like a good idea.  The upstream documentation has a warning that there are some issues with an external blog / dedic...
[12:40:03] <wikibugs>	 (03CR) 10Marostegui: "PCC looks good so far: https://puppet-compiler.wmflabs.org/compiler1001/17055/dbproxy2001.codfw.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/518251 (owner: 10Marostegui)
[12:43:59] <wikibugs>	 (03PS2) 10Elukey: Add ferm rules for kpasswd [puppet] - 10https://gerrit.wikimedia.org/r/518237 (owner: 10Muehlenhoff)
[12:44:04] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Add ferm rules for kpasswd [puppet] - 10https://gerrit.wikimedia.org/r/518237 (owner: 10Muehlenhoff)
[12:47:04] <wikibugs>	 10Operations, 10Diffusion, 10Packaging, 10Release-Engineering-Team (Kanban), and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10mmodell) @LucasWerkmeister that could work, though if the fix is as simple as it appea...
[12:52:35] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[12:52:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:41] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add ferm rules for kpasswd [puppet] - 10https://gerrit.wikimedia.org/r/518237 (owner: 10Muehlenhoff)
[12:58:21] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[12:58:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:40] <wikibugs>	 10Operations, 10Diffusion, 10Packaging, 10Release-Engineering-Team (Kanban), and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10ArielGlenn) If we build it for reals, I'd ask @MoritzMuehlenhoff about all that. If we...
[13:00:07] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: releases: Rely on cron alone for helm charts updating [puppet] - 10https://gerrit.wikimedia.org/r/518253
[13:09:24] <icinga-wm>	 PROBLEM - puppet last run on mw1255 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[13:11:18] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[13:11:19] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:11:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:44] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Overall seems ok as a first draft to me, I 've left a comment about the owner/group/mode stuff, I 'll have a look whether it can be somewh" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[13:16:06] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[13:16:08] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:16:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:30] <moritzm>	 !log rebooting kafkamon instances to pick up MDS mitigations/new kernel
[13:16:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:08] <wikibugs>	 (03CR) 10Alexandros Kosiaris: k8s, deploy: introducing helmfile for manage charts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[13:26:50] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[13:26:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:40] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Still trying to wrap my head around the raw chart but I 'll figure it out." (0318 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517887 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[13:33:03] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[13:33:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:53] <wikibugs>	 10Operations, 10Commons, 10Wikimedia-Site-requests, 10media-storage, 10User-Urbanecm: Server-side upload request for Hurtigruten minutt for minutt videos - https://phabricator.wikimedia.org/T223052 (10Urbanecm) Lock error disappeared, unknown error one is still in place.
[13:35:38] <wikibugs>	 (03PS1) 10Petar.petkovic: Don't show cannot publish error to 'sysop' users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518260 (https://phabricator.wikimedia.org/T225398)
[13:36:36] <icinga-wm>	 RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:36:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Don't show cannot publish error to 'sysop' users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518260 (https://phabricator.wikimedia.org/T225398) (owner: 10Petar.petkovic)
[13:39:43] <wikibugs>	 (03CR) 10Fsero: introducing helmfile.d values for staging cluster (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517887 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[13:43:35] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm mathoid upgrade --recreate-pods -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
[13:43:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:42] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm mathoid cluster staging completed
[13:43:43] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm mathoid finished
[13:43:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:24] <icinga-wm>	 PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2514 MB (5% inode=45%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[13:45:35] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] introducing helmfile.d values for staging cluster (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517887 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[13:46:56] <icinga-wm>	 RECOVERY - Disk space on contint1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[13:48:04] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[13:48:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:46] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[13:48:47] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:48:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-2] "We've been trying to deprecate ferm macros as they have a number of issues." [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson)
[13:52:32] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[13:52:33] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:52:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:35] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[13:55:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:20] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10media-storage, and 2 others: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10Gilles)
[13:58:24] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: Normalize thumbnail request URLs in Varnish to avoid cachebusting - https://phabricator.wikimedia.org/T216339 (10Gilles) 05Open→03Resolved This has caused a spike of thumbor thumbnailing requests, by virtue of making some objects hotter t...
[14:09:46] <Urbanecm>	 !log Renamed Carmen0429@metawiki to Carmen0428@metawiki as part of re-attaching to global account (T223036)
[14:09:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:53] <stashbot>	 T223036: Detached local SUL account User:Carmen0428 needs to be reunited with the global account - https://phabricator.wikimedia.org/T223036
[14:10:22] <Urbanecm>	 !log Attached Carmen0428@metawiki to Carmen0428 global account (T223036)
[14:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:35] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[14:10:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:41] <wikibugs>	 10Operations, 10Operations-Software-Development, 10User-Joe, 10User-jijiki: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943 (10debt)
[14:15:45] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Create Cookbook to restart WDQS - https://phabricator.wikimedia.org/T221832 (10debt) 05Open→03Resolved
[14:16:49] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[14:16:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:23] <logmsgbot>	 !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[14:17:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:39] <wikibugs>	 (03PS2) 10Fsero: k8s, deploy: introducing helmfile for manage charts [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130)
[14:21:40] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: cp3041 - Varnish frontend child restarted icinga alert - https://phabricator.wikimedia.org/T224694 (10ema) 05Open→03Resolved a:03ema All cache nodes are currently running Varnish 5.1.3-1wm10, which fixes this.
[14:21:44] <wikibugs>	 (03CR) 10Fsero: [C: 03+2] releases: Rely on cron alone for helm charts updating [puppet] - 10https://gerrit.wikimedia.org/r/518253 (owner: 10Alexandros Kosiaris)
[14:21:55] <wikibugs>	 (03PS1) 10Urbanecm: Add hualab.nl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518278 (https://phabricator.wikimedia.org/T225917)
[14:22:21] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[14:22:22] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:22:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:12] <moritzm>	 !log rebooting wezen
[14:23:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add hualab.nl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518278 (https://phabricator.wikimedia.org/T225917) (owner: 10Urbanecm)
[14:23:44] <logmsgbot>	 !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[14:23:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:27] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[14:26:28] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:50] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Performance: Study performance impact of disabling TCP selective acknowledgments - https://phabricator.wikimedia.org/T225998 (10ema) All cache nodes are now running Linux 4.9.168-1+deb9u3. Next week we can thus re-enable SACKs on part of the cache fleet and fu...
[14:33:54] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2001 is OK: OK - running: The system is fully operational
[14:37:56] <moritzm>	 !log rebooting kerberos1001 to pick up MDS mitigations/new kernel
[14:37:57] <wikibugs>	 (03PS1) 10Fsero: k8s,deploy: adding fake secret data for PCC [labs/private] - 10https://gerrit.wikimedia.org/r/518282
[14:38:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:14] <wikibugs>	 (03PS1) 10Fsero: k8s,deploy: changing the order [labs/private] - 10https://gerrit.wikimedia.org/r/518283
[14:44:41] <wikibugs>	 (03CR) 10Fsero: [V: 03+2 C: 03+2] k8s,deploy: changing the order [labs/private] - 10https://gerrit.wikimedia.org/r/518283 (owner: 10Fsero)
[14:45:00] <wikibugs>	 (03Abandoned) 10Fsero: k8s,deploy: adding fake secret data for PCC [labs/private] - 10https://gerrit.wikimedia.org/r/518282 (owner: 10Fsero)
[14:49:57] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[14:49:58] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:50:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:44] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[14:50:45] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:13] <moritzm>	 !log rebooting planet1001 to pick up MDS mitigations/new kernel
[14:51:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:59] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[14:54:00] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:54:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:31] <wikibugs>	 (03PS3) 10Fsero: k8s, deploy: introducing helmfile for manage charts [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130)
[14:56:56] <wikibugs>	 (03PS4) 10Marostegui: db-eqiad,db-codfw.php: Change last parsercache key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517807 (https://phabricator.wikimedia.org/T210725)
[15:02:26] <wikibugs>	 (03PS4) 10Fsero: k8s, deploy: introducing helmfile for manage charts [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130)
[15:05:48] <wikibugs>	 (03CR) 10Mforns: analytics::refinery::job::data_purge add deletion for data_quality_hourly (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) (owner: 10Mforns)
[15:06:54] <wikibugs>	 (03PS5) 10Fsero: k8s, deploy: introducing helmfile for manage charts [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130)
[15:08:29] <wikibugs>	 (03PS6) 10Fsero: k8s, deploy: introducing helmfile for manage charts [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130)
[15:09:25] <wikibugs>	 (03PS1) 10Jbond: facter: add confine to cpu_details fact [puppet] - 10https://gerrit.wikimedia.org/r/518286
[15:09:37] <wikibugs>	 (03CR) 10Fsero: "Joe, Alex i think i've addressed your nits" [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[15:09:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] facter: add confine to cpu_details fact [puppet] - 10https://gerrit.wikimedia.org/r/518286 (owner: 10Jbond)
[15:15:13] <wikibugs>	 (03CR) 10Fsero: "> Patch Set 6:" [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero)
[15:15:46] <wikibugs>	 (03PS2) 10Jbond: facter: add confine to cpu_details fact [puppet] - 10https://gerrit.wikimedia.org/r/518286
[15:17:43] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] facter: add confine to cpu_details fact [puppet] - 10https://gerrit.wikimedia.org/r/518286 (owner: 10Jbond)
[15:42:30] <wikibugs>	 (03Abandoned) 10EBernhardson: Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson)
[15:47:22] <wikibugs>	 (03PS7) 10EBernhardson: LVS for cloudelastic [puppet] - 10https://gerrit.wikimedia.org/r/512925 (https://phabricator.wikimedia.org/T224324)
[15:53:48] <wikibugs>	 10Puppet, 10cloud-services-team (Kanban): Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (10Andrew)
[15:56:04] <wikibugs>	 10Puppet, 10cloud-services-team (Kanban): Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (10Andrew) Here are some usage stats:  https://phabricator.wikimedia.org/P8638  As expected, relatively many projects use zero or few user-applied modules, and a small number of pr...
[15:57:01] <wikibugs>	 (03PS2) 10Bstorm: dologmsg: fix missing space in conditional [puppet] - 10https://gerrit.wikimedia.org/r/518242 (owner: 10Lucas Werkmeister (WMDE))
[15:57:38] <wikibugs>	 10Puppet, 10cloud-services-team (Kanban): Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (10Andrew) And, here is some info about the base classes applied to a VM vs a production machine:  https://phabricator.wikimedia.org/P8639  Lots of shared code in there!  The unsha...
[16:02:39] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] dologmsg: fix missing space in conditional [puppet] - 10https://gerrit.wikimedia.org/r/518242 (owner: 10Lucas Werkmeister (WMDE))
[16:05:23] <bstorm_>	 Lucas_WMDE: merged...that should roll out soon and fix things.  Good catch, and sorry about the miss.
[16:19:03] <wikibugs>	 10Operations, 10Gerrit, 10Traffic: When downloading from git using HTTPS: HTTP 500 / GnuTLS recv error (-110) - https://phabricator.wikimedia.org/T225347 (10sbassett) Hello @Ciencia_Al_Poder -  Apologies for the delay on a response to this issue.  Due to an ongoing security incident [0], certain IP ranges co...
[16:22:38] <ajr>	 hi operations peeps, can I rename a user with 88k edits right now? https://meta.wikimedia.org/wiki/Special:CentralAuth/Schmelzle
[16:28:35] <marostegui>	 ajr: that should be fine, yes
[16:48:27] <ajr>	 thanks!
[16:52:41] <wikibugs>	 (03PS1) 10Urbanecm: [throttle-analyze] Grant autoconfirmed permission to user when throttle rule is applied [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518298 (https://phabricator.wikimedia.org/T204583)
[16:55:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [throttle-analyze] Grant autoconfirmed permission to user when throttle rule is applied [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518298 (https://phabricator.wikimedia.org/T204583) (owner: 10Urbanecm)
[16:56:03] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on Wikimedia Commons, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10Krinkle)
[16:59:24] <wikibugs>	 (03PS1) 10Nuria: Removing page_links events from the ones that get refined [puppet] - 10https://gerrit.wikimedia.org/r/518299 (https://phabricator.wikimedia.org/T226268)
[17:00:52] <wikibugs>	 (03PS2) 10Nuria: Removing page_links events from refine whitelist [puppet] - 10https://gerrit.wikimedia.org/r/518299 (https://phabricator.wikimedia.org/T226268)
[17:07:15] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Removing page_links events from refine whitelist [puppet] - 10https://gerrit.wikimedia.org/r/518299 (https://phabricator.wikimedia.org/T226268) (owner: 10Nuria)
[17:13:28] <wikibugs>	 (03CR) 10Nuria: analytics::refinery::job::data_purge add deletion for data_quality_hourly (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) (owner: 10Mforns)
[17:13:33] <wikibugs>	 10Operations, 10Gerrit, 10Traffic: When downloading from git using HTTPS: HTTP 500 / GnuTLS recv error (-110) - https://phabricator.wikimedia.org/T225347 (10Ciencia_Al_Poder) Ok, thanks for the update. I only need a read-only access to the repos, so I'd have to live with the github mirror...
[17:13:52] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10herron)
[17:14:39] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10herron)
[17:15:59] <wikibugs>	 (03PS18) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[17:16:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[17:19:18] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10herron) a:03RobH
[17:26:32] <wikibugs>	 (03PS19) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[17:27:30] <icinga-wm>	 PROBLEM - Apache HTTP on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[17:27:32] <icinga-wm>	 PROBLEM - HHVM rendering on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[17:27:57] <wikibugs>	 (03CR) 10Herron: "> How does this look in terms of a cutover plan?  https://docs.google.com/document/d/1o7bl1WBzSMymsXGzhWLy1GmMOSo_Evof1PHk2aQXAiE/edit?usp" [puppet] - 10https://gerrit.wikimedia.org/r/514361 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron)
[17:28:56] <icinga-wm>	 RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.040 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[17:29:00] <icinga-wm>	 RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 82255 bytes in 0.200 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[17:30:57] <wikibugs>	 (03PS1) 10Papaul: DHCP: Add MAC address entries for ganeti2009 and ganeti201[0-8] [puppet] - 10https://gerrit.wikimedia.org/r/518303 (https://phabricator.wikimedia.org/T224603)
[17:45:05] <wikibugs>	 (03PS1) 10Papaul: Partman: Add ganeti201[0-8] [puppet] - 10https://gerrit.wikimedia.org/r/518305 (https://phabricator.wikimedia.org/T224603)
[17:58:24] <icinga-wm>	 PROBLEM - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
[17:58:30] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:59:04] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: connect to address 10.64.0.230 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[18:28:49] <wikibugs>	 10Operations, 10Continuous-Integration-Infrastructure (phase-out-jessie): Migrate contint* hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224591 (10Jdforrester-WMF)
[18:38:04] <icinga-wm>	 PROBLEM - puppet last run on bast2002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[18:53:29] <wikibugs>	 (03PS1) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518311 (https://phabricator.wikimedia.org/T223291)
[18:53:40] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10RobH)
[18:53:48] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10RobH)
[18:54:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518311 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[18:57:13] <wikibugs>	 (03PS20) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[18:58:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[18:59:35] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10RobH) a:05RobH→03herron @herron,  This looks correct, except there isn't a mention of if these need internal or external vlan/ip addresses?    Please comment to state which, and then assign this...
[19:01:26] <wikibugs>	 (03PS1) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518312 (https://phabricator.wikimedia.org/T223291)
[19:02:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518312 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[19:04:16] <wikibugs>	 (03Abandoned) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518312 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[19:05:09] <wikibugs>	 (03PS21) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[19:05:14] <icinga-wm>	 RECOVERY - puppet last run on bast2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:06:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[19:27:12] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on Wikimedia Commons, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10JJMC89) I deleted it and created protected the page a...
[19:38:35] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on Wikimedia Commons, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10JJMC89) I reuploaded the file as [[ https://en.wikipe...
[19:47:40] <wikibugs>	 (03PS22) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291)
[19:49:18] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on Wikimedia Commons, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10Xaosflux) Possibly related to T30299 ?  I'd like to s...
[19:56:30] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on English Wikipedia, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10JJMC89)
[19:56:42] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on English Wikipedia, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10Krinkle) >>! In T226271#5274599, @Xaosflux wrote: > P...
[19:58:28] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10herron) These will need internal vlan/ips.  Fwiw kafka-main100[1-5] will be replacing kafka100[123], so those existing hosts could be used as a template.
[19:58:41] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install kafka-main100[1-5] - https://phabricator.wikimedia.org/T226274 (10herron) a:05herron→03Cmjohnson
[20:04:17] <wikibugs>	 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10Iflorez) Not working yet. I will follow up on the -sre channel on irc. Thank you @jijiki
[20:10:23] <wikibugs>	 10Puppet, 10cloud-services-team (Kanban): Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (10Andrew) I have an (ironically) unpuppetized example of a dual-run setup running now:  abogott-dual-puppet-base-master.testlabs.eqiad.wmflabs serves only role::wmcs::instance for...
[20:21:48] <wikibugs>	 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 2 others: Image thumbnail (cache?) broken on English Wikipedia, e.g. Information.svg, when viewing non-default resolution (e.g. 241px) - https://phabricator.wikimedia.org/T226271 (10JJMC89) //(In case any of this is helpful to those in...
[20:35:16] <wikibugs>	 10Operations, 10Gerrit, 10Traffic: When downloading from git using HTTPS: HTTP 500 / GnuTLS recv error (-110) - https://phabricator.wikimedia.org/T225347 (10sbassett) 05Open→03Resolved
[20:41:54] <greg-g>	 twentyafterfour: ugh, I forgot about merging this https://gerrit.wikimedia.org/r/c/operations/puppet/+/517140
[20:43:25] <greg-g>	 herron: if you're still around would you be willing to review/merge ^
[20:54:06] <wikibugs>	 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10MarcoAurelio) We're experiencing again global renames becoming stuck: https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress  This time they start...
[21:00:53] <twentyafterfour>	 greg-g: do you want me to run some commands for you in the meantime?
[21:01:37] <greg-g>	 twentyafterfour: I don't have a queue of them yet, I'll ping you when I do.
[21:23:57] <twentyafterfour>	 greg-g: ok I'll be around.
[21:24:29] <greg-g>	 twentyafterfour: I'm getting pulled into some yak shaving that may prevent me from starting today. Don't wait for me when you're at quitting time.
[21:24:45] <twentyafterfour>	 ok
[21:25:36] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[21:28:13] <wikibugs>	 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Reedy) >>! In T226109#5274665, @MarcoAurelio wrote: > We're experiencing again global renames becoming stuck: https://meta.wikimedia.org/wiki/Special:Gl...
[21:29:43] <wikibugs>	 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Reedy) ` reedy@mwlog1001:/srv/mw-log$ grep "チルノ" JobExecutor.log | grep testwiki 2019-06-21 15:29:16 [XQz3oApAMF0AAHI6IgEAAAAR] mw1336 testwikidatawiki...
[21:35:46] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[21:54:51] <wikibugs_>	 (03Abandoned) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/518311 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov)
[21:59:22] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: connect to address 10.64.0.230 and port 9042: Connection refused eevans Decommissioned (T223976) https://phabricator.wikimedia.org/T93886
[21:59:22] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused eevans Decommissioned (T223976) https://phabricator.wikimedia.org/T120662
[21:59:22] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a service on restbase1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed eevans Decommissioned (T223976) https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:22:07] <greg-g>	 twentyafterfour: ./bin/bulk make-silent --id 1822 please :)
[22:30:19] <twentyafterfour>	 greg-g: done
[22:30:23] <greg-g>	 twentyafterfour: and now ./bin/bulk make-silent --id 1823
[22:30:52] <greg-g>	 and thank you!
[23:01:14] <James_F>	 Eurgh. Anyone have a bot handy that wants to make a shedload of redirects on MediaWiki.org? Translate extension just broke rather badly and failed to make redirects…
[23:35:18] <JJMC89[m]>	 James_F: I have one but it's not flagged on mw. Happy to fix those though.
[23:39:57] <James_F>	 JJMC89[m]: That's kind of you. I'll start a thread.
[23:41:37] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] "When deploying/SWAT, take special care to sync IS.php first. This means it cannot be reliably tested on mwdebug as 'scap pull' will apply " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518239 (https://phabricator.wikimedia.org/T225212) (owner: 10Lucas Werkmeister (WMDE))
[23:49:07] <James_F>	 JJMC89[m]: Posted on https://www.mediawiki.org/wiki/Topic:V22uce5k6lmij7xb if you want to volunteer. :-)
[23:57:48] <JJMC89[m]>	 James_F: Commented there. If you just want all of the redirects created, I can start that now.
[23:59:10] <James_F>	 JJMC89[m]: I'm pretty sure you won't be allowed to create pages in the Translate: namespace.
[23:59:42] <James_F>	 JJMC89[m]: Also there's all the talk pages. :-(