[00:07:09] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[00:07:30] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[00:11:40] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[00:12:19] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:37:39] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[00:38:10] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[00:41:40] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[00:42:19] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:07:20] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[01:07:50] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[01:11:39] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:12:00] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[01:37:39] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[01:38:09] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[01:41:40] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:42:19] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[02:07:19] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[02:07:49] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[02:11:29] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[02:11:59] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:28:30] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.16) (duration: 08m 38s)
[02:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:30] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[02:38:00] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[02:38:46] <logmsgbot>	 !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Thu Aug 16 02:38:46 UTC 2018 (duration 10m 17s)
[02:38:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:41:40] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[02:42:10] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:50:29] <wikibugs_>	 10Operations: Feedback Appreciatted: Use of HTTP Without TLS - https://phabricator.wikimedia.org/T202033 (10Legoktm) a:05Akondrahman>03None I'm not sure what kind of a useful answer you're going to get...I suspect each case has a different answer/reason. For ~/vagrant, it's used as a development tool on indi...
[03:07:39] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[03:08:09] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[03:11:50] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[03:12:19] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:26:20] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 861.39 seconds
[03:37:29] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[03:38:00] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[03:41:09] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 270.29 seconds
[03:41:39] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:42:10] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[04:07:40] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[04:08:10] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[04:12:00] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[04:12:29] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:37:30] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[04:38:10] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[04:41:49] <icinga-wm>	 PROBLEM - Check systemd state on aqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:42:29] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[05:07:30] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[05:07:59] <icinga-wm>	 RECOVERY - Check systemd state on aqs1007 is OK: OK - running: The system is fully operational
[05:08:00] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet operation_type=create_container https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:09:09] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[05:11:50] <icinga-wm>	 PROBLEM - cassandra-a service on aqs1007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[05:12:47] <_joe_>	 !log moving away corrupted commitlog file on aqs1007 cassandra-a instance, trying to restart it
[05:12:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:59] <icinga-wm>	 RECOVERY - cassandra-a service on aqs1007 is OK: OK - cassandra-a is active
[05:15:49] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.64.0.213:9042 on aqs1007 is OK: TCP OK - 0.000 second response time on 10.64.0.213 port 9042
[06:28:59] <icinga-wm>	 PROBLEM - puppet last run on mw1308 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ssl/localcerts/jobrunner.svc.eqiad.wmnet.crt]
[06:29:10] <icinga-wm>	 PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/biocLite.R]
[06:52:29] <wikibugs_>	 10Operations, 10Cassandra: cassandra-a instance on aqs1007 is not starting - https://phabricator.wikimedia.org/T201986 (10ema) 05Open>03Resolved a:03ema @Joe removed the log and restarted cassandra-a. The service seems now to be working fine.  ``` 05:12 _joe_: moving away corrupted commitlog file on aqs1...
[06:54:41] <_joe_>	 heh sorry, but I opened the file
[06:54:52] <_joe_>	 and it was all zeroes
[06:55:03] <_joe_>	 so there was really nothing that could be done with it
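Editor's note: the all-zeroes check _joe_ describes can be repeated without opening the file by hand. The following is a minimal sketch, assuming Python 3 is available on the host. The function name `is_all_zeroes` and the chunk size are illustrative and do not come from the log.

```python
def is_all_zeroes(path, chunk_size=1024 * 1024):
    """Return True if every byte in the file at `path` is 0x00.

    An empty file also returns True, because it contains no non-zero byte.
    """
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file reached without seeing a non-zero byte.
                return True
            if chunk.count(b"\x00") != len(chunk):
                return False
```

Calling `is_all_zeroes("/path/to/suspect/commitlog/file")` (placeholder path) reads the file in chunks, so memory use stays bounded even for large files.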
[06:59:19] <icinga-wm>	 RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:20] <wikibugs_>	 10Operations, 10Cassandra: cassandra-a instance on aqs1007 is not starting - https://phabricator.wikimedia.org/T201986 (10Joe) For the record: I removed the file (still on disk at `/srv/cassandra-a/commitlog/CommitLog-5-1530620590775.log.bak`) once I noticed it was all zeroes.   Since there was no real informat...
[06:59:29] <icinga-wm>	 RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:02:34] <ema>	 _joe_: thanks for taking care of that!
[07:03:56] <_joe_>	 ema: yeah well, I just saw this system alarming all night long and tried to fix it, I can't say I'm sure what I did was 100% correct but an all-zeroes file can't really do anything meaningful IMHO
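Editor's note: the "alarming all night long" pattern shows up in the icinga-wm lines above as PROBLEM/RECOVERY pairs repeating roughly every 30 minutes. The following is a minimal sketch for counting such transitions per host and check from a log in this format. The regular expression is inferred only from the lines in this log, and the names `ALERT_RE` and `count_transitions` are hypothetical.

```python
import re
from collections import Counter

# Matches lines like:
# [00:11:40] <icinga-wm>   PROBLEM - cassandra-a service on aqs1007 is CRITICAL: ...
ALERT_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] <icinga-wm>\s+"
    r"(?P<state>PROBLEM|RECOVERY) - (?P<check>.+?) on (?P<host>\S+) is "
)


def count_transitions(lines):
    """Count PROBLEM and RECOVERY notifications per (host, check, state)."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            key = (match.group("host"), match.group("check"), match.group("state"))
            counts[key] += 1
    return counts
```

A check with many PROBLEM and RECOVERY entries for the same host in a short window is a candidate for a flapping service rather than a one-off failure.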
[07:04:26] <wikibugs_>	 10Operations: Feedback Appreciatted: Use of HTTP Without TLS - https://phabricator.wikimedia.org/T202033 (10Aklapper) Proposing to close this task as invalid as it's vague and not actionable. Please also read and understand T201576#4490641.  Dropping automatically created lists of http links without any further...
[07:05:00] <wikibugs_>	 10Puppet: Suspicious Comments in Puppet Scripts - https://phabricator.wikimedia.org/T201576 (10Aklapper)
[07:22:20] <wikibugs_>	 (03CR) 10Volans: [V: 032 C: 032] LDAP: allow to specify multiple search strings [software/debmonitor] - 10https://gerrit.wikimedia.org/r/452686 (owner: 10Volans)
[07:24:39] <moritzm>	 !log rebooting install2002 for kernel security update
[07:24:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:34] <moritzm>	 !log rebooting install1002 for kernel security update
[07:27:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:39] <icinga-wm>	 PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 3 minutes ago with 6 failures. Failed resources (up to 3 shown): Exec[git_pull_wmde/scripts],Exec[git_pull_wmde/toolkit-analyzer-build],Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki]
[07:33:46] <wikibugs_>	 (03PS1) 10Volans: Updated src to v0.1.8 and rebuilt wheels [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/453092
[07:37:40] <wikibugs_>	 (03CR) 10Volans: [V: 032 C: 032] Updated src to v0.1.8 and rebuilt wheels [software/debmonitor/deploy] - 10https://gerrit.wikimedia.org/r/453092 (owner: 10Volans)
[07:40:12] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: PHP: create module for modern Debian-based distributions [puppet] - 10https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140)
[07:40:14] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: mediawiki: move php to a profile, use the php class [puppet] - 10https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140)
[07:41:11] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] PHP: create module for modern Debian-based distributions [puppet] - 10https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: 10Giuseppe Lavagetto)
[07:41:50] <wikibugs_>	 10Operations, 10DNS, 10Traffic: rack/setup/install authdns1001.wikimedia.org - https://phabricator.wikimedia.org/T196693 (10Vgutierrez) @MoritzMuehlenhoff ack, thanks for pinging us
[07:43:34] <wikibugs_>	 (03PS1) 10Gehel: elasticsearch: storage device name changed with new partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/453094 (https://phabricator.wikimedia.org/T198391)
[07:45:28] <logmsgbot>	 !log volans@deploy1001 Started deploy [debmonitor/deploy@1f01fd1]: Release v0.1.8
[07:45:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:59] <logmsgbot>	 !log volans@deploy1001 Finished deploy [debmonitor/deploy@1f01fd1]: Release v0.1.8 (duration: 00m 31s)
[07:46:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:30] <wikibugs_>	 (03PS1) 10Volans: debmonitor: allow access to WMF+NDA groups [puppet] - 10https://gerrit.wikimedia.org/r/453096
[07:49:14] <gehel>	 !log reimaging elastic10(23|24)
[07:49:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:43] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1023.eqiad.wmnet', 'elastic1024...
[07:52:26] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/453096 (owner: 10Volans)
[07:53:23] <wikibugs_>	 (03CR) 10Volans: [C: 032] debmonitor: allow access to WMF+NDA groups [puppet] - 10https://gerrit.wikimedia.org/r/453096 (owner: 10Volans)
[08:03:04] <wikibugs_>	 10Operations, 10ops-codfw, 10Traffic: Decommission baham - https://phabricator.wikimedia.org/T199247 (10Vgutierrez)
[08:03:11] <wikibugs_>	 10Operations, 10ops-codfw, 10Traffic, 10decommission: Decommission baham - https://phabricator.wikimedia.org/T199247 (10Vgutierrez)
[08:03:49] <icinga-wm>	 RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:05:23] <wikibugs_>	 10Operations, 10DNS, 10Traffic: rack/setup/install authdns1001.wikimedia.org - https://phabricator.wikimedia.org/T196693 (10Vgutierrez) 05Open>03Resolved
[08:05:36] <wikibugs_>	 10Operations, 10ops-eqiad, 10Traffic, 10decommission: Decommission radon - https://phabricator.wikimedia.org/T202040 (10Vgutierrez)
[08:06:49] <wikibugs_>	 (03CR) 10Jcrespo: [C: 04-1] "Public exposure of a credential- please change it and document it on the private repo only." [puppet] - 10https://gerrit.wikimedia.org/r/452997 (owner: 10Andrew Bogott)
[08:08:55] <wikibugs_>	 (03PS1) 10Vgutierrez: authdns: Remove radon from the authdns host list [puppet] - 10https://gerrit.wikimedia.org/r/453099 (https://phabricator.wikimedia.org/T202040)
[08:12:36] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 032] authdns: Remove radon from the authdns host list [puppet] - 10https://gerrit.wikimedia.org/r/453099 (https://phabricator.wikimedia.org/T202040) (owner: 10Vgutierrez)
[08:12:41] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1023.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['elastic1023.eqiad.wmnet...
[08:12:44] <wikibugs_>	 (03PS2) 10Vgutierrez: authdns: Remove radon from the authdns host list [puppet] - 10https://gerrit.wikimedia.org/r/453099 (https://phabricator.wikimedia.org/T202040)
[08:13:55] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1023.eqiad.wmnet'] ``` The log...
[08:14:10] <icinga-wm>	 PROBLEM - Check systemd state on elastic1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:15:10] <icinga-wm>	 RECOVERY - Check systemd state on elastic1024 is OK: OK - running: The system is fully operational
[08:16:39] <moritzm>	 !log upgrading wikidiff to 1.7.2 on mw1334-mw1338/mw1307/mw1318/ (HHVM bytecode cache is pruned during update)
[08:16:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:26] <wikibugs_>	 (03PS1) 10Vgutierrez: site: Reimage radon as stretch spare system [puppet] - 10https://gerrit.wikimedia.org/r/453100 (https://phabricator.wikimedia.org/T202040)
[08:32:55] <moritzm>	 !log uploaded jenkins 2.121.3 to apt.wikimedia.org (for jessie and stretch)
[08:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:02] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 032] site: Reimage radon as stretch spare system [puppet] - 10https://gerrit.wikimedia.org/r/453100 (https://phabricator.wikimedia.org/T202040) (owner: 10Vgutierrez)
[08:35:03] <vgutierrez>	 !log Reimaging radon as spare system - T202040
[08:35:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:35:10] <stashbot>	 T202040: Decommission radon - https://phabricator.wikimedia.org/T202040
[08:37:18] <ema>	 !log reboot cp2009 for kernel upgrade
[08:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:57] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1023.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['elastic1023.eqiad.wmnet...
[08:39:08] <wikibugs_>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (10daniel) Is the GPG signature I added to the description sufficient?  If not, I'll be in the WMDE office in a couple of hours, so I could do a quick hangout.
[08:40:18] <moritzm>	 !log upgrading wikidiff to 1.7.2 on mw1319-mw1333 (HHVM bytecode cache is pruned during update)
[08:40:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:04] <wikibugs_>	 10Operations, 10ops-eqiad, 10Traffic, 10decommission, 10Patch-For-Review: Decommission radon - https://phabricator.wikimedia.org/T202040 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` radon.wikimedia.org ``` The log can be found in `/var/...
[08:58:06] <wikibugs_>	 (03PS10) 10Jcrespo: db backup statistics: Initial implementation of the backup stats [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987)
[09:07:05] <wikibugs_>	 (03CR) 10Jcrespo: "> I don't know if it is planned but being able to specify a wiki to" [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo)
[09:14:34] <wikibugs_>	 (03CR) 10Jcrespo: "> I like the abstraction level of "section" so at restore time we can" [puppet] - 10https://gerrit.wikimedia.org/r/449681 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo)
[09:16:35] <gehel>	 !log reimaging elastic1022
[09:16:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:56] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1022.eqiad.wmnet'] ``` The log...
[09:19:06] <wikibugs_>	 10Operations, 10Traffic: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (10ema)
[09:19:35] <wikibugs_>	 10Operations, 10Traffic: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (10ema) p:05Triage>03Normal
[09:19:55] <wikibugs_>	 10Operations, 10ops-eqiad, 10Traffic, 10decommission, 10Patch-For-Review: Decommission radon - https://phabricator.wikimedia.org/T202040 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['radon.wikimedia.org'] ```  and were **ALL** successful.
[09:20:08] <wikibugs_>	 10Operations, 10ops-esams, 10Traffic: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (10ema)
[09:20:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPMI Sensor Status on cp3032 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Ema https://phabricator.wikimedia.org/T202046
[09:23:15] <wikibugs_>	 10Operations, 10ops-eqiad, 10Traffic, 10decommission: Decommission radon - https://phabricator.wikimedia.org/T202040 (10Vgutierrez)
[09:24:22] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3049.esams.wmnet', 'cp2001.codfw.wmnet'] ``` The log can be found in `/var/l...
[09:26:08] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` cp4023.ulsfo.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/201808...
[09:26:44] <wikibugs_>	 (03PS7) 10Vgutierrez: [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867
[09:27:46] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (owner: 10Vgutierrez)
[09:30:31] <wikibugs_>	 (03CR) 10MarcoAurelio: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[09:31:51] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Fixed typo (cron instead of from). [puppet] - 10https://gerrit.wikimedia.org/r/294679 (owner: 10Gehel)
[09:34:57] <icinga-wm>	 PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3049_v4, cp3049_v6, cp4023_v4, cp4023_v6
[09:35:38] <wikibugs_>	 (03PS2) 10Muehlenhoff: Tweak fragmentation memory limits [puppet] - 10https://gerrit.wikimedia.org/r/452901 (https://phabricator.wikimedia.org/T201608)
[09:36:32] <wikibugs_>	 (03CR) 10Ema: [C: 031] "We could mention the previous defaults for reference, LGTM otherwise." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/452901 (https://phabricator.wikimedia.org/T201608) (owner: 10Muehlenhoff)
[09:38:36] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3049_v4, cp3049_v6, cp4023_v4, cp4023_v6 Ema reimaging
[09:39:09] <wikibugs_>	 (03PS3) 10Muehlenhoff: Tweak fragmentation memory limits [puppet] - 10https://gerrit.wikimedia.org/r/452901 (https://phabricator.wikimedia.org/T201608)
[09:40:25] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on cp2009 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cp2009&var-datasource=codfw%2520prometheus%252Fops
[09:41:34] <wikibugs_>	 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1022.eqiad.wmnet'] ```  and were **ALL** successful.
[09:42:13] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Tweak fragmentation memory limits [puppet] - 10https://gerrit.wikimedia.org/r/452901 (https://phabricator.wikimedia.org/T201608) (owner: 10Muehlenhoff)
[09:46:28] <icinga-wm>	 PROBLEM - Check systemd state on labvirt1016 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:46:34] <gehel>	 !log all elasticsearch nodes reimaged (except elastic1029, waiting on memory issue) - T198391 / T193649 / T201991
[09:46:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:44] <stashbot>	 T201991: Broken memory on elastic1029 - https://phabricator.wikimedia.org/T201991
[09:46:45] <stashbot>	 T198391: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391
[09:46:46] <stashbot>	 T193649: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649
[09:53:31] <icinga-wm>	 RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 64 ESP OK
[09:53:44] <arturo>	 ` systemd-sysctl[49685]: Couldn't write '262144' to 'net/ipv6/ip6frag_high_thresh', ignoring: Invalid argument`
[09:53:47] <wikibugs_>	 10Operations, 10Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (10Volans) I've added Effie to the "wmf" LDAP group.
[09:53:54] <arturo>	 moritzm: could this be related to some kernel upgrade?
[09:53:57] <arturo>	 it's on labvirt1016
[09:55:12] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on labvirt1016 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Arturo Borrero Gonzalez Looking
[09:55:34] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2001.codfw.wmnet', 'cp3049.esams.wmnet'] ```  and were **ALL** successful.
[09:57:21] <icinga-wm>	 PROBLEM - Check systemd state on labtestmetal2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:58:14] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4023.ulsfo.wmnet'] ```  and were **ALL** successful.
[09:58:20] <moritzm>	 arturo: looking
[09:58:42] <icinga-wm>	 PROBLEM - Check systemd state on labtestvirt2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:59:42] <icinga-wm>	 PROBLEM - Check systemd state on cloudvirt1022 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:59:53] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on labtestmetal2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Arturo Borrero Gonzalez looking
[10:00:41] <volans>	 !log add jiji to the 'ops' LDAP group - T201849
[10:00:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:54] <stashbot>	 T201849: Request production global root access for Effie Mouzeli - https://phabricator.wikimedia.org/T201849
[10:01:06] <moritzm>	 arturo: seems to be limited to the new jessie-based labvirts, right?
[10:01:33] <moritzm>	 on the trusty ones, the settings have been successfully applied
[10:02:30] <arturo>	 moritzm: ok, could we add an `if` switch?
[10:02:42] <moritzm>	 on 1016 all the values have been set, but for some reason it failed to apply ip6frag_high_thresh
[10:02:55] <moritzm>	 arturo: let's rather fix the bug and bring them in line
[10:03:33] <moritzm>	 setting the sysctl value manually via "sysctl -w" also works fine
[10:04:57] <moritzm>	 I'm running puppet on 1018 to see whether it also happens there
[10:06:02] <arturo>	 moritzm: I just did a simple `sudo systemctl restart systemd-sysctl.service` and now the unit is not failing -_-
[10:06:04] <icinga-wm>	 RECOVERY - Check systemd state on labvirt1016 is OK: OK - running: The system is fully operational
[10:06:17] <moritzm>	 worked fine on 1018 as well running puppet
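Editor's note: one way to confirm whether settings such as ip6frag_high_thresh were actually applied on a host is to compare the expected values against /proc/sys. The following is a minimal sketch, assuming a Linux host with Python 3. The helper names are hypothetical, and the key-to-path conversion is a simplification that does not handle keys whose components themselves contain dots. The value used in the usage line is the one from the systemd-sysctl error quoted above.

```python
import os


def sysctl_path(key, root="/proc/sys"):
    """Map 'net.ipv6.ip6frag_high_thresh' (or the slash form) to its /proc/sys path."""
    return os.path.join(root, *key.replace("/", ".").split("."))


def read_sysctl(key, root="/proc/sys"):
    """Read the live value of a sysctl key as a stripped string."""
    with open(sysctl_path(key, root)) as handle:
        return handle.read().strip()


def find_mismatches(expected, reader=read_sysctl):
    """Return {key: (expected, actual)} for every key whose live value differs."""
    mismatches = {}
    for key, want in expected.items():
        try:
            actual = reader(key)
        except OSError as error:
            # Report unreadable keys instead of aborting the whole comparison.
            actual = "unreadable: {}".format(error.strerror)
        if actual != str(want):
            mismatches[key] = (str(want), actual)
    return mismatches
```

For example, `find_mismatches({"net.ipv6.ip6frag_high_thresh": 262144})` returns an empty dict when the live value matches. This only reports the current state; it does not explain why a write was rejected.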
[10:06:23] <icinga-wm>	 PROBLEM - Check systemd state on labtestcontrol2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:09:32] <moritzm>	 arturo: it seems all the affected hosts run systemd from jessie-backports; that's the too-loose pinning I mentioned in the ticket. We should strictly pull in only the OpenStack packages from jessie-backports
[10:10:48] <wikibugs_>	 10Operations, 10ops-eqiad, 10DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (10faidon) @RobH, @cmjohnson, this has been open for two months now -- why is this taking such a long time to resolve?
[10:10:56] <moritzm>	 I'll fix up the systemd-sysctl status where it failed, these settings won't be re-set again, as the new jessie kernel reduces the default value
[10:11:37] <moritzm>	 the sysctl application via puppet does the same but avoids another round of reboots; it's effectively a one-time effort until the servers are rebooted again
[10:11:59] <paravoid>	 arturo: we are getting labcontrol1001 cronspam
[10:13:00] <arturo>	 paravoid: ack I saw it this morning
[10:14:03] <icinga-wm>	 RECOVERY - Check systemd state on cloudvirt1022 is OK: OK - running: The system is fully operational
[10:15:12] <icinga-wm>	 RECOVERY - Check systemd state on labtestvirt2003 is OK: OK - running: The system is fully operational
[10:15:33] <icinga-wm>	 RECOVERY - Check systemd state on labtestcontrol2003 is OK: OK - running: The system is fully operational
[10:15:43] <moritzm>	 arturo: ^should be all sorted
[10:18:56] <paravoid>	 systemd from jessie-backports? ouch!
[10:27:04] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Gehel data import in progress
[10:27:15] <_joe_>	 !log restarting cpjobqueue on scb1002, not listening on its tcp port
[10:27:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:55] <wikibugs_>	 (03PS1) 10Jijiki: admin: added user jiji to ops group [puppet] - 10https://gerrit.wikimedia.org/r/453107 (https://phabricator.wikimedia.org/T201849)
[10:30:23] <_joe_>	 !log restarting changeprop on scb1002, not listening on its tcp port
[10:30:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:31] <wikibugs_>	 10Operations, 10ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T201761 (10jcrespo) 05Open>03Resolved Thanks,  ``` root@db2039:~$ sudo /usr/local/lib/nagios/plugins/get-raid-status-hpssacli   Smart Array P420i in Slot 0 (Embedded)     array A        Logical Drive: 1          Size: 3....
[10:34:42] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:36:52] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:42:46] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/12105/puppetmaster1001.eqiad.wmnet/ looks good, merging" [puppet] - 10https://gerrit.wikimedia.org/r/453107 (https://phabricator.wikimedia.org/T201849) (owner: 10Jijiki)
[10:43:47] <moritzm>	 !log upgrading wikidiff to 1.7.2 on mw1285-mw1290 and mw1312-mw1317 (HHVM bytecode cache is pruned during update)
[10:43:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:43] <wikibugs_>	 (03PS8) 10Vgutierrez: [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867
[10:53:45] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (owner: 10Vgutierrez)
[10:55:26] <wikibugs_>	 10Operations, 10ops-codfw, 10DBA: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (10jcrespo)
[11:00:06] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1100).
[11:00:06] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[11:00:57] <Amir1>	 Can I push a change for SWAT?
[11:02:20] <Amir1>	 zeljkof: ^
[11:02:21] <zeljkof>	 Amir1: go ahead, I'm on vacation :)
[11:02:29] <Amir1>	 oh nice, enjoy!
[11:03:36] <arturo>	 !log manually delete glance rsync image cronjob from the glancesync user in labcontrol1001.wikimedia.org (leftover after glance merge in main/eqiad1)
[11:03:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:05] <moritzm>	 !log upgrading wikidiff to 1.7.2 on mw1339-mw1348 (HHVM bytecode cache is pruned during update)
[11:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:30] <wikibugs_>	 (03PS9) 10Vgutierrez: [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867
[11:12:21] <jynus>	 !log stopping db2042 for maintenance T202051
[11:12:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:28] <stashbot>	 T202051: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051
[11:31:28] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.32.0-wmf.16/maintenance/populateChangeTagDef.php: SWAT: [[gerrit:452950|Add option to populateChangeTagDef not to update the count]] (duration: 00m 53s)
[11:31:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:33] <Amir1>	 !log EU mid-day SWAT is done
[11:34:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:41] <arturo>	 !log T201473 copy `prometheus-pdns-exporter` from trusty-wikimedia to jessie-wikimedia in reprepro
[11:37:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:48] <stashbot>	 T201473: prometheus-pdns-exporter for Jessie? - https://phabricator.wikimedia.org/T201473
[11:45:48] <wikibugs_>	 10Operations, 10ops-codfw, 10DBA: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (10jcrespo) 05Open>03Resolved a:03jcrespo Solved with a reboot, let's reopen if it happens after some time CC @Marostegui @Papaul.
[11:46:09] <wikibugs_>	 (03PS1) 10Ema: ATS: fix routing to Restbase [puppet] - 10https://gerrit.wikimedia.org/r/453111 (https://phabricator.wikimedia.org/T199720)
[11:51:17] <wikibugs_>	 (03PS10) 10Vgutierrez: [WIP] Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867
[11:56:34] <wikibugs_>	 (03PS3) 10Giuseppe Lavagetto: PHP: create module for modern Debian-based distributions [puppet] - 10https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140)
[11:56:35] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: mediawiki: move php to a profile, use the php class [puppet] - 10https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140)
[11:57:23] <wikibugs_>	 (03PS11) 10Vgutierrez: Refactor certcentral.certificate_management() [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867
[12:00:17] <moritzm>	 !log installing ruby2.3 security updates
[12:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:54] <wikibugs_>	 (03PS1) 10Arturo Borrero Gonzalez: d/rules: prevent dh_installinit from installing sysvinit files [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473)
[12:06:16] <gehel>	 !log depooling wdqs[12]003 to catchup on updates
[12:06:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:06] <arturo>	 !log T201473 install a new version of `prometheus-pdns-exporter` (0.3) into jessie-wikimedia, due to errors in the postinst script
[12:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:13] <stashbot>	 T201473: prometheus-pdns-exporter for Jessie? - https://phabricator.wikimedia.org/T201473
[12:08:15] <wikibugs_>	 (03PS1) 10Arturo Borrero Gonzalez: d/changelog: generate new entry for v0.3 [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453115 (https://phabricator.wikimedia.org/T201473)
[12:09:43] <wikibugs_>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] d/rules: prevent dh_installinit from installing sysvinit files [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[12:10:12] <wikibugs_>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] d/changelog: generate new entry for v0.3 [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453115 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[12:11:52] <icinga-wm>	 RECOVERY - puppet last run on cloudservices1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:12:24] <arturo>	 \o/ 
[12:15:19] <wikibugs_>	 (03CR) 10Muehlenhoff: "That's not really needed? If a systemd unit is around,systemd will simply ignore the sysvinit script." [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[12:17:31] <wikibugs_>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] "dh_installinit will put some code in postinst that will try to call invoke-rc.d for prometheus-pdns-exporter, which doesn't exists, and pa" [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[12:40:09] <wikibugs_>	 (03CR) 10Muehlenhoff: "invoke-rc.d is shipped in sysv-rc which is "priority: required", that should not happen. It's also installed on cloudservices1003?" [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[12:44:08] <moritzm>	 !log upgrading wikidiff to 1.7.2 on labweb* (HHVM bytecode cache is pruned during update)
[12:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:27] <gehel>	 !log restarting blazegraph on wdqs[12]003
[13:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:39] <moritzm>	 !log rebooting serpens for kernel security update
[13:15:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:03] <icinga-wm>	 PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[13:21:15] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/453122
[13:31:26] <moritzm>	 !log rebooting seaborgium for kernel security update
[13:31:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:33:53] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[13:35:05] <wikibugs_>	 (03PS16) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/451657 (https://phabricator.wikimedia.org/T171394)
[13:35:07] <wikibugs_>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] "> invoke-rc.d is shipped in sysv-rc which is "priority: required"," [debs/prometheus-pdns-exporter] - 10https://gerrit.wikimedia.org/r/453112 (https://phabricator.wikimedia.org/T201473) (owner: 10Arturo Borrero Gonzalez)
[13:35:36] <jynus>	 there has been a high number of criticals in the last half an hour
[13:36:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[13:39:32] <Amir1>	 This basic grammar fix up: https://gerrit.wikimedia.org/r/c/operations/puppet/+/452716 
[13:39:36] <Amir1>	 that would be great
[13:42:13] <icinga-wm>	 RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:43:38] <wikibugs_>	 (03PS1) 10Vgutierrez: Implement different Certificate.save() modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/453124
[13:44:41] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Implement different Certificate.save() modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/453124 (owner: 10Vgutierrez)
[13:46:14] <moritzm>	 !log rebooting labtestservices2002/2003 for kernel security update
[13:46:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:40] <wikibugs_>	 (03CR) 10Andrew Bogott: "That hash seems to originally come from 44d4872620e30f47a3465f01b2f3e9f12e3634a4  -- I guess I assumed that it was a dummy :)  I'll refres" [puppet] - 10https://gerrit.wikimedia.org/r/452997 (owner: 10Andrew Bogott)
[13:47:51] <wikibugs_>	 (03Restored) 10Gehel: [WIP] extract reporting from BaseEventHandler [software/cumin] - 10https://gerrit.wikimedia.org/r/451080 (owner: 10Gehel)
[13:48:21] <wikibugs_>	 (03PS2) 10Vgutierrez: Implement different Certificate.save() modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/453124
[13:52:41] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 031] "LGTM, thanks for implementing the ; Only doubt I had is you stop getting spammed by its output, which looks like a net win, but it's a cha" [puppet] - 10https://gerrit.wikimedia.org/r/451657 (https://phabricator.wikimedia.org/T171394) (owner: 10Bstorm)
[13:53:13] <wikibugs_>	 (03PS1) 10BBlack: puppetmaster: use strong ciphers only [puppet] - 10https://gerrit.wikimedia.org/r/453126
[13:53:15] <wikibugs_>	 (03PS1) 10BBlack: tlsproxy: no-op rename of params to tlsproxy namespace [puppet] - 10https://gerrit.wikimedia.org/r/453127
[13:53:17] <wikibugs_>	 (03PS1) 10BBlack: tlsproxy: parameterize ciphersuite level [puppet] - 10https://gerrit.wikimedia.org/r/453128
[13:53:19] <wikibugs_>	 (03PS1) 10BBlack: role::cache::*: explicit tlsproxy compat level [puppet] - 10https://gerrit.wikimedia.org/r/453129
[13:53:21] <wikibugs_>	 (03PS1) 10BBlack: tlsproxy: default ciphersuite_level strong [puppet] - 10https://gerrit.wikimedia.org/r/453130
[13:54:30] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] tlsproxy: parameterize ciphersuite level [puppet] - 10https://gerrit.wikimedia.org/r/453128 (owner: 10BBlack)
[13:54:38] <moritzm>	 !log rebooting labtestvirt2003 for kernel security update
[13:54:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:05] <wikibugs_>	 (03CR) 10Bstorm: [C: 032] labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/451657 (https://phabricator.wikimedia.org/T171394) (owner: 10Bstorm)
[13:58:33] <paravoid>	 bstorm_: oh wow, this is awesome, nice!
[13:59:20] <bstorm_>	 :)
[14:01:15] <bstorm_>	 Except it has a problem on the server that I didn't see in the compiler, lol.  Shouldn't be hard to fix
[14:01:33] <arturo>	 ?
[14:01:40] <arturo>	 something from wikibugs?
[14:01:42] <icinga-wm>	 PROBLEM - puppet last run on labstore2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:02:22] <paravoid>	 bstorm_: I'm curious, why keep the sunday => Sun mapping and not just convert callers to pass Sun/Mon/Tue as $weekday?
[14:03:12] <bstorm_>	 I was keying off how it was done originally in that.  I might change that in the next patch (needed to fix the dependency).  I thought the systemd:unit would be enough.  It wants a systemd:service :-p
[14:03:45] <bstorm_>	 The compiler didn't error, which is weird *shrugs*
[14:04:09] <bstorm_>	 Gonna revert it and fix it up quick
[14:04:41] <wikibugs_>	 (03PS1) 10Bstorm: Revert "labstore: Change backup cron to a systemd timer" [puppet] - 10https://gerrit.wikimedia.org/r/453135
[14:04:43] <icinga-wm>	 PROBLEM - IPMI Sensor Status on elastic1022 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
[14:06:00] <wikibugs_>	 (03CR) 10Bstorm: [C: 032] Revert "labstore: Change backup cron to a systemd timer" [puppet] - 10https://gerrit.wikimedia.org/r/453135 (owner: 10Bstorm)
[14:07:29] <wikibugs_>	 (03CR) 10Alex Monk: [C: 04-1] "The TODO in this make it seem like this commit completely breaks our LE cert issuance?" (031 comment) [software/certcentral] - 10https://gerrit.wikimedia.org/r/451866 (owner: 10Vgutierrez)
[14:09:00] <_joe_>	 bstorm_: what was the problem?
[14:09:35] <bstorm_>	 It depends on systemd:service, but I had done a systemd:unit beforehand.  It said there was no systemd:service with that name in the catalog
[14:09:38] <bstorm_>	 Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Invalid relationship: Systemd::Service[block_sync] { require => Systemd::Service[block_sync.service] }, because Systemd::Service[block_sync.service] doesn't seem to be in the catalog
[14:09:51] <_joe_>	 oh right
[14:10:01] <bstorm_>	 So I'll adjust that and put it back :)
[14:10:01] <_joe_>	 dependencies are resolved by the agent
[14:10:03] <_joe_>	 not the master
[14:10:08] <bstorm_>	 That makes sense
[14:10:15] <_joe_>	 it's the biggest limitation of our compiler
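Editorial note on the exchange above: the failure bstorm_ pasted is a relationship whose target is not in the catalog. The following is a minimal sketch of that kind of check, assuming a toy catalog modelled as a plain dict. The `missing_dependencies` helper and the `Systemd::Unit[block_sync.service]` title are hypothetical illustrations, not Puppet's actual implementation or the real manifest.

```python
def missing_dependencies(catalog):
    """Return (resource, target) pairs whose 'require' target is absent from the catalog."""
    missing = []
    for name, params in catalog.items():
        for target in params.get("require", []):
            if target not in catalog:
                missing.append((name, target))
    return missing


# Toy catalog (assumed shape): a systemd unit was declared, but the
# relationship points at a Systemd::Service title that was never declared.
catalog = {
    "Systemd::Unit[block_sync.service]": {},
    "Systemd::Service[block_sync]": {
        "require": ["Systemd::Service[block_sync.service]"],
    },
}

for resource, target in missing_dependencies(catalog):
    print(f"Invalid relationship: {resource} requires {target}, which is not in the catalog")
```

In this model the check needs the full set of declared resources, so it can only run once every resource is known.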
[14:10:39] <wikibugs_>	 (03CR) 10Alex Monk: [C: 032] Implement different Certificate.save() modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/453124 (owner: 10Vgutierrez)
[14:11:37] <moritzm>	 !log rebooting labtestweb2001 for kernel security update
[14:11:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:43] <icinga-wm>	 RECOVERY - puppet last run on labstore2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[14:12:02] <icinga-wm>	 PROBLEM - puppet last run on labstore2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:13:43] <Guest49527>	 Is anyone else missing Phab boards? Our backlog board seems to have disappeared today
[14:13:45] <wikibugs_>	 (03PS2) 10Ema: ATS: fix routing to Restbase [puppet] - 10https://gerrit.wikimedia.org/r/453111 (https://phabricator.wikimedia.org/T199720)
[14:13:47] <Guest49527>	 https://phabricator.wikimedia.org/tag/readers-web-backlog/ https://phabricator.wikimedia.org/project/board/67/
[14:14:35] <wikibugs_>	 (03CR) 10Ema: [C: 032] ATS: fix routing to Restbase [puppet] - 10https://gerrit.wikimedia.org/r/453111 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema)
[14:16:55] <wikibugs_>	 (03PS1) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[14:22:26] <gehel>	 !log repooling wdqs[12]003 
[14:22:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:43] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp4024.ulsfo.wmnet', 'cp2002.codfw.wmnet'] ``` The log can be found in `/var/l...
[14:25:18] <wikibugs_>	 (03PS2) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[14:25:29] <XioNoX>	 !log starting moving asw2-a-eqiad servers' uplinks for T201694
[14:25:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:35] <stashbot>	 T201694: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694
[14:27:04] <cmjohnson1>	 !log lvs1015 moving cross connect from asw2-a2 to asw2-a5  T201694 
[14:27:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:41] <wikibugs_>	 (03PS4) 10Giuseppe Lavagetto: mediawiki::web::prod_sites: move the other private wikis to the define [puppet] - 10https://gerrit.wikimedia.org/r/451255 (https://phabricator.wikimedia.org/T196968)
[14:28:27] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::web::prod_sites: move the other private wikis to the define [puppet] - 10https://gerrit.wikimedia.org/r/451255 (https://phabricator.wikimedia.org/T196968) (owner: 10Giuseppe Lavagetto)
[14:28:55] <wikibugs_>	 (03CR) 10Alex Monk: [C: 04-1] Refactor certcentral.certificate_management() (036 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (owner: 10Vgutierrez)
[14:29:04] <wikibugs_>	 (03PS3) 10Jcrespo: mariadb: Set s2 in read only mode due to maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452620 (https://phabricator.wikimedia.org/T201694)
[14:33:44] <cmjohnson1>	 !log cloudelastic1001 moving uplink from asw2-a eqiad to asw2-a2
[14:33:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:57] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/453122 (owner: 10Jcrespo)
[14:34:30] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Sunset Watchmouse's status.wikimedia.org - https://phabricator.wikimedia.org/T199816 (10waldyrious) >>! In T199816#4444683, @fgiunchedi wrote: > I've setup a very bare deprecation page for status.wikimedia.org, we can sunset the DNS name in...
[14:34:57] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@166eafa]: Update mobileapps to a808c9d (T201979)
[14:35:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:03] <stashbot>	 T201979: Fix usage of deprecated API query pattern(s) - https://phabricator.wikimedia.org/T201979
[14:35:25] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/453122 (owner: 10Jcrespo)
[14:37:11] <icinga-wm>	 RECOVERY - puppet last run on labstore2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:37:49] <wikibugs_>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool es1018 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/453144
[14:38:32] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Depool es2018 (duration: 00m 55s)
[14:38:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:40] <wikibugs_>	 (03PS4) 10Jcrespo: mariadb: Set s2 in read only mode due to maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452620 (https://phabricator.wikimedia.org/T201694)
[14:39:47] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
[14:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:30] <cmjohnson1>	 !log dns1001 moving uplink from asw2-a eqiad to asw-a-eqiad
[14:40:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:00] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@166eafa]: Update mobileapps to a808c9d (T201979) (duration: 06m 03s)
[14:41:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:06] <stashbot>	 T201979: Fix usage of deprecated API query pattern(s) - https://phabricator.wikimedia.org/T201979
[14:43:10] <cmjohnson1>	 !log dbproxy1012 moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:35] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
[14:43:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:52] <cmjohnson1>	 !log labstore1008 moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:30] <cmjohnson1>	 !log db1116  moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:46:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:21] <wikibugs_>	 (03PS3) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[14:47:40] <wikibugs_>	 (03CR) 10Vgutierrez: Refactor certcentral.certificate_management() (033 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (owner: 10Vgutierrez)
[14:49:24] <cmjohnson1>	 !log db1118 moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:49:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:42] <wikibugs_>	 (03CR) 10Alex Monk: [C: 04-1] Refactor certcentral.certificate_management() (032 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (owner: 10Vgutierrez)
[14:49:54] <wikibugs_>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4024.ulsfo.wmnet'] ```  Of which those **FAILED**: ``` ['cp4024.ulsfo.wmnet'] ```
[14:49:59] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Depool es1018 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/453122 (owner: 10Jcrespo)
[14:50:15] <icinga-wm>	 RECOVERY - Host mw2184 is UP: PING OK - Packet loss = 0%, RTA = 36.19 ms
[14:50:44] <cmjohnson1>	 !log db1066 moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:56] <wikibugs_>	 10Operations, 10monitoring: Implement an accurate and easy to understand status page for all wikis - https://phabricator.wikimedia.org/T202061 (10Imarlier)
[14:51:40] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Sunset Watchmouse's status.wikimedia.org - https://phabricator.wikimedia.org/T199816 (10Imarlier) @waldyrious Good point.  I added T202061 as a task to implement a replacement.
[14:51:57] <papaul>	 !log shutting down mw2184 for maintenance 
[14:52:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:18] <wikibugs_>	 (03PS4) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[14:53:04] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394) (owner: 10Bstorm)
[14:53:25] <icinga-wm>	 PROBLEM - Host mw2184 is DOWN: PING CRITICAL - Packet loss = 100%
[14:54:42] <cmjohnson1>	 !log labstore1009  moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:54:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:15] <cmjohnson1>	 !log dbproxy1013  moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:56:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:06] <cmjohnson1>	 !log ms-be1040  moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:57:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:58] <wikibugs_>	 (03Abandoned) 10Jcrespo: mariadb: Set s2 in read only mode due to maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452620 (https://phabricator.wikimedia.org/T201694) (owner: 10Jcrespo)
[14:58:18] <wikibugs_>	 (03Abandoned) 10Jcrespo: mariadb: Set s2 as read-write and promote db1122 as the new s2 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452632 (https://phabricator.wikimedia.org/T201694) (owner: 10Jcrespo)
[14:58:39] <wikibugs_>	 (03Abandoned) 10Jcrespo: mariadb: Failover db1066 (eqiad s2 master) to db1122 [puppet] - 10https://gerrit.wikimedia.org/r/452637 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo)
[14:58:54] <wikibugs_>	 (03Abandoned) 10Jcrespo: mariadb: Point s2-master CNAME to db1122 [dns] - 10https://gerrit.wikimedia.org/r/452642 (https://phabricator.wikimedia.org/T201694) (owner: 10Jcrespo)
[14:58:54] <cmjohnson1>	 !log torrelay1001  moving uplink from asw2-a-eqiad to asw-a-eqiad
[14:58:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:56] <icinga-wm>	 PROBLEM - Host mw2184.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:00:30] <jynus>	 !log stopping es2018 for upgrade
[15:00:33] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp107[56]\.eqiad\.wmnet
[15:00:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:46] <wikibugs_>	 (03PS2) 10Volans: Log: use local variable for dry_run [software/spicerack] - 10https://gerrit.wikimedia.org/r/452379 (https://phabricator.wikimedia.org/T199079)
[15:00:52] <wikibugs_>	 (03PS2) 10Volans: dry-run: remove the module, inject the parameter [software/spicerack] - 10https://gerrit.wikimedia.org/r/452378 (https://phabricator.wikimedia.org/T199079)
[15:00:54] <wikibugs_>	 (03PS13) 10Volans: Add cookbook entry point script [software/spicerack] - 10https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079)
[15:00:59] <wikibugs_>	 (03PS1) 10Volans: tests: enable pytest logging [software/spicerack] - 10https://gerrit.wikimedia.org/r/453145 (https://phabricator.wikimedia.org/T199079)
[15:01:01] <wikibugs_>	 (03PS1) 10Volans: config: fix docstring [software/spicerack] - 10https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079)
[15:01:03] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp107[56]\.eqiad\.wmnet
[15:01:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:16] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp107[56]\.eqiad\.wmnet
[15:01:18] <wikibugs_>	 (03CR) 10Volans: "inline" (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:01:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:24] <wikibugs_>	 10Operations, 10ops-eqiad, 10DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (10RobH) I did not check this, just didn't notice it assigned to me.  The Tech Direct doesn't work, was normal support attempted?  I've emailed our team, & CCed Chris.    > Dell Team, >  > We're experiencin...
[15:01:26] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Log: use local variable for dry_run [software/spicerack] - 10https://gerrit.wikimedia.org/r/452379 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:01:28] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] dry-run: remove the module, inject the parameter [software/spicerack] - 10https://gerrit.wikimedia.org/r/452378 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:01:30] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] tests: enable pytest logging [software/spicerack] - 10https://gerrit.wikimedia.org/r/453145 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:01:32] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] config: fix docstring [software/spicerack] - 10https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:01:38] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Add cookbook entry point script [software/spicerack] - 10https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:02:04] <volans>	 why?
[15:02:28] <bblack>	 because it's a jerk
[15:02:38] <volans>	 they all passed locally
[15:02:44] <cmjohnson1>	 !log cp1075 moving uplink from asw2-a-eqiad to asw-a-eqiad
[15:02:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:49] <papaul>	 _joe_: np
[15:03:45] <cmjohnson1>	 !log cp1076 moving uplink from asw2-a-eqiad to asw-a-eqiad
[15:03:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:29] <wikibugs_>	 (03PS5) 10Bstorm: labstore: Change backup cron to a systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[15:04:50] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp107[56]\.eqiad\.wmnet
[15:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:10] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp107[78]\.eqiad\.wmnet
[15:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:20] <volans>	 damn bugged test-dependencies
[15:05:52] <cmjohnson1>	 !log cp107[78] moving uplink from asw2-a-eqiad to asw-a-eqiad
[15:05:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:41] <wikibugs_>	 (03CR) 10Gehel: [C: 031] "LGTM, trivial" [software/spicerack] - 10https://gerrit.wikimedia.org/r/452379 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:07:19] <volans>	 gehel: the prospector issues are https://github.com/PyCQA/prospector/issues/276 :(
[15:07:31] <logmsgbot>	 !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp107[78]\.eqiad\.wmnet
[15:07:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:05] <volans>	 I can guarantee that all was good with 2.3.1 and all tests were passing
[15:08:41] <icinga-wm>	 RECOVERY - Host mw2184 is UP: PING OK - Packet loss = 0%, RTA = 36.23 ms
[15:08:44] <wikibugs_>	 (03CR) 10Gehel: "LGTM, trivial enough" [software/spicerack] - 10https://gerrit.wikimedia.org/r/453145 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:08:48] <wikibugs_>	 (03CR) 10Gehel: [C: 031] tests: enable pytest logging [software/spicerack] - 10https://gerrit.wikimedia.org/r/453145 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans)
[15:09:12] <bblack>	 !log stopping pybal on lvs1016 to fail traffic to lvs1006 for T201694
[15:09:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:18] <stashbot>	 T201694: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694
[15:09:39] <wikibugs_>	 (CR) Gehel: [C: 1] "LGTM, trivial enough" [software/spicerack] - https://gerrit.wikimedia.org/r/452378 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:09:49] <wikibugs_>	 (CR) Volans: [V: 2 C: 2] Log: use local variable for dry_run [software/spicerack] - https://gerrit.wikimedia.org/r/452379 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:10:00] <wikibugs_>	 (CR) Bstorm: [C: 2] labstore: Change backup cron to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[15:10:14] <wikibugs_>	 (PS6) Bstorm: labstore: Change backup cron to a systemd timer [puppet] - https://gerrit.wikimedia.org/r/453137 (https://phabricator.wikimedia.org/T171394)
[15:10:31] <wikibugs_>	 (CR) Volans: [V: 2 C: 2] tests: enable pytest logging [software/spicerack] - https://gerrit.wikimedia.org/r/453145 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:10:38] <wikibugs_>	 (CR) Gehel: [C: 1] "LGTM" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:11:22] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[15:11:24] <wikibugs_>	 (CR) Volans: [V: 2 C: 2] dry-run: remove the module, inject the parameter (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/452378 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:11:42] <icinga-wm>	 PROBLEM - pybal on lvs1016 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[15:11:52] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1016 is CRITICAL: CRITICAL: 0 connections established with conf1001.eqiad.wmnet:2379 (min=42)
[15:11:55] <bblack>	 ^ lvs1016 alerts expected, see logmsg earlier
[15:12:07] <wikibugs_>	 (CR) Vgutierrez: Refactor certcentral.certificate_management() (1 comment) [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (owner: Vgutierrez)
[15:13:21] <cmjohnson1>	 !log  lvs1016 moving uplink from asw2-a-eqiad to asw-a-eqiad
[15:13:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:41] <wikibugs_>	 Operations, Cassandra: cassandra-a instance on aqs1007 is not starting - https://phabricator.wikimedia.org/T201986 (Eevans) Just for posterity sake: I don't know why the log would have been corrupted like this (almost certainly a bug), but the commitlog only exists to append incoming writes until what wa...
[15:16:23] <bblack>	 !log restarting pybal on lvs1016 - T201694
[15:16:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:30] <stashbot>	 T201694: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694
[15:16:42] <icinga-wm>	 RECOVERY - pybal on lvs1016 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[15:16:52] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1016 is OK: OK: 42 connections established with conf1001.eqiad.wmnet:2379 (min=42)
[15:16:52] <wikibugs_>	 (CR) Vgutierrez: [C: -2] "sigh.. I've swapped CHALLENGES_SOLVED and CHALLENGES_PUSHED status implementation, back to WIP :(" [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (owner: Vgutierrez)
[15:17:31] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy
[15:17:32] <icinga-wm>	 PROBLEM - Host mw2184 is DOWN: PING CRITICAL - Packet loss = 100%
[15:19:46] <wikibugs_>	 (PS2) Jcrespo: Revert "mariadb: Depool es1018 for maintenance" and depool es1019 [mediawiki-config] - https://gerrit.wikimedia.org/r/453144
[15:20:20] <wikibugs_>	 Operations, ops-eqiad: rack/setup/install puppetmaster1003.eqiad.wmnet - https://phabricator.wikimedia.org/T201342 (Cmjohnson)
[15:21:17] <wikibugs_>	 (PS3) Jcrespo: Revert "mariadb: Depool es1018 for maintenance" and depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/453144
[15:21:21] <icinga-wm>	 RECOVERY - Host mw2184 is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms
[15:21:52] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/install mwmaint1002.eqiad.wmnet - https://phabricator.wikimedia.org/T201343 (Cmjohnson)
[15:22:10] <wikibugs_>	 Operations, ops-eqiad, Operations-Software-Development: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) - https://phabricator.wikimedia.org/T201346 (Cmjohnson)
[15:23:21] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2184 is CRITICAL: Host mw2184 is not in mediawiki-installation dsh group
[15:23:32] <wikibugs_>	 Operations, ops-eqiad: rack/setup/install sulfur.wikimedia.org - https://phabricator.wikimedia.org/T201364 (Cmjohnson)
[15:23:55] <wikibugs_>	 Operations, ops-eqiad, Parsoid: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Cmjohnson)
[15:24:17] <wikibugs_>	 Operations, ops-eqiad: rack/setup/add to spares tracking 2 dual cpu misc system - https://phabricator.wikimedia.org/T201367 (Cmjohnson)
[15:24:49] <wikibugs_>	 Operations, ops-eqiad: rack/setup/add to spares tracking 2 dual cpu misc system - https://phabricator.wikimedia.org/T201367 (Cmjohnson)
[15:25:19] <wikibugs_>	 Operations, ops-eqiad: rack/setup/add to spares tracking 2 dual cpu misc system - https://phabricator.wikimedia.org/T201367 (Cmjohnson)
[15:25:27] <wikibugs_>	 Operations, ops-eqiad: rack/setup/add to spares tracking 2 dual cpu misc system - https://phabricator.wikimedia.org/T201367 (Cmjohnson) Open>Resolved
[15:25:31] <wikibugs_>	 Operations, SRE-Access-Requests: Requesting access to view EventLogging data - https://phabricator.wikimedia.org/T202063 (Tim_WMDE)
[15:26:38] <wikibugs_>	 (CR) Jcrespo: [C: 2] Revert "mariadb: Depool es1018 for maintenance" and depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/453144 (owner: Jcrespo)
[15:27:57] <wikibugs_>	 (Merged) jenkins-bot: Revert "mariadb: Depool es1018 for maintenance" and depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/453144 (owner: Jcrespo)
[15:29:27] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Repool es2018, depool es2019 (duration: 00m 50s)
[15:29:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:15] <wikibugs_>	 Operations: onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201855 (jijiki)
[15:31:17] <wikibugs_>	 Operations, Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (jijiki)
[15:31:20] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: Request production global root access for Effie Mouzeli - https://phabricator.wikimedia.org/T201849 (jijiki) Open>Resolved
[15:34:31] <jynus>	 !log stopping es2019 for upgrade
[15:34:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:47] <wikibugs_>	 Operations, ops-codfw: mw2184 stuck after reboot - https://phabricator.wikimedia.org/T202006 (Papaul) a:Papaul>MoritzMuehlenhoff This is what was showing  {F25020319} - Drain the power - Upgrade BIOS from version 2.3.3 to 2.6.0 - Upgrade IDRAC from 1.4.2 to 2.60 server is back up
[15:38:30] <wikibugs_>	 (PS2) Volans: config: fix docstring [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079)
[15:38:32] <wikibugs_>	 (PS14) Volans: Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079)
[15:38:39] <wikibugs_>	 (CR) Volans: "done" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:39:24] <wikibugs_>	 (CR) jerkins-bot: [V: -1] config: fix docstring [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:39:26] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:42:47] <wikibugs_>	 Operations, ops-codfw: mw2184 stuck after reboot - https://phabricator.wikimedia.org/T202006 (MoritzMuehlenhoff) Open>Resolved Thanks! I've run "scap pull" and repooled the server.
[15:44:04] <wikibugs_>	 (PS1) Cmjohnson: Removing second mgmt dns entry for scandium [dns] - https://gerrit.wikimedia.org/r/453150 (https://phabricator.wikimedia.org/T201366)
[15:44:20] <wikibugs_>	 (CR) Cmjohnson: [C: 2] Removing second mgmt dns entry for scandium [dns] - https://gerrit.wikimedia.org/r/453150 (https://phabricator.wikimedia.org/T201366) (owner: Cmjohnson)
[15:45:49] <wikibugs_>	 (PS1) Jcrespo: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/453151
[15:46:32] <wikibugs_>	 (PS3) Volans: config: rename parameter to avoid negation [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079)
[15:46:34] <wikibugs_>	 (PS15) Volans: Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079)
[15:46:39] <volans>	 sorry for the spam of -1
[15:47:15] <wikibugs_>	 Operations, netops, Wikimedia-Incident: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (ayounsi)
[15:47:16] <wikibugs_>	 (CR) jerkins-bot: [V: -1] config: rename parameter to avoid negation [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:47:17] <wikibugs_>	 Operations, netops, Patch-For-Review: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694 (ayounsi) Open>Resolved a:Cmjohnson
[15:47:24] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:47:32] <wikibugs_>	 (PS1) Cmjohnson: Removing mgmt dns for decom host labsdb1001-3 [dns] - https://gerrit.wikimedia.org/r/453152 (https://phabricator.wikimedia.org/T184832)
[15:48:06] <wikibugs_>	 (CR) Cmjohnson: [C: 2] Removing mgmt dns for decom host labsdb1001-3 [dns] - https://gerrit.wikimedia.org/r/453152 (https://phabricator.wikimedia.org/T184832) (owner: Cmjohnson)
[15:50:07] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/453151 (owner: Jcrespo)
[15:51:22] <wikibugs_>	 (CR) jenkins-bot: Revert "mariadb: Depool es1018 for maintenance" and depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/453144 (owner: Jcrespo)
[15:51:24] <wikibugs_>	 (Merged) jenkins-bot: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/453151 (owner: Jcrespo)
[15:51:41] <wikibugs_>	 (CR) jenkins-bot: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/453151 (owner: Jcrespo)
[15:51:51] <wikibugs_>	 Operations, ops-eqiad, decommission, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832 (Cmjohnson)
[15:52:57] <Krenair>	 herron, hey, just wanted to check in to see where we're at with https://gerrit.wikimedia.org/r/439774 and/or https://gerrit.wikimedia.org/r/439791 ?
[15:53:48] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Repool es2019 (duration: 00m 51s)
[15:53:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:32] <wikibugs_>	 (CR) Gehel: "Minor comments inline, otherwise LGTM" (6 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:55:49] <jynus>	 !log stopping db2034 for upgrade
[15:55:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:33] <wikibugs_>	 (PS1) Cmjohnson: Removing mgmt dns for decom server zinc [dns] - https://gerrit.wikimedia.org/r/453153 (https://phabricator.wikimedia.org/T191352)
[15:56:35] <wikibugs_>	 Operations, SRE-Access-Requests: Requesting access to view EventLogging data - https://phabricator.wikimedia.org/T202069 (Tonina_Zhelyazkova_WMDE)
[15:56:54] <wikibugs_>	 (CR) Cmjohnson: [C: 2] Removing mgmt dns for decom server zinc [dns] - https://gerrit.wikimedia.org/r/453153 (https://phabricator.wikimedia.org/T191352) (owner: Cmjohnson)
[15:57:45] <wikibugs_>	 Operations, ops-codfw, Traffic, Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul)
[15:58:09] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom zinc/WMF3298 - https://phabricator.wikimedia.org/T191352 (Cmjohnson)
[15:58:21] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473 (Cmjohnson)
[15:58:25] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom zinc/WMF3298 - https://phabricator.wikimedia.org/T191352 (Cmjohnson) Open>Resolved a:Cmjohnson
[15:58:43] <wikibugs_>	 Operations, ops-eqiad, DC-Ops: Decommission server zinc - https://phabricator.wikimedia.org/T182016 (Cmjohnson) Open>Resolved duplicate
[16:00:04] <jouncebot>	 godog, moritzm, and _joe_: Dear deployers, time to do the Puppet SWAT(Max 6 patches) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1600).
[16:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:00:34] <wikibugs_>	 (PS1) Cmjohnson: Removing mgmt dns for decom host vanadium [dns] - https://gerrit.wikimedia.org/r/453154 (https://phabricator.wikimedia.org/T191351)
[16:01:08] <wikibugs_>	 Operations, DC-Ops, cloud-services-team, netops: Refresh switch ports descriptions for recently renamed cloud servers - https://phabricator.wikimedia.org/T201444 (RobH) p:Triage>Normal
[16:04:06] <wikibugs_>	 (CR) Cmjohnson: [C: 2] Removing mgmt dns for decom host vanadium [dns] - https://gerrit.wikimedia.org/r/453154 (https://phabricator.wikimedia.org/T191351) (owner: Cmjohnson)
[16:04:45] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom vanadium/WMF3291 - https://phabricator.wikimedia.org/T191351 (Cmjohnson)
[16:04:53] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473 (Cmjohnson)
[16:04:55] <wikibugs_>	 Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom vanadium/WMF3291 - https://phabricator.wikimedia.org/T191351 (Cmjohnson) Open>Resolved a:Cmjohnson
[16:12:32] <gehel>	 !log banning, depooling and shutting down elastic1029 for memory replacement - T201991
[16:12:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:39] <stashbot>	 T201991: Broken memory on elastic1029 - https://phabricator.wikimedia.org/T201991
[16:14:54] <icinga-wm>	 PROBLEM - Host elastic1029 is DOWN: PING CRITICAL - Packet loss = 100%
[16:16:28] <gehel>	 damn, elastic1029 is obviously me 
[16:17:40] <wikibugs_>	 (PS1) Bstorm: labstore: trying to make dependency issues work [puppet] - https://gerrit.wikimedia.org/r/453156 (https://phabricator.wikimedia.org/T171394)
[16:18:54] <icinga-wm>	 PROBLEM - Host elastic1029.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:22:23] <icinga-wm>	 RECOVERY - Host elastic1029 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[16:23:23] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2184 is OK: OK
[16:24:13] <icinga-wm>	 RECOVERY - Host elastic1029.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms
[16:25:29] <wikibugs_>	 Operations, ops-eqiad: Broken memory on elastic1029 - https://phabricator.wikimedia.org/T201991 (Cmjohnson) I reseated the DIMM  and moved  all on side A to side B.  Powered on and server came back normally.
[16:32:20] <wikibugs_>	 Operations, LDAP-Access-Requests, User-Addshore: Give access to graphite and grafana-admin to Aleksey Bekh-Ivanov (WMDE) - https://phabricator.wikimedia.org/T199233 (RStallman-legalteam) NDA is fully signed and on file with legal. Thanks!
[16:35:07] <wikibugs_>	 (PS1) Urbanecm: Throttle exeptions for Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/453160 (https://phabricator.wikimedia.org/T202038)
[16:35:59] <wikibugs_>	 Operations, SRE-Access-Requests: Requesting Access to view EventLogging data - https://phabricator.wikimedia.org/T202072 (gabriel-wmde)
[16:37:55] <Urbanecm>	 jouncebot, next
[16:37:55] <jouncebot>	 In 0 hour(s) and 22 minute(s): Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1700)
[16:42:31] <wikibugs_>	 (PS1) Cmjohnson: Adding mgmt dns for analyticsmaster1001-2 [dns] - https://gerrit.wikimedia.org/r/453161 (https://phabricator.wikimedia.org/T201939)
[16:43:00] <wikibugs_>	 (CR) Cmjohnson: [C: 2] Adding mgmt dns for analyticsmaster1001-2 [dns] - https://gerrit.wikimedia.org/r/453161 (https://phabricator.wikimedia.org/T201939) (owner: Cmjohnson)
[16:43:34] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:46:35] <wikibugs_>	 (CR) Dzahn: [C: 2] "GPG is a great solution, though for some reason i got bad signature (probably missing whitespace or something like that). anyways, confirm" [puppet] - https://gerrit.wikimedia.org/r/452844 (https://phabricator.wikimedia.org/T201913) (owner: Dzahn)
[16:47:16] <wikibugs_>	 (PS2) Dzahn: admins: add new SSH key for Daniel Kinzler [puppet] - https://gerrit.wikimedia.org/r/452844 (https://phabricator.wikimedia.org/T201913)
[16:47:53] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:49:02] <wikibugs_>	 (PS16) Volans: Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079)
[16:49:04] <wikibugs_>	 (PS3) Volans: Add confctl module to interact with conftool [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079)
[16:49:25] <wikibugs_>	 (PS2) Bstorm: labstore: trying to make dependency issues work [puppet] - https://gerrit.wikimedia.org/r/453156 (https://phabricator.wikimedia.org/T171394)
[16:49:35] <wikibugs_>	 (CR) Volans: "See inline, thanks a lot for the review!" (6 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[16:49:44] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[16:49:46] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add confctl module to interact with conftool [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[16:50:07] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (Dzahn) Yes, GPG signature was a great solution, though for some reason i got 'bad signature', probably a missing whitespace during copy/paste or similar.  Anyways, confirmed with a h...
[16:52:17] <wikibugs_>	 Operations, Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (Dzahn)
[16:52:31] <wikibugs_>	 Operations, ops-eqiad, netops: Move  asw2-a<->cr1 uplink back to asw-a - https://phabricator.wikimedia.org/T202075 (ayounsi) p:Triage>High
[16:57:00] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (daniel)
[16:57:13] <librenms-wmf>	 Critical Alert for device mr1-eqiad.wikimedia.org - Duplicate IP on mgmt network got acknowledged
[16:58:32] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (daniel) > Yes, GPG signature was a great solution, though for some reason i got 'bad signature', probably a missing whitespace during copy/paste or similar.  I guess I introduced a l...
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and Amir1: How many deployers does it take to do Services – Graphoid / Parsoid / Citoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1700).
[17:00:46] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (Dzahn) Open>Resolved a:Dzahn ran puppet on bastion hosts and mwmaint1001.  key has been updated there. all other hosts will follow automatically
[17:00:48] <wikibugs_>	 Operations, ops-eqiad, Analytics, Patch-For-Review: rack/setup/install analytics-master100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T201939 (Cmjohnson)
[17:01:18] <wikibugs_>	 (PS3) Bstorm: labstore: trying to make dependency issues work [puppet] - https://gerrit.wikimedia.org/r/453156 (https://phabricator.wikimedia.org/T171394)
[17:02:39] <wikibugs_>	 (CR) Bstorm: [C: 2] labstore: trying to make dependency issues work [puppet] - https://gerrit.wikimedia.org/r/453156 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[17:06:18] <wikibugs_>	 (PS1) Bstorm: Revert "labstore: trying to make dependency issues work" [puppet] - https://gerrit.wikimedia.org/r/453162
[17:07:21] <wikibugs_>	 (CR) Bstorm: [C: 2] Revert "labstore: trying to make dependency issues work" [puppet] - https://gerrit.wikimedia.org/r/453162 (owner: Bstorm)
[17:09:00] <awight>	 Nothing for ORES today
[17:09:47] <wikibugs_>	 (PS1) Ema: ATS: storage configuration [puppet] - https://gerrit.wikimedia.org/r/453164 (https://phabricator.wikimedia.org/T199720)
[17:10:23] <icinga-wm>	 PROBLEM - puppet last run on labstore2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[block_sync.service]
[17:10:33] <icinga-wm>	 PROBLEM - puppet last run on labstore2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[block_sync.service]
[17:21:59] <wikibugs_>	 (PS1) Catrope: Enable ORES filters for PageTriage on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/453168
[17:22:17] <wikibugs_>	 (PS2) Ema: ATS: storage configuration [puppet] - https://gerrit.wikimedia.org/r/453164 (https://phabricator.wikimedia.org/T199720)
[17:25:15] <wikibugs_>	 (PS3) Ema: ATS: storage configuration [puppet] - https://gerrit.wikimedia.org/r/453164 (https://phabricator.wikimedia.org/T199720)
[17:29:30] <wikibugs_>	 Operations, ops-eqiad, DBA: rack/setup/install dbproxy101[2-7].eqiad.wmnet - https://phabricator.wikimedia.org/T196690 (Cmjohnson) a:Cmjohnson>RobH dbproxy1015 had the same ip in the idrac. Fixed
[17:30:23] <icinga-wm>	 RECOVERY - puppet last run on labstore2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:35:34] <icinga-wm>	 RECOVERY - puppet last run on labstore2004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[17:36:19] <wikibugs_>	 Operations, SRE-Access-Requests, Patch-For-Review: new ssh key for daniel - https://phabricator.wikimedia.org/T201913 (Dzahn) I manually copied this key from here:  https://pgp.mit.edu/pks/lookup?op=get&search=0x7DB725DFC506256E  and imported it and then i could verify. ( i could not find it with --s...
[17:36:39] <gehel>	 !log reimaging elastic1029
[17:36:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:48] <wikibugs_>	 Operations, Discovery-Search (Current work), Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1029.eqiad.wmnet'] ``` The log...
[17:39:46] <wikibugs_>	 Operations, ops-eqiad: Broken memory on elastic1029 - https://phabricator.wikimedia.org/T201991 (Gehel) Open>Resolved a:Gehel Looking good!
[17:40:19] <wikibugs_>	 (PS1) RobH: dbproxy101[56] mac update [puppet] - https://gerrit.wikimedia.org/r/453171
[17:41:34] <wikibugs_>	 (CR) RobH: [C: 2] dbproxy101[56] mac update [puppet] - https://gerrit.wikimedia.org/r/453171 (owner: RobH)
[17:45:32] <wikibugs_>	 Operations, ops-eqiad, Discovery, Discovery-Search, Elasticsearch: check elastic1022 power supply redundancy - https://phabricator.wikimedia.org/T177631 (Gehel) @Cmjohnson confirms that there is still nothing in the H/W logs and the PSU seem to work correctly. IPMI reporting a false positive...
[17:55:45] <wikibugs_>	 Operations, ops-eqiad, Performance-Team: tungsten disk 1 and 8 SMART failure - https://phabricator.wikimedia.org/T193628 (Krinkle)
[17:57:43] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@fec00bc]: Push updated transfer-to-es oozie job
[17:57:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:52] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@fec00bc]: Push updated transfer-to-es oozie job (duration: 00m 08s)
[17:57:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:22] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@122080c]: push new python dependency handling
[17:59:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:41] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@122080c]: push new python dependency handling (duration: 00m 20s)
[17:59:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Morning SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1800).
[18:00:04] <jouncebot>	 Urbanecm and RoanKattouw: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:12] <Urbanecm>	 Present
[18:00:19] <wikibugs_>	 Operations, ops-eqiad, DBA: rack/setup/install dbproxy101[2-7].eqiad.wmnet - https://phabricator.wikimedia.org/T196690 (RobH)
[18:01:17] <wikibugs_>	 Operations, Discovery-Search (Current work), Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1029.eqiad.wmnet'] ```  and were **ALL** successful.
[18:02:20] <RoanKattouw>	 I'll do it
[18:02:38] <wikibugs_>	 (PS1) Bstorm: labstore and systemd: change timer module to use simpler interface [puppet] - https://gerrit.wikimedia.org/r/453173 (https://phabricator.wikimedia.org/T171394)
[18:03:14] <wikibugs_>	 (CR) jerkins-bot: [V: -1] labstore and systemd: change timer module to use simpler interface [puppet] - https://gerrit.wikimedia.org/r/453173 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[18:05:33] <wikibugs_>	 (PS2) Catrope: Throttle exeptions for Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/453160 (https://phabricator.wikimedia.org/T202038) (owner: Urbanecm)
[18:06:17] <wikibugs_>	 (CR) Catrope: [C: 2] Throttle exeptions for Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/453160 (https://phabricator.wikimedia.org/T202038) (owner: Urbanecm)
[18:06:29] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@d994cb9]: push new python dependency handling
[18:06:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:34] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@d994cb9]: push new python dependency handling (duration: 00m 05s)
[18:06:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:52] <Urbanecm>	 Hi RoanKattouw :)
[18:07:47] <wikibugs_>	 (Merged) jenkins-bot: Throttle exeptions for Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/453160 (https://phabricator.wikimedia.org/T202038) (owner: Urbanecm)
[18:08:02] <wikibugs_>	 (PS3) Ori.livneh: Declare /var/cache/coal_web [puppet] - https://gerrit.wikimedia.org/r/452953
[18:08:14] <wikibugs_>	 (PS2) Ori.livneh: Ensure coal-web caches are warm via a bi-hourly cron job [puppet] - https://gerrit.wikimedia.org/r/452984
[18:08:41] <wikibugs_>	 (PS2) Bstorm: labstore and systemd: change timer module to use simpler interface [puppet] - https://gerrit.wikimedia.org/r/453173 (https://phabricator.wikimedia.org/T171394)
[18:10:28] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@0a704b6]: push new python dependency handling
[18:10:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:18] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@0a704b6]: push new python dependency handling (duration: 03m 49s)
[18:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:06] <RoanKattouw>	 Sorry, missed the message about that patch being merged
[18:16:34] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/throttle.php: Throttle exemptions for cswiki (T202038) (duration: 00m 53s)
[18:16:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:42] <stashbot>	 T202038: Account creation throttling exception request for Friday 17 and 24 August 2018 - https://phabricator.wikimedia.org/T202038
[18:18:05] <wikibugs_>	 (PS2) Catrope: Enable ORES filters for PageTriage on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/453168
[18:18:12] <wikibugs_>	 (CR) Catrope: [C: 2] Enable ORES filters for PageTriage on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/453168 (owner: Catrope)
[18:19:44] <wikibugs_>	 (Merged) jenkins-bot: Enable ORES filters for PageTriage on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/453168 (owner: Catrope)
[18:19:57] <wikibugs_>	 (CR) jenkins-bot: Throttle exeptions for Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/453160 (https://phabricator.wikimedia.org/T202038) (owner: Urbanecm)
[18:19:59] <wikibugs_>	 (CR) jenkins-bot: Enable ORES filters for PageTriage on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/453168 (owner: Catrope)
[18:21:01] <wikibugs_>	 (CR) Dzahn: [C: 2] Declare /var/cache/coal_web [puppet] - https://gerrit.wikimedia.org/r/452953 (owner: Ori.livneh)
[18:22:32] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable ORES filters in PageTriage on testwiki (duration: 00m 50s)
[18:22:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:57] <wikibugs_>	 (PS1) Ankry: Allow bureaucrats to remove the 'interface-admin' right in plwikisource (T202085) [mediawiki-config] - https://gerrit.wikimedia.org/r/453177
[18:22:59] <wikibugs_>	 (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - https://gerrit.wikimedia.org/r/453177 (owner: Ankry)
[18:24:39] <wikibugs_>	 (Abandoned) Bstorm: labstore and systemd: change timer module to use simpler interface [puppet] - https://gerrit.wikimedia.org/r/453173 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[18:24:45] <wikibugs_>	 (CR) Dzahn: [C: 2] Ensure coal-web caches are warm via a bi-hourly cron job [puppet] - https://gerrit.wikimedia.org/r/452984 (owner: Ori.livneh)
[18:25:29] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5d87cc0]: now without the shebang
[18:25:33] <wikibugs_>	 (Abandoned) Ankry: Allow bureaucrats to remove the 'interface-admin' right in plwikisource (T202085) [mediawiki-config] - https://gerrit.wikimedia.org/r/453177 (owner: Ankry)
[18:25:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:47] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@5d87cc0]: now without the shebang (duration: 00m 17s)
[18:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:11] <wikibugs_>	 (CR) Gehel: [C: 1] "LGTM, trivial enough" [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[18:29:12] <wikibugs_>	 (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[18:30:20] <wikibugs_>	 (PS1) Ankry: Allow bureaucrats to remove 'interface-admin' right in plwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/453179 (https://phabricator.wikimedia.org/T202085)
[18:30:47] <wikibugs_>	 (CR) Ori.livneh: "thank you, dzahn :)" [puppet] - https://gerrit.wikimedia.org/r/452984 (owner: Ori.livneh)
[18:30:54] <wikibugs_>	 (CR) Dzahn: [C: 2] "cron has been added and i manually ran the resulting command as user nobody on webperf2001. it showed no errors" [puppet] - https://gerrit.wikimedia.org/r/452984 (owner: Ori.livneh)
[18:32:33] <wikibugs_>	 (PS1) Bstorm: labstore and systemd: Change timer dependency to unit instead of service [puppet] - https://gerrit.wikimedia.org/r/453180
[18:33:23] <wikibugs_>	 (CR) jerkins-bot: [V: -1] labstore and systemd: Change timer dependency to unit instead of service [puppet] - https://gerrit.wikimedia.org/r/453180 (owner: Bstorm)
[18:34:23] <wikibugs_>	 (PS2) Bstorm: labstore and systemd: Change timer dependency to unit instead of service [puppet] - https://gerrit.wikimedia.org/r/453180 (https://phabricator.wikimedia.org/T171394)
[18:35:31] <icinga-wm>	 ACKNOWLEDGEMENT - IPMI Sensor Status on elastic1022 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Gehel tracked in https://phabricator.wikimedia.org/T177631
[18:37:24] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@59f6585]: Updating Parsoid to dbbad6a
[18:37:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:20] <wikibugs_>	 (CR) Volans: [V: 2 C: 2] config: rename parameter to avoid negation [software/spicerack] - https://gerrit.wikimedia.org/r/453146 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[18:40:12] <wikibugs_>	 (CR) Volans: [V: 2 C: 2] Add cookbook entry point script [software/spicerack] - https://gerrit.wikimedia.org/r/450937 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[18:45:05] <gehel>	 !log reimage of elasticsearch eqiad completed - T198391 / T193649
[18:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:14] <stashbot>	 T198391: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391
[18:45:15] <stashbot>	 T193649: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649
[18:47:15] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@59f6585]: Updating Parsoid to dbbad6a (duration: 09m 51s)
[18:47:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:26] <wikibugs_>	 (PS3) Bstorm: labstore and systemd: Change timer dependency to unit instead of service [puppet] - https://gerrit.wikimedia.org/r/453180 (https://phabricator.wikimedia.org/T171394)
[18:50:27] <wikibugs_>	 (PS2) Ankry: Allow bureaucrats to remove 'interface-admin' right in plwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/453179 (https://phabricator.wikimedia.org/T202085)
[18:52:39] <arlolra>	 !log Updated Parsoid to dbbad6a (T201115)
[18:52:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:46] <stashbot>	 T201115: MediaWiki API deprecation warnings - https://phabricator.wikimedia.org/T201115
[18:53:38] <wikibugs_>	 Operations, ops-eqiad, DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (RobH) a:RobH>Cmjohnson Dell fixed the ownership info for us, you can put in requests for support and parts now.
[18:55:55] <wikibugs_>	 (CR) Bstorm: [C: 2] labstore and systemd: Change timer dependency to unit instead of service [puppet] - https://gerrit.wikimedia.org/r/453180 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[18:56:57] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@040690e]: coordinator.properties should reference a coordinator, not bundle
[18:57:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:15] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@040690e]: coordinator.properties should reference a coordinator, not bundle (duration: 00m 18s)
[18:57:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <jouncebot>	 thcipriani and twentyafterfour: #bothumor My software never has bugs. It just develops random features. Rise for Gerrit All-Users/Cache change. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T1900).
[19:00:23] * thcipriani on it
[19:05:45] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@9fb53c4]: (no justification provided)
[19:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:02] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@9fb53c4]: (no justification provided) (duration: 00m 17s)
[19:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:16] <legoktm>	 !log T201314 mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'EricEnfermero' 'Larry Hockett' --ignorestatus
[19:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:27] <stashbot>	 T201314: Please unblock stuck global rename: EricEnfermero to Larry Hockett - https://phabricator.wikimedia.org/T201314
[19:11:22] <twentyafterfour>	 !log twentyafterfour and thcipriani performing online maintenance on gerrit All-Users repo
[19:11:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:48] <thcipriani>	 !log clearing gerrit accounts cache
[19:15:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:05] <stephanebisson>	 Hi! I've lost track of the hhvm/php7 discussions... Are we running php7 in prod now? Is it ok to use the ?? syntax?
[19:20:26] <Reedy>	 no, we're not
[19:20:28] <Reedy>	 But yes it is
[19:20:51] <stephanebisson>	 Because hhvm supports it?
[19:21:07] <Reedy>	 yup
[19:21:23] <Reedy>	 we're in a weird limbo
[19:22:00] <stephanebisson>	 limbo is not cool, but the ?? syntax is. Thanks Reedy 
[19:22:00] <Reedy>	 https://docs.hhvm.com/hhvm/configuration/INI-settings#php-7-settings are related to the things that you can't do
[19:25:41] <paladox>	 twentyafterfour thcipriani did you reindex too?
[19:28:18] <thcipriani>	 paladox: we determined it was likely not necessary for what we changed
[19:28:29] <paladox>	 oh
[19:28:31] <paladox>	 ok
[19:33:15] <wikibugs_>	 (PS1) Andrew Bogott: nfs-exportd: add exports for neutron IPs [puppet] - https://gerrit.wikimedia.org/r/453192 (https://phabricator.wikimedia.org/T202088)
[19:33:52] <wikibugs_>	 (CR) jerkins-bot: [V: -1] nfs-exportd: add exports for neutron IPs [puppet] - https://gerrit.wikimedia.org/r/453192 (https://phabricator.wikimedia.org/T202088) (owner: Andrew Bogott)
[19:34:41] <wikibugs_>	 (PS2) Andrew Bogott: nfs-exportd: add exports for neutron IPs [puppet] - https://gerrit.wikimedia.org/r/453192 (https://phabricator.wikimedia.org/T202088)
[19:36:09] <wikibugs_>	 (PS3) Andrew Bogott: nfs-exportd: add exports for neutron IPs [puppet] - https://gerrit.wikimedia.org/r/453192 (https://phabricator.wikimedia.org/T202088)
[19:37:22] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] nfs-exportd: add exports for neutron IPs [puppet] - https://gerrit.wikimedia.org/r/453192 (https://phabricator.wikimedia.org/T202088) (owner: Andrew Bogott)
[19:42:54] <wikibugs_>	 (CR) Dzahn: [V: 1 C: 2] "was once used for performance.wm website, that (and nothing else i can see) is using it anymore, reduces apache module, which is what we w" [puppet] - https://gerrit.wikimedia.org/r/452687 (owner: Krinkle)
[19:43:08] <wikibugs_>	 (PS2) Dzahn: apache: Remove unused apache::static_site type [puppet] - https://gerrit.wikimedia.org/r/452687 (owner: Krinkle)
[19:44:28] <wikibugs_>	 (PS4) Volans: Add confctl module to interact with conftool [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079)
[19:45:12] <wikibugs_>	 (CR) Volans: "removed logging when raising exceptions, addressed 1 comment, 1 pending, see inline" (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[19:45:14] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add confctl module to interact with conftool [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[19:54:22] <mutante>	 !log phab1002 closing idle root screen that was used for rsyncing repos
[19:54:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:17] <wikibugs_>	 (PS2) Dzahn: memcached: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450319
[19:56:00] <wikibugs_>	 (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/12118/mc1020.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/450319 (owner: Dzahn)
[20:07:54] <wikibugs_>	 Operations, Puppet, Wikidata, Wikidata-Query-Service, cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (Dzahn)
[20:08:16] <wikibugs_>	 Operations, Puppet, Wikidata, Wikidata-Query-Service, cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (Dzahn) p:Triage>Low
[20:11:31] <wikibugs_>	 Operations, Puppet, Wikidata, Wikidata-Query-Service, cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (Dzahn) related:  convert the "role(simplelamp)" which is used by more things:  https://gerr...
[20:16:41] <wikibugs_>	 Operations, Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (Dzahn)
[20:18:01] <wikibugs_>	 Operations, Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (Dzahn) checked that "ops" LDAP group was also done.  Icinga and mail we will do Monday, talked about it on Service Ops meeting  for the pwstore part we will need a GPG key from you @jijiki but it has time
[20:19:06] <mutante>	 @seen thcipriani 
[20:19:06] <wm-bot>	 mutante: thcipriani is in here, right now
[20:19:17] <thcipriani>	 o/
[20:19:44] <mutante>	 heh:) hi Tyler. i would like to schedule a reboot of gerrit servers
[20:19:51] <mutante>	 though gerrit2001 i would jfdi
[20:23:12] <wikibugs_>	 (CR) Paladox: "I don't know if we want to do this or keep it in puppet? Seeing as these files are basically deprecated except that they may be kept for " [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439890 (owner: Paladox)
[20:26:43] <mutante>	 !log gerrit2001 - scheduled downtime, rebooting
[20:26:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:10] <wikibugs_>	 Operations: `sql centralauth` is broken on mwmaint1001 - https://phabricator.wikimedia.org/T202096 (Legoktm)
[20:33:13] <mutante>	 !log releases1001/2001: upgrading apache2 packages
[20:33:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:08] <mutante>	 !log releases2001: upgrading openjdk, systemd, jenkins
[20:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:25] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@76dddd2]: point transfer_to_es at spark 2.x
[20:46:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:40] <wikibugs_>	 Operations, DBA: `sql centralauth` is broken on mwmaint1001 - https://phabricator.wikimedia.org/T202096 (Reedy)
[20:47:59] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@76dddd2]: point transfer_to_es at spark 2.x (duration: 01m 33s)
[20:48:00] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@76dddd2]: point transfer_to_es at spark 2.x
[20:48:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:13] <wikibugs_>	 Operations, DBA: `sql centralauth` is broken on mwmaint1001 - https://phabricator.wikimedia.org/T202096 (Reedy) Looks just broken for you?  ``` reedy@mwmaint1001:~$ sql centralauth Reading table information for completion of table and column names You can turn off this feature to get a quicker startup wi...
[20:48:46] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@76dddd2]: point transfer_to_es at spark 2.x (duration: 00m 46s)
[20:48:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:31] <mutante>	 !log releases2001 - rebooting
[20:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:21] <wikibugs_>	 Operations, DBA: `sql centralauth` is broken on mwmaint1001 - https://phabricator.wikimedia.org/T202096 (Legoktm) :'(
[20:53:49] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@76dddd2]: (no justification provided)
[20:53:52] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@76dddd2]: (no justification provided) (duration: 00m 02s)
[20:53:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:19] <wikibugs_>	 Operations, DBA: `sql centralauth` is broken on mwmaint1001 - https://phabricator.wikimedia.org/T202096 (Legoktm) Open>Invalid I had an old ~/.my.cnf that was apparently getting in the way. My bad :(
[21:00:10] <mutante>	 !log releases1001 - installing package upgrade like on releases2001 before, scheduling downtime, reboot
[21:00:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:12] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@76dddd2]: debug git-fat initialization fail
[21:01:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:18] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@76dddd2]: debug git-fat initialization fail (duration: 00m 06s)
[21:01:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:00] <thcipriani>	 ebernhardson: you might need --force, if scap thinks a revision is already deployed (i.e. it sees the directory /srv/deployment/[repo]-cache/revs/[76dddd2]) it's going to assume it's deployed already and not try again
[21:04:28] <ebernhardson>	 thcipriani: ahha, yea it went awfully fast and didn't run promote :)
[21:04:45] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@76dddd2]: debug git-fat initialization fail
[21:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:01] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@76dddd2]: debug git-fat initialization fail (duration: 00m 16s)
[21:05:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:36] <ebernhardson>	 thcipriani: i'll file a ticket, but from what i can tell there might be a race between git-fat init and the promote script?
[21:06:01] <ebernhardson>	 or git-fat pull i suppose
[21:06:14] <thcipriani>	 hrm, I think git fat pull should block
[21:06:26] <thcipriani>	 but if you file a task I can dig deeper on it
[21:06:55] <ebernhardson>	 sure, it could have been any number of errors. The symptom is pip failing to install a .whl because it's not a zip file. I added a 5 second pause and it works, but who knows what happened
[21:07:33] <thcipriani>	 IIRC: git fat init/pull should run as part of fetch, then fetch scripts run, then promote stage/symlink swap, then promote scripts run
[21:07:52] <thcipriani>	 that is, I think git fat stuff happens as part of a different stage
[21:08:00] <ebernhardson>	 hmm, ok then that's certainly odd
[21:08:50] <mutante>	 !log contint2001 - installing jenkins upgrade
[21:08:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:25] <icinga-wm>	 RECOVERY - Memory correctable errors -EDAC- on wtp2011 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2011&var-datasource=codfw%2520prometheus%252Fops
[21:12:07] <wikibugs_>	 Operations, Traffic, Patch-For-Review, User-notice: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202 (Jdforrester-WMF)
[21:12:10] <wikibugs_>	 Operations, Traffic, Goal, Patch-For-Review: Begin execution of non-forward-secret ciphers deprecation - https://phabricator.wikimedia.org/T192555 (Jdforrester-WMF) Open>Resolved a:Vgutierrez Please re-open if I'm wrong.
[21:12:22] <wikibugs_>	 Operations, Traffic, Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181 (Jdforrester-WMF)
[21:12:24] <wikibugs_>	 Operations, Traffic, Patch-For-Review, User-notice: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202 (Jdforrester-WMF) Open>Resolved a:Vgutierrez Please re-open if I'm wrong.
[21:13:08] <wikibugs_>	 Operations, Traffic: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181 (Jdforrester-WMF) I believe that the planning and execution of the work is all now complete?
[21:17:48] <wikibugs_>	 (CR) Gehel: [C: 1] "Good enough, minor comments inline, but feel free to merge as-is" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/451254 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[21:18:14] <wikibugs_>	 (Abandoned) Bstorm: labstore: set up an icinga plugin to check cron exit codes [puppet] - https://gerrit.wikimedia.org/r/451181 (https://phabricator.wikimedia.org/T171394) (owner: Bstorm)
[21:21:24] <wikibugs_>	 Operations, ops-ulsfo, Traffic, netops: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030 (RobH)   Link-level type: Flexible-Ethernet, MTU: 9192, MRU: 9200, Speed: 40Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled, Flow contr...
[21:23:18] <thcipriani>	 !log contint1001 - installing jenkins upgrade
[21:23:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:55] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.1
[21:28:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:13] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.1 (duration: 00m 18s)
[21:29:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:42] <mutante>	 jouncebot: now
[21:30:42] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 29 minute(s)
[21:30:45] <wikibugs_>	 (PS1) Dzahn: releases/mediawiki: proper Icinga monitoring for both Apache vhosts [puppet] - https://gerrit.wikimedia.org/r/453267
[21:32:38] <wikibugs_>	 (CR) Paladox: releases/mediawiki: proper Icinga monitoring for both Apache vhosts (3 comments) [puppet] - https://gerrit.wikimedia.org/r/453267 (owner: Dzahn)
[21:34:00] <mutante>	 hello, we have to reboot the gerrit prod server. sorry for logging you out soon. 
[21:34:28] <mutante>	 you can stop me if you are currently doing that one-hour inline edit mega patchset
[21:35:05] <enterprisey>	 quick question: the second interface admin config change (removing the ability to edit MW space from sysops) will go in the European SWAT on the 27th, right?
[21:36:11] <wikibugs_>	 (CR) Dzahn: releases/mediawiki: proper Icinga monitoring for both Apache vhosts (3 comments) [puppet] - https://gerrit.wikimedia.org/r/453267 (owner: Dzahn)
[21:38:30] <wikibugs_>	 (PS2) Dzahn: releases/mediawiki: proper Icinga monitoring for both Apache vhosts [puppet] - https://gerrit.wikimedia.org/r/453267
[21:39:33] <mutante>	 enterprisey: this? [config] 453179 Allow 'crats to remove IA right in pl.ws ?
[21:39:43] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.0
[21:39:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:49] <mutante>	 that's the only thing i see scheduled on that day in the calendar
[21:39:57] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.0 (duration: 00m 14s)
[21:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:31] <enterprisey>	 mutante: something related to https://phabricator.wikimedia.org/T190015
[21:40:53] <enterprisey>	 I don't think the patch has even appeared in the thread yet, so this may be a silly time to ask the question
[21:41:05] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.0
[21:41:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:15] <mutante>	 enterprisey: i think you'll have to get it on https://wikitech.wikimedia.org/wiki/Deployments#Monday,_August_27  or it might be no
[21:41:26] <enterprisey>	 I see
[21:41:31] <mutante>	 it's all about the calendar page
[21:41:36] <mutante>	 that's also what jouncebot reads
[21:41:51] <enterprisey>	 yeah I'm not involved in the dev work for this patch, enwp is just very interested in our deadline to start granting the perm to people
[21:42:25] <mutante>	 i see.. yea, but you can still ask for it to be deployed by somebody or in SWAT depending on the nature of the change
[21:42:33] <enterprisey>	 alright solid thanks
[21:42:38] <mutante>	 quite welcome
[21:44:47] <mutante>	 !log rebooting cobalt (gerrit.wikimedia.org) for kernel upgrade
[21:44:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:45:48] <James_F>	 Ah, right, that's why gerrit's dead. :-)
[21:46:02] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@5731563]: point oozie sharelib at spark2.3.0 (duration: 04m 56s)
[21:46:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:42] <mutante>	 James_F: yes, i made a mini announcement on -dev  and here. coming back shortly
[21:47:34] <mutante>	 server is back.. service should start in a few
[21:48:30] <mutante>	 James_F: back
[21:48:36] <James_F>	 Thanks!
[21:50:15] <davidwbarratt>	 did Jenkins go down or is still processing the queue?
[21:50:46] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5731563]: try again after git-fat init fail
[21:50:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:04] <mutante>	 davidwbarratt: it had to be restarted but is separate from gerrit, it happened earlier
[21:51:04] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@5731563]: try again after git-fat init fail (duration: 00m 18s)
[21:51:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:16] <davidwbarratt>	 mutante oh ok
[21:53:43] <thcipriani>	 davidwbarratt: are you asking because zuul looks backed up? Or something else?
[21:54:08] <davidwbarratt>	 thcipriani oh no, it was just taking awhile, but jenkinsbot finally got back to me
[21:54:18] <thcipriani>	 ah, cool
[21:55:35] <wikibugs_>	 Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data - https://phabricator.wikimedia.org/T202072 (Addshore)
[21:55:54] <wikibugs_>	 Operations, SRE-Access-Requests, User-Addshore: Requesting access to view EventLogging data - https://phabricator.wikimedia.org/T202069 (Addshore)
[21:55:57] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@5731563]: try again after git-fat init fail
[21:55:59] <logmsgbot>	 !log ebernhardson@deploy1001 deploy aborted: try again after git-fat init fail (duration: 00m 01s)
[21:56:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:01] <wikibugs_>	 Operations, SRE-Access-Requests, User-Addshore: Requesting access to view EventLogging data - https://phabricator.wikimedia.org/T202063 (Addshore)
[21:56:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:35] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@651904b]: use spark 2.3.0, oozie still doesnt like 2.3.1
[21:56:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:51] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@651904b]: use spark 2.3.0, oozie still doesnt like 2.3.1 (duration: 00m 16s)
[21:56:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:35] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0
[22:09:54] <wikibugs_>	 (PS6) Ayounsi: [WIP] Bird anycast: add anycast_healthchecker [puppet] - https://gerrit.wikimedia.org/r/397723
[22:09:54] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0
[22:20:36] <wikibugs_>	 (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler02/12120/dns1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/397723 (owner: Ayounsi)
[22:24:25] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@e70f9d5]: dont override the spark sharelib
[22:24:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:36] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@e70f9d5]: dont override the spark sharelib (duration: 00m 11s)
[22:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:48] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@e70f9d5]: retry git fat init fail
[22:24:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:02] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@e70f9d5]: retry git fat init fail (duration: 00m 14s)
[22:25:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:38] <wikibugs_>	 Operations, Scap: Intermittent git-fat failure during deploy - https://phabricator.wikimedia.org/T202100 (thcipriani) It is that same problem!  The current version of git-fat doesn't have my commit in it: https://github.com/wikimedia/operations-debs-git-fat/commit/0e3abb0c5e8b1e4d81470397ec17138c6d24d9e8...
[22:32:01] <XioNoX>	 !log re-activating BGP sessions between cr1/2-ulsfo and the office's router2
[22:32:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:54:30] <wikibugs_>	 (CR) Dzahn: "leaving inline comments for stuff found out while debugging" (5 comments) [puppet] - https://gerrit.wikimedia.org/r/397723 (owner: Ayounsi)
[22:56:14] <icinga-wm>	 RECOVERY - Long running screen/tmux on phab1002 is OK: OK: No SCREEN or tmux processes detected.
[23:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180816T2300).
[23:00:05] <jouncebot>	 Jdlrobson and nray: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:02:23] <jdlrobson>	 \o
[23:02:32] <nray>	 \o
[23:03:35] <jdlrobson>	 mark: twentyafterfour RoanKattouw thcipriani Niharika any of you able to swat right now?
[23:03:48] <RoanKattouw>	 I can do it
[23:04:48] <ejegg>	 oh man, mistaken for a spammer!
[23:05:06] <nray>	 yeah looks like @jdlrobson got booted for spam? Surely a mistake
[23:05:13] <ejegg>	 all it takes is mentioning 5 nicks I guess?
[23:05:25] <ejegg>	 Sigyn is a bot
[23:06:45] <nray>	 Can we bring him back? 
[23:06:59] <RoanKattouw>	 He says he got a message saying "please email this person if you think this is a mistake"
[23:07:19] <RoanKattouw>	 Thankfully he and I are in the same office :)
[23:07:33] <nray>	 haha
[23:23:26] <wikibugs_>	 Operations, LDAP-Access-Requests, User-Addshore: Give access to graphite and grafana-admin to Aleksey Bekh-Ivanov (WMDE) - https://phabricator.wikimedia.org/T199233 (Dzahn) a:Aleksey_WMDE>Dzahn
[23:25:59] <mutante>	 i can tell #freenode it was a mistake
[23:26:48] <legoktm>	 he should also auth with freenode
[23:26:52] <paladox>	 if he gets a wikimedia cloak or a wikipedia cloak the bot will not kick him.
[23:28:07] <wikibugs_>	 Operations, Analytics, vm-requests: eqiad: (3) VM %request for internal analytics web sites - https://phabricator.wikimedia.org/T202013 (Dzahn) a:Dzahn
[23:30:14] <mutante>	 <@Unit193> mutante: Might want to have him be more careful around Sigyn. 
[23:30:17] <mutante>	 :p
[23:34:39] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.32.0-wmf.16/skins/MinervaNeue/resources/skins.minerva.content.styles.images/magnifying-glass.svg: Correct MinervaNeue search icon (T199000) (duration: 00m 51s)
[23:34:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:47] <stashbot>	 T199000: Remove redundant/non-critical styling rules in Minerva - https://phabricator.wikimedia.org/T199000
[23:35:44] <mutante>	 RoanKattouw: fixed it for Jon   19:34 <@Unit193> mutante: I've removed it at this point.
[23:48:21] <jdlrobson>	 thanks ... (looks around before 
[23:48:37] <jdlrobson>	 thanks! (looks around before daring to @) @mutante 
[23:50:17] <paladox>	 jdlrobson you should register your nick and login :)
[23:51:05] <jdlrobson>	 paladox: i have...
[23:51:08] <jdlrobson>	 that's what makes it so strange
[23:51:10] <paladox>	 oh
[23:51:15] <jdlrobson>	 unless something broke?
[23:52:33] <mutante>	 we don't see you with your cloak at least
[23:52:38] <mutante>	 so maybe something broke, yea
[23:52:42] <jdlrobson>	 hmm
[23:53:16] <jdlrobson>	 NickServ says I'm identified.. unless you are talking about something else?
[23:54:44] <mutante>	 i can confirm that. nickserv says you are logged in (status: 3)
[23:54:59] <mutante>	 we meant (additionally) that you have the /wikimedia as address
[23:56:24] <icinga-wm>	 PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[23:57:07] <mutante>	 it might help even more, i dont know about the rules of that bot of course
[23:58:34] <icinga-wm>	 RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.