[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180216T0000). [00:00:06] MatmaRex: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:02:00] hi [00:05:53] anyone? [00:06:55] i can ship that i suppose [00:08:18] (03PS2) 10Krinkle: [WIP] extract2: Set wiki context directly instead of MW_LANG indirection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410109 [00:08:25] (03PS2) 10Krinkle: [WIP] multiversion: Remove support for MW_LANG env override [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410110 [00:11:28] ebernhardson: i would appreciate that [00:14:05] MatmaRex: pulled to mwdebug1001 [00:14:52] ebernhardson: works as expected! 
[00:16:52] !log ebernhardson@tin Synchronized php-1.31.0-wmf.21/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: SWAT: T187454 fix text selection on #wpTextbox1 (duration: 00m 58s) [00:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:17:07] T187454: "encapsulateSelection" broken with the proofread-page content model - https://phabricator.wikimedia.org/T187454 [00:17:16] MatmaRex: all synced out [00:19:56] thanks ebernhardson [00:19:59] works in prod [00:28:45] (03CR) 10Krinkle: [C: 04-1] Move all dblists on noc to dblists/ directory, rather than individually (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394199 (owner: 10Chad) [00:30:27] PROBLEM - HHVM rendering on mw2220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:31:17] RECOVERY - HHVM rendering on mw2220 is OK: HTTP OK: HTTP/1.1 200 OK - 79320 bytes in 0.281 second response time [01:00:27] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:00:47] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:02:08] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:02:47] PROBLEM - puppet last run on elastic1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:08] PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:28] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:38] PROBLEM - puppet last run on wtp1026 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:03:58] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:58] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:04:07] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:04:17] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:04:17] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:08:36] Puppetdb? [01:08:55] mutante: herron ^^ [01:28:38] RECOVERY - puppet last run on wtp1026 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [01:29:07] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [01:29:07] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [01:29:07] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [01:29:17] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [01:29:17] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:30:28] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:30:38] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:32:08] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[01:32:47] RECOVERY - puppet last run on elastic1048 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [01:33:07] RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [01:33:37] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [02:07:47] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token] [02:37:47] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [03:08:07] RECOVERY - Check whether ferm is active by checking the default input chain on labpuppetmaster1001 is OK: OK ferm input default policy is set [03:08:17] RECOVERY - Check systemd state on labpuppetmaster1002 is OK: OK - running: The system is fully operational [03:11:35] ACKNOWLEDGEMENT - ensure kvm processes are running on labtestvirt2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args /usr/bin/kvm andrew bogott These are part of a new test that chase is working on no need to alert. [03:11:36] ACKNOWLEDGEMENT - ensure kvm processes are running on labtestvirt2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args /usr/bin/kvm andrew bogott These are part of a new test that chase is working on no need to alert. 
[03:25:18] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 768.99 seconds [03:59:27] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 267.93 seconds [06:35:15] !log Deploy schema change on s5 primary master db1070 - T185128 T153182 [06:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:35:32] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [06:35:33] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [06:37:02] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: 10Bstorm) [06:37:09] (03PS2) 10Marostegui: db-eqiad.php: Depool db1067 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411029 [06:39:54] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3977674 (10Marostegui) 05Open>03Resolved All good now - thanks Papaul! ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicald... 
[06:40:48] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1067 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411029 (owner: 10Marostegui) [06:41:18] !log installing quagga security updates [06:41:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:43:18] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1067 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411029 (owner: 10Marostegui) [06:43:31] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1067 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411029 (owner: 10Marostegui) [06:46:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 and db1067 - T162807 (duration: 00m 59s) [06:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:36] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [07:22:20] (03PS7) 10Giuseppe Lavagetto: Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 [07:23:34] (03CR) 10jerkins-bot: [V: 04-1] Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 (owner: 10Giuseppe Lavagetto) [07:24:51] (03PS8) 10Giuseppe Lavagetto: Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 [07:26:00] (03CR) 10jerkins-bot: [V: 04-1] Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 (owner: 10Giuseppe Lavagetto) [07:26:15] <_joe_> uhm [07:26:24] <_joe_> gbp works on this version, wth? 
[07:30:30] (03PS9) 10Giuseppe Lavagetto: Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 [07:32:38] <_joe_> ook, that's better [07:32:59] (03CR) 10Giuseppe Lavagetto: [C: 032] Release new version of conftool [software/conftool] - 10https://gerrit.wikimedia.org/r/410226 (owner: 10Giuseppe Lavagetto) [07:37:45] (03CR) 10Muehlenhoff: cassandra: enable component/cassandra33 where applicable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410252 (https://phabricator.wikimedia.org/T186619) (owner: 10Eevans) [08:03:07] PROBLEM - Host mwdebug1002 is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 2444.27 ms [08:03:27] RECOVERY - Host mwdebug1002 is UP: PING OK - Packet loss = 0%, RTA = 2.96 ms [08:04:27] PROBLEM - Host webperf1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:04:47] PROBLEM - Host chlorine is DOWN: PING CRITICAL - Packet loss = 100% [08:04:47] PROBLEM - Host bohrium is DOWN: PING CRITICAL - Packet loss = 100% [08:04:57] PROBLEM - SSH on ganeti1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:05:28] PROBLEM - Host mwdebug1002 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:28] PROBLEM - Host dubnium is DOWN: PING CRITICAL - Packet loss = 100% [08:05:28] PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:28] PROBLEM - Host planet1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:37] PROBLEM - Host releases1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:37] PROBLEM - Host hassium is DOWN: PING CRITICAL - Packet loss = 100% [08:05:48] PROBLEM - Host logstash1007 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:48] PROBLEM - Host netmon1003 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:57] PROBLEM - Host install1002 is DOWN: PING CRITICAL - Packet loss = 100% [08:05:58] PROBLEM - Host rutherfordium is DOWN: PING CRITICAL - Packet loss = 100% [08:06:41] PROBLEM - LVS HTTP IPv4 on logstash.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [08:07:17] PROBLEM - PyBal backends health check on lvs1010 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but poo [08:07:17] g-udp_10514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1008.eqiad.wmnet are marked down but pooled [08:07:37] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but poo [08:07:37] g-udp_10514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1008.eqiad.wmnet are marked down but pooled [08:07:38] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but poo [08:07:38] g-udp_10514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1008.eqiad.wmnet are marked down but pooled [08:07:48] RECOVERY - Host bohrium is UP: PING WARNING - Packet loss = 37%, RTA = 2.11 ms [08:07:57] RECOVERY - SSH on ganeti1006 is OK: SSH OK - OpenSSH_6.7p1 
Debian-5+deb8u4 (protocol 2.0) [08:07:57] RECOVERY - Host hassium is UP: PING OK - Packet loss = 0%, RTA = 2.55 ms [08:07:58] RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 2.63 ms [08:07:58] RECOVERY - Host chlorine is UP: PING OK - Packet loss = 0%, RTA = 2.45 ms [08:07:58] RECOVERY - Host dubnium is UP: PING OK - Packet loss = 0%, RTA = 2.46 ms [08:07:58] RECOVERY - Host mwdebug1002 is UP: PING OK - Packet loss = 0%, RTA = 2.30 ms [08:07:58] RECOVERY - Host rutherfordium is UP: PING OK - Packet loss = 0%, RTA = 2.31 ms [08:07:58] RECOVERY - Host install1002 is UP: PING OK - Packet loss = 0%, RTA = 2.40 ms [08:07:59] (03PS3) 10Jcrespo: mariadb: Depool db1053 from s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) [08:07:59] RECOVERY - Host netmon1003 is UP: PING OK - Packet loss = 0%, RTA = 2.46 ms [08:07:59] RECOVERY - Host logstash1007 is UP: PING OK - Packet loss = 0%, RTA = 2.39 ms [08:08:00] RECOVERY - Host releases1001 is UP: PING OK - Packet loss = 0%, RTA = 3.91 ms [08:08:07] RECOVERY - Host webperf1001 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [08:08:07] RECOVERY - Host planet1001 is UP: PING OK - Packet loss = 0%, RTA = 1.91 ms [08:08:16] the ganeti host which bohrium was on.. no surprise there [08:08:21] probably some extra IO [08:08:42] RECOVERY - LVS HTTP IPv4 on logstash.svc.eqiad.wmnet is OK: TCP OK - 0.003 second response time on 10.2.2.36 port 10514 [08:08:47] ah so the theory is that bohrium causes the IO spike and then the freeze? [08:08:59] :( [08:09:07] it's not ofc bohrium's fault [08:09:31] but it's the one VM where some IO spikes are expected [08:09:41] more like normal part of the process [08:09:41] can you put bohrium on its own vm and see what happens? 
[08:09:55] its own hardware, you mean [08:10:00] yes, sorry [08:10:05] its own vm host [08:10:22] I've been trying to reproduce the problem with a single vm on a host and had no luck so far [08:10:38] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [08:10:39] which means that what you suggest would alleviate the problem [08:10:45] isn't that a win? [08:10:47] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [08:10:58] PROBLEM - Apache HTTP on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:11:00] but it would only be an rsync away [08:11:07] PROBLEM - Nginx local proxy to apache on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:11:23] mortiz mentioned drbd interactions, and that is something I would consider [08:11:35] <_joe_> we're running logstash on ganeti? [08:11:37] <_joe_> uhm [08:11:56] yeah and I violated on purpose the rules about placement yesterday [08:11:57] PROBLEM - Host rutherfordium is DOWN: PING CRITICAL - Packet loss = 100% [08:11:57] PROBLEM - Host chlorine is DOWN: PING CRITICAL - Packet loss = 100% [08:11:57] PROBLEM - Host bohrium is DOWN: PING CRITICAL - Packet loss = 100% [08:12:06] drbd is the nfs of distributed filesystems [08:12:26] <_joe_> nfs is the nfs of distributed filesystems [08:12:29] <_joe_> :) [08:12:31] normally not all logstash vms would be placed on the same host [08:12:48] PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:48] PROBLEM - Host releases1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:52] I think this time around ganeti1006 fully died [08:12:58] PROBLEM - Host logstash1007 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:58] PROBLEM - Host webperf1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:58] PROBLEM - Host planet1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:58] PROBLEM - Host netmon1003 is DOWN: PING CRITICAL - Packet loss 
= 100% [08:12:58] PROBLEM - Host dubnium is DOWN: PING CRITICAL - Packet loss = 100% [08:12:58] PROBLEM - Host install1002 is DOWN: PING CRITICAL - Packet loss = 100% [08:12:59] * akosiaris powercycles [08:13:07] PROBLEM - Host mwdebug1002 is DOWN: PING CRITICAL - Packet loss = 100% [08:13:07] PROBLEM - Host hassium is DOWN: PING CRITICAL - Packet loss = 100% [08:13:47] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-tcp_11514: Servers logstash1007.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1007.eqiad.wmnet are marked down but pooled: logstash-json-udp_11514_udp: Servers logstash1007.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1007.eqiad.wmnet are marked down but poo [08:13:47] g-udp_10514_udp: Servers logstash1007.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1008.eqiad.wmnet are marked down but pooled [08:13:47] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1007.eqiad.wmnet are marked down but pooled: logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1007.eqiad.wmnet are marked down but poo [08:13:47] g-udp_10514_udp: Servers logstash1007.eqiad.wmnet are marked down but pooled: kibana_80: Servers logstash1007.eqiad.wmnet are marked down but pooled [08:13:58] !log powercycle ganeti1006 [08:14:07] PROBLEM - SSH on ganeti1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:14:17] !log powercycle ganeti1006 T181121 [08:14:20] as long as that doesn't bring down our whole infrastructure... 
[08:14:21] that's more like it [08:14:22] PROBLEM - LVS HTTP IPv4 on kibana.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:14:30] T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121 [08:16:08] PROBLEM - ganeti-confd running on ganeti1006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd [08:16:17] RECOVERY - SSH on ganeti1006 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [08:17:07] RECOVERY - ganeti-confd running on ganeti1006 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd [08:18:27] RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms [08:18:37] RECOVERY - Host chlorine is UP: PING OK - Packet loss = 0%, RTA = 6.40 ms [08:18:37] RECOVERY - Host rutherfordium is UP: PING OK - Packet loss = 0%, RTA = 6.77 ms [08:18:47] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5 [08:18:47] RECOVERY - Host mwdebug1002 is UP: PING OK - Packet loss = 0%, RTA = 7.58 ms [08:18:47] RECOVERY - Host releases1001 is UP: PING OK - Packet loss = 0%, RTA = 6.85 ms [08:18:57] RECOVERY - Host logstash1007 is UP: PING OK - Packet loss = 0%, RTA = 7.47 ms [08:18:57] RECOVERY - Host dubnium is UP: PING OK - Packet loss = 0%, RTA = 7.93 ms [08:18:58] RECOVERY - Apache HTTP on mwdebug1002 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 622 bytes in 0.198 second response time [08:18:58] RECOVERY - Nginx local proxy to apache on mwdebug1002 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 623 bytes in 0.232 second response time [08:19:07] RECOVERY - Host bohrium is UP: 
PING WARNING - Packet loss = 58%, RTA = 8.39 ms [08:19:07] RECOVERY - Host hassium is UP: PING OK - Packet loss = 0%, RTA = 9.38 ms [08:19:17] RECOVERY - Host netmon1003 is UP: PING OK - Packet loss = 0%, RTA = 8.25 ms [08:19:22] RECOVERY - LVS HTTP IPv4 on kibana.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 3162 bytes in 0.026 second response time [08:19:27] RECOVERY - Host webperf1001 is UP: PING OK - Packet loss = 0%, RTA = 8.56 ms [08:19:27] RECOVERY - Host planet1001 is UP: PING OK - Packet loss = 0%, RTA = 8.98 ms [08:20:57] RECOVERY - Host install1002 is UP: PING OK - Packet loss = 0%, RTA = 8.54 ms [08:22:16] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1053 from s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [08:23:37] RECOVERY - PyBal backends health check on lvs1010 is OK: PYBAL OK - All pools are healthy [08:23:50] (03Merged) 10jenkins-bot: mariadb: Depool db1053 from s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [08:23:57] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [08:24:07] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [08:26:34] (03CR) 10jenkins-bot: mariadb: Depool db1053 from s2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [08:28:37] (03PS1) 10Jcrespo: mariadb: Remove db1053 for mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) [08:30:10] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1053 (duration: 00m 57s) [08:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:31:48] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5 [08:34:21] !log manually allocate logstash1008 on ganeti1005 to undo the manual override of sensible allocation rules by ganeti [08:34:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:41] 10Operations, 10Ops-Access-Requests, 10Traffic, 10Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3977824 (10Vgutierrez) 05Open>03Resolved [08:43:58] (03PS2) 10Jcrespo: mariadb: Remove db1053 for mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) [08:44:33] (03PS3) 10Jcrespo: mariadb: Remove db1053 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) [08:44:50] (03PS4) 10Jcrespo: mariadb: Remove db1053 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) [08:48:16] !log doing IO stress tests on ganeti1005. T181121 [08:48:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:32] T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121 [08:55:28] (03PS8) 10Alexandros Kosiaris: otrs: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409462 (owner: 10Dzahn) [08:55:32] (03CR) 10Alexandros Kosiaris: [C: 032] otrs: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409462 (owner: 10Dzahn) [08:56:58] (03PS7) 10Krinkle: [DNM] Add cron job for expired userrights maintenance script [puppet] - 10https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: 10EddieGP) [09:00:21] (03CR) 10Alexandros Kosiaris: [C: 032] "Noop on mendelevium. 
One thing that is a bit interesting is what will happen on new installation. Those File[/etc/apache2/mods-available/s" [puppet] - 10https://gerrit.wikimedia.org/r/409462 (owner: 10Dzahn) [09:12:39] herron mutante what do you think re: https://gerrit.wikimedia.org/r/c/410758/ ? [09:22:36] (03PS14) 10Muehlenhoff: Add support for selective automatic restarts of stateless services [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [09:23:06] (03PS1) 10Alexandros Kosiaris: httpd: Make sure we purge package's status.conf [puppet] - 10https://gerrit.wikimedia.org/r/411189 [09:23:08] (03PS1) 10Alexandros Kosiaris: apache/httpd: Support IPv6 in status page [puppet] - 10https://gerrit.wikimedia.org/r/411190 [09:23:58] (03CR) 10Alexandros Kosiaris: [C: 032] "Addressed in https://gerrit.wikimedia.org/r/411189" [puppet] - 10https://gerrit.wikimedia.org/r/409462 (owner: 10Dzahn) [09:25:27] (03CR) 10Giuseppe Lavagetto: [C: 04-1] httpd: Make sure we purge package's status.conf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:26:06] (03CR) 10Giuseppe Lavagetto: "Maybe let's do httpd first, and apache later? 
this will go out to all servers basically :P" [puppet] - 10https://gerrit.wikimedia.org/r/411190 (owner: 10Alexandros Kosiaris) [09:28:25] (03CR) 10Alexandros Kosiaris: httpd: Make sure we purge package's status.conf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:37:04] (03PS1) 10Elukey: profile::oozie::server: remove symlink installed by Oozie's package [puppet] - 10https://gerrit.wikimedia.org/r/411192 (https://phabricator.wikimedia.org/T184794) [09:38:34] (03CR) 10Alexandros Kosiaris: httpd: Make sure we purge package's status.conf (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:38:42] <_joe_> win 66 [09:38:51] (03CR) 10Elukey: [C: 032] profile::oozie::server: remove symlink installed by Oozie's package [puppet] - 10https://gerrit.wikimedia.org/r/411192 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey) [09:39:00] _joe_: E_TOO_MANY_WINDOWS [09:39:19] <_joe_> vgutierrez: oh I restarted irssi a couple weeks ago, or I'd be in the 100s [09:39:35] (03CR) 10Giuseppe Lavagetto: [C: 031] "that's ok then :)" [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:41:11] (03PS2) 10Alexandros Kosiaris: httpd: Make sure we purge package's status.conf [puppet] - 10https://gerrit.wikimedia.org/r/411189 [09:41:13] (03PS2) 10Alexandros Kosiaris: httpd: Support IPv6 in status page [puppet] - 10https://gerrit.wikimedia.org/r/411190 [09:41:15] (03PS1) 10Alexandros Kosiaris: apache: Support IPv6 in status [puppet] - 10https://gerrit.wikimedia.org/r/411193 [09:41:27] 10Operations: Create 2 VMs in codfw for mwdebug20001 and 2002 - https://phabricator.wikimedia.org/T187468#3977922 (10fgiunchedi) p:05Triage>03Normal [09:41:38] 10Operations, 10ops-eqiad: Decommission mw2017 and mw2099 - https://phabricator.wikimedia.org/T187467#3977923 (10fgiunchedi) p:05Triage>03Normal [09:41:44] 10Operations, 10ops-eqiad: Decommission mw1259-mw1260 - 
https://phabricator.wikimedia.org/T187466#3977924 (10fgiunchedi) p:05Triage>03Normal [09:42:20] (03CR) 10Alexandros Kosiaris: "done" [puppet] - 10https://gerrit.wikimedia.org/r/411190 (owner: 10Alexandros Kosiaris) [09:42:40] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3977925 (10fgiunchedi) p:05Triage>03Normal [09:42:45] (03CR) 10Alexandros Kosiaris: "https://puppet-compiler.wmflabs.org/compiler02/9995/mendelevium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:42:48] (03CR) 10Alexandros Kosiaris: [C: 032] httpd: Make sure we purge package's status.conf [puppet] - 10https://gerrit.wikimedia.org/r/411189 (owner: 10Alexandros Kosiaris) [09:42:55] (03CR) 10Alexandros Kosiaris: [C: 032] httpd: Support IPv6 in status page [puppet] - 10https://gerrit.wikimedia.org/r/411190 (owner: 10Alexandros Kosiaris) [09:45:03] (03PS15) 10Muehlenhoff: Add support for selective automatic restarts of stateless services [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [09:45:27] 10Operations, 10ops-eqiad, 10DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#3977928 (10jcrespo) [09:46:39] (03CR) 10Muehlenhoff: [C: 032] Add support for selective automatic restarts of stateless services [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [09:46:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) [09:47:19] (03CR) 10Jcrespo: [C: 032] mariadb: Remove db1053 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [09:48:54] (03Merged) 10jenkins-bot: mariadb: Remove db1053 from 
mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [09:49:03] (03CR) 10jenkins-bot: mariadb: Remove db1053 from mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411188 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [09:49:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [09:49:45] (03PS2) 10Marostegui: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) [09:53:07] jynus: have you deployed your change? I want to deploy mine, I can rebase and deploy both if you want [09:53:09] 10Operations, 10monitoring: Many "NRPE: Unable to read output" from "long running screen/tmux" checks in icinga - https://phabricator.wikimedia.org/T187528#3977961 (10fgiunchedi) [09:53:12] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [09:53:51] it is ongoing [09:53:54] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Remove db1053 (duration: 00m 56s) [09:53:56] ah [09:53:58] hehe [09:53:59] it requires 2 deploys [09:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:24] yeah, I will wait for codfw to be finished [09:54:27] (03PS1) 10Elukey: hive-env.sh: use HIVE_METASTORE_HADOOP_OPTS to configure the metastore [puppet/cdh] - 10https://gerrit.wikimedia.org/r/411195 (https://phabricator.wikimedia.org/T184794) [09:55:07] !log jynus@tin Synchronized wmf-config/db-codfw.php: Remove db1053 (duration: 00m 56s) [09:55:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:23] done [09:55:29] thanks! 
[09:56:26] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 - T162807 (duration: 00m 56s) [09:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:40] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [09:57:40] (03CR) 10Elukey: [V: 032 C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9998/analytics1003.eqiad.wmnet/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/411195 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey) [09:57:46] 10Operations, 10monitoring: Many "NRPE: Unable to read output" from "long running screen/tmux" checks in icinga - https://phabricator.wikimedia.org/T187528#3977977 (10fgiunchedi) cc @Dzahn as original author of the check [10:00:30] (03PS1) 10Jcrespo: mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) [10:01:01] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:01:47] (03CR) 10Jcrespo: [V: 032] "Ignoring linter" [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:01:53] (03PS1) 10Elukey: Update the cdh module to its latest change [puppet] - 10https://gerrit.wikimedia.org/r/411197 (https://phabricator.wikimedia.org/T184794) [10:03:07] (03CR) 10Elukey: [C: 032] Update the cdh module to its latest change [puppet] - 10https://gerrit.wikimedia.org/r/411197 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey) [10:18:28] (03PS1) 10Jcrespo: mariadb: Prepare reimage of db1053 and db2042 to strech [puppet] - 10https://gerrit.wikimedia.org/r/411198 (https://phabricator.wikimedia.org/T183469) [10:19:40] (03CR) 10Marostegui: [C: 031] mariadb: Prepare reimage of db1053 and 
db2042 to strech [puppet] - 10https://gerrit.wikimedia.org/r/411198 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:20:27] (03PS2) 10Jcrespo: mariadb: Prepare reimage of db1053 and db2042 to strech [puppet] - 10https://gerrit.wikimedia.org/r/411198 (https://phabricator.wikimedia.org/T183469) [10:21:33] (03CR) 10Jcrespo: [C: 032] mariadb: Prepare reimage of db1053 and db2042 to strech [puppet] - 10https://gerrit.wikimedia.org/r/411198 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:25:57] PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100% [10:26:07] PROBLEM - Host sca1004 is DOWN: PING CRITICAL - Packet loss = 100% [10:26:08] ? [10:26:19] is it ganeti again? [10:26:38] PROBLEM - SSH on ganeti1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:29:37] RECOVERY - SSH on ganeti1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [10:30:37] RECOVERY - Host sca1004 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [10:30:37] RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [10:30:48] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:19] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 codfw machines - https://phabricator.wikimedia.org/T183470#3854215 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on sarin.codfw.wmnet for hosts: ``` ['db2042.codfw.wmnet'] ``` The log can... [10:34:30] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3978016 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db1053.eqiad.wmnet'] ``` The log... 
[10:34:53] (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: 10Jcrespo) [10:34:59] (03PS3) 10Jcrespo: mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) [10:35:03] (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: 10Jcrespo) [10:35:25] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: 10Jcrespo) [10:36:16] (03PS2) 10Jcrespo: mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) [10:36:22] (03CR) 10Jcrespo: [V: 032] mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:36:43] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:36:59] (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Move db1053 from eqiad:core:s2 to eqiad:misc:m3 (phabricator) [puppet] - 10https://gerrit.wikimedia.org/r/411196 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [10:52:24] (03PS2) 10Filippo Giunchedi: prometheus: tweak node_exporter ignored_devices and ignored_fs_types [puppet] - 10https://gerrit.wikimedia.org/r/404430 [10:52:42] (03CR) 10jerkins-bot: [V: 04-1] prometheus: tweak node_exporter ignored_devices and 
ignored_fs_types [puppet] - 10https://gerrit.wikimedia.org/r/404430 (owner: 10Filippo Giunchedi) [10:53:50] 10Operations, 10ops-codfw, 10DBA, 10netops: switch port configuration for tendril2001 - https://phabricator.wikimedia.org/T186172#3978036 (10Marostegui) Please change this to db2093 as we have decided to rename that host from tendril2001 to db2093 (T186123#3975533) Thanks and sorry for che changes! [10:54:37] (03PS3) 10Filippo Giunchedi: prometheus: tweak node_exporter ignored_devices and ignored_fs_types [puppet] - 10https://gerrit.wikimedia.org/r/404430 [10:57:21] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3978043 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1053.eqiad.wmnet'] ``` and were **ALL** successful. [10:59:16] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 codfw machines - https://phabricator.wikimedia.org/T183470#3978052 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2042.codfw.wmnet'] ``` and were **ALL** successful. 
[11:00:57] (03PS1) 10Marostegui: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 [11:03:12] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 (owner: 10Marostegui) [11:03:32] (03PS1) 10Marostegui: db1093: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/411200 [11:05:16] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 (owner: 10Marostegui) [11:05:40] (03PS1) 10Jcrespo: mariadb: Disable notifications for db1043 and db2012 [puppet] - 10https://gerrit.wikimedia.org/r/411201 (https://phabricator.wikimedia.org/T183469) [11:06:14] (03CR) 10Jcrespo: [C: 032] mariadb: Disable notifications for db1043 and db2012 [puppet] - 10https://gerrit.wikimedia.org/r/411201 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [11:06:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 56s) [11:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:41] !log Stop MySQL on db1093 for mariadb and kernel upgrade, also update socket path [11:06:42] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 (owner: 10Marostegui) [11:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:09] (03PS2) 10Marostegui: db1093: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/411200 [11:07:57] (03CR) 10Marostegui: [C: 032] db1093: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/411200 (owner: 10Marostegui) [11:09:08] !log restart nfaccd on rhenium to see if it picks up the new kafka topic config (3 partitions) [11:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:47] not really [11:10:30] (03CR) 10Hashar: "check experimental" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/394199 (owner: 10Chad) [11:10:33] (03CR) 10Hashar: "check experimental" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410110 (owner: 10Krinkle) [11:11:47] (03CR) 10jerkins-bot: [V: 04-1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394199 (owner: 10Chad) [11:14:40] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411202 [11:17:20] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411202 (owner: 10Marostegui) [11:18:54] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411202 (owner: 10Marostegui) [11:19:04] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411202 (owner: 10Marostegui) [11:20:04] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1093 (duration: 00m 56s) [11:20:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:01] (03PS1) 10Jcrespo: mariadb: Change phabricator db config to modern defaults [puppet] - 10https://gerrit.wikimedia.org/r/411203 (https://phabricator.wikimedia.org/T183470) [11:21:26] (03PS1) 10Elukey: pmacct: set kafka_patition to -1 on nfacctd.conf [puppet] - 10https://gerrit.wikimedia.org/r/411204 (https://phabricator.wikimedia.org/T181036) [11:22:24] 10Operations, 10monitoring: es1019 ipmi and mgmt unresponsive - https://phabricator.wikimedia.org/T187530#3978115 (10fgiunchedi) p:05Triage>03Normal [11:23:24] (03CR) 10Hashar: "check experimental" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 (owner: 10Marostegui) [11:23:27] (03CR) 10Hashar: "check experimental" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) 
[11:23:57] (03CR) 10Elukey: [C: 032] pmacct: set kafka_patition to -1 on nfacctd.conf [puppet] - 10https://gerrit.wikimedia.org/r/411204 (https://phabricator.wikimedia.org/T181036) (owner: 10Elukey) [11:23:58] 10Operations, 10monitoring: es1019 ipmi and mgmt unresponsive - https://phabricator.wikimedia.org/T187530#3978115 (10Marostegui) This is a slave, so if we need to reboot it, it should be doable. [11:24:38] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411194 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [11:24:40] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411199 (owner: 10Marostegui) [11:25:08] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411205 [11:28:06] !log Switching operations/mediawiki-config job for composer to Docker | https://gerrit.wikimedia.org/r/#/c/411206/ [11:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:33] !log cp3036: restart varnish-fe to clear 'child restarted' alert [11:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:21] (03PS1) 10Elukey: camus: set mapreduce jobs to 3 [puppet] - 10https://gerrit.wikimedia.org/r/411207 (https://phabricator.wikimedia.org/T181036) [11:30:21] (03PS2) 10Elukey: camus: set netflow's mapreduce jobs to 3 [puppet] - 10https://gerrit.wikimedia.org/r/411207 (https://phabricator.wikimedia.org/T181036) [11:30:39] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411205 (owner: 10Marostegui) [11:31:00] (03CR) 10Elukey: [C: 032] camus: set netflow's mapreduce jobs to 3 [puppet] - 10https://gerrit.wikimedia.org/r/411207 (https://phabricator.wikimedia.org/T181036) (owner: 10Elukey) [11:31:50] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic 
for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411205 (owner: 10Marostegui) [11:32:03] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411205 (owner: 10Marostegui) [11:32:22] (03CR) 10Jcrespo: [C: 032] "Looking good: https://puppet-compiler.wmflabs.org/compiler02/9999/db1059.eqiad.wmnet/ db1059 was already using a symlink on /tmp" [puppet] - 10https://gerrit.wikimedia.org/r/411203 (https://phabricator.wikimedia.org/T183470) (owner: 10Jcrespo) [11:32:27] (03PS2) 10Jcrespo: mariadb: Change phabricator db config to modern defaults [puppet] - 10https://gerrit.wikimedia.org/r/411203 (https://phabricator.wikimedia.org/T183470) [11:33:01] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1093 (duration: 00m 56s) [11:33:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:55] !log changing socket location on phabricator db hosts T148507 [11:34:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:09] !log stopping mysql on db1043, db2012 for clonning data away [11:35:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:24] 10Operations, 10monitoring: es1019 ipmi and mgmt unresponsive - https://phabricator.wikimedia.org/T187530#3978185 (10fgiunchedi) Yeah it looks like it'll need a power drain like last time in parent task. cc #ops-eqiad and @Cmjohnson for visibility [11:36:33] moritzm: would you +1 https://gerrit.wikimedia.org/r/#/c/410177/ ? 
[11:37:23] (03PS5) 10Arturo Borrero Gonzalez: toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) [11:38:18] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411208 [11:38:37] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [11:38:41] arturo: sure, is later in the afternoon okay? currently busy with other tasks [11:38:47] PROBLEM - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [11:38:52] 10Operations, 10ops-eqiad, 10monitoring: es1019 ipmi and mgmt unresponsive - https://phabricator.wikimedia.org/T187530#3978201 (10fgiunchedi) [11:38:55] the haproxy is me, it is the replicas I am currently rebuilding [11:39:02] see last log [11:39:18] moritzm: well, I was hoping for something more in realtime [11:39:31] PROBLEM - mysqld processes on db1043 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:39:40] mm [11:40:07] I'm assuming that's you too jynus ? ^ [11:40:20] the last puppet run wiped the disable notifications [11:40:29] because it hasn't run on icinga yet [11:42:09] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411208 (owner: 10Marostegui) [11:42:43] did you get a page? 
I didn't [11:42:54] I did [11:43:18] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411208 (owner: 10Marostegui) [11:43:31] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411208 (owner: 10Marostegui) [11:44:00] 10Operations, 10ops-codfw: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#3978203 (10fgiunchedi) [11:44:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1093 (duration: 00m 56s) [11:44:32] yeah that paged for me [11:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:02] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411209 [11:46:58] I merged https://gerrit.wikimedia.org/r/#/c/411201/ at 12:06 PM [11:47:30] I got it too [11:48:13] I didn't, it showed out of my notification hours [11:48:29] 10Operations, 10ops-codfw: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#3978218 (10Marostegui) This is a slave, so if @Papaul needs to reboot it to get it fixed, we can easily depool it. [11:52:39] now I get a page [11:58:04] 10Operations, 10Analytics-Kanban, 10monitoring, 10netops, and 2 others: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3978245 (10elukey) After the last round of patches nfacctd/pmacct are sending events to Kafka using three topic partitions rath... 
[11:58:22] (03CR) 10Muehlenhoff: [C: 04-1] toollabs: add apt pinnings for key packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [11:59:51] (03PS9) 10Ema: icinga: add check_established_connections plugin [puppet] - 10https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847) [12:00:00] (03CR) 10Ema: [V: 032 C: 032] icinga: add check_established_connections plugin [puppet] - 10https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847) (owner: 10Ema) [12:00:19] thanks moritzm !!! [12:01:22] (03PS1) 10Muehlenhoff: Move python3 into standard packages, it's 2018 after all. [puppet] - 10https://gerrit.wikimedia.org/r/411211 [12:03:28] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411209 (owner: 10Marostegui) [12:03:31] (03CR) 10Chad: toollabs: add apt pinnings for key packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [12:04:39] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411209 (owner: 10Marostegui) [12:04:55] (03CR) 10Volans: [C: 031] "+1000 \o/ (to be on the safe side I suggest to run a full compiler, just in case)" [puppet] - 10https://gerrit.wikimedia.org/r/411211 (owner: 10Muehlenhoff) [12:05:49] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1093 (duration: 00m 56s) [12:06:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:38] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1093 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411209 (owner: 10Marostegui) [12:08:29] (03CR) 10Arturo Borrero Gonzalez: toollabs: add apt pinnings for key packages (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/410177 
(https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [12:12:40] (03PS1) 10Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411212 [12:14:07] (03PS1) 10Marostegui: db1094: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/411213 [12:14:47] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411212 (owner: 10Marostegui) [12:15:57] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411212 (owner: 10Marostegui) [12:16:42] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411212 (owner: 10Marostegui) [12:16:44] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411215 [12:17:06] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411215 [12:17:15] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 56s) [12:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:55] !log Stop MySQL on db1094 for mariadb upgrade, kernel upgrade and socket location upgrade [12:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:18] (03CR) 10Marostegui: [C: 032] db1094: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/411213 (owner: 10Marostegui) [12:20:46] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411215 (owner: 10Marostegui) [12:21:54] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411215 (owner: 10Marostegui) [12:21:59] (03PS1) 10Jcrespo: Mariadb: Set default basedir for phabricator (Mariadb 10.1) [puppet] - 
10https://gerrit.wikimedia.org/r/411216 (https://phabricator.wikimedia.org/T183469) [12:22:04] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411215 (owner: 10Marostegui) [12:22:38] (03PS2) 10Jcrespo: Mariadb: Set default basedir for phabricator (Mariadb 10.1) [puppet] - 10https://gerrit.wikimedia.org/r/411216 (https://phabricator.wikimedia.org/T183469) [12:23:07] (03CR) 10Jcrespo: [C: 032] Mariadb: Set default basedir for phabricator (Mariadb 10.1) [puppet] - 10https://gerrit.wikimedia.org/r/411216 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [12:23:34] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 - T162807 (duration: 00m 56s) [12:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:48] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [12:31:30] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411218 [12:32:48] PROBLEM - Check systemd state on serpens is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[12:33:33] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411218 (owner: 10Marostegui) [12:35:38] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-openldap-exporter [puppet] - 10https://gerrit.wikimedia.org/r/411219 (https://phabricator.wikimedia.org/T135991) [12:35:40] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411218 (owner: 10Marostegui) [12:36:20] ^ serpens is me [12:36:42] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411218 (owner: 10Marostegui) [12:36:57] RECOVERY - Check systemd state on serpens is OK: OK - running: The system is fully operational [12:37:06] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 (duration: 00m 56s) [12:37:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:44] !log cp3049: restart varnish-fe to clear 'child restarted' alert [12:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:52] (03PS1) 10Jcrespo: dbproxy: Failover db1043 phabricator replica db to db1053 [puppet] - 10https://gerrit.wikimedia.org/r/411220 (https://phabricator.wikimedia.org/T183469) [12:40:37] (03CR) 10Marostegui: [C: 031] dbproxy: Failover db1043 phabricator replica db to db1053 [puppet] - 10https://gerrit.wikimedia.org/r/411220 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [12:41:10] (03PS1) 10Jcrespo: dbproxy: Failover db1043 phabricator replica db to db1053 [dns] - 10https://gerrit.wikimedia.org/r/411221 (https://phabricator.wikimedia.org/T183469) [12:41:14] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411222 [12:41:28] (03CR) 10Marostegui: [C: 031] dbproxy: Failover db1043 phabricator replica db to db1053 [dns] - 
10https://gerrit.wikimedia.org/r/411221 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [12:41:30] (03CR) 10Jcrespo: [C: 032] dbproxy: Failover db1043 phabricator replica db to db1053 [puppet] - 10https://gerrit.wikimedia.org/r/411220 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [12:44:21] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411222 (owner: 10Marostegui) [12:44:42] !log reload dbproxy1003 configuration [12:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:07] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0 [12:45:18] db1059,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP, db1053,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP [12:45:29] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411222 (owner: 10Marostegui) [12:46:04] !log reload dbproxy1008 configuration [12:46:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:36] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 (duration: 00m 56s) [12:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:48] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411222 (owner: 10Marostegui) [12:47:17] RECOVERY - haproxy failover on dbproxy1008 is OK: OK check_failover servers up 2 down 0 [12:50:46] (03CR) 10Hashar: "A side effect is all those modules now suddenly depends on "base". At least for Nodepool, we don't use standard nor base modules. 
Althou" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/411211 (owner: 10Muehlenhoff) [12:51:59] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411223 [12:53:05] (03CR) 10Muehlenhoff: Move python3 into standard packages, it's 2018 after all. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/411211 (owner: 10Muehlenhoff) [12:53:43] 10Operations, 10ops-eqiad, 10hardware-requests: Decommission db1043 - https://phabricator.wikimedia.org/T187542#3978426 (10jcrespo) p:05Triage>03Normal [12:57:53] (03CR) 10Chad: [C: 031] "Half of these are just plain redundant anyway. I mean python3-yaml implies python3 duh." [puppet] - 10https://gerrit.wikimedia.org/r/411211 (owner: 10Muehlenhoff) [13:00:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411223 (owner: 10Marostegui) [13:01:12] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411223 (owner: 10Marostegui) [13:01:25] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411223 (owner: 10Marostegui) [13:02:17] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 (duration: 00m 56s) [13:02:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:38] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411226 [13:07:06] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1067 and db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411227 [13:07:09] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1067 and db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411227 [13:09:17] 10Operations, 10ops-codfw, 10hardware-requests: Decommission db2012 - 
https://phabricator.wikimedia.org/T187543#3978461 (10jcrespo) p:05Triage>03Normal [13:09:25] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1067 and db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411227 (owner: 10Marostegui) [13:10:36] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1067 and db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411227 (owner: 10Marostegui) [13:10:50] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1067 and db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411227 (owner: 10Marostegui) [13:11:11] (03PS1) 10Jcrespo: Mariadb: Schedule db1043 and db2012 for decommission [puppet] - 10https://gerrit.wikimedia.org/r/411229 (https://phabricator.wikimedia.org/T187543) [13:11:31] 10Operations, 10ops-eqiad: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#3978490 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [13:11:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1067 - T162807 (duration: 00m 55s) [13:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:08] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [13:12:47] (03PS2) 10Marostegui: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411226 [13:12:55] (03PS6) 10Arturo Borrero Gonzalez: toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) [13:13:22] (03CR) 10jerkins-bot: [V: 04-1] toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [13:14:46] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411226 (owner: 10Marostegui) [13:15:43] (03PS1) 10Jcrespo: dblist: Update latest m3 movements [software] 
- 10https://gerrit.wikimedia.org/r/411230 [13:15:45] (03PS1) 10Jcrespo: Update mariadb packages to 10.0.34 and 10.1.31 [software] - 10https://gerrit.wikimedia.org/r/411231 [13:15:54] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411226 (owner: 10Marostegui) [13:15:56] (03PS7) 10Arturo Borrero Gonzalez: toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) [13:16:02] (03CR) 10Muehlenhoff: [C: 04-1] toollabs: add apt pinnings for key packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [13:16:22] (03CR) 10jerkins-bot: [V: 04-1] toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [13:16:26] (03CR) 10Jcrespo: [V: 032 C: 032] Update mariadb packages to 10.0.34 and 10.1.31 [software] - 10https://gerrit.wikimedia.org/r/411231 (owner: 10Jcrespo) [13:16:31] (03PS2) 10Jcrespo: Update mariadb packages to 10.0.34 and 10.1.31 [software] - 10https://gerrit.wikimedia.org/r/411231 [13:16:33] (03CR) 10Jcrespo: [V: 032 C: 032] Update mariadb packages to 10.0.34 and 10.1.31 [software] - 10https://gerrit.wikimedia.org/r/411231 (owner: 10Jcrespo) [13:17:08] (03CR) 10Jcrespo: [C: 032] dblist: Update latest m3 movements [software] - 10https://gerrit.wikimedia.org/r/411230 (owner: 10Jcrespo) [13:17:12] 10Operations, 10DBA, 10MediaWiki-General-or-Unknown, 10MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3978507 (10EddieGP) Next steps: # Deploy https://gerrit.wikimedia.org/r... 
[13:17:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 56s) [13:17:14] (03PS2) 10Jcrespo: dblist: Update latest m3 movements [software] - 10https://gerrit.wikimedia.org/r/411230 [13:17:16] (03CR) 10Jcrespo: [V: 032 C: 032] dblist: Update latest m3 movements [software] - 10https://gerrit.wikimedia.org/r/411230 (owner: 10Jcrespo) [13:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:32] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411226 (owner: 10Marostegui) [13:21:03] (03CR) 10Chad: [C: 031] "Also we should kill extract2 outright but hey baby steps." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/410109 (owner: 10Krinkle) [13:21:20] (03PS8) 10Arturo Borrero Gonzalez: toollabs: add apt pinnings for key packages [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) [13:24:41] (03PS2) 10Jcrespo: Mariadb: Schedule db1043 and db2012 for decommission [puppet] - 10https://gerrit.wikimedia.org/r/411229 (https://phabricator.wikimedia.org/T187543) [13:27:40] (03CR) 10Jcrespo: [C: 032] dbproxy: Failover db1043 phabricator replica db to db1053 [dns] - 10https://gerrit.wikimedia.org/r/411221 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [13:38:22] (03CR) 10Filippo Giunchedi: [C: 031] "Thanks for taking care of this!" [puppet] - 10https://gerrit.wikimedia.org/r/411211 (owner: 10Muehlenhoff) [13:40:48] (03CR) 10Jcrespo: [C: 032] Mariadb: Schedule db1043 and db2012 for decommission [puppet] - 10https://gerrit.wikimedia.org/r/411229 (https://phabricator.wikimedia.org/T187543) (owner: 10Jcrespo) [13:43:45] (03PS2) 10Muehlenhoff: Move python3 into standard packages, it's 2018 after all. 
[puppet] - 10https://gerrit.wikimedia.org/r/411211 [13:49:44] (03PS1) 10BBlack: Revert "Varnish: block MJ12bot" [puppet] - 10https://gerrit.wikimedia.org/r/411238 [13:53:56] (03CR) 10BBlack: [C: 032] Revert "Varnish: block MJ12bot" [puppet] - 10https://gerrit.wikimedia.org/r/411238 (owner: 10BBlack) [14:06:28] !log T184209 initial setup of labs-instances2-b-codfw and hosts [14:06:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:36] 10Operations, 10ops-eqiad: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121#3978654 (10akosiaris) First successful and intended reproduction!. After ~2 hours we got ``` Feb 16 10:29:06 ganeti1005 kernel: [669669.675614] qemu-system-x86: page allocation stalls... [14:15:24] !log doing more IO stress tests on ganeti1005. T181121. Seems like we can reproduce [14:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:39] T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121 [14:28:43] 10Operations, 10ops-codfw, 10Cloud-VPS: connect eth2 for labneutron2001 and 2002 - https://phabricator.wikimedia.org/T187552#3978705 (10chasemp) [14:28:55] 10Operations, 10ops-codfw, 10Cloud-VPS: connect eth2 for labneutron2001 and 2002 - https://phabricator.wikimedia.org/T187552#3978719 (10chasemp) p:05Triage>03Normal [14:29:41] (03PS1) 10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 [14:30:23] (03CR) 10jerkins-bot: [V: 04-1] profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 (owner: 10Elukey) [14:39:15] (03PS2) 10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 [14:44:30] (03PS3) 10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 [14:56:39] (03PS4) 
10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 [14:58:28] (03CR) 10Herron: [C: 031] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/410758 (https://phabricator.wikimedia.org/T181519) (owner: 10Filippo Giunchedi) [14:59:48] PROBLEM - Host logstash1008 is DOWN: PING CRITICAL - Packet loss = 100% [14:59:58] PROBLEM - Host sca1004 is DOWN: PING CRITICAL - Packet loss = 100% [15:01:09] PROBLEM - SSH on ganeti1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:08] RECOVERY - SSH on ganeti1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [15:03:42] (03PS5) 10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) [15:03:55] (03PS1) 10Chad: robots.txt: Combine various NS_SPECIAL disallows [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411255 [15:04:29] (03PS1) 10Muehlenhoff: wmf-auto-restart: Fix restarts if multiple libraries in need of a restart [puppet] - 10https://gerrit.wikimedia.org/r/411256 (https://phabricator.wikimedia.org/T135991) [15:05:38] RECOVERY - Host sca1004 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms [15:05:39] RECOVERY - Host logstash1008 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms [15:08:34] ok looks like I reproduced it once more :-) [15:09:32] nice! how did you do it? [15:09:47] elukey: actually you did it. https://phabricator.wikimedia.org/T181121#3978654 [15:10:36] fun part is that all this is done via the VM. I now want to see if I can reproduce it from the host [15:10:50] that and see if DRBD is or is not related per moritzm suggestion [15:11:54] good luck! 
:) [15:14:08] !log poweroff sca1004, switch from DRBD to plain disk template T181121 [15:14:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:14:22] T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121 [15:16:39] !log run T181121#3978654 oneliner once more on sca1004, this time the VM has no DRBD [15:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:12] 10Operations, 10ops-eqiad: Degraded RAID on analytics1057 - https://phabricator.wikimedia.org/T187146#3978838 (10Cmjohnson) [15:23:14] 10Operations, 10ops-eqiad, 10Analytics-Kanban: Broken disk on analytics1057 - https://phabricator.wikimedia.org/T187162#3978840 (10Cmjohnson) [15:24:48] 10Operations, 10ops-eqiad: Degraded RAID on analytics1057 - https://phabricator.wikimedia.org/T187146#3965895 (10Cmjohnson) @elukey the disk has been replaced. you may need to add it back Return tracking info USPS 9202 3946 5301 2437 9877 74 FEDEX 8611918 2393026 74737795 [15:25:36] 10Operations, 10ops-eqiad: Degraded RAID on analytics1057 - https://phabricator.wikimedia.org/T187146#3978851 (10Cmjohnson) a:03elukey Assigning to @elukey to add back and resolve task [15:27:41] !log shut ms-be1018 for bbu swap - T186988 [15:27:51] cmjohnson1: ^ should be powered off shortly [15:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:54] T186988: Degraded RAID on ms-be1018 - https://phabricator.wikimedia.org/T186988 [15:28:06] !log andrew@tin Started deploy [horizon/deploy@58d2718]: first attempt at ocata branch [15:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:28] (03PS1) 10Ema: etcd: Introduce reconnectTimeout [debs/pybal] - 10https://gerrit.wikimedia.org/r/411264 (https://phabricator.wikimedia.org/T169765) [15:29:34] !log andrew@tin Finished deploy [horizon/deploy@58d2718]: first attempt at ocata branch (duration: 01m 28s) [15:29:46] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:08] PROBLEM - Varnish HTTP text-backend - port 3128 on cp5007 is CRITICAL: connect to address 10.132.0.107 and port 3128: Connection refused [15:33:02] (03CR) 10Filippo Giunchedi: [C: 031] wmf-auto-restart: Fix restarts if multiple libraries in need of a restart [puppet] - 10https://gerrit.wikimedia.org/r/411256 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:33:05] bblack: I guess this is you, right? ^ [15:33:08] RECOVERY - Varnish HTTP text-backend - port 3128 on cp5007 is OK: HTTP OK: HTTP/1.1 200 OK - 217 bytes in 0.498 second response time [15:34:03] 10Operations, 10Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#3978884 (10Cmjohnson) a:05Cmjohnson>03RobH @robh the disk has been replaced. Assigning back to you Return tracking info USPS 9202 3946 5301 2437 9854 35 FEDEX 9611918 2393026 74735456 [15:35:23] ema: yess :) [15:37:06] !log andrew@tin Started deploy [horizon/deploy@29f9afb]: second attempt at ocata branch [15:37:09] PROBLEM - Varnish HTTP text-backend - port 3128 on cp5007 is CRITICAL: connect to address 10.132.0.107 and port 3128: Connection refused [15:37:09] RECOVERY - MegaRAID on analytics1057 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [15:37:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:33] (03CR) 10Muehlenhoff: [C: 032] wmf-auto-restart: Fix restarts if multiple libraries in need of a restart [puppet] - 10https://gerrit.wikimedia.org/r/411256 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:39:08] PROBLEM - Host ms-be1018.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:40:28] !log andrew@tin Finished deploy [horizon/deploy@29f9afb]: second attempt at ocata branch (duration: 03m 22s) [15:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:58] RECOVERY - Hadoop DataNode on 
analytics1057 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [15:42:12] and I icinga-downtime'd ms-be1018, ah [15:43:02] cmjohnson1: all good with analytics1057, thanks! [15:44:18] RECOVERY - Host ms-be1018.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.01 ms [15:44:28] (03PS9) 10Bstorm: servers: install hp-health on all HP servers [puppet] - 10https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) [15:44:39] 10Operations, 10ops-eqiad: Degraded RAID on analytics1057 - https://phabricator.wikimedia.org/T187146#3978952 (10elukey) 05Open>03Resolved Disk configured and Hadoop worker node back serving traffic, thanks! [15:46:58] RECOVERY - HP RAID on ms-be1018 is OK: OK: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK [15:47:12] cmjohnson1: ^ all good \o/ thanks [15:47:36] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1018 - https://phabricator.wikimedia.org/T186988#3978958 (10fgiunchedi) 05Open>03Resolved Fixed! ``` 15:46 -icinga-wm:#wikimedia-operations- RECOVERY - HP RAID on ms-be1018 is OK: OK: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I... [15:47:48] 10Operations, 10ops-codfw: Decommission mw2017 and mw2099 - https://phabricator.wikimedia.org/T187467#3978970 (10Cmjohnson) [15:47:55] herron: thanks for the review! mind taking a look at https://gerrit.wikimedia.org/r/c/410759/ too? 
[15:48:07] 10Operations, 10ops-codfw: Decommission mw2017 and mw2099 - https://phabricator.wikimedia.org/T187467#3976153 (10Cmjohnson) a:03Papaul [15:50:18] RECOVERY - Varnish HTTP text-backend - port 3128 on cp5007 is OK: HTTP OK: HTTP/1.1 200 OK - 217 bytes in 0.482 second response time [15:50:25] 10Operations, 10ops-eqiad, 10hardware-requests: Decommisson restbase-dev100[1-3] - https://phabricator.wikimedia.org/T171179#3979012 (10Cmjohnson) 05Open>03Resolved Fixed [15:50:28] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, and 2 others: rack/setup/install restbase-dev100[456] - https://phabricator.wikimedia.org/T166181#3979014 (10Cmjohnson) [15:50:37] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-etherpad-exporter [puppet] - 10https://gerrit.wikimedia.org/r/411276 (https://phabricator.wikimedia.org/T135991) [15:53:58] (03CR) 10Bstorm: [C: 032] servers: install hp-health on all HP servers [puppet] - 10https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: 10Bstorm) [15:57:46] (03CR) 10Filippo Giunchedi: profile::kafka::burrow: add prometheus monitoring (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [15:58:30] PROBLEM - Host labstore1006 is DOWN: PING CRITICAL - Packet loss = 100% [15:59:10] 10Operations, 10ops-codfw, 10Cloud-VPS: connect eth2 for labneutron2001 and 2002 - https://phabricator.wikimedia.org/T187552#3979061 (10chasemp) [15:59:51] PROBLEM - Host labstore1006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:00:56] !log andrew@tin Started deploy [horizon/deploy@16d0b17]: ocata branch with upper constraints [16:01:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:37] madhuvishy: do we know that labstore1006 is down? 
[16:02:42] I think yes [16:03:14] !log andrew@tin Finished deploy [horizon/deploy@16d0b17]: ocata branch with upper constraints (duration: 02m 18s) [16:03:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:44] chasemp sorry that's me moving it [16:04:52] thought it was already in maint mode [16:04:54] chasemp: no. cmjohnson1 are you working on labstore1006? [16:04:57] Oh cool [16:05:06] I can downtime [16:05:54] cmjohnson1: don't forget to !log :) [16:06:21] !log labstore1006 and labstore1007 down for rack relocation [16:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:32] cmjohnson1: no worries man I was just checking, I knew that whole move thing was coming up [16:11:01] (03PS6) 10Elukey: profile::kafka::burrow: add prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) [16:13:27] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/10004/krypton.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:13:31] (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-openldap-exporter [puppet] - 10https://gerrit.wikimedia.org/r/411219 (https://phabricator.wikimedia.org/T135991) [16:13:46] !log andrew@tin Started deploy [horizon/deploy@16f3d8e]: ocata branch with upper new requirements [16:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:47] (03CR) 10Elukey: "Just noticed File[/etc/default/prometheus-burrow-exporter@main-eqiad], not pretty, going to amend it.." 
[puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:16:27] (03CR) 10Elukey: "> Just noticed File[/etc/default/prometheus-burrow-exporter@main-eqiad]," [puppet] - 10https://gerrit.wikimedia.org/r/411249 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:17:37] (03PS1) 10Chad: Remove indirection from search-redirect.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411284 [16:21:46] !log andrew@tin Finished deploy [horizon/deploy@16f3d8e]: ocata branch with upper new requirements (duration: 08m 00s) [16:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:20] RECOVERY - Host labstore1006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms [16:30:14] 10Operations, 10ops-codfw, 10DBA, 10netops: switch port configuration for tendril2001 - https://phabricator.wikimedia.org/T186172#3979092 (10ayounsi) No worries, port description renamed! [16:31:48] (03CR) 10Rush: [C: 032] openstack: set up values for test and n environment [puppet] - 10https://gerrit.wikimedia.org/r/410943 (https://phabricator.wikimedia.org/T184209) (owner: 10Rush) [16:31:52] (03PS3) 10Rush: openstack: set up values for test and n environment [puppet] - 10https://gerrit.wikimedia.org/r/410943 (https://phabricator.wikimedia.org/T184209) [16:33:29] some day we will have proper names for these clusters ^^^ [16:36:18] (03PS1) 10Rush: openstack: correct key lookup values for n deployment [puppet] - 10https://gerrit.wikimedia.org/r/411288 [16:37:42] (03CR) 10Rush: [C: 032] openstack: correct key lookup values for n deployment [puppet] - 10https://gerrit.wikimedia.org/r/411288 (owner: 10Rush) [16:43:56] (03CR) 10Niedzielski: [C: 04-1] New: add chromium_render service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [16:54:03] (03PS1) 10Giuseppe Lavagetto: Enable EtcdConfig on the debug hosts 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/411296 (https://phabricator.wikimedia.org/T149617) [16:55:48] (03CR) 10jerkins-bot: [V: 04-1] Enable EtcdConfig on the debug hosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411296 (https://phabricator.wikimedia.org/T149617) (owner: 10Giuseppe Lavagetto) [16:58:14] (03CR) 10Ppchelko: New: add chromium_render service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [17:00:34] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:01:33] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:02:03] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:02:03] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:02:43] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:03:53] (03PS3) 10Niedzielski: New: add chromium_render service [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) [17:03:59] (03CR) 10Niedzielski: New: add chromium_render service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [17:04:03] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:03] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:23] PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:04:33] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:33] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:33] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:43] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:44] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:53] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:53] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:55] (03CR) 10Ppchelko: New: add chromium_render service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [17:05:54] puppetdb --> Active: active (running) since Fri 2018-02-16 16:57:50 UTC; 7min ago [17:06:31] (03CR) 10Niedzielski: [C: 04-1] "Whoops! Well I will leave this voted down then until a port is decided upon." 
[puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [17:07:03] PROBLEM - Host labstore1007.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:08:06] (03CR) 10Chad: Move all dblists on noc to dblists/ directory, rather than individually (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394199 (owner: 10Chad) [17:10:54] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4028 is CRITICAL: connect to address 10.128.0.128 and port 3128: Connection refused [17:11:54] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4028 is OK: HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.157 second response time [17:12:13] RECOVERY - Host labstore1007.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.71 ms [17:18:29] (03CR) 10Chad: toollabs: add apt pinnings for key packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [17:19:03] (03CR) 10Chad: toollabs: add apt pinnings for key packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/410177 (https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [17:29:03] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:29:04] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:23] RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:33] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:29:33] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:34] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[17:29:43] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:44] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:29:53] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:53] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:30:43] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:31:33] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:32:03] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:32:03] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:32:43] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:34:38] 10Operations, 10ops-eqiad, 10hardware-requests: Decommission host erbium - https://phabricator.wikimedia.org/T185226#3979284 (10RobH) p:05Triage>03Normal [17:36:20] 10Operations, 10ops-codfw, 10hardware-requests, 10Patch-For-Review: Decommission db2012 - https://phabricator.wikimedia.org/T187543#3978461 (10RobH) [17:37:25] 10Operations, 10ops-codfw, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db2012 - https://phabricator.wikimedia.org/T187543#3979295 (10RobH) [17:37:41] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#3979298 (10RobH) [17:39:12] 10Operations, 10ops-codfw, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db2012 - https://phabricator.wikimedia.org/T187543#3978461 (10RobH) Since 
this is pending the DBA team's work on stating the new host is online, I've appended in the #DBA flag. Once the DBA team work is done (their s... [17:39:18] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#3978426 (10RobH) Since this is pending the DBA team's work on stating the new host is online, I've appended in the #DBA flag. Once the DBA team work is done (their s... [17:40:04] back into the s/salt/decom mines [17:57:53] (03PS1) 10Chico Venancio: Graphite sumSeries to reduce shinken puppet failures false positves [puppet] - 10https://gerrit.wikimedia.org/r/411315 [18:02:49] 10Operations, 10Analytics-Kanban, 10monitoring, 10netops, and 2 others: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3979339 (10Nuria) Are we planning to use tranquility to move the data into druid or rather just kafka-> camus-> hive? [18:07:43] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "I like the change as commented on IRC. 
This review is more about the format of the patch:" [puppet] - 10https://gerrit.wikimedia.org/r/411315 (owner: 10Chico Venancio) [18:23:23] (03PS1) 10RobH: decom mc201[78] [dns] - 10https://gerrit.wikimedia.org/r/411323 (https://phabricator.wikimedia.org/T187474) [18:23:43] 10Operations, 10ops-codfw, 10DC-Ops, 10hardware-requests, 10Patch-For-Review: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3979396 (10RobH) [18:23:48] (03PS1) 10Cmjohnson: Updating MAC address labstore1006-7 [puppet] - 10https://gerrit.wikimedia.org/r/411324 (https://phabricator.wikimedia.org/T186756) [18:23:59] (03PS3) 10Dzahn: Revert "mediawiki: reduce frequency of purge_abusefilter to weekly" [puppet] - 10https://gerrit.wikimedia.org/r/411031 [18:24:32] (03CR) 10Cmjohnson: [C: 032] Updating MAC address labstore1006-7 [puppet] - 10https://gerrit.wikimedia.org/r/411324 (https://phabricator.wikimedia.org/T186756) (owner: 10Cmjohnson) [18:24:43] (03CR) 10RobH: [C: 032] decom mc201[78] [dns] - 10https://gerrit.wikimedia.org/r/411323 (https://phabricator.wikimedia.org/T187474) (owner: 10RobH) [18:24:47] (03CR) 10Dzahn: [C: 032] "per Reedy's comment on ticket, running "en" just takes minutes now, going back to how things were before" [puppet] - 10https://gerrit.wikimedia.org/r/411031 (owner: 10Dzahn) [18:24:52] 10Operations, 10ops-eqiad: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#3979420 (10ayounsi) [18:25:03] (03PS4) 10Dzahn: Revert "mediawiki: reduce frequency of purge_abusefilter to weekly" [puppet] - 10https://gerrit.wikimedia.org/r/411031 [18:25:53] (03PS1) 10Madhuvishy: partman: Add recipe for dumps distribution servers [puppet] - 10https://gerrit.wikimedia.org/r/411326 [18:26:21] 10Operations, 10ops-codfw, 10DC-Ops, 10hardware-requests, 10Patch-For-Review: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3979427 (10RobH) a:05RobH>03Papaul All of these 
systems are now ready for on-site steps, assigned to @papaul. [18:27:52] 10Operations, 10ops-eqiad: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#3979437 (10ayounsi) @Cmjohnson please pre-populate the following interfaces with SFP-Ts: ``` ge-2/0/9 db1099 ge-2/0/17 db1060 ge-2/0/18 ms-be... [18:31:12] (03PS2) 10RobH: partman: Add recipe for dumps distribution servers [puppet] - 10https://gerrit.wikimedia.org/r/411326 (owner: 10Madhuvishy) [18:31:20] (03CR) 10RobH: [C: 032] partman: Add recipe for dumps distribution servers [puppet] - 10https://gerrit.wikimedia.org/r/411326 (owner: 10Madhuvishy) [18:32:09] (03CR) 10Madhuvishy: [V: 032] partman: Add recipe for dumps distribution servers [puppet] - 10https://gerrit.wikimedia.org/r/411326 (owner: 10Madhuvishy) [18:34:42] !log upgraded zuul [18:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:58] (03PS1) 10Arturo Borrero Gonzalez: apt: apt-upgrade: cleanup report output [puppet] - 10https://gerrit.wikimedia.org/r/411330 (https://phabricator.wikimedia.org/T181647) [18:38:23] (03PS2) 10Arturo Borrero Gonzalez: apt: apt-upgrade: cleanup report output [puppet] - 10https://gerrit.wikimedia.org/r/411330 (https://phabricator.wikimedia.org/T181647) [18:38:23] RECOVERY - Host labstore1006 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [18:42:34] (03CR) 10Arturo Borrero Gonzalez: [C: 032] apt: apt-upgrade: cleanup report output [puppet] - 10https://gerrit.wikimedia.org/r/411330 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [18:51:03] (03PS2) 10Chico Venancio: shinken: WMCS: use check_graphite_series sumSeries to reduce puppet failures false positves [puppet] - 10https://gerrit.wikimedia.org/r/411315 [18:58:41] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "The author is still using your gmail address. 
Also, if we are including the Signed-off-by line, better place at the end of the commit mess" [puppet] - 10https://gerrit.wikimedia.org/r/411315 (owner: 10Chico Venancio) [19:00:47] 10Operations, 10ops-eqiad: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#3979544 (10ayounsi) [19:15:12] (03PS1) 10Chad: mw.org: remove old keys txt file from 2009 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411364 [19:17:05] (03PS1) 10Chad: mw.org: Symlink keys.html to index.html [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411367 [19:17:19] (03PS3) 10Chico Venancio: shinken: WMCS: use check_graphite_series sumSeries to reduce puppet failures false positves [puppet] - 10https://gerrit.wikimedia.org/r/411315 [19:18:32] (03PS1) 10Chad: Move mw.org docroot to mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411368 [19:20:34] (03PS1) 10Chad: Turn wikimedia.org docroot into symlink to standard-docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/411369 [19:23:22] Krinkle: 404.html is *only* used for the old secure.wm.o redirect vhost.....is there a pressing reason we couldn't use 404.php? 
[19:50:36] (PS3) Krinkle: extract2: Set wiki context directly instead of MW_LANG indirection [mediawiki-config] - https://gerrit.wikimedia.org/r/410109
[19:50:48] (PS3) Krinkle: multiversion: Remove support for MW_LANG env override [mediawiki-config] - https://gerrit.wikimedia.org/r/410110
[19:50:51] (PS4) Krinkle: multiversion: Remove support for MW_LANG env override [mediawiki-config] - https://gerrit.wikimedia.org/r/410110
[19:51:39] !log andrew@tin Started deploy [horizon/deploy@bdcc12b]: ocata branch with sidebar fix
[19:51:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:51] !log andrew@tin Finished deploy [horizon/deploy@bdcc12b]: ocata branch with sidebar fix (duration: 03m 12s)
[19:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:10] !log andrew@tin Started deploy [horizon/deploy@1fdd122]: two more small fixes
[20:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:31] !log andrew@tin Finished deploy [horizon/deploy@1fdd122]: two more small fixes (duration: 01m 21s)
[20:11:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:11] (CR) Krinkle: [C: 1] mw.org: remove old keys txt file from 2009 [mediawiki-config] - https://gerrit.wikimedia.org/r/411364 (owner: Chad)
[20:39:40] !log andrew@tin Started deploy [horizon/deploy@efcba2b]: sudo dashboard update
[20:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:56] !log andrew@tin Finished deploy [horizon/deploy@efcba2b]: sudo dashboard update (duration: 01m 16s)
[20:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:52] (PS1) Chad: Gerrit: Also set read timeout [puppet] - https://gerrit.wikimedia.org/r/411394
[20:53:05] (PS2) Chad: Gerrit: Also set ldap read timeout [puppet] - https://gerrit.wikimedia.org/r/411394
[21:01:08] Operations, ops-codfw: Decommission mw2017 and mw2099 - https://phabricator.wikimedia.org/T187467#3979829 (Papaul) a:Papaul>RobH
[21:06:16] (PS1) Chad: Gerrit: Tweak SSH timeout settings and such [puppet] - https://gerrit.wikimedia.org/r/411397
[21:09:31] (CR) Paladox: [C: 1] Gerrit: Tweak SSH timeout settings and such [puppet] - https://gerrit.wikimedia.org/r/411397 (owner: Chad)
[21:10:37] (CR) Paladox: [C: 1] Gerrit: Also set ldap read timeout [puppet] - https://gerrit.wikimedia.org/r/411394 (owner: Chad)
[21:12:15] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3979836 (mmodell) If it's really as simple as importi...
[21:12:37] !log Upgraded Zuul to https://gerrit.wikimedia.org/r/#/c/411322/3 | T187567
[21:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:53] T187567: CI is running against parent patches, not the patches themselves for chained patches - https://phabricator.wikimedia.org/T187567
[21:26:53] (PS4) Chico Venancio: shinken: WMCS: use check_graphite_series sumSeries to reduce puppet failures false positves [puppet] - https://gerrit.wikimedia.org/r/411315
[21:29:50] (PS1) EBernhardson: Deploy libhdfs0 to hadoop nodes [puppet] - https://gerrit.wikimedia.org/r/411464
[21:32:53] (CR) EBernhardson: "This is semi-related to T187139. In that ticket copying files from hdfs to the local machine so a C++ library could read them triggered an" [puppet] - https://gerrit.wikimedia.org/r/411464 (owner: EBernhardson)
[21:35:16] (CR) Rush: shinken: WMCS: use check_graphite_series sumSeries to reduce puppet failures false positves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/411315 (owner: Chico Venancio)
[21:35:35] PROBLEM - Check systemd state on labpuppetmaster1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:40:09] seeing quite a lot of log noise from job queue. Stuff like "29 buffered job(s) of type(s) JobSpecification, CdnPurgeJob never inserted"
[21:40:34] I'm not sure if this is worrisome or not?
[21:44:12] Not last time I checked.
[21:49:00] Operations, Traffic, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, and 2 others: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#3979897 (Dbrant) Open>Invalid
[22:08:05] PROBLEM - Host scb2005 is DOWN: PING CRITICAL - Packet loss = 100%
[22:09:55] RECOVERY - Host scb2005 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[22:17:08] (PS1) Rush: openstack: neutron l3 and service for labtestn [puppet] - https://gerrit.wikimedia.org/r/411488 (https://phabricator.wikimedia.org/T167293)
[22:17:18] (PS1) Andrew Bogott: wmcs encapi: preposterous erb hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[22:17:56] (CR) jerkins-bot: [V: -1] wmcs encapi: preposterous erb hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489 (owner: Andrew Bogott)
[22:37:29] (PS2) Andrew Bogott: wmcs encapi: preposterous erb hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[22:38:01] (CR) jerkins-bot: [V: -1] wmcs encapi: preposterous erb hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489 (owner: Andrew Bogott)
[22:39:36] (PS3) Andrew Bogott: wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[22:40:05] (CR) jerkins-bot: [V: -1] wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489 (owner: Andrew Bogott)
[22:41:56] (PS2) Krinkle: Enable EtcdConfig on the debug hosts [mediawiki-config] - https://gerrit.wikimedia.org/r/411296 (https://phabricator.wikimedia.org/T149617) (owner: Giuseppe Lavagetto)
[22:42:00] (CR) Krinkle: "Fixed phpcs violation" [mediawiki-config] - https://gerrit.wikimedia.org/r/411296 (https://phabricator.wikimedia.org/T149617) (owner: Giuseppe Lavagetto)
[22:43:35] (PS4) Andrew Bogott: wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[22:44:12] (CR) jerkins-bot: [V: -1] wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489 (owner: Andrew Bogott)
[22:45:05] (PS5) Andrew Bogott: wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[22:55:25] mutante: Is there anywhere we actually use HTTP 418? Or do we just have the HTTP 301 at coffee.wikimedia.org ?
[22:59:46] (PS6) Andrew Bogott: wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489
[23:04:04] (CR) Andrew Bogott: [C: 2] wmcs encapi: preposterous hack to limit POSTs to horizon hosts [puppet] - https://gerrit.wikimedia.org/r/411489 (owner: Andrew Bogott)
[23:08:59] Ivy: Used to, for a short time, for the "Old IE insecure SSL support" page
[23:08:59] https://github.com/wikimedia/puppet/commit/fb7eae473a44cdeac2c9188ac388fe7fbeabf4b4
[23:13:06] Hmmm.
[23:13:20] Oh well.
[23:24:29] (PS5) Chico Venancio: shinken: WMCS: use check_graphite_series sumSeries to reduce puppet failures false positves [puppet] - https://gerrit.wikimedia.org/r/411315
[23:36:36] (PS1) Andrew Bogott: labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520
[23:37:11] (CR) jerkins-bot: [V: -1] labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520 (owner: Andrew Bogott)
[23:38:22] (PS2) Andrew Bogott: labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520
[23:43:11] (PS3) Andrew Bogott: labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520
[23:47:10] (CR) Andrew Bogott: "puppet diff can be found here: https://puppet-compiler.wmflabs.org/compiler02/10011/labpuppetmaster1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/411520 (owner: Andrew Bogott)
[23:51:03] (PS4) Andrew Bogott: labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520 (https://phabricator.wikimedia.org/T187499)
[23:53:51] (PS5) Andrew Bogott: labspuppetbackend: rewrite of the read-only security layer [puppet] - https://gerrit.wikimedia.org/r/411520 (https://phabricator.wikimedia.org/T187499)