[00:01:29] (CR) Aaron Schulz: [C: 1] HHVM: capture & log traces for catchable fatals. [mediawiki-config] - https://gerrit.wikimedia.org/r/163680 (owner: Ori.livneh)
[00:01:45] RECOVERY - Corp OIT LDAP Mirror on plutonium is OK: LDAP OK - 0.007 seconds response time
[00:09:35] Just got 403 Forbidden on en.W
[00:09:46] I think it's 403s everywhere
[00:09:53] 404 File Not Found also
[00:09:55] fuck
[00:09:58] that's me
[00:10:11] not logged in 403, logged in HHVM 404
[00:10:11] (PS1) MaxSem: Revert "Remove live-1.5 and skins-1.5" [mediawiki-config] - https://gerrit.wikimedia.org/r/163779
[00:10:13] What did you do?
[00:10:17] hahahaha
[00:10:18] (CR) MaxSem: [C: 2 V: 2] Revert "Remove live-1.5 and skins-1.5" [mediawiki-config] - https://gerrit.wikimedia.org/r/163779 (owner: MaxSem)
[00:10:40] !log maxsem Synchronized live-1.5: (no message) (duration: 00m 03s)
[00:10:40] hahahaha... classical :D
[00:10:45] Logged the message, Master
[00:10:45] Reedy: care to log it?
[00:10:59] MaxSem: Yep, it's back
[00:11:07] pheww
[00:12:04] !log https://gerrit.wikimedia.org/r/#/c/162520/ broke stuff, reverted
[00:12:04] wow, most concise site-destroying patch ever?
[00:12:08] Logged the message, Master
[00:12:37] !log maxsem Synchronized w/skins-1.5: (no message) (duration: 00m 03s)
[00:12:42] Logged the message, Master
[00:12:42] just two symlinks, one of which is only one character long
[00:13:04] bd808: I think MaxSem just earned a t-shirt
[00:13:21] lol
[00:13:37] Oh, it wasn't just commons? Cool. :D
[00:14:06] yep, it was boring not to cause full outages in 2 years of deploying stuff, had to fix it :|
[00:16:10] the blip on gdash is tiny, maybe 4 mins max
[00:16:51] (PS1) Dzahn: fix syntax in check_lucene_frontend [puppet] - https://gerrit.wikimedia.org/r/163781
[00:18:28] internal server error on icinga now?
[00:18:50] hmm.. no. nvm
[00:20:25] (CR) Dzahn: [C: 2] "not even used ?
oh well, fixed anyways, we just want the BGP checks to work" [puppet] - https://gerrit.wikimedia.org/r/163781 (owner: Dzahn)
[00:24:26] !log linne - shutting down, revoking puppet cert, salt key, puppet/icinga ...
[00:24:31] Logged the message, Master
[00:52:45] (PS1) Alexandros Kosiaris: openldap: force password policy [puppet] - https://gerrit.wikimedia.org/r/163786
[00:58:17] !log integration-slave1007 and integration-slave1008 have not gotten any jobs in the past 24h. integration-slave1006 however has gotten loads of action. Investigating load balancing issue.
[00:58:22] Logged the message, Master
[01:00:14] !log Jenkins connection seemed in order with integration-slave1007 and 8, but disconnecting and relaunching the slave agents immediately resulted in them getting jobs assigned. Cause unknown, problem resolved for now.
[01:00:19] Logged the message, Master
[01:04:10] (PS1) Dzahn: remove unused check_lucene_frontend [puppet] - https://gerrit.wikimedia.org/r/163788
[01:16:51] (CR) Dzahn: [C: 1] rm root cert from chain [puppet] - https://gerrit.wikimedia.org/r/111387 (owner: Jeremyb)
[01:23:36] (CR) Chmarkine: "Do you need to redirect on port 443 as well?"
[puppet] - https://gerrit.wikimedia.org/r/163756 (owner: Dzahn)
[01:25:21] (PS2) Alexandros Kosiaris: openldap: force password policy [puppet] - https://gerrit.wikimedia.org/r/163786
[01:32:47] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail
[01:34:02] (CR) Alexandros Kosiaris: [C: 2] openldap: fix sambaNTpassword aci [puppet] - https://gerrit.wikimedia.org/r/163758 (owner: Alexandros Kosiaris)
[01:34:16] (PS3) Alexandros Kosiaris: openldap: force password policy [puppet] - https://gerrit.wikimedia.org/r/163786
[01:34:22] (CR) Alexandros Kosiaris: [C: 2 V: 2] openldap: force password policy [puppet] - https://gerrit.wikimedia.org/r/163786 (owner: Alexandros Kosiaris)
[01:38:41] (CR) Chmarkine: [C: 1] "I forgot noc is now behind misc-web. This change looks good to me." [puppet] - https://gerrit.wikimedia.org/r/163756 (owner: Dzahn)
[02:00:08] (PS1) Krinkle: [WIP] Implement role::ci::slave::localbrowser (Chromium) [puppet] - https://gerrit.wikimedia.org/r/163791
[02:09:56] (PS2) Krinkle: [WIP] Implement role::ci::slave::localbrowser (Chromium) [puppet] - https://gerrit.wikimedia.org/r/163791
[02:10:03] (CR) Krinkle: [C: -1] [WIP] Implement role::ci::slave::localbrowser (Chromium) [puppet] - https://gerrit.wikimedia.org/r/163791 (owner: Krinkle)
[02:10:09] (PS3) Krinkle: [WIP] Implement role::ci::slave::localbrowser (Chromium) [puppet] - https://gerrit.wikimedia.org/r/163791
[02:10:18] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3614 MB (3% inode=99%):
[02:10:30] springle: if you have a moment, could you review (but not merge) https://gerrit.wikimedia.org/r/#/c/163222/ ?
[02:11:18] RECOVERY - Disk space on virt0 is OK: DISK OK
[02:11:32] (PS1) Ori.livneh: Make the 503 error page consistent with other 5xx error pages [mediawiki-config] - https://gerrit.wikimedia.org/r/163794
[02:16:28] PROBLEM - puppet last run on mw1093 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:19:18] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[02:25:31] (CR) Ori.livneh: [WIP] Implement role::ci::slave::localbrowser (Chromium) (4 comments) [puppet] - https://gerrit.wikimedia.org/r/163791 (owner: Krinkle)
[02:28:15] (CR) Ori.livneh: [C: 2] "This is just php-fatal-error.html with "503 Service Temporarily Unavailable" hard-coded in the TechnicalStuff div." [mediawiki-config] - https://gerrit.wikimedia.org/r/163794 (owner: Ori.livneh)
[02:28:19] (Merged) jenkins-bot: Make the 503 error page consistent with other 5xx error pages [mediawiki-config] - https://gerrit.wikimedia.org/r/163794 (owner: Ori.livneh)
[02:34:07] !log LocalisationUpdate completed (1.24wmf22) at 2014-09-30 02:34:07+00:00
[02:34:13] Logged the message, Master
[02:34:27] RECOVERY - puppet last run on mw1093 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[02:35:07] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[02:41:24] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[02:41:26] !log ori Synchronized 503.html: Ia88b306ef: Make the 503 error page consistent with other 5xx error pages (duration: 00m 08s)
[02:41:32] Logged the message, Master
[02:47:17] (PS4) coren: Use the Ubuntu Way of installing SSL certs [puppet] - https://gerrit.wikimedia.org/r/163222
[02:56:54] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:04:03] (PS5) coren: Use the Ubuntu Way of installing SSL certs [puppet] - https://gerrit.wikimedia.org/r/163222
[03:09:48] (Abandoned) Reedy: Don't use deployment-rsync01 as scap proxy [puppet] - https://gerrit.wikimedia.org/r/163736 (owner: Reedy)
[03:13:55] (CR) coren: [C: 1] "This should make our certificate handing sane, with caveats:" [puppet] - https://gerrit.wikimedia.org/r/163222 (owner: coren)
[03:15:06] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[03:17:15] !log LocalisationUpdate completed (1.25wmf1) at 2014-09-30 03:17:15+00:00
[03:17:20] Logged the message, Master
[04:00:35] (PS1) coren: Autogenerate chained certificates [puppet] - https://gerrit.wikimedia.org/r/163798
[04:18:48] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 30 04:18:48 UTC 2014 (duration 18m 47s)
[04:18:52] Logged the message, Master
[05:08:05] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:07] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:07] PROBLEM - Host ps1-d2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:07] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:07] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:07] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
[05:08:56] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 222, down: 1, dormant: 0, excluded: 0,
unused: 0 xe-5/2/1: down - Core: cr2-pmtpa:xe-0/0/0 (Level3/FPL, CV71026) {#2008} [10Gbps wave]
[05:12:51] uhm
[05:13:07] did we lose tampa a little ahead of schedule or something?
[05:13:24] I can still ping fenari
[05:13:58] ah we lost the pmtpa mgmt network
[05:14:09] (I think)
[05:17:29] no, we just lost 1/2 links period is what I now think, based on the labels
[05:17:43] who knows. hosts are reachable, mgmt apparently not.
[05:24:40] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 33.71 ms
[05:24:40] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 35.52 ms
[05:24:40] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 34.35 ms
[05:24:40] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.56 ms
[05:24:50] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 34.39 ms
[05:24:50] RECOVERY - Host ps1-c1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 37.48 ms
[05:25:20] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 224, down: 0, dormant: 0, excluded: 0, unused: 0
[06:12:12] (PS1) Ori.livneh: HHVM: send profiling data to port 3812 on tungsten [mediawiki-config] - https://gerrit.wikimedia.org/r/163801
[06:12:47] (CR) Ori.livneh: [C: 2] HHVM: send profiling data to port 3812 on tungsten [mediawiki-config] - https://gerrit.wikimedia.org/r/163801 (owner: Ori.livneh)
[06:12:50] <_joe_> ori: can't we get rid of that damned tool?
[06:12:51] (Merged) jenkins-bot: HHVM: send profiling data to port 3812 on tungsten [mediawiki-config] - https://gerrit.wikimedia.org/r/163801 (owner: Ori.livneh)
[06:12:58] <_joe_> I mean mwprof
[06:13:50] not really
[06:14:00] not yet, anyways.
[06:14:06] <_joe_> :(
[06:28:41] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail
[06:29:12] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:42] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:01] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:31] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:32] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:32] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:41] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:42] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:51] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:51] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:01] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:02] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:02] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:02] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:12] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:22] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:23] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:31] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:32] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:41] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:42] PROBLEM - puppet last run
on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:42] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:17] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:45:18] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:45:39] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:45:47] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:45:50] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:45:57] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:46:08] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:46:18] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:46:18] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:46:27] RECOVERY - puppet last run on mw1118
is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:46:27] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures
[06:46:28] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:46:28] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:38] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:46:47] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:57] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:47:21] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:55:38] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 2 failures
[07:09:10] good morning
[07:10:39] <_joe_> morning hashar
[07:11:52] _joe_: do you have some bandwidth this morning to talk about a change that switch Jenkins to Java 7 ? :)D
[07:12:35] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:18:11] <_joe_> to java 7?!?
[07:18:58] <_joe_> anyway, maybe later
[07:19:06] :-D
[07:19:22] will poke you again tomorrow
[07:20:20] <_joe_> hashar: I'm working on https://gerrit.wikimedia.org/r/#/c/147486
[07:20:41] <_joe_> which is a first step into simplifying _a_lot_ our apache config
[07:22:58] \O/
[07:23:10] speaking of that, I have to reenable the apache config linting
[07:25:00] <_joe_> yeah, that
[07:58:49] (PS7) Giuseppe Lavagetto: Move a lot of the miscellaneous wikis out of their own specific docroots [puppet] - https://gerrit.wikimedia.org/r/147486 (owner: Reedy)
[08:13:04] (PS1) Giuseppe Lavagetto: swift_new: include role::swift::base [puppet] - https://gerrit.wikimedia.org/r/163809
[08:16:13] <_joe_> godog: ^^
[08:16:29] <_joe_> I think this is why we didn't get the host declared in icinga
[08:31:58] _joe_: indeed, I think bblack fixed it in https://gerrit.wikimedia.org/r/#/c/163755/
[08:32:48] <_joe_> godog: well, my way is more "standard"
[08:33:06] I see what you did there with "standard"
[08:33:44] <_joe_> my idea is, we could move "include admin" to role::swift::base as well
[08:36:43] ye I don't think it is necessary, the thing is that role::swift::base will likely be gone when the legacy swift stuff is gone
[08:44:33] <_joe_> well, copy-pasting include standard, admin everywhere is not the way to go
[08:45:04] <_joe_> this is btw one of the few cases where inheritance may make sense
[08:47:24] yeah it'd be nice to have a list of cases where we want it under e.g.
Puppet_coding in wikitech
[08:53:13] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [puppet] - https://gerrit.wikimedia.org/r/163814
[08:53:29] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [mediawiki-config] - https://gerrit.wikimedia.org/r/163815
[08:57:22] <_joe_> yeah btw, I should update that with a hiera section
[08:57:52] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/163814 (owner: Hashar)
[08:57:57] (CR) Hashar: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/163815 (owner: Hashar)
[09:04:10] (CR) Hashar: "Chris/Zeljkof would confirm, but I think we want the browsertests jobs to run on SauceLabs. Do you have a specific use case in mind?" [puppet] - https://gerrit.wikimedia.org/r/163791 (owner: Krinkle)
[09:13:24] (PS1) Filippo Giunchedi: swift: check device, not partition for xfs [puppet] - https://gerrit.wikimedia.org/r/163816
[09:13:43] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: check device, not partition for xfs [puppet] - https://gerrit.wikimedia.org/r/163816 (owner: Filippo Giunchedi)
[09:14:19] (Abandoned) Filippo Giunchedi: swift: add missing xfsprogs dependency [puppet] - https://gerrit.wikimedia.org/r/163576 (owner: Filippo Giunchedi)
[09:18:09] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[09:18:09] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[09:22:30] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[09:24:41] !log reboot ms-be2001 as a test
[09:24:47] Logged the message, Master
[09:25:26] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[09:28:24] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago
with 0 failures
[09:30:30] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[09:41:09] !log removed obsolete /etc/puppet/hiera from strontium and palladium, /etc/puppet/hieradata is the new location
[09:41:13] Logged the message, Master
[09:58:10] (PS1) Mushroom: Add metilli.com to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/163821 (https://bugzilla.wikimedia.org/71460)
[10:41:32] does gerrit.wikimedia.org use MyISAM ?
[10:45:21] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: Puppet has 12 failures
[10:49:12] Nemo_bis: I'd be semi-surprised... why?
[10:51:00] yes, I'd be surprised too, but just checking
[10:51:18] it's for gerrit 2.8.6 upgrade, I'd love ops opinions https://bugzilla.wikimedia.org/show_bug.cgi?id=63847#c6
[10:53:53] Nemo_bis: ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
[10:55:15] :)
[10:59:18] (PS1) Filippo Giunchedi: swift: fix filesystem provisioning [puppet] - https://gerrit.wikimedia.org/r/163825
[10:59:46] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: fix filesystem provisioning [puppet] - https://gerrit.wikimedia.org/r/163825 (owner: Filippo Giunchedi)
[11:03:40] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[11:12:07] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[11:12:14] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[11:19:10] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[11:20:32] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[11:24:32] PROBLEM - swift-account-replicator on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args
^/usr/bin/python /usr/bin/swift-account-replicator
[11:24:42] PROBLEM - swift-object-replicator on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[11:24:52] PROBLEM - swift-object-auditor on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[11:24:52] PROBLEM - swift-account-auditor on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[11:24:53] PROBLEM - swift-account-reaper on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[11:24:53] PROBLEM - swift-container-server on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[11:24:53] PROBLEM - swift-container-replicator on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[11:24:53] PROBLEM - swift-container-auditor on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:25:04] PROBLEM - swift-object-server on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[11:25:11] le sigh, silencing
[11:25:12] PROBLEM - swift-account-server on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[11:25:12] PROBLEM - swift-container-updater on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[11:25:13] PROBLEM - swift-object-updater on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[11:33:42] RECOVERY - swift-account-replicator on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python
/usr/bin/swift-account-replicator
[11:33:52] RECOVERY - swift-object-replicator on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[11:34:02] RECOVERY - swift-object-auditor on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[11:34:03] RECOVERY - swift-account-auditor on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[11:34:03] RECOVERY - swift-account-reaper on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[11:34:03] RECOVERY - swift-container-server on ms-be2001 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[11:34:03] RECOVERY - swift-container-replicator on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[11:34:13] RECOVERY - swift-container-auditor on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:34:16] RECOVERY - swift-object-server on ms-be2001 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[11:34:18] RECOVERY - swift-account-server on ms-be2001 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[11:34:23] RECOVERY - swift-container-updater on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[11:34:23] RECOVERY - swift-object-updater on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[11:37:47] ...
recovery notifications sent during downtime too :(
[11:41:14] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:43:39] PROBLEM - swift-account-replicator on ms-be2006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[11:44:39] RECOVERY - swift-account-replicator on ms-be2006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[11:48:20] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:50:38] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:52:29] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[11:53:48] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 2 failures
[12:04:44] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[12:07:54] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[12:10:23] !log Stopped exim daemon on mchenry
[12:10:29] Logged the message, Master
[12:11:04] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[12:12:55] PROBLEM - Exim SMTP on mchenry is CRITICAL: Connection refused
[12:41:42] (PS1) Gilles: Thumbnail prerendering at upload time on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/163836
[12:52:34] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0366666666667
[12:56:39] manybubbles: some CirrusSearch alarm above ^^^
[12:56:47] manybubbles: and good morning!
[12:57:04] hashar: morning!
I think that alarm is a bit alarmist - but will check
[12:57:23] I wish the alarm could point to a graphite graph :D
[12:57:51] hashar: it'd be nice yeah
[12:58:06] meanwhile you can probably craft a dashboard on http://gdash.wikimedia.org
[12:58:16] it's 3 queries in the past one minute over 10 seconds long. bad for them but not a huge deal
[13:05:46] ocg is still looking good this morning
[13:06:08] we can probably end the icinga 'scheduled downtime' for it
[13:07:25] (PS3) Bartosz Dziewoński: Remove dead code for amwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/163148
[13:07:33] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[13:12:09] Can someone cover triage for me for a couple hours? I have to go downtown to battle bureaucracy.
[13:14:30] (CR) Filippo Giunchedi: [C: 1] "FWIW I don't think beta is using swift (?) perhaps the local filebackend instead" [mediawiki-config] - https://gerrit.wikimedia.org/r/163836 (owner: Gilles)
[13:14:51] Coren: sure I'll take a look in RT
[13:15:26] godog: There's nothing waiting on triage right now, so it's just new things. I'm leaving in about 1h and returning in about 5h.
[13:16:39] Coren: can you help cscott with the icinga request?
[13:16:54] Coren: ok!
[13:16:56] mark: Sure.
[13:17:29] mark: I'm not leaving for a bit, just wanted to make sure I was covered in advance. :-)
[13:18:04] (CR) Gilles: "Ah, can you find that out for me? I have no idea what the file config is like on beta." [mediawiki-config] - https://gerrit.wikimedia.org/r/163836 (owner: Gilles)
[13:18:17] cscott: All 3 OCG health checks?
[13:19:23] all 4. there are 3 icinga notifications that are muted, plus the ocg.svc.eqiad.wmnet ping test. but hang on -- that last one just went critical for some reason.
[13:19:50] kk
[13:19:57] * Coren just noticed the ping check.
[13:20:24] hm. maybe just do the notifications for now. they might be flapping, and i'm not noticing.
if they are, then we can mute them again.
[13:20:50] does icinga keep history?
[13:22:37] icinga does keep alerting history yes
[13:23:03] *grumble* Hang on, icinga still doesn't think I'm cool.
[13:23:24] I may need to log off then back on again, but it's not clear how you do that with http auth'
[13:24:45] (PS1) KartikMistry: WIP: Update config for Language pairs [puppet] - https://gerrit.wikimedia.org/r/163841
[13:25:46] (PS3) Christopher Johnson (WMDE): Adds a libext option to phabricator class for tag install of Sprint library Change-Id: If2e66a090581e10e350f3e7f9795e3b43c6b25da [puppet] - https://gerrit.wikimedia.org/r/162873
[13:28:32] (CR) Christopher Johnson (WMDE): "check instance @ https://phab08.wmflabs.org/ for test verification" [puppet] - https://gerrit.wikimedia.org/r/162873 (owner: Christopher Johnson (WMDE))
[13:29:04] (PS1) Hoo man: Split Wikibase's entity cache for HHVM/Zend [mediawiki-config] - https://gerrit.wikimedia.org/r/163842 (https://bugzilla.wikimedia.org/71461)
[13:29:35] (Abandoned) Christopher Johnson (WMDE): Adds a libext option to class for tag install of Sprint library [puppet] - https://gerrit.wikimedia.org/r/163121 (owner: Christopher Johnson (WMDE))
[13:29:50] (CR) Manybubbles: "This is scheduled for today's SWAT. Can we resolve the retina issue before then?" [mediawiki-config] - https://gerrit.wikimedia.org/r/162538 (owner: Jhobs)
[13:31:18] (CR) Manybubbles: "Also there is a merge conflict (on a single binary file change?!) Anyone familiar with this project want to investigate? Is that just gi" [mediawiki-config] - https://gerrit.wikimedia.org/r/162538 (owner: Jhobs)
[13:31:39] <_joe_> Reedy: ping. If you're not around, I'll go on with the apache change anyways
[13:31:51] mark: I'm still "Not Authorized" to do anything in Icinga.
[13:32:03] it should be based on ldap
[13:32:05] so fix it :)
[13:33:26] hashar: any idea where I could find what gi11es is asking for here in beta?
https://gerrit.wikimedia.org/r/#/c/163836/
[14:13:27] guess I should read backscroll first
[14:13:34] <_joe_> andrewbogott: I am locked out of my instances in labs, most notably puppet-compiler02
[14:14:03] _joe_: ok, looking
[14:14:16] andrewbogott: I have to leave for a couple hours and that's a patch I really, really want to watch being deployed. Review it, and I'll deploy on my return?
[14:14:21] Coren: ok
[14:15:08] <_joe_> andrewbogott: also, I can't delete puppet-compiler01 from the web interface; it says "this host doesn't exist"
[14:15:27] <_joe_> and finally, VE doesn't work on wikitech (but that's less important of course)
[14:15:28] andrewbogott: Note that it got bigger since the last time you checked - I included the matching changes to certificates elsewhere to avoid having too many timebombs of stuff-that-points-to-files-that-just-happen-to-be-left-there
[14:16:01] _joe_: the deletion failure might be fixed by a log-out-and-back-in
[14:16:20] <_joe_> andrewbogott: ok thanks
[14:16:27] regarding VE… https://gerrit.wikimedia.org/r/#/c/161262/
[14:18:09] _joe_: puppet-compiler-02 was fixed with "service nslcd restart"
[14:18:12] I'm unsure why it needed it
[14:18:17] Can you log in now?
[14:18:18] <_joe_> andrewbogott: thanks
[14:18:27] And, what other instances need it? I'll do the same there.
[14:18:28] (PS2) coren: Autogenerate chained certificates [puppet] - https://gerrit.wikimedia.org/r/163798
[14:18:30] (PS6) coren: Use the Ubuntu Way of installing SSL certs [puppet] - https://gerrit.wikimedia.org/r/163222
[14:18:32] (PS1) coren: give *Coren* Icinga permissions [puppet] - https://gerrit.wikimedia.org/r/163853
[14:18:46] <_joe_> andrewbogott: thanks a lot!
[14:18:49] <_joe_> no need
[14:19:14] Aaaugh. Dependent git-review.
[14:19:28] (03CR) 10Gilles: "Thanks, I've had a glance and as long as the render-thumbnail-on-404 is configured (which seems to be the case on beta commons), it should" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163836 (owner: 10Gilles) [14:20:01] Anyone +1 that last one? Trivial fix that should make icinga work right for me. [14:20:12] https://gerrit.wikimedia.org/r/#/c/163853/ [14:21:07] (03CR) 10Giuseppe Lavagetto: [C: 031] give *Coren* Icinga permissions [puppet] - 10https://gerrit.wikimedia.org/r/163853 (owner: 10coren) [14:21:17] (03CR) 10Andrew Bogott: [V: 032] give *Coren* Icinga permissions [puppet] - 10https://gerrit.wikimedia.org/r/163853 (owner: 10coren) [14:21:20] <_joe_> although you don't really deserve those [14:21:23] <_joe_> :P [14:21:23] _joe_: ty [14:21:49] (03CR) 10coren: [C: 032] give *Coren* Icinga permissions [puppet] - 10https://gerrit.wikimedia.org/r/163853 (owner: 10coren) [14:22:05] oblivian is doing a graceful restart of all apaches [14:22:17] !log oblivian gracefulled all apaches [14:22:20] <_joe_> graceful /reload/ [14:22:22] Logged the message, Master [14:22:55] <_joe_> Reedy: ^^ your docroot change is live [14:24:37] PROBLEM - Apache HTTP on mw1196 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 50413 bytes in 3.231 second response time [14:26:46] Coren: after your patch will things like /etc/ssl/certs/GlobalSign_CA.pem still exist with the same contents? I haven't followed the thread that far yet [14:26:53] <_joe_> !log restarted apache on mw1196, lots of apc errors [14:26:58] Logged the message, Master [14:27:37] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.101 second response time [14:27:38] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0466666666667 [14:27:51] godog: Did you get your beta questions sorted out? 
I haven't really played with the thumbs infrastructure there, but I know that instead of swift we have a shared file backend on labs NFS. [14:27:55] <_joe_> manybubbles: good morning ^^ [14:28:40] _joe_: thanks - that thing is overreacting a bit. [14:28:48] good afternoon, btw [14:28:56] andrewbogott: It will. There are no ensure => absent anywhere, [14:28:56] bd808: yup thanks! the context is https://gerrit.wikimedia.org/r/163836 [14:28:59] godog: Setting up a prod-like swift cache instead of NFS has been a desire for a long time but as far as I know no one has put time into making it happen. [14:29:18] Coren: ok, but we aren't officially installing them there anymore, right? So ldap.conf needs to point to the new location... [14:29:46] Ah faster MMV will be good. :) [14:29:49] andrewbogott: "A followup patch should clean up various configuration that may [14:29:49] have been edited to point to a specific root certificate rather [14:29:49] than rely on the system" [14:29:57] bd808: indeed, I hope the new swift module I'm trying out for codfw will make things easier [14:29:59] heh, ok :) [14:30:27] New labs instances will be broken in the meantime. [14:31:11] andrewbogott: No, because update-ca-certificates will create a symlink. [14:31:17] oh, ok [14:31:25] that's what I wasn't clear on. [14:31:26] andrewbogott: It's just evil to refer to them specifically. [14:32:16] ACKNOWLEDGEMENT - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0466666666667 manybubbles This is likely being caused by some reindexing Im doing. Its no cause for alarm. - The acknowledgement expires at: 2014-10-01 14:31:32. [14:32:22] hm… ok, I'm still confused. But I fear my questions are standing between you and breakfast [14:32:34] Is running gpg --gen-key on one of the apaches going to function, or is it going to hang excessively long while GPG waits for enough entropy? [14:32:41] But I really really want more reviews for this. 
The patch is simple, but it touches almost every SSL config. [14:33:26] andrewbogott: Not breakfast (that's past) I just have to leave to go downtown in ~30m for a bit. Now is a good time for questions. [14:33:58] I only barely understand enough to ask useful questions. [14:34:11] You said that the /etc/ssl/certs/*.pem files will still exist as symlinks... [14:34:37] but I note that your patch changes some references to the .pem files to /etc/ssl/localcerts/ [14:34:57] What does it mean to refer to a .crt vs a .pem? [14:35:32] Is it basically that servers should use .crt and clients should use .pem, and the client-side use should be implicit rather than explicit? [14:35:52] anomie: I would think an mw* host would have plenty of entropy collected from the network stack. [14:36:08] andrewbogott: Nothing; two extensions for the same type. Debian standardized on *.crt but the /etc/ssl/cert linkfarm uses *.pem for historical reasons. [14:36:23] bd808: I've heard the network stack traditionally isn't counted as giving much entropy though. [14:37:38] Coren: OK, and things in /etc/ssl/cert should always be autogenerated by update-ca-certificates and never placed there explicitly, right? [14:37:39] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [14:38:33] anomie: the network stack isn't counted as giving much entropy because it's potentially in the control of the attacker [14:38:43] cscott: Exactly [14:39:10] the advent of ssd has greatly decreased the amount of entropy in the world ;) [14:40:06] * anomie is looking at whether bug 68129 is possible from a technical standpoint [14:41:06] anomie: you could ssh to a random mw server and time generating a key; assuming we have gpg installed of course [14:41:33] <_joe_> anomie: lemme take a look at the bug [14:41:35] bd808: It'll be installed for SecurePoll. 
I don't know if draining all the entropy would break anything else though [14:41:38] _joe_: Thanks [14:42:12] The other thing I guess you could do to implement that is use a javascript gpg library to do it all in the client browser. [14:42:39] <_joe_> anomie: oh the gpg-stored-on-the-server-oh-noes thing [14:42:45] bd808: +1 [14:42:50] on the javascript client-side stuff [14:43:05] bd808, cscott: Links? [14:43:07] i don't think we're concerned about state-level actors here, right? the nsa isn't going to subvert wp board elections? [14:43:14] <_joe_> anomie: I expressed my opinion on server-side generated gpg private keys, didn't I? [14:43:27] <_joe_> yes I did [14:43:35] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:43] let-me-google-that-for-you/?q=javascript+client-side+gpg [14:44:01] Coren: I'm wondering why it's meaningful to change references from e.g. /etc/ssl/certs/blog.wikimedia.org.pem to /etc/ssl/localcerts/blog.wikimedia.org.crt if the former .pem file is still reliably generated by update-ca-certificates [14:44:11] _joe_: Yes. The people who decide things don't care about "not a good idea", but they might about "won't work" [14:44:22] <_joe_> cscott: that has problems too; the only reasonable model for "easy gpg" is to create a browser extension [14:44:40] <_joe_> anomie: what is the aim of this project? To gpg sign votes? [14:44:48] anomie: https://keybase.io/kbpgp is one such library [14:44:48] cscott: That gives lots of results for doing crypto with already-generated keys [14:45:32] <_joe_> anomie: if the endgoal is to have unmodificable poll results, server-side keys won't work period [14:45:41] <_joe_> you can quote me on this. [14:45:49] _joe_: There are two keys, one to sign vote receipts and one to encrypt votes until after the election so people can't peek. They've decided they don't care about root swiping the keys or the like. 
[14:46:08] (03PS3) 10Aude: Split Wikibase's entity cache for HHVM/Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163842 (https://bugzilla.wikimedia.org/71461) (owner: 10Hoo man) [14:46:12] <_joe_> anomie: "they" [14:46:31] Coren: probably what I need is for that patch (or a comment someplace) to list the three cert dirs with a brief explanation of their usage [14:46:38] Now I will look and see that you have already done this [14:46:41] <_joe_> sorry but my expert opinion is "this doesn't work" [14:47:03] <_joe_> the entropy generation issue you bring forward is another possible technical hurdle [14:47:14] _joe_: I'm not entirely sure who "they" is. James Alexander and Phillipe, I think. [14:47:34] <_joe_> as I'm pretty sure we do have php code that uses RNG from the kernel [14:47:48] <_joe_> hhvm uses it too internally I think, but lemme check [14:48:33] <_joe_> anomie: BTW - chris steipp was pretty clear in the bug, and on security matters he should have a say. We should all have. [14:48:39] apt-get install haveged #thereifixedit [14:50:26] andrewbogott: It's not - not for server certs. update-ca-certificates does that for *ca* certs [14:50:33] manybubbles, marktraceur, ^d: Who wants to SWAT today? [14:50:52] anomie: probably not me because I'm in a meeting [14:51:05] <^d> anomie: Can do. [14:51:06] andrewbogott: Like our wmf-ca and the RapidSSL things [14:51:12] ^d: ok! [14:51:14] _joe_: hopefully /dev/urandom ? [14:51:22] <_joe_> anomie: btw, thanks for bringing that up [14:51:45] <_joe_> cat /proc/sys/kernel/random/entropy_avail [14:52:02] ^d: can you also swat https://gerrit.wikimedia.org/r/#/c/161262/ if you aren't overwhelmed with other patches? [14:52:05] <_joe_> quite interestingly, the value is quite low on most appservers [14:52:54] <^d> andrewbogott: Add it to the list and sure. 
[14:53:09] ^d: I'm asking you here because I can't find the list :( [14:53:32] it's generally not sufficient for large-scale public consumption (default kernel entropy gathering). But usually it's ok if you do something sane like seed a userspace PRNG from it once per thread-process. [14:53:43] (03CR) 10Mark Bergsma: [C: 031] decom nfs1 [puppet] - 10https://gerrit.wikimedia.org/r/159442 (owner: 10Dzahn) [14:54:12] <_joe_> bblack: yeah I don't think generating gpg keys one-off for polls storage would be a big problem either [14:54:13] andrewbogott: I can add it to the list for you. [14:54:44] James_F: thanks. Is there a lucky wikitech search string I can use to find the calendar? All I get is this: https://wikitech.wikimedia.org/wiki/SWAT_deploys [14:54:57] andrewbogott: You want to edit https://wikitech.wikimedia.org/w/index.php?title=Deployments [14:55:05] andrewbogott: But happy to do so for you; I already have it open. [14:55:21] well, that's an obvious place for it. Thanks, sorry for being all helpless [14:55:29] andrewbogott: No worries at all. :-) [14:55:56] <^d> andrewbogott: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=129224&oldid=129223 done [14:56:15] <^d> James_F: I edit conflicted you :p [14:56:18] <^d> I was already half dne [14:56:20] <^d> *done [14:56:30] ^d: Sorry. :-) [14:56:52] (03CR) 10Jforrester: [C: 031] Revert "(Re-)enable VisualEditor for Wikitech (labswiki)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161262 (owner: 10Andrew Bogott) [14:57:31] andrewbogott: Though Roan will probably have some time after Wednesday to get Parsoid running again on wikitech; he's so far spent a few hours trying to work out what's happened to it. [14:58:19] great -- I'd love for it to work properly. Right now, though, that tab should be labled 'click here and then complain to Andrew' [14:58:37] (03CR) 10Andrew Bogott: "Naturally I have no objection to fixing this properly!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/161262 (owner: 10Andrew Bogott) [14:59:01] andrewbogott: :-) [14:59:05] <^d> James_F: For yours, there's no core change prepped? [14:59:22] ^d: No, because I can't +2 in wmf branches so can't make it for you… [14:59:33] ^d: If you +2 the VE change I'll make it for you. [14:59:44] <^d> {{done}} [14:59:46] :-) [15:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140930T1500). Please do the needful. [15:01:32] Who's doing it? [15:02:04] (03CR) 10Mark Bergsma: [C: 04-2] "This should NOT be done as one huge change that may have implications in many parts of the infrastructure at the same time. It would be fa" [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [15:03:14] hoo: ^d. [15:03:24] swat time? [15:03:26] <^d> Lezzzzzz seeeee [15:03:36] (03CR) 10coren: "The problem with doing it incrementally is that the location where the certificates are installed is global. Do you feel safe relying on " [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [15:04:18] mark: That said, if you _do_ feel okay about services using the certificate in its old location, that _would_ make the patch much more bite-sized and we can do the servers one-by-one. [15:04:33] ^d: 163863 done, [15:04:38] mark: (That was my original idea) [15:04:40] why can't we keep managing the old certificates until we can kill them? [15:05:02] mark: You mean by having the class copy the certs twice? [15:05:11] if that's what it takes, yes [15:05:20] * Coren ponders. [15:05:25] It's kinda ugly, but doable. [15:05:29] it's ugly for sure [15:05:46] but it's explicit, it's managed, and the ugliness means we'll finish the migration ;) [15:05:53] as opposed to forget about timebombs [15:05:53] Heh. 
[15:05:53] (03CR) 10Chad: [C: 032] Split Wikibase's entity cache for HHVM/Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163842 (https://bugzilla.wikimedia.org/71461) (owner: 10Hoo man) [15:06:09] (03Merged) 10jenkins-bot: Split Wikibase's entity cache for HHVM/Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163842 (https://bugzilla.wikimedia.org/71461) (owner: 10Hoo man) [15:06:17] btw I am not terribly excited about the /etc/ssl/localcerts part [15:06:35] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [15:06:40] Logged the message, Master [15:06:41] akosiaris: That's bikeshed material. /etc/ssl/localcerts/ is the most common but there really isn't a standard. [15:06:52] !log demon Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 04s) [15:06:56] <^d> hoo: You're live. [15:06:57] Logged the message, Master [15:06:59] akosiaris: What do you like better? [15:07:01] * aude checks [15:07:02] do love the /usr/local/share/ca-certificates/ though. I actually submitted wmf_ca_2014_2017 like that yesterday [15:07:21] akosiaris: I saw. But that's a spot for.. well, CA certificates. :-) [15:07:32] That will be automatically trusted. [15:07:40] aude: Looks good to me [15:07:41] Coren: why do you feel that /etc/ssl/certs is not good for certs ? [15:07:48] (03CR) 10Filippo Giunchedi: "already fixed by bblack in I63c4fbcd17, can be abandoned" [puppet] - 10https://gerrit.wikimedia.org/r/163809 (owner: 10Giuseppe Lavagetto) [15:07:54] being automanaged and not automanaged at the same time ? [15:07:55] <^d> aude: Doing your submodule update on wmf1 now [15:08:00] ok [15:08:10] akosiaris: It's an automatically managed linkfarm *and* libraries implicitly trust every cert in there to be a valid CA [15:08:22] no they dont [15:08:35] well depends ... [15:08:55] akosiaris: Right. It may or may not be, and it depends... i.e.: not a great place to stuff random server certificates. 
[15:09:11] I mean, it's not a catastrophe -- we *have* been using it. [15:09:12] yeah I get the point [15:09:15] <^d> andrewbogott: You've got a merge conflict on your patch. [15:09:20] ok I am convinced [15:09:40] ^d: ok, I'll fix [15:09:47] akosiaris: You cool with localcerts/ or do you have a good argument for something nicer? [15:09:54] (I mean namewise) [15:10:07] I never participate on naming bikesheds [15:10:15] feel free to name it xyz [15:10:16] :-) [15:10:22] sslocal [15:10:28] mark, andrewbogott: I'll make a new patch this afternoon that does just the new locations and doubles up on certs for the migration. [15:10:40] thanks [15:10:41] (03PS3) 10Andrew Bogott: Revert "(Re-)enable VisualEditor for Wikitech (labswiki)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161262 [15:10:43] I needed to be convinced about the functionality, not the name [15:10:44] yeah, sounds safer [15:11:21] (03CR) 10Chad: [C: 032] Revert "(Re-)enable VisualEditor for Wikitech (labswiki)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161262 (owner: 10Andrew Bogott) [15:11:27] _joe_: Going to battle the bureaucracy. I'll ping you as soon as I return - thanks for covering for me. [15:11:31] (03Merged) 10jenkins-bot: Revert "(Re-)enable VisualEditor for Wikitech (labswiki)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161262 (owner: 10Andrew Bogott) [15:11:47] ^d: thanks! 
[15:11:55] word of warning, merging https://gerrit.wikimedia.org/r/#/c/163735/1 [15:11:58] !log demon Synchronized visualeditor-default.dblist: (no message) (duration: 00m 04s) [15:12:03] Logged the message, Master [15:12:04] anything weird with emails, ping me ASAP [15:12:07] !log demon Synchronized visualeditor.dblist: (no message) (duration: 00m 04s) [15:12:12] Logged the message, Master [15:12:18] <^d> andrewbogott: Both files live for you, feel free to sync to wikitech now [15:12:24] <^d> (and yw :)) [15:12:27] !log running sync-common on virt1000 [15:12:31] Logged the message, Master [15:12:43] !log merging https://gerrit.wikimedia.org/r/#/c/163735/1, changing the LDAP master from sanger to ldap-mirror for inbound mail [15:12:48] Logged the message, Master [15:13:09] (03CR) 10Alexandros Kosiaris: [C: 032] exim4 uses ldap-mirror now [puppet] - 10https://gerrit.wikimedia.org/r/163735 (owner: 10Alexandros Kosiaris) [15:14:18] <_joe_> Coren: wut? [15:14:45] !log demon Synchronized php-1.25wmf1/extensions/VisualEditor: (no message) (duration: 00m 08s) [15:14:50] Logged the message, Master [15:14:52] <^d> James_F: You're live ^ [15:15:00] ^d: Ta. Testing. [15:15:05] _joe_: suspecting another case of italian swapping [15:15:18] <_joe_> eehheh I guess so [15:15:53] bd808: Despite that libraries claims of not locking up the browser while it generates the keys, my browser is locked. :( [15:15:56] ^d: Works fine. Thanks! [15:15:57] (03CR) 10Gilles: [C: 031] Enable image dimension logging in MediaViewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163279 (owner: 10Gergő Tisza) [15:15:58] !log demon Synchronized php-1.25wmf1/extensions/Wikidata: (no message) (duration: 00m 11s) [15:16:01] <^d> aude: And there's your other bit ^ [15:16:03] * aude checks [15:16:03] Logged the message, Master [15:16:08] <^d> James_F: Good. yw :) [15:16:28] looks good [15:16:34] <^d> Still no yurik about? 
[15:16:46] * ^d will wait a little longer before pushing his to the next swat [15:20:54] (03PS3) 10Chad: Reduce file size of wikipedia favicon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162538 (owner: 10Jhobs) [15:21:19] (03CR) 10Chad: "No idea why gerrit reported a conflict, PS3 rebased locally so it works again." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162538 (owner: 10Jhobs) [15:21:20] anomie: Lame. I haven't used it from the browser or to generate a key. I got an invite to keybase.io last night and played with their cli tool a tiny bit. [15:23:21] manybubbles, seems like no one is out there with a retina :) [15:23:31] yurikR: ha! [15:24:16] i say let's go ahead and merge, and we can always fix it if something comes up quickly enough [15:24:22] <^d> What about retina? [15:24:34] not like we will be horrible, just slightly less than perfect [15:25:42] which patch is this? [15:26:00] some icon :-D [15:26:07] the icon change? I cc'd Jared on it, he said he was going to comment about how it didn't look good on retina [15:26:17] I trusted he would, did he not? [15:26:35] <^d> I have it open in a retina right now. [15:26:47] let's $wgScrewUpIConsUnderRetina = false; [15:26:48] ^d: you need special designer eyes, i think. [15:27:10] <^d> I was going to say, I must not have the right fontnerd package installed. [15:27:13] <^d> I don't see a difference. [15:27:44] well, he didn't comment? then I'll assume he's ok with it [15:27:48] all I can do [15:27:58] <^d> It looks...the same. [15:28:05] <^d> I really can't see any perceptible difference. [15:28:45] greg-g, https://gerrit.wikimedia.org/r/#/c/162538/ [15:29:23] good, but it's 5 times smaller :) [15:29:30] ^d: did you test on a live install, or just in gerrit? [15:29:32] (03CR) 10Chad: [C: 032] "I don't see any noticeable difference on my retina displays between old/new. Going ahead with this."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/162538 (owner: 10Jhobs) [15:29:35] i hope we get the gzip fixed soon enough [15:29:39] (03Merged) 10jenkins-bot: Reduce file size of wikipedia favicon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162538 (owner: 10Jhobs) [15:30:06] <^d> greg-g: Gerrit wouldn't make a difference. It just spits out the .ico for your browser to render. [15:30:46] * greg-g was just parroting jared [15:30:54] <^d> https://gerrit.wikimedia.org/r/cat/162538%2C3%2Cdocroot/bits/favicon/wikipedia.ico%5E1 and https://gerrit.wikimedia.org/r/cat/162538%2C3%2Cdocroot/bits/favicon/wikipedia.ico%5E0 [15:33:02] !log demon Synchronized docroot/bits/favicon/wikipedia.ico: Favicons are my favorite icons, especially when they're only 18% of the size of the original (duration: 00m 04s) [15:33:07] Logged the message, Master [15:35:02] ^d good msg :) [15:35:18] <^d> :) [15:35:36] wonder if you want to put it through purgeList [15:36:07] echo 'http://bits.wikimedia.org/favicon/wikipedia.ico' | mwscript purgeList.php --wiki=aawiki [15:36:11] echo 'https://bits.wikimedia.org/favicon/wikipedia.ico' | mwscript purgeList.php --wiki=aawiki [15:36:50] (03PS2) 10Filippo Giunchedi: use scap's embedded linking, remove lint script [puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) [15:37:00] (03CR) 10Filippo Giunchedi: use scap's embedded linking, remove lint script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) (owner: 10Filippo Giunchedi) [15:37:05] (03PS1) 10Alexandros Kosiaris: deploy pollux as codfw corp ldap [puppet] - 10https://gerrit.wikimedia.org/r/163867 [15:37:24] <^d> Reedy: No real preference.
[15:37:30] (03CR) 10Sjoerddebruin: [C: 031] "Looks good for me, screenshots:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [15:37:35] copy paste, enter :) [15:38:37] <^d> copy + alttab back to screen + paste + enter. [15:40:00] Reedy, what does that do? invalidates varnish? [15:40:09] purges it from varnish, yah [15:41:07] can we do it the same way with a regex? [15:42:49] !log rebooting mexia [15:42:53] Logged the message, Master [15:43:30] eh? [15:43:43] You could get ops to ban it from varnish too [15:45:21] !log Updating our Jenkins job builder fork 686265a..ee80dbc (no job changed) [15:45:26] Logged the message, Master [15:48:27] (03CR) 10Aude: [C: 031] Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [15:49:04] ori: don't forget qchris's comment here [15:49:04] https://gerrit.wikimedia.org/r/#/c/157841/1/XAnalytics.php [15:49:08] needs response too I guess [15:49:58] Reedy, sorry, lost connection for a sec. I heard you can do regex-based purging from php [15:50:04] says who? [15:50:18] I think you just get ops to do it at that point [15:51:13] (03CR) 10Reedy: "Noting this might not work fully without https://gerrit.wikimedia.org/r/#/c/147488/ - It should be linked in the HTML, but wikidata.org/ap" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [15:51:21] (03CR) 10Reedy: [C: 031] Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [15:59:00] robh: I may have made a mistake when quoting stats for the virt servers. I just looked, and the virt100x servers use raid10. So I have have been off with my storage request. 
[15:59:19] The config is modules/install-server/files/autoinstall/partman/virt-raid10-cisco.cfg [15:59:22] (03PS1) 10Jgreen: DMARC parser script [puppet] - 10https://gerrit.wikimedia.org/r/163869 [15:59:27] can you tell from reading that what the multiple should be? [16:02:01] (03PS2) 10Jgreen: DMARC parser script [puppet] - 10https://gerrit.wikimedia.org/r/163869 [16:03:34] (03PS3) 10Jgreen: DMARC parser script [puppet] - 10https://gerrit.wikimedia.org/r/163869 [16:05:20] (03PS1) 10Dzahn: add AAAA record for phabricator [dns] - 10https://gerrit.wikimedia.org/r/163870 [16:06:38] (03PS2) 10Rush: add AAAA record for phabricator [dns] - 10https://gerrit.wikimedia.org/r/163870 (owner: 10Dzahn) [16:06:43] (03CR) 10Rush: [C: 031] add AAAA record for phabricator [dns] - 10https://gerrit.wikimedia.org/r/163870 (owner: 10Dzahn) [16:11:26] (03PS3) 10Dzahn: add AAAA record for phabricator [dns] - 10https://gerrit.wikimedia.org/r/163870 [16:19:29] (03PS1) 10Ottomata: Add python-matplotlib on stat servers [puppet] - 10https://gerrit.wikimedia.org/r/163872 [16:20:16] _joe_: if still around… do you know about running master reports in puppet? I had one set up on virt1000 which seems to have stopped working with the upgrade to 3. 
[16:20:57] (03CR) 10Ottomata: [C: 032] Add python-matplotlib on stat servers [puppet] - 10https://gerrit.wikimedia.org/r/163872 (owner: 10Ottomata) [16:27:11] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Puppet has 1 failures [16:29:53] (03PS4) 10Christopher Johnson (WMDE): Abstracts Sprint install with defined resource type phabricator::libext Change-Id: If2e66a090581e10e350f3e7f9795e3b43c6b25da [puppet] - 10https://gerrit.wikimedia.org/r/162873 [16:33:11] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:34:36] (03CR) 10Dzahn: [C: 032] "curl -6 -I https://misc-web-lb.eqiad.wikimedia.org -H "host:phabricator.wikimedia.org"" [dns] - 10https://gerrit.wikimedia.org/r/163870 (owner: 10Dzahn) [16:38:48] (03PS1) 10Ori.livneh: Labs: Specify hostname rather than IP for jobqueue redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163874 [16:38:58] (03PS2) 10Ori.livneh: Labs: Specify hostname rather than IP for jobqueue redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163874 [16:39:03] (03CR) 10Ori.livneh: [C: 032] Labs: Specify hostname rather than IP for jobqueue redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163874 (owner: 10Ori.livneh) [16:39:10] (03Merged) 10jenkins-bot: Labs: Specify hostname rather than IP for jobqueue redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163874 (owner: 10Ori.livneh) [16:41:36] (03CR) 10BryanDavis: "Mark said on irc that he was fine with this change as long as it was only applied in beta and not production (which is what the patch curr" [puppet] - 10https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: 10BryanDavis) [16:49:15] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [16:55:47] hm… chasemp, bblack, has either of you ever written a custom puppet report? (Or anyone else, for that matter?) 
[16:55:59] i have [16:56:14] ori! Great! So… I have one installed on virt1000 [16:56:18] and it used to work, and now it does nothing [16:56:26] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [16:56:27] I have some big loud debug lines in it, it's pretty clear it's never getting called at all. [16:56:41] want to log in and see what you can see? [16:56:57] the report is /var/lib/puppet/lib/puppet/reports/labsstatus.rb [16:58:51] Hey subbu, bd808 - apparently I'm all alone for SWATs during the mornings of October 13-17, want to help out? :D [16:59:11] If there are other deployers around who might also be awake then you could bother them too. [17:00:45] marktraceur: I'll be in San Diego that week along with the other morning SWATers [17:00:49] Oh right, fail [17:01:44] the number of non-core SWAT members is TOO DAM LOW [17:02:43] greg-g: Time for a recruiting drive / conscription campaign [17:03:05] I just did one a couple months ago! [17:03:07] andrewbogott: some wikidata issue came up but i'll look right afterwards [17:03:11] but es [17:03:13] +y [17:03:23] ugh, emails to write [17:04:06] ori: np, thank you! [17:04:45] greg-g: Also add that EU swat window so you can draw from a large pool! [17:06:04] bd808: that's going to be the carrot [17:06:16] "Want an EU SWAT timeslot? be a part of SWAT!" [17:06:28] marktraceur, oh .. i've never done non-parsoid deploys .. but, maybe that is a good way to learn? [17:06:39] Hrm [17:06:51] or maybe not. [17:06:54] (03PS13) 10Catrope: Citoid puppetization [puppet] - 10https://gerrit.wikimedia.org/r/163068 [17:07:02] subbu: I mean, yes, but probably not as the point person [17:07:18] right. [17:07:30] Isn't ebernhardson a deployer? Can he halp? [17:08:44] if he's up at that hour [17:09:11] marktraceur: I hear there is a guy named Reedy that can deploy stuff. Also he has an apprentice in twentyafterfour (who is not idling here?) 
[17:10:21] And maybe shared screen training is something that swat folks could start doing. knowledge is power and all that [17:10:41] asciinema.org FTW [17:11:48] that would be nice, but real-time paring might be ever better [17:12:23] I think that failed horribly when Sam and Mukunda tried to do it on the prod bastion though [17:12:42] Something about screen not being suid so attaches failed [17:13:01] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0133333333333 [17:13:14] the shared screen thing? yeah I remember the conversation, lame workaround is to be the same user [17:13:52] tmuch can do it with just a chmod of the session socket [17:13:55] (03PS1) 10Jgreen: really fix OCG log stream collation [puppet] - 10https://gerrit.wikimedia.org/r/163878 [17:13:56] *tmux [17:14:08] bd808: I think Reedy will be at the offsite too? [17:14:29] (03CR) 10Plucas: "bump" [debs/kafka] - 10https://gerrit.wikimedia.org/r/162458 (owner: 10Plucas) [17:15:08] marktraceur: Not with us in SD as far as I know. [17:15:16] Ah. [17:15:17] Well, then. [17:15:22] (03CR) 10Jgreen: [C: 032 V: 031] really fix OCG log stream collation [puppet] - 10https://gerrit.wikimedia.org/r/163878 (owner: 10Jgreen) [17:18:33] (03PS4) 10Jgreen: DMARC parser script [puppet] - 10https://gerrit.wikimedia.org/r/163869 [17:18:35] (03PS2) 10Jgreen: really fix OCG log stream collation [puppet] - 10https://gerrit.wikimedia.org/r/163878 [17:20:39] marktraceur: it's just MW Core Team (plus me, because reasons) [17:20:56] greg-g: I thought that was a superset of people named Reedy! 
I guess not [17:21:30] marktraceur: go update the orgchart [17:21:31] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: Puppet last ran 944688 seconds ago, expected 14400 [17:21:32] (03Abandoned) 10Jgreen: DMARC parser script [puppet] - 10https://gerrit.wikimedia.org/r/163869 (owner: 10Jgreen) [17:21:33] ;) [17:21:40] * marktraceur looks wistfully at the orgchart [17:21:49] (03Abandoned) 10Jgreen: really fix OCG log stream collation [puppet] - 10https://gerrit.wikimedia.org/r/163878 (owner: 10Jgreen) [17:22:03] alantz requested a fix recently. I haven't done it. [17:22:41] has a certain feeeling which new addition it is [17:23:11] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0498338870432 [17:23:31] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:23:56] (03PS1) 10Jgreen: really fix OCG log collation (redo) [puppet] - 10https://gerrit.wikimedia.org/r/163880 [17:24:29] ottomata: replying now, sorry [17:25:05] (03CR) 10Jgreen: [C: 032 V: 031] really fix OCG log collation (redo) [puppet] - 10https://gerrit.wikimedia.org/r/163880 (owner: 10Jgreen) [17:26:13] (03PS1) 10Jgreen: dmarc_parser added (redo) [puppet] - 10https://gerrit.wikimedia.org/r/163881 [17:29:46] ori, np, just saw it updated in my queue and wanted to poke it :) [17:29:56] ottomata: i replied [17:33:44] danke! :) [17:33:52] PROBLEM - puppet last run on amslvs3 is CRITICAL: CRITICAL: puppet fail [17:52:18] RECOVERY - puppet last run on amslvs3 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:52:34] (03CR) 10Ori.livneh: "Example trace: https://dpaste.de/5OYg/raw" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [17:53:24] (03CR) 10Krinkle: "This is not for "browser tests" (integration tests for the application from the front-end). This is for running qunit tests. 
We currently " [puppet] - 10https://gerrit.wikimedia.org/r/163791 (owner: 10Krinkle) [17:55:23] (03CR) 10Aude: [C: 031] HHVM: capture & log traces for catchable fatals. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [17:56:18] (03CR) 10Krinkle: "And another important reason: Like PhantomJS, it is important for productivity of developers that unit tests are easy to run locally with " [puppet] - 10https://gerrit.wikimedia.org/r/163791 (owner: 10Krinkle) [17:58:25] (03CR) 10BryanDavis: [C: 031] "How many +1s before a +2?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [18:00:03] (03PS2) 10Ori.livneh: HHVM: capture & log traces for catchable fatals. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 [18:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140930T1800). [18:00:13] (03CR) 10Ori.livneh: [C: 032] HHVM: capture & log traces for catchable fatals. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [18:00:19] (03Merged) 10jenkins-bot: HHVM: capture & log traces for catchable fatals. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [18:01:25] (03CR) 10Ori.livneh: [C: 032] HHVM: Set a fatal handler that logs traces [puppet] - 10https://gerrit.wikimedia.org/r/163686 (owner: 10Ori.livneh) [18:02:35] (03PS2) 10Ottomata: Experiment with different topic_request_required_acks settings [puppet] - 10https://gerrit.wikimedia.org/r/163744 (owner: 10QChris) [18:04:05] (03PS3) 10Ottomata: Experiment with different topic_request_required_acks settings [puppet] - 10https://gerrit.wikimedia.org/r/163744 (owner: 10QChris) [18:05:04] what's up? [18:05:15] !log ori Synchronized wmf-config/HHVMRequestInit.php: (no message) (duration: 00m 07s) [18:05:20] Logged the message, Master [18:05:38] Wikipedia down for everyone or just me? 
[18:05:53] there's a website for that [18:05:55] (03PS1) 10Aude: Bump cache epoch for wikidata, due to dom changes to item page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163885 [18:05:57] Nevermind, it's back. [18:06:07] it was down for me, too [18:06:15] http://www.downforeveryoneorjustme.com/en.wikipedia.org [18:06:34] ragesoss: btw, can you try account creation one more time? i'm 99% sure it's fixed now [18:07:06] (I got a 404 on https://www.mediawiki.org/wiki/Special:Preferences and 2 refreshes. the 3rd refresh got it to launch ok.) [18:07:09] ori: I don't actually need to create an account now, but sure. [18:07:41] ragesoss: you can never have too many sock puppets [18:07:41] I don't see anything obvious in gdash, and our LVS stats didn't take a hit [18:09:41] ori: woohooo! [18:09:47] I feel like a real capitalist. [18:09:49] ragesoss: thanks :) [18:09:51] I'm in the 1%. [18:09:52] [5c89fa05] 2014-09-30 18:09:24: Fatal exception of type MWException [18:09:58] wait [18:09:58] oh [18:10:01] :( [18:11:08] ragesoss: what's that from? [18:11:28] from trying to create an account [18:11:34] for another person via hhvm [18:11:37] there's a bug in core [18:11:41] oh [18:12:01] note: all downtime reports from people using hhvm should have a big huge [I'm on HHVM] tag :p [18:12:36] bblack: I only see a small one :p [18:13:04] (03CR) 10Plucas: "Actually, I am about to post another change to this repo, so I will cancel this review." 
[debs/kafka] - 10https://gerrit.wikimedia.org/r/162458 (owner: 10Plucas) [18:13:34] JohnFLewis: https://dpaste.de/j8dY/raw [18:14:17] ori: that's better :D [18:14:28] /nick $nick(HHVM) [18:14:32] (03PS1) 10BryanDavis: beta: update mwdeploy sudo grants [puppet] - 10https://gerrit.wikimedia.org/r/163886 [18:15:07] (03PS1) 10Reedy: Non wikipedias to 1.25wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163888 [18:16:04] mutante: heh [18:16:20] ragesoss: try one more time and i owe you a drink at wikimania 2015 [18:16:26] ragesoss: well, several [18:17:25] ori: worked that time. I'm one of the plebs again. [18:17:49] :) thanks [18:18:09] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.25wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163888 (owner: 10Reedy) [18:18:10] np [18:18:59] (03Merged) 10jenkins-bot: Non wikipedias to 1.25wmf1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163888 (owner: 10Reedy) [18:19:19] Reedy: https://gerrit.wikimedia.org/r/#/c/163885/ for wikidata [18:19:43] (03CR) 10BryanDavis: [C: 04-1] "Needs a manual rebase. Had to fiddle with the cherry-pick in beta to keep up with operations/puppet production branch." 
[puppet] - 10https://gerrit.wikimedia.org/r/158016 (https://bugzilla.wikimedia.org/70181) (owner: 10Dduvall) [18:20:31] (03PS2) 10Reedy: Bump cache epoch for wikidata, due to dom changes to item page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163885 (owner: 10Aude) [18:20:35] (03CR) 10Reedy: [C: 032] Bump cache epoch for wikidata, due to dom changes to item page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163885 (owner: 10Aude) [18:20:45] (03Merged) 10jenkins-bot: Bump cache epoch for wikidata, due to dom changes to item page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163885 (owner: 10Aude) [18:20:47] thanks [18:20:50] (03PS1) 10Plucas: Add option for extra classpath entries [puppet/kafka] - 10https://gerrit.wikimedia.org/r/163890 [18:21:14] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf1 [18:21:19] Logged the message, Master [18:21:41] (03CR) 10Plucas: "I was wrong, I don't need to make another change in debs/kafka. (Only puppet/kafka required an update: https://gerrit.wikimedia.org/r/#/c/" [debs/kafka] - 10https://gerrit.wikimedia.org/r/162458 (owner: 10Plucas) [18:22:34] andrewbogott: ok, i'll look at the virt1000 reporter now.. could you look at https://gerrit.wikimedia.org/r/#/c/163886/ ? :) [18:22:37] Nemo_bis: https://bugzilla.wikimedia.org/show_bug.cgi?id=71367 is that your fault? :P [18:23:04] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 16s) [18:23:09] Logged the message, Master [18:23:28] (03CR) 10Andrew Bogott: [C: 032] beta: update mwdeploy sudo grants [puppet] - 10https://gerrit.wikimedia.org/r/163886 (owner: 10BryanDavis) [18:24:00] hey thanks andrewbogott. I was just cherry picking that in beta :O) [18:24:10] andrewbogott: thanks! are you seeing an error for that reporter or is it just silently not getting called? [18:24:18] ori: The latter [18:24:27] unless I'm looking in the wrong place... 
 [18:24:40] ori: if you comment out the echos in the report it should log to /var/log. That used to work... [18:24:42] !log killing silver from icinga and puppet [18:24:47] Logged the message, Master [18:27:35] ACKNOWLEDGEMENT - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 2.18060312817e-103 daniel_zahn https://bugzilla.wikimedia.org/show_bug.cgi?id=69667 [18:28:21] (03CR) 10Reedy: "Needs rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:30:05] (03PS4) 10Reedy: Remove dead code for amwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163148 (owner: 10Bartosz Dziewoński) [18:30:10] (03CR) 10Reedy: [C: 032] Remove dead code for amwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163148 (owner: 10Bartosz Dziewoński) [18:30:17] (03Merged) 10jenkins-bot: Remove dead code for amwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163148 (owner: 10Bartosz Dziewoński) [18:31:18] (03CR) 10Reedy: [C: 04-1] "Needs rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 (owner: 10Aude) [18:31:19] ACKNOWLEDGEMENT - HTTP on virt0 is CRITICAL: Connection refused daniel_zahn andrewbogott: - virt1000 move [18:31:20] ACKNOWLEDGEMENT - puppetmaster https on virt0 is CRITICAL: Connection refused daniel_zahn andrewbogott: - virt1000 move [18:31:34] (03CR) 10Steinsplitter: [C: 031] "Can we merge this please? User asked me on IRC why this domain is not whitelisted." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:31:36] (03CR) 10Reedy: [C: 04-1] Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:31:45] aaaaah [18:31:49] rebase hell [18:31:59] Reedy: can you pls https://gerrit.wikimedia.org/r/162556 [18:32:12] ACKNOWLEDGEMENT - Exim SMTP on mchenry is CRITICAL: Connection refused daniel_zahn RT #1804 [18:32:14] I don't get why gerrit won't rebase them :( [18:32:34] (03CR) 10Reedy: [C: 04-1] "Needs rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:32:38] Ok, seriously [18:32:39] w [18:32:39] t [18:32:40] f [18:32:47] Y U NO REBASE GERRIT [18:32:58] omg. sorry [18:33:00] * Steinsplitter hides [18:33:06] ACKNOWLEDGEMENT - BGP status on cr1-eqiad is CRITICAL: Return code of -1 is out of bounds daniel_zahn monitoring issue, works manually from neon [18:33:06] ACKNOWLEDGEMENT - BGP status on cr1-esams is CRITICAL: Return code of -1 is out of bounds daniel_zahn monitoring issue, works manually from neon [18:33:06] ACKNOWLEDGEMENT - BGP status on cr2-eqiad is CRITICAL: Return code of -1 is out of bounds daniel_zahn monitoring issue, works manually from neon [18:33:06] ACKNOWLEDGEMENT - BGP status on csw1-esams is CRITICAL: Return code of -1 is out of bounds daniel_zahn monitoring issue, works manually from neon [18:33:07] ACKNOWLEDGEMENT - BGP status on csw2-esams is CRITICAL: Return code of -1 is out of bounds daniel_zahn monitoring issue, works manually from neon [18:33:11] Steinsplitter: It's not you [18:33:27] ohgood. 
thanks :P [18:33:41] It's almost like someone merged a big whitespace patch [18:33:47] But I can't see one obviously merged [18:33:57] It's very likely it'll just rebase fine locally [18:34:15] Reedy: no, SPQRobin did that https://bugzilla.wikimedia.org/show_bug.cgi?id=71367 [18:34:27] Is it definitely that change? ;) [18:34:56] Reedy: but https://gerrit.wikimedia.org/r/163553 awaits merge :) [18:35:08] * Reedy bets it won't rebase [18:35:20] (03CR) 10Reedy: [C: 04-1] "Needs rebasing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163553 (https://bugzilla.wikimedia.org/71403) (owner: 10Nemo bis) [18:35:30] That's why it must be merged as soon as possible" [18:35:35] :p [18:35:35] Nemo_bis: Well, no [18:35:41] gerrit is being a PITA tonight [18:35:49] even simple line additions won't rebase [18:36:25] yeah :/ [18:36:55] (03PS2) 10Nemo bis: Disable local uploads where unused, per local request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163553 (https://bugzilla.wikimedia.org/71403) [18:37:04] (03PS5) 10Dzahn: Add *.nijmegen.nl to wgCopyUploadsDomains. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:37:10] (03CR) 10Reedy: [C: 032] Disable local uploads where unused, per local request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163553 (https://bugzilla.wikimedia.org/71403) (owner: 10Nemo bis) [18:37:15] Reedy: it didn't even need any manual editing, I just did $ git rebase -i origin/master [18:37:17] (03Merged) 10jenkins-bot: Disable local uploads where unused, per local request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163553 (https://bugzilla.wikimedia.org/71403) (owner: 10Nemo bis) [18:37:22] Yeah [18:37:26] [19:33:57] It's very likely it'll just rebase fine locally [18:37:30] :) [18:37:49] wait, i did the same [18:37:53] I guess jgit is just being full of fail [18:37:53] but you still cant merge it, heh [18:37:54] every time this happens I don't know whether to be happy for work saved or unhappy for gerrit's jokes [18:38:00] thx mutante [18:38:01] (03PS5) 10Aude: Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:38:09] git review -d 162638 && git rebase master [18:38:13] no problem there [18:38:17] "The change could not be rebased due to a path conflict during merge." [18:38:26] yea, but then hit "review" and see the greyed-out button [18:38:34] aude: I can codereview again? [18:38:37] sjoerddebruin: yes! 
[18:38:47] (03CR) 10Sjoerddebruin: [C: 031] Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:38:52] :) [18:38:55] :) [18:38:56] sjoerddebruin: It won't merge though :P [18:39:05] Reedy: between PS4 and PS5 a new jenkins check appeared on that [18:39:14] apache-lint is back, heh [18:39:14] ^d: Any idea why jgit is being even more fail than usual on mediawiki-config? [18:39:25] but we are not on the repo that has apache config [18:39:38] (03PS2) 10Aude: Add wikidatawiki and test wikis (e.g. test2) to wikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 [18:39:40] git rebase master again [18:39:52] (03PS3) 10Reedy: Add wikidatawiki and test wikis (e.g. test2) to wikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 (owner: 10Aude) [18:39:57] (03CR) 10Reedy: [C: 032] Add wikidatawiki and test wikis (e.g. test2) to wikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 (owner: 10Aude) [18:39:59] rebase button workd [18:40:00] lol [18:40:07] (03Merged) 10jenkins-bot: Add wikidatawiki and test wikis (e.g. test2) to wikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 (owner: 10Aude) [18:40:09] (03PS6) 10Reedy: Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:40:09] heh [18:40:14] Now it's working again? [18:40:14] wtf, i just hit that too [18:40:19] <^d> no clue. 
[18:40:21] (03CR) 10Reedy: [C: 032] Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:40:28] (03Merged) 10jenkins-bot: Add wikidatawiki to wgAppleTouchIcon and add wikidata.png to bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162638 (https://bugzilla.wikimedia.org/70996) (owner: 10Glaisher) [18:40:33] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 3 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [18:41:19] (03PS2) 10Reedy: Add metilli.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163821 (https://bugzilla.wikimedia.org/71460) (owner: 10Mushroom) [18:41:25] (03CR) 10Reedy: [C: 032] Add metilli.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163821 (https://bugzilla.wikimedia.org/71460) (owner: 10Mushroom) [18:41:32] (03Merged) 10jenkins-bot: Add metilli.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163821 (https://bugzilla.wikimedia.org/71460) (owner: 10Mushroom) [18:41:45] !log pc1001-1003 - can't generate tmp files for percona monitoring checks -> puppet fail [18:41:50] Logged the message, Master [18:42:12] (03CR) 10Reedy: [C: 04-1] "Needs rebase" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157062 (https://bugzilla.wikimedia.org/70169) (owner: 10Tpt) [18:43:40] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[18:43:43] We need a rebase bot :) [18:44:27] ACKNOWLEDGEMENT - puppet last run on pc1001 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8487 [18:44:30] ACKNOWLEDGEMENT - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8487 [18:44:30] ACKNOWLEDGEMENT - puppet last run on pc1003 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8487 [18:44:49] (03PS4) 10Ottomata: Experiment with different topic_request_required_acks settings [puppet] - 10https://gerrit.wikimedia.org/r/163744 (owner: 10QChris) [18:44:58] (03PS2) 10Aude: Removes hardcoded list of linked wikis in the "other projects" sidebar [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157062 (https://bugzilla.wikimedia.org/70169) (owner: 10Tpt) [18:45:20] (03PS6) 10Reedy: Add *.nijmegen.nl to wgCopyUploadsDomains. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:45:34] (03CR) 10Reedy: [C: 032] Add *.nijmegen.nl to wgCopyUploadsDomains. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:45:41] (03Merged) 10jenkins-bot: Add *.nijmegen.nl to wgCopyUploadsDomains. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/162556 (https://bugzilla.wikimedia.org/71191) (owner: 10Steinsplitter) [18:46:07] (03PS3) 10Reedy: Removes hardcoded list of linked wikis in the "other projects" sidebar [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157062 (https://bugzilla.wikimedia.org/70169) (owner: 10Tpt) [18:46:13] (03CR) 10Reedy: [C: 032] Removes hardcoded list of linked wikis in the "other projects" sidebar [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157062 (https://bugzilla.wikimedia.org/70169) (owner: 10Tpt) [18:46:20] (03Merged) 10jenkins-bot: Removes hardcoded list of linked wikis in the "other projects" sidebar [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157062 (https://bugzilla.wikimedia.org/70169) (owner: 10Tpt) [18:47:35] (03PS2) 10Reedy: wgMemoryLimit to 300MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162959 [18:48:25] (03CR) 10Reedy: [C: 032] wgMemoryLimit to 300MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162959 (owner: 10Reedy) [18:48:34] (03Merged) 10jenkins-bot: wgMemoryLimit to 300MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162959 (owner: 10Reedy) [18:49:17] ACKNOWLEDGEMENT - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8489 [18:49:41] !log reedy Synchronized database lists: (no message) (duration: 00m 15s) [18:49:46] Logged the message, Master [18:50:01] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 16s) [18:50:01] (03CR) 10Ottomata: [C: 032] Experiment with different topic_request_required_acks settings [puppet] - 10https://gerrit.wikimedia.org/r/163744 (owner: 10QChris) [18:51:09] (03PS1) 10Ottomata: Fix comment in cache.pp about $topic_request_required_acks [puppet] - 10https://gerrit.wikimedia.org/r/163894 [18:51:14] (03CR) 10jenkins-bot: [V: 04-1] Fix comment in cache.pp about $topic_request_required_acks [puppet] - 
10https://gerrit.wikimedia.org/r/163894 (owner: 10Ottomata) [18:51:46] 6 Fatal error: Cannot use object of type stdClass as array in /srv/mediawiki/php-1.25wmf1/includes/specialpage/SpecialPageFactory.php on line 281 [18:51:50] * Reedy cries [18:53:00] MaxSem: ^^ :( [18:53:03] (03PS1) 10Ottomata: Not ensuring that python3-matplotlib is installed, not all stat machines are Trusty [puppet] - 10https://gerrit.wikimedia.org/r/163895 [18:53:31] Reedy, yeah - I poked you about that yesterday :| [18:53:50] (03PS2) 10Ottomata: Fix comment in cache.pp about $topic_request_required_acks [puppet] - 10https://gerrit.wikimedia.org/r/163894 [18:54:02] (03CR) 10Ottomata: [C: 032 V: 032] Not ensuring that python3-matplotlib is installed, not all stat machines are Trusty [puppet] - 10https://gerrit.wikimedia.org/r/163895 (owner: 10Ottomata) [18:54:14] (03PS3) 10Ottomata: Fix comment in cache.pp about $topic_request_required_acks [puppet] - 10https://gerrit.wikimedia.org/r/163894 [18:54:24] (03CR) 10Ottomata: [C: 032 V: 032] Fix comment in cache.pp about $topic_request_required_acks [puppet] - 10https://gerrit.wikimedia.org/r/163894 (owner: 10Ottomata) [18:55:21] MaxSem: I thought it went away :( [18:57:47] ACKNOWLEDGEMENT - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 10.65.0.1, interfaces up: 33, down: 3, dormant: 0, excluded: 0, unused: 0BRfe-0/0/3: down - BRfe-0/0/5: down - testBRfe-0/0/7: down - Layer42 OOB linkBR daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8490 [19:01:21] ACKNOWLEDGEMENT - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 10.65.0.1, interfaces up: 33, down: 3, dormant: 0, excluded: 0, unused: 0BRfe-0/0/3: down - BRfe-0/0/5: down - testBRfe-0/0/7: down - Layer42 OOB linkBR daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8491 [19:02:11] ACKNOWLEDGEMENT - Disk space on ms1001 is CRITICAL: DISK CRITICAL - free space: /export 298329 MB (0% inode=98%): daniel_zahn https://rt.wikimedia.org/Ticket/Display.html?id=8491 
[19:02:36] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:02:42] the rest are baham, fenari, helium and nescio [19:04:06] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:04:17] :) [19:04:48] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 4827.55779924 [19:05:19] ottomata: cool @ recoveries! a new one popped up though as well [19:05:29] on cp3021 [19:05:38] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures [19:06:50] _joe_: I iz back. Thanks for covering for me. [19:07:03] <_joe_> Coren: that was godog [19:07:05] <_joe_> ;) [19:07:12] <_joe_> italians swapping [19:08:34] Heh. Thanks Italians! :-) [19:08:35] indeed [19:09:44] hm, checking mutante [19:11:52] ottomata: it's already gone again.. it was just there for 3 minutes.. Varnishkafka Delivery Errors and back to normal [19:12:16] it happened when the others recovered.. [19:12:58] ori: before i go off inventing anything(i need to check out a performance issue on beta labs), have you started anything in regards to hhvm + xhprof? [19:13:13] ebernhardson: no [19:13:15] ok [19:22:42] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [19:27:02] (03PS1) 10Andrew Bogott: In a puppet report, desc is a function and not a variable. [puppet] - 10https://gerrit.wikimedia.org/r/163902 [19:27:33] (03CR) 10Andrew Bogott: [C: 032] Include libnet-dns-perl with ferm. It's needed for certain ferm functions. [puppet] - 10https://gerrit.wikimedia.org/r/163597 (owner: 10Andrew Bogott) [19:28:13] (03CR) 10Andrew Bogott: [C: 032] In a puppet report, desc is a function and not a variable. 
 [puppet] - 10https://gerrit.wikimedia.org/r/163902 (owner: 10Andrew Bogott) [19:29:12] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [19:31:13] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: puppet fail [19:31:33] (03PS1) 10Andrew Bogott: Revert "Include libnet-dns-perl with ferm. It's needed for certain ferm functions." [puppet] - 10https://gerrit.wikimedia.org/r/163903 [19:32:21] (03CR) 10Andrew Bogott: [C: 032] Revert "Include libnet-dns-perl with ferm. It's needed for certain ferm functions." [puppet] - 10https://gerrit.wikimedia.org/r/163903 (owner: 10Andrew Bogott) [19:33:43] (03PS7) 10coren: Use the Ubuntu Way of installing SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/163222 [19:34:12] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:35:02] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: puppet fail [19:35:36] (03PS8) 10coren: Use the Ubuntu Way of installing SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/163222 [19:36:34] (03CR) 10coren: [C: 031] "Rejiggered to remove the multitude of changes to SSL configuration, opting to populate /etc/ssl/certs/ with symlinks to the new location to" [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [19:36:52] mark: ^^ works for you that way? [19:39:12] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [19:44:51] Hm… now I suppose mark won't review that ssl patch until tomorrow, which is when virt0 is shutting off. [19:44:57] So… no backup ldap for a while. [19:50:16] andrewbogott: The patch is now umptillion times simpler; maybe he can find a few minutes for it? 
[19:50:45] If he's working at 9 at night [19:53:18] i don't find it simple at all [19:53:21] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [19:53:58] (10 btw ;) [19:54:14] so why is this blocking the ldap work? not sure I understand that [19:55:38] to fix issues on labs instances? [19:55:56] I don't fully understand, Coren may be able to better explain. Our experience with gerrit was that when we switched it over to ldap-eqiad it couldn't cope with the change in certs. [19:56:11] And when we looked at how it was working before it was kind of a "How did this ever work?" scenario [19:56:47] Coren hotfixed the gerrit box with work similar to that patch, and gerrit got happy. [19:57:07] An experimental merge of a similar virt1000->ldap-eqiad change on ishmael produced a similar failure. [19:57:22] Labs is not blocked by this. [19:57:29] mark: gerrit, amongst other, refuses to allow root CAs that aren't in the "true" /etc/ssl/certs/ format with hashes, etc. [19:57:41] Right now lots of production services are using virt1000, virt0 as the ldap server. [19:57:55] well, more likely virt0, virt1000 [19:58:09] but either way, it shouldn't actually break anything to kill virt0, it just stops us from having an ldap backup [19:58:29] mark: This patch moves asides the normal server sides, and lets update-ca-certificates to its jobs, which makes gerrit happy to use the Globalsign certs of ldap-eqiad and ldap-codfw [19:58:49] there's no question that this patch is already loads better than the previous one in terms of granularity [19:59:11] but to say that I can say without question that this 88 diff line patch does the right thing on all servers and won't break anything, sorry, can't say that for sure [19:59:14] mark: The current patch leaves symlinks at the "old names" so that nothing else needs to be changed. [19:59:17] has this been tested? 
 [19:59:24] that's the intention, yes [19:59:59] mark: it works on test instances (but, admittedly, in labs). The ensure => link always leaves the cert at the same spot the old class used to put them. [20:02:17] mark: To be entirely fair, this doesn't /block/; but it means that once virt0 goes down there will only be virt1000 usable by many clients with no backup. [20:03:23] mark: Oh, also, ytterbium is already set up like this (by hand) [20:03:33] mark: that was our "test case" to fix gerrit. [20:03:51] (aka panicky hotfix because it was broken) :-P [20:04:52] Looks like tendril is on neon. So that would be another place to try -- we could verify that this fix actually helps with >1 of the services I'm worried about. [20:05:15] oh, actually, tendril, icinga, ishmael, all on neon [20:05:28] (03PS1) 10Nemo bis: Remove one wiki included by mistake in aeacad551 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163906 (https://bugzilla.wikimedia.org/71403) [20:05:35] not that we really have a way to selectively apply a patch on one server [20:07:06] andrewbogott: not really, without going through the whole rigmarole with a different puppetmaster. [20:08:39] mark: If nothing else, do you feel confident enough in the more limited scope to remove the -2 and let a couple of us scrutinize it? :-) [20:08:45] have you checked this latest patch with the compiler? [20:09:18] Anyway, mark, I would say that this is necessary but not urgent. So we don't have to rush through this particular patch, it could be broken into yet more pieces. [20:10:05] that's what I would have done anyway [20:10:10] the CA part separately for example [20:10:20] but yeah, this looks right [20:13:00] mark: I hadn't run it on the compiler yet; do you think it worthwhile to test for a couple of important nodes or for all? [20:13:07] a couple [20:14:11] andrewbogott: what do you think? neon, ytterbium, ldap-eqiad and... and a proxy? 
[20:14:25] my -2 is gone [20:15:02] yeah, neon and virt1000, and something that touches prod wikis but I don't know what a good candidate is for that. [20:15:53] (03CR) 10Mark Bergsma: "This looks right, but I can't guarantee it." [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [20:16:05] also the nginx boxes etc [20:20:48] neon.wikimedia.org,virt1000.wikimedia.org,stat1001.wikimedia.org,rcs1001.eqiad.wmnet,cp4001.ulsfo.wmnet sounds okay as a list? [20:21:44] Sounds ok to me. I haven't actually ever used the puppet compiler -- it predicts the changes on a given host with a given patch? [20:22:28] * Coren nods. Mostly used to test for null changes, but it does a good job of finding things like dependency errors and duplicate resources. [20:23:16] speaking of duplicate resources… what's the state of the art for including packages that might be redundant? [20:23:16] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/389/ [20:25:15] Coren: I'm ok with the new version, but please coordinate when you merge (is that very soon now?) and I'll disable the puppet agent on the prod ssl frontends, update one manually and validate it, before releasing the rest. [20:25:45] bblack: I'm hoping it's soon, but it's no earlier than "the stakeholders are comfortable with it" [20:26:07] Like, for instance, you. :-) [20:26:07] hm, looks like the compiler is broken? [20:27:03] I'm as ok with it as I'm going to be. The thing with holding back puppet agent is something I do anytime I'm touching sensitive stuff on e.g. LVS, DNS, SSL boxes. It just makes me feel safer :) [20:27:05] andrewbogott: Looks like. :-( [20:27:44] Maybe I broke it when I fixed ldap... [20:27:51] ironically the puppet-compiler nodes were not puppetized [20:27:51] heh [20:29:15] I did do a puppet-compiler on the older version of the patch, just on the prod ssl boxes [20:29:17] andrewbogott: Lemme try with just the one node at a time, I may also be doing something wrong. 
[20:29:44] (which was here: http://puppet-compiler.wmflabs.org/388/change/163222/html/ ) [20:30:20] (that was PS6) [20:31:26] Hm. Well, it works for cp4001 when specified alone [20:31:35] http://puppet-compiler.wmflabs.org/390/change/163222/html/ [20:33:14] And there's clearly a bug. Which ima fix now. [20:33:22] (note that prod SSL comes in two flavors. they use the same certs and it should work out about the same. the ulsfo flavor will have a simpler diff, and the non-ulsfo ones have a bunch of extra unused certs currently) [20:34:00] changing the owner and mode of those certs is… safe because they're links to files that have the same owners and modes? [20:34:17] oh, wait, not links [20:34:56] I haven't changed any of the permission, I use the same they used to be. [20:35:29] Oh, I guess "type":​·​"File" can still mean 'link' [20:35:34] so, nm! [20:35:43] (03PS9) 10coren: Use the Ubuntu Way of installing SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/163222 [20:36:00] Small fix, in one case I used the wrong variable naming the link. [20:38:05] (03CR) 10coren: [C: 031] "Fixed a bug picked up by the compiler (wrong variable used in naming a symlink)" [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [20:39:07] andrewbogott: I can't seem to be able to compile against neon. "returned non-zero exit status 30" [20:40:23] * andrewbogott has no idea [20:40:59] could it be that just one of the compiler hosts is busted, and it's luck of the draw? [20:41:54] there's also some tricky things with manual updates needed on the compiler hosts [20:42:02] andrewbogott: It's possible, though I can't see where the individual jobs were sent. [20:42:05] as in, they need their catalog of facts re-imported from prod every so often [20:42:13] (don't ask me what that process is, _joe_ usually does it) [20:42:23] * Coren tries again with just neon. 
[20:42:38] http://puppet-compiler.wmflabs.org/391/change/163222/html/ looks good though [20:44:20] I still can't get the compiler to work against neon though. [20:47:27] Coren: unrelatedly… I want to shut down ldap on virt0 temporarily, to make sure that nothing unexpected and obvious breaks. [20:47:35] No time like the present, I guess... [20:47:51] andrewbogott: Better do that when it can still be turned back on. :-) [20:47:58] yep, that's what I'm thinking. [20:48:11] !log shutting down opendj on virt (temporary, a preview of tomorrow) [20:48:16] Logged the message, Master [20:48:16] (03PS1) 10Ori.livneh: Update require_package() from Vagrant [puppet] - 10https://gerrit.wikimedia.org/r/163912 [20:48:50] !log shutting down pdns on virt0 [20:48:55] Logged the message, Master [20:49:56] well… labs dns and login still work for me, at least [20:51:09] PROBLEM - Auth DNS on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:51:28] heh, well, yes, there's that [20:51:35] But I can log out and in of gerrit and tendril [20:51:40] PROBLEM - LDAP on virt0 is CRITICAL: Connection refused [20:51:45] Same here. [20:51:48] Coren: thoughts about what else I should be looking for? [20:51:51] PROBLEM - Certificate expiration on virt0 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [20:52:10] PROBLEM - LDAPS on virt0 is CRITICAL: Connection refused [20:52:21] andrewbogott: Not offhand. 
I think that's a case of "turn it off and see if anyone complains" [20:53:11] ACKNOWLEDGEMENT - Certificate expiration on virt0 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 andrew bogott Stopping virt0 services in anticipation of shutdown [20:53:11] ACKNOWLEDGEMENT - HTTP on virt0 is CRITICAL: Connection refused andrew bogott Stopping virt0 services in anticipation of shutdown [20:53:11] ACKNOWLEDGEMENT - LDAP on virt0 is CRITICAL: Connection refused andrew bogott Stopping virt0 services in anticipation of shutdown [20:53:11] ACKNOWLEDGEMENT - LDAPS on virt0 is CRITICAL: Connection refused andrew bogott Stopping virt0 services in anticipation of shutdown [20:53:11] ACKNOWLEDGEMENT - puppetmaster https on virt0 is CRITICAL: Connection refused andrew bogott Stopping virt0 services in anticipation of shutdown [20:53:14] (03PS1) 10Mark Bergsma: Let's not send IMAP mail to the LDAP server [puppet] - 10https://gerrit.wikimedia.org/r/163932 [20:53:33] I guess I should actually power it down to make sure it's not providing Secret Services [20:53:43] I wonder if I know how to power it up again? [20:54:28] (03CR) 10Mark Bergsma: [C: 032] Let's not send IMAP mail to the LDAP server [puppet] - 10https://gerrit.wikimedia.org/r/163932 (owner: 10Mark Bergsma) [20:54:34] (03PS1) 10Chad: Remove pastebin [puppet] - 10https://gerrit.wikimedia.org/r/163944 [20:54:36] (03PS1) 10Chad: Elasticsearch: Script the way to "safe but quickly" restart a node [puppet] - 10https://gerrit.wikimedia.org/r/163945 [20:55:23] !log powering down virt0, just to see what breaks [20:55:28] Logged the message, Master [20:58:49] PROBLEM - Host labs-ns0.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [20:58:50] * andrewbogott listens closely for the sounds of klaxons and breaking dishes [20:59:00] PROBLEM - Host virt0 is DOWN: PING CRITICAL - Packet loss = 100% [20:59:02] ooh, that's not supposed to happen! [20:59:35] labs-ns0 is totally not down, icinga. Jerk. 
[21:00:04] heh [21:00:04] spagewmf, ebernhardson: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140930T2100). Please do the needful. [21:00:18] are you sure? :) [21:00:38] bblack: nslookup testlabs-dns.wmflabs.org labs-ns0.wikimedia.org [21:00:39] works [21:01:06] It does worry me that icinga disagrees though [21:01:22] did you fail it over to another host? [21:01:32] (also, nslookup sucks, use dig!) [21:01:58] I moved them yesterday… ns0 should == virt1000, and ns1 labcontrol2001 [21:03:10] I can ping it from neon [21:03:18] maybe the host def for labs-ns0 is wrong in the monitoring check? [21:03:26] must be. [21:03:27] (and has virt0 IP?) [21:03:32] Where would I look for that? [21:03:41] And why would it be strictly defined there rather than a lookup? [21:03:46] well, directly, in /etc/icinga/puppet_hosts.cfg on neon [21:04:09] every service check corresponds to an icinga-defined "host", and each of those is just a name-label with a manually assigned IP [21:04:20] puppet takes care of the IPs, but for a virtual service, something could be off in puppet [21:04:30] or puppet didn't know to refresh somehow [21:04:56] or it just hasn't done so yet, which might involve puppet runs on the virt boxes, then palladium, then neon, in sequence before it sorts itself [21:05:30] (because the underlying boxes generate puppet data, palladium compiles it via naggen, then neon applies it as icinga config) [21:05:31] hm, icinga has three IPs for that host, none of them right [21:05:36] lol [21:06:08] Should I just change them right in the .cfg? Or is there something more elegant to force an update? [21:06:34] you have to fix it in puppet somehow [21:06:41] that file is regenerated completely via puppet [21:07:04] I'll look in a few if nothing seems obvious [21:07:47] bblack: I think puppet can only /add/ stuff [21:08:11] it rewrites the file, doesn't append to it [21:08:18] that would explain the multiple IPs [21:08:24] Ah. 
[21:08:27] oh, ok [21:08:35] so it's getting 3x defs from puppet data somehow [21:09:00] ah, well, puppet is disabled on virt0 so it wouldn't have removed itself [21:09:09] perhaps the applicable monitor_host isn't defined in the right place (for a virtual service, usually it should be evaluated once on neon somehow, rather than on all the applicable hosts it could be on) [21:09:19] or that [21:09:36] andrewbogott: bblack: robh: I can't see of any way to improve https://gerrit.wikimedia.org/r/#/c/163222/ or test it further. I'll await a couple concurring opinions and then deploy once bblack is ready. [21:10:21] (03CR) 10Manybubbles: "?" [puppet] - 10https://gerrit.wikimedia.org/r/163944 (owner: 10Chad) [21:11:13] (03CR) 10Manybubbles: "So the reason why the init script doesn't do this is that if you break the process in the middle you have to remember to re-enable allocat" [puppet] - 10https://gerrit.wikimedia.org/r/163945 (owner: 10Chad) [21:14:45] Coren: just to confirm, we're not creating files like etc/ssl/certs/DigiCertHighAssuranceCA-3.pem at all anymore? Just trusting in them staying as a remnant from previous runs? [21:18:12] andrewbogott: They get created as symlinks by update-ca-certificates [21:18:35] Coren: that is definitely not the first time I have asked you that [21:18:57] (03CR) 10Andrew Bogott: [C: 031] Use the Ubuntu Way of installing SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [21:19:22] Indeed not. What used to get leftover (possibly) is *server* certs; but we're not explicitly making them into symlinks now to avoid having to rely on crumbs. 
[21:19:40] s/we're not/we're now/ [21:20:05] * andrewbogott nods [21:21:27] (03CR) 10BBlack: [C: 031] Use the Ubuntu Way of installing SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [21:22:41] well, nothing to do now except merge and then all immediately punch out for the day :) [21:23:14] Reports of 503s in two channels [21:23:21] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [21:23:40] Who's breaking stuff [21:23:40] gdash shows a 5xx spike, yeah [21:23:42] there's a spike of fatals: http://graphite.wikimedia.org/render/?width=588&height=315&_salt=1409869403.785&title=Rate%20of%20MediaWiki%20exceptions%20and%20fatal%20errors%2C%20past%2024%20hours&vtitle=Errors%20per%20second&target=alias(mw.errors.fatal.rate%2C%22Fatal%20errors%22)&target=alias(mw.errors.exception.rate%2C%22Exceptions%22) [21:24:03] I didn't merge anything yet! Not it! :-) [21:24:07] anyone looked at fatals log? [21:24:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [21:24:40] ori: did you turn off the json hhvm logs? [21:24:49] * andrewbogott waits to find out how that's related to virt0 somehow [21:24:55] [2014-09-30 21:24:34] Fatal error: Cannot use object of type stdClass as array at /srv/mediawiki/php-1.25wmf1/includes/specialpage/SpecialPageFactory.php on line 281 [21:25:04] lots of those [21:25:15] yeah that [21:25:21] ori beat me! [21:25:22] We saw that yesterday, right? [21:25:38] yeah we did [21:25:39] MaxSem: MF related? [21:25:49] * aude glares at MaxSem [21:26:26] bblack: When things settle down and you want to do that thang, you tell me. I'll stand by. [21:26:36] i'm reverting max's revert of his revert [21:26:47] ok, mostly just waiting for the dust to settle on this, and I need to salt agent disable the ssls just before [21:29:37] MaxSem isn't around, so I'll push the revert. 
Kaldari nodded. [21:30:04] Coren: you are faster than light. Thank you for applying that patch! [21:30:20] he is *not* faster than light [21:30:22] but pretty close [21:30:29] Nope. I obey relativity. [21:30:31] In my universe with my physics he is! [21:30:57] maybe my light is just slower than yours, to compare with? [21:31:04] Vulnerabilities have a lot of mass though, so there's lots of time dilation. :-) [21:31:12] haha [21:31:23] * andre__ grabs his Hawking book from the shelf [21:31:23] MaxSem's here now. MaxSem: I +2'd https://gerrit.wikimedia.org/r/163957 but haven't deployed yet [21:31:24] back [21:31:39] Going to sync it unless you tell me to hold off. [21:31:46] go on [21:31:50] And yes, making general relativity jokes pretty much nails me as a hopeless nerd. :-) [21:33:46] !log ori Synchronized php-1.25wmf1/includes/specialpage/SpecialPageFactory.php: I672c699c (1/2) (duration: 00m 07s) [21:33:50] !log ori Synchronized php-1.25wmf1/languages/Language.php: I672c699c (2/2) (duration: 00m 03s) [21:33:51] Logged the message, Master [21:33:56] Logged the message, Master [21:34:23] now vee vait [21:35:31] palladium is chugging [21:36:10] as in, almost unusable shell there. seems to be puppetmaster related of course [21:36:33] my new thought-experiment is to hold up zend and mediawiki to the deployment standards we're applying to HHVM (which are good, don't get me wrong) [21:37:02] opt-in to wmfN+1 with a cookie? :) [21:37:10] heheh [21:38:02] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 1 failures [21:38:15] (03CR) 10Chad: "This is stuff I just copy+pasted into here from wikitech with no context. Without any kind of wrapper or loading, they're not useful at al" [puppet] - 10https://gerrit.wikimedia.org/r/163944 (owner: 10Chad) [21:38:58] what happened to esams?
http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Miscellaneous+esams&m=cpu_report&s=by+name&mc=2&g=network_report [21:39:58] since wmf2 is going to be branched tomorrow morning and anomie is not around, I think we need to revert that patch from aster too [21:40:32] (03CR) 10Chad: "Maybe it could be broken into multiple scripts then so you can run them individually?" [puppet] - 10https://gerrit.wikimedia.org/r/163945 (owner: 10Chad) [21:40:34] MaxSem: could you get kaldari to +2? [21:40:49] just had a report in the Commons channel of "[f6a3157b] 2014-09-30 21:40:04: Fatal exception of type MWException" when trying to open https://commons.wikimedia.org/w/index.php?title=File%3ALogic+Model+FKAGEU+WMAT.pdf&action=purge [21:41:52] ori: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=LVS+loadbalancers+esams&m=cpu_report&s=by+name&mc=2&g=network_report [21:41:58] ^ not the LVS, not sure which misc that was [21:42:20] eh, not tomorrow but the day after it [21:42:26] so not 100% urgent [21:42:26] hooft or nescio? [21:42:40] NotASpy: ack, I see it [21:42:43] hooft https://ganglia.wikimedia.org/latest/?c=Miscellaneous%20esams&h=hooft.esams.wikimedia.org&m=network_report&r=hour&s=by%20name&hc=4&mc=2 [21:42:46] yeah hooft [21:43:01] which does lots of misc things, but mostly administrative? [21:43:18] 404 https://wikitech.wikimedia.org/wiki/Hooft [21:43:41] ori: yeah, same for me too [21:43:41] it's a ganglia aggregator, maybe ganglia stats dropped out for a bit [21:43:44] There is one instance of gmetad running on hooft. It aggregates data from the misc hosts in the esams cloud (apparently) [21:44:19] so the "apparently" can be removed from https://wikitech.wikimedia.org/wiki/Ganglia? 
:) [21:44:42] maybe :) [21:44:42] NotASpy, "Could not acquire lock for 'Logic_Model_FKAGEU_WMAT.pdf.'" [21:44:55] !log Spike of bitter irony from Nemo_bis on #wikimedia-operations starting 21:43 UTC [21:45:00] Logged the message, Master [21:45:09] aww not true :p [21:45:36] I kid. [21:45:39] well technically, it is still apparently running gmetad :) [21:45:46] it's just more apparent than it was before [21:46:00] actually, just gmond [21:46:46] so are we safe from the array stdClass thing now? [21:47:07] MaxSem: can't get that error, no matter what I do [21:47:25] yeah, its not widespread [21:47:42] I assume it's just usual commons db locking problems [21:47:56] (03CR) 10Dzahn: [C: 032] Add .gitreview [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/163614 (owner: 10KartikMistry) [21:48:25] (03CR) 10Dzahn: [V: 032] Add .gitreview [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/163614 (owner: 10KartikMistry) [21:48:35] Oh, HAH! https://gerrit.wikimedia.org/r/#/c/136520/ actually causes /per-language/ enabling of uploads. How silly. [21:49:07] oh? is that related to Norwegian chapter board wiki having them disabled? [21:49:21] Norwegian = .no uploads ?:p [21:49:23] Yes, though afaict it was disabled in every language. [21:49:27] hah, nice [21:51:03] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [21:51:53] bmansurov: yer with me and Kaity in room 35 [21:52:07] kaldari, ok [21:52:08] Coren: puppet's disabled on the main ssl frontends, fire at will [21:53:06] (03CR) 10coren: [C: 032] "Gingerly applying." [puppet] - 10https://gerrit.wikimedia.org/r/163222 (owner: 10coren) [21:53:29] kaldari, can you give me the link to hangout? [21:53:40] bblack: merged in [21:54:25] * Coren applies on neon, watches result. 
[21:54:45] I'll wait for you to check out the other hosts before I mess with the ssl's [21:54:57] if something has to be reverted, then we don't have to worry about any mess on them [21:56:06] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: Puppet has 5 failures [21:56:12] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:12] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:13] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:13] PROBLEM - puppet last run on db60 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:22] PROBLEM - puppet last run on mw1093 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:22] PROBLEM - puppet last run on rbf1001 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:22] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:22] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:23] PROBLEM - puppet last run on mw1016 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:23] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Puppet has 1 failures [21:56:33] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:42] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:42] PROBLEM - puppet last run on ms-be1011 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:42] PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: Puppet has 2 failures [21:56:43] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:00] awesome [21:57:02] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:02] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:02] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:02] PROBLEM - puppet last run on 
search1011 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:02] PROBLEM - puppet last run on elastic1003 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:02] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [21:57:03] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:03] PROBLEM - puppet last run on search1004 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:03] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:04] PROBLEM - puppet last run on mw1193 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:05] PROBLEM - puppet last run on analytics1018 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:06] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:06] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:12] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:13] PROBLEM - puppet last run on mw1071 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:13] PROBLEM - puppet last run on db71 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:13] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:13] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:13] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:19] /etc/init.d/ircecho stop if you wanna kill the bot temp. 
, fwiw [21:57:22] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:28] PROBLEM - puppet last run on mw1086 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:28] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:29] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:33] PROBLEM - puppet last run on osm-cp1001 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:33] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:40] Error: Could not set 'directory' on ensure: Could not find group ca-certs at 233:/etc/puppet/manifests/certs.pp [21:57:42] Error: Could not set 'directory' on ensure: Could not find group ca-certs at 233:/etc/puppet/manifests/certs.pp [21:57:45] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:45] PROBLEM - puppet last run on mw1090 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:55] PROBLEM - puppet last run on search1012 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:56] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:56] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 2 failures [21:57:56] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:56] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:56] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:56] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:07] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:07] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:08] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:08] PROBLEM - puppet last run on db1035 
is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:08] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:08] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:16] bblack: There's a minor bug (a typo in a group name); lemme fix it first. [21:58:17] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:22] stopped [21:58:43] bblack: There's a minor bug (a typo in a group name); lemme fix it first. (in case flooded out) [21:59:34] (03Abandoned) 10Catrope: Followup 6084646d: apply Mathoid directory creation hack to labs too [puppet] - 10https://gerrit.wikimedia.org/r/162811 (owner: 10Catrope) [22:00:17] (03PS1) 10coren: Certificate config had wrong group for dir [puppet] - 10https://gerrit.wikimedia.org/r/163966 [22:00:24] bblack: ^^ [22:01:37] (03CR) 10coren: [C: 032] "Typo fix" [puppet] - 10https://gerrit.wikimedia.org/r/163966 (owner: 10coren) [22:02:19] (03PS14) 10Catrope: Citoid puppetization [puppet] - 10https://gerrit.wikimedia.org/r/163068 [22:02:21] (03PS1) 10Dzahn: remove host linne [dns] - 10https://gerrit.wikimedia.org/r/163967 [22:02:21] ok [22:02:34] * Coren stupid stupid stupid [22:03:15] (03PS1) 10coren: Fix typo with typo [puppet] - 10https://gerrit.wikimedia.org/r/163969 [22:03:46] * Coren rereads. [22:03:50] * Coren double checks. [22:04:07] S. S. L. Dash. C. E. R. T. No S. [22:04:23] (03CR) 10coren: [C: 032] Fix typo with typo [puppet] - 10https://gerrit.wikimedia.org/r/163969 (owner: 10coren) [22:05:11] git commit --amend --just-edit-it-for-me-and-get-it-right-pls [22:05:27] I've been typing "ca-certificates" over and over for the past day. :-) [22:05:41] (03CR) 10Dzahn: [C: 031] "about to kill linne - shut it down yesterday" [dns] - 10https://gerrit.wikimedia.org/r/163967 (owner: 10Dzahn) [22:05:42] Applying again on neon. 
[22:07:07] (03PS1) 10EBernhardson: Point beta redis at the domain instead of ip [puppet] - 10https://gerrit.wikimedia.org/r/163973 [22:07:17] bmansurov: http://en.m.wikipedia.beta.wmflabs.org/wiki/Anne_Dallas_Dudley?wikidataid=Q3784220 [22:07:42] (03PS15) 10Catrope: Citoid puppetization [puppet] - 10https://gerrit.wikimedia.org/r/163068 [22:08:09] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:09] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:09] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:10] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:10] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:10] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:13] hah [22:08:18] (03CR) 10Dzahn: [C: 032] remove host linne [dns] - 10https://gerrit.wikimedia.org/r/163967 (owner: 10Dzahn) [22:08:19] your puppet run on neon restarted the bot [22:08:20] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [22:08:21] bmansurov: make sure you open it while logged in, opted in to beta, and with a very small window (or mobile emulator) [22:08:29] killed again [22:08:34] kaldari, ok [22:09:02] !log removing linne from DNS - was already shutdown about 24 hours before [22:09:08] Logged the message, Master [22:09:27] testing on cp1008 (which was my SNI test box) [22:10:08] notices that the sanity checks on authdns-update somehow take longer nowadays [22:10:09] bblack: It applies cleanly on neon [22:11:02] bmansurov: https://www.wikidata.org/wiki/Q42 [22:12:24] (03PS2) 10Ori.livneh: Update require_package() from Vagrant [puppet] - 10https://gerrit.wikimedia.org/r/163912 [22:12:49] (03CR) 10Ori.livneh: [C: 032 V: 032] Update require_package() from Vagrant [puppet] - 10https://gerrit.wikimedia.org/r/163912 (owner: 
10Ori.livneh) [22:12:52] Coren: I let it regenerate certs on cp1008, and the md5sums of the new ones match the old ones for the unified cert. [22:13:09] and nginx restarts cleanly [22:13:11] looks pretty good [22:13:20] Coren: Shall I merge one of those neon service patches and see if the change actually helped? [22:13:30] checking ssllabs against pu.wm.o just in case [22:13:33] Or is that premature? I guess we know we want this change even if it doesn't help with that [22:13:36] bblack: if everthing went right, the certs in /etc/ssl/certs/ should actually now be symlinks. [22:13:42] right, they are [22:13:55] but the files they point at are brand-new, I just want to make sure they were identical to the old [22:14:01] * Coren nods. [22:14:35] (03PS2) 10Spage: Point beta redis at the domain instead of ip [puppet] - 10https://gerrit.wikimedia.org/r/163973 (https://bugzilla.wikimedia.org/71484) (owner: 10EBernhardson) [22:14:40] You may also want to check that /usr/local/share/ca-certificates contains our roots as expected. [22:15:27] it does, but I don't think nginx refs those anyways [22:15:48] Not unless you use client certs or it behaves as a proxy to an ssl client. [22:16:29] trying real servers [22:16:38] andrewbogott: AFAICT, the patch does everything expected; and sets the link farm right. Go ahead and merge it in. [22:18:59] Hm, actually the first patch in line is for tungsten. Does that have the cert changes as well? [22:19:38] Coren: also, I'm going to be called away to dinner in ~15 -- shall I change my plans for additional hovering? 
[22:21:01] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [22:21:01] RECOVERY - puppet last run on ms1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:21:02] RECOVERY - puppet last run on db1010 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:21:02] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:21:16] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:21:18] Coren: I turned puppet back on for all the SSLs, looks ok [22:21:19] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:21:19] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:21:19] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:21:19] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:21:19] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:21:20] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:21:20] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:21:20] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:21:20] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:21:26] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures 
[22:21:27] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:21:27] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [22:21:27] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:21:36] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [22:21:36] ... and the bot's back :p [22:21:40] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:21:40] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:21:40] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:21:40] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [22:21:41] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [22:21:41] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:21:46] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:21:46] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:21:47] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:21:47] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:21:47] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:21:47] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet 
is currently enabled, last run 60 seconds ago with 0 failures [22:21:47] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:21:49] andrewbogott: Things seem okay; I'm keeping an eye out. Go eat. [22:21:56] RECOVERY - puppet last run on tarin is OK: OK: Puppet is currently enabled, last run 81 seconds ago with 0 failures [22:21:56] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:21:57] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:21:57] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [22:21:59] hmm ssl3001? [22:22:16] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:22:18] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:22:18] RECOVERY - puppet last run on db1017 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [22:22:18] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [22:22:19] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:22:19] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [22:22:19] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:22:26] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:22:27] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 120 seconds ago with 0 failures [22:22:27] RECOVERY - puppet 
last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:22:27] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:22:27] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:22:27] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:22:27] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:22:28] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:22:28] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:22:36] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:22:36] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:22:36] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:22:37] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:22:37] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:22:37] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:22:37] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [22:22:37] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:22:38] RECOVERY - puppet last run on db73 is OK: OK: Puppet is currently 
enabled, last run 26 seconds ago with 0 failures [22:22:38] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:22:38] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:22:43] Error: /Stage[main]/Certificates::Rapidssl_ca/File[/etc/ssl/certs/RapidSSL_CA.pem]: Could not evaluate: Connection timed out - connect(2) Could not retrieve file metadata for puppet:///files/ssl/RapidSSL [22:22:46] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:22:46] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:22:46] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:22:46] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:22:46] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [22:22:47] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [22:22:47] _CA.pem: Connection timed out - connect(2) [22:22:47] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:22:47] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:22:48] it's just puppet overload [22:22:56] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:22:56] RECOVERY - puppet last run on elastic1007 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [22:22:56] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last 
run 3 seconds ago with 0 failures [22:22:56] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:22:56] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [22:22:57] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [22:22:57] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:22:57] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [22:22:57] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:22:58] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:23:06] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:23:16] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on neptunium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:23:19] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:23:20] RECOVERY - puppet 
last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:23:26] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [22:23:26] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:23:26] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:23:27] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [22:23:27] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [22:23:27] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:23:27] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [22:23:27] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [22:23:36] RECOVERY - puppet last run on analytics1025 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [22:23:36] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:23:36] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:23:36] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:23:37] RECOVERY - puppet last run on gold is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:23:37] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 161 seconds ago with 0 failures [22:23:37] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently 
enabled, last run 1 seconds ago with 0 failures [22:23:37] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:23:37] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:23:38] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:23:38] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:23:39] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:23:39] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:23:46] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:23:50] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [22:23:50] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:23:50] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:23:50] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:23:50] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:23:56] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:23:56] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:23:56] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures 
[22:24:06] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:24:06] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [22:24:06] RECOVERY - puppet last run on search1016 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [22:24:16] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:24:16] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [22:24:17] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:24:17] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [22:24:21] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [22:24:21] RECOVERY - puppet last run on db74 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [22:24:21] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on rbf1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is 
currently enabled, last run 51 seconds ago with 0 failures [22:24:26] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:24:27] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:24:27] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:24:28] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [22:24:36] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:24:36] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [22:24:36] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:24:42] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:24:42] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [22:24:42] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures 
[22:24:56] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [22:24:57] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:24:58] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [22:24:58] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:24:59] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [22:24:59] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:25:05] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:25:17] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:25:20] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:25:20] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is 
currently enabled, last run 49 seconds ago with 0 failures [22:25:20] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [22:25:20] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:25:21] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:25:21] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 63 seconds ago with 0 failures [22:25:25] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:25:37] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:25:37] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 69 seconds ago with 0 failures [22:25:38] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [22:25:38] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:25:38] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:25:39] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:25:46] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:25:46] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 66 seconds ago with 0 failures [22:25:46] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:25:47] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 67 seconds ago with 0 failures [22:25:56] 
RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:25:56] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [22:25:57] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:25:57] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:25:57] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:25:57] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:26:18] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [22:26:37] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:26:38] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:26:38] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:26:38] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:26:38] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:26:47] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:26:47] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [22:27:02] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:28:26] (PS16) Catrope: Citoid puppetization [puppet] -
https://gerrit.wikimedia.org/r/163068 [22:29:53] (PS17) Catrope: Citoid puppetization [puppet] - https://gerrit.wikimedia.org/r/163068 [22:32:38] (PS3) coren: Autogenerate chained certificates [puppet] - https://gerrit.wikimedia.org/r/163798 [22:36:56] !log disabling puppet on mw1019 to enable debug logging in apache [22:37:03] Logged the message, Master [22:46:19] (PS4) coren: Autogenerate chained certificates [puppet] - https://gerrit.wikimedia.org/r/163798 [22:46:39] bblack: You might find this one patch interesting, though there is no urgency ^^ [22:47:15] bblack: Having cleaned up the ca-certificates store, we can now reliably generate the chained certificates without having to specify the CA all over puppet. [22:49:05] Coren: we have that one central place in certs.pp too [22:49:17] as opposed to using the "ca" parameter [22:49:30] mutante: For defaults, but it still requires having a list of issuers etc. [22:49:51] mutante: This actually constructs the chain from the issuer's hashes/dn [22:50:24] bd808: Hey so your rewrite of everything to use package { ... provider => trebuchet } instead of deployment::target ? It doesn't work. You're not adding salt::grain like deployment::target does (or if you are I can't find it). I just ported Citoid to use it and it's now failing to deploy in labs because the grain has 0 minions in it [22:50:45] mutante: Better yet, it will never construct an invalid chain. :-) [22:50:58] Coren: aha! so for example for the certs from GlobalCA it would have found the intermediate ? (and it does not add the root CA cert itself into the chained.pem , right) [22:51:09] Coren: never invalid sure sounds good :) [22:51:24] mutante: So long as we have the intermediate installed, yes. [22:51:33] RoanKattouw: :( That needs fixing then in Ori's provider [22:51:39] Yeah [22:51:49] mutante: I also tried with the RapidSSL ones, it chains properly with GeoTrust.
[22:51:51] RoanKattouw: it is adding salt::Grain [22:51:51] For now I'm going to revert to deployment::target with directory creation hacks, again. [22:51:54] RoanKattouw: it's doing it in ruby [22:51:56] hang on [22:51:58] Ugh [22:52:02] I grepped salt::grain and didn't see it [22:52:08] Coren: cool! [22:52:08] because it's not using the puppet wrapper [22:52:20] RoanKattouw: can i come by yr desk and take a look? [22:52:32] I tried running puppet on both deployment-salt and deployment-sca01 and git deploy still doesn't work [22:52:39] mutante: It's a fairly simple script, you can try it on any cert trivially. [22:52:54] Wait wtf now it is in there [22:53:02] ... [22:53:10] Coren: fwiw, because a little related. this old old patch from JeremyB, he should be right. (we shouldn't be adding the root ca cert itself) https://gerrit.wikimedia.org/r/#/c/111387/ [22:53:26] Apparently running puppet fixed it, even though there was nothing in the puppet output remotely related to citoid or trebuchet or anything, and I was using --verbose [22:53:28] Thanks puppet [22:54:44] mutante: Right now my script stops at, but includes, the first self-signed certificate in the chain - so it'd default to including the root I think. Lemme actually test it with star.planet. [22:54:51] FYI we got a PHP error on testwiki with Echo backport, so I'm reverting [22:55:50] !log re-enabling puppet on mw1019 [22:55:55] Logged the message, Master [22:56:46] <^d> No swat this afternoon? [22:57:53] jouncebot: next [22:57:53] In 0 hour(s) and 2 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140930T2300) [22:58:51] mutante: Yeah, it includes the root cert by default. It's not /necessary/ but I don't think it hurts though.
[23:00:00] Ah, same deal with etherpad, it chains /C=US/O=GeoTrust, Inc./CN=RapidSSL CA /C=US/O=GeoTrust Inc./CN=GeoTrust Global CA /C=US/O=Equifax/OU=Equifax Secure Certificate Authority [23:00:04] RoanKattouw, ^d, marktraceur, MaxSem: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140930T2300). [23:00:16] So the best thing would be to exclude the last in the chain then. [23:00:21] Coren: hmmm, "does use chained w/ Apache which explains more weirdness in the way it's serving its cert (the server cert is sent in duplicate" [23:00:29] sorry, busy right now [23:01:33] No patches. I declare SWAT a grand success. [23:01:49] WTF [23:01:57] https://wikitech.wikimedia.org/wiki/Special:NovaInstance is not listing any instances at all [23:02:14] marktraceur: 0/0 ain't bad. [23:02:20] Oh logging out and back in fixed it [23:02:23] ^ [23:02:23] <^d> Yeah [23:02:24] * RoanKattouw sighs in the general direction of labs [23:02:25] Coren: ^ and/or it's about the individual Apache configs using SSLCACertificatePath vs. SSLCertificateChainFile or both [23:02:31] <^d> marktraceur: GOOD JOB EVERYBODY [23:02:38] marktraceur, RoanKattouw : I'm still reverting an Echo backport because testwiki had errors, but ebernhardson and I would like to deploy some code to log the error [23:02:40] eh, I've had this since last week [23:03:02] spagewmf: SWAT is empty so go right ahead [23:03:16] thx [23:03:17] mutante: Actually, that's a common error. SSLCACertificate{Path,File} are used for /client/ certs.
But I think that if you use both SSLCertificateChainFile *and* SSLCertificateFile you'll get double the server cert [23:03:43] (PS18) Ori.livneh: Citoid puppetization [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:04:01] mutante: Either way, the script makes it ultra-easy to include the original cert or not, and/or include the root or not [23:04:43] Coren: yep, that's cool, and i agree we'd want to not include the root, per " They serve no purpose (clients will always ignore them) and they incur a slight performance (latency) penalty" [23:04:48] (CR) Ori.livneh: Citoid puppetization (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:05:52] (PS5) coren: Autogenerate chained certificates [puppet] - https://gerrit.wikimedia.org/r/163798 [23:05:55] mutante: ^^ no longer includes the root. :-) [23:06:11] Coren: :) [23:06:29] * Coren ponders. [23:06:33] jeremyb: ^ fyi, backlog, remember that discussion? [23:06:46] But that version will emit an empty chain for a self-signed cert. (Since it is its own root) [23:07:51] We really really shouldn't ever be using self-signed certs anywhere anyways.
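Editorial sketch of the chain-building behaviour discussed above: follow issuer links from a certificate until a self-signed one is reached, and leave the root out by default. This is not Coren's script, which is not shown in the log and works against the installed ca-certificates store. Here `Cert`, `build_chain`, the in-memory `store` and the leaf DN `/CN=etherpad.wikimedia.org` are all hypothetical stand-ins; only the three issuer DNs are taken from the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical stand-in for a parsed certificate; only the two DNs matter here."""
    subject: str
    issuer: str

    @property
    def self_signed(self) -> bool:
        return self.subject == self.issuer


def build_chain(leaf, store, include_leaf=False, include_root=False):
    """Follow issuer links from `leaf` through `store` (subject DN -> Cert).

    Stops at the first self-signed certificate. Raises instead of returning a
    partial chain when an issuer is missing, so no invalid chain is produced.
    """
    chain = [leaf] if include_leaf else []
    seen = {leaf.subject}
    current = leaf
    while not current.self_signed:
        issuer = store.get(current.issuer)
        if issuer is None:
            raise LookupError(f"issuer not installed: {current.issuer}")
        if issuer.subject in seen:
            raise ValueError(f"issuer loop at: {issuer.subject}")
        seen.add(issuer.subject)
        if issuer.self_signed and not include_root:
            break  # the root is left out unless explicitly requested
        chain.append(issuer)
        current = issuer
    return chain


# Issuer DNs as quoted in the log; the leaf DN is an assumption.
RAPIDSSL = "/C=US/O=GeoTrust, Inc./CN=RapidSSL CA"
GEOTRUST = "/C=US/O=GeoTrust Inc./CN=GeoTrust Global CA"
EQUIFAX = "/C=US/O=Equifax/OU=Equifax Secure Certificate Authority"

store = {
    RAPIDSSL: Cert(RAPIDSSL, GEOTRUST),
    GEOTRUST: Cert(GEOTRUST, EQUIFAX),
    EQUIFAX: Cert(EQUIFAX, EQUIFAX),  # self-signed root
}
leaf = Cert("/CN=etherpad.wikimedia.org", RAPIDSSL)

for cert in build_chain(leaf, store):
    print(cert.subject)
```

With these assumptions a self-signed certificate yields an empty chain, which matches the caveat raised at 23:06:46, and a missing intermediate is an error rather than a truncated chain.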
[23:08:05] (CR) Catrope: Citoid puppetization (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:08:13] https://community.qualys.com/thread/11026 [23:09:58] (CR) Dzahn: "also see https://gerrit.wikimedia.org/r/#/c/163798/" [puppet] - https://gerrit.wikimedia.org/r/111387 (owner: Jeremyb) [23:12:42] mutante: You can have some fun testing it on zirconium: ~marc/test-cert-chain /etc/ssl/localcerts/etherpad.wikimedia.org.crt [23:13:10] (PS1) Chad: Put my .gitconfig everywhere [puppet] - https://gerrit.wikimedia.org/r/163984 [23:13:21] (Same code, prints the filename instead of the cert itself) [23:15:05] Coren: thanks [23:16:01] (PS1) EBernhardson: Create log group for Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/163987 [23:17:42] (CR) Spage: [C: 2] "OK" [mediawiki-config] - https://gerrit.wikimedia.org/r/163987 (owner: EBernhardson) [23:17:49] (Merged) jenkins-bot: Create log group for Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/163987 (owner: EBernhardson) [23:19:40] someone needs to stop me from using war/draft imagery in SWAT emails in the future [23:20:02] (PS19) Ori.livneh: Citoid puppetization [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:20:16] RoanKattouw: PS19 is a touch more opinionated, but feel free to revert to PS18 if you don't like it. [23:20:43] RoanKattouw: I generally avoid parametrizing things that don't practically need to be parametrized; there's a horror of string literals in configs that I think is misplaced. If /var/log/citoid is the log dir, let's go ahead and assume it for all installations.
[23:21:30] err, there is a small mistake, sec [23:21:54] ori: OK I'll look in a minute [23:21:57] (PS20) Ori.livneh: Citoid puppetization [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:21:58] better [23:23:50] greg-g: Dear jouncebot, the time has come to talk about code changes, ☮ out [23:24:33] mutante: :) [23:25:06] ori: Looks totally fine to me [23:25:38] RoanKattouw, if I reverted a change to Echo 1.25wmf1 that was never sync'd to the cluster, should I still sync-dir Echo so the git hash on production matches tin? [23:26:11] Yes [23:26:22] Also in case they're not identical, it's you taking that hit instead of a random person [23:27:05] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [23:27:25] !log spage Synchronized php-1.25wmf1/extensions/Echo: Echo no-op (change reverted) (duration: 00m 09s) [23:27:31] Logged the message, Master [23:27:47] it's fast if you don't do anything :) [23:28:17] (Abandoned) PleaseStand: CommonSettings.php: Remove some dated cruft [mediawiki-config] - https://gerrit.wikimedia.org/r/154442 (owner: PleaseStand) [23:29:03] (PS1) Dzahn: remove host 'silver' [dns] - https://gerrit.wikimedia.org/r/163992 [23:29:05] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[23:29:42] !log spage Synchronized wmf-config/InitialiseSettings.php: Create log group for Echo (duration: 00m 11s) [23:29:47] Logged the message, Master [23:35:37] (PS1) Dzahn: search - remove commented check_lucene_frontend [puppet] - https://gerrit.wikimedia.org/r/163994 [23:36:15] (PS2) Dzahn: remove unused check_lucene_frontend [puppet] - https://gerrit.wikimedia.org/r/163788 [23:38:01] (CR) Dzahn: "bblack: so now it looks better when vim highlights it, but did not solve the issue we were after anyways :p" [puppet] - https://gerrit.wikimedia.org/r/163781 (owner: Dzahn) [23:43:17] ori and ^demon|away I've CC'd you on https://bugzilla.wikimedia.org/show_bug.cgi?id=71486 - hope that was correct. [23:48:48] (CR) coren: [C: 1] "Me gusta." [puppet] - https://gerrit.wikimedia.org/r/163798 (owner: coren) [23:51:04] (CR) BryanDavis: "Cherry-picked to deployment-salt and applied on deployment-jobrunner01. Jobrunner is working again! Thanks Erik." [puppet] - https://gerrit.wikimedia.org/r/163973 (https://bugzilla.wikimedia.org/71484) (owner: EBernhardson) [23:54:35] thanks for the timezone math, RoanKattouw :) [23:57:02] (CR) Catrope: [C: -1] Citoid puppetization (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163068 (owner: Catrope) [23:57:13] ori: Minor thing ---^^ [23:59:44] (CR) Dzahn: [C: 2] "since it is behind misc-web the backend does not need SSL config anymore" [puppet] - https://gerrit.wikimedia.org/r/163312 (owner: Dzahn)