[00:00:05] RoanKattouw, ^d, marktraceur, MaxSem, James_F: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T0000). [00:00:11] * James_F waves. [00:00:40] (03CR) 10jenkins-bot: [V: 04-1] gerrit role: add ssh::server listening on other IP [puppet] - 10https://gerrit.wikimedia.org/r/174015 (owner: 10Dzahn) [00:00:51] awight: Neat! [00:00:55] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [00:00:57] * AndyRussG falls off the floor onto the floor [00:01:09] I can do it [00:01:23] AndyRussG: There's always another floor. [00:01:29] MaxSem: SWAT += https://gerrit.wikimedia.org/r/174017 for VE [00:01:30] (03PS2) 10Dzahn: gerrit role: add ssh::server listening on other IP [puppet] - 10https://gerrit.wikimedia.org/r/174015 [00:01:31] wmf7 version of that coming [00:01:35] awight, are you still deploying? [00:02:25] (03CR) 10jenkins-bot: [V: 04-1] gerrit role: add ssh::server listening on other IP [puppet] - 10https://gerrit.wikimedia.org/r/174015 (owner: 10Dzahn) [00:02:43] MaxSem: Sorry for the slowness, I'm submitting SWAT commits from airplane wifi [00:02:48] (Which is also why I'm not deploying today :P ) [00:03:04] <^d> I think Roan should do swat from 34k ft. [00:03:08] awight, ??? [00:03:11] MaxSem: all done! [00:03:17] thanks:) [00:03:27] * RoanKattouw is at 38k ft and has just past Salt Lake City [00:03:54] awight so, test campaign on mediawiki.org in esperanto? [00:03:55] gwicke, is https://gerrit.wikimedia.org/r/#/c/172995/ live? [00:04:12] (03PS3) 10Dzahn: gerrit role: add ssh::server listening on other IP [puppet] - 10https://gerrit.wikimedia.org/r/174015 [00:04:31] ejegg: :) [00:04:32] *passed [00:04:53] ejegg: sounds bonan [00:05:25] MaxSem: Aaand https://gerrit.wikimedia.org/r/174020 [00:05:30] I will go and add them to the deployments page [00:05:45] awight: hah [00:05:46] cool [00:06:19] (03CR) 10Dzahn: "next we'll need https://gerrit.wikimedia.org/r/#/c/174015/3/manifests/role/gerrit.pp or so and make sure the gerrit node does not include " [puppet] - 10https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: 10Dereckson) [00:07:13] MaxSem: that failure is intermittent, re-doing +2 [00:07:14] cscott, is https://gerrit.wikimedia.org/r/#/c/172995/ live? [00:08:05] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [00:08:19] MaxSem: yes. https://www.mediawiki.org/wiki/Parsoid/Deployments [00:08:28] thanks [00:08:51] (03PS4) 10Dzahn: gerrit role: add ssh::server listening on other IP [puppet] - 10https://gerrit.wikimedia.org/r/174015 (https://bugzilla.wikimedia.org/35611) [00:15:19] (03CR) 10Dzahn: [C: 032] "only removes labs / pmtpa and those per Bryan's comments above" [puppet] - 10https://gerrit.wikimedia.org/r/173456 (owner: 10Dzahn) [00:21:20] !log maxsem Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s) [00:21:24] Logged the message, Master [00:21:34] ebernhardson, please verify ^^^ [00:21:54] !log maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s) [00:21:56] Logged the message, Master [00:22:18] James_F, ^^^ [00:22:46] (03PS6) 10Dzahn: add monitoring for search.wm (Apple dict bridge) [puppet] - 10https://gerrit.wikimedia.org/r/171193 [00:23:23] MaxSem: Are you doing the mobile SWAT deployment? 
[00:23:24] !log maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: (no message) (duration: 00m 04s) [00:23:26] Logged the message, Master [00:24:28] MaxSem: looks good, thanks [00:24:29] !log maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: SWAT (duration: 00m 04s) [00:24:31] Logged the message, Master [00:24:39] kaldari|2, wmf8 ^^^ [00:24:44] (03CR) 10Dzahn: [C: 032] "adding the check on terbium for now" [puppet] - 10https://gerrit.wikimedia.org/r/171193 (owner: 10Dzahn) [00:24:54] MaxSem: Cool, I'll test now [00:26:26] RoanKattouw: ^^^ [00:28:01] MaxSem: Looks good on wmf8 [00:29:34] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: puppet fail [00:29:49] ^ yea, fixing it, monitor_service got renamed [00:30:34] RoanKattouw, forgot to ask - does VE need a submodule update? [00:31:33] Yes [00:31:38] bleh [00:31:45] I didn't do it [00:31:50] git submodule update --recursive extensions/VisualEditor works [00:32:10] can we agree to explicitly mention it on deployments page? :P [00:32:49] !log maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s) [00:33:00] for realz ^^^ [00:33:40] MaxSem: Sure, I can do that [00:33:41] (03PS1) 10Dzahn: fix: monitor_service is now monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/174030 [00:33:43] !log maxsem Synchronized php-1.25wmf7/extensions/WikiGrok: SWAT (duration: 00m 03s) [00:33:46] Logged the message, Master [00:33:57] !log maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: SWAT (duration: 00m 04s) [00:33:59] Logged the message, Master [00:34:02] kaldari|2, ^^ [00:34:15] !log maxsem Synchronized php-1.25wmf7/extensions/VisualEditor/: SWAT (duration: 00m 04s) [00:34:17] Logged the message, Master [00:34:21] RoanKattouw, ^^ [00:34:27] Thanks man [00:34:40] (03CR) 10Dzahn: [C: 032] fix: monitor_service is now monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/174030 (owner: 10Dzahn) [00:34:54] waiting for a confirmation from kaldari|2 and RoanKattouw then deploying config changes [00:35:42] Looking [00:36:45] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:37:38] MaxSem: everything looks good on wmf8 [00:39:01] MaxSem: wmf8 looks good, testing wmf7 [00:39:19] MaxSem: And wmf7 is good too [00:39:51] thanks [00:40:21] (03CR) 10MaxSem: [C: 032] Enable VisualEditor by default on Tagalog Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [00:40:22] RoanKattouw, MaxSem: Do the OOUI patch too? [00:40:30] (03Merged) 10jenkins-bot: Enable VisualEditor by default on Tagalog Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [00:40:31] ?? [00:40:52] MaxSem: https://gerrit.wikimedia.org/r/174029 – Roan forgot it, but if the window's still open; [00:41:12] after config changes [00:41:47] Cool. [00:41:59] I had it scheduled for tomorrow, I'll move it to this window on the wiki page [00:42:20] * James_F nods. [00:42:46] on the other hand, we already deployed more that 8 changes today... [00:43:03] MaxSem: Depends on definition of change counting… [00:43:21] MaxSem: It's a pretty bad bug for VE (wmf8 only though). 
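The submodule step Roan missed above is the one recurring gotcha in VisualEditor SWATs. A minimal sketch of the fix MaxSem applied, assuming the deploy is driven from the deployment host (tin at the time) and that the staging copy of the branch lives under /srv/mediawiki-staging; the path and the sync-dir wrapper are illustrative, only the git command is quoted from the log:

    # Update the VisualEditor submodule pointer in the staging tree, then push
    # the directory out to the app servers (which produces the !log line above).
    cd /srv/mediawiki-staging                                   # assumed staging root
    git -C php-1.25wmf8 submodule update --recursive extensions/VisualEditor
    sync-dir php-1.25wmf8/extensions/VisualEditor 'SWAT: VE submodule update'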
[00:43:23] !log maxsem Synchronized visualeditor-default.dblist: https://gerrit.wikimedia.org/r/172993 (duration: 00m 04s) [00:43:26] Logged the message, Master [00:43:29] (03PS1) 10Ori.livneh: Add pybal.test.fixtures and pybal.test.test_monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/174034 [00:43:36] on any definition:P [00:43:38] MaxSem: Confirmed on tlwiki. [00:43:43] ok, let's do it:P [00:44:04] (03CR) 10MaxSem: [C: 032] Follow-up I50cb3ed: Enable VisualEditor as a Beta Feature on maiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172996 (owner: 10Jforrester) [00:44:14] (03Merged) 10jenkins-bot: Follow-up I50cb3ed: Enable VisualEditor as a Beta Feature on maiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172996 (owner: 10Jforrester) [00:44:49] Yay. [00:44:54] !log maxsem Synchronized visualeditor.dblist: https://gerrit.wikimedia.org/r/172996 (duration: 00m 03s) [00:44:57] Logged the message, Master [00:46:25] (03CR) 10MaxSem: [C: 032] Set wgMFUseWikibaseDescription to true on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173878 (owner: 10Kaldari) [00:46:35] (03Merged) 10jenkins-bot: Set wgMFUseWikibaseDescription to true on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173878 (owner: 10Kaldari) [00:47:15] !log maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/173878 (duration: 00m 03s) [00:47:17] Logged the message, Master [00:47:38] kaldari|2, ^^^ [00:48:18] (03PS1) 10Jforrester: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174036 (https://bugzilla.wikimedia.org/73483) [00:50:28] !log Restarted hung logstash process on logstash1001 [00:50:30] Logged the message, Master [00:52:00] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 341 seconds [00:52:14] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 354 seconds [00:52:17] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures [00:52:35] MaxSem: Confirmed mai config working. [00:52:55] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds [00:53:02] !log Restarted logstash on logstash1002; lots of errors in the log about GELF input >128 chunks [00:53:05] Logged the message, Master [00:53:15] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:53:42] ocg is the likely culprit for the oversize GELF messages to logstash1002 [00:54:28] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [00:54:37] it's also the worst flooder - kill? :P [00:55:52] (03CR) 10Dzahn: "doesn't show up in Icinga yet due to unrelated problem with icinga" [puppet] - 10https://gerrit.wikimedia.org/r/173633 (owner: 10Ori.livneh) [00:55:52] !log maxsem Synchronized php-1.25wmf8/resources/lib/oojs-ui/: https://gerrit.wikimedia.org/r/#/c/174029/ (duration: 00m 04s) [00:55:57] Logged the message, Master [00:56:02] The logs from OCG seem a bit loud to me for operational logs, but we've got disk space now so *shrug* [00:56:12] James_F, RoanKattouw_away ^^^ [00:56:51] bd808, SO KILL ANYWAY [00:56:57] ONEONEONE [00:57:19] MaxSem: Confirmed fixed. [00:57:21] MaxSem: Yay! [00:57:51] MaxSem: You are sooo agro. It's like your a speed metal fan or something. ;) [00:58:04] DEATH METAL [00:58:18] agro? 
[01:00:50] <^d> bd808: s/agro/emo/ [01:01:32] MaxSem: agro is a some time in the past SoCal-ism for agressive [01:01:37] WEEEP WEEEP ^D HURTS MY FEELINGS [01:01:42] (03CR) 10Ori.livneh: "@mutante: Oh, thanks for letting me know. I was wondering about that." [puppet] - 10https://gerrit.wikimedia.org/r/173633 (owner: 10Ori.livneh) [01:02:00] mutante: what's the unrelated problem? [01:02:12] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [01:02:26] emo MaxSem would be fun to see. Same eyeliner as the metal heads but for a totally different reason. ;) [01:02:52] ori: icinga -v /etc/icinga/icinga.cfg -> Error processing object config files! [01:03:00] ori: the fun part is it does _not_ give details where [01:03:26] i have seen it break so many times before, but it would always say which line and what [01:03:30] not now [01:03:41] bd808, I don't see an exceptionmonitor dashboard in logstash anymore [01:03:53] https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor [01:03:58] it's better [01:04:16] I think I killed the old exception monitor one last week [01:04:47] <^d> bd808: Appropriate: [[w:The Ungroundable]] [01:05:54] Oh Hot Topic. Source of so much confusion for so many [01:06:32] I love the South Park goth kids. [01:06:58] they buy at Hot Topic? [01:07:37] mutante: No, the vampire wanna be kids did. So the goth kids burned it down. [01:07:50] heh :) [01:08:41] ori: nowadays it reads files from /etc/icinga & /etc/nagios/ :p [01:09:27] /etc/icinga# file puppet_hostgroups.cfg [01:09:27] puppet_hostgroups.cfg: empty [01:10:17] <^d> I need to finish making my Butters mp3 for e-mails. [01:23:30] !log temporarily reassign db1004 for phab migration tests [01:23:35] Logged the message, Master [01:27:49] !log on osmium: removing ori's custom kernel and rebooting [01:27:53] Logged the message, Master [01:29:08] (03PS1) 10Ori.livneh: Fix-up for I8fbef8279 [puppet] - 10https://gerrit.wikimedia.org/r/174046 [01:31:12] (03CR) 10Dzahn: [C: 032] "yea, icinga config currently broken due to the empty servicegroups, thanks for this" [puppet] - 10https://gerrit.wikimedia.org/r/174046 (owner: 10Ori.livneh) [01:34:30] (03CR) 10Dzahn: "works, i see it removing empty servicegroups on neon .." [puppet] - 10https://gerrit.wikimedia.org/r/174046 (owner: 10Ori.livneh) [01:36:55] <^d> bd808: Success! http://anyonecanedit.org/email.mp3 [01:37:22] ^d: <3 [01:38:10] You should rig something up in the office to blast that when puppet fails to run in beta. [01:38:40] <^d> So just put it on loop all day? ;-) [01:38:48] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [01:39:39] PROBLEM - Host osmium is DOWN: CRITICAL - Plugin timed out after 15 seconds [01:40:09] RECOVERY - Host osmium is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms [01:40:39] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [01:40:48] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [01:40:59] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [01:55:08] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [01:55:40] springle: Coren was hoping to get feedback from you about replag; are you on top of that? 
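Since icinga -v above only reports "Error processing object config files!" without naming the file, a rough sketch of narrowing it down by hand; the two directories are the ones mutante mentions, the rest is illustrative:

    # Re-run the validator, then look for generated object files that came out
    # empty or truncated (like the empty puppet_hostgroups.cfg found above).
    icinga -v /etc/icinga/icinga.cfg
    find /etc/icinga /etc/nagios -name '*.cfg' -size -10c -ls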
[01:58:19] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [02:00:06] andrewbogott: hmm, did I miss a ping. I guess you mean labs; it isn't lagged right now [02:00:30] springle: I forwarded you an email over the weekend… there are also a bunch of bugs and tickets, lemme find links [02:00:58] (by 'a bunch' I probably mean two) [02:01:24] I'm aware of eg, bug 72908 [02:01:35] which isn't about lag, but it's mentioned [02:01:56] https://rt.wikimedia.org/Ticket/Display.html?id=8877 [02:02:22] https://bugzilla.wikimedia.org/show_bug.cgi?id=73511 [02:02:37] https://bugzilla.wikimedia.org/show_bug.cgi?id=73480 [02:02:43] Hm, so far I think those are all the same [02:03:11] none of those are replag, though they are legitimate and would require Coren's perl scripts [02:03:28] andrewbogott: oh, i see the email. labs-l [02:03:44] And, yeah, en seems to be up-to-date now. So I don't know what was up over the weekend :( [02:03:52] So, I guess, nevermind :) [02:03:56] "Tech support requests" [02:04:24] It's possible there was some user query blocking (common), or Sanitarium lagged for some reason [02:04:27] * springle digs [02:04:49] Thanks. [02:04:54] I'll respond [02:05:30] (03CR) 10Ori.livneh: [C: 032] Add pybal.test.fixtures and pybal.test.test_monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/174034 (owner: 10Ori.livneh) [02:05:47] (03Merged) 10jenkins-bot: Add pybal.test.fixtures and pybal.test.test_monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/174034 (owner: 10Ori.livneh) [02:05:50] In general does replag actually signify a problem? Or is it more like the weather and I should disregard complaints about it unless the it's days behind? [02:10:10] andrewbogott: labs replag more than a hour is a proper problem. Usually the fix is to track down whatever slow query is running for hours or days on end. [02:10:35] ok… someone should probably teach me how to do that sometime :) [02:11:45] uhhh [02:11:49] why is jenkins in italian? [02:12:31] It's been that way for a few days but I don't think anyone has done anything about it... [02:12:40] (I thought it was Spanish) [02:13:37] o.O [02:13:47] well, I'm pretty sure it's not spanish now :P [02:13:59] Oh shit, https://wiki.jenkins-ci.org/display/JENKINS/How+to+view+Jenkins+in+your+language [02:14:04] Apparently it's trying to detect the setting in your browser. [02:14:07] And maybe off by one? [02:14:19] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-18 02:14:18+00:00 [02:14:25] Logged the message, Master [02:14:36] <^d> Heh, it's set to 'en' [02:14:40] <^d> And set to ignore the browser. [02:15:34] http://cl.ly/image/3E1U1C0e0C29 looks italian to me [02:15:49] and I'm pretty sure my browser is set to en_US :P [02:16:08] <^d> It's pretty obviously wrong for me too :) [02:17:17] ^d: where is it set? [02:17:49] <^d> I was looking in https://integration.wikimedia.org/ci/configure [02:18:43] SAL says jenkins was upgraded today (yesterday) [02:19:21] Did I fix it? [02:19:33] woot, back to english :D [02:19:47] I did that by changing the language from 'en' to '' [02:19:50] So. [02:20:20] <^d> so obviously 'en' is no longer english :) [02:20:34] It'll stay in English until someone else visits that page and things, man, I should really specify english here... [02:21:09] >.< [02:21:19] en_US seems to work [02:21:30] Does it still look right for y'all? [02:21:42] <^d> lgtm. [02:21:46] looks the same [02:22:12] ok then! 
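For the "teach me how to do that sometime" above, a rough sketch of the usual first look, not the DBAs' actual runbook; the replica host name is illustrative and appropriate credentials are assumed:

    # How far behind is the replica, and what has been running for hours?
    mysql -h labsdb1001.eqiad.wmnet -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind_Master
    mysql -h labsdb1001.eqiad.wmnet -e 'SHOW FULL PROCESSLIST' \
      | awk -F'\t' 'NR > 1 && $6 > 3600'   # column 6 is Time in seconds; >1h is suspect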
[02:22:34] <^d> !log jenkins locale set from 'en' to 'en_US' since 'en' means Italian somehow. [02:22:38] Logged the message, Master [02:22:40] <^d> There, now hashar will know. [02:22:40] random button-pushing ftw [02:22:43] thanks :) [02:23:21] legoktm: thanks for complaining -- I was just living with an interface that I couldn't read and feeling bad about my language skills. [02:23:28] :P [02:23:52] I'm bilingual, I speak English and python. [02:24:43] legoktm_, not PHP? [02:25:49] https://en.wikipedia.org/wiki/User:Legoktm/php-0 [02:26:17] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-18 02:26:17+00:00 [02:26:20] Logged the message, Master [02:36:57] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [02:54:37] (03PS1) 10Aaron Schulz: Added switch-logic for new Profiler config format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174060 [02:56:20] (03CR) 10Chad: [C: 031] Added switch-logic for new Profiler config format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174060 (owner: 10Aaron Schulz) [03:46:27] PROBLEM - check if salt-minion is running on analytics1003 is CRITICAL: Connection refused by host [03:46:59] PROBLEM - check configured eth on analytics1003 is CRITICAL: Connection refused by host [03:46:59] PROBLEM - check if dhclient is running on analytics1003 is CRITICAL: Connection refused by host [03:47:00] PROBLEM - Disk space on analytics1003 is CRITICAL: Connection refused by host [03:47:27] PROBLEM - RAID on analytics1003 is CRITICAL: Connection refused by host [03:47:27] PROBLEM - puppet last run on analytics1003 is CRITICAL: Connection refused by host [03:50:06] PROBLEM - SSH on analytics1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:16:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 18 04:16:03 UTC 2014 (duration 16m 2s) [04:16:09] Logged the message, Master [04:23:47] PROBLEM - RAID on ms-be2007 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [04:25:27] PROBLEM - Disk space on ms-be2007 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdk1 is not accessible: Input/output error [04:33:56] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 1 failures [04:36:27] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [05:01:12] RECOVERY - Disk space on ms-be2007 is OK: DISK OK [05:03:14] PROBLEM - NTP on analytics1003 is CRITICAL: NTP CRITICAL: No response from NTP server [06:28:33] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:33] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:04] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:23] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:03] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:43] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:45:53] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [06:46:22] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:46:24] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:46:24] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, 
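For the failed logical drive on ms-be2007 reported above, a purely illustrative look at the RAID controller with LSI's CLI, assuming the MegaCli utility is installed (the binary name varies: megacli, MegaCli, MegaCli64):

    # Which logical drive is Offline, and which physical disk behind it failed?
    megacli -LDInfo -Lall -aALL | grep -E 'Virtual Drive|State'
    megacli -PDList -aALL | grep -E 'Slot Number|Firmware state'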
last run 21 seconds ago with 0 failures [06:49:33] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures [07:06:53] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [07:30:57] (03PS2) 10Giuseppe Lavagetto: icinga: give a non-null default group to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/173825 [07:36:20] (03PS1) 10Giuseppe Lavagetto: naggen2: don't print empty service directives [puppet] - 10https://gerrit.wikimedia.org/r/174067 [07:36:57] (03CR) 10Giuseppe Lavagetto: [C: 032] naggen2: don't print empty service directives [puppet] - 10https://gerrit.wikimedia.org/r/174067 (owner: 10Giuseppe Lavagetto) [07:42:43] (03PS1) 10Giuseppe Lavagetto: monitoring: fix excessive newlines in naggen2 [puppet] - 10https://gerrit.wikimedia.org/r/174068 [07:43:36] (03CR) 10Giuseppe Lavagetto: [C: 032] monitoring: fix excessive newlines in naggen2 [puppet] - 10https://gerrit.wikimedia.org/r/174068 (owner: 10Giuseppe Lavagetto) [07:45:45] (03PS3) 10Giuseppe Lavagetto: icinga: give a non-null default group to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/173825 [08:03:14] PROBLEM - Host graphite1001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [08:04:25] PROBLEM - Memcached on virt1000 is CRITICAL: Connection refused [08:26:06] (03CR) 10Giuseppe Lavagetto: [C: 032] icinga: give a non-null default group to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/173825 (owner: 10Giuseppe Lavagetto) [08:35:33] (03PS1) 10Alexandros Kosiaris: Reimaging script improvements [puppet] - 10https://gerrit.wikimedia.org/r/174070 [08:39:49] <_joe_> akosiaris: wow [08:41:09] Ι now just run it in a screen and walk away [08:41:27] that being said... various iDRAC don't behave exactly nice when it comes to IPMI... [08:56:25] (03CR) 10Alexandros Kosiaris: [C: 04-2] "I am gonna go the extra mile and suggest we solve this with hiera instead. We should be able to modify inclusion of ssh::server in base to" [puppet] - 10https://gerrit.wikimedia.org/r/174015 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [09:04:01] <_joe_> akosiaris: that's actually trivial to do with hiera in fact [09:04:39] <_joe_> ok, I have like a ton of code reviews to do [09:05:09] (03CR) 10Giuseppe Lavagetto: [C: 032] Reimaging script improvements [puppet] - 10https://gerrit.wikimedia.org/r/174070 (owner: 10Alexandros Kosiaris) [09:05:14] PROBLEM - puppet last run on cp3002 is CRITICAL: Timeout while attempting connection [09:07:05] PROBLEM - Host cp3001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [09:07:12] PROBLEM - Host cp3002 is DOWN: CRITICAL - Plugin timed out after 15 seconds [09:07:15] ^^ me [09:08:58] <_joe_> jgage: when you reimage servers, please also remove their stored resources from puppet [09:09:54] ok. [09:10:22] (03CR) 10Alexandros Kosiaris: [C: 032] Reimaging script improvements [puppet] - 10https://gerrit.wikimedia.org/r/174070 (owner: 10Alexandros Kosiaris) [09:10:32] <_joe_> cp3001 and 3002 cost me and akosiaris a couple of hours each of debugging neon :) [09:11:24] ah geez, sorry! [09:11:34] i did think it was weird that they didn't show up in monitoring after i installed them [09:12:26] is 'puppet node clean' enough to nuke their stored resources? [09:14:14] yeah. we also have puppetstoredconfigclean.rb [09:14:28] cool ok [09:14:34] remnant from pre puppet 3 days. 
Both do the same thing [09:23:34] PROBLEM - MySQL Replication Heartbeat on db1015 is CRITICAL: CRIT replication delay 301 seconds [09:23:35] PROBLEM - MySQL Slave Delay on db1015 is CRITICAL: CRIT replication delay 305 seconds [09:24:35] RECOVERY - MySQL Replication Heartbeat on db1015 is OK: OK replication delay 127 seconds [09:24:45] RECOVERY - MySQL Slave Delay on db1015 is OK: OK replication delay 130 seconds [09:37:33] (03PS1) 10Filippo Giunchedi: eqiad: add missing forward for graphite1001 [dns] - 10https://gerrit.wikimedia.org/r/174078 [09:38:28] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] eqiad: add missing forward for graphite1001 [dns] - 10https://gerrit.wikimedia.org/r/174078 (owner: 10Filippo Giunchedi) [09:51:46] RECOVERY - Host graphite1001 is UP: PING OK - Packet loss = 0%, RTA = 2.97 ms [10:44:22] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Apart from a series of inconsistencies in what you did, which would yield incorrect results, all you need here is to create a base configu" [puppet] - 10https://gerrit.wikimedia.org/r/173904 (owner: 10Andrew Bogott) [10:45:17] (03CR) 10Giuseppe Lavagetto: "Look the next PS, I tried to amend your patch." [puppet] - 10https://gerrit.wikimedia.org/r/173904 (owner: 10Andrew Bogott) [10:45:55] (03PS3) 10Giuseppe Lavagetto: Move the openstack_version setting hiera. [puppet] - 10https://gerrit.wikimedia.org/r/173904 (owner: 10Andrew Bogott) [10:51:46] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "This is not a correct use of hiera, I'll show you how to do it." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [11:11:36] (03PS19) 10Giuseppe Lavagetto: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [11:25:54] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [11:26:36] (03PS2) 10Giuseppe Lavagetto: monitoring: add config class [puppet] - 10https://gerrit.wikimedia.org/r/173826 [11:36:09] (03PS3) 10Giuseppe Lavagetto: monitoring: add config class [puppet] - 10https://gerrit.wikimedia.org/r/173826 [11:42:16] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:42:36] PROBLEM - check if salt-minion is running on vanadium is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:43:37] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. 
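The stored-resource cleanup _joe_ asked jgage for above comes down to one command on the puppetmaster (palladium/strontium); the FQDN below is a placeholder:

    # Clears the node's cached facts, exported/stored resources and signed cert.
    sudo puppet node clean cp3001.esams.wikimedia.org        # placeholder FQDN
    # Older in-house wrapper mentioned above; same effect for stored configs:
    sudo puppetstoredconfigclean.rb cp3001.esams.wikimedia.org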
[11:43:51] RECOVERY - check if salt-minion is running on vanadium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [11:44:02] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:52:15] (03PS4) 10Giuseppe Lavagetto: monitoring: add config class [puppet] - 10https://gerrit.wikimedia.org/r/173826 [11:55:31] (03PS1) 10Gage: cp3001 & cp3001: move to private IP [dns] - 10https://gerrit.wikimedia.org/r/174090 [11:55:45] (03PS1) 10Gage: install server: cp3001 & cp3002 fqdn update [puppet] - 10https://gerrit.wikimedia.org/r/174091 [11:57:21] (03PS2) 10Gage: cp3001 & cp3002: move to private IP [dns] - 10https://gerrit.wikimedia.org/r/174090 [12:13:19] (03CR) 10Mark Bergsma: [C: 04-1] cp3001 & cp3002: move to private IP (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/174090 (owner: 10Gage) [12:16:12] (03PS3) 10Gage: cp3001 & cp3002: move to private IP [dns] - 10https://gerrit.wikimedia.org/r/174090 [12:18:16] (03CR) 10Mark Bergsma: [C: 031] cp3001 & cp3002: move to private IP [dns] - 10https://gerrit.wikimedia.org/r/174090 (owner: 10Gage) [12:21:36] (03CR) 10Gage: [C: 032] cp3001 & cp3002: move to private IP [dns] - 10https://gerrit.wikimedia.org/r/174090 (owner: 10Gage) [12:29:27] (03CR) 10Gage: [C: 032] install server: cp3001 & cp3002 fqdn update [puppet] - 10https://gerrit.wikimedia.org/r/174091 (owner: 10Gage) [13:17:01] (03CR) 10Gage: [C: 032] logstash: Update regex in 'exception' grokker [puppet] - 10https://gerrit.wikimedia.org/r/173658 (owner: 10Krinkle) [13:26:43] PROBLEM - Host graphite1001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [13:27:33] RECOVERY - Host graphite1001 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms [13:32:56] Hm.. memcached-serious is rather high on logstash [13:32:58] all from mw1189 [13:33:01] locked until retry [13:33:10] Memcached error for key "frwiktionary:revisiontext:textid:18243639" on server "127.0.0.1:11212": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY [13:34:38] It started 01:43:00, and then 100s of times a second. [13:48:48] (03PS1) 10Alexandros Kosiaris: Migrate to ganglia_new::web::view [puppet] - 10https://gerrit.wikimedia.org/r/174114 [13:58:15] (03CR) 10Aklapper: [C: 031] phab don't try to preview icon/x-icon [puppet] - 10https://gerrit.wikimedia.org/r/173875 (owner: 10Rush) [14:16:00] (03PS1) 10Filippo Giunchedi: add graphite role to graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/174121 [14:16:50] (03PS2) 10Filippo Giunchedi: add graphite role to graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/174121 [14:16:59] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] add graphite role to graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/174121 (owner: 10Filippo Giunchedi) [14:26:34] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:26:34] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:28:34] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [14:28:34] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. 
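One quick way to confirm the memcached-serious flood above really is confined to mw1189 is to count errors per app server in the aggregated log. A rough sketch, assuming the MediaWiki debug logs land on the central log host (fluorine at the time) under /a/mw-log/ and that this channel is written to memcached-serious.log; both paths are assumptions:

    # wfDebugLog lines look like "<date> <time> <host> <wiki>: <message>",
    # so field 3 is the reporting app server.
    awk '{print $3}' /a/mw-log/memcached-serious.log | sort | uniq -c | sort -rn | head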
[14:33:14] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:36:26] that's me, looking [14:40:47] (03CR) 10Mark Bergsma: [C: 032] "Nobody is going to review 6000 lines of code at this point, so I'm just going to merge it. Please submit patchsets with (much) smaller dif" [software] - 10https://gerrit.wikimedia.org/r/141473 (owner: 10ArielGlenn) [14:43:53] PROBLEM - mysqld processes on db1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [14:44:10] gerrit is down [14:44:14] !log Gerrit web interface dead with: Cannot open ReviewDb {{bug|73555}} [14:44:20] might be because db1020 died [14:44:21] Logged the message, Master [14:44:23] PROBLEM - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [14:44:30] great timing [14:45:22] Gerrit uses m2-master.eqiad.wmnet as a db [14:46:16] apparently the same as at least eventlogging and otrs, [14:47:44] gerrit is down [14:47:45] :/ [14:47:57] looking at db1020 [14:48:04] ah good [14:48:07] <^demon|away> tango down :( [14:48:21] LOL https://gerrit.wikimedia.org/r/#/c/141473/Guice provision errors: 1) Cannot open ReviewDb at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) while locating com.google.gerrit.reviewdb.server.ReviewDb 1 error [14:48:37] Ah, it's not specific to the megapatch :) [14:48:54] filled as https://bugzilla.wikimedia.org/show_bug.cgi?id=73555 [14:48:55] RECOVERY - mysqld processes on db1020 is OK: PROCS OK: 1 process with command name mysqld [14:49:38] springle: do db errors magically wake you up ? :/ [14:50:03] s/mag/programati/ [14:50:16] <^d> Glaisher: Yay, we can all take the day off then! :) [14:50:28] :D [14:50:54] <^d> hashar: db errors don't wake me up, but e-mails saying "gerrit down" in my inbox do :p [14:51:31] oh [14:51:31] ^d: remember, one of the arguments you gave for git conversion was that one can work also offline ;) [14:51:54] PROBLEM - mysqld processes on db1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [14:55:07] !log fail over m2 to m2-slave (db1046); investigating db1020 [14:55:10] Logged the message, Master [14:55:17] stuff should come back [14:55:51] <^d> gerrit's back, thx [14:56:01] you are our hero sean! [14:56:23] so pings do wake him up, useful info [14:56:41] my phone woke me up [14:56:47] nagios app [14:57:13] guess who never sleeps then [14:57:22] and it is back in less than 15 minutes \O/O [14:57:44] \o/ [14:58:13] !log starting upgrade to Trusty of analytics1015 [14:58:17] Logged the message, Master [15:01:05] (03PS1) 10Filippo Giunchedi: gdash: install rubygems only (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: install rubygems only PROBLEM - CI: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.puppetagent.failed_events.value (33.33%) [15:03:23] PROBLEM - DPKG on analytics1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:04:03] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:04:06] (03CR) 10Ottomata: "I don't understand the use of hieradata/labs/restabase.yaml." 
[puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [15:05:28] (03CR) 10Hashar: [C: 031] Add support for woff2 files [puppet] - 10https://gerrit.wikimedia.org/r/173763 (owner: 10KartikMistry) [15:13:13] PROBLEM - puppet last run on analytics1015 is CRITICAL: Connection refused by host [15:14:44] PROBLEM - Hadoop DataNode on analytics1015 is CRITICAL: Connection refused by host [15:15:05] PROBLEM - RAID on analytics1015 is CRITICAL: Connection refused by host [15:15:08] PROBLEM - Disk space on analytics1015 is CRITICAL: Connection refused by host [15:15:13] PROBLEM - check if salt-minion is running on analytics1015 is CRITICAL: Connection refused by host [15:15:24] PROBLEM - Hadoop NodeManager on analytics1015 is CRITICAL: Connection refused by host [15:15:34] PROBLEM - check configured eth on analytics1015 is CRITICAL: Connection refused by host [15:15:34] PROBLEM - check if dhclient is running on analytics1015 is CRITICAL: Connection refused by host [15:18:35] RECOVERY - check configured eth on analytics1015 is OK: NRPE: Unable to read output [15:18:43] RECOVERY - check if dhclient is running on analytics1015 is OK: PROCS OK: 0 processes with command name dhclient [15:18:53] RECOVERY - Hadoop DataNode on analytics1015 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [15:19:13] RECOVERY - RAID on analytics1015 is OK: OK: no disks configured for RAID [15:19:14] RECOVERY - Disk space on analytics1015 is OK: DISK OK [15:19:14] RECOVERY - check if salt-minion is running on analytics1015 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:19:20] (03CR) 10QChris: [C: 04-1] "We'd add more aggregations to aggregator." [puppet] - 10https://gerrit.wikimedia.org/r/172201 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [15:19:33] RECOVERY - Hadoop NodeManager on analytics1015 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [15:27:46] RECOVERY - CI: Puppet failure events on labmon1001 is OK: OK: All targets OK [15:30:18] (03PS1) 10Ottomata: Add libboost-regex-dev, libboost-system-dev and libyaml-cpp0.3 to stat1002 and 1003 for Ironholds [puppet] - 10https://gerrit.wikimedia.org/r/174131 [15:31:05] (03CR) 10jenkins-bot: [V: 04-1] Add libboost-regex-dev, libboost-system-dev and libyaml-cpp0.3 to stat1002 and 1003 for Ironholds [puppet] - 10https://gerrit.wikimedia.org/r/174131 (owner: 10Ottomata) [15:32:01] !log Deleting job https://integration.wikimedia.org/ci/job/mediawiki-vendor-integration/ replaced by mediawiki-phpunit. 
Clearing out workspaces {{bug|73515}} [15:32:06] Logged the message, Master [15:32:08] (03PS2) 10Ottomata: Add libboost-regex-dev, libboost-system-dev and libyaml-cpp0.3 to stat1002 and 1003 for Ironholds [puppet] - 10https://gerrit.wikimedia.org/r/174131 [15:34:03] RECOVERY - DPKG on analytics1015 is OK: All packages OK [15:35:44] PROBLEM - puppet last run on analytics1015 is CRITICAL: Connection refused by host [15:37:54] PROBLEM - Host analytics1015 is DOWN: PING CRITICAL - Packet loss = 100% [15:39:02] just ran across a most curious cronjob [15:39:05] set to run on the 27th minute [15:39:07] no comment [15:39:14] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 1.84 ms [15:43:24] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 hour ago with 0 failures [15:43:39] !log starting trusty upgrade of analytics1016 [15:43:44] Logged the message, Master [15:49:44] PROBLEM - DPKG on analytics1016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:50:17] * anomie will SWAT today [15:50:32] James_F|Away, gi11es: Ping for SWAT in 10 minutes [15:50:47] anomie: pong [15:53:15] (03PS1) 10Yuvipanda: puppetmaster: Make time to keep old reports for configurable [puppet] - 10https://gerrit.wikimedia.org/r/174132 (https://bugzilla.wikimedia.org/73472) [15:53:17] _joe_: ^ [15:54:19] Hi operations! [15:54:24] * AndyRussG waves [15:55:13] Just thought I'd drop by to check that everything seems OK following our no-op deploy of new CentralNotice code to some sites yesterday... [15:55:35] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "While good in general, please see my comment." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174132 (https://bugzilla.wikimedia.org/73472) (owner: 10Yuvipanda) [15:55:58] (03CR) 10Hashar: [C: 04-1] "puppet.master.reports = logstash, would stop writing the yaml reports on disk which might be a problem. See inline diff for details :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: 10BryanDavis) [15:56:27] ^ YuviPanda, Reedy, _joe_, ^d :) [15:56:50] what's the prod puppetmaster again? [15:57:00] palladium [15:57:02] <_joe_> YuviPanda: palladium/strontium [15:57:20] strontium is such a cool name [15:57:49] main possible issue would be cache problems on RL modules or some unexpected failure of JS on some clients [15:57:55] anomie: Reporting for SWAT. [15:58:23] PROBLEM - puppet last run on analytics1016 is CRITICAL: Connection refused by host [15:58:30] <_joe_> YuviPanda: why do you think it's cool? [15:58:37] I don't know. it just sounds cool? [15:58:52] <_joe_> not to an italian ear for sure :P [15:58:57] haha [15:59:02] palladium sounds somewhat holy [15:59:17] strontium doesn't care, however. Sounds very irreverant. [15:59:20] _joe_: you mean to a modern-day Latin ear? [15:59:44] <_joe_> it's pronounced similar to http://en.wiktionary.org/wiki/stronzo [16:00:04] manybubbles, anomie, ^d, marktraceur, James_F: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T1600). [16:00:06] James_F: I'll do yours first. 
[16:00:11] (03PS2) 10Anomie: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174036 (https://bugzilla.wikimedia.org/73483) (owner: 10Jforrester) [16:00:20] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174036 (https://bugzilla.wikimedia.org/73483) (owner: 10Jforrester) [16:00:25] ah [16:00:44] PROBLEM - check if dhclient is running on analytics1016 is CRITICAL: Connection refused by host [16:00:45] (03Merged) 10jenkins-bot: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174036 (https://bugzilla.wikimedia.org/73483) (owner: 10Jforrester) [16:00:52] _joe_: hmm, I seem to pronounce it as strawn-tee-em [16:00:54] PROBLEM - check configured eth on analytics1016 is CRITICAL: Connection refused by host [16:01:02] _joe_: oh, wow, didn't realize what stronzo meant [16:01:03] PROBLEM - Hadoop DataNode on analytics1016 is CRITICAL: Connection refused by host [16:01:04] PROBLEM - RAID on analytics1016 is CRITICAL: Connection refused by host [16:01:04] PROBLEM - Hadoop NodeManager on analytics1016 is CRITICAL: Connection refused by host [16:01:04] PROBLEM - Disk space on analytics1016 is CRITICAL: Connection refused by host [16:01:05] !log anomie Synchronized visualeditor.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) [[gerrit:174036]] (duration: 00m 09s) [16:01:09] Logged the message, Master [16:01:14] !log anomie Synchronized visualeditor-default.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) [[gerrit:174036]] (duration: 00m 09s) [16:01:15] James_F: ^^^ ^ test please [16:01:17] Logged the message, Master [16:01:33] PROBLEM - check if salt-minion is running on analytics1016 is CRITICAL: Connection refused by host [16:01:39] gi11es: You're next [16:01:39] <_joe_> YuviPanda: "tium" is pronounced "zium" in latin [16:01:45] anomie: You need to touch a PHP file. [16:01:58] James_F: Which PHP file needs touching? [16:01:59] anomie: dblist-only syncs don't. [16:02:02] _joe_: oh, so strontium is stronzium. ok, that *is* very close [16:02:14] anomie: Any one; it's a deployment bug. [16:02:19] !log replaying some searches against cirrus to make *super* *duper* sure it won't fall over tomorrow when we enable enwiki [16:02:20] Logged the message, Master [16:02:38] manybubbles: Gosh, enwiki's tomorrow? That's awesome. [16:02:46] James_F: took us long enough:) [16:02:47] !log anomie Synchronized wmf-config/InitialiseSettings.php: Touch a random PHP file, supposedly required (duration: 00m 09s) [16:02:49] Logged the message, Master [16:02:51] manybubbles: Still. ;-) [16:02:53] James_F: ^ There you go [16:03:19] anomie: Bingo, working great. Thanks! [16:03:32] gi11es: Ok, doing your logging patch first. [16:04:33] RECOVERY - check if salt-minion is running on analytics1016 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:04:43] RECOVERY - check if dhclient is running on analytics1016 is OK: PROCS OK: 0 processes with command name dhclient [16:05:03] RECOVERY - check configured eth on analytics1016 is OK: NRPE: Unable to read output [16:05:14] RECOVERY - Hadoop DataNode on analytics1016 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [16:05:14] RECOVERY - RAID on analytics1016 is OK: OK: no disks configured for RAID [16:05:20] <^d> James_F: Be sure to be in the office. 
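The "touch a PHP file" workaround above, spelled out as a rough sketch; run from the staging tree on the deployment host, with the staging path assumed:

    cd /srv/mediawiki-staging            # assumed staging root
    sync-file visualeditor.dblist 'SWAT: Enable VisualEditor by default on cawikiquote'
    sync-file visualeditor-default.dblist 'SWAT: Enable VisualEditor by default on cawikiquote'
    # dblist-only syncs don't take effect on their own (the "deployment bug"
    # mentioned above), so touch and sync any PHP config file as well:
    touch wmf-config/InitialiseSettings.php
    sync-file wmf-config/InitialiseSettings.php 'Touch a PHP file so the dblist change is picked up'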
I'm bringing cake :) [16:05:23] RECOVERY - Hadoop NodeManager on analytics1016 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:05:23] RECOVERY - Disk space on analytics1016 is OK: DISK OK [16:05:28] ^d: Woo. Will do. :-) [16:05:41] <_joe_> ^d: on my birthday, and I won't have any :/ [16:06:05] <^d> _joe_: Nobody's stopping you from getting cake :) [16:06:20] !log replaying 20,000 searches at approximately the same speed that they were issued caused only marginal bounce in load (cluster load average was 13% and two machines went about 20%). We're ready from a performance standpoint. yay [16:06:22] <_joe_> not _your_one though [16:06:23] Logged the message, Master [16:06:33] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [16:06:46] <_joe_> free food at the office is the only thing I really miss from an office [16:07:14] <_joe_> here I have my SO guarding the fridge [16:07:31] * YuviPanda remembers eating all the M&Ms and coke in the office [16:07:32] anomie: I don't have a way to test that one. it's harmless, though [16:07:39] HFCS Coke mmmm [16:07:51] YuviPanda: Tsk. :-) [16:07:55] is someone having a birthday today? [16:08:09] <_joe_> Nikerabbit: no, tomorrow [16:08:18] it's going to be tomorrow pretty soon here [16:08:35] _joe_: dang, so close [16:08:51] <_joe_> Nikerabbit: happy birthday :) [16:08:58] Nikerabbit: is it your birthday? [16:09:59] YuviPanda: well practically speaking I don't own it but yes [16:10:08] buon antecompleanno, _joe_ [16:10:19] Nikerabbit: happy birthday :) [16:10:32] <_joe_> oh I didn't really want to start all this [16:10:37] :D [16:10:56] :p [16:11:03] <_joe_> I was just envious of SF people that will get cake [16:12:01] SF people keep a piece for you until january _joe_ [16:12:03] it will be yummie [16:12:11] <_joe_> ahah [16:12:37] http://en.wikipedia.org/wiki/Cake_(band) > http://en.wikipedia.org/wiki/Cake > http://en.wikipedia.org/wiki/Pie [16:13:11] <_joe_> bblack: "the cake" were a pretty decent band in fact. But digressing again. [16:14:15] Sigh. Jenkins, you're being especially slow today. [16:15:06] anomie: he's just showing his age [16:15:35] anomie: Zuul monitor shows a Wikibase task that's been running for 17 minutes. [16:15:54] anomie: Just finished; next up is, umm, another one. Yay. [16:16:15] WTF? It ran all the tests, and now it's going to run them again? [16:16:33] RECOVERY - DPKG on analytics1016 is OK: All packages OK [16:17:53] anomie: A different commit, I think. [16:18:19] !log rubidium+eeden gdnsd upgraded to 2.1.0 (baham was already there) [16:18:23] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 1 failures [16:18:23] Logged the message, Master [16:19:16] James_F: The Zuul page was showing two wikibase changes, then the one I'm waiting for, then another. Finally the first WB change finished, and it decided to start the tests over for all the rest of the changes in line. [16:19:44] anomie: Isn't that because of the external test system that Wikibase uses? It can't chain them, unlikely everything else. [16:20:12] PROBLEM - puppet last run on analytics1016 is CRITICAL: Connection refused by host [16:22:17] Bah! It had run all the tests for my change, and the WB change finished, and it decided to start over again! [16:22:22] PROBLEM - Host analytics1016 is DOWN: PING CRITICAL - Packet loss = 100% [16:22:48] <_joe_> ottomata: is that ^^ you? 
[16:23:12] RECOVERY - Host analytics1016 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [16:23:48] yup [16:23:57] _joe_, i'd schedule downtime in icinga [16:24:01] but it isn't working this week, dunno [16:24:04] i am not authorized to do so :/ [16:24:14] <_joe_> ottomata: mmmh I think coren had the same issue [16:25:45] _joe_: Is something interesting happening here or did you just skip this file? https://gerrit.wikimedia.org/r/#/c/173904/3/modules/openstack/manifests/common.pp [16:26:38] <_joe_> andrewbogott: uhm? [16:26:48] <_joe_> andrewbogott: typo! [16:27:00] <_joe_> $::openstack::version of course [16:27:27] Shouldn't we just remove $openstack_version=$::openstack_version in the params and then require openstack? [16:27:30] oh mana [16:27:33] grrrit-wm is sick [16:27:38] <^d> obvs. [16:27:55] !log anomie Synchronized php-1.25wmf8/includes/filebackend: SWAT: Log more details about backend-fail-internal errors [[gerrit:174128]] (duration: 00m 09s) [16:27:56] gi11es: ^ Well, you said you can't test that. [16:28:00] Logged the message, Master [16:28:07] !log starting upgrade to trusty of analytics1017 [16:28:09] Logged the message, Master [16:29:16] _joe_: ^ ? [16:29:20] !log anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes [[gerrit:174116]] (duration: 00m 09s) [16:29:22] gi11es: ^ Test please [16:29:23] Logged the message, Master [16:29:54] anomie: testing... [16:30:27] <_joe_> andrewbogott: it depends on the single case, and if the templates of the class use @openstack_version [16:30:33] !log restarting txstatsd on tungsten to drop old metrics [16:30:35] Logged the message, Master [16:30:50] godog: no other way to do that other than restarting, is there? [16:30:52] is painful on labs [16:30:54] Ah, ok. As it turns out, openstack_version isn't referenced in any templates. [16:31:04] YuviPanda: not afaict, it is indeed a pain [16:31:13] sigh [16:31:29] <_joe_> andrewbogott: so go on, mine wanted just to be a way to show what to do [16:31:36] yep, ok. [16:31:59] jgage: yo! [16:32:50] !log anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes [[gerrit:174116]] (for real this time) (duration: 00m 09s) [16:32:53] Logged the message, Master [16:32:57] gi11es: ^ If you weren't seeing the changes, it's because I screwed up [16:33:27] anomie: indeed I wasn't, but I was being patient and retrying :) I often see the changes with a delay [16:34:23] anomie: I see tha changes, SWAT successful, thanks :) [16:34:32] PROBLEM - DPKG on analytics1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:34:45] _joe_: one more… in this patch is there any reason why you use 'require openstack' vs. 'include openstack'? [16:35:09] !log anomie Synchronized php-1.25wmf8/extensions/SecurePoll: SWAT: Fix SecurePollContent handling [[gerrit:174125]] (duration: 00m 09s) [16:35:10] anomie: ^ Test please [16:35:11] <_joe_> require asks openstack to be included before the current class [16:35:12] Logged the message, Master [16:35:53] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:36:33] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [16:36:35] anomie: Works! 
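For the downtime ottomata couldn't schedule above: when the web UI isn't cooperating, downtime can also be pushed through Icinga's external command file on the monitoring host (neon). A sketch only; the command-file path and the two-hour window are assumptions:

    # Format: SCHEDULE_HOST_DOWNTIME;<host>;<start>;<end>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
    now=$(date +%s); end=$((now + 7200))
    printf '[%s] SCHEDULE_HOST_DOWNTIME;analytics1017;%s;%s;1;0;7200;ottomata;trusty reimage\n' \
        "$now" "$now" "$end" | sudo tee -a /var/lib/icinga/rw/icinga.cmd   # path is an assumption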
[16:36:47] * anomie is done with SWAT [16:43:40] PROBLEM - puppet last run on analytics1017 is CRITICAL: Connection refused by host [16:45:50] PROBLEM - Hadoop DataNode on analytics1017 is CRITICAL: Connection refused by host [16:45:51] PROBLEM - Disk space on analytics1017 is CRITICAL: Connection refused by host [16:45:54] (03PS11) 10BryanDavis: Allow puppetmaster to send reports to logstash [puppet] - 10https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) [16:45:56] (03PS12) 10BryanDavis: Allow puppetmaster to send reports to logstash [puppet] - 10https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) [16:46:00] PROBLEM - check configured eth on analytics1017 is CRITICAL: Connection refused by host [16:46:01] PROBLEM - Hadoop NodeManager on analytics1017 is CRITICAL: Connection refused by host [16:46:10] PROBLEM - check if dhclient is running on analytics1017 is CRITICAL: Connection refused by host [16:46:31] PROBLEM - check if salt-minion is running on analytics1017 is CRITICAL: Connection refused by host [16:46:31] PROBLEM - RAID on analytics1017 is CRITICAL: Connection refused by host [16:49:20] RECOVERY - check if dhclient is running on analytics1017 is OK: PROCS OK: 0 processes with command name dhclient [16:49:40] RECOVERY - check if salt-minion is running on analytics1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:49:41] RECOVERY - RAID on analytics1017 is OK: OK: no disks configured for RAID [16:50:00] RECOVERY - Hadoop DataNode on analytics1017 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [16:50:01] RECOVERY - Disk space on analytics1017 is OK: DISK OK [16:50:10] RECOVERY - check configured eth on analytics1017 is OK: NRPE: Unable to read output [16:50:20] RECOVERY - Hadoop NodeManager on analytics1017 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:50:51] <^d> YuviPanda: You working on gerrit bot? I'm getting lots of e-mails from bigbrother :p [16:51:10] ^d: no, I tried to stop it and then got super annoyed at bigbrother for not letting me [16:51:16] ^d: it started thrashng wildly [16:51:35] AzaToth is thankfully taking a look [16:51:52] <^d> gotcha [16:56:02] hashar: requested by you in Oct 2013 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=terbium&service=Mediawiki+Apple+Dictionary+Bridge [16:56:43] just added and resolving RT [16:56:58] mutante: great :-] [16:57:03] heading to a conf call [17:00:04] manybubbles, ^d, Nikerabbit: Respected human, time to deploy Translate to Elasticsearch (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T1700). Please do the needful. [17:00:11] <^d> yo yo jouncebot [17:00:28] (03PS2) 10Giuseppe Lavagetto: icinga: add monitor_group definition [puppet] - 10https://gerrit.wikimedia.org/r/174142 [17:00:39] shouldn't that be "humans"? 
[17:00:49] (03CR) 10Dzahn: "works now - in Icinga https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=terbium&service=Mediawiki+Apple+Dictionary+Bridg" [puppet] - 10https://gerrit.wikimedia.org/r/171193 (owner: 10Dzahn) [17:01:31] * bd808 resists urge to file a bug for plural support in jouncebot messages [17:01:32] (03CR) 10Giuseppe Lavagetto: [C: 032] icinga: add monitor_group definition [puppet] - 10https://gerrit.wikimedia.org/r/174142 (owner: 10Giuseppe Lavagetto) [17:02:58] <^d> bd808: We could just hardcode {{PLURAL:$1|human|humans}} [17:03:02] <^d> We'd all get the joke [17:03:46] <^d> Nikerabbit: Ok, let's do this :) [17:04:12] ^d: ok, on this channel or the other? [17:04:51] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [17:05:01] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail [17:05:14] PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: puppet fail [17:05:22] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: puppet fail [17:05:22] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: puppet fail [17:05:25] <^d> Nikerabbit: Let's do here in case we have to !log [17:05:31] <^d> And yeah, your patch looks good to go [17:05:31] sure [17:05:33] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [17:05:40] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: puppet fail [17:05:41] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: puppet fail [17:05:48] (03CR) 10Chad: [C: 032] Add read only configuration for ElasticSearchTTMServer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172534 (owner: 10Nikerabbit) [17:05:50] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [17:05:54] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [17:05:54] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [17:05:54] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: puppet fail [17:06:00] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: puppet fail [17:06:03] (03Merged) 10jenkins-bot: Add read only configuration for ElasticSearchTTMServer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172534 (owner: 10Nikerabbit) [17:06:11] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: puppet fail [17:06:12] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: puppet fail [17:06:19] <^d> Who broke puppet? 
[17:06:30] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail [17:06:31] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: puppet fail [17:06:40] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: puppet fail [17:06:41] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail [17:06:41] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: puppet fail [17:06:41] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: puppet fail [17:06:42] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: puppet fail [17:06:42] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: puppet fail [17:06:48] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s) [17:06:50] Logged the message, Master [17:06:50] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail [17:07:00] !log demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s) [17:07:00] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: puppet fail [17:07:02] Logged the message, Master [17:07:10] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: puppet fail [17:07:11] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: puppet fail [17:07:11] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail [17:07:11] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: puppet fail [17:07:22] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: puppet fail [17:07:23] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [17:07:23] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: puppet fail [17:07:23] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail [17:07:23] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: puppet fail [17:07:30] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: puppet fail [17:07:30] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: puppet fail [17:07:30] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: puppet fail [17:07:31] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: puppet fail [17:07:31] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: puppet fail [17:07:40] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [17:07:40] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: puppet fail [17:07:42] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: puppet fail [17:07:42] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [17:07:50] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: puppet fail [17:07:50] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail [17:07:50] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: puppet fail [17:07:50] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: puppet fail [17:07:51] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: puppet fail [17:08:00] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: puppet fail [17:08:01] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [17:08:01] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: puppet fail [17:08:01] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: puppet fail [17:08:11] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: puppet fail [17:08:21] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: puppet fail [17:08:21] PROBLEM - puppet 
last run on db1046 is CRITICAL: CRITICAL: puppet fail [17:08:21] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [17:08:21] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: puppet fail [17:08:22] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: puppet fail [17:08:31] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [17:08:40] ^d: we can try running the bootstrap on testwiki now? [17:08:40] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [17:08:43] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: puppet fail [17:08:44] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: puppet fail [17:08:44] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: puppet fail [17:08:44] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [17:08:51] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: puppet fail [17:08:52] <^d> Nikerabbit: Yeah, go ahead and kick that off from terbium [17:08:52] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: puppet fail [17:08:52] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: puppet fail [17:08:53] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: puppet fail [17:09:00] <_joe_> mmmh [17:09:00] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: puppet fail [17:09:00] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: puppet fail [17:09:01] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: puppet fail [17:09:02] ^d: ok [17:09:05] <_joe_> something I did I'd say [17:09:10] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: puppet fail [17:09:10] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [17:09:26] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: puppet fail [17:09:26] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: puppet fail [17:09:26] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: puppet fail [17:09:26] RECOVERY - DPKG on analytics1017 is OK: All packages OK [17:09:32] <^d> _joe_: Yeah, looks like that openldap change. [17:09:33] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: puppet fail [17:09:33] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [17:09:36] ^d: it's been a while ... mwscript extensions/Translate/scripts/foo.php --wiki testwiki --more? [17:09:40] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: puppet fail [17:09:45] <^d> Nikerabbit: Yep, that's it. 
[17:09:50] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: puppet fail [17:09:50] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [17:09:51] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: puppet fail [17:09:51] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: puppet fail [17:09:52] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: puppet fail [17:09:52] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: puppet fail [17:09:55] <_joe_> ^d: fixing it [17:10:02] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: puppet fail [17:10:02] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [17:10:10] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: puppet fail [17:10:13] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: puppet fail [17:10:13] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: puppet fail [17:10:20] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: puppet fail [17:10:24] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: puppet fail [17:10:24] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: puppet fail [17:10:24] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: puppet fail [17:10:25] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: puppet fail [17:10:25] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: puppet fail [17:10:25] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: puppet fail [17:10:25] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: puppet fail [17:10:30] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: puppet fail [17:10:31] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail [17:10:31] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: puppet fail [17:10:40] PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: puppet fail [17:10:46] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: puppet fail [17:10:46] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail [17:10:46] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: puppet fail [17:10:46] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: puppet fail [17:11:01] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: puppet fail [17:11:06] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: puppet fail [17:11:10] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: puppet fail [17:11:10] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: puppet fail [17:11:11] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: puppet fail [17:11:18] ^d: um what's this [17:11:19] Fatal error: Call to undefined method Profiler::setInstance() in /srv/mediawiki/php-1.25wmf8/extensions/Translate/scripts/ttmserver-export.php on line 55 [17:11:21] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: puppet fail [17:11:21] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: puppet fail [17:11:21] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: puppet fail [17:11:21] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: puppet fail [17:11:21] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: puppet fail [17:11:21] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: puppet fail [17:11:22] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail [17:11:31] PROBLEM - puppet last run on db1071 is CRITICAL: 
CRITICAL: puppet fail [17:11:32] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail [17:11:32] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: puppet fail [17:11:32] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: puppet fail [17:11:33] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: puppet fail [17:11:40] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: puppet fail [17:11:42] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: puppet fail [17:11:43] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: puppet fail [17:11:43] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [17:11:43] PROBLEM - puppet last run on elastic1015 is CRITICAL: CRITICAL: puppet fail [17:11:43] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: puppet fail [17:11:50] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [17:11:52] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: puppet fail [17:11:52] PROBLEM - puppet last run on berkelium is CRITICAL: CRITICAL: puppet fail [17:11:54] ^d: I guess that's the recent profiling changes... perhaps I could just comment that line out (it's there to reduce memory usage)? [17:12:00] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: puppet fail [17:12:01] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: puppet fail [17:12:09] (03PS1) 10Giuseppe Lavagetto: monitoring: fix duplicate hostgroup def [puppet] - 10https://gerrit.wikimedia.org/r/174144 [17:12:11] PROBLEM - puppet last run on ssl1009 is CRITICAL: CRITICAL: puppet fail [17:12:13] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: puppet fail [17:12:21] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: puppet fail [17:12:22] <^d> Nikerabbit: Yeah just comment out, I missed Translate when I made profiling changes. [17:12:30] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: puppet fail [17:12:30] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: puppet fail [17:12:40] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: puppet fail [17:12:40] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: puppet fail [17:12:40] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: puppet fail [17:12:41] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [17:12:41] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: puppet fail [17:12:41] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: puppet fail [17:12:51] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
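On the Profiler::setInstance() fatal above: the maintenance script calls a profiler API that the recent core profiling changes removed, and the fix synced to the cluster was simply to comment that call out. A hypothetical, slightly more defensive variant of the same hack is sketched here; the ProfilerStub argument is an assumption, not the original line from ttmserver-export.php.

```php
// Sketch only: guard the old memory-saving call so the script also runs on
// cores where Profiler::setInstance() no longer exists. The deployed hack
// simply commented the call out in ttmserver-export.php.
if ( method_exists( 'Profiler', 'setInstance' ) ) {
	Profiler::setInstance( new ProfilerStub( array() ) ); // reduce memory usage
}
```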
[17:13:10] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: puppet fail [17:13:11] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: puppet fail [17:13:11] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: puppet fail [17:13:20] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: puppet fail [17:13:20] PROBLEM - puppet last run on analytics1017 is CRITICAL: Timeout while attempting connection [17:13:20] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: puppet fail [17:13:21] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: puppet fail [17:13:21] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: puppet fail [17:13:21] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [17:13:27] <_joe_> ok the problem wasn't my patch [17:13:31] PROBLEM - puppet last run on search1015 is CRITICAL: CRITICAL: puppet fail [17:13:39] <_joe_> but the failure to merge the change on strontium [17:13:41] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: puppet fail [17:13:46] ^d: hmm I don't have rights to do that on terbium... can you? [17:13:48] PROBLEM - puppet last run on elastic1002 is CRITICAL: CRITICAL: puppet fail [17:13:49] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: puppet fail [17:14:00] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: puppet fail [17:14:04] PROBLEM - puppet last run on elastic1023 is CRITICAL: CRITICAL: puppet fail [17:14:04] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: puppet fail [17:14:04] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:14:04] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: puppet fail [17:14:05] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: puppet fail [17:14:21] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: puppet fail [17:14:29] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: puppet fail [17:14:29] PROBLEM - puppet last run on search1013 is CRITICAL: CRITICAL: puppet fail [17:14:29] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: puppet fail [17:14:29] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: puppet fail [17:14:40] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: puppet fail [17:14:41] PROBLEM - puppet last run on mw1033 is CRITICAL: CRITICAL: puppet fail [17:14:51] PROBLEM - puppet last run on db2030 is CRITICAL: CRITICAL: puppet fail [17:14:51] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: puppet fail [17:15:00] PROBLEM - puppet last run on db1006 is CRITICAL: CRITICAL: puppet fail [17:15:01] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: puppet fail [17:15:04] (03CR) 10Giuseppe Lavagetto: [C: 032] monitoring: fix duplicate hostgroup def [puppet] - 10https://gerrit.wikimedia.org/r/174144 (owner: 10Giuseppe Lavagetto) [17:15:14] PROBLEM - puppet last run on ssl1001 is CRITICAL: CRITICAL: puppet fail [17:15:14] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: puppet fail [17:15:14] PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: puppet fail [17:15:19] !log demon Synchronized php-1.25wmf8/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 06s) [17:15:20] <^d> Nikerabbit: ^ [17:15:20] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: puppet fail [17:15:20] PROBLEM - Host analytics1017 is DOWN: PING CRITICAL - Packet loss = 100% [17:15:20] PROBLEM - puppet last run on mw1121 
is CRITICAL: CRITICAL: puppet fail [17:15:21] Logged the message, Master [17:15:39] ^d: thanks [17:15:42] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: puppet fail [17:15:50] PROBLEM - puppet last run on pc1003 is CRITICAL: CRITICAL: puppet fail [17:15:50] PROBLEM - puppet last run on search1011 is CRITICAL: CRITICAL: puppet fail [17:15:51] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: puppet fail [17:15:51] PROBLEM - puppet last run on mercury is CRITICAL: CRITICAL: puppet fail [17:15:55] <^d> yw [17:16:00] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: puppet fail [17:16:01] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: puppet fail [17:16:01] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: puppet fail [17:16:02] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: puppet fail [17:16:02] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: puppet fail [17:16:10] RECOVERY - Host analytics1017 is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [17:16:10] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: puppet fail [17:16:10] PROBLEM - puppet last run on ms-fe2002 is CRITICAL: CRITICAL: puppet fail [17:16:10] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [17:16:10] PROBLEM - puppet last run on mw1016 is CRITICAL: CRITICAL: puppet fail [17:16:11] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: puppet fail [17:16:11] PROBLEM - puppet last run on elastic1003 is CRITICAL: CRITICAL: puppet fail [17:16:12] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: puppet fail [17:16:12] PROBLEM - puppet last run on ms-be1011 is CRITICAL: CRITICAL: puppet fail [17:16:20] PROBLEM - puppet last run on mw1086 is CRITICAL: CRITICAL: puppet fail [17:16:20] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: puppet fail [17:16:20] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: puppet fail [17:16:20] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: puppet fail [17:16:21] PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: puppet fail [17:16:21] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail [17:16:21] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: puppet fail [17:16:22] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: puppet fail [17:16:22] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: puppet fail [17:16:23] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: puppet fail [17:16:23] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: puppet fail [17:16:24] PROBLEM - puppet last run on search1004 is CRITICAL: CRITICAL: puppet fail [17:16:28] <_joe_> sigh [17:16:30] okay next mistake [17:16:30] PROBLEM - puppet last run on search1006 is CRITICAL: CRITICAL: puppet fail [17:16:31] Fatal error: Class 'ElasticsearchTTMServer' not found in /srv/mediawiki/php-1.25wmf8/extensions/Translate/ttmserver/TTMServer.php on line 29 [17:16:34] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: puppet fail [17:16:34] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: puppet fail [17:16:34] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: puppet fail [17:16:35] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: puppet fail [17:16:39] PROBLEM - puppet last run on mw1071 is CRITICAL: CRITICAL: puppet fail [17:16:42] I fscked up the case [17:16:53] PROBLEM - puppet last run on rbf1001 is CRITICAL: CRITICAL: puppet fail [17:16:53] PROBLEM - puppet last 
run on mw1193 is CRITICAL: CRITICAL: puppet fail [17:16:53] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: puppet fail [17:16:53] PROBLEM - puppet last run on mw1152 is CRITICAL: CRITICAL: puppet fail [17:16:54] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: puppet fail [17:16:54] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [17:16:54] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail [17:16:54] should be ElasticSearchTTMServer [17:16:55] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 54 minutes ago with 0 failures [17:16:55] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: puppet fail [17:16:56] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: puppet fail [17:16:56] PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: puppet fail [17:16:57] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail [17:16:57] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: puppet fail [17:17:04] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [17:17:04] PROBLEM - puppet last run on search1012 is CRITICAL: CRITICAL: puppet fail [17:17:04] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: puppet fail [17:17:05] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: puppet fail [17:17:05] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: puppet fail [17:17:06] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: puppet fail [17:17:06] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: puppet fail [17:17:07] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: puppet fail [17:17:15] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: puppet fail [17:17:16] PROBLEM - puppet last run on analytics1018 is CRITICAL: CRITICAL: puppet fail [17:17:18] PROBLEM - puppet last run on mw1090 is CRITICAL: CRITICAL: puppet fail [17:17:18] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: puppet fail [17:17:23] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: puppet fail [17:17:23] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: puppet fail [17:17:23] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail [17:17:23] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: puppet fail [17:17:23] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail [17:17:24] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: puppet fail [17:17:24] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: puppet fail [17:17:33] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM 
- puppet last run on mw1113 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: puppet fail [17:17:35] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: puppet fail [17:17:43] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: puppet fail [17:17:43] PROBLEM - puppet last run on osm-cp1001 is CRITICAL: CRITICAL: puppet fail [17:17:54] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: puppet fail [17:17:57] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail [17:17:57] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail [17:17:58] PROBLEM - puppet last run on mw1037 is CRITICAL: CRITICAL: puppet fail [17:18:03] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: puppet fail [17:18:03] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: puppet fail [17:18:03] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: puppet fail [17:18:03] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: puppet fail [17:18:15] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: puppet fail [17:18:16] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: puppet fail [17:18:16] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: puppet fail [17:18:17] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: puppet fail [17:18:17] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: puppet fail [17:18:24] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: puppet fail [17:18:24] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: puppet fail [17:18:24] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: puppet fail [17:18:26] I think puppet is broken [17:18:26] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: puppet fail [17:18:35] PROBLEM - puppet last run on ssl1003 is CRITICAL: CRITICAL: puppet fail [17:18:38] <_joe_> domas: not anymore [17:18:43] :-) [17:18:44] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [17:18:46] <_joe_> but notifications are coming in slowly [17:18:46] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail [17:18:47] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: puppet fail [17:18:53] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: puppet fail [17:18:54] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: puppet fail [17:18:54] <_joe_> domas: what makes you suspect that? 
[17:19:03] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [17:19:03] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: puppet fail [17:19:03] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: puppet fail [17:19:05] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: puppet fail [17:19:07] I saw some alerts lately [17:19:14] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: puppet fail [17:19:15] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [17:19:15] ^d: should I submit a patch to mediawiki-config? [17:19:15] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: puppet fail [17:19:34] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:19:45] <^d> Nikerabbit: Is it a patch for config or for missing class registration? [17:19:52] ^d: for config [17:19:58] <^d> go for it then [17:20:03] (03PS1) 10Nikerabbit: Fix class case for translation memory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174146 [17:20:12] ^d: https://gerrit.wikimedia.org/r/174146 ;) [17:20:34] ah grrrit-wm is up again too [17:20:38] (03CR) 10Chad: [C: 032] Fix class case for translation memory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174146 (owner: 10Nikerabbit) [17:20:47] (03Merged) 10jenkins-bot: Fix class case for translation memory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174146 (owner: 10Nikerabbit) [17:20:53] Nikerabbit: indeed, thanks to AzaToth [17:21:03] !log demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s) [17:21:06] Logged the message, Master [17:21:17] <^d> Nikerabbit: merged & sync'd. [17:21:31] ^d: and script runs [17:21:36] <^d> yay :) [17:21:49] ^d: there should be ttmserver-test somewhere in your cluster now [17:22:10] https://test.wikipedia.org/w/api.php?action=translationaids&format=jsonfm&title=Translations%3ATesting+some+translate+related+stuff%2F2%2Fru also works [17:22:21] I'm ready to bootstrap rest of the wikis [17:22:22] !log starting trusty upgrade of analytics1020 [17:22:24] Logged the message, Master [17:22:29] <^d> curl localhost:9200/_cat/indices/ttmserver*?v [17:22:29] <^d> health index pri rep docs.count docs.deleted store.size pri.store.size [17:22:29] <^d> green ttmserver-test 1 1 1040 0 516.3kb 265.5kb [17:22:31] PROBLEM - gdash.wikimedia.org on graphite1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.004 second response time [17:22:35] <^d> :) [17:23:15] ok, going ahead with the rest [17:23:42] argh [17:23:50] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:23:51] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:23:51] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:23:51] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:23:53] ^d: can you also fix the setProfiling on wmf7 branch? 
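The "Class 'ElasticsearchTTMServer' not found" fatal above is a one-character casing bug: the class string in the configuration has to match the class name the extension registers, ElasticSearchTTMServer with a capital S, since the class-map lookup is effectively case-sensitive here, as the fatal shows. The fix in Gerrit 174146 amounts to something like the sketch below; only the 'class' value is taken from the log, the array key and surrounding options are assumed.

```php
// Sketch of the class-case fix discussed above; surrounding keys are assumed.
$wgTranslateTranslationServices['TTMServer'] = array(
	// 'class' => 'ElasticsearchTTMServer', // wrong case: "Class not found" fatal
	'class' => 'ElasticSearchTTMServer',    // matches the registered class name
	'type'  => 'ttmserver',
);
```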
[17:24:01] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:24:02] <^d> Yeah one second [17:24:10] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:24:11] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:24:11] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:24:11] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:24:20] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:24:30] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:24:31] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:24:31] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:24:32] <_joe_> aww they're coming in in flock [17:24:41] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:24:41] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:24:41] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:24:41] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:24:41] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:24:41] !log demon Synchronized php-1.25wmf7/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 04s) [17:24:41] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:24:47] Logged the message, Master [17:24:51] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:25:00] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:25:00] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:25:11] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [17:25:11] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:25:11] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:25:21] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:25:21] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:25:21] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:25:21] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:25:31] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently 
enabled, last run 14 seconds ago with 0 failures [17:25:40] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:25:42] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:25:51] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:26:10] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:26:12] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [17:26:22] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:26:23] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [17:26:23] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:26:23] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:26:30] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:26:30] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:26:31] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:26:40] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:26:40] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:26:41] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:26:41] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:26:41] RECOVERY - puppet last run on elastic1030 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:26:41] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:26:52] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:26:52] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:27:10] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:27:11] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:27:11] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:27:11] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:27:20] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:27:21] we know, icinga-wm [17:27:22] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:27:22] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 2 seconds ago 
with 0 failures [17:27:30] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:27:31] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:27:31] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:27:31] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:27:31] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:27:40] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:27:40] <^d> Nikerabbit: I see the main ttmserver now [17:27:42] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:27:42] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:27:42] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:27:42] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:27:43] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:27:50] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:27:50] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:28:10] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [17:28:10] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:28:10] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:28:11] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:28:11] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:28:11] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:28:11] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:28:12] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:28:12] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:28:20] <^d> YuviPanda: Just like ~240 or so to go ;-) [17:28:21] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:28:21] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:28:40] PROBLEM - DPKG on analytics1020 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:28:40] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:28:40] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, 
last run 4 seconds ago with 0 failures [17:28:40] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:28:40] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:28:41] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:28:41] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:28:42] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:28:42] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:28:50] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:28:51] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:28:51] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:29:00] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:29:00] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:29:00] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:29:10] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:29:10] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:29:10] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:29:11] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:29:11] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:29:11] RECOVERY - puppet last run on ssl1009 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:29:11] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:29:12] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:29:17] ^d: running on mediawikiwiki, the second largest for Translate [17:29:20] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:29:38] How long does it take? 
[17:29:40] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:29:41] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:29:42] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:29:42] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:29:43] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:29:43] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:29:51] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:29:51] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:29:51] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:29:51] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:29:51] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:29:56] Nemo_bis: a bit, how many groups are there? 
[17:30:10] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:30:11] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:30:11] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:30:11] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:30:14] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:30:15] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:30:15] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:30:15] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:30:15] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:30:20] RECOVERY - puppet last run on berkelium is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:30:20] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:30:30] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:30:31] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:30:40] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:30:40] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:30:40] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:31:02] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:31:02] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:31:10] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:31:10] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:31:11] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:31:31] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:31:37] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:31:42] around 1500 [17:31:51] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:31:51] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:31:52] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:31:52] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:31:52] RECOVERY - puppet last run on mw1053 is OK: 
OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:32:01] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:32:01] RECOVERY - puppet last run on search1015 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:32:02] RECOVERY - puppet last run on elastic1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:32:10] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:32:20] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:32:21] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:32:21] RECOVERY - puppet last run on elastic1023 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:32:30] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [17:32:32] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:32:33] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:32:33] RECOVERY - puppet last run on pc1003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:32:34] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:32:41] RECOVERY - puppet last run on search1013 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:32:43] ^d: can you check docs.count for ttmserver? [17:32:50] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:32:50] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:32:51] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:32:51] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:32:58] <^d> Nikerabbit: 62932 [17:33:06] <^d> 66033 [17:33:14] <^d> 66979 [17:33:15] <^d> etc. [17:33:38] ^d: and now? [17:33:40] <^d> 24. 
[17:33:48] ugh [17:33:50] that's not good [17:33:51] <^d> 3955 [17:34:08] so it looks like the bootstrap wipes the whole index when I start new wiki [17:34:48] <^d> Oh crud :( [17:34:48] <_joe_> !log restarting icinga [17:34:50] Logged the message, Master [17:35:38] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:35:38] RECOVERY - puppet last run on db1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:35:38] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:35:38] RECOVERY - puppet last run on mw1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:35:38] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:35:39] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:35:47] elastica claims: bool=> Deletes index first if already exists (default = false). [17:35:48] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:35:48] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:35:48] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:35:48] RECOVERY - puppet last run on osm-cp1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:35:48] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:35:48] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [17:35:59] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:35:59] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:01] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:36:04] and it is set to true! 
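The culprit quoted above is Elastica's index-creation option: when the second argument to Index::create() is true, the library deletes any existing index before recreating it, which is exactly what a per-wiki bootstrap loop must not do. A minimal standalone Elastica sketch of the difference follows (not the actual Translate bootstrap code; the index name is assumed for illustration).

```php
// Minimal Elastica sketch of the behaviour discussed above.
$client = new \Elastica\Client();           // default localhost:9200 connection
$index  = $client->getIndex( 'ttmserver' ); // index name assumed for illustration

$index->create( array(), true );            // true: delete and recreate, wiping all documents

// What a per-wiki bootstrap wants instead: create the index only if it is missing.
if ( !$index->exists() ) {
	$index->create( array(), false );
}
```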
[17:36:04] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:05] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:05] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:36:05] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:05] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:05] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:36:11] RECOVERY - puppet last run on mc1015 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:36:18] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:36:19] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:19] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:19] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:20] RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:20] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [17:36:20] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:20] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:21] RECOVERY - puppet last run on analytics1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:24] stupid me [17:36:28] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:28] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:36:28] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:28] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:36:28] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:36:29] RECOVERY - puppet last run on mw1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:29] RECOVERY - puppet last run on mw1037 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:30] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:30] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:31] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:31] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:36:38] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures [17:36:42] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:36:42] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:36:42] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:36:42] RECOVERY - puppet last run on db2030 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:42] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:42] RECOVERY - puppet last run on ms-fe1002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on ms-fe2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:36:48] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:36:48] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:48] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:48] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:48] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:36:49] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:36:49] RECOVERY - puppet last run on ms-be1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:36:50] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:55] <^d> Nikerabbit: This is why boolean parameters are evil :( :) [17:36:59] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:36:59] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:09] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:37:09] RECOVERY - puppet last run on search1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:37:09] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:37:09] RECOVERY - puppet last run on search1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:09] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:09] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:37:10] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:37:10] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:37:11] PROBLEM - puppet last run on analytics1020 is CRITICAL: Connection refused by host [17:37:18] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures 
[17:37:18] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:18] RECOVERY - puppet last run on mw1071 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:18] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:18] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:37:19] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:19] RECOVERY - puppet last run on mw1086 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:37:20] brb, relocating [17:37:54] <^d> I was wondering when it finally would be a flood. [17:38:39] ^d: can we try something like https://gerrit.wikimedia.org/r/174151 -- my only fear is that then it will throw an exception if index exists [17:38:48] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:38:58] <^d> Nikerabbit: We can try and find out :) [17:39:18] PROBLEM - Hadoop NodeManager on analytics1020 is CRITICAL: Connection refused by host [17:39:29] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [17:39:30] ^d: yes that's a fast way to find out [17:39:41] PROBLEM - RAID on analytics1020 is CRITICAL: Connection refused by host [17:39:52] PROBLEM - check configured eth on analytics1020 is CRITICAL: Connection refused by host [17:40:07] !log demon Synchronized php-1.25wmf7/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s) [17:40:08] PROBLEM - Disk space on analytics1020 is CRITICAL: Connection refused by host [17:40:09] PROBLEM - check if dhclient is running on analytics1020 is CRITICAL: Connection refused by host [17:40:09] Logged the message, Master [17:40:18] PROBLEM - Hadoop DataNode on analytics1020 is CRITICAL: Connection refused by host [17:40:19] PROBLEM - check if salt-minion is running on analytics1020 is CRITICAL: Connection refused by host [17:40:19] !log demon Synchronized php-1.25wmf8/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s) [17:40:21] Logged the message, Master [17:40:24] Nemo_bis: fyi bootstrap with two threads for mediawikiwiki took 11 minutes on semi-warm cache [17:40:26] <^d> Nikerabbit: hacked on both branches [17:40:32] ^d: okay [17:40:52] Nikerabbit: whee that's very fast :) [17:41:17] ^d: yes it throws an exception but continues [17:41:34] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:41:36] <^d> ok good, we can tidy up the handling there then [17:41:47] ^d: good thing that I put bootstrapping in a separate thread [17:42:46] ^d: can you paste the number of docs for a moment again to see that delete query is correct?
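On "it throws an exception but continues" and tidying up the handling: one plausible shape for that cleanup is sketched below, under the assumption that Elastica surfaces the "index already exists" error as a ResponseException; this is an assumption about the library behaviour, not the actual follow-up patch.

```php
use Elastica\Exception\ResponseException;

// Hypothetical tidy-up of the hack discussed above: tolerate an existing index
// instead of letting the "already exists" error escape to the caller.
try {
	$index->create( array(), false );   // never recreate an existing index
} catch ( ResponseException $e ) {
	// The index was already created by an earlier wiki's bootstrap run;
	// it is safe to continue adding documents to it.
}
```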
[17:42:56] RECOVERY - RAID on analytics1020 is OK: OK: no disks configured for RAID [17:42:58] RECOVERY - check configured eth on analytics1020 is OK: NRPE: Unable to read output [17:43:14] RECOVERY - Disk space on analytics1020 is OK: DISK OK [17:43:16] RECOVERY - check if dhclient is running on analytics1020 is OK: PROCS OK: 0 processes with command name dhclient [17:43:16] RECOVERY - Hadoop DataNode on analytics1020 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [17:43:23] RECOVERY - check if salt-minion is running on analytics1020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [17:43:33] RECOVERY - Hadoop NodeManager on analytics1020 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:44:31] <^d> Nikerabbit: 100367 [17:44:39] <^d> 103544 [17:44:49] ^d: that's good [17:46:30] (03PS1) 10Giuseppe Lavagetto: puppet: various lookup fixes in site.pp/hiera [puppet] - 10https://gerrit.wikimedia.org/r/174153 [17:46:59] (03PS3) 10Ottomata: Add libboost-regex-dev, libboost-system-dev and libyaml-cpp0.3 to stat1002 and 1003 for Ironholds [puppet] - 10https://gerrit.wikimedia.org/r/174131 [17:48:18] (03CR) 10Ottomata: [C: 032] Add libboost-regex-dev, libboost-system-dev and libyaml-cpp0.3 to stat1002 and 1003 for Ironholds [puppet] - 10https://gerrit.wikimedia.org/r/174131 (owner: 10Ottomata) [17:49:04] RECOVERY - DPKG on analytics1020 is OK: All packages OK [17:49:05] <^d> Nikerabbit: 180484, still good :) [17:50:15] (03PS2) 10Giuseppe Lavagetto: puppet: various lookup fixes in site.pp/hiera [puppet] - 10https://gerrit.wikimedia.org/r/174153 [17:50:26] ^d: ~4 wikis left including meta (the largest) [17:51:03] ^d: while waiting I prepare the patch to swap ES to be the primary backend [17:51:20] Nikerabbit: for SearchTranslations too? [17:51:23] (03PS3) 10Giuseppe Lavagetto: puppet: various lookup fixes in site.pp/hiera [puppet] - 10https://gerrit.wikimedia.org/r/174153 [17:51:31] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] puppet: various lookup fixes in site.pp/hiera [puppet] - 10https://gerrit.wikimedia.org/r/174153 (owner: 10Giuseppe Lavagetto) [17:51:35] Nemo_bis: it uses the same thing [17:53:02] mutante: What's the path to get https://gerrit.wikimedia.org/r/#/c/108484/ tested and deployed? [17:53:47] (03PS1) 10Nikerabbit: Make ES the primary TTMServer backend and drop Solr [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174157 [17:54:35] (03CR) 10Chad: [C: 031] "Will merge when indexes are ready." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174157 (owner: 10Nikerabbit) [17:55:39] ^d: meta and wikidata currently running, will likely go a bit overtime from our 1h window [17:55:44] PROBLEM - puppet last run on analytics1020 is CRITICAL: Connection refused by host [17:56:36] wikidata ready, only meta running now [17:56:44] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: Puppet has 1 failures [17:57:29] <^d> Nikerabbit: I think we'll be fine :) [17:57:40] yeah [17:57:44] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [17:58:22] meta has ~3000 message groups [18:00:05] maxsem, kaldari: Respected human, time to deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T1800). Please do the needful. 
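The doc counts ^d pastes back above (100367, 103544, later 180484) are the kind of numbers the Elasticsearch cat API hands out directly; a rough sketch, with the ttmserver index name and the localhost:9200 endpoint again being assumptions:

    # per-index document count and on-disk size
    curl -s 'http://localhost:9200/_cat/indices/ttmserver*?v&h=index,docs.count,store.size'
    # or a bare count for one index
    curl -s 'http://localhost:9200/ttmserver/_count?pretty'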
[18:00:37] jgage: if you are around, halfak has some hadoop logstash qs for you [18:00:59] :) Trying to find my hadoop output in logstash (application_1415917009743_4929) [18:01:50] oh, halfak zoom out [18:01:58] i guess the default of logstash is only to show you really recent matches [18:02:15] i see hits for that around 17:30 [18:02:25] Cool. Seeing stuff now. [18:03:11] (03CR) 10Aaron Schulz: [C: 032] Added switch-logic for new Profiler config format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174060 (owner: 10Aaron Schulz) [18:03:24] (03Merged) 10jenkins-bot: Added switch-logic for new Profiler config format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174060 (owner: 10Aaron Schulz) [18:03:37] (03PS1) 10Dzahn: CI: install private ssh key for Travis integration [puppet] - 10https://gerrit.wikimedia.org/r/174161 [18:04:11] !log aaron Synchronized wmf-config/StartProfiler.php: Added switch-logic for new Profiler config format (duration: 00m 05s) [18:04:13] Logged the message, Master [18:05:49] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms [18:06:02] bd808: the config part for https://gerrit.wikimedia.org/r/#/c/173348/ is handled now [18:06:08] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Puppet has 1 failures [18:07:11] ^d: meta still running, will still take a while [18:07:20] <^d> That's cool [18:08:09] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [18:08:51] !log starting trusty upgrade of analytics1029 [18:08:54] Logged the message, Master [18:14:06] (03PS1) 10Ottomata: Add Gergo to the researchers group [puppet] - 10https://gerrit.wikimedia.org/r/174166 [18:14:34] (03CR) 10Dzahn: "this private key would be installed by puppet into the home of a special system user for Travis integration on nodes with role CI/jenkins " [puppet] - 10https://gerrit.wikimedia.org/r/174161 (owner: 10Dzahn) [18:17:18] (03CR) 10GWicke: "To me it looks like we'd actually want something closer to https://docs.puppetlabs.com/hiera/1/puppet.html#interacting-with-structured-dat" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [18:17:20] order is not consistent but it's currently indexing pages created in july 2013 [18:18:38] PROBLEM - DPKG on analytics1029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:19:01] Anything below several hours is still flashing fast from my pov [18:19:38] RECOVERY - DPKG on analytics1029 is OK: All packages OK [18:19:42] Nemo_bis: it's not CPU bound on the script [18:20:02] probably just data transfer from/to mysql and then ES [18:21:39] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Puppet has 26 failures [18:21:58] now in 2014 [18:22:24] now is future [18:25:14] (03CR) 10Dzahn: "thanks Andrew, you already did exactly what i meant to right now.
also deleting templates" [puppet] - 10https://gerrit.wikimedia.org/r/173991 (owner: 10Dzahn) [18:25:17] (03PS3) 10Rush: phab don't try to preview icon/x-icon [puppet] - 10https://gerrit.wikimedia.org/r/173875 [18:25:18] !log maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 08s) [18:25:23] Logged the message, Master [18:25:32] !log maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 04s) [18:25:33] (03CR) 10Rush: [C: 032 V: 032] phab don't try to preview icon/x-icon [puppet] - 10https://gerrit.wikimedia.org/r/173875 (owner: 10Rush) [18:25:34] Logged the message, Master [18:25:41] PROBLEM - DPKG on analytics1029 is CRITICAL: Connection refused by host [18:25:55] kaldari, ^^^ [18:27:23] MaxSem: Seems to be OK [18:27:34] (03CR) 10Ottomata: [C: 032] Add Gergo to the researchers group [puppet] - 10https://gerrit.wikimedia.org/r/174166 (owner: 10Ottomata) [18:28:21] pages done, doing cn groups [18:28:33] tgr: ^^ :) [18:29:29] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: puppet fail [18:30:23] Those are only like 1500? :P [18:30:49] RECOVERY - DPKG on analytics1029 is OK: All packages OK [18:31:28] !log created wikigrok_questions table on test, test2 and enwiki [18:31:30] Logged the message, Master [18:32:15] (03PS2) 10Aaron Schulz: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 [18:32:38] (03CR) 10Aaron Schulz: [C: 04-2] "Needs https://gerrit.wikimedia.org/r/#/c/173348/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [18:33:42] (03PS1) 10Rush: phab fix quoting on yaml values [puppet] - 10https://gerrit.wikimedia.org/r/174181 [18:33:49] (03CR) 10jenkins-bot: [V: 04-1] phab fix quoting on yaml values [puppet] - 10https://gerrit.wikimedia.org/r/174181 (owner: 10Rush) [18:33:54] (03PS2) 10Rush: phab fix quoting on yaml values [puppet] - 10https://gerrit.wikimedia.org/r/174181 [18:35:56] (03CR) 10Rush: [C: 032 V: 032] phab fix quoting on yaml values [puppet] - 10https://gerrit.wikimedia.org/r/174181 (owner: 10Rush) [18:36:34] ^d: and done! [18:36:44] \o/ [18:36:58] <^d> Nikerabbit: Yay \o/ [18:37:17] 43 minutes... need to remember that next time when doing indexes [18:37:40] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:38:29] PROBLEM - Host analytics1029 is DOWN: PING CRITICAL - Packet loss = 100% [18:38:40] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: puppet fail [18:38:50] RECOVERY - Host analytics1029 is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms [18:39:02] <^d> Nikerabbit: 43 minutes is nothing. Good to know :) [18:39:14] ^d: what's the index size? ;) [18:39:38] <^d> ~200mb for the primary. [18:39:48] <^d> I closed the tab I had open for the exact #. [18:40:03] <^d> ~500k docs [18:40:12] at twn: 825M and 2 685 414 docs [18:40:54] jgage, have some time to help me query logstash? [18:41:01] Almost directly proportional, although Meta-Wiki messages are certainly bigger on average [18:41:15] <^d> Nikerabbit: If we're only 200mb then I don't think we'll have to worry about sharding anymore for quite some time. [18:41:38] ^d: cool [18:41:40] (03PS1) 10Dereckson: Adding blogs to de. and en.planet.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/174182 [18:41:46] ^d: https://gerrit.wikimedia.org/r/#/c/174157/ can go now? 
[18:42:02] (03CR) 10Chad: [C: 032] Make ES the primary TTMServer backend and drop Solr [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174157 (owner: 10Nikerabbit) [18:42:06] great [18:42:11] (03Merged) 10jenkins-bot: Make ES the primary TTMServer backend and drop Solr [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174157 (owner: 10Nikerabbit) [18:42:32] !log demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s) [18:42:36] Logged the message, Master [18:42:37] !log starting trusty upgrade of analtyics1030 [18:42:39] Logged the message, Master [18:42:51] ^d: wow ES is fast [18:43:09] <^d> 31 servers ;-) [18:43:16] "ttmserver": 0.039 [18:43:28] while we had both solr and ES enabled that was 1.500 [18:43:36] go figure [18:43:57] ^d: thanks a lot [18:46:54] <^d> Nikerabbit: You're welcome :) [18:47:08] and the query which previously timed out takes now ~ 4 seconds [18:47:14] hapyp days [18:47:19] PROBLEM - DPKG on analytics1030 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:47:29] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 26 failures [18:48:10] <^d> Nikerabbit: That's a great improvement!! [18:48:11] <^d> :) [18:49:36] !log maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: SQL backed version (duration: 00m 05s) [18:49:39] Logged the message, Master [18:53:18] Nikerabbit: less matches for "wikimedia" now http://www.webpagetest.org/result/141118_Z2_11RV/4/screen_shot/cached/ http://www.webpagetest.org/result/141118_NP_109V/1/screen_shot/cached/ [18:54:30] !log maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: SQL backed version (duration: 00m 04s) [18:54:32] Nemo_bis: I bet it is just due to "purging" of obsolete translation units [18:54:34] Logged the message, Master [18:55:31] RECOVERY - DPKG on analytics1030 is OK: All packages OK [18:56:00] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:57:17] !log maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Revert (duration: 00m 04s) [18:57:19] Logged the message, Master [18:58:00] PROBLEM - Host analytics1030 is DOWN: PING CRITICAL - Packet loss = 100% [18:58:14] Nikerabbit: sorting looks different too [18:58:21] ES prefers weirder occurrences, perhaps [18:58:31] like '"Wikimedia"' and 'WIKIMEDIA' [18:58:59] Nemo_bis: yes there are slight differences [18:59:01] RECOVERY - Host analytics1030 is UP: PING OK - Packet loss = 0%, RTA = 2.17 ms [18:59:10] Nemo_bis: if you find something which is clearly wrong, file a bug ;) [18:59:40] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [19:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T1900). [19:00:56] eh, I thought I had 2 hours:P [19:03:31] !log maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Retry sync (duration: 00m 07s) [19:03:37] Logged the message, Master [19:03:56] WTF, mw1205 is out of sync [19:04:52] can someone take mw1205 out of rotation, it's having issues? [19:05:23] mutante, ^^ [19:05:37] !log depooled mw1205; out of sync [19:05:39] Logged the message, Master [19:05:44] thanks [19:06:17] trying to run sync-common locally [19:07:01] also, we're having some mysql problems [19:07:03] Reedy: why'd we move the deploy back today? [19:13:31] MaxSem: if you can't fix mw1205, can you file an RT? 
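On the mw1205 incident just above: one way to confirm a host has drifted from the rest of the pool is to checksum the suspect file everywhere, see which box disagrees, and then re-run sync-common there. A sketch only; the salt grain value and the file path are placeholders, not taken from the log:

    # compare one file's checksum across the app servers (grain value and path are placeholders)
    salt -G 'cluster:appserver' cmd.run 'md5sum /srv/mediawiki/php-1.25wmf8/includes/SomeFile.php'
    # on the host whose checksum differs, pull a fresh copy from the deployment server
    salt 'mw1205*' cmd.run 'sync-common'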
[19:14:28] !log starting trusty upgrade of analytics1031 [19:14:31] Logged the message, Master [19:14:32] ori, after running sync-common there the file that wasn't synced before looks right, but the question is why it happened and how to make sure that it won't repeat [19:14:34] apparently my Xorg is acting up again [19:14:55] hoo: obligatory: http://xkcd.com/963/ [19:15:24] So true :D [19:16:58] Anyway... I'm around for the deploy [19:17:18] greg-g: Because hoo and aude also weren't available [19:17:34] oh right [19:17:46] Reedy & greg-g, other than the mw1205 hiccup I'm done [19:17:48] Yeah, aude is traveling and I had to be at the university [19:17:51] So me stopping somewhere on the motorway wasn't the best idea [19:18:02] We had an Amsterdam Cabal discussion :) [19:18:40] PROBLEM - DPKG on analytics1031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:19:39] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Puppet has 20 failures [19:21:31] (03PS1) 10Reedy: Non wikipedias to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174192 [19:22:09] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174192 (owner: 10Reedy) [19:22:17] (03Merged) 10jenkins-bot: Non wikipedias to 1.25wmf8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174192 (owner: 10Reedy) [19:22:57] (03PS1) 10Rush: phab set files.image-mime-types yaml as array [puppet] - 10https://gerrit.wikimedia.org/r/174193 [19:23:06] (03CR) 10jenkins-bot: [V: 04-1] phab set files.image-mime-types yaml as array [puppet] - 10https://gerrit.wikimedia.org/r/174193 (owner: 10Rush) [19:23:07] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf8 [19:23:09] (03PS2) 10Rush: phab set files.image-mime-types yaml as array [puppet] - 10https://gerrit.wikimedia.org/r/174193 [19:23:09] Logged the message, Master [19:24:01] (03CR) 10Faidon Liambotis: [C: 032] "Either works for me. If this approach becomes too complex in the future, we can always reevaluate." 
[puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [19:24:14] (03PS2) 10Reedy: Enable global AbuseFilter on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173519 (owner: 10Hoo man) [19:24:18] (03CR) 10Reedy: [C: 032] Enable global AbuseFilter on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173519 (owner: 10Hoo man) [19:24:34] (03Merged) 10jenkins-bot: Enable global AbuseFilter on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173519 (owner: 10Hoo man) [19:24:36] (03CR) 10Rush: [C: 032 V: 032] phab set files.image-mime-types yaml as array [puppet] - 10https://gerrit.wikimedia.org/r/174193 (owner: 10Rush) [19:24:58] (03PS2) 10Reedy: Set $wgUploadNavigationUrl on it.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173480 (https://bugzilla.wikimedia.org/73439) (owner: 10Dereckson) [19:25:07] (03CR) 10Reedy: [C: 032] Set $wgUploadNavigationUrl on it.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173480 (https://bugzilla.wikimedia.org/73439) (owner: 10Dereckson) [19:25:33] (03Merged) 10jenkins-bot: Set $wgUploadNavigationUrl on it.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173480 (https://bugzilla.wikimedia.org/73439) (owner: 10Dereckson) [19:25:37] (03PS1) 10Ori.livneh: varnishkafka: update submodule; convert to multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/174194 [19:25:39] (03PS1) 10Ori.livneh: add varnish::logging::statslistener [puppet] - 10https://gerrit.wikimedia.org/r/174195 [19:25:42] ^ ottomata [19:26:00] (03PS2) 10Reedy: Removing special wgAccountThrottle for hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173273 (owner: 10Eranroz) [19:26:05] (03CR) 10Reedy: [C: 032] Removing special wgAccountThrottle for hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173273 (owner: 10Eranroz) [19:26:24] (03Merged) 10jenkins-bot: Removing special wgAccountThrottle for hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173273 (owner: 10Eranroz) [19:26:51] RECOVERY - DPKG on analytics1031 is OK: All packages OK [19:27:11] (03CR) 10Ottomata: [C: 031] varnishkafka: update submodule; convert to multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/174194 (owner: 10Ori.livneh) [19:27:36] (03PS2) 10Reedy: Group translate-proofr was removed from Translate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173766 (owner: 10Nikerabbit) [19:27:39] (03CR) 10Reedy: [C: 032] Group translate-proofr was removed from Translate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173766 (owner: 10Nikerabbit) [19:27:49] (03Merged) 10jenkins-bot: Group translate-proofr was removed from Translate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173766 (owner: 10Nikerabbit) [19:28:53] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 14s) [19:28:55] Logged the message, Master [19:29:25] (03CR) 10Ottomata: "Hm 'stats' seems very generic. I've been bitten before! 
:)" [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [19:30:10] PROBLEM - Host analytics1031 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:31:08] ottomata: https://gerrit.wikimedia.org/r/#/c/166888/ still needs a 'verified' [19:31:20] RECOVERY - Host analytics1031 is UP: PING OK - Packet loss = 0%, RTA = 1.41 ms [19:31:28] (03PS1) 10Rush: phab files.image-mime-types needs special format [puppet] - 10https://gerrit.wikimedia.org/r/174196 [19:31:35] (03CR) 10jenkins-bot: [V: 04-1] phab files.image-mime-types needs special format [puppet] - 10https://gerrit.wikimedia.org/r/174196 (owner: 10Rush) [19:31:54] (03CR) 10Ottomata: [V: 032] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [19:31:57] (03PS2) 10Rush: phab files.image-mime-types needs special format [puppet] - 10https://gerrit.wikimedia.org/r/174196 [19:32:26] (03PS3) 10Ottomata: Add cassandra role [puppet] - 10https://gerrit.wikimedia.org/r/167700 [19:32:33] (03CR) 10Ottomata: Add cassandra role [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata) [19:33:45] ottomata: is it cool if i merge the first of those changes (submodule update / convert to multi-instance)? [19:33:54] (03CR) 10Rush: [C: 032 V: 032] phab files.image-mime-types needs special format [puppet] - 10https://gerrit.wikimedia.org/r/174196 (owner: 10Rush) [19:34:21] ori, it scares me a little [19:34:28] hm [19:34:39] ottomata: i was going to disable puppet on all the bits servers and apply it on just one [19:34:41] is there a way we can apply it to one or two hosts at a time? [19:34:44] yeah [19:34:47] hm [19:34:55] cmjohnson: do you need more specifics from me re: getting a horizon host? [19:35:03] youd' have to disable puppet on all of the varnishes, right? [19:35:04] actually, i'm being an idiot [19:35:06] let's do labs first [19:35:08] ok [19:35:14] i'll do that [19:35:18] i can cherry-pick it there [19:35:21] <_joe_> I was about to say "beta" :) [19:35:24] make sure it is ok with an existing varnishkafka instance [19:35:31] that uses the class and init.d script [19:36:38] (03CR) 10GWicke: [C: 031] Add cassandra role [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata) [19:36:40] (03CR) 1020after4: "updated patch coming up ... but see inline comments. the passwords file was already included even before my changes." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [19:37:49] (03CR) 10GWicke: [C: 031] "In any case, lets get *something* merged for now so that we can test the actual puppet modules. We can always rework the way this is hooke" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [19:38:57] (03PS1) 10Hoo man: Bump Wikidata cache epoch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174199 [19:39:00] Reedy: ^ [19:39:29] ottomata: well, the change applied in labs, but we don't run varnishkafka there. 
so we know that it doesn't contain any egregious errors, but i think the next step is probably applying it on just one varnish [19:40:01] (03CR) 10Reedy: [C: 032] Bump Wikidata cache epoch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174199 (owner: 10Hoo man) [19:40:08] (03Merged) 10jenkins-bot: Bump Wikidata cache epoch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174199 (owner: 10Hoo man) [19:40:27] !log starting trusty upgrade of analytics1032 [19:40:32] Logged the message, Master [19:40:37] !log reedy Synchronized wmf-config/Wikibase.php: bump epoch (duration: 00m 13s) [19:40:39] Logged the message, Master [19:40:42] ori, you want to disable puppet on all varnishes? [19:40:45] and then try? [19:41:31] ottomata: the way i'd do this is: on palladium, salt -G cluster:cache_bits cmd.run 'puppet agent --disable "disabling to push out change Iac35f2329' ; then !log disabled puppet on varnishes to push out Iac35f2329, then enable it on, say, cp1056 and puppet agent -tv [19:41:59] then if it looks good, re-enable everywhere [19:42:04] if it looks bad, revert or fix-up [19:42:34] _joe_: does that sound sane (asking for a second opinion) [19:42:41] sounds ok to me ori, i'd add an extra step to make sure that puppet was actually disabled [19:42:45] i don't always trust salt [19:42:56] nod [19:43:01] <_joe_> just bits? [19:43:15] <_joe_> won't this matter for all caches? [19:43:29] yes, you're right [19:43:39] it's the follow-up change that is bits only [19:44:04] <_joe_> so -G 'cluster:cache_*' [19:44:08] right [19:44:08] <_joe_> and test.ping first [19:44:19] <_joe_> you may miss some servers [19:44:20] to confirm we can [19:44:26] <_joe_> yes [19:44:35] salt -G 'cluster:cache_*' cmd.run 'test -f /var/lib/puppet/state/agent_disabled.lock && echo OK' [19:44:51] <_joe_> yes [19:45:06] <_joe_> we need to make all this a salt module [19:45:11] <_joe_> I need to study salt [19:45:24] <_joe_> (disabling/enabling puppet, I mean) [19:45:27] _joe_, ottomata: OK if I go ahead, then? I'm going to do cp1056 [19:45:41] <_joe_> yes, don't count me in though [19:46:06] <_joe_> I have a movie to watch :) [19:46:15] kk [19:46:16] haha, ori, ja go ahead, i'm here. [19:46:35] (03PS4) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 [19:46:59] !log disabling Puppet on varnishes to push out Iac35f2329 [19:47:03] Logged the message, Master [19:47:04] (03PS1) 10Dzahn: ganglia: remove pmtpa varnish stanza [puppet] - 10https://gerrit.wikimedia.org/r/174205 [19:47:21] (03CR) 10jenkins-bot: [V: 04-1] set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [19:48:23] (03CR) 1020after4: [C: 031] "@Dzahn: templatized preamble.php so that the host name can be passed as a puppet variable" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [19:49:04] (03CR) 10Nuria: "Nuria testing this change in staging." [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [19:49:21] (03PS5) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 [19:49:42] (03CR) 1020after4: [C: 031] "missed a semicolon. 
fixed" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [19:50:52] (03PS2) 10Ori.livneh: varnishkafka: update submodule; convert to multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/174194 [19:51:01] (03CR) 10Ori.livneh: [C: 032 V: 032] varnishkafka: update submodule; convert to multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/174194 (owner: 10Ori.livneh) [19:52:50] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: puppet fail [19:53:14] (03PS1) 10Ori.livneh: Fix-up for Iac35f2329 [puppet] - 10https://gerrit.wikimedia.org/r/174215 [19:53:23] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for Iac35f2329 [puppet] - 10https://gerrit.wikimedia.org/r/174215 (owner: 10Ori.livneh) [19:54:50] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:55:12] ottomata: looks good on cp1056 [19:55:17] going to try on one more host before re-enabling [19:55:30] PROBLEM - DPKG on analytics1032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:56:54] bits slowness? [19:57:37] OH COOL [19:57:40] whoa caps [19:57:43] ori, can I try on one? [19:57:45] i wanna see it :) [19:57:49] which one should I do? [19:57:51] cp1052 ok? [19:58:00] ottomata: yes, but i found another small problem :/ [19:58:06] ok, i will wait [19:58:06] ? [19:58:08] the upstart config needs to be before => the Service [19:58:09] yeah [19:58:12] hmmmmaye [19:58:13] yeah [19:58:14] ok [19:58:15] it fixes itself after one run, but still a bug [19:58:20] <_joe_> hoo: bits slowness? [19:58:44] _joe_: Might be my connection... but stuff seems slow right now [19:58:50] oh wait, we do that already [19:59:08] oh, not for the upstart file [19:59:21] <_joe_> ori: ^^ see hoo, we may have to stall for a minute? [19:59:50] PROBLEM - puppet last run on cp1008 is CRITICAL: CRITICAL: Puppet has 1 failures [20:00:16] (03PS1) 10Ori.livneh: Provision Upstart config before service [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174225 [20:00:20] <_joe_> yes there is an occasional bits slowness [20:00:26] hoo: hm? [20:00:37] ottomata: see change above ^ [20:00:38] <_joe_> I've hit that too [20:01:13] _joe_: the change is only applied on two hosts (cp1056 and cp1008); the rest have puppet disabled. why would that affect perf? [20:01:40] <_joe_> ori: I don't think it has to do with it [20:01:50] <_joe_> I just asked you to stall for a minute [20:01:59] <_joe_> not to superimpose things [20:02:23] !log ran populateGlobalRenameLogSearch.php on metawiki [20:02:25] Logged the message, Master [20:02:50] _joe_: nod, np [20:03:00] RECOVERY - puppet last run on cp1008 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [20:03:12] jgage: yt? halfak needs logstash hadoop help! [20:03:59] ottomata, halfak: hadoop specific or some logstash general issue I could help with? 
[20:04:11] hadoop specific [20:04:22] he's looking for some logs, we aren't sure if they are actually in logstash [20:04:35] *nod* [20:04:39] RECOVERY - DPKG on analytics1032 is OK: All packages OK [20:06:11] (03CR) 10Ottomata: [C: 032 V: 032] Provision Upstart config before service [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174225 (owner: 10Ori.livneh) [20:08:30] PROBLEM - Host analytics1032 is DOWN: PING CRITICAL - Packet loss = 100% [20:08:45] <_joe_> mmmh I have lost a few minutes of backlog [20:08:55] <_joe_> but bits seem ok, so go on [20:09:46] (03CR) 10Dzahn: [C: 031] "unfortunately couldn't run in compiler because unrelated issue: http://puppet-compiler.wmflabs.org/513/change/173483/html/iridium.eqiad.wm" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [20:10:25] (03PS1) 10Ori.livneh: don't run replace-varnishkafka-${name}.pyconf unless file exists [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174232 [20:10:30] RECOVERY - Host analytics1032 is UP: PING OK - Packet loss = 0%, RTA = 1.48 ms [20:10:31] ottomata: one more (pre-existing bug) [20:10:44] !log running batchCAAntiSpoof.php on terbium [20:10:48] Logged the message, Master [20:10:56] like [20:11:09] https://gerrit.wikimedia.org/r/174232 i mean [20:11:17] (03CR) 10Ottomata: [C: 032 V: 032] don't run replace-varnishkafka-${name}.pyconf unless file exists [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174232 (owner: 10Ori.livneh) [20:12:01] (03PS1) 10Ori.livneh: Update varnishkafka submodule to c965ebca0d [puppet] - 10https://gerrit.wikimedia.org/r/174234 [20:12:12] (03CR) 10Ori.livneh: [C: 032 V: 032] Update varnishkafka submodule to c965ebca0d [puppet] - 10https://gerrit.wikimedia.org/r/174234 (owner: 10Ori.livneh) [20:14:52] ottomata: looks good. want to try one? [20:15:15] ja [20:15:21] i try cp1052 [20:15:25] go for it [20:15:39] (03CR) 10Dzahn: "this isn't caused by your change, but i really wonder why the compiler fails not finding passwords::phabricator even though it's in labs/p" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [20:16:01] mutante: maybe the compiler doesn't automatically update the private repo? [20:17:03] hmm. it was added on 10-14 though [20:17:14] if it doesn't update, it doesn't update [20:17:16] hmm, ori, i think we either need to puppetize or just manually remove the old varnishkafka.pyconf [20:17:29] ottomata: ah, can do [20:18:01] ah, hm [20:18:02] also, ori [20:18:03] as is [20:18:18] the pyconf metrics will be conflated i think [20:18:34] e.g. outbuf_cnt, rtt.avg, etc metrics [20:18:43] will have the same names in ganglia [20:18:50] even though they are from different varnishkafka instances [20:18:53] and will have different metrics [20:18:58] hmm, right [20:19:02] let me see for a sec [20:19:06] i removed varnishkafka.pyconf everywhere btw via salt [20:19:13] that's fine [20:19:20] eef [20:19:21] hm [20:19:32] maybe the metric names should be templated too [20:19:37] based on $varnish_name [20:19:39] hm [20:19:49] also, i think we should make it so that the existing metrics stay, if we can [20:19:55] so, maybe that could be parameterized [20:20:19] OOF [20:20:23] but that file is generated [20:20:27] will have to change python script? [20:23:00] yeah, give me a minute [20:23:32] k [20:23:35] ottomata: is it ok if the ganglia metrics are briefly insane?
i don't want to leave puppet disabled much longer [20:24:01] hm, it will be fine for now, ori, as we aren't adding a second instance yet [20:24:05] and the metric names are all still the same [20:24:12] right [20:24:20] okay, re-enabling across the fleet [20:24:21] so i think we can go ahead and enable puppet [20:24:22] k [20:24:43] ori, log it! [20:24:51] of course [20:25:23] !log re-enabling puppet on all varnishes following deployment of Iac35f2329 [20:25:27] Logged the message, Master [20:25:42] :) [20:48:44] (03CR) 1020after4: "@Dzahn: that passwords module is in the labs/private project, is that the one used in production or is there a separate repository used b" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [20:51:50] (03CR) 10Nuria: "Tested and it is working well, can this be merged?" [puppet] - 10https://gerrit.wikimedia.org/r/172285 (https://bugzilla.wikimedia.org/72740) (owner: 10QChris) [20:57:36] (03PS1) 10Ori.livneh: Ganglia module: allow a key prefix to be specified [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174272 [20:57:51] ottomata: ^ [20:59:42] ottomata: (the idea is that for the primary webrequest varnishkafka instance, we'd do: varnishkafka::monitoring { $varnish_name: key_prefix => '', } [21:00:39] (03PS1) 10BBlack: Add admin_state file to dns repo with copious internal docs [dns] - 10https://gerrit.wikimedia.org/r/174273 [21:00:53] (03PS1) 10BBlack: Update authdns scripts for gdnsd 2.x only [puppet] - 10https://gerrit.wikimedia.org/r/174274 [21:00:56] (03PS1) 10BBlack: Switch gdnsd config-geo mechanism to $include [puppet] - 10https://gerrit.wikimedia.org/r/174275 [21:00:57] ori: fwiw, i actually went to puppet-compiler02 in labs to check about private repo. the "missing" class does exist on the instance :p [21:00:58] (03PS1) 10BBlack: authdns-local-update: copy admin_state into place [puppet] - 10https://gerrit.wikimedia.org/r/174276 [21:01:34] (03CR) 10BBlack: [C: 032] Add admin_state file to dns repo with copious internal docs [dns] - 10https://gerrit.wikimedia.org/r/174273 (owner: 10BBlack) [21:02:55] greg-g: Reedy: We need a backport...
https://gerrit.wikimedia.org/r/174271 [21:03:18] (03CR) 10BBlack: [C: 032] Update authdns scripts for gdnsd 2.x only [puppet] - 10https://gerrit.wikimedia.org/r/174274 (owner: 10BBlack) [21:03:30] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: puppet fail [21:04:38] (03CR) 10Dzahn: "20after4: in prod it's production/private (and i can confirm passwords::phabricator exists and has the 'real' value for $emailbot_cert), i" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [21:04:39] (03CR) 10BBlack: [C: 032] Switch gdnsd config-geo mechanism to $include [puppet] - 10https://gerrit.wikimedia.org/r/174275 (owner: 10BBlack) [21:05:06] (03CR) 10BBlack: [C: 032] authdns-local-update: copy admin_state into place [puppet] - 10https://gerrit.wikimedia.org/r/174276 (owner: 10BBlack) [21:07:17] (03PS2) 10Dzahn: ganglia/gerrit: move install_cert out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/173477 [21:07:46] (03CR) 10Aklapper: [C: 04-1] "I'd still remove the "pattern" "(bugs|bugzilla).wikimedia.org/describecomponents.cgi" entirely as it currently redirects to a 404, plus re" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [21:08:56] (03CR) 10BBlack: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/174273 (owner: 10BBlack) [21:11:11] ottomata: ping [21:11:20] ahh everybvody is pinging me :) [21:11:31] hehe [21:11:48] what do you think about just merging the role patches? [21:12:02] https://gerrit.wikimedia.org/r/#/c/167700/ and https://gerrit.wikimedia.org/r/#/c/171741/ [21:12:23] ottomata: i can review those for you if you like [21:12:34] we should then be able to do some testing in beta labs without manual role copying [21:13:02] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [21:13:13] ori: that would be great [21:14:14] ^ that's me on baham [21:14:40] (03PS1) 10BBlack: s/config-head/config/ bugfix for 69f3b4ec [puppet] - 10https://gerrit.wikimedia.org/r/174278 [21:14:49] (03PS6) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 [21:14:50] ori: so that would create keys like [21:14:58] statslistener.kafka.rdkafka.... [21:15:00] ja? [21:15:08] ottomata: yep [21:15:22] ottomata: but not for the primary instance; for that one we'd set key_prefix = '' [21:15:22] (03CR) 10BBlack: [C: 032 V: 032] s/config-head/config/ bugfix for 69f3b4ec [puppet] - 10https://gerrit.wikimedia.org/r/174278 (owner: 10BBlack) [21:15:30] aye right [21:15:31] hm [21:15:36] i guess that's cool. [21:15:37] sur esure ok [21:15:48] (03CR) 10Ottomata: [C: 032 V: 032] Ganglia module: allow a key prefix to be specified [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174272 (owner: 10Ori.livneh) [21:16:10] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [21:16:23] (03CR) 10Ori.livneh: [C: 04-1] Add cassandra role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata) [21:16:45] (03CR) 1020after4: [C: 031] "@aklapper: done" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [21:17:40] (03PS1) 10Ori.livneh: Update varnishkafka module to f3b76bd [puppet] - 10https://gerrit.wikimedia.org/r/174279 [21:17:47] (03CR) 10jenkins-bot: [V: 04-1] Update varnishkafka module to f3b76bd [puppet] - 10https://gerrit.wikimedia.org/r/174279 (owner: 10Ori.livneh) [21:18:10] (03PS7) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. 
[puppet] - 10https://gerrit.wikimedia.org/r/173483 [21:18:43] (03CR) 1020after4: "(patch set 7 is just a rebase to bring this up to date with production head)" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [21:19:13] (03PS2) 10Ori.livneh: Update varnishkafka module to f3b76bd [puppet] - 10https://gerrit.wikimedia.org/r/174279 [21:20:37] (03CR) 10Ori.livneh: Add a simple restbase::labs role (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [21:20:52] (03CR) 10Ori.livneh: [C: 032] Update varnishkafka module to f3b76bd [puppet] - 10https://gerrit.wikimedia.org/r/174279 (owner: 10Ori.livneh) [21:20:59] (03CR) 10Ottomata: "K. I have to run very soon, so I will let you merge this. Watch it closely to make sure the existing metrics are cool, otherwise icinga" [puppet] - 10https://gerrit.wikimedia.org/r/174279 (owner: 10Ori.livneh) [21:21:15] ottomata: got it [21:21:43] ottomata: before you run -- would you be okay with 'metrics' instead of 'stats'? (it's not *just* perf) [21:21:50] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:24:11] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Puppet has 1 failures [21:24:20] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 1 failures [21:24:35] (03PS1) 10Ori.livneh: Fix-up for I4ad01776c79 [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174281 [21:24:39] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures [21:24:39] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 1 failures [21:24:44] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for I4ad01776c79 [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174281 (owner: 10Ori.livneh) [21:24:48] puppet failure on varnishes is me, fixing in a sec [21:24:49] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Puppet has 1 failures [21:24:51] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:00] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:01] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:10] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:13] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:13] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:19] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 7 minutes ago with 0 failures [21:25:29] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:35] (03PS1) 10Ori.livneh: Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/174282 [21:25:39] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:40] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: Puppet has 1 failures [21:25:40] ori, i know you are all excited to get this in, but i think we should wait to merge the addition of the second instance [21:25:44] (03CR) 10Ori.livneh: [C: 032 V: 032] Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/174282 (owner: 10Ori.livneh) [21:25:50] at least, unless you find another babysitter other than me [21:25:54] we'll have to create the kafka topic too [21:25:58] it doesn't auto create [21:26:10] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL:
Puppet has 1 failures [21:26:12] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Puppet has 1 failures [21:26:17] probably should talk about replication/partitions, etc. [21:26:26] ok, so let's wait until tomorrow or something [21:26:27] can we discuss more tomorrow? [21:26:28] cool, thanks [21:26:30] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures [21:26:31] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Puppet has 1 failures [21:26:32] i'm giving a kafka talk tonight! [21:26:34] have to prepare [21:26:38] http://www.meetup.com/Apache-Kafka-NYC/events/206917572/ [21:26:39] ooh coool! where? [21:26:39] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: Puppet has 1 failures [21:26:41] 70 peopel! [21:26:41] awesome! [21:26:42] the review commentary " otherwise icinga"... -> mass icinga spam was kinda awesome [21:26:59] bblack: sorry, they should be fixing themselves now [21:27:00] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:00] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:08] black, hahah [21:27:11] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:12] hahah [21:27:12] (03PS20) 10GWicke: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 [21:27:14] (03CR) 10GWicke: Add a simple restbase::labs role (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [21:27:30] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:50] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:56] don't worry there's only ~104 of them all total :) [21:28:40] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:09] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:10] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:20] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:30] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:30] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:40] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Puppet has 1 failures [21:29:50] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:00] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:19] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:19] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:20] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:30] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [21:30:37] (03PS1) 10Ori.livneh: Fix-up for I4ad01776c79 [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174285 [21:30:45] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for I4ad01776c79 [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/174285 (owner: 10Ori.livneh) [21:30:50] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:09] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:10] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:11] PROBLEM - 
puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:16] i hate submodules [21:31:17] (03PS1) 10Ori.livneh: Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/174286 [21:31:21] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:21] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:21] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:24] (03CR) 10Ori.livneh: [C: 032 V: 032] Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/174286 (owner: 10Ori.livneh) [21:31:41] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:50] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [21:31:59] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures [21:32:09] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [21:32:30] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 1 failures [21:32:49] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures [21:32:59] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:09] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:10] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:18] <^d> !log elasticsearch: set auto_expand_replicas to 0-2 on ttmserver(-test) like other indexes for extra redundancy. [21:33:20] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:33:21] sorry for the spam, they're recovering now [21:33:22] Logged the message, Master [21:33:38] <^d> ori: That's twice today :) [21:33:49] ^d: ? [21:33:50] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:50] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 1 failures [21:34:05] greg-g: Reedy: Are you ok with the Wikibase backport? 
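Picking up ottomata's point from 21:25 that the new varnishkafka instance's topic won't auto-create and still needs partition/replication decisions: creating it by hand is a one-liner with the stock Kafka tooling. A sketch only; the topic name, the counts, and the zookeeper connect string are all placeholders:

    # create the topic before pointing the second varnishkafka instance at it
    kafka-topics.sh --create --zookeeper zk1001.eqiad.wmnet:2181/kafka \
        --topic webrequest_statsd --partitions 12 --replication-factor 3
    # confirm it is there
    kafka-topics.sh --describe --zookeeper zk1001.eqiad.wmnet:2181/kafka --topic webrequest_statsd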
[21:34:09] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [21:34:18] <^d> ori: j.oe broke puppet this morning :) [21:34:21] oh [21:35:10] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:35:19] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:35:29] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:36:30] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [21:36:49] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:37:10] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:37:10] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:37:28] (03PS2) 10Ori.livneh: add varnish::logging::statslistener [puppet] - 10https://gerrit.wikimedia.org/r/174195 [21:37:30] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:37:40] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:38:41] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:38:41] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:38:50] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:39:10] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:39:21] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:39:55] hoo: I assume so, what's up/ [21:40:00] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:40:00] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:40:00] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:40:15] hoo: things broken? 
just do it if needed [21:40:26] Ok, will do it in a moment then [21:40:29] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:40:36] https://gerrit.wikimedia.org/r/174284 is the change [21:40:50] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:50] (03PS1) 1020after4: Clean up some puppet-lint errors and warnings [puppet] - 10https://gerrit.wikimedia.org/r/174288 [21:41:00] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:41:20] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:41:20] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:41:39] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:41:50] (03CR) 10Dzahn: [C: 032] ganglia/gerrit: move install_cert out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/173477 (owner: 10Dzahn) [21:42:09] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:42:29] grrrr [21:42:30] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:42:39] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:42:40] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:42:51] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:43:00] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:43:10] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:43:12] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:43:30] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:43:40] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:43:40] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:43:49] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:43:49] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:44:00] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:44:50] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:44:59] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:45:10] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:45:10] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:45:10] RECOVERY - puppet last run on amssq56 is 
OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:45:30] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:45:40] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:45:56] (03CR) 10Dzahn: [C: 032] Adding blogs to de. and en.planet.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/174182 (owner: 10Dereckson) [21:46:41] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:46:50] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:46:59] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:47:19] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:47:30] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:24] (03CR) 10Dzahn: [C: 032] set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [21:48:30] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:48:30] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:48:49] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [21:48:49] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:48:49] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:49:19] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:50:53] !log hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix EntityIdLabelFormatter et al. (duration: 00m 17s) [21:50:58] Logged the message, Master [21:51:42] (03PS1) 10BBlack: config-geo: initial conversion to failover arrays [dns] - 10https://gerrit.wikimedia.org/r/174291 [21:52:38] (03CR) 10BBlack: [C: 032] config-geo: initial conversion to failover arrays [dns] - 10https://gerrit.wikimedia.org/r/174291 (owner: 10BBlack) [21:57:05] (03PS1) 10BBlack: silence info-level spam on gdnsd restart [puppet] - 10https://gerrit.wikimedia.org/r/174294 [21:57:23] (03CR) 10BBlack: [C: 032 V: 032] silence info-level spam on gdnsd restart [puppet] - 10https://gerrit.wikimedia.org/r/174294 (owner: 10BBlack) [21:59:12] (03PS1) 10Hoo man: Bump wgCacheEpoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174295 [21:59:30] (03CR) 10Hoo man: [C: 032] Bump wgCacheEpoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174295 (owner: 10Hoo man) [21:59:38] (03Merged) 10jenkins-bot: Bump wgCacheEpoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174295 (owner: 10Hoo man) [22:00:05] spagewmf, ebernhardson: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141118T2200). Please do the needful. 
[22:00:12] !log hoo Synchronized wmf-config/Wikibase.php: Bump cache epoch (duration: 00m 07s) [22:00:14] Logged the message, Master [22:00:24] * hoo is done [22:01:52] (03PS1) 10Ori.livneh: Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 [22:02:36] (03CR) 10jenkins-bot: [V: 04-1] Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 (owner: 10Ori.livneh) [22:03:12] (03PS3) 10Dzahn: misc-web varnish: bugzilla to phab box [puppet] - 10https://gerrit.wikimedia.org/r/172471 [22:04:09] (03PS2) 10Ori.livneh: Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 [22:05:52] (03CR) 10Dzahn: [C: 032] "let's do this now. this way it's just done and doesn't have to be done during actual migration, the switch can be just the actual DNS chan" [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [22:06:35] (03CR) 10Ori.livneh: "@ottomata:" [puppet] - 10https://gerrit.wikimedia.org/r/174195 (owner: 10Ori.livneh) [22:09:25] (03CR) 10Rush: [C: 031] "seems good actually, only comment would be, is it worth a module for one script? I'm ok with "utility" type things living in base (I pref" [puppet] - 10https://gerrit.wikimedia.org/r/174297 (owner: 10Ori.livneh) [22:12:06] chasemp: what do you mean by 'common'? [22:12:36] confusing comment on my part [22:12:51] I like the idea of a "common" module for utilities and things you expect everywhere [22:12:56] we seem to kinda sorta use base for this [22:12:58] oh, i see [22:12:59] but it's a bit of a mess [22:13:06] no worries [22:13:10] so how about: [22:13:11] just commentary [22:13:29] (03CR) 10GWicke: [C: 031] Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [22:13:29] modules/phaste/manifests/init.pp -> modules/base/manifests/phaste.pp [22:13:38] seems good to me man [22:13:39] put the file into the phabricator module? [22:13:52] it isn't really part of phabricator, though [22:13:56] ori, https://gerrit.wikimedia.org/r/#/c/171741/ should be ready [22:14:39] (03PS21) 10Ori.livneh: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [22:19:38] (03PS3) 10Ori.livneh: Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 [22:19:52] (03PS22) 10Ori.livneh: Add a simple restbase role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [22:20:03] chasemp: {{done}} [22:20:45] (03PS4) 10Rush: Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 (owner: 10Ori.livneh) [22:20:51] (03CR) 10Ori.livneh: [C: 032] Add a simple restbase role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [22:21:23] (03CR) 10Rush: [C: 031] Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 (owner: 10Ori.livneh) [22:21:40] (03CR) 10Ori.livneh: [C: 032] Add 'phaste' tool for pastebinning text to Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/174297 (owner: 10Ori.livneh) [22:22:05] chasemp: danke! [22:27:57] <^d> !log elasticsearch: set a template to apply auto_expand_replicas 0-2 on all newly created indexes.
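The !log just above (and the earlier one at 21:33 for the existing ttmserver indexes) boils down to two settings calls against Elasticsearch; a sketch, with the endpoint, index name, and template name being assumptions:

    # widen replica auto-expansion on an existing index
    curl -XPUT 'http://localhost:9200/ttmserver/_settings' -d '{"index": {"auto_expand_replicas": "0-2"}}'
    # and register a template so newly created indexes get the same setting
    curl -XPUT 'http://localhost:9200/_template/replicas' -d '{"template": "*", "settings": {"index.auto_expand_replicas": "0-2"}}'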
[22:28:03] Logged the message, Master
[22:34:24] (03PS4) 10GWicke: Add cassandra role [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata)
[22:38:16] (03PS5) 10GWicke: Add cassandra role [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata)
[22:39:06] (03CR) 10GWicke: Add cassandra role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata)
[22:39:39] (03CR) 10GWicke: [C: 031] "Removed the sysctl stuff in the amend, as this is already set by the Debian package." [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata)
[22:39:52] (03PS1) 10Kaldari: Adding 'types of actors' WikiGrok campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174303
[22:40:00] (03CR) 10jenkins-bot: [V: 04-1] Adding 'types of actors' WikiGrok campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174303 (owner: 10Kaldari)
[22:41:34] (03PS2) 10Kaldari: Adding 'types of actors' WikiGrok campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174303
[22:46:00] ori: https://gerrit.wikimedia.org/r/#/c/167700/
[22:46:30] gwicke: Also needs a new version of Cassandra .deb uploaded to apt. <-- is this done?
[22:46:43] !log Updated zuul config on gallium to include I511b14e (Make cdb-phpunit job non-voting)
[22:46:45] Logged the message, Master
[22:49:15] (03PS3) 10Kaldari: Adding 'types of actors' WikiGrok campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174303
[22:54:13] ori: afaik yes
[22:54:49] labs has 2.0.10
[22:55:07] which is the latest stable
[23:00:11] precise has 2.0.9 it seems, but we don't use that
[23:11:31] (03PS1) 1020after4: Fix a bug in redirector that broke the alternate-files-domain Also, don't die when connection fails, just log the error and return so that there is no chance of this interfering with normal operation of phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174310
[23:12:40] (03PS1) 10Dzahn: Revert "set up redirects for bugzilla urls to redirect to phabricator." [puppet] - 10https://gerrit.wikimedia.org/r/174311
[23:14:12] (03CR) 10Dzahn: [C: 032] "https://phab.wmfusercontent.org/file/data/g7yf5tr6qig54odl3h6s/PHID-FILE-4ewkvx2xw2shfcpkpdpc/README.md" [puppet] - 10https://gerrit.wikimedia.org/r/174311 (owner: 10Dzahn)
[23:17:08] (03PS1) 10Ori.livneh: Undeploy AntiBot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174312
[23:18:27] (03CR) 10Chad: [C: 032] "Beat me to it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174312 (owner: 10Ori.livneh)
[23:18:30] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:18:35] (03Merged) 10jenkins-bot: Undeploy AntiBot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174312 (owner: 10Ori.livneh)
[23:19:49] !log demon Synchronized wmf-config: undeploy antibot (duration: 00m 04s)
[23:19:52] Logged the message, Master
[23:19:53] <^d> ori: ^
[23:28:40] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[23:29:50] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 58798 bytes in 9.115 second response time
[23:34:29] !log ori Synchronized wmf-config: I76f2023a1: 'Undeploy AntiBot' (duration: 00m 04s)
[23:34:33] Logged the message, Master
[23:34:35] ^d: thanks
[23:34:43] <^d> yw
[23:34:49] legoktm: ditto!
[23:34:49] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
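For readers following the "Undeploy AntiBot" change above: undeploying an extension from mediawiki-config in this era generally meant removing its conditional require_once from CommonSettings.php and its entry in wmf-config/extension-list. A schematic sketch of the sort of block such a commit deletes; the $wmgUseAntiBot flag name is an assumption about local convention, not taken from the actual change I76f2023a1:

    <?php
    // Schematic only -- this is the kind of block an "undeploy" commit removes.
    if ( $wmgUseAntiBot ) {  // hypothetical feature flag
        require_once "$IP/extensions/AntiBot/AntiBot.php";
    }
    // The extension also comes out of wmf-config/extension-list so the
    // localisation cache no longer builds its messages.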
[23:35:01] :)
[23:45:31] (03CR) 10Dzahn: "however that happened but " $this->config->mysql->user," somehow became "user rush" which was then denied access to a db on localhost" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4)
[23:46:17] (03CR) 10Dzahn: "copied preamble.php from before the change back in place to fix errors. /srv/phab/phabricator/support/preamble.php" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4)
[23:49:54] (03CR) 10Dzahn: "error was: redirector.php: Connect Error (1045) Access denied for user 'rush'@'10.64.32.150' (using password: NO) so you gotta figure out " [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4)
[23:56:57] MySQL server has gone away (10.64.16.10) SELECT cur_text FROM `cur` WHERE cur_id = '393247' LIMIT 1
[23:56:59] hehe
[23:57:58] AaronSchulz, o_0
[23:58:17] where did it come from?
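Tying the redirector thread together: the "Access denied for user 'rush'" error Dzahn quotes at 23:49 is exactly the failure mode the 23:11 redirector change says it wants to survive ("don't die when connection fails, just log the error and return"). A minimal sketch of that fail-soft pattern, assuming mysqli and placeholder connection parameters; this is not the actual redirector.php code:

    <?php
    // Sketch of the "log and return" behaviour described in the commit
    // message; $host, $user, $password, $dbname are placeholders.
    $conn = @mysqli_connect( $host, $user, $password, $dbname );
    if ( $conn === false ) {
        error_log( 'bugzilla redirector: DB connect failed: ' . mysqli_connect_error() );
        return; // skip the redirect lookup; let Phabricator serve the request normally
    }
    // ...otherwise look up the legacy Bugzilla URL and issue the redirect.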