[00:00:05] RoanKattouw, ^d, marktraceur, MaxSem, kaldari: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141212T0000). [00:00:10] on it [00:01:35] MaxSem: Thanks [00:09:13] MaxSem: Once the WikiGrok change is live, let me know so we can test on en.wiki. If it fixes the problem, we'll also want to deploy your config change from yesterday to enable WikiGrok on enwiki. [00:09:46] the one that got accidentally deployed while were sleeping and then reverted? :P [00:09:47] as this will probably be our last chance before the holidays [00:09:57] !log stop profiler-to-carbon on tungsten [00:10:01] Logged the message, Master [00:10:16] oh maybe :) [00:15:21] !log maxsem Synchronized php-1.25wmf12/extensions/MobileFrontend/: (no message) (duration: 00m 05s) [00:15:27] Logged the message, Master [00:15:37] !log maxsem Synchronized php-1.25wmf12/extensions/WikiGrok/: (no message) (duration: 00m 06s) [00:15:41] Logged the message, Master [00:17:14] !log maxsem Synchronized php-1.25wmf11/extensions/WikiGrok/: (no message) (duration: 00m 06s) [00:17:16] Logged the message, Master [00:17:39] !log maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend: (no message) (duration: 00m 06s) [00:17:42] Logged the message, Master [00:17:46] kaldari, ^^^^ [00:20:14] yurikR, Warning: Unable to get config content: title=Zero:631-20, result={"ns":480,"title":"Zero:631-20","missing":""} [Called from JsonConfig\JCUtils::warn in /srv/mediawiki/php-1.25wmf11/extensions/JsonConfig/includes/JCUtils.php at line 50] in /srv/mediawiki/p [00:20:36] MaxSem, thx [00:21:52] MaxSem, when/where did you see it? 
[00:22:01] not seeing it in logstash [00:22:05] in logstash [00:22:21] use the fatalmonitor dashboard [00:24:00] (CR) MaxSem: [C: 2] Second attempt at mobile wikidata, now with a subdomain [mediawiki-config] - https://gerrit.wikimedia.org/r/179341 (owner: MaxSem) [00:24:15] (Merged) jenkins-bot: Second attempt at mobile wikidata, now with a subdomain [mediawiki-config] - https://gerrit.wikimedia.org/r/179341 (owner: MaxSem) [00:25:17] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/179341/ (duration: 00m 05s) [00:25:22] Logged the message, Master [00:29:09] (PS1) MaxSem: Fix m.wikidata host [mediawiki-config] - https://gerrit.wikimedia.org/r/179359 [00:29:29] (CR) MaxSem: [C: 2] Fix m.wikidata host [mediawiki-config] - https://gerrit.wikimedia.org/r/179359 (owner: MaxSem) [00:29:39] (Merged) jenkins-bot: Fix m.wikidata host [mediawiki-config] - https://gerrit.wikimedia.org/r/179359 (owner: MaxSem) [00:30:24] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179359 (duration: 00m 11s) [00:30:28] Logged the message, Master [00:35:50] !log profiler-to-carbon is logging too much on tungsten, cause unknown yet but don't restart [00:35:54] Logged the message, Master [00:38:38] MaxSem: Go ahead and turn WikiGrok on for enwiki [00:38:46] weeeeeee [00:38:56] MaxSem: All systems are go! 
[00:39:04] (PS1) MaxSem: Revert "Revert "Reenable WikiGrok UI on enwiki"" [mediawiki-config] - https://gerrit.wikimedia.org/r/179360 [00:39:22] (CR) MaxSem: [C: 2] Revert "Revert "Reenable WikiGrok UI on enwiki"" [mediawiki-config] - https://gerrit.wikimedia.org/r/179360 (owner: MaxSem) [00:39:35] (Merged) jenkins-bot: Revert "Revert "Reenable WikiGrok UI on enwiki"" [mediawiki-config] - https://gerrit.wikimedia.org/r/179360 (owner: MaxSem) [00:41:02] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179360 (duration: 00m 06s) [00:41:06] Logged the message, Master [00:41:10] kaldari, ^^^ [00:41:25] MaxSem: Thanks! [00:47:16] MaxSem: Seems to be working well :) [00:47:48] oh noes [00:57:08] kaldari: what did you deploy? [00:57:36] api spiked, worth watching [00:58:29] ori, wikigrok for logged in users on enwiki [01:00:35] ori, it seems to have returned to its previous level [01:01:13] yeah [01:21:55] (PS1) BryanDavis: Introduce wmgUseMonologLogger feature flag [mediawiki-config] - https://gerrit.wikimedia.org/r/179368 [01:21:57] (PS1) BryanDavis: Optional MWLoggerMonologSpi configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/179369 [01:21:59] (PS1) BryanDavis: Enable MWLoggerMonologSpi for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/179370 [01:33:38] gwicke: Is it expected that restbase is sending 550,000 log events per hour to logstash? And that they have empty "message" fields? [01:35:07] what, you'd rather have half a million messages *with* a message body? [01:36:20] bd808: I can change the log level if the volume is too high [01:36:33] I'm currently testing, and am logging each request [01:36:59] It's ok until logstash dies I guess. 
[01:37:29] let me bump it up to log only warnings + errors then [01:37:34] that's twice the volume of hadoop (which is noisy) and 4x the volume of runJobs [01:38:40] bd808: puppet change coming [01:39:06] ori: On a related note, I'd love your help figuring out which of the apache log entries we are storing right now are unwanted noise. [01:39:34] bd808: unwanted by whom? [01:39:37] "AH01070: Error parsing script headers" seems to be most of them [01:39:38] i have no idea, really [01:39:48] oh, yeah, that's a known mod_proxy_fcgi bug [01:39:53] so that one is unwanted [01:40:12] *nod* I'll add some config to ignore that one [01:41:22] bd808: how tight are the logstash resources currently? [01:41:52] gwicke: We have lots of disk now but the servers themselves are small misc boxes [01:42:02] So my fear is cpu/ram problems [01:42:15] kk [01:43:11] gwicke: I think I see where you started your tests -- http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Logstash%20cluster%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1418348548&g=cpu_report&z=large [01:43:19] * gwicke is waiting for gerrit [01:44:05] bd808: yeah, that's quite obvious [01:44:50] I don't really need the normal requests, so no problem to change it (once gerrit lets me update the puppet repo) [01:55:46] something seems to be up with the gerrit host: my pull finished only now [02:00:42] (PS1) GWicke: Set default logging level to 'warn' [puppet] - https://gerrit.wikimedia.org/r/179382 [02:01:24] bd808|BUFFER, ori ^^ [02:01:33] (CR) Ori.livneh: [C: 2 V: 2] Set default logging level to 'warn' [puppet] - https://gerrit.wikimedia.org/r/179382 (owner: GWicke) [02:02:19] ori: thanks! 
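Editor's illustration of the log-volume exchange above: a minimal sketch, assuming nothing about how restbase itself is configured, of why raising a default level from per-request logging to 'warn' cuts the event count so sharply. It uses Python's standard `logging` module purely as a stand-in; the logger names, the `ListHandler` class, and the simulated request counts are all hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the resulting volume can be counted."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


def build_logger(level):
    # Hypothetical logger name; one logger per level keeps the runs separate.
    logger = logging.getLogger(f"restbase-sketch-{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the sketch's output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    return logger, handler


def simulate_requests(logger, n_requests, slow):
    """Log one INFO event per request, plus a WARNING for each slow request."""
    for i in range(n_requests):
        logger.info("request %d handled", i)
        if i in slow:
            logger.warning("request %d was slow", i)


verbose_logger, verbose_out = build_logger(logging.INFO)
quiet_logger, quiet_out = build_logger(logging.WARNING)

simulate_requests(verbose_logger, 1000, slow={3, 500})
simulate_requests(quiet_logger, 1000, slow={3, 500})

print(len(verbose_out.messages), len(quiet_out.messages))
```

With the level at INFO every request produces an event; at WARNING only the two simulated slow requests do, which is the kind of reduction the 'warn' default in the puppet change is aiming for.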
[02:15:30] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 03s) [02:15:34] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-12 02:15:34+00:00 [02:15:39] Logged the message, Master [02:15:44] Logged the message, Master [02:20:33] !log l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 03s) [02:20:36] Logged the message, Master [02:20:37] !log LocalisationUpdate completed (1.25wmf12) at 2014-12-12 02:20:37+00:00 [02:20:39] Logged the message, Master [02:48:17] (03PS1) 10Ori.livneh: Log xenon-captured traces via wfDebugLog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179389 [02:48:20] ^ AaronSchulz [02:48:51] andrewbogott: morning! mw1041 is unresponsive; could you possibly see if you can connect via console and reboot? [02:49:01] ori: sure, one moment [02:51:00] (03CR) 10jenkins-bot: [V: 04-1] Log xenon-captured traces via wfDebugLog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179389 (owner: 10Ori.livneh) [02:52:59] ori: I cycled power, the console isn't saying much so far [02:53:07] oh, ok, it's booting now [02:55:15] ori: it's up now. It's host key is broken somehow… can you access? [02:55:24] s/It's/its/ [02:55:42] !log rebooted mw1041 from mgmt [02:55:48] Logged the message, Master [03:13:47] greg-g, could i quickly deploy a minor zeroportal bugfix? 
affects only zero.wikimedia.org, but prevents us from progressing [03:14:54] https://gerrit.wikimedia.org/r/#/c/179394 [03:17:21] (PS2) Ori.livneh: Log xenon-captured traces via wfDebugLog [mediawiki-config] - https://gerrit.wikimedia.org/r/179389 [03:18:28] (PS1) Ori.livneh: set hhvm.xenon.period=600 on mw108{1-3} via hiera [puppet] - https://gerrit.wikimedia.org/r/179395 [03:18:49] (CR) Ori.livneh: [C: 2 V: 2] set hhvm.xenon.period=600 on mw108{1-3} via hiera [puppet] - https://gerrit.wikimedia.org/r/179395 (owner: Ori.livneh) [03:21:27] (CR) Ori.livneh: [C: 2] Log xenon-captured traces via wfDebugLog [mediawiki-config] - https://gerrit.wikimedia.org/r/179389 (owner: Ori.livneh) [03:21:36] (Merged) jenkins-bot: Log xenon-captured traces via wfDebugLog [mediawiki-config] - https://gerrit.wikimedia.org/r/179389 (owner: Ori.livneh) [03:22:32] andrewbogott: d'oh, i lose context and missed your messages. (i just realized this could have happened, and checked the channel log). yes, i can access it now -- it looks fine. thanks for kicking it. 
[03:22:37] s/lose/lost [03:22:52] great [03:24:09] yurikR2: just do it, it only affects zerowiki [03:24:31] ori, thx, i'm already merging it, should be out shortly [03:24:56] !log ori Synchronized wmf-config: I1d218c2d6: Log xenon-captured traces via wfDebugLog (duration: 00m 06s) [03:25:03] Logged the message, Master [03:27:15] (Abandoned) Ori.livneh: hhvm: load tidy.so extension [puppet] - https://gerrit.wikimedia.org/r/176881 (owner: Ori.livneh) [03:27:36] !log yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updatidng ZeroPortal to master - urgent bugfix (duration: 00m 05s) [03:27:39] Logged the message, Master [03:28:12] 03:27:34 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'php-1.25wmf12', '--include', 'php-1.25wmf12/extensions', '--include', 'php-1.25wmf12/extensions/ZeroPortal', '--include', 'php-1.25wmf12/extensions/ZeroPortal/***', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1190 returned [255]: Error reading response length from authentication socket. [03:28:12] Permission denied (publickey). [03:28:34] yurikR2: try again [03:28:52] !log yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updatidng ZeroPortal to master - urgent bugfix - retry (duration: 00m 10s) [03:28:56] Logged the message, Master [03:29:02] ori, thx, worked [03:29:07] :/ [03:29:10] means there's a keyholder bug. [03:48:00] (CR) Tim Starling: "Presumably the idea is to show fundraising banners once per week to people who click the close button. 
I do wonder where this would sit on" [mediawiki-config] - https://gerrit.wikimedia.org/r/177278 (owner: Ejegg) [03:50:11] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 12 03:50:10 UTC 2014 (duration 50m 9s) [03:50:15] Logged the message, Master [04:24:49] (PS1) Ori.livneh: hhvm: set hhvm.xenon.period to 600 [puppet] - https://gerrit.wikimedia.org/r/179403 [04:31:51] (CR) Ori.livneh: [C: 2] hhvm: set hhvm.xenon.period to 600 [puppet] - https://gerrit.wikimedia.org/r/179403 (owner: Ori.livneh) [04:37:45] (Restored) Ori.livneh: Move idiosyncratic gdbinit to /home/ori [puppet] - https://gerrit.wikimedia.org/r/176307 (owner: Tim Starling) [04:39:22] (PS3) Ori.livneh: Move idiosyncratic gdbinit to /home/ori [puppet] - https://gerrit.wikimedia.org/r/176307 (owner: Tim Starling) [04:39:44] (CR) Ori.livneh: [C: 2 V: 2] Move idiosyncratic gdbinit to /home/ori [puppet] - https://gerrit.wikimedia.org/r/176307 (owner: Tim Starling) [04:41:05] (CR) Ori.livneh: "Manually removed leftover /etc/gdb/gdbinit on app servers" [puppet] - https://gerrit.wikimedia.org/r/176307 (owner: Tim Starling) [04:42:36] i get a db error every time i try to access https://en.wikipedia.org/wiki/Special:WhatLinksHere/Module:HtmlBuilder [04:42:40] i haven't been able to see it for days [04:46:52] springle: do you have a slow query killer running? 
[04:48:07] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Puppet has 1 failures [04:48:45] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 1 failures [04:48:55] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures [04:49:14] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures [04:49:37] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 1 failures [04:52:42] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:00:51] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [05:01:59] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:02:09] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:02:09] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:02:28] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:02:34] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:02:50] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:03:11] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:03:17] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:04:02] ori: yes, slow wikiuser queries will be sniped after 5min. 
in some circumstances, 60sec [05:04:14] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [05:04:24] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [05:04:25] springle: what circumstances? [05:04:59] a storm of identical queries or server hitting max connections [05:05:53] jackmcbarn: https://en.wikipedia.org/wiki/Special:WhatLinksHere/Module:HtmlBuilder just loaded for me [05:07:34] for me also, but a refresh and it is slow [05:08:09] i just tried again, another error [05:08:45] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 7.07288346886e-21 [05:15:14] jackmcbarn: these i guess https://tendril.wikimedia.org/report/slow_queries?host=^db&user=wikiuser&schema=wik&qmode=eq&query=SpecialWhatLinksHere%3A%3AshowIndirectLinks&hours=6 [05:15:35] springle: i get asked for a login that i don't think i have [05:15:44] oh [05:16:21] (03CR) 10Faidon Liambotis: [C: 04-1] "I hadn't realized we were so close to dropping iptables.pp. Cool! Thanks for working on that." 
(4 comments) [puppet] - https://gerrit.wikimedia.org/r/179166 (owner: Dzahn) [05:20:47] (CR) Faidon Liambotis: [C: -1] let bastion hosts have base::firewall (2 comments) [puppet] - https://gerrit.wikimedia.org/r/96424 (owner: Dzahn) [05:37:27] <_joe_> morning [05:43:21] <_joe_> paravoid: if you didn't get the time to backport ffmpeg2theora, I will do it today - it's basically the only package we need to rebuild for the imagescalers on trusty [05:44:20] <_joe_> as mediawiki already supports the newer librsvg security model :)) [05:52:23] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 74595 MB (3% inode=99%): [05:55:04] (PS1) 20after4: change "default: none" to "default: default" for phabricator's security_topic field to fix the "changed none to none" bug. This fixes T479 [puppet] - https://gerrit.wikimedia.org/r/179407 [06:03:38] (PS1) Springle: depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/179410 [06:04:27] (CR) Springle: [C: 2] depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/179410 (owner: Springle) [06:05:09] (CR) Rush: [C: 1] "that simple huh? 
nice" [puppet] - 10https://gerrit.wikimedia.org/r/179407 (owner: 1020after4) [06:05:28] !log springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 00m 05s) [06:05:32] Logged the message, Master [06:13:58] _joe_: sure, go ahead :) [06:14:58] fluorine is running out of disk space again [06:28:37] api.log is 132GB [06:32:39] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: puppet fail [06:33:39] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: puppet fail [06:34:56] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 2 failures [06:35:03] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures [06:35:26] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:06] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures [06:36:17] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:23] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 2 failures [06:36:29] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:31] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:02] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:02] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:23] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:32] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 3 failures [06:46:00] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:46:22] PROBLEM - puppet last run on search1012 is CRITICAL: CRITICAL: Puppet has 1 failures [06:46:50] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:47:00] RECOVERY - puppet 
last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:43] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:47:51] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:09] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 1 failures [06:48:11] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:29] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:49:19] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:49:19] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures [06:50:04] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:28] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 1 failures [06:50:30] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 1 failures [06:50:40] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:50:40] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:50:56] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:51:21] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:52:39] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:13] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:46] RECOVERY - puppet last run on search1012 is OK: OK: Puppet is currently enabled, 
last run 32 seconds ago with 0 failures [06:57:04] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [06:57:38] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:16] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:59:07] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:14] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:59:32] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:29] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:00:30] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:42] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures [07:00:48] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures [07:01:22] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [07:01:29] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:01:58] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Puppet has 1 failures [07:02:28] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:03:16] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:31] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:31] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures [07:04:08] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [07:04:44] PROBLEM - puppet last run on analytics1028 is CRITICAL: 
CRITICAL: Puppet has 1 failures [07:04:55] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: Puppet has 1 failures [07:05:10] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures [07:05:49] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:06:54] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [07:06:54] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 1 failures [07:07:01] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:07:21] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures [07:07:29] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [07:08:17] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [07:08:39] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures [07:10:31] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [07:10:59] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: Puppet has 1 failures [07:11:12] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures [07:11:21] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [07:12:04] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:12:30] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:12:43] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:12:51] RECOVERY - 
puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [07:12:56] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:13:11] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:13:18] puppet [07:13:19] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:13:20] ? [07:13:24] Hm [07:13:51] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:14:13] RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [07:14:19] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [07:17:46] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:17:57] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:19:35] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:20:13] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:20:32] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:20:39] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:40:41] (03PS1) 10Ori.livneh: Sample 'api' debug log group at 1:1000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179412 [07:40:45] ^ paravoid [07:41:03] (03PS1) 10Faidon Liambotis: tor: remove redundant firewall rule [puppet] - 10https://gerrit.wikimedia.org/r/179413 [07:41:50] (03CR) 
10Faidon Liambotis: [C: 032] tor: remove redundant firewall rule [puppet] - 10https://gerrit.wikimedia.org/r/179413 (owner: 10Faidon Liambotis) [07:47:15] (03PS1) 10Ori.livneh: add `pv` (pipe viewer) to base::standard-packages [puppet] - 10https://gerrit.wikimedia.org/r/179414 [07:47:26] <_joe_> pv? [07:48:02] it's handy! [07:48:12] http://www.ivarch.com/programs/pv.shtml [07:48:24] <_joe_> yeah looking at the manual now [08:06:07] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 75553 MB (3% inode=99%): [09:18:26] (03PS1) 10Ori.livneh: xenon log: collate stack samples and fold into single lines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179418 [09:19:29] (03CR) 10Ori.livneh: [C: 032] xenon log: collate stack samples and fold into single lines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179418 (owner: 10Ori.livneh) [09:19:38] (03Merged) 10jenkins-bot: xenon log: collate stack samples and fold into single lines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179418 (owner: 10Ori.livneh) [09:20:35] !log ori Synchronized wmf-config/StartProfiler.php: I63864cc79: xenon log: collate stack samples and fold into single lines (duration: 00m 06s) [09:20:42] Logged the message, Master [09:27:45] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [09:33:41] !log restarted mwprof on tungsten [09:33:44] Logged the message, Master [09:34:03] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [09:42:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Apart from what Faidon said, hosts with roles having udp2log are:" [puppet] - 10https://gerrit.wikimedia.org/r/179166 (owner: 10Dzahn) [09:42:57] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [09:43:19] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CRITICAL: Not all configured mwprof instances are running. 
[09:46:23] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [10:00:50] (03CR) 10Alexandros Kosiaris: [C: 032] grub: use augeas to modify the config [puppet] - 10https://gerrit.wikimedia.org/r/178897 (owner: 10Faidon Liambotis) [10:04:29] RECOVERY - Disk space on fluorine is OK: DISK OK [10:12:13] (03PS1) 10Filippo Giunchedi: send log to stdout with upstart and systemd [debs/python-diamond] - 10https://gerrit.wikimedia.org/r/179423 [10:20:23] (03PS3) 10Alexandros Kosiaris: Run "apt-get update" outside of/before puppet [puppet] - 10https://gerrit.wikimedia.org/r/179082 (owner: 10Faidon Liambotis) [10:20:51] (03CR) 10Alexandros Kosiaris: [C: 032] "Happy to see puppet not running apt-get update directly anymore!!!" [puppet] - 10https://gerrit.wikimedia.org/r/179082 (owner: 10Faidon Liambotis) [10:21:03] don't merge that yet [10:21:21] (03CR) 10Dzahn: [C: 032] "identical compilation - http://puppet-compiler.wmflabs.org/549/change/179184/html/" [puppet] - 10https://gerrit.wikimedia.org/r/179184 (owner: 10Dzahn) [10:21:22] I need to test it somehwere... [10:26:28] some day i still want to rename myself in gerrit/labs. i guess this is it https://wikitech.wikimedia.org/wiki/Renaming_users [10:26:38] to get the regular full name [10:27:12] involves gerrit restart ? heh :p [10:28:49] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: puppet fail [10:29:19] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: puppet fail [10:29:19] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: puppet fail [10:29:19] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [10:29:26] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [10:29:31] <_joe_> mmmh [10:29:39] ugh, master again? 
[10:29:41] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: puppet fail [10:29:41] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [10:29:52] <_joe_> no looks like your change? [10:30:06] <_joe_> Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class apt::update at /etc/puppet/manifests/stages.pp:6 on node mw1224.eqiad.wmnet [10:30:07] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [10:30:21] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [10:30:23] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [10:30:24] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: puppet fail [10:30:25] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: puppet fail [10:30:25] my change just touched smokeping [10:30:28] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: puppet fail [10:30:28] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [10:30:37] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail [10:30:40] looks like my change [10:30:40] <_joe_> mmmh [10:30:41] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: puppet fail [10:30:49] PROBLEM - puppet last run on elastic1007 is CRITICAL: CRITICAL: puppet fail [10:30:51] <_joe_> yeah :) [10:30:51] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: puppet fail [10:30:52] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: puppet fail [10:31:08] <_joe_> thanks [10:31:09] so I guess akosiaris didn't see my "don't merge that yet" :) [10:31:55] <_joe_> well, no big deal [10:32:02] <_joe_> now we need to test it for sure :P [10:32:05] Could not find dependency File[usr/local/sbin/puppet-run] for File[/etc/cron.d/puppet] [10:32:40] the first error is strange though [10:32:46] there is no apt::update class anymore 
[10:32:57] <_joe_> mmmh failed merge? [10:33:06] yeah desync I think [10:33:27] ditto for the second error [10:33:43] <_joe_> no, the code is merged on strontium [10:34:09] <_joe_> so maybe some weird mod_passenger fuckup [10:34:23] <_joe_> because now I do get Error: Failed to apply catalog: Could not find dependency File[usr/local/sbin/puppet-run] for File[/etc/cron.d/puppet] at /etc/puppet/modules/base/manifests/puppet.pp:109 [10:34:28] <_joe_> like mutante reported [10:35:00] <_joe_> (you need a / at the start of the dependency) [10:35:16] blergh [10:35:16] <_joe_> usr/local/sbin/puppet-run vs /usr/local/sbin/puppet-run [10:35:19] <_joe_> :) [10:35:25] <_joe_> typos happen [10:36:07] (03PS1) 10Faidon Liambotis: Brown paper bag fix for 7909d4c [puppet] - 10https://gerrit.wikimedia.org/r/179425 [10:36:26] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Brown paper bag fix for 7909d4c [puppet] - 10https://gerrit.wikimedia.org/r/179425 (owner: 10Faidon Liambotis) [10:36:48] sigh, I was actually thinking this was a superfluous dependency... then I decided it was ok [10:37:00] I actually pondered on the line and yet missed the /.. 
:-( [10:39:02] confirmed working again on netmon1001 where i got the error, i see it change the puppet cron job [10:41:39] (03PS1) 10Faidon Liambotis: More puppet-run fixes [puppet] - 10https://gerrit.wikimedia.org/r/179426 [10:42:00] (03CR) 10Faidon Liambotis: [C: 032] More puppet-run fixes [puppet] - 10https://gerrit.wikimedia.org/r/179426 (owner: 10Faidon Liambotis) [10:42:08] (03CR) 10Faidon Liambotis: [V: 032] More puppet-run fixes [puppet] - 10https://gerrit.wikimedia.org/r/179426 (owner: 10Faidon Liambotis) [10:42:15] I like how we have a CI infrastructure [10:42:23] that can't even catch a failed dependency ffs [10:43:05] <_joe_> paravoid: we could make the compiler run on a generic node at least, if we want to [10:43:18] <_joe_> but jenkins would be even slower than it is [10:44:12] Notice: Finished catalog run in 6.76 seconds [10:44:37] Notice: Finished catalog run in 6.28 seconds [10:44:38] \o/ [10:45:18] 612 puppet failures [10:45:21] ohjoy [10:46:14] less than in the past :) [10:46:47] i'd like to have a graph now that shows number of compiler warns/errors per line of code over time [10:47:07] i wonder about "Warning: ActiveRecord-based storeconfigs and inventory are deprecated" every once in a while [10:47:26] yeah, we need to start looking at puppetdb [10:47:36] (03CR) 10Dzahn: [C: 032] "removing 11 compilation warnings. identical result http://puppet-compiler.wmflabs.org/550/change/179190/html/" [puppet] - 10https://gerrit.wikimedia.org/r/179190 (owner: 10Dzahn) [10:47:39] and perhaps that clojure abomination later on [10:55:54] (03PS15) 10Yuvipanda: [WIP] Add dblist / shard support for bootstrapper [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179110 [10:56:27] <_joe_> paravoid: 600 seconds is not going to be enough when sync-common is run by puppet [10:56:43] <_joe_> or upon first-install of a server [10:56:49] are we still doing that...? 
[10:56:59] <_joe_> only upon installation [10:57:25] <_joe_> when scap is installed, it runs if the code isn't already there [10:58:20] (03PS3) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [10:59:56] (03PS16) 10Yuvipanda: [WIP] Add dblist / shard support for bootstrapper [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179110 [11:04:33] creating human readable yaml out of pyyaml seems to be such a pain. [11:04:46] so much optimizations I don’t give a shit about to shave off whitespace [11:06:55] finally :) [11:07:05] (03CR) 10Dzahn: udp2log - stop using old iptables classes (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/179166 (owner: 10Dzahn) [11:07:54] (03CR) 10Dzahn: "thanks for the reviews. it seems only fair if we continue on matanya's patch because that one was first and already has less issues" [puppet] - 10https://gerrit.wikimedia.org/r/179166 (owner: 10Dzahn) [11:10:22] (03CR) 10Dzahn: "let me dump this valuable information Alex already got and pasted on the duplicate change i made at Change-Id: I326ae8de27249b1e4 so that " [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [11:12:01] (03Abandoned) 10Dzahn: udp2log - stop using old iptables classes [puppet] - 10https://gerrit.wikimedia.org/r/179166 (owner: 10Dzahn) [11:14:41] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-sv-da] - 10https://gerrit.wikimedia.org/r/179428 [11:15:25] anyone know why I am getting [11:15:29] /bin/sh: 1: puppet-run: not found [11:15:34] with a lot of failed crons? [11:16:21] oh [11:16:37] from tools [11:16:40] and I presume rest of labs [11:16:42] oh, scrollback [11:21:06] grumble [11:21:07] hmm, still spamming [11:21:09] yes [11:21:20] * YuviPanda just woke up, brain not fully awake yet. [11:21:30] cron.d, unlike cron.hourly/weekly, doesn't have a PATH set [11:21:39] aaah, right. 
[11:21:42] so it uses the init script's PATH, which is [11:21:43] PATH=/bin:/usr/bin:/sbin:/usr/sbin [11:22:03] but if this is the puppet cron does that mean puppet won’t actually run on these hosts now automatically? [11:22:11] yup [11:22:13] salt... [11:22:19] yeah [11:22:24] it was a premature merge, this was entirely untested [11:22:30] heh :) [11:22:39] and we had a miscommunication with akosiaris [11:23:14] I asked for input but he thought it was ready for merge (and I hadn't -1/2ed it) [11:23:22] aaaah :) [11:23:50] _joe_: how much time does scap want for the initial run? [11:24:06] I’m going to guess you guys are already on it. let me know if I could help. happy to salt labs after fixes are merged [11:24:20] <_joe_> paravoid: on appservers, the first two runs are ~ 1000 seconds [11:24:39] <_joe_> due to 200 calls to apt-get install [11:25:27] the first run isn't being run by puppet-run [11:25:38] I thought of that before [11:26:09] <_joe_> oh ok, scap usually installs on the second one [11:26:14] right [11:26:38] <_joe_> so yes, ~ 1000 seconds should be ok [11:29:03] <_joe_> convert images/originals/santa1.jpg -resize 300x200 /tmp/scrap.png is much slower on trusty than on precise [11:29:08] <_joe_> any idea why? 
[11:30:19] (03PS1) 10Dzahn: kafkatee: webstats-collector, add ferm service [puppet] - 10https://gerrit.wikimedia.org/r/179429 [11:31:57] (03PS1) 10Faidon Liambotis: Bump puppet-run puppet timeout to 1800 again [puppet] - 10https://gerrit.wikimedia.org/r/179430 [11:31:59] (03PS1) 10Faidon Liambotis: Use an absolute path for the puppet-run binary [puppet] - 10https://gerrit.wikimedia.org/r/179431 [11:32:35] (03CR) 10Dzahn: "re: webstats-collector: https://gerrit.wikimedia.org/r/#/c/179429/1" [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [11:33:38] (03CR) 10Faidon Liambotis: [C: 032] Bump puppet-run puppet timeout to 1800 again [puppet] - 10https://gerrit.wikimedia.org/r/179430 (owner: 10Faidon Liambotis) [11:34:04] (03CR) 10Faidon Liambotis: [C: 032] Use an absolute path for the puppet-run binary [puppet] - 10https://gerrit.wikimedia.org/r/179431 (owner: 10Faidon Liambotis) [11:36:19] _joe_: there's some MAGICK_THREAD_LIMIT env variable.. maybe that could be different? [11:36:51] <_joe_> mutante: I just tried running with -limit thread 1 [11:37:00] <_joe_> and the difference is smaller in fact [11:37:05] also see people asking about "disable/enable OpenMP" [11:37:08] <_joe_> but still present [11:37:49] http://www.imagemagick.org/script/openmp.php [11:37:54] convert -bench ? [11:39:46] <_joe_> mmmh seems like by default trusty's version uses 1 thread, while the precise one uses two [11:40:58] ah [11:41:31] paravoid: should I force a puppet run via salt now or do you have more fixes lined up? 
[11:41:36] (for labs, at least) [11:41:40] yes [11:41:45] I'm forcing one in prod right now [11:41:46] ok [11:43:45] !log force puppet run on all labs hosts via salt [11:44:31] Logged the message, Master [11:44:40] it's odd when roles have names that include a specific node [11:44:46] like on oxygen there is include role::logging::udp2log::oxygen [11:45:16] (03PS1) 10Dereckson: Namespaces configuration on kab.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179433 [11:45:17] but i dont see a role that puts rsync there (to add a firewall hole) [11:45:27] sigh, I need to batch it. [11:47:13] (03PS2) 10Dereckson: Namespaces configuration on kab.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179433 [11:47:35] <_joe_> gee, 3.2 req/s vs 4.3 req/s [11:49:09] <_joe_> I need to do some real-hardware test, though [11:49:57] <_joe_> the whole throughput difference is due to the huge change in convert performance. So it's probably some config [11:54:23] ok, so… I killed virt1000 [11:54:37] so wikitech is dead. [11:54:42] might need to powercycle the machine [11:54:47] <_joe_> uh? [11:54:51] <_joe_> how did you kill it? [11:54:52] forced puppet run on all labs hosts [11:54:57] without batching [11:55:03] (03PS1) 10Dzahn: role/logging: add ferm service webstats-collector [puppet] - 10https://gerrit.wikimedia.org/r/179434 [11:55:20] so they all hit virt1000 at the same time, and it died quicker than I could at least try to kill the salt job [11:56:02] <_joe_> getting in console [11:56:14] (03CR) 10Dzahn: "which source addresses does webstats-collector need to talk to?" [puppet] - 10https://gerrit.wikimedia.org/r/179434 (owner: 10Dzahn) [11:56:18] <_joe_> YuviPanda: the machine is up [11:56:49] <_joe_> but well, not exactly responsive [11:56:52] _joe_: machine *is* up, but can’t ssh in. responds to ping as well. we could also wait for a while to see the storm dies down... 
[11:56:53] yeah [11:56:57] <_joe_> I'll try to log in [11:57:05] (03CR) 10Dzahn: "re: webstats-collector. another one: https://gerrit.wikimedia.org/r/#/c/179434/" [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [11:57:18] (03CR) 10Alexandros Kosiaris: [C: 031] "I am always kind of undecided on this. I love the ability of both systemd and upstart (well, ok systemd's more) to catch non-logged messag" [debs/python-diamond] - 10https://gerrit.wikimedia.org/r/179423 (owner: 10Filippo Giunchedi) [11:58:00] _joe_: one of my earlier login attempts succeeded. let me try to stop apache [11:58:13] <_joe_> ok [11:58:35] for now my stop is just hanging, let’s see if it completes [11:59:08] or kill all the puppet agents on the clients with salt? [11:59:47] mutante: virt1000 is also the salt master. I tried that but it pretty much hung before I could finish that. or salt managed to send the command to some hosts but wasn’t enough? [12:00:38] (03CR) 10Alexandros Kosiaris: "$ALL_NETWORKS should be fine for now I think Daniel" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/179434 (owner: 10Dzahn) [12:01:06] YuviPanda: oh, right, same master of course :p [12:01:09] mutante: I saw the ::oxygen thing for udp2log too and I decided to erase it from my mind ... it didn't work :-( [12:01:40] akosiaris: it's confusing, also like webstats-collector is setup twice in a different place when i looked [12:01:44] hence the 2 changes :p [12:01:52] looking at the rsyncd part now [12:02:02] it already uses $hosts_allow btw [12:02:07] paravoid: wanna unban icinga-wm ? [12:02:53] not yet [12:02:57] _joe_: alright, managed to kill apache, and virt1000 is usable again. will restart it in a minute. 
[12:03:09] salt is still running [12:03:25] I batched it ;) [12:04:27] hehe :) [12:08:10] (03PS1) 10Filippo Giunchedi: codfw-prod: empty ms-be2013/2014/2015 sdm3/sdn3 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/179436 [12:08:12] (03PS1) 10Filippo Giunchedi: codfw-prod: put back weight on sdm/sdn for ms-be2013/14/15 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/179437 [12:08:34] (03PS1) 10Dzahn: udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 [12:09:52] (03PS2) 10Dzahn: udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 [12:10:10] paravoid: y’know, I’m thinking of just sending a salt command to symlink puppet-run to somewhere in cron’s path [12:10:21] and then unsymlink it a couple of hours later. [12:10:26] I thought of that too [12:10:33] but I thought, meh, they haven't run puppet anyway [12:10:37] so why not just run it myself [12:10:47] heh [12:10:51] but it’s gonna be a while, I guess. [12:11:21] (03PS3) 10Dzahn: udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 [12:12:24] (03CR) 10Dzahn: "re: rsync https://gerrit.wikimedia.org/r/#/c/179438/3" [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [12:15:51] (03PS2) 10Dzahn: kafkatee: webstats-collector, add ferm service [puppet] - 10https://gerrit.wikimedia.org/r/179429 [12:17:28] mutante: fwiw, you could add a rule to misc::udp2log::firewall that ACCEPTs everything tcp [12:17:29] (03PS2) 10Dzahn: role/logging: add ferm service webstats-collector [puppet] - 10https://gerrit.wikimedia.org/r/179434 [12:17:37] temporary [12:18:20] paravoid: I’m doing the lazy way, and doing the ln -s.. [12:19:20] paravoid: ah, that would be easier yea. 
i think we are getting closer already to actually find them all though [12:19:33] (03PS1) 10Faidon Liambotis: monitoring: do not warn for > 1 salt-minion procs [puppet] - 10https://gerrit.wikimedia.org/r/179440 [12:20:29] (03CR) 10jenkins-bot: [V: 04-1] monitoring: do not warn for > 1 salt-minion procs [puppet] - 10https://gerrit.wikimedia.org/r/179440 (owner: 10Faidon Liambotis) [12:22:01] (03Abandoned) 10Alexandros Kosiaris: apache-graceful-all: drop dsh, use salt [puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [12:22:36] (03CR) 10Alexandros Kosiaris: [C: 031] rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [12:22:47] (03CR) 10Dzahn: "i like how you are removing the "check_check_" kind of names. the jenkins fail is somehow surprising. does it fail on the ": "? does it ne" [puppet] - 10https://gerrit.wikimedia.org/r/179440 (owner: 10Faidon Liambotis) [12:23:09] (03PS2) 10Faidon Liambotis: monitoring: do not warn for > 1 salt-minion procs [puppet] - 10https://gerrit.wikimedia.org/r/179440 [12:24:27] (03CR) 10Faidon Liambotis: [C: 032] monitoring: do not warn for > 1 salt-minion procs [puppet] - 10https://gerrit.wikimedia.org/r/179440 (owner: 10Faidon Liambotis) [12:40:04] (03PS1) 10Dzahn: openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 [12:40:58] (03CR) 10jenkins-bot: [V: 04-1] openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 (owner: 10Dzahn) [12:42:32] (03PS2) 10Dzahn: openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 [12:43:29] (03CR) 10jenkins-bot: [V: 04-1] openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 (owner: 10Dzahn) [12:48:20] (03PS1) 10Dzahn: openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 [12:49:24] (03PS2) 10Dzahn: openstack-manager: re-add tmp removed 
mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 [12:51:45] (03PS3) 10Dzahn: openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 [12:52:33] (03CR) 10jenkins-bot: [V: 04-1] openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 (owner: 10Dzahn) [12:54:40] (03PS3) 10Giuseppe Lavagetto: hhvm: make the puppet module more configurable [puppet] - 10https://gerrit.wikimedia.org/r/179108 [12:55:35] (03CR) 10Dzahn: [C: 04-1] "Error: Could not find class role::apache::helper_scripts for tin.eqiad.wmnet on node tin.eqiad.wmnet uhm..." [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [12:57:56] Looks like I missed a lot of good stuff [12:58:04] (03PS4) 10Dzahn: openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 [12:58:53] (03CR) 10jenkins-bot: [V: 04-1] openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 (owner: 10Dzahn) [12:59:19] andrewbogott: is still ongoing :) puppet isn’t running on most labs host yet. [12:59:37] Is that related to the apt-get frontloading? Or something else? [13:00:13] andrewbogott: small oversight, cron job that runs puppet didn’t have path specified, so bam, after that was executed puppet doesn’t run anymore automatically [13:00:38] so I’m atm running a salt job to symlink the puppet-run script to /sbin so it will run [13:00:45] can remove later [13:00:53] what broke it in the first place? I didn't follow that part [13:00:59] (03PS1) 10Dzahn: openstack: firewall, enclose variables in {} [puppet] - 10https://gerrit.wikimedia.org/r/179446 [13:01:22] andrewbogott: https://gerrit.wikimedia.org/r/#/c/179431/ was the fix [13:01:55] YuviPanda: the fix I understand, not why it broke in the first place :) [13:02:18] andrewbogott: ah, I think paravoid was moving the location of the script that is run by cron, I think? 
[13:02:23] or putting it into a script in the first place? [13:02:48] moved apt-get update outside of puppet [13:02:48] Ah, makes sense. I didn't actually read that patch when it went by. [13:03:01] andrewbogott: https://gerrit.wikimedia.org/r/#/c/179082/ [13:04:44] (03PS5) 10Dzahn: openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 [13:05:34] that is a pretty big patch just to avoid cronspam :) [13:08:04] (03CR) 10Hashar: "Filippo, since you apparently did some tweak to our gdash files, would you mind reviewing this change and potentially get it deployed? Th" [puppet] - 10https://gerrit.wikimedia.org/r/166511 (https://bugzilla.wikimedia.org/65478) (owner: 10Nemo bis) [13:09:04] (03PS3) 10Dereckson: Namespaces configuration on kab.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179433 [13:10:32] (03CR) 10Dereckson: "PS3: The translation of talk namespaces now follows the same grammar rule than for user namespace, not the rule used for other misc namesp" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179433 (owner: 10Dereckson) [13:12:26] andrewbogott: heh [13:12:32] andrewbogott: any update on the labs machines? :) [13:13:38] Yeah, Alex has one of them installing. It's a one-off but at least there's a potential solution. [13:15:16] yay [13:15:21] YuviPanda: I forwarded you the email thread in case you're curious [13:16:29] (03CR) 10Andrew Bogott: [C: 032] openstack: firewall, enclose variables in {} [puppet] - 10https://gerrit.wikimedia.org/r/179446 (owner: 10Dzahn) [13:16:33] andrewbogott: I am! cool, will dig through alllll the shinken spam [13:17:45] (03CR) 10Andrew Bogott: [C: 031] "Looks right, but would appreciate Ori's review." 
[puppet] - 10https://gerrit.wikimedia.org/r/179444 (owner: 10Dzahn) [13:18:01] (03CR) 10Dzahn: "akosiaris: i don't get why the compiler fails on this one :p" [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [13:18:14] thanks andrew [13:18:17] (03CR) 10Hashar: "@faidon For CI https://gerrit.wikimedia.org/r/#/c/178806/ propose a contint::hhvm class which invokes ::hhvm with suitable settings. I ha" [puppet] - 10https://gerrit.wikimedia.org/r/179108 (owner: 10Giuseppe Lavagetto) [13:19:38] (03CR) 10Andrew Bogott: [C: 032] openstack-manager: fix compiler warnings [puppet] - 10https://gerrit.wikimedia.org/r/179442 (owner: 10Dzahn) [13:19:40] (03CR) 10Faidon Liambotis: "Thanks. I see no need for a configurable /var/log or /run there." [puppet] - 10https://gerrit.wikimedia.org/r/179108 (owner: 10Giuseppe Lavagetto) [13:24:09] mutante: thanks for all this work, if you wish, i'll finish my patch on sunday [13:25:52] matanya: :) yes [13:26:28] noted, i'll do, a bit swamped, but i'll spare some time for this [13:26:42] (03CR) 10Faidon Liambotis: [C: 04-1] hhvm: make the puppet module more configurable [puppet] - 10https://gerrit.wikimedia.org/r/179108 (owner: 10Giuseppe Lavagetto) [13:27:08] (03PS1) 10Dzahn: openstack-database-server: enclose variables [puppet] - 10https://gerrit.wikimedia.org/r/179452 [13:30:58] (03CR) 10Dzahn: [C: 031] change "default: none" to "default: default" for phabricator's security_topic field to fix the "changed none to none" bug. 
This fixes T479 [puppet] - 10https://gerrit.wikimedia.org/r/179407 (owner: 1020after4) [13:33:23] (03CR) 10Dzahn: [C: 031] add `pv` (pipe viewer) to base::standard-packages [puppet] - 10https://gerrit.wikimedia.org/r/179414 (owner: 10Ori.livneh) [13:36:23] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] codfw-prod: empty ms-be2013/2014/2015 sdm3/sdn3 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/179436 (owner: 10Filippo Giunchedi) [13:36:29] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] codfw-prod: put back weight on sdm/sdn for ms-be2013/14/15 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/179437 (owner: 10Filippo Giunchedi) [14:01:15] strace is definitely our best friend [14:01:24] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Puppet has 1 failures [14:01:53] open("/mnt\"/mnt/somefile\"", ...) = -1 ENOENT (No such file or directory) [14:02:12] had an env var which was: "/mnt/somefile" (including quotes) doh [14:05:22] !seen chasemp [14:09:29] @seen chasemp [14:09:35] damn bots [14:09:48] yea, it's @seen ..and it works in a PM with wm-bot [14:10:04] mutante: /msg nickserv info chasemp [14:10:07] yields 12 hours ago [14:11:05] hashar: that's good, but wm-bot leaks even more info [14:11:09] "is still in the channel ..." :p [14:11:52] but it lies, haha [14:12:30] ah no, it doesn't. it's smart about renames to "Guest" users [14:13:34] !log upload python-statsd 3.0.1-1 to trusty-wikimedia [14:13:43] Logged the message, Master [14:16:48] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:19:53] (03CR) 10Alexandros Kosiaris: "you have the helper_scripts in manifests/role/apache/helper_scripts. This is not a module to do autoload. site.pp has implicit import for " [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [14:30:04] (03CR) 10Filippo Giunchedi: "@Alex thanks for the feedback!" 
[debs/python-diamond] - 10https://gerrit.wikimedia.org/r/179423 (owner: 10Filippo Giunchedi) [14:32:49] (03CR) 10Filippo Giunchedi: [C: 031] "I can get it merged and deployed, assuming it is good to go?" [puppet] - 10https://gerrit.wikimedia.org/r/166511 (https://bugzilla.wikimedia.org/65478) (owner: 10Nemo bis) [14:36:16] (03PS17) 10Yuvipanda: Add script to generate config about _p viewdbs [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179110 [14:36:18] (03PS1) 10Yuvipanda: Add generated + hand curated tableschema.yaml [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179466 [14:36:35] (03CR) 10Yuvipanda: [C: 032 V: 032] Add script to generate config about _p viewdbs [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179110 (owner: 10Yuvipanda) [14:36:49] (03CR) 10Yuvipanda: [C: 032 V: 032] Add generated + hand curated tableschema.yaml [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/179466 (owner: 10Yuvipanda) [14:36:58] !log upload python-statsd 3.0.1-1 to precise-wikimedia [14:37:12] Logged the message, Master [14:40:20] godog: can you push it to Trusty as well? :) [14:40:28] godog: and again thank you for taking care of updating it [14:40:38] will upgrade the package on gallium / zuul server [14:40:51] hashar: yep it is there already [14:41:01] mutante: Around? [14:41:01] considering log /var/log/wikidatadump/dumpwikidatajson-0 [14:41:01] log does not need rotating [14:41:03] still fighting with hhvm configuration though :D [14:43:47] RECOVERY - gdash.wikimedia.org on graphite1002 is OK: HTTP OK: HTTP/1.1 200 OK - 9447 bytes in 0.046 second response time [14:45:39] ah, found it [14:45:39] doh [14:46:31] sudo logrotate -f /etc/logrotate.d/dumpwikidatajson [14:46:34] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [14:46:37] YuviPanda: ^ can you do that? 
[14:47:23] the problem exists because the log files are older than the logrotate [14:48:59] godog: is there a way to upgrade a single package? I am looking for: apt-get upgrade python-statsd [14:49:28] hashar: apt-get install python-statsd [14:49:36] of course [14:49:58] !log upgrading python-statsd on Zuul server and restarting service. [14:50:07] Logged the message, Master [14:51:14] /zuul/status.json: Internal Server Error [14:51:16] :-( [14:56:34] (03PS1) 10Giuseppe Lavagetto: varnish: fix scope warnings in templates [puppet] - 10https://gerrit.wikimedia.org/r/179468 [14:57:38] (03CR) 10Ottomata: "udp2log will be going away soon! (but I have been saying that for ever so you cannot trust me). :)" [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [14:58:06] (03PS1) 10Hoo man: Update entity suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179469 [14:58:45] PROBLEM - DPKG on analytics1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:59:01] PROBLEM - DPKG on analytics1033 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:59:11] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: Puppet has 1 failures [14:59:16] !log Zuul status page is no more. https://phabricator.wikimedia.org/T78400 [14:59:31] Logged the message, Master [14:59:34] (03CR) 10Sjoerddebruin: [C: 031] Update entity suggester blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179469 (owner: 10Hoo man) [15:00:56] (03CR) 10Ottomata: [C: 031] "Ok. 
FYI, the kafkatee stuff is not currently included anywhere, since we had some kernel panics on analytics1003 (a cisco, and our kafkat" [puppet] - 10https://gerrit.wikimedia.org/r/179429 (owner: 10Dzahn) [15:02:11] RECOVERY - DPKG on analytics1033 is OK: All packages OK [15:02:57] <_joe_> weird, the puppet compiler gives me an error that makes sense but is not present in production [15:03:13] <_joe_> Error: Function 'fail' does not return a value at /opt/wmf/software/compare-puppet-catalogs/external/change/179468/puppet/modules/apt/manifests/init.pp:39 [15:04:10] (03CR) 10Ottomata: [C: 031] role/logging: add ferm service webstats-collector [puppet] - 10https://gerrit.wikimedia.org/r/179434 (owner: 10Dzahn) [15:04:13] (03PS10) 10Hashar: contint: provision hhvm on CI slaves [puppet] - 10https://gerrit.wikimedia.org/r/178806 [15:05:00] (03CR) 10Ottomata: [C: 031] udp2log: rsync, add ferm service for rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/179438 (owner: 10Dzahn) [15:07:31] (03CR) 10Hashar: "So finally I found a way. PS10 skips local repo entirely and set central repo to an empty path. Thus hhvm fall back to the HHVM_CENTRAL_RE" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [15:11:26] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:13:37] !log Zuul Reverting Zuul back to wmf-deploy-20141030-4 . I previously reverted it to another change which was wrong. [15:13:44] Logged the message, Master [15:13:55] RECOVERY - DPKG on analytics1011 is OK: All packages OK [15:16:23] <_joe_> aha! gotcha. 
[15:20:51] (03PS1) 10Giuseppe Lavagetto: apt: fix for failure case [puppet] - 10https://gerrit.wikimedia.org/r/179472 [15:24:32] (03CR) 10Alexandros Kosiaris: [C: 032] role/logging: add ferm service webstats-collector [puppet] - 10https://gerrit.wikimedia.org/r/179434 (owner: 10Dzahn) [15:25:00] (03CR) 10Giuseppe Lavagetto: "see:" [puppet] - 10https://gerrit.wikimedia.org/r/179472 (owner: 10Giuseppe Lavagetto) [15:25:51] godog: so Zuul works with statsd 3.0.0 apparently. Though the unit tests fail ;] [15:25:56] godog: will fix them next week [15:26:07] PROBLEM - Disk space on ms-be2009 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdi1 is not accessible: Input/output error [15:26:51] hashar: oh? fail hard? [15:26:57] PROBLEM - RAID on ms-be2009 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [15:27:07] (03CR) 10Alexandros Kosiaris: [C: 032] kafkatee: webstats-collector, add ferm service [puppet] - 10https://gerrit.wikimedia.org/r/179429 (owner: 10Dzahn) [15:28:53] paravoid, akosiaris, After disabling the 10g nics I look to be getting a good OS install. Thank you for your help! [15:30:21] (03PS2) 10Giuseppe Lavagetto: varnish: fix scope warnings in templates [puppet] - 10https://gerrit.wikimedia.org/r/179468 [15:30:56] andrewbogott: don't forget to document it [15:32:09] RECOVERY - Disk space on ms-be2009 is OK: DISK OK [15:34:48] godog: yeah that is just the tests [15:34:58] godog: will look at them on monday, I filed a task as a reminder [15:35:11] godog: Zuul seems to run happily with statsd 3.0.1 ;] [15:35:34] hashar: perfect! yeah despite the version bump the git log wasn't massive [15:38:16] (03CR) 10Hashar: "And I crafted a basic Jenkins job that use that setup, run hhvm and bump the .hhbc files created. 
https://integration.wikimedia.org/ci/job" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [15:38:53] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [15:38:55] (03PS1) 10Ottomata: Don't ensure $hadoop_journal_directory directory in role class, this is already done in journalnode.pp in cdh module [puppet] - 10https://gerrit.wikimedia.org/r/179474 [15:39:09] (03PS2) 10Ottomata: Don't ensure $hadoop_journal_directory directory in role class, this is already done in journalnode.pp in cdh module [puppet] - 10https://gerrit.wikimedia.org/r/179474 [15:39:20] (03CR) 10Ottomata: [C: 032 V: 032] Don't ensure $hadoop_journal_directory directory in role class, this is already done in journalnode.pp in cdh module [puppet] - 10https://gerrit.wikimedia.org/r/179474 (owner: 10Ottomata) [15:43:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] udp2log: rsync, add ferm service for rsyncd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/179438 (owner: 10Dzahn) [15:46:05] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Added initial Debian packaging [debs/contenttranslation/apertium-en-ca] - 10https://gerrit.wikimedia.org/r/179117 (owner: 10KartikMistry) [15:46:25] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 1 failures [15:47:30] ACKNOWLEDGEMENT - RAID on ms-be2009 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) Filippo Giunchedi sdi failed [15:48:00] ACKNOWLEDGEMENT - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi sdi failed [15:54:28] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:56:32] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[16:06:47] <_joe_> ottomata: an unmerged change on palladium :) [16:08:19] (03PS3) 10Giuseppe Lavagetto: varnish: fix scope warnings in templates [puppet] - 10https://gerrit.wikimedia.org/r/179468 [16:08:24] oh [16:08:31] ah sorry, forgot because that change only affects labs [16:08:36] thanks [16:08:40] fmerged [16:08:53] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [16:11:01] (03CR) 10Giuseppe Lavagetto: [C: 032] "Noop according to the puppet compiler." [puppet] - 10https://gerrit.wikimedia.org/r/179468 (owner: 10Giuseppe Lavagetto) [16:15:18] (03PS1) 10BryanDavis: logstash: Parse apache syslog messages [puppet] - 10https://gerrit.wikimedia.org/r/179480 [16:19:42] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [16:20:43] so much log noise from sloppy array index access -- https://phabricator.wikimedia.org/P152 [16:27:21] (03CR) 10Alexandros Kosiaris: [C: 031 V: 032] "Seems OK, minor questions inline" (033 comments) [debs/contenttranslation/hfst] - 10https://gerrit.wikimedia.org/r/179153 (owner: 10KartikMistry) [16:35:14] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:36:18] (03PS1) 10Giuseppe Lavagetto: exim: fix compilation warnings [puppet] - 10https://gerrit.wikimedia.org/r/179485 [16:37:40] (03CR) 10Manybubbles: Basic rspec setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [16:42:00] (03CR) 10Hashar: Basic rspec setup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/178810 (owner: 10Hashar) [16:42:07] !log uploaded apertium-sv-da, apertium-en-ca to apt.wikimedia.org [16:42:23] Logged the message, Master [16:45:46] PROBLEM - Varnish traffic logger on cp4015 is CRITICAL: PROCS CRITICAL: 1 process with command name varnishncsa [16:52:12] (03PS1) 10Alexandros Kosiaris: Delete the install-server::caching-proxy class [puppet] - 
10https://gerrit.wikimedia.org/r/179487 [17:01:20] RECOVERY - Varnish traffic logger on cp4015 is OK: PROCS OK: 2 processes with command name varnishncsa [17:06:08] anything change with hhvm or redis lately? [17:06:14] _joe_: ^ (not sure who else to ping) [17:06:26] <_joe_> greg-g: why? [17:06:28] we are having some issues on Beta Cluster, related to those things (probably) [17:06:39] <_joe_> greg-g: not that I know of [17:07:14] <_joe_> also, hhvm and redis are quite disjointed [17:07:31] right, just the two things that are producing errors [17:07:37] https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default [17:08:04] (03CR) 10Ori.livneh: [C: 031] openstack-manager: re-add tmp removed mwdeploy user [puppet] - 10https://gerrit.wikimedia.org/r/179444 (owner: 10Dzahn) [17:08:20] <_joe_> greg-g: I guess someone else can take a look? it's 6 PM here [17:08:30] greg-g: hi... do you have a moment to talk about your (probably) favorite topic: Friday deploys [17:09:47] hoo: given beta cluster isn't up right now, not really [17:09:47] (03PS1) 10Alexandros Kosiaris: Remove module url_downloader [puppet] - 10https://gerrit.wikimedia.org/r/179488 [17:11:06] It's not... 
meh :/ [17:11:10] Not that important anyway [17:11:31] Especially as aude seems to not be around right now [17:14:01] (PS2) Ori.livneh: add `pv` (pipe viewer) to base::standard-packages [puppet] - https://gerrit.wikimedia.org/r/179414 [17:14:16] <_joe_> greg-g: I'm taking a look at beta [17:14:48] _joe_: bd808 killed hhvm, which fixed it [17:14:54] (CR) Ori.livneh: [C: 2 V: 2] add `pv` (pipe viewer) to base::standard-packages [puppet] - https://gerrit.wikimedia.org/r/179414 (owner: Ori.livneh) [17:15:06] _joe_: well "fixed" [17:15:23] <_joe_> greg-g: mh I do see quite a lot of timeouts connecting to redis [17:15:30] We still have a huge amount of memcached error messages too [17:15:36] what's been done so far: [17:15:36] 17:12 bd808: restarted hhvm on deployment-mediawiki0[12] and purged hhbc database [17:15:36] <_joe_> which looks like a network problem more than an hhvm problem [17:15:39] 17:00 bd808: restarted apache2 on deployment-mediawiki01 [17:15:42] 16:59 bd808: restarted apache2 on deployment-mediawiki02 [17:16:34] <_joe_> oh the windows way, ok. [17:16:47] <_joe_> I don't see errors on the backends since quite some time [17:16:54] _joe_: yeah. :/ I'm that kind of fixer. [17:17:16] <_joe_> bd808: eheh no problem, it's just I can't try to figure out where the failure was [17:17:28] _joe_: given that no one in Release Engineering can have the breadth of knowledge of 16 opsen... yeah, sometimes we just restart things.
[17:18:05] _joe_: From the apache side -- [proxy_fcgi:error] [pid 15111] [client 10.68.16.12:9259] AH01079: failed to make connection to backend: 127.0.0.1 [17:18:09] over and over and over [17:18:25] <_joe_> bd808: that is a bug in apache [17:18:30] <_joe_> it's a bogus message [17:18:32] so apache and hhvm had decided not to talk to each other [17:18:39] It went away when I restarted hhvm [17:18:39] <_joe_> bd808: no that's not it [17:18:50] <_joe_> oh sorry [17:18:56] <_joe_> "failed to make a connection" [17:19:01] <_joe_> no that is an error [17:19:16] <_joe_> that means that hhvm had its thread pool full, and the queue as well [17:19:37] <_joe_> curl localhost:9002/check-health can help in those cases [17:20:13] does that work when the apache child pool is full? [17:20:32] (CR) Legoktm: [C: 1] Introduce wmgUseMonologLogger feature flag [mediawiki-config] - https://gerrit.wikimedia.org/r/179368 (owner: BryanDavis) [17:20:57] <_joe_> well, if hhvm refuses connection, apache will return a 503 [17:22:59] RECOVERY - RAID on virt1005 is OK: OK: Active: 14, Working: 14, Failed: 0, Spare: 0 [17:26:30] (CR) Legoktm: "logging.php needs a symlink and to be added to createTxtFileSymlinks.sh I think." [mediawiki-config] - https://gerrit.wikimedia.org/r/179369 (owner: BryanDavis) [17:26:47] (CR) Legoktm: [C: 1] Enable MWLoggerMonologSpi for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/179370 (owner: BryanDavis) [17:28:52] _joe_: Is there a way to debug settings from Hiera:Deployment-prep not being used by puppet? It looks like sometime after 2014-12-12T11:41 the settings I have there for syslog forwarding stopped working.
[17:29:12] * bd808 will force a puppet run and see what happens this time [17:29:35] assume related to: [17:29:35] 12:20 < YuviPanda> bd808: hashar so the puppet cron was broken in a commit, a followup fixed it but this means that puppet won’t auto run until it’s forced to run (with an up to date ops/puppet) at least once [17:29:55] <_joe_> bd808: mhhh maybe restarting the puppet master with --debug would help [17:30:26] hiera not working explains all the redis and memcached spam too [17:31:10] <_joe_> bd808: hiera depends on wikitech responding, in fact [17:31:15] looks like forcing a new run is fixing it. transient I guess [17:31:33] but that's a huge failure mode for beta [17:32:09] yeah, and we had a wikitech outage for about… 5 mins? [17:36:53] YuviPanda, _joe_: Is there some way we could make puppet fail hard if it can't get the mwyaml data? [17:37:04] (PS1) Alexandros Kosiaris: Add README, RSpecs and tests for squid3 module [puppet] - https://gerrit.wikimedia.org/r/179493 [17:37:16] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [17:37:22] setting the beta app servers to point at prod resources is not such a great thing [17:38:12] I think there was a report of 503 on login at eswiki in #wikipedia [17:38:17] About 40 minutes ago [17:38:57] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [17:39:28] The guy said he couldn't log in since the 7th [17:40:14] Well, he said "access", or Google Translate did, but the access URL was for special:login I think [17:53:30] marktraceur: https://phabricator.wikimedia.org/T75462 [17:54:46] Thanks legoktm, if he comes back I'll give that to him.
[17:55:12] ori: heh, adding pv to base broke toollabs, since pv is already installed there [17:55:13] * YuviPanda fixes [17:55:28] YuviPanda: oops, thanks [17:55:56] (PS1) Yuvipanda: tools: Remove pv from exec_environ, is provided by base now [puppet] - https://gerrit.wikimedia.org/r/179497 [17:56:43] (CR) Yuvipanda: [C: 2] tools: Remove pv from exec_environ, is provided by base now [puppet] - https://gerrit.wikimedia.org/r/179497 (owner: Yuvipanda) [17:57:44] (PS2) BryanDavis: logstash: Parse apache syslog messages [puppet] - https://gerrit.wikimedia.org/r/179480 [17:57:53] (PS1) GWicke: Bump ulimit slightly [puppet] - https://gerrit.wikimedia.org/r/179499 [17:58:28] (CR) Alexandros Kosiaris: [C: -1] "Comments here and there" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/178810 (owner: Hashar) [18:00:56] (CR) Alexandros Kosiaris: Basic rspec setup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/178810 (owner: Hashar) [18:01:48] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 2 failures [18:03:16] ori: https://gerrit.wikimedia.org/r/179499 [18:03:57] what now, virt1000 [18:05:17] (PS1) Ori.livneh: xenon profiler: omit empty stacks; log to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/179501 [18:06:32] andrewbogott_afk: hmm, so mysql on virt seems a bit fucked up. access denied for root...
[18:06:40] (CR) Ottomata: [C: 2] Point stats and datasets .wikmedia.org at misc-web-lb [dns] - https://gerrit.wikimedia.org/r/179215 (owner: Ottomata) [18:06:46] gwicke: seems ok to me, but this is the sort of change that ought to be merged by someone in ops -- not because it's controversial, but because it's a way of keeping everyone on the same page [18:06:56] (PS2) Ori.livneh: Bump ulimit slightly [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:07:01] (CR) Ori.livneh: [C: 1] Bump ulimit slightly [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:07:36] (CR) Ori.livneh: [C: 2] xenon profiler: omit empty stacks; log to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/179501 (owner: Ori.livneh) [18:08:11] ori: okay, fair enough [18:08:44] (Merged) jenkins-bot: xenon profiler: omit empty stacks; log to fluorine [mediawiki-config] - https://gerrit.wikimedia.org/r/179501 (owner: Ori.livneh) [18:08:50] YuviPanda, if you have a moment: https://gerrit.wikimedia.org/r/#/c/179499/ [18:09:59] gwicke: 100k to 1 million is ‘slightly’? :) [18:10:36] maybe the quotes don't render properly on your screen? [18:10:42] ;) [18:10:45] (PS1) Ottomata: Remove SSL configs for stats.wikmedia.org [puppet] - https://gerrit.wikimedia.org/r/179502 [18:11:01] (PS3) Yuvipanda: Bump ulimit [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:12:17] (CR) Yuvipanda: [C: 2] Bump ulimit [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:12:50] YuviPanda: thank you! [18:13:13] gwicke: yw! [18:13:37] !log ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 08s) [18:13:42] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:13:44] Logged the message, Master [18:13:57] YuviPanda: Go my ping earlier on?
[18:13:59] (PS2) Ottomata: Remove SSL configs for stats.wikmedia.org [puppet] - https://gerrit.wikimedia.org/r/179502 [18:14:00] * Got [18:14:04] hoo: oh, no? [18:14:35] hoo: just saw [18:14:38] hoo: on snapshot1003? [18:14:42] yep [18:14:46] (PS3) Ottomata: Remove SSL configs for stats.wikmedia.org [puppet] - https://gerrit.wikimedia.org/r/179502 [18:14:54] cool [18:15:01] I should have added the logrotate when I introduced the log :P [18:15:06] done [18:15:17] !log ran sudo logrotate -f /etc/logrotate.d/dumpwikidatajson on snapshot1003 for hoo [18:15:18] Awesome, thanks :) [18:15:23] yw! [18:15:24] Logged the message, Master [18:16:12] (PS1) Ori.livneh: hhvm: don't dump heap profile on exit [puppet] - https://gerrit.wikimedia.org/r/179503 [18:16:28] (CR) Ottomata: [C: 2] Remove SSL configs for stats.wikmedia.org [puppet] - https://gerrit.wikimedia.org/r/179502 (owner: Ottomata) [18:16:32] (CR) Ori.livneh: [C: 2 V: 2] hhvm: don't dump heap profile on exit [puppet] - https://gerrit.wikimedia.org/r/179503 (owner: Ori.livneh) [18:16:48] ottomata1: ok to puppet-merge? [18:16:50] yup [18:16:51] do it [18:16:54] was about to ask you the same :) [18:17:08] done [18:18:10] (PS1) Ottomata: Remove community-analytics site, it is not used [puppet] - https://gerrit.wikimedia.org/r/179504 [18:21:01] (CR) Ottomata: [C: 2] Remove community-analytics site, it is not used [puppet] - https://gerrit.wikimedia.org/r/179504 (owner: Ottomata) [18:22:47] hello [18:22:57] is there going on some server side mass upload on commons? [18:23:17] hello?
[18:23:21] emergency [18:23:22] (change visibility) 18:21, 12 December 2014 Steinsplitter (talk | contribs | block) blocked 1Veertje (talk | contribs) with an expiry time of 2 hours (autoblock disabled) (script/bot out of control, see AN/U) (unblock | change block) [18:23:29] but uploads are still flooding [18:23:56] That's me [18:24:14] aborted [18:24:27] thanks [18:24:47] What was the problem? [18:25:07] https://commons.wikimedia.org/wiki/Category:Media_requiring_renaming [18:25:14] (PS1) Ottomata: Remove include of non existient community_analytics class [puppet] - https://gerrit.wikimedia.org/r/179505 [18:25:53] Aw, I see [18:25:56] https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard/User_problems#User:1Veertje [18:25:57] meh [18:26:03] (CR) Ottomata: [C: 2] Remove include of non existient community_analytics class [puppet] - https://gerrit.wikimedia.org/r/179505 (owner: Ottomata) [18:26:09] (CR) Ottomata: [V: 2] Remove include of non existient community_analytics class [puppet] - https://gerrit.wikimedia.org/r/179505 (owner: Ottomata) [18:27:52] (PS1) BryanDavis: beta: Log !log messages from #wikimedia-qa [puppet] - https://gerrit.wikimedia.org/r/179507 [18:28:55] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [18:29:47] (CR) Greg Grossmeier: [C: 1] beta: Log !log messages from #wikimedia-qa [puppet] - https://gerrit.wikimedia.org/r/179507 (owner: BryanDavis) [18:32:38] (CR) Yuvipanda: "This broke toollabs puppet https://dpaste.de/8hAA" [puppet] - https://gerrit.wikimedia.org/r/179083 (owner: Faidon Liambotis) [18:32:59] paravoid: ^ [18:33:19] I’m not fully sure what that means.
[18:33:40] but I think this is why I didn’t just use the -dev when I added packages differently [18:34:01] (CR) Ori.livneh: [C: 2] mediawiki::monitoring::webserver: provision `apachetop` [puppet] - https://gerrit.wikimedia.org/r/178740 (owner: Ori.livneh) [18:34:04] (CR) Filippo Giunchedi: "FWIW, 10x increase per process seem excessive to me and it would bump into the system limit anyway. What figures have you seen so far btw?" [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:35:32] (PS1) Yuvipanda: tools: Fix libboost-python-dev [puppet] - https://gerrit.wikimedia.org/r/179508 [18:36:32] (CR) BryanDavis: "Tested in beta via cherry-pick. Events captured to logstash -- " [puppet] - https://gerrit.wikimedia.org/r/179507 (owner: BryanDavis) [18:37:09] (PS2) Yuvipanda: tools: Fix libboost-python-dev [puppet] - https://gerrit.wikimedia.org/r/179508 [18:37:27] (PS3) Yuvipanda: tools: Fix libboost-python-dev [puppet] - https://gerrit.wikimedia.org/r/179508 [18:38:26] (CR) Yuvipanda: [C: 2] tools: Fix libboost-python-dev [puppet] - https://gerrit.wikimedia.org/r/179508 (owner: Yuvipanda) [18:39:16] (CR) GWicke: "Actual usage hovers around 4k connections, but can spike higher when backend services back up." [puppet] - https://gerrit.wikimedia.org/r/179499 (owner: GWicke) [18:41:19] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:46:28] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 1 failures [18:52:38] (PS1) MaxSem: Don't collapse sections on mobile WD [mediawiki-config] - https://gerrit.wikimedia.org/r/179513 [18:55:12] (CR) Awight: "@Tim: I'd like to see some of that, too.
I know the FR-creative team has been collecting actual data this year, hopefully we'll soon have" [mediawiki-config] - https://gerrit.wikimedia.org/r/177278 (owner: Ejegg) [18:58:43] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:59:53] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [19:15:00] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:46:56] !log initiating kafka preferred-replica-election to bring analytics1021 back in to leadership :/ need to figure this out, or replace this node soon. [19:47:33] Logged the message, Master [19:48:53] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [19:51:23] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 7869.03383407 [19:56:05] (PS3) BryanDavis: logstash: Parse apache syslog messages [puppet] - https://gerrit.wikimedia.org/r/179480 [20:01:57] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [20:03:32] (CR) BryanDavis: "Testing via cherry-pick in beta."
[puppet] - https://gerrit.wikimedia.org/r/179480 (owner: BryanDavis) [20:04:32] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [20:08:06] (PS2) BryanDavis: Optional MWLoggerMonologSpi configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/179369 [20:09:09] wikitech is down -- (Cannot contact the database server: Too many connections (208.80.154.18)) [20:09:32] works for me :P [20:09:41] me now too [20:10:00] grammar are hard [20:14:04] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:25:31] (PS1) BryanDavis: Fix StartProfiler undef variable warning [mediawiki-config] - https://gerrit.wikimedia.org/r/179539 [20:25:52] ori: ^ fixes "Notice: Undefined variable: stacks in /srv/mediawiki/wmf-config/StartProfiler.php on line 122" I think [20:27:50] (CR) BryanDavis: "Top errors in fatal log for the last hour are:" [mediawiki-config] - https://gerrit.wikimedia.org/r/179539 (owner: BryanDavis) [20:41:52] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [20:53:52] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:34:46] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [21:46:26] (PS1) Ori.livneh: xenon profiling: fix undefined notice; decouple from mw [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 [21:47:04] (CR) Ori.livneh: [C: 2] Fix StartProfiler undef variable warning [mediawiki-config] - https://gerrit.wikimedia.org/r/179539 (owner: BryanDavis) [21:47:10] (Merged) jenkins-bot: Fix StartProfiler undef variable warning [mediawiki-config] - https://gerrit.wikimedia.org/r/179539 (owner: BryanDavis) [21:47:33] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:49:08] (PS2) Ori.livneh: xenon profiling: fix undefined notice; decouple from mw [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 [21:50:00] (PS3) Ori.livneh: Tweaks for StartProfiler.php [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 [21:50:04] (CR) jenkins-bot: [V: -1] Tweaks for StartProfiler.php [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 (owner: Ori.livneh) [21:50:37] (PS4) Ori.livneh: Tweaks for Xenon-based profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 [22:05:14] (CR) BryanDavis: "xhprof_enable() is called unconditionally by our Xhprof class, so if it's not present and the profiler is started it will fail hard. You c" [mediawiki-config] - https://gerrit.wikimedia.org/r/178591 (owner: Aaron Schulz) [22:05:37] (CR) Ori.livneh: [C: 2] Tweaks for Xenon-based profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 (owner: Ori.livneh) [22:05:44] (Merged) jenkins-bot: Tweaks for Xenon-based profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/179556 (owner: Ori.livneh) [22:06:41] !log ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 06s) [22:06:46] Logged the message, Master [22:09:02] (CR) Aaron Schulz: "I'll amend later. The hosts are just in labs afaik, so the check could go in that part." [mediawiki-config] - https://gerrit.wikimedia.org/r/178591 (owner: Aaron Schulz) [22:16:50] Anyone know why beta-scap-eqiad is getting permission denied on deployment-rsync01.eqiad.wmflabs? [22:18:10] James_F: It's a transient failure we've been seeing for the last week or so [22:18:31] bd808: Ah, OK. Fun! https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ doesn't look healthy because of it. [22:18:33] I haven't been able to figure out if it's dns, ldap or something else flaking out [22:18:45] * James_F nods. [22:18:59] ugh.
not so transient today apparently [22:19:28] We're currently 30 patches behind master on Beta Labs. :-( [22:19:43] (Including one I care about, otherwise I wouldn't have noticed. :-) [22:20:29] <^d> James_F: I think we have a template for this... [22:20:33] <^d> {{sofixit}}? ;-) [22:20:55] ^d: I have no shell in deployment-prep. Do you? :-) [22:21:05] <^d> I can give you that!! [22:21:10] Argh. [22:21:18] I shouldn't have said anything. ;-) [22:21:29] James_F: syncing manually to see if I can figure it out [22:21:37] bd808: Thanks! [22:21:40] James_F: And you should have shell there [22:21:44] and fix things [22:21:53] Oh dear. [22:21:55] or at least help figure out what's broken [22:21:59] * James_F nods. [22:22:11] If I can help, I will. [22:22:29] rsync was wayyyyy out of date [22:32:16] James_F: I think I may have fixed a problem. There was a broken partial rsync on deployment-rsync01 that I cleaned up. [22:32:57] bd808: Aha. Awesome. [22:33:25] bd808: Certainly, the code is now up-to-date. Will keep an eye on it. [22:34:29] We have that job set to only notify on irc on the first failure or I would have noticed that it have been consistently failing for hours [22:48:01] (PS1) BryanDavis: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [mediawiki-config] - https://gerrit.wikimedia.org/r/179566 [23:13:01] (CR) MaxSem: [C: 1] Set wgTranslateTranslationServices['TTMServer']['cutoff'] [mediawiki-config] - https://gerrit.wikimedia.org/r/179566 (owner: BryanDavis) [23:15:01] (PS1) Dr0ptp4kt: Update outbound X-CS behavior in light of unified [puppet] - https://gerrit.wikimedia.org/r/179571 [23:23:49] bblack: when you have a moment, would you please take a look at ^ ? i understand no merge-n-deploy today