[00:00:05] RoanKattouw, ^d, marktraceur, MaxSem, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141218T0000). Please do the needful. [00:00:05] chasemp, andre__: Respected human, time to deploy RT Migration (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141218T0000). Please do the needful. [00:00:15] (03CR) 10MaxSem: [C: 032] Turn off Main Page special casing on en.wiki Beta Labs for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180376 (owner: 10Kaldari) [00:00:25] alright [00:00:30] (03Merged) 10jenkins-bot: Turn off Main Page special casing on en.wiki Beta Labs for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180376 (owner: 10Kaldari) [00:00:43] Is this thing on? It's unusually quiet tonight. [00:01:07] andre__: can you invite coren to that chat room for migration? [00:01:28] if that means a private text message: yeah [00:01:40] oops wrong ident [00:01:50] !log maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/180376/ - no-op for prod (duration: 00m 06s) [00:01:53] Logged the message, Master [00:02:38] MaxSem, there's a last minute one for SWAT. [00:03:21] It's on the calendar now. I'll do the submodule bump. [00:04:56] is mw1152 being reinstalled? I'm getting mismatching remote key errors [00:05:15] <^d> You're the third person to notice & complain :) [00:05:49] BAAAH BAAAAH @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ [00:06:00] "IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!" [00:06:12] LIKELY, even, these days. [00:06:48] 08:43 _joe_: depooling mw1152, reimaging as an HAT jobrunner [00:07:15] <^d> Yes, that was like 8 hours ago :) [00:09:17] I wonder if puppet was ever re-enabled on tin -- 2014-12-16 23:31 ori: disabled puppet on tin [00:09:31] no, i never re-enabled it [00:09:32] i'll do that now [00:09:47] that would keep the keys from updating there [00:10:02] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet last ran 1 day ago [00:16:37] (03CR) 10TTO: "Probably also remove the logos from InitialiseSettings. The closed sites will be fine with just the default Wikibooks logo, no need for co" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180451 (owner: 10Dzahn) [00:19:32] !log maxsem Synchronized php-1.25wmf13/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180677/ (duration: 00m 08s) [00:19:37] superm401, ^ [00:19:39] Logged the message, Master [00:20:30] Thanks, MaxSem [00:22:10] (03PS1) 10Dereckson: Namespace configuration on sa.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180684 [00:24:30] !log Restarting Jenkins to remove a deadlock on deployment-bastion slave [00:24:34] Logged the message, Master [00:28:46] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:31:11] That change requires an i18n update, but it's not really urgent since it's not in a very visible part of the interface.
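Context for the 00:04-00:10 exchange above: mw1152 had been reimaged as an HAT jobrunner earlier that day, so its SSH host key changed, and with puppet still disabled on tin the puppet-managed known_hosts there never picked up the new key. A minimal sketch of the cleanup being discussed; the puppet agent flags are standard, and the .eqiad.wmnet FQDN is an assumption rather than something taken from the log:

    # On tin: re-enable puppet and force a run so the managed ssh_known_hosts
    # learns mw1152's new host key.
    sudo puppet agent --enable
    sudo puppet agent --test

    # Or drop the stale entry from a per-user known_hosts by hand
    # (hostname/FQDN assumed for illustration).
    ssh-keygen -R mw1152.eqiad.wmnet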
[00:36:42] (03PS1) 10Springle: repool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180686 [00:37:36] (03CR) 10Springle: [C: 032] repool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180686 (owner: 10Springle) [00:38:32] !log springle Synchronized wmf-config/db-eqiad.php: repool db1066, warm up (duration: 00m 05s) [00:38:39] Logged the message, Master [01:05:33] PROBLEM - HHVM busy threads on mw1191 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [01:06:20] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0] [01:06:37] mw1191 having problems ^^ [01:21:46] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [80.0] [01:22:15] !log mw1191 restarted hhvm, apparently stuck in futex [01:22:23] Logged the message, Master [01:22:35] 3000% cpu heh [01:24:17] (03PS1) 10Springle: depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180693 [01:24:44] (03CR) 10Springle: [C: 032 V: 032] depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180693 (owner: 10Springle) [01:25:35] !log springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s) [01:25:38] Logged the message, Master [01:30:36] RECOVERY - HHVM busy threads on mw1191 is OK: OK: Less than 30.00% above the threshold [76.8] [01:31:07] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [01:42:26] !log ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ibb29a825c: mediawiki.action.edit.stash: set timeout to 4 seconds (duration: 00m 05s) [01:42:31] Logged the message, Master [01:43:17] ori, make it a conf variable? [02:09:21] (03PS1) 10Ori.livneh: Fix bug in xenon-log [puppet] - 10https://gerrit.wikimedia.org/r/180707 [02:09:44] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix bug in xenon-log [puppet] - 10https://gerrit.wikimedia.org/r/180707 (owner: 10Ori.livneh) [02:57:47] (03PS1) 10Chad: Remove $wmgUseCirrusAsAlternative [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180712 [03:01:10] (03PS2) 10Aaron Schulz: Simplified profiler config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178591 [03:07:25] (03CR) 10Aaron Schulz: [C: 04-2] Simplified profiler config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178591 (owner: 10Aaron Schulz) [03:45:07] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 23 failures [03:46:47] ps [03:59:57] whoami [04:12:50] PROBLEM - puppet last run on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:13:03] Fiona, Fiona [04:13:19] Fiona, Max? [04:16:25] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [04:28:50] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [04:29:32] Who's Max? 
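On the mw1191 incident above (busy-thread and queue-size alerts, "apparently stuck in futex", "3000% cpu"): the fix taken was simply restarting HHVM. A rough sketch of how one might confirm that picture before restarting; only the hostname and the service come from the log, the commands themselves are assumptions:

    # Group HHVM threads by state and kernel wait channel; many threads
    # parked on futex waits would match the observation in the log.
    ps -L -o stat,wchan:24 -p "$(pgrep -o hhvm)" | sort | uniq -c | sort -rn

    # Per-thread CPU view (the "3000% cpu" figure is the sum across threads).
    top -b -n 1 -H -p "$(pgrep -o hhvm)" | head -40

    # If it really is wedged, restart the service, as was done here.
    sudo service hhvm restart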
[04:29:48] (03PS1) 10Springle: depool db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180717 [04:30:13] (03CR) 10Springle: [C: 032 V: 032] depool db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180717 (owner: 10Springle) [04:33:07] (03PS12) 10Andrew Bogott: Gzip SVGs on back upload varnishes [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: 10Ori.livneh) [04:33:26] PROBLEM - SSH on mw1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:33:56] PROBLEM - puppet last run on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:33:57] PROBLEM - nutcracker process on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:18] PROBLEM - configured eth on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:19] PROBLEM - nutcracker port on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:19] PROBLEM - dhclient process on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:49] PROBLEM - salt-minion processes on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:58] PROBLEM - RAID on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:30] !log springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 04m 19s) [04:35:37] Logged the message, Master [04:37:08] RECOVERY - nutcracker process on mw1011 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [04:38:02] andrewbogott: Did you see my comments about the systemd issue? [04:38:23] Coren: just your email -- was there more? [04:39:20] andrewbogott: Sadly, my research hasn't pinpointed a usable fix that I can see; but there are suggestions that there are some recent versions of the lvm package that work around the issue (or makes it better in some way) [04:39:28] RECOVERY - SSH on mw1011 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [04:40:06] PROBLEM - configured eth on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:40:07] PROBLEM - RAID on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:40:15] Coren: any idea if that package is frozen in Jessie or if fixes might yet make it in? [04:40:27] RECOVERY - configured eth on mw1011 is OK: NRPE: Unable to read output [04:40:39] RECOVERY - nutcracker port on mw1011 is OK: TCP OK - 0.000 second response time on port 11212 [04:40:45] RECOVERY - dhclient process on mw1011 is OK: PROCS OK: 0 processes with command name dhclient [04:40:58] PROBLEM - RAID on mw1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:41:03] (03PS5) 10Andrew Bogott: Gzip .svg and .ico files on bits.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: 10Brion VIBBER) [04:41:05] (03PS13) 10Andrew Bogott: Gzip SVGs on back upload varnishes. [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: 10Ori.livneh) [04:41:11] RECOVERY - salt-minion processes on mw1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [04:41:11] RECOVERY - RAID on mw1011 is OK: OK: no RAID installed [04:42:06] Coren: an alternative that matanya suggested is using btrfs in those images. I'd have to hack on bootstrap-vz quite a bit to support that, and then we'd need different puppet classes for partitioning the different image types, which I don't love. 
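The db1055 depool above follows the usual mediawiki-config pattern seen throughout this log: merge the change in Gerrit, pull it onto the deploy host, and sync just that file, which is what produces the "Synchronized wmf-config/db-eqiad.php: depool db1055" line. A sketch of that flow under the assumption that the 2014-era scap tooling and staging path looked roughly like this:

    # On the deploy host (tin); the staging path is an assumption.
    cd /srv/mediawiki-staging
    git pull                                           # pick up the merged change (180717 here)
    sync-file wmf-config/db-eqiad.php 'depool db1055'  # produces the "Synchronized ..." !log entry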
[04:42:20] I did see a forum post with a workaround for at least one example of the issue we're having. Lemme see... [04:43:10] RECOVERY - configured eth on mw1009 is OK: NRPE: Unable to read output [04:43:30] RECOVERY - RAID on mw1009 is OK: OK: no RAID installed [04:43:50] "I just edited my fstab for the /var partition changing the UUID to /dev/mapper/vgArchie-lvolVar and everything boots up fine now" [04:43:54] RECOVERY - RAID on mw1013 is OK: OK: no RAID installed [04:43:58] why are the job runners freaking out? [04:44:00] No idea if that's the same issue though [04:48:15] WTH is going on. I see people coming in and out of the channel but I doubt nobody said anything in hours. [04:48:44] Coren: going on with what? [04:48:57] the wmf christmas party is tonight [04:49:01] so a lot of folks are out [04:51:25] ori: maybe Coren is haunting this channel and can talk but not hear? [04:51:34] I just emailed him to ask :) [04:51:44] heh :) [04:52:34] PROBLEM - configured eth on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:52:54] PROBLEM - RAID on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:53:22] Coren: any better now? [04:53:24] Ping? [04:53:29] Pong! [04:53:38] Yep. I was desync'ed somehow. [04:53:44] Odd. Other channels worked fine. [04:54:16] It's too bad that you were suspicious, otherwise you could've spent days just assuming that everything is perfect... [04:54:30] Hah. [04:54:46] Here's what you missed: [04:54:49] andrewbogott: Coren: any idea if that package is frozen in Jessie or if fixes might yet make it in? [04:55:06] andrewbogott: Coren: an alternative that matanya suggested is using btrfs in those images. I'd have to hack on bootstrap-vz quite a bit to support that, and then we'd need different puppet classes for partitioning the different image types, which I don't love. [04:55:07] [12:42pm] andrewbogott: I did see a forum post with a workaround for at least one example of the issue we're having. Lemme see... [04:55:08] [12:43pm] [04:55:16] andrewbogott: "I just edited my fstab for the /var partition changing the UUID to /dev/mapper/vgArchie-lvolVar and everything boots up fine now" [04:55:17] [12:43pm] andrewbogott: https://bbs.archlinux.org/viewtopic.php?id=149351 [04:55:18] [12:43pm] [04:55:31] RECOVERY - configured eth on mw1009 is OK: NRPE: Unable to read output [04:55:57] andrewbogott: No idea about freezes, I know Ubuntu's process but I'm not sure where I'd have to look for debian. [04:55:58] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [04:56:00] RECOVERY - RAID on mw1009 is OK: OK: no RAID installed [04:57:56] But the UUID issue is different (even if the symptom is the same) -- the idea is that by the time systemd decides the devices aren't there the lvm volumes aren't yet visible -- either because the scanning process never got really started (seen in arch linux, FC) or because it failed (seen in gentoo). FC also has issues because initramfs, but we're not using that. [04:59:32] We need to get some serious systemd-fu anyways; I've a feeling this isn't going to be the last time it bites us. [05:00:36] PROBLEM - puppet last run on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:01:21] Any thoughts about what I should do about that image in the meantime? I can build one without lvm, it'll just have a small fixed-size / partition.
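The Arch forum workaround quoted above (and re-quoted at 04:55) amounts to pointing /etc/fstab at the device-mapper path instead of the filesystem UUID, so the mount no longer depends on the UUID symlink being resolvable by the time systemd processes the mount unit. Roughly, with an illustrative UUID and the VG/LV names taken from the forum post rather than from the labs image:

    # /etc/fstab -- before: mount by UUID (value illustrative only)
    UUID=0f3c1a2b-1111-4aaa-bbbb-cccccccccccc  /var  ext4  defaults  0  2
    # after: mount by the explicit device-mapper path, as in the quoted workaround
    /dev/mapper/vgArchie-lvolVar               /var  ext4  defaults  0  2

Whether that sidesteps the "LVs not yet visible" race here is unclear, as noted above ("No idea if that's the same issue though").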
[05:01:31] I guess that'll allow people to get working on other issues, so there's no reason not to do that. [05:01:37] I just wanted it to be perfect :) [05:01:47] i think i know what is up with the job runners [05:01:51] and it may be my fault [05:01:54] i'm investigating [05:02:03] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Jobrunners+eqiad&m=cpu_report&s=by+name&mc=2&g=mem_report is not a happy graph [05:02:14] Well, we really want to have LVM anyways; we got plenty of manifests that rely on being able to allocate disk space. [05:02:22] possibly related to https://gerrit.wikimedia.org/r/#/c/180385/ [05:02:36] Coren: yeah, but there's not any clear way to move forward on this is there? [05:02:43] Or are you still poking? [05:02:55] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 5 failures [05:03:51] andrewbogott: My first instinct is to try to get some extra info on what is going on during bootstrap. The solution might be as simple as tweaking some of the systemd config. Or there might be trickery allowing us to change the ordering of mount attempts I've seen hints to. [05:04:08] Right now, I'm reading systemd documentation and trying to get my head around it. [05:05:15] Coren: ok. I'm going to make a wiki page with the step-by-step for building new images so that you can test when I'm away [05:05:25] Since presumably you don't have a lot of awake time left tonight. [05:07:25] Heh. Certainly none left at full brain capacity. [05:08:08] Are there any tests I can run in the meantime? We're already getting quite complete console logs on wikitech which I'm pleased about [05:08:30] braaaaiiiiiins. no wait, wrong holiday. [05:10:21] andrewbogott: If you can find a way to run something /really/ early in the systemd bootstrap (like, before it even tries to mount anything beyond /) the output of 'lvscan -b -v' would be really helpful. [05:10:54] I take it that rc.local is too late? [05:11:00] hm... [05:11:00] Also, much of the similar problems I've heard of are reported intermittent (timeout/race issues?) - I take it you already tried rebooting the ailing instances a couple times? [05:11:23] I haven't tried that too much, but will try now [05:11:28] Yeah, sysvinit-style would have rc.sysinit early enough. [05:11:37] (03PS1) 10Rush: phab add rt mail routes to projects [puppet] - 10https://gerrit.wikimedia.org/r/180720 [05:12:18] I can certainly copy an arbitrary script into the image, if you know what I should copy or where I should put it... [05:12:34] Not yet - that's why I'm reading docs. :-) [05:12:41] 'k :) [05:12:49] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [05:13:36] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: Puppet has 1 failures [05:14:23] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:37] (03PS1) 10Rush: rt: add aliases for queue mail redirects [puppet] - 10https://gerrit.wikimedia.org/r/180721 [05:15:27] PROBLEM - salt-minion processes on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:30] PROBLEM - nutcracker process on mw1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
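On running something earlier than rc.local to capture 'lvscan -b -v': one approach (an untested sketch, not something from the log; the unit name and log path are made up) is a oneshot unit with DefaultDependencies=no ordered before local-fs-pre.target, so it runs before systemd attempts the LVM mounts and writes to /run, which is available that early:

    # /etc/systemd/system/early-lvscan.service, baked into the image being built
    [Unit]
    Description=Capture lvscan output before local mounts
    DefaultDependencies=no
    Before=local-fs-pre.target

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c '/sbin/lvscan -b -v > /run/early-lvscan.log 2>&1'

    [Install]
    WantedBy=sysinit.target

    # then, inside the image:
    #   systemctl enable early-lvscan.service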
[05:18:29] RECOVERY - salt-minion processes on mw1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [05:18:30] RECOVERY - nutcracker process on mw1009 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [05:18:47] (03CR) 10Rush: [C: 032] phab add rt mail routes to projects [puppet] - 10https://gerrit.wikimedia.org/r/180720 (owner: 10Rush) [05:19:00] (03CR) 10Rush: [C: 032] rt: add aliases for queue mail redirects [puppet] - 10https://gerrit.wikimedia.org/r/180721 (owner: 10Rush) [05:20:33] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [05:22:13] Coren: https://wikitech.wikimedia.org/wiki/OpenStack#Building_a_Debian_image [05:22:45] Also: so far reboots are giving the same repeated behavior. I'll try a few more to be sure. [05:23:02] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [05:23:16] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 1 failures [05:25:00] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [05:25:21] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 18 failures [05:26:14] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [05:27:01] ori: I am (obviously) not very tuned in to the job runner thing, but I'm around to review if you have a fix. [05:27:51] Coren: yeah, "Welcome to emergency mode!" on every reboot [05:30:31] "fun" [05:31:21] Yeah, they should really change that message to "Welcome to emergency mode :( " [05:34:13] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:40:53] (03PS1) 10KartikMistry: WIP: Beta: Add support for more language pairs [puppet] - 10https://gerrit.wikimedia.org/r/180724 [05:46:28] (03CR) 10Andrew Bogott: "I set aside the issue I was working on when you wrote this, but I will definitely want this tool when I return." 
[puppet] - 10https://gerrit.wikimedia.org/r/175153 (owner: 10Ori.livneh) [05:48:55] (03PS5) 10Andrew Bogott: Get betalabs localsettings.js file from deploy repo (just like prod) [puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [05:51:01] (03CR) 10Andrew Bogott: [C: 032] Get betalabs localsettings.js file from deploy repo (just like prod) [puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [06:22:21] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: Puppet has 3 failures [06:22:22] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: puppet fail [06:22:38] PROBLEM - puppetmaster backend https on palladium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error [06:22:57] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [06:22:57] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 34 failures [06:22:57] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: puppet fail [06:23:12] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Puppet has 27 failures [06:23:28] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 19 failures [06:23:28] PROBLEM - puppet last run on labsdb1002 is CRITICAL: CRITICAL: Puppet has 23 failures [06:23:28] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 25 failures [06:23:28] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: puppet fail [06:23:29] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [06:23:36] <_joe|justawake> !log restarted the puppetmaster on palladium [06:23:44] Logged the message, Master [06:23:49] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: Puppet has 17 failures [06:23:49] PROBLEM - puppet last run on search1003 is CRITICAL: CRITICAL: Puppet has 55 failures [06:23:49] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: puppet fail [06:24:03] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 30 failures [06:24:05] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: puppet fail [06:24:07] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: puppet fail [06:24:08] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: Puppet has 20 failures [06:24:14] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 4 failures [06:24:24] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 30 failures [06:24:30] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Puppet has 26 failures [06:24:30] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: Puppet has 69 failures [06:24:30] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: Puppet has 23 failures [06:24:40] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [06:24:41] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 26 failures [06:24:41] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: puppet fail [06:24:41] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: puppet fail [06:24:46] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 54 failures [06:24:46] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: Puppet has 54 failures [06:25:06] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: puppet fail [06:25:08] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 77 failures [06:25:08] PROBLEM - puppet 
last run on db1019 is CRITICAL: CRITICAL: puppet fail [06:25:08] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: puppet fail [06:25:08] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 16 failures [06:25:08] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: Puppet has 57 failures [06:25:10] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 72 failures [06:25:20] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: puppet fail [06:25:20] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 26 failures [06:25:20] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 35 failures [06:25:20] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: puppet fail [06:25:21] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 63 failures [06:25:21] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: puppet fail [06:25:21] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: puppet fail [06:25:21] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: puppet fail [06:25:31] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 67 failures [06:25:35] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail [06:25:36] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: puppet fail [06:25:46] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Puppet has 24 failures [06:25:47] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: puppet fail [06:25:50] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [06:25:50] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [06:25:51] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [06:25:52] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 16 failures [06:25:52] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 27 failures [06:25:53] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 65 failures [06:26:04] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: puppet fail [06:26:04] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 21 failures [06:26:05] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: Puppet has 69 failures [06:26:05] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: puppet fail [06:26:05] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 20 failures [06:26:05] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [06:26:05] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: puppet fail [06:26:06] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 59 failures [06:26:17] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: Puppet has 15 failures [06:26:17] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: puppet fail [06:26:17] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Puppet has 24 failures [06:26:17] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Puppet has 23 failures [06:26:17] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: puppet fail [06:26:18] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [06:26:29] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: puppet fail [06:26:32] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: puppet fail [06:26:33] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: 
Puppet has 26 failures [06:26:33] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 26 failures [06:26:33] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 25 failures [06:26:33] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: Puppet has 14 failures [06:26:33] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: Puppet has 21 failures [06:26:33] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail [06:26:34] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: puppet fail [06:26:34] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Puppet has 24 failures [06:26:35] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail [06:26:35] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 26 failures [06:26:36] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 29 failures [06:26:45] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: Puppet has 23 failures [06:26:45] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: puppet fail [06:26:45] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: puppet fail [06:26:45] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Puppet has 22 failures [06:26:45] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: Puppet has 25 failures [06:26:45] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: Puppet has 82 failures [06:26:46] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: puppet fail [06:26:46] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: puppet fail [06:26:46] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: puppet fail [06:26:47] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 78 failures [06:26:54] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 75 failures [06:26:54] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: puppet fail [06:26:54] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: Puppet has 19 failures [06:26:54] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: puppet fail [06:26:59] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: puppet fail [06:26:59] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 73 failures [06:27:00] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet has 30 failures [06:27:00] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: puppet fail [06:27:08] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: puppet fail [06:27:08] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: puppet fail [06:27:09] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 73 failures [06:27:09] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 22 failures [06:27:15] PROBLEM - puppet last run on es1006 is CRITICAL: CRITICAL: puppet fail [06:27:19] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Puppet has 26 failures [06:27:20] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: Puppet has 25 failures [06:27:20] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 72 failures [06:27:20] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: puppet fail [06:27:20] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail [06:27:20] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 75 failures [06:27:20] PROBLEM - puppet last run on search1014 is 
CRITICAL: CRITICAL: puppet fail [06:27:21] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: puppet fail [06:27:21] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 24 failures [06:27:48] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: puppet fail [06:27:48] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: puppet fail [06:27:49] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 80 failures [06:27:49] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: puppet fail [06:27:49] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: Puppet has 21 failures [06:27:49] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: puppet fail [06:27:57] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 31 failures [06:27:59] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: puppet fail [06:27:59] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: puppet fail [06:27:59] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail [06:28:07] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: puppet fail [06:28:07] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: puppet fail [06:28:07] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: puppet fail [06:28:19] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: puppet fail [06:28:19] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 75 failures [06:28:19] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet has 28 failures [06:28:19] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [06:28:19] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Puppet has 20 failures [06:28:19] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: puppet fail [06:28:24] <_joe|justawake> it's gonna stop shortly I hope [06:28:27] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: puppet fail [06:28:33] <_joe|justawake> the puppetmaster on palladium is working now [06:28:42] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: puppet fail [06:28:46] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 72 failures [06:28:46] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 66 failures [06:28:46] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: puppet fail [06:28:46] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: puppet fail [06:28:46] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: puppet fail [06:28:46] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 72 failures [06:28:46] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Puppet has 22 failures [06:28:47] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet has 30 failures [06:28:47] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: puppet fail [06:28:54] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: puppet fail [06:28:57] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 30 failures [06:29:04] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 21 failures [06:29:05] PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: puppet fail [06:29:06] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 27 failures [06:29:07] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 68 failures [06:29:07] PROBLEM - puppet 
last run on mw1132 is CRITICAL: CRITICAL: Puppet has 77 failures [06:29:07] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 22 failures [06:29:29] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 63 failures [06:29:29] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [06:29:29] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: puppet fail [06:29:29] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Puppet has 26 failures [06:29:29] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: puppet fail [06:29:30] PROBLEM - puppet last run on db1010 is CRITICAL: CRITICAL: puppet fail [06:29:30] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: puppet fail [06:29:32] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: puppet fail [06:29:32] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: puppet fail [06:29:38] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: Puppet has 30 failures [06:29:48] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: puppet fail [06:29:49] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: puppet fail [06:29:49] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: puppet fail [06:29:49] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 73 failures [06:29:49] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: Puppet has 66 failures [06:29:49] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 70 failures [06:29:57] PROBLEM - puppet last run on es1001 is CRITICAL: CRITICAL: puppet fail [06:30:01] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail [06:30:04] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: puppet fail [06:30:04] PROBLEM - puppet last run on search1020 is CRITICAL: CRITICAL: puppet fail [06:30:07] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: puppet fail [06:30:07] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Puppet has 25 failures [06:30:07] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 74 failures [06:30:08] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [06:30:08] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: puppet fail [06:30:08] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 71 failures [06:30:08] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: puppet fail [06:30:09] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail [06:30:09] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 74 failures [06:30:10] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Puppet has 19 failures [06:30:20] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: puppet fail [06:30:20] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: puppet fail [06:30:20] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: Puppet has 68 failures [06:30:20] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Puppet has 70 failures [06:30:20] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 21 failures [06:30:21] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: puppet fail [06:30:21] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.035 second response time [06:30:22] PROBLEM - puppet last run on wtp1009 is CRITICAL: CRITICAL: Puppet has 24 failures 
[06:30:22] PROBLEM - puppet last run on mw1059 is CRITICAL: CRITICAL: puppet fail [06:30:23] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Puppet has 17 failures [06:30:24] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 26 failures [06:30:24] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 18 failures [06:30:24] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Puppet has 31 failures [06:30:25] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [06:30:25] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail [06:30:26] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [06:30:26] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail [06:30:27] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: puppet fail [06:30:27] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Puppet has 35 failures [06:30:28] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: puppet fail [06:30:28] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Puppet has 24 failures [06:30:29] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet has 59 failures [06:30:29] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Puppet has 32 failures [06:30:37] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: puppet fail [06:30:39] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [06:30:40] PROBLEM - puppet last run on elastic1007 is CRITICAL: CRITICAL: puppet fail [06:30:40] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 59 failures [06:30:40] PROBLEM - puppet last run on mw1045 is CRITICAL: CRITICAL: Puppet has 79 failures [06:30:40] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: puppet fail [06:30:51] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: puppet fail [06:30:53] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: puppet fail [06:30:53] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 56 failures [06:30:54] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [06:30:54] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: puppet fail [06:30:54] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Puppet has 70 failures [06:30:54] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: puppet fail [06:30:54] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: puppet fail [06:30:55] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: puppet fail [06:30:55] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: Puppet has 27 failures [06:30:56] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [06:30:56] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [06:30:57] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: puppet fail [06:30:57] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [06:31:02] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: puppet fail [06:31:03] PROBLEM - puppet last run on analytics1025 is CRITICAL: CRITICAL: Puppet has 19 failures [06:31:08] PROBLEM - puppet last run on mw1063 is CRITICAL: CRITICAL: puppet fail [06:31:08] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: puppet fail [06:31:08] PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: Puppet has 77 failures [06:31:08] PROBLEM - puppet last run on search1016 is 
CRITICAL: CRITICAL: puppet fail [06:31:08] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: Puppet has 83 failures [06:31:08] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: puppet fail [06:31:10] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: puppet fail [06:31:17] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: puppet fail [06:31:27] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: Puppet has 52 failures [06:31:28] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Puppet has 28 failures [06:31:29] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [06:31:34] (03PS2) 10KartikMistry: WIP: Beta: Add support for more language pairs [puppet] - 10https://gerrit.wikimedia.org/r/180724 [06:31:41] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: Puppet has 59 failures [06:31:42] PROBLEM - puppet last run on rbf1002 is CRITICAL: CRITICAL: Puppet has 37 failures [06:31:58] Phabricator reports it's alive. This is good, I expect. [06:32:03] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: Puppet has 17 failures [06:32:08] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: puppet fail [06:32:08] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: Puppet has 82 failures [06:32:08] PROBLEM - puppet last run on neptunium is CRITICAL: CRITICAL: puppet fail [06:32:08] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: Puppet has 20 failures [06:32:08] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: puppet fail [06:32:17] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: puppet fail [06:32:19] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Puppet has 23 failures [06:32:22] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: puppet fail [06:32:23] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 34 failures [06:32:27] Unlike the broken puppets. Anyone on that? 
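On the broken puppets: the run failures trace back to the puppetmaster backend check on palladium that went critical at 06:22 ("Invalid HTTP response ... port 8141: HTTP/1.1 500 Internal Server Error"); _joe_ restarted the puppetmaster, and the backend check recovered at 06:30 with a 400, which is its healthy response to a bare request. A quick way to reproduce what that check sees, with the FQDN assumed for illustration:

    # 400 = healthy (matches the 06:30 recovery line), 500 = backend wedged.
    curl -sk -o /dev/null -w '%{http_code}\n' https://palladium.eqiad.wmnet:8141/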
[06:32:28] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: Puppet has 20 failures [06:32:29] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: Puppet has 20 failures [06:32:29] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: puppet fail [06:32:40] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: puppet fail [06:32:45] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: puppet fail [06:32:51] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: puppet fail [06:32:52] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: puppet fail [06:32:54] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: Puppet has 28 failures [06:32:54] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: puppet fail [06:32:54] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: puppet fail [06:32:54] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: Puppet has 22 failures [06:32:54] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: puppet fail [06:32:55] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail [06:32:55] PROBLEM - puppet last run on gold is CRITICAL: CRITICAL: puppet fail [06:32:56] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: puppet fail [06:33:07] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [06:33:13] PROBLEM - puppet last run on analytics1017 is CRITICAL: CRITICAL: Puppet has 20 failures [06:33:20] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Puppet has 23 failures [06:33:20] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: Puppet has 72 failures [06:33:20] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: Puppet has 71 failures [06:33:21] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: puppet fail [06:33:38] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: puppet fail [06:33:38] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: Puppet has 20 failures [06:33:38] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Puppet has 22 failures [06:33:38] PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: puppet fail [06:33:53] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: puppet fail [06:33:55] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail [06:34:13] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: Puppet has 11 failures [06:34:15] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [06:34:16] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Puppet has 18 failures [06:34:16] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [06:34:16] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [06:34:24] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [06:34:24] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:24] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: puppet fail [06:34:24] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: puppet fail [06:34:34] PROBLEM - puppet last run on elastic1001 is CRITICAL: CRITICAL: Puppet has 22 failures [06:34:35] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: puppet fail [06:34:46] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: Puppet has 95 failures [06:34:46] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 31 failures [06:34:59] (03PS1) 10Rush: RT: fix 
redirect emails for old queues [puppet] - 10https://gerrit.wikimedia.org/r/180725 [06:35:25] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:45] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:14] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:36:20] (03PS2) 10Rush: RT: fix redirect emails for old queues [puppet] - 10https://gerrit.wikimedia.org/r/180725 [06:36:24] (03CR) 10Rush: [C: 032] RT: fix redirect emails for old queues [puppet] - 10https://gerrit.wikimedia.org/r/180725 (owner: 10Rush) [06:36:39] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:49] (03CR) 10Rush: [V: 032] RT: fix redirect emails for old queues [puppet] - 10https://gerrit.wikimedia.org/r/180725 (owner: 10Rush) [06:36:54] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:37:07] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:37:10] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [06:37:17] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:37:17] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:37:18] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:37:18] RECOVERY - puppet last run on es1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:37:25] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:37:30] RECOVERY - puppet last run on mc1015 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:37:31] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:37:39] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:37:46] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:37:46] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:54] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:37:59] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:37:59] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:37:59] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:00] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:38:00] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:38:00] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:38:00] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:38:01] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 4 
failures [06:38:01] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:38:05] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:38:05] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:38:05] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [06:38:14] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:38:14] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:38:15] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:38:15] RECOVERY - puppet last run on elastic1009 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:38:15] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:38:24] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:38:25] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:38:25] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:38:26] RECOVERY - puppet last run on labsdb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:38:26] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:38:26] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:38:26] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [06:38:34] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:38:38] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:38:38] RECOVERY - puppet last run on search1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:38:44] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:38:48] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:38:48] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:38:54] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:38:55] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:38:55] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:05] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:06] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:39:24] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [06:39:28] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:34] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:39:34] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:39:34] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:34] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:45] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:46] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:52] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:39:55] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:55] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:55] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:39:55] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:39:55] RECOVERY - puppet last run on search1019 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:39:56] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:56] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:39:57] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:39:57] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:39:58] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:39:58] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:05] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:06] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:06] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:40:07] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:07] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:07] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:16] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:20] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:22] 
RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:22] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:26] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:26] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:26] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:40:26] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:26] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:27] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:40:27] RECOVERY - puppet last run on db1010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:40:36] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:40:37] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:38] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:38] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:38] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [06:40:46] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:40:49] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:40:49] RECOVERY - puppet last run on search1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:40:58] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:10] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:10] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:10] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:10] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:10] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:11] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:41:11] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:18] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:41:19] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:19] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 
failures [06:41:19] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:19] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:41:28] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:41:31] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:41:31] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:34] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:36] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:41:36] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:41:36] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:41:38] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:41:38] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:38] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:41:49] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:49] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:54] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:41:54] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:54] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:54] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:41:54] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:41:55] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:41:55] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:56] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:41:56] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:57] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:41:57] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:41:58] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:41:58] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:41:59] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently 
enabled, last run 59 seconds ago with 0 failures [06:41:59] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:00] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:42:10] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:42:11] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:12] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:12] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:18] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:42:21] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:42:21] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:28] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [06:42:28] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:28] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:28] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:29] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:29] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:29] RECOVERY - puppet last run on es1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:29] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:43] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:45] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:45] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:45] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:46] RECOVERY - puppet last run on mw1063 is OK: 
OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:42:46] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:46] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:46] RECOVERY - puppet last run on search1014 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:42:47] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:47] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:48] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:42:48] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:49] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:42:59] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:43:00] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:00] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:43:00] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [06:43:00] RECOVERY - puppet last run on rbf1002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:43:00] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:43:10] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:10] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:43:11] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:43:19] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:24] RECOVERY - puppet last run on neptunium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:43:25] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:25] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:43:28] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:43:28] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:43:28] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:40] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:43:41] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:42] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:43:42] RECOVERY - 
puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:43:42] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:43:42] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:42] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:43:42] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:43:43] RECOVERY - puppet last run on labstore2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:43:43] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:43:44] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:43:44] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:43:45] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:00] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:44:00] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:00] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:00] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:00] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:01] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:01] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:02] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:02] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:12] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:12] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:44:12] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:12] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:12] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:44:12] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:44:12] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:13] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:13] RECOVERY - puppet last run on gold is OK: OK: Puppet is currently enabled, last run 1 
minute ago with 0 failures [06:44:18] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:24] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:24] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:44:24] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:24] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:24] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:24] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:24] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:25] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:44:28] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:44:29] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:29] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:29] RECOVERY - puppet last run on pc1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:40] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:44:41] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:43] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:44:45] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:00] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:00] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:00] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:00] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:08] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:08] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:08] RECOVERY - puppet last run on mw1224 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:09] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:09] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:24] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:24] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently 
enabled, last run 3 minutes ago with 0 failures [06:45:28] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:28] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:28] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:28] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:28] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:40] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:40] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:40] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:40] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:40] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:40] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:41] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:41] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:42] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:42] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:43] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:43] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:44] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:44] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:49] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:49] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:45:50] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:45:51] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:51] RECOVERY - puppet last run on elastic1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:51] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:51] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:51] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:51] RECOVERY - 
puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:45:59] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:46:02] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:02] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:03] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:03] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:03] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:46:03] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:46:03] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:11] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:46:11] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:19] RECOVERY - puppet last run on analytics1025 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:22] RECOVERY - puppet last run on search1016 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:22] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:22] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:46:30] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:46:38] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:47:03] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:24] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:47] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:48:02] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:48:03] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:05] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:49:20] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:49:25] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [06:57:09] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 
failures [06:58:09] (03PS1) 10Rush: RT: update phab aliases [puppet] - 10https://gerrit.wikimedia.org/r/180727 [06:58:18] (03CR) 10jenkins-bot: [V: 04-1] RT: update phab aliases [puppet] - 10https://gerrit.wikimedia.org/r/180727 (owner: 10Rush) [06:58:52] (03PS2) 10Rush: RT: update phab aliases [puppet] - 10https://gerrit.wikimedia.org/r/180727 [06:59:14] (03CR) 10Rush: [C: 032 V: 032] RT: update phab aliases [puppet] - 10https://gerrit.wikimedia.org/r/180727 (owner: 10Rush) [07:03:01] (03PS1) 10Rush: phab: update direct mail routes [puppet] - 10https://gerrit.wikimedia.org/r/180728 [07:03:11] (03PS2) 10Rush: phab: update direct mail routes [puppet] - 10https://gerrit.wikimedia.org/r/180728 [07:04:50] (03CR) 10Rush: [C: 032] phab: update direct mail routes [puppet] - 10https://gerrit.wikimedia.org/r/180728 (owner: 10Rush) [07:15:27] (03CR) 10Steinsplitter: [C: 031] Removed 'OTRS-member' user group on commons. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180560 (owner: 10Dereckson) [07:52:39] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [07:56:14] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:56:44] _joe_: let me know when you have time to talk about hhvm/tools [07:56:50] toollabs that is [07:58:15] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [07:58:55] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [07:59:38] <_joe_> YuviPanda: today I have to do a cleanup of the RT queue on phabricator, rebuild HHVM, backport java 8 to trusty [07:59:48] <_joe_> I'm sure I have plenty of time for you :) [08:00:03] <_joe_> jokes aside, what about in 1 hour? [08:00:06] _joe_: heh :D This isn’t very high priority at all, so whenever :) [08:00:08] _joe_: sure! [08:00:16] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:00:35] _joe_: also let me know how java8 on trusty goes, there have been requests for it on tools as well [08:01:02] <_joe_> YuviPanda: oh so I could outsource that to you [08:01:06] <_joe_> [08:01:18] _joe_: :P hahaha, yeaaaaah, no :P [08:05:48] RECOVERY - DPKG on lanthanum is OK: All packages OK [08:17:34] greetings [08:19:55] <_joe_> ciao godog [08:20:24] ciao _joe_ [08:23:05] buongiorno a tutti [08:25:44] <_joe_> hashar: my step-daughter thinks escargots are gross. I failed as an educator. [08:26:13] bonjour hashar [08:27:54] _joe_: well it is mostly butter and garlic anyway. You should send her to a chef camp here in france :D [08:28:19] <_joe_> hashar: we eat snails differently [08:28:48] don't tell me you boil them with sugar and some mint sauce! [08:30:04] anyway on a more serious note, I have some patch pending to get hhvm deployed on the ci slaves [08:30:13] with ensure => latest (evil me) [08:30:43] Faidon pointed at Debian unattended upgrade system which I discovered overnight. Do we have any puppet integration for that? [08:32:25] <_joe_> hashar: no idea [08:32:37] <_joe_> I'm doing the human traffic light right now [08:32:46] sounds like jenkins [08:32:51] green red green red green red [08:32:53] <_joe_> re-assigning tickets [08:33:09] <_joe_> hashar: this ticket to ops A, this other one to ops B [08:33:40] paravoid: do you have some spare cycles to talk about hhvm => latest on ci slaves? 
https://gerrit.wikimedia.org/r/#/c/178806/ :D [08:33:48] what's to talk about? [08:33:50] don't do that :) [08:34:21] (03PS1) 10Faidon Liambotis: misc::maintainenance: fix a bunch of cronspam [puppet] - 10https://gerrit.wikimedia.org/r/180733 [08:34:32] <_joe_> \o/ [08:34:36] well our point is to avoid having to connect on each slaves to apt-get install hhvm manually each time a new package is pushed [08:34:41] hence delegating to puppet :D [08:34:53] (03CR) 10Faidon Liambotis: [C: 032] misc::maintainenance: fix a bunch of cronspam [puppet] - 10https://gerrit.wikimedia.org/r/180733 (owner: 10Faidon Liambotis) [08:35:24] pfft typo [08:35:38] (03PS2) 10Faidon Liambotis: misc::maintenance: fix a bunch of cronspam [puppet] - 10https://gerrit.wikimedia.org/r/180733 [08:40:22] PROBLEM - HHVM busy threads on mw1191 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [115.2] [08:40:58] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [80.0] [08:41:21] <_joe_> looks like it's stuck [08:41:29] <_joe_> well, later I'll take a look [08:47:35] (03PS1) 10Faidon Liambotis: puppetmaster: fix logrotate for cronspam [puppet] - 10https://gerrit.wikimedia.org/r/180734 [08:47:37] (03PS1) 10Faidon Liambotis: contint: fix qunit apache config syntax error [puppet] - 10https://gerrit.wikimedia.org/r/180735 [08:48:01] (03CR) 10Faidon Liambotis: [C: 032 V: 032] puppetmaster: fix logrotate for cronspam [puppet] - 10https://gerrit.wikimedia.org/r/180734 (owner: 10Faidon Liambotis) [08:48:20] (03CR) 10Faidon Liambotis: [C: 032 V: 032] contint: fix qunit apache config syntax error [puppet] - 10https://gerrit.wikimedia.org/r/180735 (owner: 10Faidon Liambotis) [08:56:16] (03CR) 10Hoo man: "Any news on this?" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [08:56:24] (03CR) 10TTO: "What are the contents of interwiki-labs.cdb? Moreover, what's the reason for this change?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175755 (owner: 10Reedy) [08:56:56] (03CR) 10TTO: "Ignore that, didn't see this has a task attached" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175755 (owner: 10Reedy) [08:59:38] (03PS1) 10Andrew Bogott: Removed --no-cache from our nova commandline. [puppet] - 10https://gerrit.wikimedia.org/r/180736 [09:01:07] (03CR) 10Andrew Bogott: [C: 032] Removed --no-cache from our nova commandline. [puppet] - 10https://gerrit.wikimedia.org/r/180736 (owner: 10Andrew Bogott) [09:02:43] godog: is there a way for me to tell if a given version of a package is finalized in jessie or still subject to change? We're running into bugs with the lvm2 version, I'm hoping that it'll just get fixed on its own if I wait. [09:02:52] (Bug is rumored to be fixed in recent versions.) [09:04:47] andrewbogott: every package is potentially up to be fixed until jessie is released, depends on how severe the bug is affecting the package [09:05:17] godog: ok. Is there a specific bugbase where I can log a 'please patch this' request? [09:06:08] andrewbogott: yep http://bugs.debian.org/src:lvm2 is a good start to find the bug [09:06:15] (03CR) 10Hashar: "I am fine dropping the $ensure_packages from the hhvm class. I have discovered Debian unattended upgrade functionality overnight, but I ha" [puppet] - 10https://gerrit.wikimedia.org/r/178806 (owner: 10Hashar) [09:07:05] _joe_: any chance to get the new hhvm package available today? :D [09:07:25] godog: thanks, I will read. [09:10:13] andrewbogott: lvm issues in jessie I take it?
it is indeed a much more recent version than ubuntu's (bumped into that when trying out dm-cache) [09:10:47] godog: yeah -- pretty much no matter what we do we bounce into 'Emergency mode!' when lvm starts up [09:11:06] I don't know a lot, though, Coren was doing most of the research [09:11:30] <_joe_> hashar: *maybe* [09:11:54] <_joe_> we have the RT migration completed and I'm on clinic duty [09:12:06] :-( [09:12:28] andrewbogott: I see, no never came across that yet [09:13:07] godog: in case you enjoy syslogs, here it is: https://tools.wmflabs.org/paste/view/0dd299e2 [09:13:11] !log enabled MediaWiki core 'structure' PHPUnit tests for all extensions. Will require folks to fix their incorrect AutoLoader and RessourceLoader entries. {{gerrit|180496}} {{bug|T78798}} [09:13:14] Logged the message, Master [09:16:37] andrewbogott: doesn't ring a bell sorry :( as a data point though my home workstation has jessie with systemd on an encrypted root with lvm on top (and it boots) [09:16:54] godog: ok, so it's at least possible for it to work :) [09:17:57] yep, it isn't horribly broken at least for that use case [09:28:45] RECOVERY - RAID on heze is OK: OK: optimal, 1 logical, 12 physical [09:28:54] RECOVERY - configured eth on heze is OK: NRPE: Unable to read output [09:29:20] RECOVERY - dhclient process on heze is OK: PROCS OK: 0 processes with command name dhclient [09:30:51] RECOVERY - DPKG on heze is OK: All packages OK [09:31:08] RECOVERY - Disk space on heze is OK: DISK OK [09:31:29] RECOVERY - salt-minion processes on heze is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:49:00] (03CR) 10Filippo Giunchedi: [C: 031] Make `es-tool ban-node` handle both IP addressses and hostnames [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [09:57:58] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [10:00:08] (03PS1) 10Alexandros Kosiaris: Distinguish bacula Storage devices based on hostname [puppet] - 10https://gerrit.wikimedia.org/r/180746 [10:01:42] (03CR) 10Alexandros Kosiaris: [C: 032] Distinguish bacula Storage devices based on hostname [puppet] - 10https://gerrit.wikimedia.org/r/180746 (owner: 10Alexandros Kosiaris) [10:03:54] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:14:02] (03PS1) 10Alexandros Kosiaris: Monitor bacula director/sd processes [puppet] - 10https://gerrit.wikimedia.org/r/180749 [10:16:44] (03CR) 10Alexandros Kosiaris: [C: 032] Monitor bacula director/sd processes [puppet] - 10https://gerrit.wikimedia.org/r/180749 (owner: 10Alexandros Kosiaris) [10:18:38] (03PS1) 10Giuseppe Lavagetto: admin: move access rights from ssmith to phuedx [puppet] - 10https://gerrit.wikimedia.org/r/180750 [10:18:54] (03PS2) 10Giuseppe Lavagetto: admin: move access rights from ssmith to phuedx [puppet] - 10https://gerrit.wikimedia.org/r/180750 [10:19:15] (03CR) 10Giuseppe Lavagetto: [C: 032] admin: move access rights from ssmith to phuedx [puppet] - 10https://gerrit.wikimedia.org/r/180750 (owner: 10Giuseppe Lavagetto) [10:37:15] (03PS1) 10Giuseppe Lavagetto: [WMF] New Package Version with various bugfixes [debs/hhvm] - 10https://gerrit.wikimedia.org/r/180752 [10:52:59] (03Draft1) 10Filippo Giunchedi: Import new upstream version from git 1acdff3 [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180757 [10:53:40] any opinions on versioning scheme for ^? 
namely importing changes from upstream git [11:03:57] (03CR) 10Hashar: "\O/" [debs/hhvm] - 10https://gerrit.wikimedia.org/r/180752 (owner: 10Giuseppe Lavagetto) [11:09:11] (03PS2) 10Giuseppe Lavagetto: [WMF] New Package Version with various bugfixes [debs/hhvm] - 10https://gerrit.wikimedia.org/r/180752 [11:09:16] <_joe_> hashar: that will still take time [11:13:20] _joe_: sure thing [11:15:37] <_joe_> mmmh tim [11:15:41] <_joe_> err [11:15:57] <_joe_> tim's timelib patch fails to build, I need to take a better look at it [11:23:26] (03PS2) 10Faidon Liambotis: Replace admin::sudo calls with sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180509 [11:23:28] (03PS2) 10Faidon Liambotis: admin::sudo: remove comment support [puppet] - 10https://gerrit.wikimedia.org/r/180508 [11:23:30] (03PS2) 10Faidon Liambotis: sudo: port over linting & sudoers from admin::sudo [puppet] - 10https://gerrit.wikimedia.org/r/180511 [11:23:32] (03PS2) 10Faidon Liambotis: admin: rename "privs" to "privileges" [puppet] - 10https://gerrit.wikimedia.org/r/180510 [11:23:34] (03PS2) 10Faidon Liambotis: sudo: reduce delta between ::group & ::user [puppet] - 10https://gerrit.wikimedia.org/r/180505 [11:23:36] (03PS2) 10Faidon Liambotis: sudo: fold sudo::labs_project into the role class [puppet] - 10https://gerrit.wikimedia.org/r/180504 [11:23:38] (03PS2) 10Faidon Liambotis: admin::sudo: remove privs => [absent] support [puppet] - 10https://gerrit.wikimedia.org/r/180507 [11:23:40] (03PS2) 10Faidon Liambotis: sudo: adjust sudoers for compat with newer sudo [puppet] - 10https://gerrit.wikimedia.org/r/180506 [11:23:42] (03PS2) 10Faidon Liambotis: Move /etc/sudoers from module "admin" to "sudo" [puppet] - 10https://gerrit.wikimedia.org/r/180503 [11:23:44] (03PS2) 10Faidon Liambotis: sudo: move sudo-ldap Package from "ldap" to "sudo" [puppet] - 10https://gerrit.wikimedia.org/r/180502 [11:23:46] (03PS2) 10Faidon Liambotis: admin: remove ::sudo in favor of sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180512 [11:23:48] (03PS2) 10Faidon Liambotis: sudo: recursively manage /etc/sudoers.d [puppet] - 10https://gerrit.wikimedia.org/r/180513 [11:24:08] half of the +1s stuck, the other ones didn't [11:24:47] _joe_: I addressed your comments; want to re-+1 what is not +1ed? [11:24:59] and review the couple of ones that you didn't yesterday? [11:25:07] <_joe_> paravoid: yes, will do [11:25:19] sorry to bug you :) [11:25:33] <_joe_> paravoid: can you wait 10 minutes? [11:25:37] sure [11:30:04] (03PS3) 10Giuseppe Lavagetto: [WMF] New Package Version with various bugfixes [debs/hhvm] - 10https://gerrit.wikimedia.org/r/180752 [11:30:14] (03CR) 10Filippo Giunchedi: Import new upstream version from git 1acdff3 (031 comment) [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180757 (owner: 10Filippo Giunchedi) [11:30:29] <_joe_> ok, on it [11:31:18] (03CR) 10Faidon Liambotis: [C: 032] Import new upstream version from git 1acdff3 (031 comment) [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180757 (owner: 10Filippo Giunchedi) [11:32:15] https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+topic:admin-sudo,n,z is the topic URL [11:32:23] but stupid gerrit doesn't show it in dependency order [11:34:54] paravoid: ack, thanks! 
I went the manual route of git archive etcetc and then gbp import-orig, perhaps there's something more automated than that [11:35:17] I tend to just package off upstream's git [11:35:20] in those cases [11:35:26] doesn't always work great [11:36:44] indeed, I'll try that too, I'm sure this isn't the last time I'll need it [11:37:14] <_joe_> sometimes the gerrit UI goes nuts and shows everything as selected [11:37:19] <_joe_> and I cannot unselect it [11:37:36] (03CR) 10Giuseppe Lavagetto: [C: 031] Move /etc/sudoers from module "admin" to "sudo" [puppet] - 10https://gerrit.wikimedia.org/r/180503 (owner: 10Faidon Liambotis) [11:38:10] (03Draft1) 10Filippo Giunchedi: Imported Upstream version 0.36+git1acdff3 [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180756 [11:38:16] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Imported Upstream version 0.36+git1acdff3 [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180756 (owner: 10Filippo Giunchedi) [11:38:35] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Import new upstream version from git 1acdff3 [debs/carbon-c-relay] - 10https://gerrit.wikimedia.org/r/180757 (owner: 10Filippo Giunchedi) [11:38:42] (03CR) 10Giuseppe Lavagetto: [C: 031] sudo: port over linting & sudoers from admin::sudo [puppet] - 10https://gerrit.wikimedia.org/r/180511 (owner: 10Faidon Liambotis) [11:39:46] <_joe_> paravoid: re 180513; I do agree, is there any chance some packages we build in-house added some rules to sudoers.d ? [11:40:06] <_joe_> just for the sake of not breaking things [11:41:02] paravoid@serenity:~$ apt-file search /etc/sudoers.d [11:41:02] paravoid@serenity:~$ [11:41:31] hm [11:41:37] scratch that, my jessie laptop says otherwise [11:42:02] did something related to sudo-ldap just get merged? [11:42:07] yes [11:42:09] https://dpaste.de/OEX7 puppet failure on almost all labs hosts again [11:42:09] what did I break? [11:42:16] <_joe_> :) [11:42:24] _joe_: you're right, nova-common ships a sudoers file !@$%!@W#% [11:42:38] YuviPanda: wtf [11:42:45] the private module declares a Package? seriously? [11:43:10] apparently. I’ve never looked at labs/private, ever... [11:43:12] * YuviPanda goes to look [11:43:13] <_joe_> paravoid: niiiice [11:43:22] <_joe_> (both things) [11:44:09] heh, guarded with a if ! defined (Package['sudo-ldap']) { [11:44:13] fixing [11:44:22] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "While I do agree in principle, in practice there are packages that do install things in sudoers.d, so I think this would break things" [puppet] - 10https://gerrit.wikimedia.org/r/180513 (owner: 10Faidon Liambotis) [11:46:54] <_joe_> use require_package in both places [11:46:58] no [11:47:09] the private repo has no business into installing packages [11:47:33] <_joe_> paravoid: true, but I guess someone did that hack for some weird reason [11:47:41] whatever, I fixed it :) [11:47:45] (hopefully) [11:49:51] paravoid: there’s a cycle now [11:49:56] (File[/etc/sudo-ldap.conf] => Class[Ldap::Client::Sudo] => Class[Ldap::Client::Sudo] => File[/etc/sudo-ldap.conf]) [11:50:00] oh ffs [11:50:12] I hate the labs/ldap spaghetti [11:50:21] <_joe_> paravoid: told ya [11:50:23] <_joe_> :) [11:50:32] (03CR) 10Giuseppe Lavagetto: [C: 031] admin::sudo: remove privs => [absent] support [puppet] - 10https://gerrit.wikimedia.org/r/180507 (owner: 10Faidon Liambotis) [11:50:34] oh I've hated it before you even joined us :) [11:50:53] _joe_: get off paravoid’s lawn? 
:) [11:50:58] haha [11:51:04] :D [11:51:14] <_joe_> my point was only that I learned from experience that when I find an awful hack, it is usually covering a biggest WTF [11:51:21] <_joe_> *bigger [11:51:45] that loop makes no sense to me [11:52:20] (03CR) 10Giuseppe Lavagetto: [C: 031] admin::sudo: remove comment support [puppet] - 10https://gerrit.wikimedia.org/r/180508 (owner: 10Faidon Liambotis) [11:53:14] (03CR) 10Giuseppe Lavagetto: [C: 031] Replace admin::sudo calls with sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180509 (owner: 10Faidon Liambotis) [11:55:12] <_joe_> paravoid: {{done}} [12:00:14] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [12:04:29] !log upload carbon-c-relay 0.36+git20141218-1 to trusty-wikimedia [12:04:38] Logged the message, Master [12:05:43] (03PS3) 10Faidon Liambotis: Replace admin::sudo calls with sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180509 [12:05:45] (03PS3) 10Faidon Liambotis: admin::sudo: remove comment support [puppet] - 10https://gerrit.wikimedia.org/r/180508 [12:05:47] (03PS3) 10Faidon Liambotis: admin: rename "privs" to "privileges" [puppet] - 10https://gerrit.wikimedia.org/r/180510 [12:05:49] (03PS3) 10Faidon Liambotis: admin::sudo: remove privs => [absent] support [puppet] - 10https://gerrit.wikimedia.org/r/180507 [12:05:50] (rebasing) [12:05:51] (03PS3) 10Faidon Liambotis: admin: remove ::sudo in favor of sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180512 [12:06:18] (03CR) 10Faidon Liambotis: [C: 032] Move /etc/sudoers from module "admin" to "sudo" [puppet] - 10https://gerrit.wikimedia.org/r/180503 (owner: 10Faidon Liambotis) [12:07:02] (03CR) 10Faidon Liambotis: [C: 032] sudo: fold sudo::labs_project into the role class [puppet] - 10https://gerrit.wikimedia.org/r/180504 (owner: 10Faidon Liambotis) [12:07:41] (03CR) 10Faidon Liambotis: [C: 032] sudo: reduce delta between ::group & ::user [puppet] - 10https://gerrit.wikimedia.org/r/180505 (owner: 10Faidon Liambotis) [12:08:13] (03CR) 10Faidon Liambotis: [C: 032] sudo: adjust sudoers for compat with newer sudo [puppet] - 10https://gerrit.wikimedia.org/r/180506 (owner: 10Faidon Liambotis) [12:09:38] let's see what breaks [12:10:13] surprisingly nothing, wow [12:10:34] no offence _joe_ :P [12:11:22] <_joe_> eheh [12:11:35] (03CR) 10Faidon Liambotis: [C: 032] sudo: port over linting & sudoers from admin::sudo [puppet] - 10https://gerrit.wikimedia.org/r/180511 (owner: 10Faidon Liambotis) [12:11:43] (03CR) 10Faidon Liambotis: [C: 032] admin::sudo: remove privs => [absent] support [puppet] - 10https://gerrit.wikimedia.org/r/180507 (owner: 10Faidon Liambotis) [12:11:51] (03CR) 10Faidon Liambotis: [C: 032] admin::sudo: remove comment support [puppet] - 10https://gerrit.wikimedia.org/r/180508 (owner: 10Faidon Liambotis) [12:12:01] (03CR) 10Faidon Liambotis: [C: 032] Replace admin::sudo calls with sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180509 (owner: 10Faidon Liambotis) [12:12:02] <_joe_> can I ask you some help with some C? 
I must be missing something [12:12:07] (03CR) 10Faidon Liambotis: [C: 032] admin: rename "privs" to "privileges" [puppet] - 10https://gerrit.wikimedia.org/r/180510 (owner: 10Faidon Liambotis) [12:12:17] <_joe_> well, later maybe :) [12:12:44] yes, give me a few minutes :) [12:15:40] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Puppet has 1 failures [12:15:47] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:24] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:33] <_joe_> Error: /Stage[main]/Admin/File[/etc/sudoers]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/admin/sudoers [12:16:39] yeah that's false [12:16:43] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:46] catalog caching or something [12:16:48] <_joe_> yeah transient failures [12:16:49] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:49] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:49] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: Puppet has 1 failures [12:16:52] weird though [12:16:56] this keeps happening lately [12:17:00] <_joe_> yes [12:17:08] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 1 failures [12:17:29] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures [12:17:30] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures [12:17:43] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 1 failures [12:17:46] <_joe_> mmmh it's happening on quite a few hosts too [12:18:12] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Puppet has 1 failures [12:18:13] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures [12:18:24] <_joe_> paravoid: ugh, something nasty is happening on one host where I ran puppet manually [12:18:31] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet has 1 failures [12:18:32] stuff is broken in labs too [12:19:03] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:03] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:11] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:11] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:12] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:25] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 1 failures [12:19:47] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures [12:20:14] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: puppet fail [12:20:41] <_joe_> Krenair: what's broken in labs? [12:20:55] Well I know that ssh to some deployment hosts times out [12:21:13] <_joe_> which ones? [12:21:42] parsoid04 for example [12:21:47] <_joe_> and, this has _nothing_ to do with ssh [12:21:53] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: puppet fail [12:22:25] what, the issue you were fighting, or the timeout? 
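Editor's note: the "if ! defined (Package['sudo-ldap'])" guard found in labs/private and the "use require_package" suggestion above are two ways of coping with duplicate package declarations in Puppet. The sketch below is illustrative only: the class names are invented and this is not the actual labs/private or operations/puppet code.

    # Fragile: the guard only helps if this class happens to be evaluated
    # after every other Package['sudo-ldap'] declaration; Puppet gives no
    # guarantee about that order, so duplicate-declaration errors (or a
    # package silently declared with the "wrong" parameters) can still occur.
    class example_private_hack {
      if ! defined(Package['sudo-ldap']) {
        package { 'sudo-ldap':
          ensure => present,
        }
      }
    }

    # The alternative being suggested: a helper such as require_package()
    # (the WMF repo's wrapper) or ensure_packages() (stdlib) declares the
    # package exactly once no matter how many classes ask for it, so no
    # guard is needed.
    class example_ldap_client {
      require_package('sudo-ldap')
    }

In the end the guard was simply removed, on the grounds that the private repo should not be declaring packages at all.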
[12:22:38] <_joe_> the issue we're looking at [12:22:40] ok [12:22:54] <_joe_> also, deployment-prep has its own puppetmaster [12:29:14] (03PS4) 10Faidon Liambotis: admin: remove ::sudo in favor of sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180512 [12:29:16] (03PS1) 10Faidon Liambotis: sudo: fix sudoers.erb template mishap [puppet] - 10https://gerrit.wikimedia.org/r/180773 [12:29:36] (03CR) 10Faidon Liambotis: [C: 032] sudo: fix sudoers.erb template mishap [puppet] - 10https://gerrit.wikimedia.org/r/180773 (owner: 10Faidon Liambotis) [12:29:43] (03CR) 10Faidon Liambotis: [V: 032] sudo: fix sudoers.erb template mishap [puppet] - 10https://gerrit.wikimedia.org/r/180773 (owner: 10Faidon Liambotis) [12:29:51] <_joe_> uhh [12:29:59] uh? [12:30:06] <_joe_> that simple? [12:30:07] <_joe_> :) [12:30:21] yup [12:34:47] (03CR) 10Faidon Liambotis: [C: 032] admin: remove ::sudo in favor of sudo::user/group [puppet] - 10https://gerrit.wikimedia.org/r/180512 (owner: 10Faidon Liambotis) [12:34:57] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:35:39] _joe_: so, I'm suspecting that for large puppet-merges, the window between palladium & strontium is bigger and hence we get these transient failures [12:37:04] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:37:18] any reports about the file description page on Commons causing Firefox and Seamonkey to quit unexpectedly ? [12:45:30] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: Puppet has 2 failures [12:48:54] <_joe_> paravoid: no I think it is the fact that puppet compilation requires a non-zero time and is highly parallelized, so if a file changes during compilation that causes weird behaviour [12:49:06] <_joe_> or mod_passenger has some broken caching on precise [12:49:12] <_joe_> or both [12:49:40] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [12:49:43] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [12:49:43] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [12:50:14] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [12:50:15] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:50:15] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [12:50:39] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:51:05] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:51:06] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:51:19] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:51:45] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:51:45] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:51:49] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is 
currently enabled, last run 3 minutes ago with 0 failures [12:52:00] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:00] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:08] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:08] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:19] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:29] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:32] PROBLEM - HHVM busy threads on mw1242 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [115.2] [12:52:51] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:52:56] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:56:35] PROBLEM - HHVM busy threads on mw1243 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [115.2] [13:00:21] PROBLEM - HHVM busy threads on mw1236 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [115.2] [13:00:33] !log salt-cleaning up /etc/sudoers.d/50_* (old naming scheme) [13:00:36] Logged the message, Master [13:01:13] (03PS1) 10Faidon Liambotis: sudo: actually make the lintin safety checks work [puppet] - 10https://gerrit.wikimedia.org/r/180777 [13:01:34] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:01:55] (03PS2) 10Faidon Liambotis: sudo: actually make the linting safety checks work [puppet] - 10https://gerrit.wikimedia.org/r/180777 [13:02:28] (03CR) 10Faidon Liambotis: [C: 032] sudo: actually make the linting safety checks work [puppet] - 10https://gerrit.wikimedia.org/r/180777 (owner: 10Faidon Liambotis) [13:03:47] <_joe_> !log restarted hhvm on mw1191, load at 200 [13:03:49] Logged the message, Master [13:04:50] RECOVERY - Host d-i-test is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [13:06:33] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [13:08:54] RECOVERY - HHVM busy threads on mw1191 is OK: OK: Less than 30.00% above the threshold [76.8] [13:11:22] ACKNOWLEDGEMENT - puppet last run on heze is CRITICAL: CRITICAL: Puppet has 2 failures alexandros kosiaris netapp mount issues, to be fixed in puppet [13:11:35] <_joe_> !log restarted hhvm on mw1242, stuck in getrusage() [13:11:39] Logged the message, Master [13:16:17] PROBLEM - DPKG on d-i-test is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:18:00] RECOVERY - HHVM busy threads on mw1242 is OK: OK: Less than 30.00% above the threshold [76.8] [13:19:37] paravoid YuviPanda I have got some sudo-ldap error on labs :-( [13:19:37] Error: Could not apply complete catalog: Found 1 dependency cycle: [13:19:38] (File[/etc/sudo-ldap.conf] => Class[Ldap::Client::Sudo] => Class[Ldap::Client::Sudo] => File[/etc/sudo-ldap.conf]) [13:22:03] PROBLEM - HHVM busy threads on mw1191 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [115.2] [13:22:23] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [80.0] [13:22:42] 
hashar: yup, paravoid just fixed that. you need to git pull labs/private repo on your puppetmaster as well [13:22:52] ah labs/private [13:22:53] grr [13:23:01] YuviPanda: thx! [13:23:06] hashar: yw [13:25:41] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [13:30:32] PROBLEM - HHVM busy threads on mw1253 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [13:34:51] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [80.0] [13:37:17] PROBLEM - HHVM queue size on mw1253 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [13:38:41] PROBLEM - HHVM busy threads on mw1257 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [13:38:56] <_joe|lunch> mmmh this is not good [13:40:15] YuviPanda: paravoid: still have a dependency cycle with sudo-ldap. I have pulled both puppet and private repo and restarted the puppetmaster :/ [13:41:40] PROBLEM - HHVM busy threads on mw1258 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [13:42:17] <_joe|lunch> !log restarting hhvm on a few servers [13:42:21] Logged the message, Master [13:42:31] <_joe|lunch> hey opsens, whenever you see this alarm ^^, it's important [13:42:37] <_joe|lunch> or it won't be nere [13:42:39] <_joe|lunch> *here [13:47:31] PROBLEM - HHVM busy threads on mw1241 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [115.2] [13:48:17] <_joe|lunch> the warning is a bit late btw [13:49:28] RECOVERY - HHVM busy threads on mw1253 is OK: OK: Less than 30.00% above the threshold [76.8] [13:49:49] RECOVERY - HHVM queue size on mw1253 is OK: OK: Less than 30.00% above the threshold [10.0] [13:50:00] (03CR) 10Hashar: "This actually broke Parsoid on beta. Parsoid05 is not a Jenkins slave and thus does not receive code update after a change is merged . Je" [puppet] - 10https://gerrit.wikimedia.org/r/169622 (owner: 10Catrope) [13:50:59] RECOVERY - HHVM busy threads on mw1258 is OK: OK: Less than 30.00% above the threshold [76.8] [13:51:11] RECOVERY - HHVM busy threads on mw1257 is OK: OK: Less than 30.00% above the threshold [76.8] [13:51:12] RECOVERY - HHVM busy threads on mw1236 is OK: OK: Less than 30.00% above the threshold [76.8] [13:53:44] RECOVERY - DPKG on hafnium is OK: All packages OK [13:53:51] RECOVERY - HHVM busy threads on mw1241 is OK: OK: Less than 30.00% above the threshold [76.8] [13:54:43] !log purged unpuppetized rrdcached from hafnium. It was segfaulting when started via the init script, which led to the package being unconfigured which led to dpkg alerts on icinga [13:54:46] Logged the message, Master [14:01:05] !log EventLogging: deployed 937d804 & restarted EventLogging [14:01:09] Logged the message, Master [14:04:20] PROBLEM - Host d-i-test is DOWN: PING CRITICAL - Packet loss = 100% [14:05:15] what are those d-i-test hosts? :/ [14:05:26] or is there only one. probably that one host. 
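Editor's note: the dependency cycle pasted above, File[/etc/sudo-ldap.conf] => Class[Ldap::Client::Sudo] => ... => File[/etc/sudo-ldap.conf], is the classic self-referential pattern: a resource declared inside a class that also requires that same class. A minimal sketch of how such a loop is produced follows; the file content is a placeholder, not the real ldap module.

    # The file is contained in the class, so Puppet already orders it
    # "inside" Class['ldap::client::sudo']; the explicit require then asks
    # for the whole class to come first, closing the loop and yielding the
    # File => Class => File chain shown in the error.
    class ldap::client::sudo {
      file { '/etc/sudo-ldap.conf':
        ensure  => file,
        content => "# managed by puppet\n",
        require => Class['ldap::client::sudo'],
      }
    }
    include ldap::client::sudo

On the local labs puppetmasters in this log, such a loop can come either from code like the above or from mismatched checkouts of operations/puppet and labs/private, which is why pulling both repos is the first thing to try.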
[14:12:32] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [80.0] [14:16:12] (03CR) 10Hashar: "Fixed :] See QA mailing list for details" [puppet] - 10https://gerrit.wikimedia.org/r/169622 (owner: 10Catrope) [14:16:58] (03Draft3) 10Filippo Giunchedi: txstatsd: add support for graphite line-protocol [debs/txstatsd] - 10https://gerrit.wikimedia.org/r/180786 [14:19:20] (03CR) 10Alexandros Kosiaris: [C: 031] "For the curious here is a list of packages that install files in /etc/sudoers.d (list taken for jessie)" [puppet] - 10https://gerrit.wikimedia.org/r/180513 (owner: 10Faidon Liambotis) [14:19:36] PROBLEM - HHVM queue size on mw1243 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [14:22:45] (03PS2) 10BBlack: Switch public SSL terminators to new unified cert [puppet] - 10https://gerrit.wikimedia.org/r/180325 [14:22:51] PROBLEM - HHVM busy threads on mw1246 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [14:23:48] (03CR) 10BBlack: [C: 032] Switch public SSL terminators to new unified cert [puppet] - 10https://gerrit.wikimedia.org/r/180325 (owner: 10BBlack) [14:25:16] PROBLEM - HHVM busy threads on mw1257 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [115.2] [14:25:51] RECOVERY - HHVM queue size on mw1243 is OK: OK: Less than 30.00% above the threshold [10.0] [14:27:57] <_joe_> oook I'll disable the hotprofiler for now [14:28:30] you could try disabling the xenon stuff first [14:29:21] PROBLEM - HHVM queue size on mw1246 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [14:30:33] RECOVERY - HHVM busy threads on mw1243 is OK: OK: Less than 30.00% above the threshold [76.8] [14:34:22] <_joe_> tried [14:34:26] <_joe_> didn't work [14:34:44] <_joe_> the only issue I see is that xhprof won't work [14:36:59] that will be a serious issue ;) [14:37:11] so the xenon stuff didn't make the difference? [14:37:16] that makes me wonder why it just started happening [14:44:39] <_joe_> !log restarting hhvm on mw1191 [14:44:48] Logged the message, Master [14:44:53] <_joe_> mark: it didn't [14:45:04] how did you disable it? [14:45:07] <_joe_> it started happening at the same time on a specific group of servers [14:45:13] <_joe_> the xenon stuff? [14:45:22] <_joe_> by commenting out the ini setting [14:46:10] <_joe_> btw it's happening on the "most trafficked" servers [14:46:14] right [14:46:35] <_joe_> and no, mw1191 will just refuse to work correctly [14:46:45] <_joe_> even when restarting hhvm [14:46:50] <_joe_> it won't recover [14:47:15] <_joe_> so it has to have something to do with the kernel I guess [14:47:54] PROBLEM - HHVM busy threads on mw1245 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [14:48:35] _joe_, heh, going through my emails I saw your changes to https://phabricator.wikimedia.org/T84818 - yeah, I think these requests are supposed to be private [14:48:46] so people can say mean things about the requestor :) [14:49:26] <_joe_> Krenair: well we have subtasks that are private to ops now [14:49:31] _joe_: so if we reboot it, does that fix it? 
;) [14:49:32] <_joe_> but sorry, working on something else [14:49:36] <_joe_> mark: I guess so [14:49:54] <_joe_> I can confirm that disabling the hotprofiler fixes it [14:50:31] <_joe_> but lemme try turning one of the servers on and off again :P [14:50:36] PROBLEM - HHVM queue size on mw1257 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [14:50:52] does disabling xenon / xhprof profiling helps? [14:51:17] hashar: apparently not [14:51:36] <_joe_> mark: if we disable that in mediawiki and the hotprofiler in hhvm, it will [14:53:59] (03PS1) 10Giuseppe Lavagetto: mediawiki: disable temporarily the hhvm hotprofiler [puppet] - 10https://gerrit.wikimedia.org/r/180790 [14:54:03] PROBLEM - HHVM queue size on mw1246 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [14:55:03] RECOVERY - HHVM busy threads on mw1191 is OK: OK: Less than 30.00% above the threshold [76.8] [14:55:32] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [14:57:20] (03PS1) 10Giuseppe Lavagetto: Temporarily disable xhprof profiling, due to stability issues [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 [14:57:28] <_joe_> !log rebooting mw1257 [14:57:35] Logged the message, Master [14:57:39] (03CR) 10Alexandros Kosiaris: "I updated https://phabricator.wikimedia.org/T75506 with some questions I got before we move forward with this" [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [14:58:00] <_joe_> mark: I'd go with mediawiki-config first [14:58:13] <_joe_> would you care to review https://gerrit.wikimedia.org/r/180791 ? [14:58:34] akosiaris: Sorry about the confusion on https://phabricator.wikimedia.org/T76949 . Whoever filled in for me during SoS yesterday didn't have the latest information apparently [15:00:01] (03Abandoned) 10Catrope: Expose citoid through misc-web [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope) [15:00:22] (03CR) 10Mark Bergsma: [C: 031] Temporarily disable xhprof profiling, due to stability issues [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 (owner: 10Giuseppe Lavagetto) [15:00:28] It also doesn't help that I forgot to abandon the change or in fact record the outcome of my chat with Gabriel anywhere at all [15:00:32] PROBLEM - HHVM queue size on mw1246 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [15:00:45] <_joe_> ok I'm doing it anyway [15:00:56] (03CR) 10Manybubbles: [C: 031] "This leaves us the option to jump back to lsearchd if cirrus blows up which I'm happy to have for the next little while." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180712 (owner: 10Chad) [15:02:54] RoanKattouw: no worries, I am happy you jumped in to clear that up :-) [15:03:02] RECOVERY - HHVM queue size on mw1257 is OK: OK: Less than 30.00% above the threshold [10.0] [15:04:15] (03CR) 10Manybubbles: [C: 031] "If this'll make issues stop then its probably a good idea." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 (owner: 10Giuseppe Lavagetto) [15:04:21] (03CR) 10Hashar: [C: 031] "Per our discussion on IRC, that block of code seems to enable xhprof and take a profile for every single request. 
Since there is some HHV" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 (owner: 10Giuseppe Lavagetto) [15:05:56] (03CR) 10Chad: [C: 032] Temporarily disable xhprof profiling, due to stability issues [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 (owner: 10Giuseppe Lavagetto) [15:05:59] (03CR) 10Giuseppe Lavagetto: [V: 032] Temporarily disable xhprof profiling, due to stability issues [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180791 (owner: 10Giuseppe Lavagetto) [15:07:10] PROBLEM - HHVM queue size on mw1245 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [15:07:19] !log demon Synchronized wmf-config/StartProfiler.php: disable xhprof-backed flame graphs for now (duration: 00m 05s) [15:07:24] Logged the message, Master [15:07:31] <_joe_> ^d: thanks [15:07:34] <^d> yw [15:08:47] RECOVERY - HHVM busy threads on mw1257 is OK: OK: Less than 30.00% above the threshold [76.8] [15:09:10] <_joe_> ok /now/ restarting hhvm seems to have an effect [15:09:33] thanks ^d [15:09:56] <_joe_> we all thank ^d [15:10:06] * hashar dances [15:12:46] PROBLEM - HHVM busy threads on mw1238 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [115.2] [15:13:10] PROBLEM - HHVM queue size on mw1245 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [80.0] [15:15:17] RECOVERY - HHVM busy threads on mw1246 is OK: OK: Less than 30.00% above the threshold [76.8] [15:15:25] _joe_: still ^^^ ? :( [15:15:55] <_joe_> I'm restarting hhvm on mw1245 right now [15:15:58] RECOVERY - HHVM queue size on mw1246 is OK: OK: Less than 30.00% above the threshold [10.0] [15:16:05] <_joe_> it should fix igt [15:16:20] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Puppet last ran 1 day ago [15:17:48] <_joe_> (and puppet should run again in a second) [15:18:48] RECOVERY - HHVM busy threads on mw1238 is OK: OK: Less than 30.00% above the threshold [76.8] [15:20:51] PROBLEM - HHVM busy threads on mw1240 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [115.2] [15:20:59] <_joe_> what? [15:21:04] yoo bblack~! [15:22:27] RECOVERY - HHVM queue size on mw1245 is OK: OK: Less than 30.00% above the threshold [10.0] [15:24:03] RECOVERY - HHVM busy threads on mw1240 is OK: OK: Less than 30.00% above the threshold [76.8] [15:24:07] <_joe_> :) [15:25:20] RECOVERY - HHVM busy threads on mw1245 is OK: OK: Less than 30.00% above the threshold [76.8] [15:25:47] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:29:02] (03PS1) 10BBlack: stat100x -> analytics vlans [dns] - 10https://gerrit.wikimedia.org/r/180794 [15:30:56] (03CR) 10Ottomata: [C: 031] stat100x -> analytics vlans [dns] - 10https://gerrit.wikimedia.org/r/180794 (owner: 10BBlack) [15:32:19] PROBLEM - HHVM busy threads on mw1191 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [15:32:53] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [80.0] [15:33:19] (03PS1) 10RobH: setting analytics1001/1002 mac info [puppet] - 10https://gerrit.wikimedia.org/r/180795 [15:34:23] <_joe_> mmmh how nice [15:34:33] <_joe_> happening again there? 
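For anyone reading the log later: the wmf-config/StartProfiler.php block that ^d synced off at [15:07:19] is not quoted here, so the following is only a sketch of its likely shape; the output sink and the exact parameters are assumptions, not the real production config. In MediaWiki 1.25-era configuration, per-request profiling is switched on by populating $wgProfiler before Setup.php runs, which is why hashar's review comment notes that such a block profiles every single request unless something gates it.

    <?php
    // Sketch only; the real wmf-config/StartProfiler.php is not shown in this log.
    // Change 180791 ("Temporarily disable xhprof profiling") amounts to commenting
    // out or short-circuiting a block roughly like this:
    $wgProfiler = array(
        // ProfilerXhprof drives HHVM's built-in hot profiler (or the xhprof
        // extension under Zend PHP) and is what feeds the flame graphs.
        'class'  => 'ProfilerXhprof',
        // Placeholder sink: 'db' maps to ProfilerOutputDb. The sink actually
        // used in production is not named in the log.
        'output' => array( 'db' ),
    );
    // With the block commented out, $wgProfiler stays empty and MediaWiki falls
    // back to the stub (no-op) profiler, which is the state synced at [15:07:19].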
[15:37:21] (03PS2) 10RobH: setting analytics1001/1002 mac info [puppet] - 10https://gerrit.wikimedia.org/r/180795 [15:39:11] (03CR) 10RobH: [C: 032] setting analytics1001/1002 mac info [puppet] - 10https://gerrit.wikimedia.org/r/180795 (owner: 10RobH) [15:43:22] <_joe_> !log rebooted mw1191 [15:43:27] Logged the message, Master [15:45:03] RECOVERY - HHVM busy threads on mw1191 is OK: OK: Less than 30.00% above the threshold [76.8] [15:45:56] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [15:49:02] is something wrong with labs? i've been waiting 42 minutes to be able to login to two instances i created with standard config. [15:51:00] manybubbles, ^d, marktraceur: Who wants to SWAT today? [15:51:19] gi11es: Ping for SWAT in about 9 minutes [15:51:22] I can do! [15:51:24] I haven't in a while [15:51:26] manybubbles: Ok! [15:55:19] (03CR) 10Manybubbles: [C: 031] Disable thumbnail chaining [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180138 (owner: 10Gilles) [16:00:05] manybubbles, anomie, ^d, marktraceur, gi11es: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141218T1600). Please do the needful. [16:01:19] gi11es: ping again! [16:02:51] andrewbogott, coren: around? i've now been waiting 58 minutes for two new labs instances to become reachable. [16:03:15] jgage: what names? [16:03:21] ipsec-c3, ipsec-c4 [16:03:35] PROBLEM - Host stat1003 is DOWN: PING CRITICAL - Packet loss = 100% [16:03:37] console log shows them sitting at login prompt but ssh says publickey permission denied [16:03:41] jgage: project? [16:03:45] ipsec [16:03:45] manybubbles: pong [16:03:47] sorry I was stuck in traffic [16:04:01] (03CR) 10Manybubbles: [C: 032] Disable thumbnail chaining [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180138 (owner: 10Gilles) [16:04:06] gi11es: no problem [16:04:12] I'll deploy for you now if you'll check [16:04:26] I shall [16:04:51] (03Merged) 10jenkins-bot: Disable thumbnail chaining [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180138 (owner: 10Gilles) [16:05:00] jgage: I'm just joining the land of the awake now. [16:05:33] thanks dudes. yesterday this took "only" 17 minutes. [16:05:56] PROBLEM - puppet last run on virt1010 is CRITICAL: Timeout while attempting connection [16:06:00] !log manybubbles Synchronized wmf-config/InitialiseSettings-labs.php: SWAT disable thumbnail caching (duration: 00m 05s) [16:06:04] Logged the message, Master [16:06:40] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT disable thumbnail caching (duration: 00m 05s) [16:06:42] gi11es: ^^^^ [16:06:42] Logged the message, Master [16:06:52] checking... [16:07:05] PROBLEM - HHVM busy threads on mw1191 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [115.2] [16:07:42] PROBLEM - HHVM queue size on mw1191 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0] [16:07:52] _joe_: ^^ ? [16:08:06] <_joe_> manybubbles: again, fuck. [16:08:42] (03PS1) 10Faidon Liambotis: sudo: (hopefully) fix dependency loop in Labs [puppet] - 10https://gerrit.wikimedia.org/r/180799 [16:08:45] <_joe_> so my problem is - disabling the hotprofiler will effectively solve the problem, but then I get a shitton of errors from hhvm [16:08:47] manybubbles: lgtm [16:09:09] gi11es: great! 
when the next one merges I'll deploy it [16:09:15] (03CR) 10Faidon Liambotis: [C: 032] sudo: (hopefully) fix dependency loop in Labs [puppet] - 10https://gerrit.wikimedia.org/r/180799 (owner: 10Faidon Liambotis) [16:09:27] _joe_: well that isn't good - are they errors because we're assuming that it isn't disabled? [16:09:48] PROBLEM - Host virt1010 is DOWN: PING CRITICAL - Packet loss = 100% [16:09:55] PROBLEM - Host virt1011 is DOWN: PING CRITICAL - Packet loss = 100% [16:10:03] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /srv/ssd 6033 MB (3% inode=86%): [16:10:46] ^ virt101[01] intentional? [16:11:04] RECOVERY - Host virt1011 is UP: PING OK - Packet loss = 0%, RTA = 1.14 ms [16:11:19] bblack: I believe so, being provisioned [16:11:27] RECOVERY - Host virt1010 is UP: PING OK - Packet loss = 0%, RTA = 1.90 ms [16:11:33] manybubbles: it looks merged, not sure why grrrit-wm is quiet about it [16:11:51] gi11es: see it. I'll deploy [16:11:53] bblack -yep, intentional but wrong [16:11:59] jgage: I don't know what's wrong yet, but still looking. [16:12:18] I'll take a look at lanthanum [16:13:51] !log manybubbles Synchronized php-1.25wmf12/extensions/MultimediaViewer/: SWAT backport last-modified performance logging for mediaviewer (duration: 00m 05s) [16:13:57] Logged the message, Master [16:14:00] gi11es: ^^ [16:14:18] manybubbles: waiting for data to appear, shouldn't take long [16:14:30] jgage: it looks like puppet didn't run on those instances. I'm going to reboot them and try to force it, is that ok? [16:14:56] anyone looking at some zero errors now? [16:15:05] I see Dec 18 16:14:15 mw1254: [proxy_fcgi:error] [pid 16287] (70014)End of file found: [client 10.64.32.99:25088] AH01075: Error dispatching request to :, referer: http://fr.m.wikipedia.org/w/index.php?title=Sp%C3%A9cial:MobileOptions&returnto=Sp%C3%A9cial%3AZeroRatedMobileAccess [16:15:06] a lot [16:15:12] well, modulo the dates [16:15:38] or maybe not mobile.... [16:15:38] manybubbles: EL data making it through, deploy is all good [16:15:40] thanks! [16:15:44] gi11es: great! [16:20:17] RECOVERY - HHVM queue size on mw1191 is OK: OK: Less than 30.00% above the threshold [10.0] [16:21:12] (03PS1) 10Alexandros Kosiaris: Temporary mountpoints for heze/helium bacula SDs [puppet] - 10https://gerrit.wikimedia.org/r/180804 [16:21:14] (03PS1) 10Alexandros Kosiaris: Permanent mountpoints for heze/helium bacula SDs [puppet] - 10https://gerrit.wikimedia.org/r/180805 [16:21:20] ACKNOWLEDGEMENT - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /srv/ssd 6066 MB (3% inode=86%): Filippo Giunchedi likely jenkins jobs [16:23:00] RECOVERY - HHVM busy threads on mw1191 is OK: OK: Less than 30.00% above the threshold [76.8] [16:23:17] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Blocking for now. 
For this to be merged, I first have to partition/LVM/format manually the DAS on helium" [puppet] - 10https://gerrit.wikimedia.org/r/180805 (owner: 10Alexandros Kosiaris) [16:23:57] !log demon Synchronized wmf-config/StartProfiler.php: disable normal eqiad profiling (duration: 00m 06s) [16:23:59] (03CR) 10Alexandros Kosiaris: [C: 032] Temporary mountpoints for heze/helium bacula SDs [puppet] - 10https://gerrit.wikimedia.org/r/180804 (owner: 10Alexandros Kosiaris) [16:24:00] <^d> _joe_: ^^^ [16:24:06] Logged the message, Master [16:24:32] (03PS1) 10Chad: Disable all eqiad profiling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180806 [16:24:43] <_joe_> ^d: ok thanks I can do some tweaking now [16:24:58] (03CR) 10Chad: [C: 032 V: 032] Disable all eqiad profiling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180806 (owner: 10Chad) [16:25:08] <^d> There's the actual patch I sync'd. [16:25:14] <_joe_> we could probably just disable cpu profiling [16:25:21] <^d> profiling's still on via custom headers, test* hosts, etc. [16:25:25] <_joe_> ok [16:25:26] <^d> But the normal eqiad profiling's off. [16:25:29] <_joe_> yeah, who cares [16:25:48] <_joe_> well [16:25:50] <_joe_> $wgProfiler['class'] = 'ProfilerStandard'; [16:25:57] <_joe_> nope sorry [16:26:00] <_joe_> got it wrong [16:26:15] <^d> ProfilerStandard is MW's homegrown profiler. [16:26:24] <^d> ProfilerXhprof is ... [16:27:50] any idea if I can just remove jenkin's old workspaces on lanthanum? say 30d old [16:28:25] (03PS5) 10Faidon Liambotis: Introduce rack/rackrow facts based on LLDP facts [puppet] - 10https://gerrit.wikimedia.org/r/167645 (owner: 10Alexandros Kosiaris) [16:28:34] hashar: ^ [16:28:52] oh right, I thought hashar was gone already [16:29:10] godog: almost [16:29:13] RECOVERY - configured eth on virt1010 is OK: NRPE: Unable to read output [16:29:16] godog: what is wrong on lanthanum ? [16:29:27] RECOVERY - configured eth on virt1011 is OK: NRPE: Unable to read output [16:29:39] hashar: ssd at 97% utilized [16:30:02] IOW, this http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Miscellaneous+eqiad&h=lanthanum.eqiad.wmnet&jr=&js=&v=96.0&m=part_max_used&vl=%25&ti=Maximum+Disk+Space+Used [16:30:08] arff [16:30:13] I knew it was gong to happen [16:30:25] I have refactored a lot of jobs over the last month [16:30:28] and each clone a lot of repos [16:31:03] I've seen some dirs with @ prepended, are they safe to be deleted? [16:31:27] so when the same job run twice on the same machine, it indeed uses a @ suffix [16:31:31] they can be removed usually [16:31:39] /srv/ssd/jenkins-slave/workspace [16:31:51] ah there are still the old extension jobs. *-testextension [16:31:59] they are now suffixed with -zend and -hhvm [16:32:07] (03PS2) 10Ottomata: Use stat box internel addresses (.eqiad.wmnet) for everything [puppet] - 10https://gerrit.wikimedia.org/r/180520 [16:32:24] you seem to know what you are doing better than I am :) I was looking for something actionable [16:32:32] !log lanthanum deleting obsoletes jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension . 
They are now suffixed with -zend and -hhvm [16:32:33] (03CR) 10BBlack: [C: 032] stat100x -> analytics vlans [dns] - 10https://gerrit.wikimedia.org/r/180794 (owner: 10BBlack) [16:32:39] Logged the message, Master [16:32:52] godog: well dropping the 30days old directories is a good action :] [16:33:04] godog: my solution is to reduce the number of jobs running [16:33:07] PROBLEM - Host stat1002 is DOWN: CRITICAL - Plugin timed out after 15 seconds [16:33:38] PROBLEM - Host stat1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:34:12] (03PS2) 10Alexandros Kosiaris: Permanent mountpoints for heze/helium bacula SDs [puppet] - 10https://gerrit.wikimedia.org/r/180805 [16:34:14] :) [16:34:19] RECOVERY - puppet last run on heze is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:34:24] should have scheduled maintenance in icinga for stat* [16:34:25] doing now [16:35:08] RECOVERY - Disk space on lanthanum is OK: DISK OK [16:35:55] !log deleting jenkins workspaces on lanthanum older than 30d [16:36:01] Logged the message, Master [16:36:29] godog: at worth Jenkins will recreate them and reclone the needed repos [16:37:00] hashar: indeed, so that's "safe" to do regardless, I take it'd fail the job worst case [16:37:31] !log gallium deleting obsoletes jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension . They are now suffixed with -zend and -hhvm [16:37:32] Logged the message, Master [16:37:44] godog: yeah exactly. [16:38:04] (03PS2) 10Giuseppe Lavagetto: mediawiki: disable temporarily the hhvm hotprofiler [puppet] - 10https://gerrit.wikimedia.org/r/180790 [16:38:47] godog: one of the reason is that Jenkins clone the whole mediawiki/core repo for each build needing it. That is not very smart, I can probably use a shallow clone instead to save disk [16:39:02] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: disable temporarily the hhvm hotprofiler [puppet] - 10https://gerrit.wikimedia.org/r/180790 (owner: 10Giuseppe Lavagetto) [16:39:48] hashar: indeed, does jenkins cleanup workspaces after itself when they get old? [16:39:58] godog: nop :-( [16:40:13] so I just delete them manually every six months or so [16:40:22] I am a terrible sysadmin [16:41:01] heh perhaps jenkins has plugins to do that, no idea [16:42:48] godog: yeah probably :-] [16:43:01] that will be less of an issue as I reduce the number of test extension jobs [16:43:11] the idea is to have a single job shared by most extensions [16:43:16] jgage: reboots seem to have helped. Transient puppet failure I think [16:43:38] (03PS3) 10Ottomata: Use stat box internel addresses (.eqiad.wmnet) for everything [puppet] - 10https://gerrit.wikimedia.org/r/180520 [16:43:51] (03CR) 10Ottomata: [C: 032 V: 032] Use stat box internel addresses (.eqiad.wmnet) for everything [puppet] - 10https://gerrit.wikimedia.org/r/180520 (owner: 10Ottomata) [16:44:27] _joe_: ok to puppet-merge ' mediawiki: disable temporarily the hhvm hotprofiler'? [16:44:50] <_joe_> ottomata: yes sorry I was explaining the rationale of that to to some people [16:44:53] <_joe_> go on [16:47:41] hashar: I see, so just one git repo cloned for example (?) [16:49:17] PROBLEM - HHVM busy threads on mw1257 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [115.2] [16:49:57] godog: exaclt [16:50:26] godog: and if the box can run 8 jobs in parallel it will at worth have 8 x (mw/core + vendor + 90 extensions). Gotta do the maths : ( [16:51:04] andrewbogott: thanks! i had ssh in a loop, success on the two hosts at :27 and :33. 
weird that whatever transient faiulre affected both instances. [16:51:22] hashar: nice, cleaning up old workspaces sounds like a good preventive measure anyway tho [16:53:47] godog: yeah. if in doubt, delete :] [16:54:14] I will be off for most of the next two weeks but will keep an eye on mails / notifications etc anyway [16:56:11] (03PS1) 10Dr0ptp4kt: Support X-WMF-UUID X-Analytics tagging [puppet] - 10https://gerrit.wikimedia.org/r/180812 [16:56:43] (03PS1) 10Glaisher: Enable FlaggedRevs on 'Flexion' namespace at dewiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180813 [16:58:10] PROBLEM - puppet last run on virt1011 is CRITICAL: CRITICAL: Puppet last ran 1 day ago [16:58:19] (03PS1) 10Ottomata: Fix fqdn name for stat1003 for geowiki data sync [puppet] - 10https://gerrit.wikimedia.org/r/180814 [16:58:35] (03CR) 10Ottomata: [C: 032 V: 032] Fix fqdn name for stat1003 for geowiki data sync [puppet] - 10https://gerrit.wikimedia.org/r/180814 (owner: 10Ottomata) [16:58:42] (03PS2) 10Glaisher: Enable FlaggedRevs on 'Flexion' namespace at dewiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180813 [17:01:36] RECOVERY - Host stat1001 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [17:01:36] RECOVERY - Host stat1003 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [17:03:03] so: https://gerrit.wikimedia.org/r/#/c/176862/ needs to be reverted, it's causing jobrunner problems [17:03:28] RECOVERY - Host stat1002 is UP: PING OK - Packet loss = 0%, RTA = 1.79 ms [17:03:31] I'm not yet sure which wmfNN it's in and where to push the revert around to yet. I'm digging, but if someone else around here can figure it out quicker, that would be awesome [17:03:52] hashar: ack! [17:04:27] (03CR) 10Dr0ptp4kt: "Need we be concerned with CRLF injection by echoing out the string? If so, I guess we could stipulate a regex for well-formedness for the " [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [17:06:46] ori: ping? [17:08:36] RECOVERY - HHVM busy threads on mw1257 is OK: OK: Less than 30.00% above the threshold [76.8] [17:08:39] !log Made mediawiki-phpunit-hhvm Jenkins job voting. We now enforce HHVM compliance for mediawiki/core [17:08:45] Logged the message, Master [17:09:02] bblack: that uuid patch i just added you to is to move away from uuids in the request path to the proper place in the x-analytics header. the https=1 is meant to be icing on the cake, that would be handy for running various analyses about https. nuria__ (on this channel), ironholds (analytics channel), MaxSem, Deskana and i met along with dbrant and mhurd (mobile channel) to come up with the approach yesterday for uuid header. i let [17:09:03] people know ops would need to review the vcl. [17:09:27] (03CR) 10OliverKeyes: Support X-WMF-UUID X-Analytics tagging (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [17:10:52] (03CR) 10Dr0ptp4kt: Support X-WMF-UUID X-Analytics tagging (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [17:11:24] dr0ptp4kt: ok, I'll get back to that in a while, I'm in the midst of some stuff here [17:11:38] bblack: thanks much. 
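The X-WMF-UUID change under discussion is Varnish VCL and is not quoted in the log, so here is the well-formedness idea from dr0ptp4kt's [17:04:27] comment sketched in PHP instead: validate the client-supplied install ID against a strict pattern before copying it into an X-Analytics style value, so anything malformed (including input carrying CR/LF) is simply dropped. The function name, the 32-hex-character format and the 'wmfuuid' key are illustrative assumptions, not what the patch actually does.

    <?php
    // Illustration of the injection-safety check discussed above; not the VCL
    // under review in https://gerrit.wikimedia.org/r/180812.

    /**
     * Append an app install ID to an X-Analytics style "k=v;k=v" string,
     * refusing anything that does not match a strict whitelist pattern.
     */
    function appendWmfUuid( $xAnalytics, $uuid ) {
        // Assumed format: exactly 32 hex characters. The real on-wire format
        // is defined by the patch, not by this sketch.
        if ( !preg_match( '/^[0-9a-fA-F]{32}$/', $uuid ) ) {
            // Malformed or malicious input (e.g. embedded CR/LF) never reaches
            // the header; the analytics field is simply left out.
            return $xAnalytics;
        }
        $pair = 'wmfuuid=' . $uuid;
        return $xAnalytics === '' ? $pair : $xAnalytics . ';' . $pair;
    }

    // appendWmfUuid( 'https=1', str_repeat( 'a', 32 ) )   -> 'https=1;wmfuuid=aaaa...'
    // appendWmfUuid( 'https=1', "x\r\nSet-Cookie: pwned" ) -> 'https=1'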
[17:12:23] (03PS1) 10Glaisher: Enable otherProjectsLinks by default on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180818 [17:13:59] RECOVERY - puppet last run on virt1011 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:18:52] !log enabling md write intent bitmap temporarily on virt1009 [17:18:58] Logged the message, Master [17:19:02] (03CR) 10OliverKeyes: [C: 031] Support X-WMF-UUID X-Analytics tagging [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [17:20:39] see you later folks. Have a good week :) [17:22:45] !log switched to new unified cert on all nginx terminators via config reload [17:22:51] Logged the message, Master [17:23:09] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [17:29:04] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures [17:32:07] bblack: It's in all branches now. Ori cherry-picked https://gerrit.wikimedia.org/r/#/c/180384/ to wmf12 to try and stop the problem but we have wmf13 on group0 now without any fix [17:33:02] We have wmf13 on group0 and wmf12 everywhere else [17:33:09] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: puppet fail [17:33:14] bd808: the 180384 fix isn't a good one anyways. the ops thinking is it's better to revert 176862 if there's no immediate correct fix [17:33:25] *nod* [17:35:06] bd080: so the process would be merge a revert to master and cherrypick to wmf12+wmf13? [17:35:20] I'm still kinda waiting to see if ori pops in anytime soon, since it's his [17:36:26] bblack: What are we reverting now? [17:36:52] https://gerrit.wikimedia.org/r/#/c/176862/ [17:37:13] (and before that, I'm guessing on wmf12 specifically we'd have to revert the partial fixup https://gerrit.wikimedia.org/r/#/c/180384/ as well) [17:37:57] Right, yeah [17:38:09] Which Ori cherry-picked and deployed to wmf12 without review, and then later Tim -1ed it [17:41:11] bblack: Well, the morning SWAT window is still active [17:41:12] (03CR) 10Chad: [C: 032] Remove $wmgUseCirrusAsAlternative (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180712 (owner: 10Chad) [17:41:21] We could deploy reverts now (i.e. in the next 20 mins) [17:41:54] Or any time afterwards, really, I'm sure greg-g would be OK with it given that it's ops instructing us to revert a specific change [17:41:56] (03Merged) 10jenkins-bot: Remove $wmgUseCirrusAsAlternative [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180712 (owner: 10Chad) [17:42:40] !log demon Synchronized wmf-config/: Remove cirrus-as-alternate settings (duration: 00m 06s) [17:42:50] Logged the message, Master [17:43:10] since it's crushing the jobrunners I'd say a revert is fine. If you want to be very polite I guess just revert on the branches [17:43:31] No I'll revert in master too [17:43:32] But really, makes prod sad is plenty of reason for a revert [17:43:35] Ye [17:43:43] ^d: You still SWATting there? [17:44:02] <^d> Should've been in swat. [17:44:05] RoanKattouw: bblack (yep) [17:44:11] <^d> RoanKattouw: I'm done, sorry. 
[17:44:17] OK, thanks [17:44:21] Then I'll go and do this revert now [17:47:11] bblack: https://gerrit.wikimedia.org/r/180829 (doesn't need immediate +2, but I would appreciate one) [17:49:09] (03PS1) 10Chad: Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 [17:49:11] <^d> \o/ [17:49:24] Meanwhile I'm waiting for Jenkins to merge the revert of the main change in wmf13 and the revert of the followup change in wmf12 (after which I can revert the main change in wmf12), but Jenkins will take a while [17:50:19] (03CR) 10jenkins-bot: [V: 04-1] Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 (owner: 10Chad) [17:50:27] (03PS2) 10Chad: Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 [17:50:39] (03CR) 10jenkins-bot: [V: 04-1] Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 (owner: 10Chad) [17:51:09] (03PS3) 10Chad: Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 [17:51:16] <^d> shut up jenkins that's valid syntax if I say it is. [17:52:16] sorry I've got like 4 things going on in parallel :) [17:52:50] bblack: No worries, we can do it later. I'm taking care of the revert, ETA 10-15 mins [17:53:06] ok thanks! [17:53:56] (03Abandoned) 10Steinsplitter: Adding *.wmflabs.org to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179094 (owner: 10Steinsplitter) [17:55:27] (03CR) 10Kronf: [C: 031] Enable FlaggedRevs on 'Flexion' namespace at dewiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180813 (owner: 10Glaisher) [17:58:00] <^d> RoanKattouw: Can I do one more wmf-config sync while you're waiting? [17:58:21] Go for it [17:58:27] (03CR) 10Chad: [C: 032] Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 (owner: 10Chad) [17:58:33] (03Merged) 10jenkins-bot: Disable lsearchd almost everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 (owner: 10Chad) [17:58:34] <^d> Hang on to your pants, folks. [17:58:34] I have a meeting anyway [17:58:38] Stupid Jenkins taking forever [17:59:08] !log demon Synchronized wmf-config/: Disable lsearchd almost everywhere (duration: 00m 07s) [17:59:14] Logged the message, Master [18:02:30] <^d> paravoid: ^ :D [18:03:25] woooo [18:03:36] bblack: I'm aborting my revert because the HHVM unit tests are throwing errors about Tidy that I don't understand. I need Ori for real now [18:03:54] <^d> paravoid: pools 2, 4 and 5 are ops' for decom now as far as I'm concerned. 
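A note on what "almost everywhere" means mechanically: wmf-config per-wiki settings are arrays keyed by setting name and then by wiki dbname, with 'default' covering every wiki not listed explicitly, so a change like 180838 mostly flips the default and leaves a handful of explicit overrides. The sketch below shows only that shape; the setting and wiki names are hypothetical stand-ins, not the contents of the merged change.

    <?php
    // Shape-only sketch of a wmf-config InitialiseSettings.php toggle. How this
    // array is wired into $wgConf is wmf-config plumbing that is out of scope here.
    $wmfExampleSettings = array(
        'wmgUseLsearchdExample' => array(
            'default'    => false,   // "almost everywhere": the default flips off
            'legacywiki' => true,    // hypothetical holdout kept on the old backend
        ),
        'wmgUseCirrusExample' => array(
            'default' => true,       // CirrusSearch as the primary backend
        ),
    );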
[18:04:11] (03CR) 10Nemo bis: "Eat all the servers" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180838 (owner: 10Chad) [18:08:33] RoanKattouw: ok [18:09:36] !log analytics vlan ACLs updated in eqiad [18:09:41] Logged the message, Master [18:17:35] (03PS1) 10BryanDavis: Enable wfDebugLog for T84894 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180845 [18:20:05] jouncebot: next [18:20:05] In 5 hour(s) and 39 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141219T0000) [18:24:11] (03CR) 10Chad: [C: 032] Enable wfDebugLog for T84894 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180845 (owner: 10BryanDavis) [18:24:22] (03Merged) 10jenkins-bot: Enable wfDebugLog for T84894 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180845 (owner: 10BryanDavis) [18:25:17] !log demon Synchronized wmf-config/InitialiseSettings.php: logging for IP argument bugs (duration: 00m 05s) [18:25:29] Logged the message, Master [18:31:41] (03PS2) 10Dereckson: Enable otherProjectsLinks on it.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180818 (owner: 10Glaisher) [18:31:53] (03CR) 10Dereckson: [C: 031] Enable otherProjectsLinks on it.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180818 (owner: 10Glaisher) [18:34:45] (03PS3) 10Dereckson: FlaggedRevs configuration on de.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180813 (owner: 10Glaisher) [18:34:51] (03CR) 10Dereckson: [C: 031] FlaggedRevs configuration on de.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180813 (owner: 10Glaisher) [18:35:11] PROBLEM - NTP on stat1003 is CRITICAL: NTP CRITICAL: Offset unknown [18:35:36] (03PS1) 10Yurik: Zero: Remove per-carrier analytics variance [puppet] - 10https://gerrit.wikimedia.org/r/180850 [18:35:52] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: Puppet has 1 failures [18:35:54] (03PS2) 10Dereckson: Added ang.wikibooks and ie.wikibooks to closed.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180451 (owner: 10Dzahn) [18:36:02] PROBLEM - DPKG on stat1002 is CRITICAL: Connection refused by host [18:36:13] PROBLEM - Disk space on stat1002 is CRITICAL: Connection refused by host [18:36:18] PROBLEM - salt-minion processes on stat1002 is CRITICAL: Connection refused by host [18:36:19] PROBLEM - puppet last run on stat1002 is CRITICAL: Connection refused by host [18:36:40] PROBLEM - RAID on stat1002 is CRITICAL: Connection refused by host [18:37:31] PROBLEM - configured eth on stat1002 is CRITICAL: Connection refused by host [18:37:54] PROBLEM - dhclient process on stat1002 is CRITICAL: Connection refused by host [18:40:32] (03PS1) 10RobH: adding wikipedia.id to dns from registrar transfer [dns] - 10https://gerrit.wikimedia.org/r/180851 [18:41:19] (03CR) 10RobH: [C: 032] adding wikipedia.id to dns from registrar transfer [dns] - 10https://gerrit.wikimedia.org/r/180851 (owner: 10RobH) [18:41:55] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail [18:42:04] !log bd808 Synchronized php-1.25wmf12/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: syncing test fix Ia58ec20 (duration: 00m 06s) [18:42:14] Logged the message, Master [18:52:18] !log bd808 Synchronized php-1.25wmf12/includes/utils/IP.php: Log calls to IP::parseRange with invalid array argument [Ie883eb6] (duration: 00m 05s) [18:52:25] Logged the message, Master [18:57:24] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 2 minutes 
ago with 0 failures [19:01:04] (03PS1) 10RobH: setting mgmt dns for haedus and capella [dns] - 10https://gerrit.wikimedia.org/r/180857 [19:01:25] RECOVERY - Disk space on stat1002 is OK: DISK OK [19:01:28] RECOVERY - salt-minion processes on stat1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [19:01:28] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:01:45] RECOVERY - RAID on stat1002 is OK: OK: optimal, 1 logical, 12 physical [19:02:12] (03CR) 10RobH: [C: 032] setting mgmt dns for haedus and capella [dns] - 10https://gerrit.wikimedia.org/r/180857 (owner: 10RobH) [19:02:26] RECOVERY - configured eth on stat1002 is OK: NRPE: Unable to read output [19:02:36] RECOVERY - dhclient process on stat1002 is OK: PROCS OK: 0 processes with command name dhclient [19:04:06] RECOVERY - DPKG on stat1002 is OK: All packages OK [19:04:55] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Puppet has 3 failures [19:10:13] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:11:37] !log bd808 Synchronized php-1.25wmf12/includes/profiler/ProfilerXhprof.php: backport section profiler fixes [I5935ee2] (duration: 00m 05s) [19:11:49] Logged the message, Master [19:12:10] ^d: Sync was missed on that one apparently [19:12:27] <^d> Bahhh [19:12:49] I think we need it in wmf13 too [19:13:39] * bd808 does that [19:30:16] (03CR) 10Krinkle: [WIP] contint: Add tmpfs mount in jenkins-deploy homedir for labs slaves (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173512 (https://bugzilla.wikimedia.org/72063) (owner: 10Krinkle) [19:31:50] (03PS1) 10Kaldari: Turning off WikiGrok test on en.wiki, turning on WikiGrok on test2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180867 [19:55:44] (03PS1) 10Ottomata: Set up alias rsync module for /a on stat1003 [puppet] - 10https://gerrit.wikimedia.org/r/180870 [20:00:38] (03CR) 10Ottomata: [C: 032 V: 032] Set up alias rsync module for /a on stat1003 [puppet] - 10https://gerrit.wikimedia.org/r/180870 (owner: 10Ottomata) [20:01:53] * AndyRussG waves [20:03:03] Is this where I should ask about the configuration of http://en.m.wikipedia.beta.wmflabs.org/ ? I'm having trouble targeting a CentralNotice campaign to there, specifically http://meta.wikimedia.beta.wmflabs.org/w/index.php?title=Special:CentralNotice&method=listNoticeDetail&notice=CN+browser+tests [20:12:31] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [20:13:19] Hi Coren! Enjoying the snow? ... mm do you have anything to do wtih *.beta.wmflabs.org? If not, whom could do u think I could ask about that? [20:14:55] !log bd808 Synchronized php-1.25wmf13/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: Skip ApiFormatWddxTest under HHVM (duration: 00m 07s) [20:15:01] Logged the message, Master [20:15:30] AndyRussG: #wikimedia-qa is a great channel to ask about beta in [20:15:51] bd808: ah OK great idea, thanks!
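The logging work bd808 and ^d deployed above (the InitialiseSettings.php sync at [18:25:17] for T84894 and the IP.php sync at [18:52:18]) follows the usual two-part wfDebugLog pattern. The sketch below shows that pattern with a made-up group name, log target and message, since the merged patches themselves are not quoted in the log.

    <?php
    // Sketch of the two halves of the T84894 instrumentation; names and paths
    // are placeholders, not the merged changes.

    // 1) Configuration half: route a named debug log group to a destination
    //    that can be tailed or grepped later.
    $wgDebugLogGroups['badIPArgument'] = '/tmp/badIPArgument.log'; // placeholder path

    // 2) Code half: at the suspect entry point, record who called it with the
    //    wrong type, without changing the function's behaviour.
    function wfLogBadParseRangeArgument( $range ) {
        if ( is_array( $range ) ) {
            wfDebugLog(
                'badIPArgument',
                'IP::parseRange called with an array argument; callers: ' .
                    wfGetAllCallers( 5 )
            );
        }
    }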
[20:36:39] (03PS1) 10Merlijn van Deen: Add ctcp VERSION response [debs/adminbot] - 10https://gerrit.wikimedia.org/r/180883 [20:39:50] !log bd808 Synchronized php-1.25wmf13/includes/profiler/ProfilerXhprof.php: xhprof: backport section profiler fixes (duration: 00m 07s) [20:39:55] Logged the message, Master [21:05:12] PROBLEM - SSH on mw1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:25] PROBLEM - configured eth on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:06:25] PROBLEM - DPKG on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:06:25] PROBLEM - Disk space on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:06:55] PROBLEM - dhclient process on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:06:55] PROBLEM - nutcracker port on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:26] PROBLEM - nutcracker process on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:37] PROBLEM - salt-minion processes on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:46] PROBLEM - puppet last run on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:05] PROBLEM - RAID on mw1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:08:19] (03CR) 10RobH: [C: 032] setting analytics1001-1002 productoin dns entries [dns] - 10https://gerrit.wikimedia.org/r/179189 (owner: 10RobH) [21:15:12] (03PS1) 10Merlijn van Deen: Make SAL edits non-bot [debs/adminbot] - 10https://gerrit.wikimedia.org/r/180889 [21:15:14] (03PS1) 10Merlijn van Deen: Add url to adminlogbot output [debs/adminbot] - 10https://gerrit.wikimedia.org/r/180890 [21:18:55] (03PS1) 10BBlack: fix dhcpd config typo [puppet] - 10https://gerrit.wikimedia.org/r/180891 [21:19:23] (03CR) 10BBlack: [C: 032 V: 032] fix dhcpd config typo [puppet] - 10https://gerrit.wikimedia.org/r/180891 (owner: 10BBlack) [21:26:28] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:29:22] (03PS2) 10Merlijn van Deen: Add url to adminlogbot output [debs/adminbot] - 10https://gerrit.wikimedia.org/r/180890 [21:31:54] (03PS1) 10Merlijn van Deen: tabs -> 4 spaces [debs/adminbot] - 10https://gerrit.wikimedia.org/r/180896 [21:34:34] RECOVERY - nutcracker process on mw1012 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [21:34:48] RECOVERY - salt-minion processes on mw1012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [21:35:06] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 51 minutes ago with 0 failures [21:35:32] RECOVERY - RAID on mw1012 is OK: OK: no RAID installed [21:35:33] RECOVERY - SSH on mw1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [21:36:41] RECOVERY - DPKG on mw1012 is OK: All packages OK [21:36:49] RECOVERY - configured eth on mw1012 is OK: NRPE: Unable to read output [21:36:50] RECOVERY - Disk space on mw1012 is OK: DISK OK [21:37:10] RECOVERY - dhclient process on mw1012 is OK: PROCS OK: 0 processes with command name dhclient [21:37:16] RECOVERY - nutcracker port on mw1012 is OK: TCP OK - 0.000 second response time on port 11212 [21:56:25] (03PS1) 10BryanDavis: Guard use of ProfilerXhprof with ini check of hhvm.stats.enable_hot_profiler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180903 [21:56:43] ^d, MaxSem: ^ [21:57:50] !log integration-slave1005 is not ready. 
It's incompletely setup due to https://phabricator.wikimedia.org/T84917 [21:57:54] Logged the message, Master [21:59:24] Would anyone mind deleting a zuul working dir? /srv/ssd/jenkins-slave/workspace/mwext-DonationInterface-testextension-zend [21:59:31] We added a submodule that seems to be ignored on the last few checkouts. [21:59:50] awight believes nuking that dir might help [22:00:23] ejegg: Krinkle might know/be-able-to-help [22:01:50] Thanks! Krinkle, can you shed any light on our zuul submodule woes? [22:02:12] ejegg: link? [22:02:53] https://integration.wikimedia.org/ci/job/mwext-DonationInterface-testextension-zend/58/console [22:03:17] seems it's not checking out our new /vendor submodule [22:03:59] ejegg: zuul-cloner doens't support submodules by default. [22:04:18] ah, so we have to specify those in layout? [22:04:20] Needs to be hacked in manually for jobs that need it. [22:05:15] ejegg: Something like that. File a task on phrabricator. I'll ask Antoine to look at it tomorrow. [22:05:27] Krinkle: thanks! [22:05:29] It should just do that by default. I don't know why it doesn't. [22:06:22] ejegg: 'vendor' smells like composer though. Don't we already have a repo for that? [22:06:43] If you haven't already, be sure to coordinate with bd808 on whether you should create your own submodule for that. [22:06:57] DonationInterface is supposed to work outside of mediawiki as well [22:07:11] In general, afaik the policy is: the repo works on its own by using composer- install. Not by using submodules. [22:07:12] so we can reuse the logic within our CRM [22:07:16] Then it should ship with a compoer.json [22:07:29] And only when used inside mediawiki on wmf, do we subst that with a separate clone (not a submodule) for vendor [22:08:31] Ahh, we've been doing both, here and in our CRM repo. Creating a submodule, composer --install ing into that, then commiting to both repos [22:08:35] ejegg: We can chat, but it shouldn't be necessary for you to ship a pre-built vendor dir [22:08:36] bblack, RoanKattouw_away: hey, I'm here now. What's up? [22:09:08] ori: the jobrunner shell thing is still causing troubles [22:09:10] I guess with composer.lock, there's no worry about different versions being installed [22:09:31] bd808: blargh. I'll revert my change. [22:09:38] great news! I'd love to get rid of our vendor/ dirs. Is there a similar job for node_modules, yet? [22:09:46] ejegg: you should be using very specific versions in your composer.json [22:10:04] awight: No need, Jenkins runs npm install. [22:10:39] But we don't cook node modules for deployment like we do with mediawiki/vendor [22:10:49] That was my next question :) [22:11:21] awight: Is this for front-end modules, back-end modules, or test/dev environment modules (build/test) [22:12:00] Krinkle: we have projects using npm, composer, and bower modules. All are deployed in server configurations. [22:12:21] err, meaning, for production use. [22:13:57] awight: what does 'deployed in server config' mean [22:14:00] are you saying it's in prod and it works? [22:14:17] or looking for way to get it in prod [22:14:24] g2g, mail me :) [22:14:27] sort of. They do the same thing we do with mediawiki/vendor [22:14:54] manage a repo full of what it built with those tools [22:14:55] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 2 failures [22:15:10] Krinkle: we have those three types of modules, front and back-end, deployed and used for production stuff. 
In all the fundraising repos, so far, we've been managing the contrib libraries manually using submodules. [22:15:18] It would be great to slay that cruft. [22:16:33] awight: no going to happen any time soon unfortunately. Running composer/npm/pip/etc on the prod cluster is not allowed (for good reasons). Maybe someday we will have a build server in an intermediate dmz. [22:17:23] bd808: ok, no worries. I was imagining we already had a WMF packagist mirror or something... I guess not? [22:18:19] no, we have jsut consolidated the composer dependiencies for WMF prod into a single repo (mediawiki/vendor.git) [22:18:42] So if an extension we deploy needs composer stuff we put it there [22:19:02] and it gets branches and included as a submodule in the wmf branches [22:19:14] aah. ok thx that explains some things. [22:19:21] !log ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I03cc1f46f: Revert "Simplify MWTidy" (duration: 00m 14s) [22:19:25] ahh, so mediawiki/vendor is its own composer package [22:19:27] !log ori Synchronized php-1.25wmf13/includes/parser/MWTidy.php: I7e67a61f7: Revert "Simplify MWTidy" (duration: 00m 05s) [22:19:28] Logged the message, Master [22:19:33] Logged the message, Master [22:19:35] bd808: the only catch for us, is that the extension we're working on is not in the standard deployment. [22:19:39] not just the installed composer stuff from above [22:20:07] awight: ah, right. You folks are special [22:20:56] :p seriously. We'd rather hang up the snowflake capes, but no end in sight, yet. [22:20:57] ejegg: it has its own composer.json that is a manual merge of the composer.json from mediawiki/core.git and other extensions [22:21:14] <^d> 2050976 [22:21:17] cool, got it [22:21:24] <^d> shit, wrong window. [22:21:34] 8675309 [22:21:40] <^d> 23871298129 [22:21:42] <^d> numberssssss [22:21:56] I thought this was the new CIA numbers channel [22:22:05] <^d> Maybe it is ;-) [22:22:10] <^d> We'd tell you, but then.... [22:23:08] ^d: 2fa? [22:23:49] <^d> I was actually counting the number of spots on carpet near my desk. [22:24:44] ^d: All of the WMF health care plans cover psychiatric care now, just FYI. [22:24:47] bd808: beyond the usual fr-specialness, DonationInterface is actually supposed to work (in some fashion) under drupal or mediawiki. [22:26:59] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:29:20] * bd808 shivers at the mention of drupal [22:29:56] yep, pretty tangled web over here [22:30:27] so DonationInterface is your bank gateway basically I guess [22:31:02] * bd808 denies all the lies on his resume about being a payment gateway expert [22:32:48] bd808: you are in so much trouble. Just wait 'til we are ready to feed on new hires again. [22:33:43] awight: :) K4 kind of tried to lure me back to the dark side over a beer in London [22:34:07] * ^d once tried to use Drupal for something vaguely payment related. [22:34:10] <^d> I still have nightmares. [22:34:33] hah! /me makes a note in the book of souls [22:36:21] * bd808 certainly did not write a double entry accounting system or a NACHA client that passed a Sarb-Ox audit [22:36:22] greg-g, wanted to let you know my deploy is running a little behind. [22:36:56] superm401: the TOC stuff? [22:37:21] greg-g, we changed it to two other smaller Flow features (schedule was updated earlier) [22:37:23] <^d> bd808: I had to write the same for a class. Yay business school. 
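Behind the composer discussion above is a dual-mode loading pattern: an extension that also has to run outside MediaWiki (as DonationInterface does in the fundraising CRM) ships its own composer.json and, when installed standalone, loads its own vendor/autoload.php; when it runs on the WMF cluster, the same libraries come from the shared mediawiki/vendor clone and the local autoloader is skipped. The sketch below is a minimal, hypothetical bootstrap illustrating that pattern; the class name and layout are placeholders, not DonationInterface's real code.

    <?php
    // Hypothetical extension bootstrap fragment illustrating the pattern
    // Krinkle and bd808 describe; not taken from any real extension.
    //
    // Standalone / CRM use: `composer install` against the extension's own
    // composer.json (with pinned versions, as bd808 suggests) creates a local
    // vendor/ directory next to this file.
    // WMF production: the library is already autoloadable via the shared
    // mediawiki/vendor clone, so the local require is skipped.
    if ( !class_exists( 'Acme\\Payments\\GatewayClient' )
        && is_readable( __DIR__ . '/vendor/autoload.php' )
    ) {
        require_once __DIR__ . '/vendor/autoload.php';
    }

The point of the policy is that third-party code is never committed into the extension as a prebuilt vendor/ submodule; the composer.json (plus composer.lock) pins the exact versions instead, which is what makes the two deployment modes interchangeable.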
[22:37:36] bd808 is a man of many talents [22:37:39] <^d> Luckily undergrad projects don't have to pass Sarb-Ox. [22:37:46] superm401: I totally read that deployments schedule page repeatedly throughout the day [22:38:10] chrismcmahon: or a man who has worked at some seriously understaffed companies :) [22:38:13] <^d> superm401: It's true. I look over and that's all he does all day. Just mash F5 on the deployments page to see what's happening ;-) [22:38:22] bd808: same thing [22:38:39] greg-g, sorry I didn't ping you about the change. The new things are separate but much smaller in scope. [22:38:50] superm401: no problem :) [22:39:19] ^d: you'd think I'd learn of edit histories [22:39:23] I think greg-g mostly lets the inmates run the asylum these days as long as we don't hit each other [22:39:45] I did have to install the padded walls, though [22:39:53] obviously [22:40:05] <^d> greg-g: My home office lacks padded walls. [22:40:06] <^d> plz fix. [22:40:29] Has Zuul hung? https://integration.wikimedia.org/zuul/ shows a core job where everything's done that's been sitting there. [22:40:51] * AndyRussG waves :) [22:41:02] Hey, AndyRussG [22:41:05] :) [22:41:08] This is blocking the merge queue -- https://gerrit.wikimedia.org/r/#/c/180915/ [22:41:08] everyone jump on ops! (jk) uhhhh [22:41:26] where do I find the config for redirection to the mobile *.m.* URLs? [22:41:37] <^d> in puppet! [22:41:41] Just disappeared, so it finally got through or someone fixed it. [22:42:08] Ori, this cherry pick is failing tests and blocking the merge queue -- https://gerrit.wikimedia.org/r/#/c/180915/ [22:42:12] ^d: grep -ri mobile puppet/ gave me more than I wanted... [22:42:28] bd808: looking [22:42:29] *which is a goofy thing that zuul does [22:42:31] <^d> AndyRussG: Maybe you should grep for "m"? :) [22:42:43] blrrggr [22:42:51] Or maybe \.m\. [22:42:58] * ^d is being super duper helpful today [22:43:57] actually that was helpful! Looks like it's in puppet/modules/mediawiki/files/apache/sites/redirects.conf [22:44:25] <^d> Seems likely :) [22:44:32] <^d> Either that or varnish conf somewhere. [22:44:49] AndyRussG: https://github.com/wikimedia/operations-puppet/blob/production/modules/mediawiki/files/apache/sites/redirects/redirects.dat [22:45:03] bd808: do you know where the MediaWiki settings for the unit test environment lives? [22:45:19] ori ... i did at some point [22:45:21] bd808: thanks! [22:45:22] * bd808 looks [22:46:17] <^d> ori: What are you looking for exactly? [22:46:37] to set $wgInternalTidy to false for the unit tests [22:46:44] <^d> For all unit tests? [22:46:46] <^d> Or just some? [22:47:11] <^d> $this->setMwGlobal( $name, $val ); from setUp() for the latter. [22:47:25] ori: I think it's in here -- https://github.com/wikimedia/integration-jenkins [22:47:28] hm, that could work. 
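On the per-test override ^d mentions at [22:47:11]: the MediaWikiTestCase helper is setMwGlobals(), called from setUp(), and it restores the previous value in tearDown() so the override never leaks into other tests. The sketch below shows the shape of such a test class; the class name is made up, and "wgInternalTidy" is simply the name used in the conversation, standing in for whichever tidy-related global actually needs flipping.

    <?php
    // Hypothetical test class showing the setUp()-scoped global override
    // described above.
    class TidyDisabledExampleTest extends MediaWikiTestCase {

        protected function setUp() {
            parent::setUp();
            // Only tests in this class see the override; everything else keeps
            // whatever the CI-built LocalSettings.php configured.
            $this->setMwGlobals( 'wgInternalTidy', false );
        }

        public function testRunsWithoutInternalTidy() {
            $this->assertFalse( $GLOBALS['wgInternalTidy'] );
        }
    }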
[22:48:35] ori: This script builds the LocalSettings.php -- https://github.com/wikimedia/integration-jenkins/blob/master/bin/mw-apply-settings.sh [22:52:33] ^d, MaxSem: This should stop the rest of the xhprof spam in the logs -- https://gerrit.wikimedia.org/r/#/c/180903/ [22:53:14] (03CR) 10Chad: [C: 032] Guard use of ProfilerXhprof with ini check of hhvm.stats.enable_hot_profiler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180903 (owner: 10BryanDavis) [22:53:27] (03Merged) 10jenkins-bot: Guard use of ProfilerXhprof with ini check of hhvm.stats.enable_hot_profiler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180903 (owner: 10BryanDavis) [22:54:09] !log demon Synchronized wmf-config/StartProfiler.php: fix profiling again, this time with feeling (duration: 00m 08s) [22:54:11] <^d> bd808: ^^ [22:54:13] What's the status of tidy on HHVM? [22:54:14] Logged the message, Master [22:54:15] Got https://integration.wikimedia.org/ci/job/mediawiki-phpunit-hhvm/259/testReport/%28root%29/HtmlFormatterTest__testTransform/testTransform_with_data_set__0/ [22:54:19] Not related to my change, AFAICT. [22:54:47] superm401: Ori's working on it now I think [22:54:58] yep, sec [22:55:10] Okay, thanks. [22:55:29] ori, should I wait, or just force it through? Zend passes. [22:57:32] ^ greg-g [22:58:03] as long as you promise it won't break production and as long as ori will take more than a small amount of time... [22:58:13] pinky swear promise, like, legit promise :) [22:59:05] https://gerrit.wikimedia.org/r/#/c/180987/ *should* fix it [22:59:05] <^d> there's no more legit promise than a pinky swear. [23:00:10] ^d and I are on the same wavelength today [23:00:19] superm401: you should go regardless [23:01:05] <^d> greg-g: a regular Abbott and Costello we are :) [23:01:17] Flow uses Parsoid, anyway, so I don't think we're even affected by Tidy. [23:01:37] Plus, this change doesn't really change actual text handling (just metadata about revision IDs and such) [23:01:43] So I'm going to force it. [23:02:05] godspeed [23:02:32] tidy is only involved in the action=parse API calls that Parsoid makes for extension tags [23:03:01] which technically isn't even necessary, as Parsoid parses the result to HTML [23:03:22] (DOM) [23:03:29] gwicke, okay, thanks. So we're slightly affected, but the Flow change doesn't change how we use Parsoid. [23:03:41] kk [23:05:26] 2014-12-18 07:47:11 mw1246 mediawikiwiki: Query affected 19034 rows: [23:05:29] * AaronSchulz sighs [23:09:00] !log mattflaschen Started scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13) [23:09:04] Logged the message, Master [23:09:26] bd808, ^d: awesome, thanks! [23:09:35] (03PS1) 10Springle: repool db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180989 [23:10:03] MaxSem: It's still happening though :( I'm trying to figure out why/how [23:10:10] (03CR) 10Springle: [C: 032 V: 032] repool db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180989 (owner: 10Springle) [23:11:41] ori: Re: mediawki config for unit tests, need anything? [23:11:57] Krinkle: no, I think I'm good now. 
Thanks :) [23:15:11] (03PS1) 10Springle: Revert "repool db1055" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180993 [23:15:51] (03CR) 10Springle: [C: 032 V: 032] Revert "repool db1055" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180993 (owner: 10Springle) [23:16:58] <_joe|off> ori: hey [23:19:41] <_joe|off> we had to disable the hotprofiler today [23:20:10] <_joe|off> it's a bit late for the details, I should've sent an email instead, apologies [23:26:16] PROBLEM - HHVM rendering on mw1184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:26] PROBLEM - Apache HTTP on mw1184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:27:34] _joe|off: hey [23:27:37] no problem [23:27:40] i'll take a look [23:29:32] bblack, around? [23:29:48] bblack, https://gerrit.wikimedia.org/r/#/c/180850/ [23:30:03] <_joe|off> ori: long story short: some servers went into a state where the cpu was all eaten by the system, in a _raw_spinlock. And all hhvm threads where blocked in a get_rusage( RUSAGE_SELF ) call from the getCpu() function of the hotprofiler [23:30:37] <_joe|off> restarting hhvm did not resolve the issue, which is strange. [23:30:48] <_joe|off> but disabling the profiler did it [23:31:47] <_joe|off> also, we had some interesting segfaults for a couple of days, but that is on phabricator [23:32:18] I'll be right back [23:33:05] <_joe|off> I won't :) good night [23:34:51] PROBLEM - HHVM busy threads on mw1184 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [115.2] [23:35:33] !log mattflaschen Finished scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13) (duration: 26m 33s) [23:35:38] Logged the message, Master [23:40:37] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [23:41:36] ^ greg-g, done. Testing now. [23:42:04] yurikR: I'm behind on a few dr0ptp4kt commits from today as well, is this related/dependant at all? [23:43:39] bblack, don' think so [23:44:21] bblack, mine is very simple, and you will love it :D [23:44:22] the X-WMF-UUID stuff [23:44:27] yes I see that I love it :) [23:44:41] PROBLEM - HHVM queue size on mw1184 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0] [23:46:27] (03PS2) 10BBlack: Support X-WMF-UUID X-Analytics tagging [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [23:46:35] (03CR) 10BBlack: [C: 032] Support X-WMF-UUID X-Analytics tagging [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [23:46:49] (03CR) 10BBlack: [V: 032] Support X-WMF-UUID X-Analytics tagging [puppet] - 10https://gerrit.wikimedia.org/r/180812 (owner: 10Dr0ptp4kt) [23:46:55] bblack: thx [23:47:16] (03PS2) 10BBlack: Zero: Remove per-carrier analytics variance [puppet] - 10https://gerrit.wikimedia.org/r/180850 (owner: 10Yurik) [23:47:26] thx [23:47:53] (03CR) 10BBlack: [C: 032 V: 032] Zero: Remove per-carrier analytics variance [puppet] - 10https://gerrit.wikimedia.org/r/180850 (owner: 10Yurik) [23:48:16] now if all the caches break we won't even know which patch to blame! :) [23:48:30] Flow deployment verified. [23:50:22] dr0ptp4kt: the other non-puppet ones, I'll try to peek at later tonight or tomorrow AM to at least give them whatever "this isn't insane" +1 I can. 
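Tying the evening's threads together: _joe|off's description at [23:30:03] (worker threads blocked in get_rusage( RUSAGE_SELF ) inside the hotprofiler's getCpu()) is the failure that motivated bd808's change 180903, synced by ^d at [22:54:09]. The sketch below shows what an ini-based guard of that kind can look like; it is an illustration of the idea described by the change's subject line, not the literal patch, and the output sink is a placeholder.

    <?php
    // Sketch of guarding ProfilerXhprof on the HHVM hot profiler setting, in
    // the spirit of change 180903. When operations turns
    // hhvm.stats.enable_hot_profiler off at the server level, MediaWiki should
    // not try to use the xhprof-backed profiler at all.
    $wmfHotProfilerEnabled = (bool)ini_get( 'hhvm.stats.enable_hot_profiler' );

    if ( $wmfHotProfilerEnabled ) {
        $wgProfiler = array(
            'class'  => 'ProfilerXhprof',
            'output' => array( 'db' ), // placeholder sink, as in the earlier sketch
        );
    }
    // Otherwise $wgProfiler stays unset and the stub (no-op) profiler is used,
    // so a disabled hot profiler can no longer leave requests wedged in
    // getrusage() the way _joe|off describes.

Under Zend PHP, ini_get() of an hhvm.* key would also come back false, so a real guard presumably handles the xhprof extension case separately; the log does not show how 180903 deals with that.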
[23:50:34] not that I understand half of what's going on in those places [23:50:35] bblack: :) [23:51:08] bblack: we need to do some runtime debugging together some day through all tiers. would be fun! [23:51:18] :) [23:51:40] ok I manually checked puppet on a text and a mobile cache, and no big syntax error goofs happened or whatever. [23:51:50] I'm off to grab some dinner, call me if something goes up in flames that I can help with. [23:58:08] Who's going to do the SWAT, and can I steal a few +2s on the extension cherry-picks so I can make the MW-core ones? [23:58:40] James_F: Give me links, and I'll return +2 :P [23:59:12] hoo: Thanks! [23:59:15] hoo: https://gerrit.wikimedia.org/r/#/c/181002/ and https://gerrit.wikimedia.org/r/#/c/180858/ for wmf12. [23:59:39] hoo: https://gerrit.wikimedia.org/r/#/c/180701/ and https://gerrit.wikimedia.org/r/#/c/180700/ for wmf13.