[00:04:14] !log updated Parsoid to 393a3263
[00:04:23] Logged the message, Master
[00:05:14] ^demon: are you aware of the template code leaking on git.wikimedia.org? (doesn't look like a serious issue, just looks funny: '! {{item.p}}{{item.n}} {{item.t}}')
[00:05:33] <^demon> ori-l: Yes, aware.
[00:05:37] <^demon> Upstream is trying to fix
[00:05:47] ok cool :)
[00:10:24] could somebody with root clear the backend Parsoid Varnish caches on cerium and titanium?
[00:12:23] ^demon: btw, it seems that the 'unhide source lines above / below' functionality has been broken in FF for quite a while
[00:12:27] in the diff view
[00:14:18] ^demon: I'm going to head out now. Our list of stuff left to do for features is dwindling quickly. Admin stuff still needs some love though so we aren't done yet.
[00:16:52] <^demon> Yeah, we need to focus on how we're gonna do that. Have a good evening.
[00:33:41] ori-l: ye are still using otrs 2.4?
[00:36:44] PROBLEM - LVS Lucene on search-pool2.svc.eqiad.wmnet is CRITICAL: No route to host
[00:37:19] AzaToth: I'm not sure; ask Jeff_Green, RD or Elsie
[00:37:25] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: No route to host
[00:37:39] ori-l: asked you as you sent the latest "patch" to upstream :-P
[00:37:57] http://bugs.otrs.org/show_bug.cgi?id=9042
[00:38:19] yes, 'patch' is a bit of an overstatement tho
[00:39:06] New review: MaxSem; "Can this be merged?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66874
[00:39:09] but ok, I'll schpam them to deathness
[00:40:21] ori-l: couldn't apply any of the patches to any degree against current otrs in debian (which is 3.1.blablabla)
[00:47:28] AzaToth: Yes, we're still using 2.4
[00:47:38] There is a 3.2 test environment live on wmf servers
[00:48:13] I see
[00:48:55] RD: I assume the test env for 3.2 hasn't been checked into a git near
[00:49:09] RD: as I would assume you would have had to redo all the patches
[00:49:09] I doubt it
[00:49:18] unless you didn't add any
[00:49:32] The OTRS inventor set it up
[00:49:40] He's still working out a few bugs
[00:49:48] But then he plans to assist with the upgrade
[00:49:53] OTRS inventor?
[00:49:59] Yes. Martin E.
[00:50:03] ok
[00:50:07] He has an RT account, db access, etc.
[00:51:08] so I assume you have no wish to use debian package of otrs then
[00:52:21] I personally don't know what that is. I don't know much tech, really. I'm just pretending. :-) I'm an OTRS admin, passing on what I was told by ops
[00:52:53] hehe
[00:53:29] you can either squeeze in a software manually or you can let the system do the work for you
[00:55:40] The test environment of v. 3.2.6 works great, despite the missing patches and a few things we already pointed out to him. I hope we can get that stuff fixed and the live install upgraded soon, but there has been virtually no progress that I've been told of in a while now. But he may be working and we not know it...information/things don't get passed along.
[00:56:51] RD: just looking into the debianization
[00:57:01] * RD nods
[00:57:15] I'm just whining. ;-)
[00:57:22] hehe
[00:57:24] Maybe somebody is listening. :P
[00:58:03] RD: the best way to get upstream "inventor" to get speed, is to discreete imply you are looking into alternatives
[00:58:50] I'm not sure what the hold up is at the time... in the past both he and WMF could not find the "right times" to get started. Not sure what stalled things this time.
[00:59:08] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[00:59:14] (or if things are even stallled... )
[00:59:25] I've no idea
[00:59:32] haven't read the memos
[00:59:39] I go through Maggie, who talks to Martin
[00:59:46] So the whole process sucks :P
[00:59:47] perhaps he forgot the new cover page for the tps report
[00:59:52] haha
[01:01:58] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.003187298775 secs
[01:03:28] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.002039074898 secs
[01:23:06] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[01:31:56] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.003455638885 secs
[01:33:44] * Elsie stares at Reedy.
[02:07:39] !log LocalisationUpdate completed (1.22wmf7) at Fri Jun 21 02:07:39 UTC 2013
[02:07:49] Logged the message, Master
[02:13:50] !log LocalisationUpdate completed (1.22wmf8) at Fri Jun 21 02:13:50 UTC 2013
[02:13:59] Logged the message, Master
[02:18:58] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours
[02:24:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 21 02:24:03 UTC 2013
[02:24:12] Logged the message, Master
[02:32:58] PROBLEM - Puppet freshness on ms-be2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:01:22] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:14] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[04:37:33] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: No route to host
[04:38:13] PROBLEM - LVS Lucene on search-pool2.svc.eqiad.wmnet is CRITICAL: No route to host
[04:51:44] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:00:55] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:50:25] New review: Faidon; "I don't see how those two are disabled, they're being used later in the config." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/69705
[05:52:03] New review: Faidon; "We already have an rsync module in the tree. Can we reuse that? If so, maybe we should name the "rol..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/69703
[05:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:52:41] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:41] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:41] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:41] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:41] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:42] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:42] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:43] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:43] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours
[05:52:44] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[05:54:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[05:56:29] New review: Faidon; "Actually, looking a bit closer, I don't see anything role-based on that module other than the config..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69703
[06:01:14] apergos: morning
[06:01:26] what's snapshotNN/dataset2 being used for?
[06:02:08] morning
[06:02:13] those are the dump hosts
[06:03:22] dumps of various types are generatd from the snapshot hosts, the dataset hosts serve them and other files to the public
[06:04:12] aha
[06:04:15] and they're actually being used?
[06:04:19] yes
[06:05:06] if they weren't I would be hearing about it from a lot of folks :-D
[06:05:54] why do you ask?
[06:07:35] New review: Faidon; "I'm not sure if what I'm proposing is clear, maybe we should discuss it over on IRC?" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/69682
[06:07:41] I was reviewing ^
[06:07:52] and it has a stanza for the dataset NFS mountpoints
[06:07:56] so I was wondering
[06:08:51] !log updated Parsoid to 90515ab5
[06:08:51] looking
[06:08:59] Logged the message, Master
[06:09:45] it's just moving it from manifests/
[06:10:01] so there isn't anything changing regarding dataset/snapshot
[06:10:17] we're gradually moving to most things being in modules I suppose?
[06:10:21] yes
[06:10:26] good
[06:20:29] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours
[06:24:29] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours
[06:26:29] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: No successful Puppet run in the last 10 hours
[06:27:29] PROBLEM - Puppet freshness on manganese is CRITICAL: No successful Puppet run in the last 10 hours
[06:27:29] PROBLEM - Puppet freshness on mw1152 is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:31] PROBLEM - Puppet freshness on aluminium is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:31] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:31] PROBLEM - Puppet freshness on amssq31 is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:31] PROBLEM - Puppet freshness on amssq35 is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:31] PROBLEM - Puppet freshness on amssq32 is CRITICAL: No successful Puppet run in the last 10 hours
[06:29:40] RECOVERY - Puppet freshness on mw1162 is OK: puppet ran at Fri Jun 21 06:29:30 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on search1014 is OK: puppet ran at Fri Jun 21 06:29:30 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on lvs1 is OK: puppet ran at Fri Jun 21 06:29:32 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on mw5 is OK: puppet ran at Fri Jun 21 06:29:32 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on sq79 is OK: puppet ran at Fri Jun 21 06:29:34 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on db1048 is OK: puppet ran at Fri Jun 21 06:29:34 UTC 2013
[06:29:40] RECOVERY - Puppet freshness on lvs5 is OK: puppet ran at Fri Jun 21 06:29:34 UTC 2013
[06:29:41] RECOVERY - Puppet freshness on cp1006 is OK: puppet ran at Fri Jun 21 06:29:34 UTC 2013
[06:29:42] RECOVERY - Puppet freshness on mw1190 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:42] RECOVERY - Puppet freshness on mw1111 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:43] RECOVERY - Puppet freshness on mw83 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:43] RECOVERY - Puppet freshness on srv301 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:44] RECOVERY - Puppet freshness on srv290 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:44] RECOVERY - Puppet freshness on mw1147 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:45] RECOVERY - Puppet freshness on search36 is OK: puppet ran at Fri Jun 21 06:29:35 UTC 2013
[06:29:45] RECOVERY - Puppet freshness on mw1161 is OK: puppet ran at Fri Jun 21 06:29:38 UTC 2013
[06:29:46] RECOVERY - Puppet freshness on mw38 is OK: puppet ran at Fri Jun 21 06:29:38 UTC 2013
[06:29:46] RECOVERY - Puppet freshness on mc1005 is OK: puppet ran at Fri Jun 21 06:29:38 UTC 2013
[06:29:49] RECOVERY - Puppet freshness on ms-be4 is OK: puppet ran at Fri Jun 21 06:29:39 UTC 2013
[06:29:49] RECOVERY - Puppet freshness on wtp1024 is OK: puppet ran at Fri Jun 21 06:29:39 UTC 2013
[06:29:49] RECOVERY - Puppet freshness on gadolinium is OK: puppet ran at Fri Jun 21 06:29:40 UTC 2013
[06:29:49] RECOVERY - Puppet freshness on mw118 is OK: puppet ran at Fri Jun 21 06:29:40 UTC 2013
[06:29:49] RECOVERY - Puppet freshness on db1036 is OK: puppet ran at Fri Jun 21 06:29:40 UTC 2013
[06:30:39] RECOVERY - Puppet freshness on sq52 is OK: puppet ran at Fri Jun 21 06:30:29 UTC 2013
[06:30:39] RECOVERY - Puppet freshness on search1007 is OK: puppet ran at Fri Jun 21 06:30:29 UTC 2013
[06:30:39] RECOVERY - Puppet freshness on sq41 is OK: puppet ran at Fri Jun 21 06:30:29 UTC 2013
[06:30:39] RECOVERY - Puppet freshness on mw60 is OK: puppet ran at Fri Jun 21 06:30:29 UTC 2013
[06:30:39] RECOVERY - Puppet freshness on labstore3 is OK: puppet ran at Fri Jun 21 06:30:29 UTC 2013
[06:30:59] RECOVERY - Puppet freshness on ms-fe2 is OK: puppet ran at Fri Jun 21 06:30:48 UTC 2013
[06:30:59] RECOVERY - Puppet freshness on srv272 is OK: puppet ran at Fri Jun 21 06:30:49 UTC 2013
[06:30:59] RECOVERY - Puppet freshness on srv291 is OK: puppet ran at Fri Jun 21 06:30:49 UTC 2013
[06:30:59] RECOVERY - Puppet freshness on sq68 is OK: puppet ran at Fri Jun 21 06:30:49 UTC 2013
[06:30:59] RECOVERY - Puppet freshness on mw1104 is OK: puppet ran at Fri Jun 21 06:30:49 UTC 2013
[06:31:39] RECOVERY - Puppet freshness on mw52 is OK: puppet ran at Fri Jun 21 06:31:38 UTC 2013
[06:31:49] RECOVERY - Puppet freshness on srv252 is OK: puppet ran at Fri Jun 21 06:31:45 UTC 2013
[06:31:49] RECOVERY - Puppet freshness on mw1039 is OK: puppet ran at Fri Jun 21 06:31:47 UTC 2013
[06:31:59] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Fri Jun 21 06:31:50 UTC 2013
[06:33:49] RECOVERY - Puppet freshness on srv294 is OK: puppet ran at Fri Jun 21 06:33:39 UTC 2013
[06:34:00] RECOVERY - Puppet freshness on mw1003 is OK: puppet ran at Fri Jun 21 06:33:55 UTC 2013
[07:01:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[07:26:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:27:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[07:31:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:32:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time
[07:52:51] New review: ArielGlenn; "250 seems really really long. How about subjectlength + 20 or so?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66665
[08:22:40] hello
[08:23:29] hi hashar
[08:25:32] ah faidon :-]
[08:25:46] have you looked at my PHP cherry pick request ? :-]
[08:26:13] I haven't...
[08:31:36] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.008274912834 secs
[08:32:56] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.01059162617 secs
[08:39:59] I wonder why ntp keeps dying on those boxes...
[09:26:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:27:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[10:37:15] !log upgrading packages on gallium
[10:37:23] Logged the message, Master
[10:40:39] paravoid: could the ntp daemon be restarted by puppet ?
[10:59:39] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[11:04:38] Hi, wikimedia.org seems to allow anyone to do an AXFR. Is this intended?
[11:20:58] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:21:50] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[11:30:25] jimmyxu: probably not :)
[11:30:48] hashar: seems all wikimedia zone allows AXFR...
[11:31:01] or should I file a bug then?
[11:32:08] why would it be a bug?
[11:32:37] cause that expose all our entries to anyone ? :-)
[11:33:10] that's kind of the entire purpose of DNS isn't it?
[11:34:09] you usually only allow AXFR between master and slaves for the purpose of zone replications
[11:34:21] resolver doesn't need it
[11:34:27] i know
[11:34:31] but what's the point of disabling it
[11:35:54] security by obfuscation :-]
[11:36:17] honestly I dont know
[11:36:29] I always restricted AXFR to slaves only
[11:36:47] we don't have anything to hide
[11:37:02] so it is a "worksforme" :-]
[11:37:11] our configs are all public, but we need to hide hostnames/ips in dns? ...
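(Editor's note: the "restrict AXFR to slaves only" practice hashar describes looks roughly like the following BIND configuration fragment. The zone name and the slave addresses here are hypothetical placeholders, not WMF's actual setup; anyone can check a server's current behavior with `dig AXFR <zone> @<nameserver>`.)

```
// named.conf sketch (illustrative; zone name and addresses are hypothetical).
// Without an allow-transfer restriction, BIND answers AXFR for any client;
// listing the secondaries limits zone transfers to replication peers only.
zone "example.org" {
    type master;
    file "db.example.org";
    allow-transfer { 203.0.113.10; 203.0.113.11; };  // secondary nameservers
};
```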
[11:37:21] jimmyxu: so that is intended, quoting mark "we have nothing to hide"
[11:37:32] hashar: okay :)
[11:46:28] security by obfuscation isn't security
[11:55:19] New review: Yurik; "DISABLED means that the config pages:" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/69705
[11:56:13] paravoid, disabled in that patch is something else - i commented in https://gerrit.wikimedia.org/r/#/c/69705/
[11:56:43] I don't see why we need to carry this in the varnish config
[11:57:19] paravoid, its juts a reflection of the config pages - the script copies that info together with the carrier's description string
[11:57:46] its a comment!
[11:57:55] and I don't see why we have to carry this in the varnish config
[11:58:18] you'll change one bit on that web page of yours and we'll have to go through code review/merge
[11:58:22] why?
[11:58:32] what do we gain?
[12:00:43] paravoid, i think what we gain the most is the easy way to track when something was disabled a while back and hasn't changed for a while, indicating a problem. Varnish file is where we look frequently (at this point), so if something has been disabled for a long time, we might want to remove it alltogether. Monitoring zero config pages is harder in that regard.
[12:02:15] and i don't upload a patch when something gets disabled - only when there is a more significant change
[12:05:04] the second argument cancels the first
[12:05:18] if you don't keep git up to date, how can you meaningfully see if something is disabled for a long time
[12:07:15] our varnish configs are not for these purposes
[12:07:26] paravoid, the way it has been going, i get significant change (ips change, new carrier, etc) very frequently - we are signing new carriers non stop. "disabled" at this point is a big convenience to me, but if this is a blocker to get ip changes in, I will change the script - its not worth spending my time arguing about this particular issue
[12:08:14] we've agreed we're going to replace all those ACLs soon with bblack's work anyway, so you'll have to find a way to track this history outside of the git tree anyway
[12:08:22] so, yeah, change your script I'd say
[12:13:00] New patchset: Yurik; "Script-updated zero configs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69705
[12:13:34] paravoid, done
[12:14:46] New patchset: Faidon; "zero: add IPs to carrier Orange Niger" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69705
[12:15:16] (fixing commit message)
[12:15:23] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69705
[12:19:08] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:30:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:32:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[12:33:09] PROBLEM - Puppet freshness on ms-be2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:33:56] apergos: hey
[12:34:01] ms-be2 still down?
[12:34:05] is this scheduled?
[12:38:44] ah I saw the drac is down ticket
[12:39:12] it's down and it's not scheduled and that's all I have, it disappeared sometime after I afked yesterday
[12:39:24] or at least after the last time I looked yesterday afternoon
[12:39:35] right
[12:39:59] I wasn't sure if you saw
[12:40:03] yeah
[13:07:47] New patchset: BBlack; "Make this thing functional..." [operations/software/varnish/libvmod-netmapper] (master) - https://gerrit.wikimedia.org/r/69857
[13:12:06] Change merged: BBlack; [operations/software/varnish/libvmod-netmapper] (master) - https://gerrit.wikimedia.org/r/69857
[13:32:18] New patchset: coren; "Fix upstart job in role::labsnfs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69858
[13:32:52] New review: coren; "Simple enough." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69858
[13:32:52] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69858
[13:43:09] New patchset: Petrb; "missing motd file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69860
[13:48:27] Coren merge this ^
[13:48:41] for some reason toolsbeta is looking for motd files prefixed with toolsbeta instead of tools
[13:48:49] I have no idea why :/
[13:49:06] but puppet is crying about it
[13:51:20] petan: That's quite on purpose; it'd be silly to presume both projects have the same motd.
[13:51:37] hmm...
[13:51:39] ok
[13:51:51] I have no idea how is that accomplished but ok
[13:51:57] It makes that test in it quite unnecessary, btw
[13:52:03] yes I see
[13:52:16] I was just wondering how to easily distiguish between them
[13:52:53] Want to amend it first before I merge?
[13:53:32] amended
[13:53:40] meh
[13:53:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:53:42] some git error
[13:54:01] "some git error"?
[13:54:22] New patchset: Petrb; "missing motd file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69860
[13:54:32] fixed
[13:54:39] my git is producing a lot of errors :)
[13:54:53] I even tried to purge it and install from scratch but it still produces it :/
[13:54:58] like some file is broken
[13:55:11] but when I remove it and install from ubuntu repository it doesn't work either
[13:55:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[13:55:44] /usr/lib/git-core/git-pull: eval: line 286: syntax error near unexpected token `('
[13:55:45] /usr/lib/git-core/git-pull: eval: line 286: `exec git-merge "$merge_name" HEAD (1.7 GB)'
[13:55:49] this error :/
[13:55:50] I hate it
[13:56:14] * Coren doesn't hink he ever got it.
[13:56:23] it's internal error in git
[13:56:30] New review: coren; "LGM" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69860
[13:56:31] but why I have no idea
[13:57:08] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69860
[14:25:18] paravoid, in early discussions I think we agreed that we would (or at least could) group roles inside modules. That's mentioned here: https://wikitech.wikimedia.org/wiki/Puppet_usage#Organization
[14:33:01] i think we'll have to create one 'role' module
[14:33:25] at least if we want the autoloader to work
[14:33:55] Why? If I refer to ldap::role::whatsit the autoloader knows where to find that.
[14:35:16] if the role name doesn't conflict with the manifests module name
[14:35:22] or you mix it, and that's really ugly
[14:35:39] since roles are very WMF specific there's little point in putting them in separate modules anyway
[14:35:49] (I think 80% of our modules will be very WMF specific anyway, but that's a different matter...)
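(Editor's note on the git-pull failure petan pasted above: git-pull builds a command string and runs it through the shell's eval, and eval cannot parse bare parentheses, so whatever appended "(1.7 GB)" to the merge name — the actual culprit is not identifiable from this log, a local wrapper or alias is one plausible guess — guarantees exactly that syntax error. A minimal reproduction, independent of git:)

```shell
# eval parses its argument as shell source, so bare parentheses inside
# the string are a syntax error, matching the pasted git-pull message.
cmd='exec-stand-in merge-name HEAD (1.7 GB)'
eval "$cmd" 2>&1 || true   # reports a "syntax error" near the '(' token
```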
[14:36:02] It's the same matter, though :)
[14:36:13] If the modules are wmf-specific then grouping roles inside them makes sense.
[14:36:14] (or at least, WMF specific enough to not really be useful outside anyway ;)
[14:36:27] when the role name is the same
[14:36:32] see cache vs varnish/squid
[14:36:52] a role usually combines multiple components
[14:36:58] so putting them in a component module doesn't really work
[14:37:14] !log jenkins: updating all phpcs-HEAD jobs to rely on git-changed-in-head script {{gerrit|69863}}
[14:37:22] Logged the message, Master
[14:38:13] Sure, I don't think that /every/ role will be grouped inside a module… just the roles that are clearly module-specific.
[14:38:34] Or, rather, some mixed-module roles will be in a roles module
[14:40:40] The issue of name conflicts between roles and manifests seems unimportant since the whole idea is to replace manifests with modules.
[14:47:33] New patchset: Petrb; "new class for toolsbeta project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69864
[14:47:35] Coren ^
[14:48:29] Oh, I'd have put it in labs.pp alongside the other.
[14:48:36] wait a moment
[14:48:43] I think there is some mistake in my class
[14:48:55] inherits should also have toolsbeta::config I guess
[14:49:06] "This is a nice generic place to make project-specific roles with a sane
[14:49:06] # naming scheme."
[14:49:10] meh
[14:49:23] And yes, the inherits also need to inherit your own config class. :-)
[14:50:18] fixed
[14:50:28] New patchset: Petrb; "new class for toolsbeta project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69864
[15:03:21] andrewbogott: I don't like putting a role and the component/service/software that is related into the same module anyway
[15:04:24] mark: OK. Well, as usual, I don't care about what the standard is so much as that I care that there /be/ a standard.
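(Editor's note: the two layouts under debate can be sketched concretely. All class, module, and file names below are hypothetical examples chosen for illustration, not actual contents of operations/puppet.)

```puppet
# Layout A (andrewbogott's reading): module-specific roles live inside the
# component module and autoload from its manifests/ tree:
#   modules/ldap/manifests/role/server.pp
class ldap::role::server {
    include ldap::server
}

# Layout B (mark/paravoid's preference): one WMF-specific "role" module,
# since a role usually combines multiple components and so doesn't belong
# to any single component module:
#   modules/role/manifests/cache.pp
class role::cache {
    include varnish
    include lvs
}
```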
[15:05:01] I guess I have to start another email thread about this… maybe people will respond this time.
[15:05:02] fair enough
[15:05:23] i think our use of roles is very similar to what other people do with a "site" module
[15:05:30] or however it's called
[15:06:54] New review: coren; "I think I would have preferred this to be in labs.pp, but meh." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69864
[15:06:55] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69864
[15:29:24] PROBLEM - SSH on pdf3 is CRITICAL: Server answer:
[15:32:23] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[15:38:54] New patchset: coren; "Tool Labs: Typo fix in bastion.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69868
[15:39:26] New review: coren; "Typo fix." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69868
[15:39:37] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69868
[15:42:34] New patchset: coren; "Tool Labs: Package is actually called qt4-qmake" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69870
[15:43:16] New review: coren; "Typo fix" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69870
[15:43:17] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69870
[15:53:15] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:15] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:15] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:15] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:15] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:16] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:16] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:17] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:17] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[15:53:18] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[16:00:28] could anybody with the necessary right purge the Parsoid Varnish caches on cerium and titanium?
[16:02:32] ok
[16:02:49] everything?
[16:03:55] done
[16:03:59] has ext/mysqlnd (which still includes ext/mysql compatability) ever been considered/tested for use? As we consider yet another attempt at horizontal sharding in Flow i have been thinking about how to handle the fall-back fan-out queries, mysqlnd includes async query support which could be used in special cases, but i'm imagining that would involve a ton of testing to even consider?
[16:05:10] ext/mysql and ext/mysqlnd are the php client extensions, btw. mysqlnd is included in the newest php releases by default, but not in 5.3
[16:05:20] it is available fo 5.3 though
[16:06:38] are you sure?
[16:06:54] I would have said that mysqlnd has been included for years
[16:07:03] unless you're talking about a different mysqlnd
[16:07:48] i'm fairly certain. in the mediawiki vagrant install there are two seperate packages, php5-mysql and php5-mysqlnd. The php5-mysqlnd includes MYSQLI_ASYNC constant (and i've lightly tested async queries)
[16:08:24] http://dev.mysql.com/downloads/connector/php-mysqlnd/ says it was integrated to main php releases as of 5.4
[16:08:28] which to be fair, is a year ago :)
[16:08:56] mark: thanks!
[16:09:17] mark: could you set up some groups so that I can do the purging myself?
[16:11:09] it is possible though that prod is already using mysqlnd, i wouldn't know how to check
[16:11:12] the varnishadm route might also work with Varnish config changes only
[16:15:49] New review: Se4598; "Is this change still pursued?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644
[16:21:17] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours
[16:22:07] PROBLEM - RAID on ms-be10 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:25:18] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours
[16:30:30] New patchset: QChris; "Fix typo in gerrit's apache config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69877
[16:32:16] Change abandoned: Demon; "Duplicate of Ie10bfb5e" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69877
[16:38:43] ^demon: Yay for descriptive commit messages :-) No wonder I could not find that change :D
[16:38:55] <^demon> :)
[16:39:09] <^demon> I was going to ping Ryan about those changes, but it was his bday so he wasn't around.
[16:39:33] Happy birthday Ryan!
[16:40:08] I put it in locally, and started apache before puppet kicked in, so gerrit-dev is usable again.
[16:40:44] <^demon> Yeah, that was the only thing breaking gerrit-dev afaik.
[16:41:12] Btw. Is there a reason we stepped back from the 2.8 branch to the 2.7-rc1?
[16:41:21] wmf branch was after that IIRC.
[16:42:40] greg-g: Whoops - https://bugzilla.wikimedia.org/show_bug.cgi?id=49967 - looks like we switched off ClickTracking but AFT relies on it, so causing fatals.
[16:43:08] greg-g: Bug report of actual issue at https://bugzilla.wikimedia.org/show_bug.cgi?id=49966 (bug to have the dependency removed).
[16:45:09] ekk
[16:45:24] Yeah.
[16:46:41] on it, thanks
[16:53:20] <^demon> qchris: It still named based on 2.7-rc1 since that's the latest git describe.
[16:53:42] <^demon> It's on b1ca2b0.
[16:53:50] Oh. I thought you meant we're going to deploy the version at tag v2.7-rc1
[16:54:17] <^demon> Nooo, sorry about that :)
[16:54:21] I misread your message in #dev
[17:01:47] James_F, duuuude - it's an awesome reason to disable AFT!:P
[17:11:17] +1
[17:18:36] ebernhardson: prod isn't using php5-mysqlnd, i just checked
[17:19:30] i don't know the first thing about mysql drivers, tho. you might want to e-mail the engineering or ops lists about it.
[17:40:05] ori-l: hi, ok i'll put together a mail for the list, any preference which one? I would think its more ops side as its a backwards compatible driver change, just has to be tested that it actually works for us and doesnt introduce bugs
[17:40:29] ori-l: also i dont think i'm on the ops mailing list :)
[17:51:02] PROBLEM - Host searchidx1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:52:23] RECOVERY - Host searchidx1001 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[17:53:54] <^demon> manybubbles: I'm trying to see if there's any low-hanging fruit in the reindexer to improve on. This is painfully slow.
[17:55:48] ^demon: ok - how slow is it for you now? I haven't been loading a bunch of data recently.
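(Editor's note: the win ebernhardson is after — issuing the per-shard fallback queries concurrently instead of serially, which is what MYSQLI_ASYNC would allow inside one PHP process — can be sketched shell-style. `query_shard` below is a hypothetical stub; real code would query a MySQL shard.)

```shell
# Hypothetical stand-in for one shard's fallback query; a real
# implementation would hit a MySQL shard instead of sleeping.
query_shard() {
    sleep 0.2                 # pretend this is per-query latency
    echo "shard $1: 1 row"
}

tmpdir=$(mktemp -d)

# Fan-out phase: all four shard queries run concurrently, so total
# latency is roughly one query's latency rather than four in sequence.
for shard in 0 1 2 3; do
    query_shard "$shard" > "$tmpdir/shard.$shard" &
done
wait                          # gather phase: block until every shard answered

cat "$tmpdir"/shard.*         # merged result set, one line per shard
rm -r "$tmpdir"
```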
[17:56:31] <^demon> http://p.defau.lt/?kj9Y4S1oaJx3Yb4WqrQJww
[17:56:42] <^demon> (Granted some of the slowness is from profiling, but it's still *way too slow*)
[18:01:22] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:22] RECOVERY - Lucene on search1003 is OK: TCP OK - 0.001 second response time on port 8123
[18:01:22] RECOVERY - Lucene on search1005 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:34] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:34] RECOVERY - Lucene on search1012 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:42] RECOVERY - Lucene on search1013 is OK: TCP OK - 0.001 second response time on port 8123
[18:01:45] RECOVERY - Lucene on search1007 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:45] RECOVERY - Lucene on search1014 is OK: TCP OK - 0.002 second response time on port 8123
[18:01:45] RECOVERY - Lucene on search1009 is OK: TCP OK - 0.001 second response time on port 8123
[18:01:45] RECOVERY - Lucene on search1010 is OK: TCP OK - 0.004 second response time on port 8123
[18:01:45] RECOVERY - Lucene on search1008 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:53] RECOVERY - Lucene on search1020 is OK: TCP OK - 0.000 second response time on port 8123
[18:01:53] RECOVERY - Lucene on search1004 is OK: TCP OK - 0.001 second response time on port 8123
[18:02:01] <^demon> ohi lucene
[18:02:11] sbernardin: are you around? did you update the drac license info on ms-be2?
[18:02:12] RECOVERY - Lucene on search1006 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:12] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:12] RECOVERY - Lucene on search1019 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:12] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.001 second response time on port 8123
[18:02:12] RECOVERY - Lucene on search1023 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:22] RECOVERY - Lucene on search1011 is OK: TCP OK - 0.001 second response time on port 8123
[18:02:23] RECOVERY - Lucene on search1018 is OK: TCP OK - 0.001 second response time on port 8123
[18:02:23] RECOVERY - Lucene on search1017 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:34] RECOVERY - Lucene on search1024 is OK: TCP OK - 0.000 second response time on port 8123
[18:02:43] RECOVERY - Lucene on search1021 is OK: TCP OK - 0.002 second response time on port 8123
[18:02:44] RECOVERY - Lucene on search1022 is OK: TCP OK - 0.001 second response time on port 8123
[18:02:45] !log updated Parsoid to 37cd852f
[18:02:52] Logged the message, Master
[18:03:47] cmjohnson1: where do I get the file from?
[18:03:54] ^demon: so it used to be that findUpdates accounted for about 80% of the time. now it is 24%. It looks like the big nasty is parsing the pages.
[18:03:57] robh: should have emailed it to you
[18:04:13] cmjohnson1: when?
[18:04:15] as long as you put in the rt ticket
[18:04:27] yikes! 74%! The actual time on solr is ~2% now.
[18:04:28] after you replaced the c2100 with the 720
[18:04:53] <^demon> Time on solr, but we're still doing a ton of queries.
[18:05:13] <^demon> If I removed the preprocess() call, we'd probably see very different results.
[18:05:43] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69629
[18:06:12] sbernardin: okay 2 things...you will need to reboot ms-be2 (ok cuz it's down anyway) and boot into the temporary drac license cd you have and install the temp license
[18:06:13] it only looks like that accounts for a bit of time.
[18:06:42] then you need to submit a rt ticket to rob requesting the idrac license.
[18:06:44] it looks like a lot of time is spent doing things with strings
[18:06:52] <^demon> Doing it again without the parsing, see what we get.
[18:07:19] PROBLEM - NTP on searchidx1001 is CRITICAL: NTP CRITICAL: Offset unknown
[18:08:21] !log installing package upgrades on kaulen (bugzilla)
[18:08:21] <^demon> Without the preprocess() call: http://p.defau.lt/?0SBfX7qdBGj9ta2437qmxA
[18:08:29] Logged the message, Master
[18:08:45] <^demon> Still pretty dang slow, and we're already parsing in buildDocumentforRevision()...wonder if we can re-use that parseroutput somehow.
[18:08:50] <^demon> Rather than doing it 2x.
[18:11:19] RECOVERY - NTP on searchidx1001 is OK: NTP OK: Offset 0.00308406353 secs
[18:13:25] New patchset: Ottomata; "Initial commit of Kafka Puppet module for Apache Kafka 0.8" [operations/puppet/kafka] (master) - https://gerrit.wikimedia.org/r/50385
[18:14:54] New review: Dzahn; "done. test mail has been sent to you." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69629
[18:15:12] New review: Ottomata; "Fixed test issue!" [operations/puppet/kafka] (master) - https://gerrit.wikimedia.org/r/50385
[18:15:59] New review: Ottomata; "Thanks for the reviews, fellas!" [operations/puppet/kafka] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50385
[18:16:00] Change merged: Ottomata; [operations/puppet/kafka] (master) - https://gerrit.wikimedia.org/r/50385
[18:16:28] <^demon> manybubbles: So, I think we're at a point with Cirrus that we can stop iterating in the dark. Let's start pushing for review.
[18:16:38] <^demon> We can still be liberal about merging and so-forth, but it's a good habit to get in now.
[18:17:11] yeah - we've cranked out most of the features and now we're going to be doing more refining any way
[18:17:15] which will take my eyes
[18:17:17] mark, paravoid, I'd appreciate you chiming in on the roles/modules email thread before you punch out for the weekend (if it's not too late)
[18:17:24] I will
[18:18:04] ^demon: what must be done for me to have +2 on CirrusSearch? I figure we're really the most appropriate people that have it.
[18:18:20] <^demon> Oh, I need to grant you a ton of permissions, duh.
[18:18:22] <^demon> Lemme do that now
[18:19:01] <^demon> You should have a ton of more review permissions now.
[18:19:03] thanks
[18:19:58] New review: Krinkle; "This repository has both a working jenkins pipeline (tests are passing and jenkins-bot votes V+2) an..." [operations/puppet/kafka] (master) - https://gerrit.wikimedia.org/r/50385
[18:23:20] New patchset: Krinkle; "Restrict Verified/Submit to JenkinsBot" [operations/puppet/kafka] (refs/meta/config) - https://gerrit.wikimedia.org/r/69890
[18:35:29] RECOVERY - search indices - check lucene status page on search1004 is OK: HTTP OK: HTTP/1.1 200 OK - 163 bytes in 0.005 second response time
[18:35:46] RECOVERY - search indices - check lucene status page on search1005 is OK: HTTP OK: HTTP/1.1 200 OK - 163 bytes in 0.004 second response time
[18:40:40] RobH: https://gerrit.wikimedia.org/r/#/c/69421/ if you have a sec. it's a stupidly trivial patch; i hate having those in my queue. should take a second.
[18:49:46] RECOVERY - search indices - check lucene status page on search1002 is OK: HTTP OK: HTTP/1.1 200 OK - 157 bytes in 0.002 second response time
[18:52:24] !log updated Parsoid to 21e1e7d
[18:52:32] Logged the message, Master
[18:57:50] coren, iirc you were one of the advocates for dividing up roles amongst different modules. If that's right, please speak up…
[18:58:17] * Coren speaks up.
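Editor's note: changes like the "Restrict Verified/Submit to JenkinsBot" patchset above are pushed to a project's hidden `refs/meta/config` ref, where Gerrit keeps `project.config`. A sketch of that workflow, using a local bare repository to stand in for the Gerrit server (the `[access]` stanza syntax and the JenkinsBot group are illustrative, from memory of Gerrit's access-control format):

```shell
# Demo setup: a local bare repo plays the role of the Gerrit remote.
demo=$(mktemp -d)
git init -q --bare "$demo/gerrit.git"
git init -q "$demo/work"
cd "$demo/work"
git config user.email you@example.org
git config user.name you
git remote add origin "$demo/gerrit.git"

# project.config holds per-project access rules; a Verified-label
# restriction would live in an [access "refs/heads/*"] section.
cat > project.config <<'EOF'
[access "refs/heads/*"]
    label-Verified = -1..+2 group JenkinsBot
EOF
git add project.config
git commit -qm "Restrict Verified to JenkinsBot"

# The config lives on the hidden refs/meta/config ref, not a branch;
# on a real server one would first `git fetch origin refs/meta/config`
# and amend the existing config before pushing it back.
git push -q origin HEAD:refs/meta/config
```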
[18:58:22] Where?
[18:58:27] ops list
[18:58:36] * Coren goes to do that now.
[19:02:16] RECOVERY - search indices - check lucene status page on search1024 is OK: HTTP OK: HTTP/1.1 200 OK - 747 bytes in 0.004 second response time
[19:05:27] RECOVERY - search indices - check lucene status page on search1014 is OK: HTTP OK: HTTP/1.1 200 OK - 747 bytes in 0.003 second response time
[19:10:37] RECOVERY - search indices - check lucene status page on search1009 is OK: HTTP OK: HTTP/1.1 200 OK - 369 bytes in 0.006 second response time
[19:10:37] RECOVERY - search indices - check lucene status page on search1010 is OK: HTTP OK: HTTP/1.1 200 OK - 369 bytes in 0.003 second response time
[19:12:37] RECOVERY - search indices - check lucene status page on search1001 is OK: HTTP OK: HTTP/1.1 200 OK - 213 bytes in 0.006 second response time
[19:12:37] RECOVERY - search indices - check lucene status page on search1023 is OK: HTTP OK: HTTP/1.1 200 OK - 747 bytes in 0.006 second response time
[19:12:47] RECOVERY - search indices - check lucene status page on search1013 is OK: HTTP OK: HTTP/1.1 200 OK - 747 bytes in 0.003 second response time
[19:19:47] RECOVERY - search indices - check lucene status page on search1008 is OK: HTTP OK: HTTP/1.1 200 OK - 351 bytes in 0.010 second response time
[19:19:51] * mark spoke up too
[19:22:44] note that if puppet actually supported hierarchical roles, then i'd be in favor of splitting some things (e.g. toollabs) off into separate submodules, under the main role module
[19:22:54] but given that that's not possible, I think it should all be in the role module
[19:23:00] and that's really not a big deal either
[19:24:48] * mark goes back to celebrating weekend ;)
[19:25:18] RECOVERY - search indices - check lucene status page on search1011 is OK: HTTP OK: HTTP/1.1 200 OK - 504 bytes in 0.005 second response time
[19:28:17] RECOVERY - search indices - check lucene status page on search1012 is OK: HTTP OK: HTTP/1.1 200 OK - 504 bytes in 0.005 second response time
[19:29:27] RECOVERY - search indices - check lucene status page on search1007 is OK: HTTP OK: HTTP/1.1 200 OK - 351 bytes in 0.004 second response time
[19:29:47] !log olivneh synchronized php-1.22wmf7/extensions/ArticleFeedback/modules/jquery.articleFeedback/jquery.articleFeedback.js 'Gerrit change I3db3cf47b / bug 49967'
[19:29:56] Logged the message, Master
[19:30:07] !log olivneh synchronized php-1.22wmf8/extensions/ArticleFeedback/modules/jquery.articleFeedback/jquery.articleFeedback.js 'Gerrit change I3db3cf47b / bug 49967'
[19:30:16] Logged the message, Master
[19:37:30] * ori-l washes his hands
[19:56:29] PROBLEM - Host ms-be2 is DOWN: PING CRITICAL - Packet loss = 100%
[20:12:26] RECOVERY - search indices - check lucene status page on search1006 is OK: HTTP OK: HTTP/1.1 200 OK - 207 bytes in 0.010 second response time
[20:15:06] RECOVERY - search indices - check lucene status page on search1003 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.002 second response time
[20:22:42] New patchset: Dzahn; "Provide a sensible resource name for Daniel Kinzler's SSH key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69421
[20:23:42] New review: Dzahn; "changing the comment, cleaning up on fenari, letting puppet recreate this" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69421
[20:23:43] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69421
[20:25:59] !log cleaning up /home/daniel/ (the other Daniel) ssh keys on fenari and letting puppet recreate them
[20:26:07] Logged the message, Master
[20:30:25] !log installing upgrades on bast1001
[20:30:33] Logged the message, Master
[20:38:10] New patchset: RobH; "RT 5017 maryana access to analytics cluster + tab spacing cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69946
[20:39:34] New review: RobH; "skynet says this patchset is great" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69946
[20:39:35] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69946
[20:40:39] RECOVERY - swift-container-updater on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[20:40:40] RECOVERY - Disk space on ms-be2 is OK: DISK OK
[20:40:40] RECOVERY - swift-object-server on ms-be2 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[20:40:49] RECOVERY - swift-object-auditor on ms-be2 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[20:40:50] RECOVERY - Host ms-be2 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms
[20:40:59] RECOVERY - swift-account-reaper on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[20:40:59] RECOVERY - swift-account-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[20:40:59] RECOVERY - RAID on ms-be2 is OK: OK: State is Optimal, checked 1 logical device(s)
[20:41:09] RECOVERY - swift-account-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[20:41:09] RECOVERY - swift-object-updater on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[20:41:19] RECOVERY - swift-account-server on ms-be2 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[20:41:19] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:41:19] RECOVERY - SSH on ms-be2 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[20:41:29] RECOVERY - swift-container-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[20:41:29] RECOVERY - swift-object-replicator on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[20:41:29] RECOVERY - swift-container-server on ms-be2 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[20:41:29] RECOVERY - DPKG on ms-be2 is OK: All packages OK
[20:45:21] New patchset: RobH; "RT 5233 aaron halfak access to analytics cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69947
[20:53:43] New patchset: RobH; "RT 5233 aaron halfak access to analytics cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69947
[20:56:11] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69947
[21:00:00] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[21:01:00] New review: Dzahn; "true, and adminteam@ is an alias for techsupport so that would create tickets. officeit@ just for th.." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68127
[21:01:01] New patchset: RobH; "RT 5273 adam baso access to analytics" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69949
[21:02:24] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69949
[21:22:04] New patchset: Dzahn; "remove Daniel Kinzler's account, he said he doesn't need it and it had the wrong keys, already checked key was nowhere besides fenari, deleting home dirs via salt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69951
[21:23:01] New review: Dzahn; "ori-l: fyi" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/69951
[21:23:02] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69951
[21:23:36] !log deleting /home/daniel on fenari (confirmed not needed)
[21:23:45] Logged the message, Master
[21:27:06] PROBLEM - SSH on mc15 is CRITICAL: Connection timed out
[21:27:56] RECOVERY - SSH on mc15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[21:45:03] New patchset: Dzahn; "Revert "remove Daniel Kinzler's account, he said he doesn't need it and it had the wrong keys, already checked key was nowhere besides fenari, deleting home dirs via salt"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69954
[21:48:48] New patchset: Dzahn; "Revert "remove Daniel Kinzler's account, he said he doesn't need it and it had the wrong keys, already checked key was nowhere besides fenari, deleting home dirs via salt"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69954
[21:51:24] New patchset: Dzahn; "Revert "remove Daniel Kinzler's account, he said he doesn't need it and it had the wrong keys, already checked key was nowhere besides fenari, deleting home dirs via salt"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69954
[21:54:37] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69954
[22:09:20] New patchset: Dzahn; "remove deactivated account daniel from admins::restricted" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69957
[22:12:30] New patchset: Dzahn; "revoke comment for deactivated account daniel in admins::restricted" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69957
[22:14:05] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69957
[22:17:36] New patchset: Dzahn; "add a maintenance cronjob to mail the mchenry alias file to OIT (RT #5278)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68127
[22:19:14] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours
[22:20:54] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:25:32] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68127
[22:33:14] PROBLEM - Puppet freshness on ms-be2 is CRITICAL: No successful Puppet run in the last 10 hours
[22:46:20] !log created a wb_entity_per_page view in labsdb wikidatawiki_p
[22:46:28] Logged the message, Master
[22:55:47] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[23:01:36] thanks for that view binasher :)
[23:15:34] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69010
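Editor's note: the roles/modules layout mark argues for above (everything in one role module, since Puppet modules cannot nest) can be sketched as follows. The class names and file paths here are illustrative only, not the actual repository layout:

```puppet
# modules/role/manifests/labs/toollabs.pp
#
# One shared "role" module holds every role class; grouping happens
# through the class namespace (role::labs::toollabs) rather than via
# submodules, because Puppet has no hierarchical/nested modules.
class role::labs::toollabs {
  # A role stays thin: it only composes implementation classes
  # from ordinary modules (names below are placeholders).
  include ::toollabs::exec_environ
  include ::toollabs::infrastructure
}
```

The trade-off discussed in the thread is exactly this: without nested modules, a per-service "roles" split would scatter roles across many top-level modules, so keeping them under a single `role::` namespace is the pragmatic compromise.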