[00:01:58] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.138 [00:02:30] uh oh [00:04:31] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.005 second response time [00:09:47] is anyone looking at tungsten? [00:13:49] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.007 second response time [00:15:06] (03CR) 10Dzahn: "superseded by Ie790fd2e3b607e92 , already does the same thing but in phab module instead of here" [puppet] - 10https://gerrit.wikimedia.org/r/169303 (owner: 10Dzahn) [00:15:13] (03Abandoned) 10Dzahn: add a phabricator check to LVS monitoring [puppet] - 10https://gerrit.wikimedia.org/r/169303 (owner: 10Dzahn) [00:17:48] (03PS1) 10Ori.livneh: hhvm: make HHVM's working directory be /var/tmp/hhvm [puppet] - 10https://gerrit.wikimedia.org/r/169630 [00:18:37] (03Abandoned) 10Ori.livneh: hhvm: make HHVM's working directory be /var/tmp/hhvm [puppet] - 10https://gerrit.wikimedia.org/r/169627 (owner: 10Ori.livneh) [00:18:59] (03CR) 10Ori.livneh: [C: 032] hhvm: make HHVM's working directory be /var/tmp/hhvm [puppet] - 10https://gerrit.wikimedia.org/r/169630 (owner: 10Ori.livneh) [00:32:28] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [00:38:18] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: puppet fail [00:43:18] PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: puppet fail [00:50:03] puppet failure on hhvms is me, fixing [00:51:28] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:51:59] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 351 seconds [00:52:08] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 356 seconds [00:52:18] (03PS1) 10Ori.livneh: Fix-up for I777ae49ca [puppet] - 10https://gerrit.wikimedia.org/r/169635 [00:52:49] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:53:00] (03PS2) 10Ori.livneh: Fix-up for I777ae49ca [puppet] - 10https://gerrit.wikimedia.org/r/169635 [00:53:01] dpkg-dev? [00:53:02] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds [00:53:02] for hhvm? [00:53:06] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for I777ae49ca [puppet] - 10https://gerrit.wikimedia.org/r/169635 (owner: 10Ori.livneh) [00:53:24] also, there's a ori dotfile change there [00:53:41] paravoid: the latter explained in the commit message [00:53:56] no please don't do that [00:54:04] bundle completely unrelated changes like that [00:54:08] the former: to unpack the source package. i'd prefer it if there was an hhvm-src package instead (i requested it in the RT ticket) but haven't seen action on that [00:54:22] nah, -src packages are a bad idea [00:54:24] uncommon too [00:54:31] because they are a bad idea :) [00:54:50] why are they a bad idea? because there's already such a thing as a source package? 
[00:54:58] that for starters [00:55:03] they're hard to build as well [00:55:20] .deb is not the right format for hhvm's source either [00:55:48] linux-source-* for example does exist [00:56:01] for people that want to build their own kernel [00:56:07] but ships a .tar.gz [00:56:40] i want the source so i can see context in gdb [00:56:50] I know [00:57:18] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [00:58:33] apt-get source is fine I think [00:58:43] (with dpkg-dev, granted) [00:59:30] the only downside to that is that we ensure => present on the HHVM package, so we can manage when package upgrades happen [00:59:37] which means that you can't make the exec subscribe to the package [00:59:47] why do you need to puppetize it? [01:00:06] puppetize what? the source package? [01:00:18] yes [01:00:28] also -dbg packages, but let's say I can see that [01:00:30] why not? it should be available on each server [01:00:35] the new hhvm-dbg is 300M btw [01:00:40] why? [01:01:12] (03PS1) 10Dzahn: phab - remove duplicate check command [puppet] - 10https://gerrit.wikimedia.org/r/169637 [01:01:24] because we're not always immediately able to perceive a problem generically, and reproduce it on separate environments. initial investigation is often tied to the specific servers on which the problem initially manifests. the current leak is a good example. [01:01:55] for most issues you can just take a corefile [01:01:58] (03CR) 10Dzahn: [C: 032] phab - remove duplicate check command [puppet] - 10https://gerrit.wikimedia.org/r/169637 (owner: 10Dzahn) [01:02:13] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [01:02:17] or isolate it to one server you're debugging in [01:02:33] but why start with a default attitude of not having debugging tools handy? [01:02:55] well I didn't say to debugging tools [01:03:01] no* [01:03:28] but 300M hhvm-dbg, hhvm source extracted..., we're kinda overdoing it [01:03:58] what is your criterion for overdoing, here? [01:04:36] Filesystem Size Used Avail Use% Mounted on [01:04:36] /dev/sda1 211G 26G 175G 13% / [01:04:40] I don't think we have /any/ other package in production weighting more than a few dozen MB; maybe hadoop [01:04:51] so? [01:05:00] similarly, no source for any other software of ours, not even PHP [01:05:09] that's unfortunate, in the case of php [01:05:10] and possibly not even mediawiki at some point in the future, aiui :) [01:05:57] tim equipped the app servers with some helpful tools for debugging PHP [01:06:17] I'm aware [01:06:20] and we lost them mostly because the people migrating the setup weren't aware of them or weren't sure how to port them [01:07:05] how do you envision updating HHVM to new versions? [01:07:31] major or minor releases? [01:07:40] minor sounds fine [01:08:06] the workflow giuseppe and i have adopted is basically this: [01:08:30] first, the package is just scp'd to osmium for some initial testing / sanity checks [01:08:41] if it looks good, giuseppe uploads it to apt, and we upgrade labs [01:08:51] (03PS3) 10Dzahn: RT - puppetize /etc/aliases for phab redirects [puppet] - 10https://gerrit.wikimedia.org/r/168733 [01:09:05] we watch it for a while, depending on how big the delta is [01:09:26] then apply it to prod gradually with salt [01:09:37] ok [01:10:05] are you worried that having all servers retrieve a 300mb package all at once would overwhelm the apt server? 
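The "apt-get source" route that ori and paravoid converge on above would look roughly like the sketch below on an app server. It is only an illustration: the package version directory, the presence of a deb-src entry for the repository carrying the hhvm package, and the attach-to-running-process step are assumptions, not taken from the actual hosts.

    # assumes a deb-src line for the repo that ships the hhvm package
    sudo apt-get install dpkg-dev             # provides dpkg-source, used to unpack
    apt-get source hhvm                       # fetch and unpack the source package
    # point gdb at the unpacked tree so backtraces show surrounding source lines
    gdb --directory="$PWD/hhvm-3.3.0" -p "$(pidof -s hhvm)"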
[01:10:10] we wouldn't be doing it all at once, anyway [01:11:10] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 349 MB (3% inode=73%): [01:11:12] among other things [01:11:18] dunno, it's not exactly great [01:11:24] I don't mind it all that much [01:11:56] feel like glancing at my latest change up there? [01:12:03] * ori looks [01:12:10] but I also don't feel very motivated to jump through extra hoops (such as keeping the source installed and in sync with binaries) [01:12:33] btw, debugging RelWithDeb was broken upstream before my patch [01:12:37] I wonder how facebook deals with this [01:13:53] (03CR) 10Ori.livneh: "This file would be extremely easy to template, allowing the 'includer' of the class to pass in aliases as a hash parameter." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/168733 (owner: 10Dzahn) [01:14:53] * ori doesn't know [01:15:05] hopes that the RT->phab migration is a singular event though [01:15:28] mutante: oh, is the RT module going to be nuked afterwards? [01:15:48] in that case, I think it's OK not to bother templating it, but you should still specify u/g/o for the file resource [01:15:52] i'm not sure when, first i will become read-only [01:16:05] yes, ok [01:17:43] (03PS4) 10Dzahn: RT - puppetize /etc/aliases for phab redirects [puppet] - 10https://gerrit.wikimedia.org/r/168733 [01:18:38] mutante: 0444, by convention :P [01:19:45] well, i'm doing what i see on the server, in order to not introduce any change :p [01:20:20] (03PS5) 10Dzahn: RT - puppetize /etc/aliases for phab redirects [puppet] - 10https://gerrit.wikimedia.org/r/168733 [01:20:38] arr, say it.. it's missing a "this file managed by puppet" line :) [01:21:21] (03PS6) 10Dzahn: RT - puppetize /etc/aliases for phab redirects [puppet] - 10https://gerrit.wikimedia.org/r/168733 [01:24:56] (03CR) 10Ori.livneh: [C: 031] RT - puppetize /etc/aliases for phab redirects [puppet] - 10https://gerrit.wikimedia.org/r/168733 (owner: 10Dzahn) [01:26:06] mutante: actually, paravoid weaned me off of those ('managed by puppet') [01:26:11] because it can actually become untrue [01:26:16] in which case it is actively misleading [01:26:39] i think you actually debugged one such case a while ago, an apache ports.conf that said it was managed by puppet but wasn't [01:26:59] i think "This file was provisioned by Puppet" may be a good compromise [01:27:16] there's also "mailalias" btw [01:28:34] mailalias { 'ops-request': ensure => present, recipient => 'ops-requests', } [01:28:41] oh! [01:28:50] :) [01:29:02] it won't automatically purge unmanaged aliases [01:29:05] well then .. thank you :) i'll check it out [01:29:08] but it might be suitable for your use [01:29:36] paravoid: were you still undecided about https://gerrit.wikimedia.org/r/#/c/167020/ , btw? [01:30:02] Hi... Is there some way for one wiki to talk to another server-side? I want to write PHP code that talks to Meta 8p [01:30:06] actually i think that is exactly what i want it to do .. NOT touch unmanaged things, but append a few extra ones to the file [01:30:20] * AndyRussG rolls eyes and looks innocent [01:30:28] ori: I thought we agreed to do it in a stage? [01:30:46] or am I confusing it with something else, I don't remember... 
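Back on the HHVM upgrade workflow ori described above ("apply it to prod gradually with salt"), the gradual rollout might look something like this; the target patterns, batch size and hostnames are purely illustrative and are not the commands actually used.

    # canary hosts first (mw1114/mw1189 are named elsewhere in this log and are
    # used here only as examples)
    salt -L 'mw1114.eqiad.wmnet,mw1189.eqiad.wmnet' cmd.run 'apt-get -y install hhvm'
    # then the rest of the fleet in small batches instead of all at once
    salt --batch-size 10% 'mw1*' pkg.install hhvm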
[01:30:52] what the hell, i was sure i did that [01:30:57] * ori is confused [01:31:04] https://gerrit.wikimedia.org/r/#/c/167835/ [01:31:05] heh you did [01:31:23] oh, i should just abandon the other one then [01:31:30] AndyRussG: yea, for example file_get_contents() [01:31:43] and I should review this one :) [01:31:57] (03Abandoned) 10Ori.livneh: Apt::Conf['no-recommends'] -> Package <| provider == 'apt' |> [puppet] - 10https://gerrit.wikimedia.org/r/167020 (owner: 10Ori.livneh) [01:32:49] right, I started reviewing it [01:32:55] and then realized it's a bit more complex [01:33:46] mutante: ahh thanks, fantastic, where does that live? (grepping just came up empty-handed..) [01:34:04] and that I'd need to babysit it [01:34:08] when deploying it :) [01:34:28] oic it's a php thing... [01:34:36] mailalias is even a native puppet type.. duh :) thx paravoid [01:35:04] paravoid: there's always the keyholder patch :P [01:35:09] :P [01:35:15] it's not actually time-sensitive [01:35:19] i'm just impatient [01:35:24] don't worry about it [01:35:25] AndyRussG: yea, all it was is saying "you can fetch remote files in PHP" [01:35:33] (03PS2) 10Faidon Liambotis: Add ::apt to stage => first [puppet] - 10https://gerrit.wikimedia.org/r/167835 (owner: 10Ori.livneh) [01:35:39] (03CR) 10Faidon Liambotis: [C: 032] Add ::apt to stage => first [puppet] - 10https://gerrit.wikimedia.org/r/167835 (owner: 10Ori.livneh) [01:35:43] let's see [01:35:48] uhoh [01:35:49] (famous last words) [01:35:57] * ori fastens seat-belts [01:36:02] mutante: ah hmm thanks, what about calling another wiki's actual code, kinda like an API request? [01:36:36] AndyRussG: well, mediawiki does have an API [01:36:58] AndyRussG: http://www.mediawiki.org/wiki/API:Main_page#A_simple_example [01:38:05] Yes I know... But can I make PHP of one WMF wiki call the API of another WMF wiki server-side? [01:38:38] yes [01:38:41] there's a class for that [01:38:47] \o/ woo :) [01:38:58] * ori digs it up [01:39:03] AndyRussG: just make an API request using MWHttpRequest... [01:39:33] (Exec[/usr/bin/apt-get update] => Class[Apt::Update] => Stage[first] => Stage[main] => Class[Passwords::Root] => User[root] => File[/usr/local/bin/apt2xml] => Class[Apt] => Stage[first]) [01:39:37] yeah [01:39:46] I suspected something like that [01:39:48] AndyRussG: https://www.mediawiki.org/wiki/API:Calling_internally [01:40:19] but that only works for the same wiki [01:40:26] (03PS1) 10Faidon Liambotis: Revert "Add ::apt to stage => first" [puppet] - 10https://gerrit.wikimedia.org/r/169643 [01:40:34] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Revert "Add ::apt to stage => first" [puppet] - 10https://gerrit.wikimedia.org/r/169643 (owner: 10Faidon Liambotis) [01:40:45] (before we get flooed from warnings) [01:41:07] AndyRussG: if you need to call another wiki, you can just make an http request. see for an example [01:42:10] weee thanks so much ori legoktm mutante :D [01:42:15] paravoid: what was it about Apt::Conf['no-recommends'] -> Package <| provider == 'apt' |> that rubbed you the wrong way again? 
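As for the API pointer AndyRussG was given above, the "simple example" style of request is an ordinary HTTP call, which the server-side options mentioned (file_get_contents or MWHttpRequest) issue in the same way; the specific queries below are illustrations only.

    # ask another wiki's action API for general site information, as JSON
    curl -s 'https://meta.wikimedia.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=json'
    # the same idea fetches content, e.g. the current wikitext of a page
    curl -s 'https://meta.wikimedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Main_Page&format=json'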
[01:42:17] * AndyRussG hugs everyone [01:42:35] it adds a depedency on every single package basically [01:42:44] enlarges the dependency tree and the catalog [01:42:56] it's basically exactly why stages exist :) [01:43:30] oh, right [01:44:21] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail [01:51:44] (03PS7) 10Dzahn: RT - add mail aliases [puppet] - 10https://gerrit.wikimedia.org/r/168733 [01:53:23] (03CR) 10Dzahn: [C: 032] RT - add mail aliases [puppet] - 10https://gerrit.wikimedia.org/r/168733 (owner: 10Dzahn) [01:55:11] (03CR) 10Dzahn: "yep, it added them just fine and didn't overwrite the entire file" [puppet] - 10https://gerrit.wikimedia.org/r/168733 (owner: 10Dzahn) [02:02:31] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [03:25:00] (03PS1) 10Tim Starling: Increase HHVM server thread count [puppet] - 10https://gerrit.wikimedia.org/r/169649 [03:26:43] (03CR) 10Tim Starling: [C: 032] Increase HHVM server thread count [puppet] - 10https://gerrit.wikimedia.org/r/169649 (owner: 10Tim Starling) [03:36:32] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [03:38:59] !log upgraded mw1114 to custom package with patch from https://phabricator.wikimedia.org/T820#16428 applied [03:39:12] Logged the message, Master [03:49:32] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [05:22:13] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]BR [05:28:32] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0 [05:58:43] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 1 failures [06:16:52] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:26:23] RECOVERY - Disk space on ocg1003 is OK: DISK OK [06:27:04] RECOVERY - Disk space on ocg1002 is OK: DISK OK [06:27:52] PROBLEM - puppetmaster https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:28:03] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:28:03] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:33] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: puppet fail [06:28:52] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.343 second response time [06:28:52] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: puppet fail [06:28:52] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [06:29:04] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: puppet fail [06:29:12] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.258 second response time [06:29:13] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: puppet fail [06:29:22] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: puppet fail [06:29:22] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [06:29:22] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: puppet fail [06:29:53] PROBLEM - puppet last run on amssq35 is 
CRITICAL: CRITICAL: Puppet has 1 failures [06:30:53] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:12] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:13] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:22] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:26] it's like the changing of the guards in front of buckingham palace [06:31:33] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:34] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:43] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:53] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:02] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:02] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:02] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:03] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:09] at this point, if you guys fix this, i think i might miss it [06:32:12] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 3 failures [06:32:12] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:12] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:12] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 3 failures [06:32:13] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:22] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:22] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:34] <_joe_> hi, I'm mod passenger [06:32:53] <_joe_> I am a clueless piece of ruby glue you run your infrastructure on [06:33:18] welcome [06:33:26] <_joe_> ciao Nemo_bis [06:33:31] 'ao [06:42:37] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures [06:42:51] there's always a straggler in ever litter [06:42:55] *every [06:45:07] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:45:07] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:45:16] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:45:26] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:45:27] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:45:46] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [06:45:56] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently 
enabled, last run 10 seconds ago with 0 failures [06:46:06] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:46:16] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:46:17] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:46:26] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:46:29] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:46:29] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:46:36] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:46:56] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:47:07] (03PS1) 10Gage: logstash: hadoop: disable output temporarily [puppet] - 10https://gerrit.wikimedia.org/r/169655 [06:47:17] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [06:47:27] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:47:38] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:47:46] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:47:57] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:48:18] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [06:49:57] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:52:02] (03CR) 10Gage: [C: 032] "Current output is too high for existing storage. I will minimize & reenable; we will scale logstash service." [puppet] - 10https://gerrit.wikimedia.org/r/169655 (owner: 10Gage) [06:53:26] (03CR) 10Springle: "Adding mysql::password_file sounds logical. We could probably use it places that currently touch, or expect, /root/.my.cnf." [puppet] - 10https://gerrit.wikimedia.org/r/168993 (owner: 10Ottomata) [06:56:28] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [07:16:21] !log repooled mw1189 w/patched hhvm () [07:16:29] Logged the message, Master [07:18:18] (03CR) 10Giuseppe Lavagetto: "+1 to making this a general-purpose class... 
but then, what about a more general mysql::config::file module that works similarly to what t" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/168993 (owner: 10Ottomata) [07:27:51] (03CR) 10Springle: "If mysql::config::file means auto generating /etc/my.cnf, then I'll be complaining and stonewalling on production boxes :-) Plain text con" [puppet] - 10https://gerrit.wikimedia.org/r/168993 (owner: 10Ottomata) [07:34:19] (03CR) 10Giuseppe Lavagetto: "yep I wasn't so foolish as to suggest a full my.cnf for a prod server to be done this way :)" [puppet] - 10https://gerrit.wikimedia.org/r/168993 (owner: 10Ottomata) [07:34:33] <_joe_> springle: did you mistake me for a dev? :) [07:34:37] _joe_: hehe ;) just checking [07:37:16] _joe_: so bascially an INI file generator? [07:37:24] that doesn't even need to be mysql:: [07:37:35] <_joe_> springle: we already have _that_ [07:37:44] <_joe_> springle: lemme find that for you [07:37:50] so.. why do we need another? [07:37:53] * springle confused [07:38:12] <_joe_> not the generator, a define specialized for mysql files maybe [07:38:25] <_joe_> springle: I'll bake something up to show it [07:38:32] <_joe_> it's easier done than explained [07:38:58] :) [07:39:04] <_joe_> modules/wmflib/lib/puppet/parser/functions/php_ini.rb basically does what we need [07:39:06] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 72, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqiad:xe-4/2/1 (Giglinx/Zayo, ETYX/084858//ZYO) {#1062} [10Gbps MPLS]BR [07:39:11] <_joe_> modulo sections [07:39:18] <_joe_> which are not used in php [07:40:12] <_joe_> springle: our mysql module is puppetlabs one? [07:40:59] <_joe_> *the [07:41:04] was once, i think. doubt it's current. it isn't used for production, save for a hook or two in old coredb [07:41:14] <_joe_> oh ok [07:41:15] i really dont know who else uses it [07:41:24] <_joe_> so, where should I put such a resource? [07:41:36] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136 [07:42:47] <_joe_> springle: it's used all over analytics puppet resources [07:43:02] best to see what otto wants, then [07:43:21] <_joe_> manifests/misc/statistics.pp is responsible for most of the used-only-once code we have [07:43:36] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.136:9200/_cluster/health error while fetching: Request timed out. [07:43:56] <_joe_> jgage: is this you? 
^^ [07:44:26] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 74, down: 0, dormant: 0, excluded: 0, unused: 0 [07:44:26] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 31, utimed_out: False, uactive_primary_shards: 46, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 92, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:44:36] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 31, utimed_out: False, uactive_primary_shards: 46, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 92, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:46:38] not sure if i caused that alert but i am tryign to solve that problem [07:47:47] <_joe_> ok thanks [07:48:00] <_joe_> I'm going to have the power out for ~ 2 hours in a few [07:48:06] i ok [07:48:12] s/i // [07:48:46] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 9, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 112, initializing_shards: 2, number_of_data_nodes: 3 [07:48:53] phew [07:48:55] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 121, initializing_shards: 1, number_of_data_nodes: 3 [07:49:00] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 46, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 121, initializing_shards: 1, number_of_data_nodes: 3 [07:49:44] i did a delete by query in attempt to recover some disk space. doesn't seem to have worked though. [07:51:45] <_joe_> jgage: you probably need to redo/compact indices [07:53:15] hm ok. *looks that up* [08:23:06] (03PS2) 10KartikMistry: Update Debian package to upstream r57689 [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/168760 [09:05:41] Reedy: I'm assuming https://gerrit.wikimedia.org/r/#/c/169294/ is already deployed? 
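The "redo/compact indices" suggestion from _joe_ above maps, in Elasticsearch 1.x terms, to an _optimize call that merges out the documents removed by the delete-by-query; disk space only comes back once those deleted docs are expunged. The index name, host and port here are illustrative.

    curl -XPOST 'http://localhost:9200/logstash-2014.09.30/_optimize?only_expunge_deletes=true'
    # then confirm per-index disk usage actually dropped
    curl -s 'http://localhost:9200/_cat/indices?v'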
[09:12:43] (03CR) 10Filippo Giunchedi: [C: 031] "dependent change seems to have been deployed, will merge this later today" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/169295 (owner: 10Chad) [09:14:11] (03Abandoned) 10Filippo Giunchedi: eqiad-prod: reduce weight on ms-be1013/1014/1015 to help shed some load [software/swift-ring] - 10https://gerrit.wikimedia.org/r/166544 (owner: 10Filippo Giunchedi) [09:20:51] (03PS1) 10Filippo Giunchedi: new script: swift-add-machine [software/swift-ring] - 10https://gerrit.wikimedia.org/r/169662 [09:21:12] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] new script: swift-add-machine [software/swift-ring] - 10https://gerrit.wikimedia.org/r/169662 (owner: 10Filippo Giunchedi) [09:42:19] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Puppet last ran 14433 seconds ago, expected 14400 [09:52:32] (03CR) 10Alexandros Kosiaris: [C: 032] "On a side note, this class is only used in two places. dns::recursor::statistics (which is only used on nescio) and mailman (sodium). It a" [puppet] - 10https://gerrit.wikimedia.org/r/169561 (owner: 10Dzahn) [09:55:53] akosiaris: i was looking into putting nginx there myself with mutante last night [09:56:47] and now that this is merged, i'd love if you can please handle https://gerrit.wikimedia.org/r/#/c/169571/ as well. Thanks akosiaris ! [09:59:21] <_joe_> eh I thought about moving to nginx when I moved everything to a module [10:00:22] yeah, we should kill the 2-3 instances of lighttpd that we got with fire ... [10:00:25] <_joe_> but that seemed too many things at once [10:00:40] <_joe_> akosiaris: AFAIK we use lighty in labs as well [10:01:25] really? ... sigh [10:03:09] we do [10:03:26] there was a long mail thread on labs-l [10:06:48] akosiaris: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Web_services [10:19:20] akosiaris: yeah, lighty is kind of the backbone of toollabs stuff [10:20:44] ok so lighttpd is in heavy use in labs. That's fine... 
production however is a different story [10:20:59] anyway, everything on its own time [10:40:41] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 1 failures [10:58:40] RECOVERY - puppet last run on ssl1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:11:58] poor lighty [11:12:02] what do y'all have against it [11:48:13] (03PS1) 10Alexandros Kosiaris: Fix bacula rspec tests [puppet] - 10https://gerrit.wikimedia.org/r/169679 [11:48:15] (03PS1) 10Alexandros Kosiaris: Modularize backups.pp [puppet] - 10https://gerrit.wikimedia.org/r/169680 [12:01:36] !log xtrabackup clone db1007 to db2029 [12:01:37] !log elastic1001, elastic1008 and elastic1013 powering down to replace ssds RT7779 [12:01:43] Logged the message, Master [12:01:49] Logged the message, Master [12:05:51] PROBLEM - Host elastic1001 is DOWN: PING CRITICAL - Packet loss = 100% [12:06:11] PROBLEM - Host elastic1008 is DOWN: CRITICAL - Plugin timed out after 15 seconds [12:06:31] PROBLEM - Host elastic1013 is DOWN: CRITICAL - Plugin timed out after 15 seconds [12:09:58] ACKNOWLEDGEMENT - DPKG on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:09:58] ACKNOWLEDGEMENT - Disk space on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:09:58] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.108 Chris Johnson upgrading ssds [12:09:58] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1001 is CRITICAL: CRITICAL - elasticsearch http://10.64.0.108:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson upgrading ssds [12:09:58] ACKNOWLEDGEMENT - NTP on elastic1001 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson upgrading ssds [12:09:58] ACKNOWLEDGEMENT - RAID on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:09:59] ACKNOWLEDGEMENT - SSH on elastic1001 is CRITICAL: Connection timed out Chris Johnson upgrading ssds [12:09:59] ACKNOWLEDGEMENT - check configured eth on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:10:00] ACKNOWLEDGEMENT - check if dhclient is running on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:10:00] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:10:01] ACKNOWLEDGEMENT - puppet last run on elastic1001 is CRITICAL: Timeout while attempting connection Chris Johnson upgrading ssds [12:10:37] ACKNOWLEDGEMENT - DPKG on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - Disk space on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1008 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.140:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - NTP on elastic1008 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson Upgrading ssds [12:10:39] 
ACKNOWLEDGEMENT - RAID on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - SSH on elastic1008 is CRITICAL: Connection timed out Chris Johnson Upgrading ssds [12:10:39] ACKNOWLEDGEMENT - check configured eth on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:40] ACKNOWLEDGEMENT - check if dhclient is running on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:40] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:10:41] ACKNOWLEDGEMENT - puppet last run on elastic1008 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - DPKG on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - Disk space on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.10 Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1013 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.10:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - NTP on elastic1013 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson Upgrading ssds [12:11:04] ACKNOWLEDGEMENT - RAID on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:05] ACKNOWLEDGEMENT - SSH on elastic1013 is CRITICAL: Connection timed out Chris Johnson Upgrading ssds [12:11:05] ACKNOWLEDGEMENT - check configured eth on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:06] ACKNOWLEDGEMENT - check if dhclient is running on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:06] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:11:07] ACKNOWLEDGEMENT - puppet last run on elastic1013 is CRITICAL: Timeout while attempting connection Chris Johnson Upgrading ssds [12:18:40] akosiaris: can you please pm me list of hosts lower than 12.04 ? [12:21:47] matanya: no need for pm really. sodium, nickel (pretty much ready to be replaced), nescio and ms1004 [12:21:53] so down to 3 soon :-) [12:22:03] Yay! thanks :) [12:24:14] akosiaris: on top of this, tampa is 100% out? no netapp there anymore ? [12:26:23] (03PS2) 10Giuseppe Lavagetto: hiera: mediawiki-based backend for labs [puppet] - 10https://gerrit.wikimedia.org/r/168984 [12:26:38] yes tampa is 100% out, and the netapp is being moved to codfw [12:26:45] (03PS1) 10Alexandros Kosiaris: Remove PMTPA from icinga::nsca::firewall [puppet] - 10https://gerrit.wikimedia.org/r/169685 [12:29:37] thanks akosiaris so what would happen with /srv/home_pmtpa on bast1001 ? [12:29:55] should it be changed to codfw ? 
[12:31:19] I've already fixed that [12:32:48] it is the old historic home from pmtpa and should be mounted on bast1001 as is for people to access it if they want [12:33:04] at some point we will delete it I suppose, but not yet [12:34:53] (03CR) 10Alexandros Kosiaris: [C: 032] Remove PMTPA from icinga::nsca::firewall [puppet] - 10https://gerrit.wikimedia.org/r/169685 (owner: 10Alexandros Kosiaris) [12:35:00] so : class { 'nfs::netapp::home': [12:35:00] mountpoint => '/srv/home_pmtpa', [12:35:00] mount_site => 'pmtpa', [12:35:06] is correct. thanks [12:35:59] yes it is [12:43:41] RECOVERY - Host elastic1001 is UP: PING OK - Packet loss = 0%, RTA = 1.58 ms [12:49:15] <^demon|away> cmjohnson: Ah, just saw the ack's on elastic* for the ssd swaps, coolio. [12:49:22] <^demon|away> I'm around far too early if you need a hand from our side. [12:50:43] PROBLEM - Host elastic1001 is DOWN: PING CRITICAL - Packet loss = 100% [12:51:01] RECOVERY - Host elastic1001 is UP: PING OK - Packet loss = 0%, RTA = 5.28 ms [12:51:08] thanks...just getting ready to boot up and install now [12:51:47] (03CR) 10Alexandros Kosiaris: [C: 032] "This is fine, but the repo misses the upstream/0.1+svn_57689 tag" [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/168760 (owner: 10KartikMistry) [12:57:03] akosiaris: Fixed^ Thanks. [12:58:48] thnaks [12:58:51] thanks* [12:59:01] me again, sorry for nagging today akosiaris , can i add a ferm rule without a port? i.e any ? [12:59:07] !log disabling puppet on elastic1017 and 1018 [12:59:13] Logged the message, Master [12:59:31] matanya: yeah sure [12:59:43] you probably need to use the ferm::rule define and not the ferm::service [12:59:52] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 1.96 ms [12:59:57] who, the ferm modules doesn't hint at that :) [13:00:02] and know a little bit about ferm in general [13:00:41] you mean it is missing docs ? [13:00:42] something like: saddr (0.0.0.0/0) proto tcp dport (ssh) ACCEPT [13:00:42] RECOVERY - Host elastic1013 is UP: PING OK - Packet loss = 0%, RTA = 2.09 ms [13:00:56] yes, saying that in a nice way [13:01:02] it does indeed [13:01:22] but what you just pointed out, does have a port (ssh) in the rule [13:01:32] !log powering down/replacing elastic1017 and elastic1018 [13:01:38] Logged the message, Master [13:01:48] so in order to avoid an XY problem, what are you trying to do matanya ?
[13:02:03] PROBLEM - puppet last run on elastic1017 is CRITICAL: Timeout while attempting connection [13:02:22] replace iptable classes in misc/udp2log [13:02:31] i worte the code, but have no way to test it [13:03:05] ACKNOWLEDGEMENT - DPKG on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:05] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.39 Chris Johnson Replacing the server [13:03:05] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1017 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.39:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson Replacing the server [13:03:05] ACKNOWLEDGEMENT - NTP on elastic1017 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson Replacing the server [13:03:05] ACKNOWLEDGEMENT - RAID on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:06] ACKNOWLEDGEMENT - SSH on elastic1017 is CRITICAL: Connection refused Chris Johnson Replacing the server [13:03:06] ACKNOWLEDGEMENT - check configured eth on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:07] ACKNOWLEDGEMENT - check if dhclient is running on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:07] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:08] ACKNOWLEDGEMENT - puppet last run on elastic1017 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:30] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1018 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.40:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson Replacing the server [13:03:30] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1018 is CRITICAL: Timeout while attempting connection Chris Johnson Replacing the server [13:03:51] PROBLEM - Host elastic1017 is DOWN: CRITICAL - Plugin timed out after 15 seconds [13:04:12] PROBLEM - Host elastic1018 is DOWN: PING CRITICAL - Packet loss = 100% [13:06:42] PROBLEM - Host elastic1008 is DOWN: PING CRITICAL - Packet loss = 100% [13:07:22] PROBLEM - Host elastic1013 is DOWN: PING CRITICAL - Packet loss = 100% [13:08:01] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms [13:08:59] (03PS1) 10Matanya: udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 [13:09:02] RECOVERY - Host elastic1013 is UP: PING OK - Packet loss = 0%, RTA = 1.57 ms [13:09:05] akosiaris: this ^ [13:09:40] (03CR) 10jenkins-bot: [V: 04-1] udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [13:10:28] (03PS2) 10Matanya: udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 [13:11:26] !log lowered redundancy on logstash from 3 way to 2 way [13:11:32] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 46: active_shards: 92: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [13:11:32] RECOVERY - ElasticSearch health check on logstash1001 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 46: active_shards: 92: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [13:11:32] RECOVERY - ElasticSearch health check on logstash1002 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 46: active_shards: 92: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [13:11:36] Logged the message, Master [13:13:16] matanya: Seems the same if you take into account the DEFAULT DROP of base::firewall [13:14:03] akosiaris: so this change is useless ? [13:14:32] no, obviously not [13:14:51] I meant the functionality is the same [13:15:17] which is what you wanted [13:16:09] (03CR) 10Chad: Decom lsearchd pool 5 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/169295 (owner: 10Chad) [13:16:58] <^d> godog: That thing is a mess ^ [13:17:04] <^d> How pool 5 even works I don't even. [13:17:31] oh, thanks. i'm slow today :) [13:19:00] ^d: haha just saw the full duplication below [13:19:09] (03PS1) 10Chad: Revert "Stop using lsearchd pool 5" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169693 [13:19:24] <^d> But actually the dependent change broke lsearchd on those wikis. [13:19:33] <^d> pool5 is unpuppetized, mostly. [13:19:34] <^d> Figures. [13:19:45] (03CR) 10Chad: [C: 032 V: 032] Revert "Stop using lsearchd pool 5" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169693 (owner: 10Chad) [13:20:36] !log demon Synchronized wmf-config/lucene-production.php: unbreak lsearchd for commons, enwikitionary, etc (duration: 00m 04s) [13:20:42] (03PS3) 10Matanya: udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 [13:20:42] Logged the message, Master [13:22:19] sigh, so not a full subset? [13:22:51] <^d> Yeah. [13:22:53] !log lowered replication on logstash's template for new indexes from 3 way to 2 way [13:22:53] <^d> Seems like [13:22:57] Logged the message, Master [13:23:11] godog and ^d: part of the problem with lsearchd is that its config is just all fucked up [13:23:37] hahaha [13:23:40] <^d> Yes. [13:23:59] <^d> This is why we try to avoid touching it. Everything breaks. [13:24:10] <^d> I was trying to chip off a tiny edge of the crap from the edge of shit mountain. [13:24:14] <^d> But no, couldn't do that either. 
[13:24:15] looks like we're up to dismantle it all together [13:27:47] (03PS1) 10Cmjohnson: Adding new elastic search mgmt dns and correcting WMF to wmf discrepencies [dns] - 10https://gerrit.wikimedia.org/r/169694 [13:28:32] (03CR) 10Alexandros Kosiaris: udp2log: replace iptables with ferm (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [13:28:59] (03CR) 10Alexandros Kosiaris: [C: 04-1] udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [13:29:57] akosiaris: it can be merged yet anyway because of hosts missing include base::firewall [13:30:01] *can't [13:30:29] matanya: yup [13:31:18] (03PS4) 10Matanya: udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 [13:32:16] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: Puppet has 1 failures [13:33:44] (03Abandoned) 10Chad: Remove search-pool5 LVS entries, exists no more [dns] - 10https://gerrit.wikimedia.org/r/169300 (owner: 10Chad) [13:33:52] (03Abandoned) 10Chad: Decom lsearchd pool 5 [puppet] - 10https://gerrit.wikimedia.org/r/169295 (owner: 10Chad) [13:39:50] (03CR) 10Cmjohnson: [C: 032] Adding new elastic search mgmt dns and correcting WMF to wmf discrepencies [dns] - 10https://gerrit.wikimedia.org/r/169694 (owner: 10Cmjohnson) [13:40:20] ottomata: i think i'll really need your help with this one [13:41:06] ottomata: fyi elastic1001,1008,1013 have the basic install. I removed all the old salt-keys and puppet certs. [13:41:30] (03CR) 10Alexandros Kosiaris: [C: 04-1] udp2log: replace iptables with ferm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/169691 (owner: 10Matanya) [13:42:35] (03PS5) 10Matanya: udp2log: replace iptables with ferm [puppet] - 10https://gerrit.wikimedia.org/r/169691 [13:42:51] the only question now is what hosts need base::firewall [13:43:19] i guess fluorine, oxygen, erbium and some analytics [13:43:23] probably more [13:43:37] eeef [13:43:43] i am not excited about this! [13:43:53] share with me :) [13:43:54] we are trying to get rid of udp2log! (i know i have been saying that for over a year now) [13:44:25] it is one of those systems that i prefer not to touch too much, because it is fragile, and people get really upset when it breaks [13:44:27] in the mean time, i'm trying to get rid of iptables.pp and this is blocking me :) [13:44:30] and, we are getting rid of it [13:45:20] join #gsoc [13:45:25] (oops, sorry) [13:50:07] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [13:56:08] akosiaris: hey! halfak wants to use the postgres db from labs, wondering how much capacity it has... [13:56:13] in terms of storage capacity? [13:58:47] YuviPanda: hola [13:58:55] kart_: hey [13:59:05] YuviPanda: Is it possible to get CPU/RAM usage in Beta? [13:59:12] for specific instances? [13:59:13] kart_: graphite.wmflabs.org :) [13:59:46] YuviPanda: around 4T as soon as we clear it up from the osm mirror (which is no longer used) [14:00:03] YuviPanda: ah. What was the credential? :P [14:00:59] kart_: no credentials/ [14:01:02] performance wise it is not going to be great and he might have neighbors in the future but for now it would be pretty much only him [14:01:12] halfak: ^ [14:01:25] akosiaris: can you make him an account?
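Two hedged sketches related to the udp2log/ferm change being iterated on above, neither taken from the actual review: matanya's "no way to test it" problem can be narrowed by rendering the ruleset without applying it, and the "what hosts need base::firewall" question can be approximated by grepping the puppet tree (the checkout path and grep patterns are hypothetical).

    # on a test host carrying the new rules: print the iptables commands ferm
    # would run, without touching the live ruleset
    ferm --noexec --lines /etc/ferm/ferm.conf
    iptables-save                             # snapshot the current ruleset for comparison

    # rough inventory: what already includes base::firewall vs. the udp2log hosts
    cd /path/to/operations-puppet
    git grep -l 'base::firewall' manifests/ modules/ | sort
    git grep -l 'udp2log' manifests/ modules/ | sort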
[14:01:54] akosiaris: Please :) [14:01:58] kart_: you can also use http://tools.wmflabs.org/nagf [14:02:29] kart_: https://tools.wmflabs.org/nagf/?project=deployment-prep for betalabs [14:02:53] akosiaris, thanks! [14:03:49] YuviPanda: I 've been meaning to ask you that. You had some code around to make it as easy as populating the mysql dbs for labs, right ? [14:03:51] YuviPanda: thanks! [14:03:59] wanna push it forward ? [14:04:18] akosiaris, How do I find out the connection details for postgres? [14:04:35] halfak: you ain't gonna be able to connect yet [14:05:08] akosiaris: sure, I can. Need to figure out where to run it, but that's about it... [14:05:13] Oh totally. I was just hoping to leave myself some notes for when I got back to this later. :) [14:05:21] akosiaris: not this week though, I'm 'off', and only around totally as a volunteer :) [14:05:31] not that that's different from other time, but probably won't do too much work.. [14:05:50] akosiaris: but yeah, if we think we're ready to open it up to everyone I'm totally up for it. [14:06:12] YuviPanda: we do think so (me thinks so). What you will need of me ? [14:06:25] akosiaris: access credentials for postgres... [14:06:36] akosiaris: if having access / root on the machine is good enough, I'll have it next week... [14:07:03] ok, take your vacation, I 'll create an account for the tool to create users and mail you [14:07:12] halfak: mind waiting till next week ? [14:07:20] akosiaris, I can. [14:07:24] thanks :) [14:07:35] no, thank you :D [14:08:10] YuviPanda: thanks for the graphite links as well, they are gonna prove really helpful :-) [14:08:13] akosiaris: cool :) do mail me whenever, I might get a head start. [14:08:25] akosiaris: yeah :D Krinkle|detached wrote nagf which is quite nice too [14:11:52] ^d: later this evening (when I'm actually taking my day off :P) can you ban the current masters so they are empty? [14:12:17] when we rebuild the "new" masters I'd like to be able to bring the old masters down right away [14:12:22] and we can do that if they are empty [14:12:29] <^d> We could go ahead and start now. [14:12:35] we've got plenty of extra capacity [14:12:37] sure [14:12:47] <^d> I'll do them 1 by 1 instead of all 3 at once. [14:13:06] we'll have banned 9 nodes out of 31 at that point - still plenty of capacity [14:13:18] ^d: I don't think it matter if you do them one by one or all 3 at once [14:13:31] <^d> Yeah probably, sure. [14:13:36] oh - do you want to make your setting for concurrent moves persistent? it seems pretty stable. [14:15:04] <^d> All 3 banned. [14:15:12] <^d> Yeah, I'll whip up a puppet change for it. [14:15:13] (03PS3) 10Ottomata: Require 2 ACKs from kafka brokers per default [puppet] - 10https://gerrit.wikimedia.org/r/167553 (https://bugzilla.wikimedia.org/69667) (owner: 10QChris) [14:16:15] thanks! [14:17:26] <^d> Oh duh, we just need it in persistent. [14:18:06] <^d> https://phabricator.wikimedia.org/P47 [14:21:17] (03CR) 10Ottomata: [C: 032] Require 2 ACKs from kafka brokers per default [puppet] - 10https://gerrit.wikimedia.org/r/167553 (https://bugzilla.wikimedia.org/69667) (owner: 10QChris) [14:21:44] !log set request.required.acks = 2 for all varnishkafkas [14:21:51] Logged the message, Master [14:22:56] (03PS1) 10Chad: Only show comment when section exists [puppet] - 10https://gerrit.wikimedia.org/r/169703 [14:23:41] (03PS1) 10Cmjohnson: Changing dhcpd entries for elastic1017-19 [puppet] - 10https://gerrit.wikimedia.org/r/169704 [14:24:22] manybubbles:, ^d, ok if i bring up 1017-1019? 
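In raw Elasticsearch 1.x API terms, "banning" the old masters and making the concurrent-moves setting persistent both go through the cluster settings endpoint, roughly as below. The node names, the particular setting key and the value are guesses for illustration; the real values are in the P47 paste, which is not reproduced in this log.

    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": {
        "cluster.routing.allocation.exclude._name": "elastic1002,elastic1007,elastic1014"
      },
      "persistent": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": 6
      }
    }'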
[14:24:30] ottomata: fine by me [14:24:43] they are currently still banned so they won't grab shards anyway [14:24:45] oh, cmjohnson, did we change netboot stuff for those 3? [14:24:46] (03PS2) 10Cmjohnson: Changing dhcpd entries for elastic1017-19 [puppet] - 10https://gerrit.wikimedia.org/r/169704 [14:24:48] ah [14:24:49] haha [14:24:54] hahah [14:24:56] we can validate them and then unban then [14:24:58] haha? [14:25:16] as I was asking the question cmjohnson made the commit i was looking for [14:25:59] heh [14:26:22] ottomata: did you see the ping about the other 3 (1001,1008,1013) [14:26:29] yes [14:26:31] (03CR) 10Cmjohnson: [C: 032] Changing dhcpd entries for elastic1017-19 [puppet] - 10https://gerrit.wikimedia.org/r/169704 (owner: 10Cmjohnson) [14:26:38] going to do these ones first, if you don't mind [14:26:55] wait, i think so, cmjohnson, you said they had been removed and hds upgraded? [14:27:17] yes...i did the base install and removed all the old salt keys and puppet certs [14:27:36] you will need to add new [14:28:06] k cool [14:29:17] (03PS1) 10Glaisher: Raise account creation throttle at cawiki temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169708 (https://bugzilla.wikimedia.org/72611) [14:29:19] oh, cmjohnson, hypethreading? [14:29:29] on, or should I do that? [14:29:30] yep..i remembered [14:29:31] cool! [14:29:32] danke [14:31:24] i think those are not your fault [14:31:27] oops [14:31:32] wrong chat [14:38:22] manybubbles: i'm running puppet on elastic1001,1008,1013 now [14:38:32] shoudl I mark them as master elligible now? [14:38:33] ottomata: did you merge that one? [14:38:43] ? [14:38:44] ottomata: yeah - if your running puppet you should merge it [14:38:53] https://gerrit.wikimedia.org/r/#/c/169550/ [14:38:58] k will do that first [14:39:23] manybubbles: should I wait then? [14:39:27] before getting elasticsearch up on those? 
[14:39:47] ottomata: nah - its fine [14:39:52] you can set them up [14:40:00] ok [14:40:05] (03PS2) 10Ottomata: Add three more master node to elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/169550 (owner: 10Manybubbles) [14:40:06] the other nodes should be empty pretty soon and we can bring them down [14:41:01] (03CR) 10Ottomata: [C: 032] Add three more master node to elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/169550 (owner: 10Manybubbles) [14:43:31] PROBLEM - DPKG on elastic1013 is CRITICAL: Connection refused by host [14:43:32] PROBLEM - Disk space on elastic1008 is CRITICAL: Connection refused by host [14:43:32] PROBLEM - ElasticSearch health check for shards on elastic1001 is CRITICAL: CRITICAL - elasticsearch http://10.64.0.108:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [14:43:42] PROBLEM - Disk space on elastic1013 is CRITICAL: Connection refused by host [14:43:42] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 [14:43:52] PROBLEM - RAID on elastic1001 is CRITICAL: Connection refused by host [14:43:52] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.10 [14:43:52] PROBLEM - ElasticSearch health check for shards on elastic1008 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.140:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [14:44:02] PROBLEM - ElasticSearch health check for shards on elastic1013 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.10:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [14:44:11] PROBLEM - check configured eth on elastic1001 is CRITICAL: Connection refused by host [14:44:14] PROBLEM - RAID on elastic1008 is CRITICAL: Connection refused by host [14:44:31] RECOVERY - Disk space on elastic1008 is OK: DISK OK [14:44:31] RECOVERY - DPKG on elastic1013 is OK: All packages OK [14:44:41] RECOVERY - Disk space on elastic1013 is OK: DISK OK [14:45:02] PROBLEM - puppet last run on elastic1001 is CRITICAL: CRITICAL: Puppet has 2 failures [14:45:06] RECOVERY - RAID on elastic1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [14:45:11] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: puppet fail [14:45:11] RECOVERY - check configured eth on elastic1001 is OK: NRPE: Unable to read output [14:45:11] RECOVERY - RAID on elastic1008 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [14:45:22] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: puppet fail [14:45:43] PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.108 [14:46:02] ^ acknowledged, puppet is bringing these up [14:46:51] RECOVERY - ElasticSearch health check for shards on elastic1001 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 25, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 66, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 25 [14:46:51] RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 25: number_of_data_nodes: 25: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 66: initializing_shards: 0: unassigned_shards: 0 [14:47:03] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [14:47:15] RECOVERY - ElasticSearch health check for shards on elastic1008 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 28, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 66, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 28 [14:47:19] there they go! [14:47:21] RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 28: number_of_data_nodes: 28: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 66: initializing_shards: 0: unassigned_shards: 0 [14:47:22] manybubbles: ^d [14:47:32] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:47:33] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [14:47:41] RECOVERY - ElasticSearch health check for shards on elastic1013 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 28, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 66, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 28 [14:47:53] <^d> wheee [14:48:05] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 28: number_of_data_nodes: 28: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 66: initializing_shards: 0: unassigned_shards: 0 [14:48:15] elastic1007 10.64.32.139 57 35 1.50 d m elastic1007 [14:48:15] elastic1002 10.64.0.109 73 35 1.78 d * elastic1002 [14:48:16] elastic1014 10.64.48.11 20 35 0.23 d m elastic1014 [14:48:16] elastic1008 10.64.32.140 2 33 0.26 d m elastic1008 [14:48:16] elastic1013 10.64.48.10 0 33 0.23 d m elastic1013 [14:48:16] elastic1001 10.64.0.108 1 33 0.23 d m elastic1001 [14:49:06] yeah [14:49:33] I'll unban the new nodes [14:49:42] RECOVERY - Host elastic1019 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [14:49:46] <^d> es-tool has made this so much easier. [14:49:47] <^d> :p [14:50:40] (03PS2) 10Alexandros Kosiaris: osm export the expired tile list [puppet] - 10https://gerrit.wikimedia.org/r/169242 [14:50:53] is this the addition of the new boxes? [14:50:56] manybubbles, marktraceur, ^d: Who wants to SWAT today? [14:51:48] <^d> mark: New boxes mostly went in yesterday, this is adding remaining 3 new ones to replace 17-19 that we're retiring. [14:51:59] ah cool [14:52:01] <^d> And started swapping out ssds on 3 of the old boxes. 
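The health checks and the per-node table above come straight from Elasticsearch's HTTP API; a minimal sketch of the equivalent manual checks (host is illustrative, any cluster member will answer):

    # overall cluster state: green/yellow/red plus shard counts
    curl -s 'http://localhost:9200/_cluster/health?pretty'

    # per-node view: heap %, load, d = data node, m = master-eligible, * = current master
    curl -s 'http://localhost:9200/_cat/nodes?v'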
[14:52:01] (03CR) 10Alexandros Kosiaris: [C: 032] osm export the expired tile list [puppet] - 10https://gerrit.wikimedia.org/r/169242 (owner: 10Alexandros Kosiaris) [14:52:02] PROBLEM - RAID on elastic1019 is CRITICAL: Connection refused by host [14:52:02] PROBLEM - puppet last run on elastic1019 is CRITICAL: Connection refused by host [14:52:02] PROBLEM - puppet disabled on elastic1019 is CRITICAL: Connection refused by host [14:52:02] PROBLEM - check configured eth on elastic1019 is CRITICAL: Connection refused by host [14:52:02] PROBLEM - ElasticSearch health check on elastic1019 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.41 [14:52:29] (03PS3) 10Alexandros Kosiaris: Backup user home dirs [puppet] - 10https://gerrit.wikimedia.org/r/168981 [14:52:38] (03CR) 10Alexandros Kosiaris: [C: 032] Backup user home dirs [puppet] - 10https://gerrit.wikimedia.org/r/168981 (owner: 10Alexandros Kosiaris) [14:52:42] PROBLEM - check if dhclient is running on elastic1019 is CRITICAL: Connection refused by host [14:52:42] PROBLEM - DPKG on elastic1019 is CRITICAL: Connection refused by host [14:52:43] PROBLEM - Disk space on elastic1019 is CRITICAL: Connection refused by host [14:53:19] anomie: not it [14:53:57] <^d> anomie: It's only 1 patch so I can if you guys can't, but preferably not it. [14:54:12] PROBLEM - Puppet freshness on elastic1019 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:08:08 UTC [14:54:29] * anomie will take it unless marktraceur wants it [14:55:25] !log started rolling shards back to elastic1001, elastic1008, and elastic1013 after hard drive upgrade [14:55:31] Logged the message, Master [14:56:01] aude: Ping for SWAT in 4 minutes [14:56:02] anomie: All yours [14:56:03] ottomata: it looks like 1001 isn't hyper threading? [14:56:08] I have phone internet this morning [14:56:18] Because the cable guy is supposed to show up sometime in the next two hours [14:56:18] anomie: ping me too [14:56:19] here [14:56:28] i.e. he'll show up four hours from now [14:56:51] PROBLEM - NTP on elastic1013 is CRITICAL: NTP CRITICAL: Offset unknown [14:56:52] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: puppet fail [14:56:52] PROBLEM - NTP on elastic1008 is CRITICAL: NTP CRITICAL: Offset unknown [14:57:26] hm, I didn't check them, cmjohnson? [14:57:34] hyperthreading on 1001,1008,1013? [14:57:40] ottomata: should I push the shards back off of 1001, 1008 and 1013 for ht bounce? [14:58:03] oh cmjohnson ran out for a bit [14:58:06] manybubbles: i'd say yes [14:58:26] i asked cmjohnson if he turned ht on, and he said yes, but he might have been answering only about 1017-1019 [14:58:41] if you get the shards off, i'll reboot one (or all) and check (and turn it on if it iisn't) [15:00:05] manybubbles, anomie, ^d, marktraceur, aude: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141029T1500). 
[15:00:09] * anomie begins SWAT [15:00:14] aude: I'll do yours first [15:00:16] !log started moving shard off of elastic1001, elastic1008, and elastic1013 so we can bounce them to enable hyper threading [15:00:22] ok [15:00:23] Logged the message, Master [15:01:33] ottomata: none of them are empty yet [15:02:11] RECOVERY - NTP on elastic1013 is OK: NTP OK: Offset -0.01706254482 secs [15:02:12] RECOVERY - NTP on elastic1008 is OK: NTP OK: Offset -0.02334403992 secs [15:02:31] k, lemme know when [15:04:31] PROBLEM - NTP on elastic1019 is CRITICAL: NTP CRITICAL: No response from NTP server [15:06:28] (03PS1) 10Aude: Re-enable hhvm beta feature on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 [15:07:04] !log Killed old (pre 1.25) l10nupdate cache dirs from tin:/var/lib/l10nupdate [15:07:10] Logged the message, Master [15:07:37] !log anomie Synchronized php-1.25wmf5/extensions/Wikidata: SWAT: Fix WikiData "add links" widget JS error [[gerrit:169700]] (duration: 00m 15s) [15:07:38] aude: ^ test please [15:07:43] Glaisher: You're next [15:07:44] Logged the message, Master [15:07:45] doing [15:07:45] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: puppet fail [15:07:51] k [15:08:01] (03PS2) 10Anomie: Raise account creation throttle at cawiki temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169708 (https://bugzilla.wikimedia.org/72611) (owner: 10Glaisher) [15:08:51] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: puppet fail [15:08:54] ottomata: all of them are sitting there holding 3 shards [15:09:08] rataher - they are trying to move them away but its taking a while [15:09:18] hmk [15:10:03] anomie: looks good [15:10:09] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169708 (https://bugzilla.wikimedia.org/72611) (owner: 10Glaisher) [15:10:16] (03Merged) 10jenkins-bot: Raise account creation throttle at cawiki temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169708 (https://bugzilla.wikimedia.org/72611) (owner: 10Glaisher) [15:10:27] (03PS1) 10Reedy: Clone/update skins directory during l10nupdate [puppet] - 10https://gerrit.wikimedia.org/r/169715 (https://bugzilla.wikimedia.org/67154) [15:10:35] !log anomie Synchronized wmf-config/throttle.php: SWAT: Raise account creation throttle at cawiki temporarily [[gerrit:169708]] (duration: 00m 09s) [15:10:35] Glaisher: ^ I suppose there's no way to test that [15:10:40] Logged the message, Master [15:10:46] ottomata: meh - its ok - you can bounce 1013 [15:10:47] mhm [15:10:48] thanks [15:10:54] * anomie is done with SWAT [15:11:11] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: puppet fail [15:11:13] thanks [15:11:32] Is that a record for non-empty SWATs? [15:11:32] :) [15:11:40] ottomata: in fact you can do them all if you are ready [15:11:49] marktraceur: Probably not. One with all config changes would easily be faster [15:11:54] Oh, true [15:11:58] (03PS1) 10Reedy: Add skins to wgLocalisationUpdateRepositories [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169716 (https://bugzilla.wikimedia.org/67154) [15:12:05] so is ES happier with the new hardware? [15:12:09] Apply 'em all, get 'em on tin, sync-dir wmf-config/ [15:12:21] If they were all in one file it'd be even better [15:12:36] manybubbles: ok... 
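A rough sketch of the config-only SWAT flow being described above (pull the merged mediawiki-config change onto tin's staging copy and sync it out); the exact paths and sync invocations are from memory rather than authoritative:

    # on tin, after the mediawiki-config change has been +2'd and merged
    cd /srv/mediawiki-staging && git pull

    # push a single config file out to the cluster
    sync-file wmf-config/throttle.php 'SWAT: raise account creation throttle at cawiki'

    # or several config changes in one go
    sync-dir wmf-config/ 'SWAT: config changes'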
[15:12:37] InitialiseCommonSettings.php [15:12:43] * bd808 looks at logstash hosts to find why only parsoid messages are making it to the index [15:12:49] I wonder if jouncebot would like to be able to track stuff like !startdeploy SWAT; !enddeploy SWAT [15:12:50] Even one at a time, config merges take seconds versus 10 minutes for MediaWiki changes [15:12:58] True [15:13:00] <^d> mark: Yeah, I'd say so. Much more breathing room. [15:13:14] <^d> We were able to take 9 nodes down for maintenance without suffering. [15:13:40] manybubbles: do I have to be nice to elasticsearch in some way? [15:13:43] or can I just reboot? [15:13:52] ottomata: just reboot when you want [15:14:02] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: puppet fail [15:14:21] <^d> Reedy: CommonInitialisedSettings.php? [15:14:36] k, rebooting 1013 [15:15:19] !log Restarted logstash on logstash1001. No MW events were being added to the index. [15:15:24] Logged the message, Master [15:15:32] bd808: weird [15:15:52] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [15:15:59] manybubbles: It gets stuck sometimes. And sadly really doesn't log anything helpful about why [15:16:09] lame [15:16:32] I hope this goes away when I kill off log2udp packet relay as input [15:16:47] PROBLEM - Host elastic1013 is DOWN: CRITICAL - Plugin timed out after 15 seconds [15:18:52] (03CR) 10Alexandros Kosiaris: [C: 032] Clone/update skins directory during l10nupdate [puppet] - 10https://gerrit.wikimedia.org/r/169715 (https://bugzilla.wikimedia.org/67154) (owner: 10Reedy) [15:18:58] akosiaris: thanks! :) [15:18:59] !log Restarted logstash on logstash1002 to fix OCG and hadoop log events not being recorded [15:18:59] bd808: not sure if the parsoid folks mentioned this, but there is a field that contains a copy of the entire info as json in the gelf output that we could dropto save space; it's added by the gelf-stream library [15:19:06] Logged the message, Master [15:19:36] bd808: the full_message field [15:19:53] gwicke: I just noticed that. Seems like an easy fix [15:20:37] we were considering pushing a patch upstream to avoid adding it in the first place, but for now stripping it might be quicker [15:20:48] !log reedy Purged l10n cache for 1.25wmf2 [15:20:53] Logged the message, Master [15:21:12] !log reedy Purged l10n cache for 1.25wmf3 [15:21:18] Logged the message, Master [15:21:57] Can someone rm -rf /srv/mediawiki-staging/operations on tin please? [15:22:02] (it's empty bar the .git dir) [15:22:04] ottomata: that machine is having trouble coming back up? [15:22:19] * Reedy looks which WMF dirs can die [15:22:27] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: puppet fail [15:23:01] ottomata: ah - much better [15:23:08] RECOVERY - Host elastic1013 is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms [15:23:37] !log unbanned elastic1013 now that it is back with hyper threading on [15:23:43] Logged the message, Master [15:23:59] ok cool [15:24:00] yeah, [15:24:11] i will do the other two now [15:24:13] can I do them at the same time? [15:24:15] manybubbles? 
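The ban/unban steps above go through es-tool here, which presumably wraps Elasticsearch's allocation-exclusion setting; a hedged sketch of doing it by hand (the IP is one of the nodes mentioned above, and an empty string clears the exclusion):

    # drain shards off a node before rebooting it
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
      {"transient": {"cluster.routing.allocation.exclude._ip": "10.64.48.10"}}'

    # watch the shards relocate away
    curl -s 'http://localhost:9200/_cluster/health?pretty' | grep relocating_shards

    # "unban" once the node is back and healthy
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
      {"transient": {"cluster.routing.allocation.exclude._ip": ""}}'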
[15:25:03] ottomata: sure - yeah [15:25:14] (03PS1) 10Alexandros Kosiaris: osm: Fix typo introduced in 2d96ee8 [puppet] - 10https://gerrit.wikimedia.org/r/169720 [15:26:13] (03CR) 10Alexandros Kosiaris: [C: 032] osm: Fix typo introduced in 2d96ee8 [puppet] - 10https://gerrit.wikimedia.org/r/169720 (owner: 10Alexandros Kosiaris) [15:27:36] ok, rebooting 1001 and 1008 [15:28:37] (03PS1) 10Ottomata: Add defines for working with mysql config files, and mysql client settings [puppet] - 10https://gerrit.wikimedia.org/r/169722 [15:28:50] PROBLEM - Host elastic1001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [15:28:51] PROBLEM - Host elastic1008 is DOWN: CRITICAL - Plugin timed out after 15 seconds [15:30:27] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:30:32] (03PS2) 10Ottomata: Add defines for working with mysql config files, and mysql client settings [puppet] - 10https://gerrit.wikimedia.org/r/169722 [15:31:17] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.002 second response time [15:31:18] (03CR) 10Ottomata: "Ok, here is an attempt:" [puppet] - 10https://gerrit.wikimedia.org/r/168993 (owner: 10Ottomata) [15:31:57] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [15:32:17] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.013 second response time [15:32:36] reedy@tin:/srv/mediawiki-staging$ rm -rf php-1.24wmf21/.git/modules/extensions/Wikidata/rr-cache/1c5f30e1e932c16581d9da4f7a9e510910cba134/preimage [15:32:36] rm: cannot remove `php-1.24wmf21/.git/modules/extensions/Wikidata/rr-cache/1c5f30e1e932c16581d9da4f7a9e510910cba134/preimage': Permission denied [15:32:36] reedy@tin:/srv/mediawiki-staging$ ls -al php-1.24wmf21/.git/modules/extensions/Wikidata/rr-cache/1c5f30e1e932c16581d9da4f7a9e510910cba134/preimage [15:32:36] -rw-rw-r-- 1 ori wikidev 48576 Sep 12 00:35 php-1.24wmf21/.git/modules/extensions/Wikidata/rr-cache/1c5f30e1e932c16581d9da4f7a9e510910cba134/preimage [15:32:36] wut [15:33:04] (03CR) 10Ottomata: "Strangely enough, this template already existed in the mysql module. It looks like it was from puppet labs. I did a halfhearted googling" [puppet] - 10https://gerrit.wikimedia.org/r/169722 (owner: 10Ottomata) [15:33:14] ? [15:33:23] why can't I delete those? 
[15:33:28] There's a stack of them [15:33:52] (03PS1) 10Reedy: Remove php-1.24wmf(19|2[01]) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169723 [15:34:06] (03CR) 10Reedy: [C: 032] Remove php-1.24wmf(19|2[01]) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169723 (owner: 10Reedy) [15:34:14] (03Merged) 10jenkins-bot: Remove php-1.24wmf(19|2[01]) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169723 (owner: 10Reedy) [15:34:19] (03PS1) 10Alexandros Kosiaris: osm: Fix matching the rsync process via nrpe [puppet] - 10https://gerrit.wikimedia.org/r/169724 [15:35:07] !log uploaded apertium-apy_0.1+svn~57689-1 on apt.wikimedia.org [15:35:14] Logged the message, Master [15:35:30] !log deleted php-1.24wmf19 from mediawiki-installation [15:35:36] Logged the message, Master [15:35:55] (03CR) 10Alexandros Kosiaris: [C: 032] osm: Fix matching the rsync process via nrpe [puppet] - 10https://gerrit.wikimedia.org/r/169724 (owner: 10Alexandros Kosiaris) [15:36:27] !log deleted php-1.24wmf20 from mediawiki-installation [15:36:32] Logged the message, Master [15:37:17] !log deleted php-1.24wmf21 from mediawiki-installation [15:37:23] Logged the message, Master [15:37:47] manybubbles: 1001 and 1008 shoudl be back up [15:37:48] RECOVERY - Host elastic1001 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms [15:37:50] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 1.39 ms [15:38:01] <^d> I see them [15:38:14] heh [15:38:26] yeah [15:39:12] !log start moving shards back to elastic1001 and elastic1008 now that they are up with hyperthreading on [15:39:17] Logged the message, Master [15:40:46] <^d> 1, 8 and 13 won't take any until 2, 7 and 14 are done dumping theirs. [15:41:02] (03PS1) 10Manybubbles: Remove old elasticsearch masters [puppet] - 10https://gerrit.wikimedia.org/r/169725 [15:41:13] <^d> at least that's what i'm observing. [15:41:24] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [15:41:30] (03CR) 10Manybubbles: "Merge me anytime." [puppet] - 10https://gerrit.wikimedia.org/r/169725 (owner: 10Manybubbles) [15:42:29] ^d: its because elasticsearch won't perform any rebalancing until all the banned nodes are empty [15:42:33] <^d> yeah [15:42:46] and I believe all the shards on the banned nodes are already moving [15:43:27] <^d> they are. [15:49:41] (03CR) 10Chad: [C: 031] Remove old elasticsearch masters [puppet] - 10https://gerrit.wikimedia.org/r/169725 (owner: 10Manybubbles) [15:53:28] (03PS1) 10BryanDavis: logstash: Drop full_message field from GELF messages [puppet] - 10https://gerrit.wikimedia.org/r/169727 [15:53:30] (03PS1) 10BryanDavis: logstash: reformat gelf filter config [puppet] - 10https://gerrit.wikimedia.org/r/169728 [15:53:41] hey paravoid, did you(and magnus?) package a new librdkafka? [15:53:58] is it going to ubuntu/debian? [15:59:52] jgage: A couple smallish logstash tweaks for your consideration: https://gerrit.wikimedia.org/r/169727 & https://gerrit.wikimedia.org/r/169728 [16:00:15] I haven't tested either one in beta yet because I'm lazy and have to prep for a phone interview. :/ [16:01:03] bd808: thanks for that (the phone interview) [16:02:14] greg-g: :) np. I kind of like doing interviews. 
[16:03:12] ^d, manybubbles, i'm running puppet on 1017-1019 now [16:03:20] ottomata: cool [16:03:58] !log shutting down elasticsearch on elastic1017 - its empty and ready to have its disk upgraded/hyper threading enabled [16:04:05] Logged the message, Master [16:04:42] !log shutting down elasticsearch on elastic1014 - its empty and ready to have its disk upgraded/hyper threading enabled [16:04:46] Logged the message, Master [16:05:23] manybubbles: did you mean 1017? [16:05:36] <^d> yeah was about to say. [16:05:42] !log shutting down elasticsearch on elastic1007 - its empty and ready to have its disk upgraded/hyper threading enabled [16:05:47] Logged the message, Master [16:05:51] !log ignore my last log message about 1017 - typod [16:05:56] Logged the message, Master [16:06:04] RECOVERY - DPKG on elastic1019 is OK: All packages OK [16:06:05] RECOVERY - check if dhclient is running on elastic1019 is OK: PROCS OK: 0 processes with command name dhclient [16:06:24] RECOVERY - Disk space on elastic1019 is OK: DISK OK [16:06:44] RECOVERY - RAID on elastic1019 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [16:06:45] RECOVERY - check configured eth on elastic1019 is OK: NRPE: Unable to read output [16:06:45] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11 [16:07:05] PROBLEM - ElasticSearch health check for shards on elastic1014 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.11:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [16:07:35] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:07:54] PROBLEM - ElasticSearch health check for shards on elastic1007 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.139:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [16:07:58] RECOVERY - ElasticSearch health check on elastic1019 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 29: number_of_data_nodes: 29: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0 [16:07:58] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.139 [16:11:15] RECOVERY - ElasticSearch health check for shards on elastic1014 is OK: OK - elasticsearch status production-search-eqiad: status: yellow, number_of_nodes: 30, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 12, active_shards: 6093, initializing_shards: 1, number_of_data_nodes: 30 [16:12:06] <^d> manybubbles: That enwiki_general shard is taking its sweet time moving off 1002 :p [16:12:08] wtf puppet - I disabled you. [16:12:10] yeah [16:12:54] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11 [16:13:11] how I disable puppet? [16:13:25] there we go [16:13:29] --disable instead of disable [16:15:04] that also takes an argument to specify why FWIW [16:15:36] PROBLEM - ElasticSearch health check for shards on elastic1014 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.11:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [16:15:53] ottomata: noatime on elastic1018 [16:16:02] bwaaaHHH [16:16:03] thanks. 
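For the --disable confusion above: the flag also takes a reason string, which is shown to the next person who runs the agent and wonders why it is off. A quick sketch (the reason text is illustrative):

    puppet agent --disable "swapping SSDs / enabling hyperthreading on elastic10xx"
    # ...do the maintenance, then re-enable...
    puppet agent --enable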
sorry [16:16:15] you are a good double checker [16:16:44] ottomata: I try [16:16:51] elastic1017 and 1019 need it too [16:16:54] but 1019 isn't emtpy [16:17:16] PROBLEM - ElasticSearch health check on elastic1019 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.41 [16:17:20] manybubbles: it isn't empty!? [16:17:23] i just did i! [16:17:25] it! [16:17:27] uh oh [16:17:32] it is back on now [16:17:35] what did I do? [16:17:36] !log shutting down elasticsearch on elastic1002 - its empty and ready to have its disk upgraded/hyper threading enabled [16:17:45] Logged the message, Master [16:17:46] ottomata: we just hadn't banned it [16:17:50] OHH [16:17:54] right because it wasn't online [16:18:05] welp, elasticsearch was down there for about 10 seconds then :/ [16:18:06] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [16:18:15] ottomata: its empty now [16:18:19] hah, ok [16:18:38] ah - I see. we're in yellow [16:18:40] its all good [16:19:50] (03CR) 10Gage: [C: 04-1] "full_message is where Hadoop stores Java stack traces, which we want. Can we move this into a nodejs/parsoid-specific section?" [puppet] - 10https://gerrit.wikimedia.org/r/169727 (owner: 10BryanDavis) [16:20:15] * ^d puts his yellow hat on [16:20:26] PROBLEM - ElasticSearch health check for shards on elastic1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.0.109:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [16:20:30] !log elastic101[7-9] look good to me - adding them to the cluster [16:20:35] Logged the message, Master [16:20:37] icinga-wm: I know that. I shut it down intentionally [16:21:02] k [16:22:15] ^d: can you make es-tool take a hostname instead of an ip address? [16:22:21] in addition to, rather? [16:22:23] (03CR) 10BryanDavis: "Parsoid and OCG both just have junk there. Are there other useful things in the Hadoop version? Maybe we could pick all the good things ou" [puppet] - 10https://gerrit.wikimedia.org/r/169727 (owner: 10BryanDavis) [16:22:24] <^d> Yeah, I was just thinking that. [16:22:41] I do a lot of `ifconfig | grep inet`, copy, paste [16:24:44] RECOVERY - NTP on elastic1019 is OK: NTP OK: Offset -4.708766937e-05 secs [16:28:57] ottomata: this went great! [16:30:43] RECOVERY - ElasticSearch health check on elastic1019 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 28: number_of_data_nodes: 28: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0 [16:34:58] COOL [16:35:08] so, manybubbles, status? [16:35:26] is one of the reinstalled hosts now the master? [16:35:52] ah, i see your email [16:40:33] (03CR) 10Dzahn: "dependency has been merged (thanks Alex). should not be used anymore now." [puppet] - 10https://gerrit.wikimedia.org/r/169571 (owner: 10Matanya) [16:42:19] (03CR) 10Dzahn: "Alex, thanks! and yea, agree, i think we all want to get rid of the old webserver class as well" [puppet] - 10https://gerrit.wikimedia.org/r/169561 (owner: 10Dzahn) [16:43:12] (03CR) 10JanZerebecki: [C: 031] "Yes, please." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [16:43:53] (03CR) 10Gage: [C: 031] "Actually it looks like stack traces are now in their own field called StackTrace. 
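On the `ifconfig | grep inet` copy/paste above: until es-tool takes hostnames, a couple of one-liners that print just the address it wants (the hostname is illustrative):

    dig +short elastic1014.eqiad.wmnet
    getent hosts elastic1014.eqiad.wmnet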
full_message seems to be just a copy of short_message, s" [puppet] - 10https://gerrit.wikimedia.org/r/169727 (owner: 10BryanDavis) [16:46:39] (03CR) 10Dzahn: "when glancing at the bug it's reopened as "not yet fixed"? update?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [16:47:26] (03PS1) 10Reedy: Add mai to langs.tmpl [dns] - 10https://gerrit.wikimedia.org/r/169736 [16:47:58] (03Abandoned) 10Reedy: Add mai to langs.tmpl [dns] - 10https://gerrit.wikimedia.org/r/169736 (owner: 10Reedy) [16:48:33] (03CR) 10Reedy: [C: 031] Add 'mai' to langs.tmpl [dns] - 10https://gerrit.wikimedia.org/r/169011 (https://bugzilla.wikimedia.org/72346) (owner: 10Glaisher) [16:48:39] (03CR) 10Dzahn: "duplicate of.. lol. you're quicker" [dns] - 10https://gerrit.wikimedia.org/r/169736 (owner: 10Reedy) [16:49:48] (03PS2) 10Glaisher: Add 'mai' to langs.tmpl [dns] - 10https://gerrit.wikimedia.org/r/169011 (https://bugzilla.wikimedia.org/72346) [16:50:39] (03CR) 10Dzahn: [C: 032] "the language is "Maithili" spoken in India and Nepal http://en.wikipedia.org/wiki/Maithili_language" [dns] - 10https://gerrit.wikimedia.org/r/169011 (https://bugzilla.wikimedia.org/72346) (owner: 10Glaisher) [16:58:08] oo, manybubbles, i need to enable 1019 in pybal [16:58:10] s'ok to do so? [16:59:47] ottomata: regarding HT...i was only talking about 1017-19...the others I thought you did the other day [17:01:05] sorry, nope, will have to do those as we reinstall them [17:01:25] oh..okay [17:02:17] (03PS1) 10Reedy: Fix private typo [puppet] - 10https://gerrit.wikimedia.org/r/169744 [17:03:01] enabling hyperthreading? yay. [17:06:01] cmjohnson, ottomata: are you aware of enabled HT on any other nodes? [17:06:11] (03PS2) 10Ori.livneh: Re-enable hhvm beta feature on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [17:06:16] (03CR) 10Ori.livneh: [C: 032] Re-enable hhvm beta feature on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [17:06:32] (03Merged) 10jenkins-bot: Re-enable hhvm beta feature on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [17:07:24] !log ori Synchronized wmf-config/CommonSettings.php: I8dd62e2cc: Re-enable hhvm beta feature on Wikidata (duration: 00m 06s) [17:07:35] Logged the message, Master [17:08:03] (03PS2) 10Dzahn: analytics-privatedata-users - Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/169744 (owner: 10Reedy) [17:08:12] (03CR) 10Dzahn: [C: 032] analytics-privatedata-users - Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/169744 (owner: 10Reedy) [17:08:43] ori: oh, nice (re wikidata+hhvm) [17:11:24] (03CR) 10GWicke: [C: 031] logstash: Drop full_message field from GELF messages [puppet] - 10https://gerrit.wikimedia.org/r/169727 (owner: 10BryanDavis) [17:13:27] (03CR) 10Subramanya Sastry: [C: 031] logstash: Drop full_message field from GELF messages [puppet] - 10https://gerrit.wikimedia.org/r/169727 (owner: 10BryanDavis) [17:14:26] ^d, should I enable 1019 in pybal? [17:16:50] csteipp: is that OAuth fix going to be lightning deployed, or will it await the train or somesuch? I'm hoping to get users testing my app today, if possible. [17:19:08] ragesoss: The new branch is being cut at 11 (40 mins?). It should have the fix, and mediawikiwiki is phase 0. 
[17:19:17] So should be resolved in about an hour :) [17:19:29] sweet [17:19:31] :) [17:21:46] PROBLEM - check if salt-minion is running on ocg1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [17:22:24] ragesoss: you might not know, but we moved the Thur deploy to Wed (starting this week) [17:22:57] greg-g: I was just looking at that. Serendipity! [17:23:06] :) [17:24:28] !log shutting down to replace ssds in elastic1002,1007,1014 [17:24:33] Logged the message, Master [17:29:04] PROBLEM - Host elastic1002 is DOWN: PING CRITICAL - Packet loss = 100% [17:30:47] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% [17:31:23] PROBLEM - Host elastic1014 is DOWN: CRITICAL - Plugin timed out after 15 seconds [17:31:27] ACKNOWLEDGEMENT - DPKG on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - Disk space on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.0.109:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - NTP on elastic1002 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - RAID on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:28] ACKNOWLEDGEMENT - SSH on elastic1002 is CRITICAL: Connection timed out Chris Johnson replacing ssds [17:31:29] ACKNOWLEDGEMENT - check configured eth on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:29] ACKNOWLEDGEMENT - check if dhclient is running on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:30] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:31:30] ACKNOWLEDGEMENT - puppet last run on elastic1002 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:04] ACKNOWLEDGEMENT - DPKG on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:04] ACKNOWLEDGEMENT - Disk space on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:04] ACKNOWLEDGEMENT - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.139 Chris Johnson replacing ssds [17:32:04] ACKNOWLEDGEMENT - ElasticSearch health check for shards on elastic1007 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.139:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health Chris Johnson replacing ssds [17:32:04] ACKNOWLEDGEMENT - NTP on elastic1007 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson replacing ssds [17:32:05] ACKNOWLEDGEMENT - RAID on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:05] ACKNOWLEDGEMENT - SSH on elastic1007 is CRITICAL: Connection timed out Chris Johnson replacing ssds [17:32:06] ACKNOWLEDGEMENT - check configured eth on elastic1007 is CRITICAL: 
Timeout while attempting connection Chris Johnson replacing ssds [17:32:06] ACKNOWLEDGEMENT - check if dhclient is running on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:07] ACKNOWLEDGEMENT - check if salt-minion is running on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:32:07] ACKNOWLEDGEMENT - puppet last run on elastic1007 is CRITICAL: Timeout while attempting connection Chris Johnson replacing ssds [17:35:14] PROBLEM - Disk space on ms-be3004 is CRITICAL: Timeout while attempting connection [17:39:43] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 2 failures [17:39:43] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 3 failures [17:39:45] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 2 failures [17:39:45] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 3 failures [17:44:41] (03PS1) 10Glaisher: Initial configuration for Maithili Wikipedia (maiwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169758 (https://bugzilla.wikimedia.org/72346) [17:45:23] Glaisher: Are any more new wikis in the pipeline? [17:45:43] none that I'm aware of [17:50:22] (03CR) 10JanZerebecki: "@Dzahn: Yes, updated: https://bugzilla.wikimedia.org/show_bug.cgi?id=64415#c4" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169714 (owner: 10Aude) [17:54:23] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:55:33] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:56:24] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:56:34] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:00:05] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141029T1800). [18:00:22] * aude back [18:00:32] aude: Saw the cake? [18:01:24] RECOVERY - Host elastic1002 is UP: PING OK - Packet loss = 0%, RTA = 1.76 ms [18:03:54] RECOVERY - Host elastic1014 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [18:04:35] mmmmm, cupcake! 
:D [18:05:26] Right :) [18:05:36] and the present is hhvm beta feature again :) [18:06:23] Already see hhvm edits happening :) [18:08:14] PROBLEM - Host elastic1002 is DOWN: CRITICAL - Plugin timed out after 15 seconds [18:08:34] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 3.74 ms [18:09:29] the error logs look quiet [18:10:04] RECOVERY - Host elastic1002 is UP: PING OK - Packet loss = 0%, RTA = 1.99 ms [18:14:54] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 1 failures [18:15:33] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures [18:16:04] PROBLEM - Host elastic1014 is DOWN: CRITICAL - Plugin timed out after 15 seconds [18:17:15] (03PS1) 10Reedy: add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169765 [18:17:17] (03PS1) 10Reedy: testwiki to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169766 [18:17:19] (03PS1) 10Reedy: wikipedias to 1.25wmf5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169767 [18:17:21] (03PS1) 10Reedy: group0 to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169768 [18:17:50] (03CR) 10Reedy: [C: 032] add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169765 (owner: 10Reedy) [18:17:58] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169766 (owner: 10Reedy) [18:18:14] (03Merged) 10jenkins-bot: add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169765 (owner: 10Reedy) [18:18:22] (03Merged) 10jenkins-bot: testwiki to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169766 (owner: 10Reedy) [18:18:33] !log reedy Started scap: testwiki to 1.25wmf6 and build l10n cache [18:18:40] Logged the message, Master [18:21:24] RECOVERY - Host elastic1014 is UP: PING OK - Packet loss = 0%, RTA = 2.11 ms [18:28:05] PROBLEM - Host elastic1014 is DOWN: PING CRITICAL - Packet loss = 100% [18:28:54] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [18:29:34] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:29:48] sync-common: 10% (ok: 24; fail: 0; left: 205) [18:31:54] ottomata: elastic1002/7/14 has base install, HT turned on. old puppet certs and salt-keys removed. [18:33:14] RECOVERY - Host elastic1014 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [18:33:31] cool, ok, ^d, shall I keep going? [18:35:40] Reedy: taking a while, eh? [18:35:52] 17 mins so far [18:35:57] sync-common: 65% (ok: 149; fail: 0; left: 80) [18:42:04] <^d> ottomata: 2, 7 and 14? I think so sure. 
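A sketch of what "the error logs look quiet" typically means checking here; the log host and file names are assumptions rather than something stated in the discussion:

    # on the central MediaWiki log host (assumed layout: /a/mw-log/*.log)
    tail -f /a/mw-log/fatal.log /a/mw-log/hhvm.log

    # or a quick count of new fatals since the config was synced
    grep -c 'Fatal error' /a/mw-log/hhvm.log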
[18:42:48] ja, 2, 7 14 [18:42:51] on it [18:47:04] !log reedy Finished scap: testwiki to 1.25wmf6 and build l10n cache (duration: 28m 30s) [18:47:10] Logged the message, Master [18:54:13] (03CR) 10Reedy: [C: 032] wikipedias to 1.25wmf5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169767 (owner: 10Reedy) [18:54:21] (03Merged) 10jenkins-bot: wikipedias to 1.25wmf5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169767 (owner: 10Reedy) [18:57:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf5 [18:57:23] Logged the message, Master [18:57:54] (03CR) 10Reedy: [C: 032] group0 to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169768 (owner: 10Reedy) [18:58:10] (03Merged) 10jenkins-bot: group0 to 1.25wmf6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169768 (owner: 10Reedy) [18:58:35] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf6 [18:58:40] Logged the message, Master [19:03:04] PROBLEM - puppet last run on elastic1007 is CRITICAL: CRITICAL: Puppet has 1 failures [19:03:49] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.139 [19:03:55] PROBLEM - puppet last run on elastic1002 is CRITICAL: CRITICAL: Puppet has 2 failures [19:03:55] PROBLEM - ElasticSearch health check for shards on elastic1007 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.139:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [19:04:32] <^d> icinga-wm: go away, we know. [19:04:46] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [19:04:47] * YuviPanda should implement a replacement for ircecho one of these days [19:04:55] PROBLEM - ElasticSearch health check for shards on elastic1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.0.109:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [19:04:56] one that understands 'go away' [19:08:44] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: puppet fail [19:09:06] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [19:10:19] ack on both [19:10:53] ori: "hackish fix" vs. "fixed this" ? :) [19:11:14] "hackish fix" points to some kind of better fix, doesn't it? [19:11:14] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:11:38] paravoid: yes, brett self-assigned and made it high-prio. but it's a good enough fix for me for now. [19:13:29] why is it hackish? [19:14:38] <^d> ottomata: Where we at? :) [19:15:15] oop, getting distracted [19:15:33] paravoid: Hackish because Tim said so. :) -- https://phabricator.wikimedia.org/T820#16428 [19:15:41] oh, ^d, i should enable 1019 in pybal, yes? [19:15:41] I know, that's what I'm asking [19:15:42] paravoid: because the way the extension was written, it's supposed to be create_node_object()'s responsibility to add the node to m_orphans [19:15:56] <^d> ottomata: Yeah, at some point. 
No rush on that part tho :) [19:16:02] not the caller's [19:16:16] but in this case the caller knows that create_node_object() won't, so it does anyway [19:16:24] ah [19:16:28] hm, ^d, i can do now, though, so it is the same as the others [19:16:36] <^d> okie dokie [19:16:54] RECOVERY - puppet last run on elastic1007 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [19:17:22] <^d> Ok, I see 7 and 2 back [19:17:23] done [19:17:25] yeah, just now [19:17:28] !log upgraded HHVM to 3.3.0+dfsg1-1+wm1 [19:17:29] 1014 i couldnt log into [19:17:30] lemme try again [19:17:34] Logged the message, Master [19:17:35] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 30: number_of_data_nodes: 30: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0 [19:17:35] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 30: number_of_data_nodes: 30: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0 [19:17:45] RECOVERY - ElasticSearch health check for shards on elastic1002 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 30, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 16, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 30 [19:17:45] RECOVERY - puppet last run on elastic1002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [19:17:54] RECOVERY - ElasticSearch health check for shards on elastic1007 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 30, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 16, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 30 [19:18:02] ok then, I guess we can live with that [19:18:15] cmjohnson: is elastic1014 ok? [19:18:30] why...did it not come back up? [19:18:32] <^d> ottomata: Lets also merge https://gerrit.wikimedia.org/r/#/c/169725/ so we don't bring them back up as masters again. [19:18:32] I just wanted to be sure it didn't mean "hackish because it will leak under other circumstances" or something :) [19:18:39] paravoid: _joe_ and I were talking about re-basing our packages on 3.3.1 early next week, by which point i hope a proper fix for this will land [19:18:51] nod [19:18:51] <^d> (since we already did the master dance to 1/8/13) [19:19:04] I was hoping 3.3.1 would have a large portion of our patches as well, but meh :( [19:19:10] it does [19:19:31] did paul include them after all? [19:20:25] all of the ones that already landed in master, which was most of them [19:21:39] the pcre cache rewrite is in a bit of a limbo state: see https://reviews.facebook.net/D25515 for fascinating reading (actually fascinating, no sarcasm) [19:22:41] ^d, OOP, didn't realize that hadn't been done [19:22:50] (03PS2) 10Ottomata: Remove old elasticsearch masters [puppet] - 10https://gerrit.wikimedia.org/r/169725 (owner: 10Manybubbles) [19:23:04] <^d> Yeah if we do it now I can bounce those nodes again before we reload them with shards. 
[19:23:05] (03PS1) 10Reedy: Set a default for wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169779 (https://bugzilla.wikimedia.org/72525) [19:23:08] <^d> So they won't take back master. [19:23:12] Dereckson: ^^ [19:23:25] (03CR) 10Ottomata: [C: 032 V: 032] Remove old elasticsearch masters [puppet] - 10https://gerrit.wikimedia.org/r/169725 (owner: 10Manybubbles) [19:23:53] Thank you. [19:24:15] Dereckson: I thought we might aswell make it more obvious to view the ones that don't match the rest [19:24:15] ^d, running puppet there, then will bounce them [19:24:21] <^d> okie dokie [19:24:39] yeah, dunno what's up with 1014 though [19:24:45] cmjohnson: yeah, its not back [19:24:57] last i checked the console was busy too (but that was an hourish ago) [19:25:02] (03CR) 10Reedy: [C: 032] Set a default for wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169779 (https://bugzilla.wikimedia.org/72525) (owner: 10Reedy) [19:25:10] (03Merged) 10jenkins-bot: Set a default for wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169779 (https://bugzilla.wikimedia.org/72525) (owner: 10Reedy) [19:26:06] <^d> ottomata: Ok, I see them back and not master eligible. Fantastic! [19:26:36] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s) [19:26:43] Logged the message, Master [19:28:33] <^d> ottomata: Unbanned 02 and 07 so they can start taking shards again [19:28:52] great [19:30:20] ottomata: stuck in installer [19:30:31] will ping you once its fixed [19:30:39] (03PS1) 10Tpt: Revert "Set a default for wgProofreadPageNamespaceIds" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169780 [19:31:12] <^d> cmjohnson: Ok, so we've got 3-6, 9-12 and 15-16 left. Each of those ranges are same rack right? We could do them in 3 batches then. [19:31:50] ^d yep....but we'll have to do tomorrow [19:32:12] <^d> Yeah, I'll start draining traffic from a group of those tonight though so they'll be ready by the morning. [19:32:19] <^d> Got any preference? [19:32:30] Reedy: we need to revert 169779, 250/252 are used for Page and Index for the new wikisources created for more than one year [19:32:42] (Tpt dixit) [19:33:31] (as the default value set by the extension) [19:33:48] no preference...just let me know [19:33:58] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail [19:34:43] <^d> will do [19:36:22] <^d> cmjohnson, ottomata: I'm stepping away to grab lunch, back in ~15ish. All's quiet from our side right now. 
[19:36:43] okay 1014 will be up in a few mins [19:38:24] cool [19:38:32] i'll get 1014 up and then wait to do more til tomorrow [19:38:39] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:44] (03CR) 10Reedy: [C: 032] Revert "Set a default for wgProofreadPageNamespaceIds" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169780 (owner: 10Tpt) [19:38:52] (03Merged) 10jenkins-bot: Revert "Set a default for wgProofreadPageNamespaceIds" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169780 (owner: 10Tpt) [19:39:28] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s) [19:39:33] Logged the message, Master [19:42:57] ottomata: all yours [19:43:02] danke [19:43:25] (03PS1) 10Tpt: Adds an explicit default for $wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 [19:43:29] (03CR) 10jenkins-bot: [V: 04-1] Adds an explicit default for $wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 (owner: 10Tpt) [19:44:19] (03PS2) 10Tpt: Adds an explicit default for $wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 [19:45:13] (03CR) 10Reedy: [C: 032] Adds an explicit default for $wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 (owner: 10Tpt) [19:45:20] (03Merged) 10jenkins-bot: Adds an explicit default for $wgProofreadPageNamespaceIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 (owner: 10Tpt) [19:45:54] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s) [19:47:53] (03CR) 10Dereckson: "Follow-up:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169779 (https://bugzilla.wikimedia.org/72525) (owner: 10Reedy) [19:48:27] (03CR) 10Dereckson: "Follow-up: Reverted commit fixed and merged in I27632aa04215c05c40c9a44f921e6e0a1aff319e" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169780 (owner: 10Tpt) [19:48:32] <^d> nom nom nom [19:49:37] (03CR) 10Dereckson: "This commit fixes I27632aa04215c05c40c9a44f921e6e0a1aff319e after an emergency revert in I6b92c02a91b803a744e4b41519e3beb732aa4be0." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169781 (owner: 10Tpt) [19:53:14] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:56:15] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:59:18] (03PS1) 10BryanDavis: Fix ip address for beta redis master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169789 [20:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141029T2000). [20:01:05] (03CR) 1020after4: "Can we even test this in labs? does labs use the same front-end proxy setup??" 
[puppet] - 10https://gerrit.wikimedia.org/r/168509 (owner: 1020after4) [20:01:10] on it jouncebot [20:01:37] (03CR) 10BryanDavis: "$ dig -x 10.68.16.146" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169789 (owner: 10BryanDavis) [20:04:54] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11 [20:05:04] PROBLEM - ElasticSearch health check for shards on elastic1014 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.11:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health [20:05:07] <^d> oh shove it icinga-wm [20:05:22] (03CR) 10John F. Lewis: [C: 031] Fix ip address for beta redis master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169789 (owner: 10BryanDavis) [20:06:04] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 31: number_of_data_nodes: 31: active_primary_shards: 2033: active_shards: 6094: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0 [20:06:04] RECOVERY - ElasticSearch health check for shards on elastic1014 is OK: OK - elasticsearch status production-search-eqiad: status: green, number_of_nodes: 31, unassigned_shards: 0, timed_out: False, active_primary_shards: 2033, cluster_name: production-search-eqiad, relocating_shards: 16, active_shards: 6094, initializing_shards: 0, number_of_data_nodes: 31 [20:07:09] !log updated Parsoid to version 4e21bdb6fccc377468fd3d1cbc656fb64464ea78 [20:07:15] Logged the message, Master [20:08:02] <^d> ottomata: 1014 looks good [20:08:12] arlolra, that version should be the parsoid repo version not the deploy repo version. [20:08:18] cool [20:08:20] you can edit the server admin log directly. [20:08:25] :( [20:08:46] https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:46] ok fixed [20:10:19] <^d> ottomata: Ok, I think we're all done for today :) I'll go ahead and ban 3-6 now for tomorrow morning. [20:10:23] <^d> Thanks for all your help!! [20:10:46] cool, sounds good [20:14:52] Anybody on tin with time to do a prod no-op merge for me to fix a beta bug? https://gerrit.wikimedia.org/r/#/c/169789/ [20:15:26] (03CR) 10Reedy: [C: 032] Fix ip address for beta redis master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169789 (owner: 10BryanDavis) [20:15:35] (03Merged) 10jenkins-bot: Fix ip address for beta redis master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169789 (owner: 10BryanDavis) [20:15:48] thx Reedy [20:16:07] !log reedy Synchronized wmf-config/mc-labs.php: noop for prod (duration: 00m 17s) [20:16:13] Logged the message, Master [20:17:44] PROBLEM - Parsoid on wtp1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:18:25] PROBLEM - Parsoid on wtp1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:18:50] ^ subbu [20:19:03] oh.. hmm .. [20:19:24] i'd check ganglia to see if they're overloaded [20:19:34] they are not. [20:19:54] i have that page open .. looks like they maybe stuck (transient)? .. will watch. [20:20:08] something funky happened, though. from the graphs it looks like parsoid may have been restarted a little bit ago? [20:20:18] we deployed new code .. [20:20:25] PROBLEM - Parsoid on wtp1023 is CRITICAL: Connection refused [20:20:38] that is what we are monitoring now .. added timeouts to kill stuck processes .. but, we may have to revert that. 
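Spelled out, the manual revert path used above (git deploy has no revert subcommand, so the previous state is checked out and re-synced); the repository path is assumed and the tag is a placeholder, not the actual one used:

    # on tin, in the Parsoid deploy repository (path assumed)
    cd /srv/deployment/parsoid/deploy
    git deploy start
    git checkout <previous-deploy-tag>
    git deploy sync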
[20:21:01] that's a third parsoid host; i'd revert now and ask questions later :P [20:21:12] arlolra, let us revert. [20:21:18] k [20:23:26] gwicke, ori, git deploy revert doesn't work? [20:23:41] deploy: error: argument : invalid choice: 'revert' (choose from 'abort', 'finish', 'help', 'report', 'service', 'start', 'sync') [20:24:13] bd808, ^ [20:24:34] PROBLEM - Parsoid on wtp1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:24:37] i guess i will revert the manual way by checking out the old reversion and syncing. [20:24:57] subbu: yeah that's what you do. There will be an old tag you can checkout [20:25:06] subbu: I'd assume that is how it was supposed to be reverted [20:25:12] poor parsoids ... dying a slow death [20:25:13] git deploy start; git checkout ; git deploy sync [20:26:46] yes. syncing right now. [20:27:44] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.006 second response time [20:28:14] RECOVERY - Parsoid on wtp1021 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.017 second response time [20:28:39] RECOVERY - Parsoid on wtp1015 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.022 second response time [20:28:46] !log reverted parsoid to version 617e9e61b625f25d79dfaab08830c396537be632 (due to stuck processes) [20:28:48] RECOVERY - Parsoid on wtp1023 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.029 second response time [20:28:54] Logged the message, Master [20:33:01] springle: lots of ' Unknown database ' errors spamming the logs for various servers and dbs [20:43:09] bd808, our jenkins job for php parser tests is failing with PHP Fatal error: Interface 'Psr\Log\LoggerInterface' not found in /srv/ssd/jenkins-slave/workspace/parsoidsvc-php-parsertests/src/mediawiki/core/includes/debug/logger/Logger.php on line 46 .. [20:43:43] i see a wikitech-l thread about it now. [20:43:55] subbu: It needs to clone the mediawiki/vendor repo too. Wikidata is having similar issues [20:44:32] * bd808 is trying to fix too many things at once [20:44:55] subbu: What's the job name? I'll look at the the JJB config [20:45:04] one sec .. let me find it. [20:45:19] parsoidsvc-php-parsertests [20:46:27] ack. custom to the max [20:48:52] subbu: You need to add cloning of mediawiki/vendor after /srv/deployment/integration/slave-scripts/bin/mw-core-get.sh is run. That will fix is for now. YOu should open a bug for hashar too because that will need more fixes in the future I fear. [20:49:29] * bd808 will actually open a meta bug about these errors [20:53:14] ok .. let me find the repo .. [20:54:18] subbu: https://bugzilla.wikimedia.org/show_bug.cgi?id=72700 to track the problem [20:56:51] (03CR) 10Dzahn: [C: 04-1] "/srv/mediawiki/common/ does not exist on terbium" [software] - 10https://gerrit.wikimedia.org/r/163769 (owner: 10Reedy) [20:58:21] thanks. [20:59:10] bd808, cd $MW_INSTALL_PATH; git clone https://git.wikimedia.org/git/mediawiki/vendor.git will do it? or is there a different dir within the install? [20:59:38] (03CR) 10GWicke: "Will these paths work with the Jenkins update stuff? I don't know much about how that works; Antoine should know." [puppet] - 10https://gerrit.wikimedia.org/r/169622 (owner: 10Catrope) [21:00:04] yurik: Dear anthropoid, the time has come. Please deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141029T2100). 
[21:00:59] (03PS2) 10Reedy: Swap to /srv/mediawiki [software] - 10https://gerrit.wikimedia.org/r/163769 [21:01:27] (03PS3) 10Reedy: Swap to /srv/mediawiki [software] - 10https://gerrit.wikimedia.org/r/163769 [21:02:44] (03PS1) 10Reedy: Write ganglia temp file to /tmp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169866 [21:03:06] (03CR) 10Dzahn: [C: 032] "yep, merging because the files exist in that place. doesn't mean i understand why we have dblist in 2 places and none of them is puppet. i" [software] - 10https://gerrit.wikimedia.org/r/163769 (owner: 10Reedy) [21:06:28] (03CR) 10Dzahn: [C: 031] "if that solves the caching on "dbtree" which is slow now.. yes, please" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169866 (owner: 10Reedy) [21:11:09] subbu: The clone should do it. You want to end up with mediawiki/vendor.git cloned to $IP/vendor [21:11:56] bd808, i also see a lot of chatter in #qa .. so, if the scripts get fixed, that wll do it .. should i wait for that to happen? [21:14:43] ^d, around? [21:14:56] <^d> What's up? [21:15:18] ^d, was looking at your email re updating ext, not sure what i was doing wrong wrt git patching [21:16:53] <^d> What was missing was a change to mediawiki/core on the corresponding wmf/* branch for the submodule update. [21:18:10] ^d, i do these steps: git co wmf/1.25wmf6 && git add extension/... && git commit && git review. On tin i do git pull && git submodule update extension/... [21:18:29] subbu: If you can wait a bit hopefully we will magically fix things [21:18:32] <^d> Weird, that sounds right. [21:18:49] ^d, the extension is on master branch though [21:19:10] <^d> That shouldn't matter too much. [21:19:16] <^d> Maybe it's just wmf3 that was messed up? [21:19:18] <^d> https://phabricator.wikimedia.org/P48 [21:19:29] bd808, yes, that works. i like magic. [21:19:50] ^d, sec, about to commit a new patch, you can check [21:29:23] yurikR: heya, so, given the timing (and zerowiki being in phase0), is there a reason we shouldn't just have you ride the train for your code updates? [21:30:00] pushing patches out to an hours old branch seems... weird in this case [21:31:01] * greg-g goes into the last 1:1 of the day.... [21:33:17] greg-g, let me figure out this release first. I'm not really against the train ride, might be worth a try [21:39:15] (03CR) 10Dzahn: [C: 032] firewall: remove, unused [puppet] - 10https://gerrit.wikimedia.org/r/169571 (owner: 10Matanya) [21:41:40] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%): [21:50:13] (03CR) 10Dzahn: "i would just do it at this point" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [21:51:27] (03CR) 10Dzahn: "status unclear. +1 or not?" [puppet] - 10https://gerrit.wikimedia.org/r/166406 (owner: 10Christopher Johnson (WMDE)) [21:51:31] (03CR) 10Reedy: "Just my comment to fix up, and to a static array as per Timo I guess" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [21:52:16] (03CR) 10Dzahn: "do it" [puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [21:53:14] (03CR) 10Dzahn: "removing self" [puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [21:53:45] (03CR) 10Dzahn: "should probably have comment from Coren now" [puppet] - 10https://gerrit.wikimedia.org/r/111387 (owner: 10Jeremyb) [21:54:25] (03CR) 10Mark Bergsma: [C: 04-1] "I don't like the mw* glob used on this, we need something more sophisticated than that." 
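To make the submodule exchange above concrete, the core-side bump on the deployment branch looks roughly like this; the extension name and commit are placeholders, not taken from the discussion:

    # in a mediawiki/core checkout, on the deployment branch
    git checkout wmf/1.25wmf6
    (cd extensions/SomeExtension && git fetch && git checkout <wanted-commit>)
    git add extensions/SomeExtension      # stages the new submodule pointer
    git commit -m "Update SomeExtension to <wanted-commit> for wmf/1.25wmf6"
    git review
    # on tin afterwards: git pull && git submodule update extensions/SomeExtension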
[puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [21:54:27] (03CR) 10Dzahn: "nothing will ever happen without a corresponding ticket" [puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [21:59:40] (03PS1) 10Ori.livneh: Add tmpreaper module w/ tmpreaper::reap resource [puppet] - 10https://gerrit.wikimedia.org/r/169935 [21:59:46] ^ paravoid, fyi [22:00:50] (03PS3) 10Reedy: Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 [22:01:00] (03CR) 10jenkins-bot: [V: 04-1] Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [22:01:02] (03CR) 10Reedy: Allow faux-renaming/database remapping (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [22:01:35] (03PS4) 10Reedy: Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 [22:01:43] (03CR) 10jenkins-bot: [V: 04-1] Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [22:02:31] (03PS5) 10Reedy: Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 [22:04:43] !log git-deploy: Deploying integration/slave-scripts a6a23ac1ec [22:04:50] Logged the message, Master [22:05:00] bd808: ^ [22:05:10] yurikR: everything ok? I just saw https://gerrit.wikimedia.org/r/#/c/169928/ [22:05:42] greg-g, i just realized that you were absolutely right - there is no point to release because it just got released :) [22:05:50] :) :) [22:05:57] Krinkle: Running https://integration.wikimedia.org/ci/job/mwext-Wikibase-client-tests/7216/console [22:06:02] I don't get told that very often. I'm going to quote you on that. [22:06:05] :P [22:06:07] and it failed :( [22:06:41] greg-g, bask in the glory, it won't last ;) Unless i was +2ing some additional stuff into master at the last moment, it should work otk [22:06:41] ok [22:06:48] oh. new script not there yet. [22:07:09] yurikR: /me nods [22:07:18] bd808: That one runs in wikidata-jenkins2 in labs, won't be updated until puppet runs [22:07:25] aude: How do you update scripts on wikidata-jenkins2? [22:07:30] yurikR: you can of course do SWAT deploys as needed, but, let me know what you think about next week (if you want this window still) [22:07:33] uhhh [22:07:35] aude: bd808: don't [22:07:45] Well, maybe with root. [22:08:00] let me verify in prod first since those are already deployed [22:08:10] https://integration.wikimedia.org/ci/job/parsoidsvc-php-parsertests/2773/console [22:08:19] https://integration.wikimedia.org/ci/job/parsoidsvc-php-parsertests/2776/console [22:08:21] passes :) [22:08:39] bd808: labs slaves don't get synced from prod git-deploy. [22:08:41] i would have thought puppet runs regularly on the instances [22:08:45] 30 min [22:08:51] it was merged 5min ago [22:09:03] never had to do anything manually, although i didn't set them up [22:09:12] I forgot that they use a puppet hack instead of trebuchet [22:09:14] (03CR) 10Faidon Liambotis: [C: 04-1] "Why do this and have the directories be cleaned during puppet runs, instead of relying on the package's support for that (cron.daily, /etc" [puppet] - 10https://gerrit.wikimedia.org/r/169935 (owner: 10Ori.livneh) [22:09:48] Hm.. wikidata-jenkins2 are not part of the integration project [22:09:56] so in that case I don't know. could be anything. [22:10:28] wikidata custom stuff..
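For context on r/169935 and the objection to it: tmpreaper deletes files that have not been touched within a given age. A rough sketch of the kind of explicit invocation a puppet-managed tmpreaper::reap resource would wrap, as opposed to relying on the package's own daily cron job; the directory and age here are illustrative, not taken from the patch:

    tmpreaper --test --mtime 7d /var/tmp               # dry run: list files not modified in the last 7 days
    tmpreaper --mtime --protect '*.lock' 7d /var/tmp   # actually delete them, sparing lock files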
[22:10:34] But \o/ for copy-n-paste + educated guesses :) [22:10:44] Krinkle: i think we use the same puppet classes etc. [22:10:47] OK [22:10:50] I can't access them though [22:10:57] we can add you [22:11:11] #331376 {main} [22:11:12] wow [22:11:17] Can you grant me ssh into those instances? They are linked to jenkins prod instance for slave launch. I can access it via that but would rather do it the right way. [22:11:19] largest stacktrace, ever [22:11:27] !log Re-running setZoneAccess.php for swift [22:11:33] Logged the message, Master [22:11:34] hoo: ??? [22:11:38] rerunning puppet manually on other slaves now [22:11:43] aude: renumber thing [22:11:43] (03PS6) 10Reedy: Allow faux-renaming/database remapping [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 [22:11:45] (03PS1) 10Reedy: Rename chapcomwiki to affcomwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169939 [22:11:46] it recursed that often [22:11:49] aaaah [22:11:56] (03CR) 10jenkins-bot: [V: 04-1] Rename chapcomwiki to affcomwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169939 (owner: 10Reedy) [22:12:03] subbu: Your test is green again. :) [22:12:49] thanks! :) [22:12:51] Krinkle: granted [22:13:25] they are marked as puppet stale [22:13:33] probably means we have to update manually :( [22:13:55] puppet stale usually means a local puppetmaster [22:14:23] aude: bd808: puppet natural run just started a few seconds ago [22:14:25] I'll let it finish :) [22:14:28] Should be good after that [22:14:28] (03PS2) 10Reedy: Rename chapcomwiki to affcomwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169939 [22:14:31] ok [22:14:44] aude: looks good so far regarding Q30 [22:14:50] hoo: yay [22:15:07] Will look again tomorrow and close the bug if no new things pop up [22:15:30] But still... a 300k+ line stack trace is massive :D [22:15:57] still a lot of GC cache entry warnings [22:15:59] https://tools.wmflabs.org/nagf/?project=wikidata-build [22:16:00] not good [22:16:19] aude: What exactly do you mean? [22:16:49] hitting gc [22:16:49] aude: bd808: https://integration.wikimedia.org/ci/job/mwext-Wikibase-client-tests/7217/console [22:17:16] nothing bad should happen with gc, but still shouldn't be hitting it so often [22:17:26] Krinkle: sweet. [22:17:40] Krinkle: looking good [22:23:42] (03CR) 10Reedy: "So this is now just the implementation. Taken the renaming of chapcomwiki to affcomwiki into a dependant patch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/134962 (owner: 10Reedy) [22:25:12] (03PS1) 10Reedy: chapcomwiki -> affcomwiki [puppet] - 10https://gerrit.wikimedia.org/r/169944 (https://bugzilla.wikimedia.org/39482) [22:25:19] (03PS3) 10Reedy: Rename chapcomwiki to affcomwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/169939 (https://bugzilla.wikimedia.org/39482) [22:48:37] (03PS1) 10Dzahn: dynamicproxy - disabled SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/169949 [23:00:05] RoanKattouw, ^d, marktraceur, MaxSem, RoanKattouw: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141029T2300). Please do the needful. [23:00:26] I'll do it [23:00:41] !log restarting nginx on cp1044 [23:00:47] Logged the message, Master [23:05:29] (03CR) 10John F. Lewis: [C: 031] "Looks good but, topic change perhaps?"
[puppet] - 10https://gerrit.wikimedia.org/r/169949 (owner: 10Dzahn) [23:06:05] (03PS2) 10Dzahn: dynamicproxy - disable SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/169949 [23:07:16] (03CR) 10JanZerebecki: "To avoid any doubt: My +1 still stands." [puppet] - 10https://gerrit.wikimedia.org/r/166406 (owner: 10Christopher Johnson (WMDE)) [23:13:05] !log catrope Synchronized php-1.25wmf6/extensions/VisualEditor: SWAT (duration: 00m 04s) [23:13:11] Logged the message, Master [23:13:13] MaxSem: All yours [23:13:49] (03CR) 10Ori.livneh: "@paravoid: /etc/cron.daily/tmpreaper isn't great: it forces --ctime, --mtime-dir, and --symlinks on you, and it doesn't let you specify di" [puppet] - 10https://gerrit.wikimedia.org/r/169935 (owner: 10Ori.livneh) [23:35:24] !log maxsem Synchronized php-1.25wmf5/extensions/MobileFrontend/: (no message) (duration: 00m 04s) [23:35:30] Logged the message, Master [23:38:54] !log maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 07s) [23:38:59] Logged the message, Master [23:39:41] * greg-g glares at "(no message)"
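On the dynamicproxy SSLv3 change (r/169949), a quick handshake test of the kind one might run against the proxy once the patch is live; the hostname below is a placeholder, not a real endpoint from the log:

    openssl s_client -connect proxy.example.org:443 -ssl3 < /dev/null   # should fail to negotiate once SSLv3 is disabled
    openssl s_client -connect proxy.example.org:443 -tls1 < /dev/null   # should still succeed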
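The "(no message)" entries greg-g glares at come from syncing without a log message; assuming the syncs were done with scap's sync-dir helper, the difference is only the optional message argument, roughly:

    sync-dir php-1.25wmf6/extensions/MobileFrontend 'Update MobileFrontend (SWAT)'   # logged with the given message
    sync-dir php-1.25wmf6/extensions/MobileFrontend                                  # logged as "(no message)"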