[00:01:26] (CR) Tim Landscheidt: [C: 1] "Tested on tools-exec-05." [operations/puppet] - https://gerrit.wikimedia.org/r/100147 (owner: Dapete)
[02:06:31] !log LocalisationUpdate completed (1.23wmf7) at Mon Dec 30 02:06:31 UTC 2013
[02:06:59] Logged the message, Master
[02:11:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[02:21:00] !log LocalisationUpdate completed (1.23wmf8) at Mon Dec 30 02:21:00 UTC 2013
[02:21:18] Logged the message, Master
[02:32:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 30 02:32:44 UTC 2013
[02:32:59] Logged the message, Master
[04:49:41] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection timed out
[04:49:41] PROBLEM - Apache HTTP on mw1157 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:50:11] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out
[04:50:13] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out
[04:50:14] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out
[04:50:31] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out
[04:50:33] greg-g: look up btw
[04:50:41] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection timed out
[04:50:41] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out
[04:50:41] PROBLEM - Apache HTTP on mw1155 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:51:32] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.066 second response time
[04:51:32] okay
[04:51:33] two things
[04:51:37] #1 who broke rendering
[04:51:53] #2 who pushed production changes on a holiday weekend sunday
[04:52:01] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[04:52:07] i imagine i can possibly find out #2
[04:52:11] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.349 second response time
[04:52:21] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.066 second response time
[04:52:28] yeah, #2 was more of a "this is an angry rhetorical question"
[04:52:31] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.679 second response time
[04:52:38] right :)
[04:53:01] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 70390 bytes in 0.199 second response time
[04:53:03] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time
[04:53:10] i was planning to put in some review reqs first thing tomorrow morning :)
[04:53:20] (and bump some from last week)
[04:53:31] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time
[04:53:43] :)
[04:53:45] cool
[04:54:31] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.046 second response time
[04:55:31] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time
[04:56:34] whoa, https://gdash.wikimedia.org/dashboards/reqerror/ is dramatic
[04:56:57] looks recovered though
[04:58:00] yeah
[04:58:03] so .... yay ?
[05:00:08] not yet recovered per the bot: 30 04:52:01 <+icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[05:01:29] yeah, but gdash looks happy -- i'll stick around for a little while :)
[05:01:46] erm, i can't figure out how to access icinga web ui
[05:01:57] i even tried disabling HTTPS everywhere
[05:02:35] oh, there's a security flaw so we enabled authentication on it
[05:02:43] oh, yay
[05:02:46] haven't gotten a backported package either made or created yet :-/
[05:02:52] ganglia redux
[05:03:04] yep
[05:03:08] :(
[05:05:18] heh, /me spies a LeslieCarr @ RT 5064
[05:05:38] haha
[05:05:40] oh yes
[05:05:49] ironic ;)
[05:07:09] i'm going to stick around for a few, double check that the errors go down
[05:07:33] what the
[05:07:46] those are rendering boxes?
[05:08:55] greg-g's late to the party. i gave an early invitation!
[05:09:04] yeah, well, sunday
[05:09:25] site.pp says: # mw1153-1160 are imagescalers (precise)
[05:09:31] * greg-g nods
[05:09:51] greg-g: right but you were active in #wikimedia so i thought i'd ping you here :)
[05:10:02] well, "active"
[05:10:25] but yeah, thanks :)
[05:10:27] ok, i guess it was mostly Gloria
[05:12:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[05:13:57] ok, back to sleep
[05:15:19] errors look back to normal
[05:21:12] LeslieCarr: I don't see any code updates in SAL (aaron was doing scap testing with nops, but he did make a jobqueue change there recently...)
[05:21:52] which was just https://gerrit.wikimedia.org/r/#/c/104480/
[05:56:01] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[06:07:41] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:11:31] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[07:30:50] (PS2) Ori.livneh: Added MW version argument to scap [operations/puppet] - https://gerrit.wikimedia.org/r/104189 (owner: Aaron Schulz)
[07:31:30] (CR) Ori.livneh: [C: 2] Added MW version argument to scap [operations/puppet] - https://gerrit.wikimedia.org/r/104189 (owner: Aaron Schulz)
[08:13:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[08:32:52] I will be afk for a couple of hours (friend here from out of town), reachable by cell if something horrible happens
[08:33:00] (PS1) Ori.livneh: Update public domain mark, using text from CC0 [operations/software/mwprof] - https://gerrit.wikimedia.org/r/104491
[08:33:30] * ori waves
[08:33:44] (CR) Ori.livneh: [C: 2 V: 2] Update public domain mark, using text from CC0 [operations/software/mwprof] - https://gerrit.wikimedia.org/r/104491 (owner: Ori.livneh)
[10:48:08] (CR) Alexandros Kosiaris: [C: -2] "LVS servers are known to exhibit big problems when NTP is running on them. The fact that they are not running NTP is by design and not by " [operations/puppet] - https://gerrit.wikimedia.org/r/20681 (owner: Faidon Liambotis)
[11:09:00] (CR) Alexandros Kosiaris: [C: -1] "Nitpick. On a side note, everything under the Exec line needs a rewrite. It is all about manipulating a couple of files so IMHO it should b" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104339 (owner: Matanya)
[11:10:31] akosiaris: have time for a few questions not related to this patch please?
[11:10:49] yeah
[11:11:50] (CR) Faidon Liambotis: "Yes, past experience with such behavior is the reason this changeset has been sitting for over a year, as is documented in a few places. W" [operations/puppet] - https://gerrit.wikimedia.org/r/20681 (owner: Faidon Liambotis)
[11:12:20] manifests/misc/irc.pp has only one little class used in some places. should it be a module?
[11:14:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[11:14:38] akosiaris, also how can i download a patch done by someone else on top of a patch by me? review -d didn't bring those changes
[11:15:54] last question: admins.pp really hurts my eyes. what should be the correct approach to handle this?
[11:17:28] matanya: on the first question, the correct way would be the trifecta in a single class module and then a role class that includes the monitor_service and that class and that role class should be included in site.pp and wikibugs.pp. It is indeed too small though so it might not make the best of sense. I'd be for that approach though just for the consistency benefits
[11:18:22] on the second question, you just get the patch from gerrit and apply it manually with patch. It can not be done with git review AFAIK
[11:20:04] finally on the last question, a define would certainly help to remove some code duplication
[11:21:03] those inheritances then should be removed as well (puppet inheritance sucks)
[11:21:22] thanks a lot. i'll fix the braces, and then will go on to the rest
[11:23:55] (PS2) Matanya: labs_vmbuilder: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/104339
[11:34:01] (CR) Alexandros Kosiaris: [C: -1] "Comments inline." (7 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104348 (owner: Matanya)
[11:36:14] matanya: .... as far as https://gerrit.wikimedia.org/r/104339, I was not suggesting that approach. I was suggesting
[11:36:14] require => [ Package['python-vm-builder'],
[11:36:14] File["{blahablah"],
[11:36:14] ]
[11:36:14] The last line being the important one
[11:36:44] the rest was ok. All I wanted was the brace indented some spaces :-)
[11:45:45] akosiaris: monday is monday :)
[11:46:21] http://en.wikipedia.org/wiki/I_Don't_Like_Mondays ?
[11:49:05] (PS8) Matanya: svn: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/100760
[11:51:15] (PS1) Alexandros Kosiaris: Remove etherpad-old [operations/dns] - https://gerrit.wikimedia.org/r/104498
[11:53:48] akosiaris: "sometimes it is better to stay in bed on monday in order not to fix monday's work the rest of the week"
[11:54:08] hahahaha. so true
[11:55:17] should be in our bugzilla quips
[11:55:43] (PS3) Matanya: labs_vmbuilder: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/104339
[15:39:01] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[15:39:51] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.41 ms
[15:54:41] PROBLEM - NTP on mw31 is CRITICAL: NTP CRITICAL: Offset unknown
[15:58:41] RECOVERY - NTP on mw31 is OK: NTP OK: Offset 0.001707911491 secs
[16:08:27] thanks akosiaris
[16:08:35] :-)
[16:10:49] akosiaris: how about a simple planet update? :) https://gerrit.wikimedia.org/r/#/c/103047/
[16:11:58] (CR) Alexandros Kosiaris: [C: 2] [Planet] Update wikimedia.fi URL [operations/puppet] - https://gerrit.wikimedia.org/r/103047 (owner: Nemo bis)
[16:12:16] (CR) Alexandros Kosiaris: [V: 2] [Planet] Update wikimedia.fi URL [operations/puppet] - https://gerrit.wikimedia.org/r/103047 (owner: Nemo bis)
[16:12:27] thanks :D
[16:12:35] you are welcome
[16:28:37] hiii akosiaris, got any experience with icinga?
[16:28:54] ottomata: yeah
[16:28:54] i'm trying to figure out why my new nrpe process alerts aren't showing up
[16:29:09] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=analytics1021
[16:29:20] I added
[16:29:27] nrpe::monitor_service { 'kafka':
[16:29:27] nrpe_command => '/usr/lib/nagios/plugins/check_procs -c 1:1 -C java -a "kafka.Kafka /etc/kafka/server.properties"',
[16:29:28] a while ago
[16:29:36] and i see the checks on neon properly
[16:30:02] Error: The description string for service 'Hadoop Namenode (Stand By)' on host 'analytics1009' contains one or more illegal characters.
[16:30:02] Error: The description string for service 'Hadoop Namenode (Primary)' on host 'analytics1010' contains one or more illegal characters.
[16:30:07] that is your problem
[16:30:15] psshhh
[16:30:20] it fails to reload its configuration due to those two errors
[16:30:24] yay!
[16:30:27] you are such a good log checker
[16:30:34] lol
[16:30:41] my new role/title
[16:30:44] :-)
[16:31:01] wait, what log is that in?
[16:31:34] no log. Just icinga -v /etc/icinga/icinga.cfg or even simpler
[16:31:37] service icinga reload
[16:31:39] ohhhhhhh
[16:31:41] oh
[16:31:47] i hadn't done reload yet because I was afraid of what might happen!
[16:31:50] (CR) Chad: [C: 1] "lgtm, will merge at the start of the window." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104134 (owner: Manybubbles)
[16:32:11] yeah not restart or stop/start sequence
[16:32:14] hm ok
[16:32:17] or it will not come back again
[16:32:20] i guess it doesn't like spaces?
[16:32:32] i think it's the ( character
[16:32:35] or maybe the parens
[16:32:36] hgm
[16:32:40] but the kafka desc doesn't have parens
[16:33:49] (PS1) Ottomata: Removing parens in nrpe descriptions [operations/puppet] - https://gerrit.wikimedia.org/r/104515
[16:34:06] (CR) Ottomata: [C: 2 V: 2] Removing parens in nrpe descriptions [operations/puppet] - https://gerrit.wikimedia.org/r/104515 (owner: Ottomata)
[16:43:23] akosiaris: if I am changing someone's ssh key
[16:43:36] and the email/desc for the key hasn't changed
[16:43:45] is it better to ensure => absent on the first one and add a new one
[16:43:48] or just change the actual key?
[16:44:04] ensure => absent
[16:44:07] k
[16:44:38] i'll have to use a new title then, right?
[16:44:38] ssh_authorized_key { 'nuria@wikimedia.org':
[16:44:41] maybe
[16:44:43] yes
[16:44:53] does it have to match what they have in their own key file?
[16:44:58] or is it arbitrary?
[16:45:02] arbitrary
[16:45:04] ok
[16:46:41] (PS1) Ottomata: Changing Nuria's ssh key per her request [operations/puppet] - https://gerrit.wikimedia.org/r/104521
[16:46:42] (PS1) Ottomata: Changing Dan's production ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/104522
[16:47:42] (CR) Ottomata: [C: 2 V: 2] Changing Nuria's ssh key per her request [operations/puppet] - https://gerrit.wikimedia.org/r/104521 (owner: Ottomata)
[16:48:14] (CR) Ottomata: [C: 2 V: 2] Changing Dan's production ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/104522 (owner: Ottomata)
[16:50:14] (PS1) Ottomata: Removing email from Dan's key [operations/puppet] - https://gerrit.wikimedia.org/r/104524
[16:50:44] (CR) Ottomata: [C: 2 V: 2] Removing email from Dan's key [operations/puppet] - https://gerrit.wikimedia.org/r/104524 (owner: Ottomata)
[17:01:40] ^d: I'm going to start deploying now. First the elasticsearch side config updates.
[17:01:50] <^d> mmk.
[17:01:55] <^d> I'm gonna start merging in gerrit
[17:01:59] <^d> so jenkins can get going
[17:03:21] thanks!
[17:04:23] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[17:04:32] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[17:04:42] RECOVERY - Host virt1005 is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms
[17:05:22] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[17:06:39] ^d: all done with pre updates. moving on to deploy to wmf8 which looks like it merged
[17:07:06] <^d> Yep
[17:07:12] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 17:07:05 UTC 2013
[17:07:13] <^d> Well, we can just scap all at once, right?
[17:07:22] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 17:07:18 UTC 2013
[17:07:27] <^d> 7's in too
[17:07:32] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:07:05 PM UTC
[17:08:22] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:07:18 PM UTC
[17:08:30] <^d> manybubbles: Are you wanting to sync config with code, or scap & then sync-file initsettings?
[17:08:53] I wanted to do wmf8 first if possible
[17:08:58] then wmf7, then config
[17:09:20] that's two scaps and a sync-file
[17:09:53] I'm actually just now ready for the first scap if that sounds ok
[17:11:11] I could do the internationalization updates then the sync-dir
[17:11:24] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Alternative_to_scap
[17:11:31] if it is likely to work
[17:12:54] greg-g: we've started, btw
[17:12:57] what does scap stand for? sync config mumble puppet?
[17:13:02] cool
[17:13:19] jgage: not sure
[17:13:21] <^d> jgage: sync-common-all-php, if memory serves.
[17:13:28] ah thanks
[17:13:41] ^d: do you have an opinion on which I should do?
[17:14:06] <^d> I'd just scap once.
[17:14:10] <^d> 7 & 8.
[17:14:18] <^d> Then sync config for new wikis with sync-file.
[17:14:23] ^d: k.
[17:14:25] I'll do that
[17:14:40] yaaa, thanks akosiaris, icinga is doing what I want now
[17:15:06] <^d> jgage: https://wikitech.wikimedia.org/wiki/Scap
[17:16:17] thanks, was just editing to include the definition but perhaps it's not needed
[17:16:24] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[17:16:40] ^d: manybubbles@tin:/a/common$ scap 'Update CirrusSearch and Elastica to master'
[17:16:42] Invalid MediaWiki version "Update CirrusSearch and Elastica to master"
[17:16:47] jgage: no harm
[17:17:05] jgage: and then you know if ^d was wrong, someone will fix it (maybe) ;)
[17:17:08] <^d> scap syntax changed...
[17:17:10] confusingly, higher in that doc it says that sync-common-all just runs scap
[17:17:24] jgage: scap also just calls scap-1
[17:17:27] :)
[17:17:30] hah great
[17:18:47] er, sync-common just calls scap-1
[17:18:53] https://git.wikimedia.org/blob/operations%2Fpuppet.git/df881e08c2f3a7365aa400462e0515206136e75a/files%2Fscap%2Fsync-common
[17:18:58] <^d> AaronSchulz: What's the syntax for scap now?
[17:19:02] mmm rtfs
[17:19:54] ^d: oh, that was merged, it works the same with no arg
[17:20:02] ...unless there is a bug :)
[17:20:09] <^d> Must be
[17:20:15] <^d> How do you sync all versions with a description?
[17:20:32] <^d> (See ~15 lines up from manybubbles)
[17:20:34] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 17:20:33 UTC 2013
[17:20:43] Copying to tin from tin.eqiad.wmnet...Unexpected remote arg: tin.eqiad.wmnet::common/
[17:20:56] did $(scap active 'log log log')
[17:21:20] <^d> ugh.
[17:21:22] it is running though
[17:21:24] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:20:33 PM UTC
[17:21:33] should I shoot it before it does anything?
[17:21:55] <^d> Yeah let's stop for a sec.
[17:22:04] stopped during
[17:22:11] Updating LocalisationCache for 1.23wmf7... ^CUpdate of MediaWiki localisation messages failed
[17:23:19] ....
[17:23:21] <^d> I'm on tin and going to try this.
[17:23:51] * ^d hmms
[17:24:17] I'm not seeing more fatals than normal, so I doubt it caused any real damage
[17:24:31] greg-g: yay first deploy in the morning!
[17:24:39] happy monday
[17:24:40] hmm, 'scap "desc"' won't work though
[17:25:00] Aaron|home: $(scap active "desc") does not a big deal
[17:25:06] once the docs are updated
[17:25:14] !log demon started scap: active Cirrus and Elastica to master
[17:25:15] <^d> AaronSchulz: if rsync "${RSYNC_ARGS[@]}" "$SERVER"::common/ "${MW_COMMON}" is wrong I think.
[17:25:32] Logged the message, Master
[17:25:38] <^d> !log aborted scap
[17:25:46] that line is old, all I did was change MW_RSYNC_ARGS to RSYNC_ARGS
[17:25:53] ^d: are you saying RSYNC_ARGS is fishy?
[17:25:56] Logged the message, Master
[17:26:07] <^d> Yep.
[17:26:24] <^d> I'm getting mw1097: Unexpected remote arg: mw1070.eqiad.wmnet::common/ consistently.
[17:26:29] <^d> Missing a path somewhere? :)
[17:26:58] greg-g: does what is being deployed include load balancer class rename?
[17:27:05] in core
[17:27:25] <^d> The deploy right now?
[17:27:27] * aude thinks that is next week or test2 on thursday
[17:27:40] ^d: core deploy
[17:27:47] <^d> ignore me :)
[17:27:51] k
[17:28:40] i think the change is backwards compatible fine but is worth paying some extra attention to make sure no problems
[17:28:55] ^d: have you tried just "scap"?
[17:29:08] !log demon started scap
[17:29:16] <^d> !log aborted scap
[17:29:20] <^d> Same result.
[17:29:25] Logged the message, Master
[17:29:39] Logged the message, Master
[17:29:56] ^d: all that does is RSYNC_ARGS=MW_RSYNC_ARGS ... how is that even different?
[17:30:22] <^d> I don't know. I'm just suspecting that line because of the output I'm getting from tin.
[17:30:49] yeah I get that with sync-common
[17:33:15] <^d> Hmm.
[17:33:28] * Aaron|home is confused
[17:34:11] ^d: https://gerrit.wikimedia.org/r/#/c/104189/2/files/scap/scap-2
[17:35:01] <^d> Yeah, I know that what you changed isn't suspect.
[17:35:16] I see
[17:35:18] no $
[17:35:54] not enough $
[17:36:00] insert more $
[17:36:18] sounds like oracle, actually
[17:36:25] <^d> Don't we have a bug for rewriting this in something other than bash? :p
[17:36:38] ^d: perl
[17:36:54] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 17:36:51 UTC 2013
[17:37:14] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:36:51 PM UTC
[17:37:25] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 17:37:21 UTC 2013
[17:37:25] that's not enough though
[17:37:34] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:37:21 PM UTC
[17:37:48] RSYNC_ARGS=${MW_RSYNC_ARGS[@]}
[17:37:53] ok, there we go
[17:38:04] ^d: bash is by far the worst language I've seen
[17:38:18] <^d> I'd vote for redoing it in python.
[17:38:20] <^d> :)
[17:38:28] Aaron|home: lolcats is at least as bad
[17:38:30] manybubbles: Ops prefer things to be in ruby
[17:38:36] <^d> node.js!
[17:38:40] java
[17:38:46] oooh, scala
[17:38:54] ok should I scap now?
[17:38:56] <^d> coldfusion?
[17:39:02] VB .NET
[17:39:08] I once went to a coldfusion conference
[17:39:16] <^d> Why?
[17:39:16] VB.NET is not as shitty as you'd imagine it to be
[17:39:27] Is someone messing with virt100x today, or are they just spontaneously unstable?
[17:39:28] <^d> Aaron|home: Patch incoming?
[17:39:43] manybubbles: Yes and no. I've written enough of it though...
[17:39:44] it was only 45% cold fusion. it advertised itself (in some circles) as 10% cold fusion.
[17:39:53] Reedy: that was my first job!
[17:39:56] Bad venn diagrams
[17:40:03] Depends on the overlap
[17:40:12] so long as it isn't vb for applications
[17:40:16] yeah, ugh, fucking git error
[17:40:19] that actually might be the worst language ever
[17:40:37] because it has to run inside excel or something. fun to watch though.
[17:41:01] You can write standalone apps in VB .NET and such...
[17:41:09] It's vbscript in excel?
[17:41:23] Reedy: sure. vb.net is a first class peer with C#
[17:41:36] it is vbscript in excel. or rather, it is kinda vbscript
[17:41:44] <^d> manybubbles: The time tracking application at $DAY_JOBS[0] was one of those "MS Access" macro-applications, originally on top of an Access database. They later upgraded it to use SQL Server.
[17:41:54] but you move the cursor around and make changes so they have to keep their hands off the keyboard while it works
[17:42:11] (PS1) Aaron Schulz: Fixed RSYNC_ARGS assignment in scap-2 [operations/puppet] - https://gerrit.wikimedia.org/r/104529
[17:42:24] I like C#
[17:42:27] ^d: honestly, I don't hate SQL Server that much. I mean, they have balls naming it SQL Server and all.
[17:42:28] VB tends to be too wordy though
[17:42:35] but beyond that it is reasonably fully featured
[17:42:36] (CR) Chad: [C: 1] "scap is broken, please merge." [operations/puppet] - https://gerrit.wikimedia.org/r/104529 (owner: Aaron Schulz)
[17:42:39] C# = Microsoft Java
[17:42:47] * Aaron|home ducks
[17:42:51] C# is pretty much better than java
[17:42:58] as a language
[17:43:09] as an ecosystem is always the java argument
[17:43:13] but I hate them all
[17:43:13] Aaron|home: Because it is their rival to java
[17:43:13] <^d> manybubbles: I don't hate it either. It was actually an upgrade from an access db :)
[17:43:22] Mono!
[17:43:31] ;)
[17:44:02] * Reedy nudges greg-g
[17:44:09] * Reedy whispers it's not a programming language
[17:44:16] (PS1) Jgreen: flip DNS for fundraisingdb-read.wmnet from db78 to db1008 [operations/dns] - https://gerrit.wikimedia.org/r/104530
[17:44:59] <^d> manybubbles: Maybe we should rewrite Lucene & Elasticsearch in C#.
[17:45:05] ori: https://gerrit.wikimedia.org/r/104529 on fix
[17:45:06] (CR) Manybubbles: [C: 1] Fixed RSYNC_ARGS assignment in scap-2 [operations/puppet] - https://gerrit.wikimedia.org/r/104529 (owner: Aaron Schulz)
[17:45:08] <^d> Because why not?
[17:45:33] what's funny is that RSYNC_ARGS=$MW_RSYNC_ARGS result in the value "-a"
[17:45:52] Aaron|home: shakes fist at bash
[17:46:40] <^d> Aaron|home: ori's idle for 5.5h. Maybe we need to ping someone else :)
[17:46:54] great timing
[17:47:05] ottomata: what about you?
[17:47:08] I think bd is off today
[17:47:11] you merge things
[17:47:13] apergos?
[17:47:16] do y'all need a merge?
[17:47:21] yes
[17:47:21] <^d> Yessss
[17:47:24] Not that I know anything about scap, but… patch looks legit
[17:47:25] eh?
[17:47:28] <^d> https://gerrit.wikimedia.org/r/104529
[17:47:53] (CR) Andrew Bogott: [C: 2] Fixed RSYNC_ARGS assignment in scap-2 [operations/puppet] - https://gerrit.wikimedia.org/r/104529 (owner: Aaron Schulz)
[17:48:00] ah you took i
[17:48:06] t
[17:48:21] yep
[17:48:31] ah
[17:48:46] what args were not getting passed?
[17:49:30] um… I was pretty much just taking ^d's word for this
[17:49:37] asking ^d
[17:49:39] that and being well aware that bash never works the first time
[17:49:39] <^d> Take Aaron|home's word for it.
[17:49:45] oops
[17:49:54] <^d> This is the second time ;-)
[17:50:14] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 17:50:10 UTC 2013
[17:50:17] oh I guarantee there is a big difference (between before and after), I just was wondering what it was being called with
[17:50:24] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 05:50:10 PM UTC
[17:50:30] apergos: at least -a
[17:50:34] I'm not sure beyond that
[17:50:42] I'm actually not brain fried at this hour so I can even remember bash syntax, how nice
[17:50:56] <^d> andrewbogott: We'll have to force a puppet run on tin.
[17:50:57] usually by the time the evening rolls around I would have to test by hand or give up
[17:51:05] apergos, I'm going to lunch but will look at those puppet warnings when I return. I suspect that someone (mhoover?) is messing with the servers and happens on to be on irc :(
[17:51:06] Remember? What's that?
[17:51:12] ^d ok
[17:51:20] thanks
[17:51:28] I know the virts were in process
[17:51:41] so I was not too worried about it yet
[17:51:43] yeah, 1007 has been complaining forever because it's broken in some interesting way
[17:51:54] Yeah, safe for you to ignore them for the next couple of weeks
[17:52:29] at the point where they have accounts on them it would be nice if puppet would run on them
[17:52:59] until then I'm less concerned
[17:53:39] 'accounts' like, shell accounts? Because they do have shell accounts for mike. but no one else.
[17:53:50] ^d: I imagine I should wait twenty minutes or so for scap to be updated
[17:54:08] <^d> No, should be up as soon as andrewbogott says puppet is done :)
[17:54:09] (CR) Jgreen: [C: 2 V: 1] flip DNS for fundraisingdb-read.wmnet from db78 to db1008 [operations/dns] - https://gerrit.wikimedia.org/r/104530 (owner: Jgreen)
[17:54:15] tin is doing a puppet run right now, will be done in a minute or two
[17:54:29] andrewbogott: thanks!
[17:54:46] ok, done. And, -> lunch
[17:54:51] thanks
[17:54:53] starting again
[17:55:02] !log manybubbles started scap: active Update CirrusSearch and Elastica to master
[17:55:06] <^d> Aaron|home: Can you send an e-mail to engineering about the scap syntax change btw?
[17:55:20] Logged the message, Master
[17:55:20] mw1049: rsync: -a --delete-delay --delay-updates --compress --delete --exclude=**/.svn/lock --exclude=**/.git/objects --exclude=**/.git/**/objects --exclude=**/cache/l10n/*.cdb --no-perms: unknown option
[17:55:22] mw1030: rsync error: syntax or usage error (code 1) at main.c(1453) [client=3.0.9]
[17:55:43] mw1072: rsync: -a --delete-delay --delay-updates --compress --delete --exclude=**/.svn/lock --exclude=**/.git/objects --exclude=**/.git/**/objects --exclude=**/cache/l10n/*.cdb --no-perms: unknown option
[17:55:44] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:56:11] !log aborted scap
[17:56:20] I aborted it. it was blowing up
[17:56:26] Aaron|home and ^d: sad
[17:56:29] Logged the message, Master
[17:56:34] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[17:58:12] what the ef
[17:58:54] any way to just revert to a known "good" (for various definitions of good) version of scap?
[18:01:07] !log powering down search1005 to swap /dev/sdb
[18:01:23] Logged the message, Master
[18:01:30] greg-g: at least I found this problem this morning rather than reedy in an hour
[18:01:57] sync-common still giving trouble, even though the only difference is RSYNC_ARGS=${MW_RSYNC_ARGS[@]} and RSYNC_ARGS being used in place of MW_RSYNC_ARGS
[18:02:06] since I have no $1
[18:02:23] it's amazing that that could cause it to not work
[18:03:14] PROBLEM - Host search1005 is DOWN: PING CRITICAL - Packet loss = 100%
[18:06:39] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 18:06:36 UTC 2013
[18:06:49] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 18:06:41 UTC 2013
[18:07:19] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:06:41 PM UTC
[18:07:29] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:06:36 PM UTC
[18:09:06] <^d> Ok, so what are we gonna do?
[18:09:33] i'm here now
[18:09:38] I see
[18:09:38] submit a patch and i'll merge
[18:09:43] on it
[18:10:19] RECOVERY - Host search1005 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[18:10:59] (PS1) Aaron Schulz: Fix array copy in scap-2 for real this time [operations/puppet] - https://gerrit.wikimedia.org/r/104533
[18:11:03] ori: ^
[18:11:29] rsync doesn't whine in test.sh...so that better work
[18:12:10] (CR) Ori.livneh: [C: 2 V: 2] Fix array copy in scap-2 for real this time [operations/puppet] - https://gerrit.wikimedia.org/r/104533 (owner: Aaron Schulz)
[18:13:06] * Aaron|home was relying on echo too much to test stuff last week
[18:13:40] some of these type errors are invisible when you print stuff
[18:14:03] rewrite in haskell
[18:14:17] meh
[18:14:27] today greg-g is helpful I see
[18:14:28] you made like a dozen substantial improvements
[18:14:30] breakage is ok
[18:14:31] ^d: so even if tin works, all the servers need a puppet run to fix scap-2
[18:14:51] Nemo_bis: hey, type errors would exist :)
[18:15:40] Aaron|home and ori: let me know when I should redo it.
[18:16:17] manybubbles: k, figuring out a scheme to rsync it so you don't have to wait for the entire app server group to run puppet
[18:17:08] ori: cool. I don't mind waiting but I figure I'm a better test than the train
[18:17:29] rather, lower cost of failure
[18:18:13] yep. sorry 'bout that!
[18:20:19] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 18:20:12 UTC 2013
[18:20:29] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:20:12 PM UTC
[18:20:57] <^d> ori: salt?
[18:23:51] i went with a bash for loop
[18:24:03] bash is what got us into this trouble in the first place!
[18:24:09] <^d> heh
[18:24:17] heh, very true
[18:24:29] Reedy: where is the live hack commit?
[18:24:45] Aaron|home: in git
[18:24:59] I mean, the ID
[18:25:04] https://gerrit.wikimedia.org/r/22466
[18:25:34] Reedy: how do you remove things from it? Do you make a new patch set or just a new version?
[18:26:02] i usually unabandon, amend, commit, abandon
[18:26:06] oh seriously parens?
[18:26:33] ^d: btw, https://gerrit.wikimedia.org/r/#/c/104151/
[18:26:58] apergos: it's not even the most ridiculous bash idiosyncrasy to have bitten scap
[18:27:50] http://mywiki.wooledge.org/BashFAQ/105
[18:28:14] set -e behavior
[18:28:15] "These rules are extremely convoluted, and they still fail to catch even some remarkably simple cases. Even worse, the rules change from one Bash version to another, as Bash attempts to track the extremely slippery POSIX definition of this "feature". When a SubShell is involved, it gets worse still -- the behavior changes depending on whether Bash is invoked in POSIX mode."
[18:28:35] nice
[18:28:53] especially how "feature" is in quotes
[18:29:49] greg-g: I have half an hour in my window left. that isn't really enough time to expect to scap and sync some commit updates
[18:29:52] in other words you get different behavior if your shebang is #!/bin/sh, and /bin/sh is symlinked to /bin/bash, and a direct #!/bin/bash -- but only when subshells are involved!
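Editorial note on the array-copy bug discussed above. The three assignments tried during the incident (the plain `$MW_RSYNC_ARGS` mentioned at 17:45:33, the `[@]` form pasted at 17:37:48, and what the "parens" remark at 18:26:06 presumably refers to) behave differently in bash. The following is a minimal sketch, not the actual scap-2 code: the three-element array is a hypothetical stand-in for the real argument list, and `count_args` is a helper invented here for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for MW_RSYNC_ARGS; the real list in scap is longer.
MW_RSYNC_ARGS=(-a --delete-delay --no-perms)

# 1. Expanding an array name without an index yields only element 0.
first_only=$MW_RSYNC_ARGS

# 2. Assigning [@] to a plain variable joins every element into ONE string.
joined=${MW_RSYNC_ARGS[@]}

# 3. Parentheses plus quoting copy the array element by element.
copied=("${MW_RSYNC_ARGS[@]}")

# echo joins its arguments with spaces, so forms 2 and 3 print the same text.
echo "joined: ${joined[@]}"
echo "copied: ${copied[@]}"

# Counting the words a command actually receives exposes the difference.
count_args() { echo "$#"; }
count_args "${joined[@]}"   # prints 1: the whole string is a single argument
count_args "${copied[@]}"   # prints 3: one argument per array element
```

Form 2 is consistent with the rsync failure logged at 17:55:20, where the entire option list was rejected as one unknown option, and with the remark at 18:13:40 that such errors are invisible when printed.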
[18:29:54] assuming I could scap now [18:30:01] manybubbles: it's still running [18:30:16] I figured [18:30:18] thanks [18:30:28] I would be just as happy for /bin/sh to die die die, even as symlink [18:30:42] manybubbles: it just finished [18:30:45] but I understand that on a minimalist system you are going to have that link to somethingorother [18:30:55] i checked a couple of random servers and the file permissions are correct [18:30:58] but then you shouldn't be writing fancy bash scripts for it in that case [18:30:59] manybubbles: well, reedy could delay a little before the trian window, it shouldn't be a big deal, it's a config change only day [18:31:45] manybubbles: so you're good to go [18:31:52] ori: I'll start! [18:32:07] !log manybubbles started scap: active Update CirrusSearch and Elastica to master [18:32:22] Logged the message, Master [18:32:44] it is not failing [18:32:51] so better! [18:33:41] cool, thanks for your help / patience with this [18:34:18] manybubbles: heh, what you describe as "in-place" updates sounds like the opposite to me [18:34:36] Call to undefined method JpegHandler::getEntireText() [18:34:56] ^d: did we start using that getEntireText thing before it was deployed? [18:35:20] I thought for sure it had made it because it was merged forever ago [18:37:29] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 18:37:22 UTC 2013 [18:37:29] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 18:37:28 UTC 2013 [18:37:30] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:06:36 PM UTC [18:38:19] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:37:22 PM UTC [18:38:27] manybubbles: so why are the index updates staged first? Is there some referential integrity requirement? 
[18:38:28] <^d> manybubbles: I thought so too :) [18:39:54] ^d: looks like it is only in wmf8 [18:40:09] I propose backporting it to wmf7 to stop the complaining [18:40:18] because we're going to have wmf7 for another couple days [18:42:47] <^d> manybubbles: https://gerrit.wikimedia.org/r/#/c/104538/ [18:42:57] <^d> Bahah, dupe. [18:43:53] <^d> manybubbles: Merged, feel free to sync-file it. [18:44:04] ^d: I'll sync file it once scap is done [18:44:10] <^d> mmk [18:47:06] Aaron|home: I'm not sure what you mean about index updates being staged first. we put in them in the job queue to keep things moving more quickly and to make sure users don't see it if we break it [18:47:50] manybubbles: you said you switch a pointer from the old to new location [18:48:00] actually is this pointer per-page/document? [18:48:33] Aaron|home: nah, it is for the whole index. it is technically switching which real index has an alias. the operation is atomic so it lets there be no interruption of service [18:48:53] you build the new index but you don't add it to the pointer because it'll make duplicate results [18:49:30] then you swap the alias (I keep calling it pointer, please merge terms) [18:50:00] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 18:49:58 UTC 2013 [18:50:29] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 06:49:58 PM UTC [18:51:15] ^d: is this normal during scap: Exception from line 468 of /usr/local/apache/common-local/php-1.23wmf7/includes/cache/LocalisationCache.php: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php. [18:52:35] <^d> No. 
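The alias swap manybubbles describes at 18:48 to 18:49 (build the new index on the side, then atomically repoint the alias so searches never see duplicates or an outage) can be modelled with a toy in-memory class. This is only an illustrative sketch: `IndexStore`, `create_index`, `swap_alias` and `search` are hypothetical names invented here, not the Elasticsearch or CirrusSearch API.

```python
import threading


class IndexStore:
    """Toy model: searches go through an alias, never a real index name."""

    def __init__(self):
        self._indexes = {}  # real index name -> list of documents
        self._aliases = {}  # alias -> real index name
        self._lock = threading.Lock()

    def create_index(self, name, docs):
        # Building a new index does not touch any alias, so searches
        # through the alias cannot see it (no duplicate results).
        self._indexes[name] = list(docs)

    def swap_alias(self, alias, new_index):
        """Repoint the alias in one step; return the previous target."""
        with self._lock:
            if new_index not in self._indexes:
                raise KeyError(new_index)
            old = self._aliases.get(alias)
            self._aliases[alias] = new_index
            return old

    def search(self, alias, term):
        with self._lock:
            docs = self._indexes[self._aliases[alias]]
        return [d for d in docs if term in d]


store = IndexStore()
store.create_index("wiki_v1", ["linux kernel"])
store.swap_alias("wiki", "wiki_v1")

# Rebuild next to the live index, then flip the alias.
store.create_index("wiki_v2", ["linux kernel", "linux file text"])
before = store.search("wiki", "linux")
previous = store.swap_alias("wiki", "wiki_v2")
after = store.search("wiki", "linux")
```

Until `swap_alias` runs, searches through `wiki` only ever read `wiki_v1`; afterwards they only read `wiki_v2`, which is the "no interruption of service" property mentioned in the discussion.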
[18:52:42] I don't think so [18:53:18] so, maybe that is a new thing from the scap change [18:53:32] it is hard to see from all the other errors that we're causing (and about to fix) [18:53:52] weee [18:54:07] greg-g: reedy can wait, right :) [18:54:13] a bit, yeah [18:55:32] when we rewrite scap, it should have a progress bar [18:55:37] and all the logs [18:55:40] but a progress bar [18:56:42] <^d> Like a microsoft-style one? That goes to 95% and then takes 20 minutes? [18:57:10] ^d: better than nothing, I guess. [19:00:45] (03PS1) 10Ottomata: Making varnishkafka ganglia view into a define to pass topic_regex [operations/puppet] - 10https://gerrit.wikimedia.org/r/104542 [19:01:43] can I sync-file during scap? [19:01:44] (03PS2) 10Ottomata: Making varnishkafka ganglia view into a define to pass topic_regex [operations/puppet] - 10https://gerrit.wikimedia.org/r/104542 [19:01:52] so much blowing up makes me sad [19:01:53] manybubbles: yes [19:02:20] (03CR) 10jenkins-bot: [V: 04-1] Making varnishkafka ganglia view into a define to pass topic_regex [operations/puppet] - 10https://gerrit.wikimedia.org/r/104542 (owner: 10Ottomata) [19:02:43] Aaron|home: thanks [19:02:49] !log manybubbles synchronized php-1.23wmf7/includes/media/MediaHandler.php 'Backport getEntireText so cirrus stop complaining' [19:02:51] (03PS3) 10Ottomata: Making varnishkafka ganglia view into a define to pass topic_regex [operations/puppet] - 10https://gerrit.wikimedia.org/r/104542 [19:03:04] Logged the message, Master [19:03:16] do you think I'll have to rerun it once scap finishes to make sure any in flight copies get updated? or is it good enough just to have waited a few minutes after pulling the file to tin [19:03:20] (03CR) 10Ottomata: [C: 032 V: 032] Making varnishkafka ganglia view into a define to pass topic_regex [operations/puppet] - 10https://gerrit.wikimedia.org/r/104542 (owner: 10Ottomata) [19:04:49] Aaron|home: I'm somewhat worried about the l10n errors. 
I _think_ they are happening during the CDB update. [19:06:37] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 19:06:30 UTC 2013 [19:06:47] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 19:06:42 UTC 2013 [19:07:17] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:06:42 PM UTC [19:07:37] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:06:30 PM UTC [19:07:52] manybubbles: I don't see a need to re-run it no [19:08:00] ottomata: did you plan on restarting that java deployment conversation [19:08:04] Aaron|home: thank! [19:08:05] manybubbles: what server is it on now? [19:08:18] mw1046: [19:08:47] it hanging or actually doing stuff? [19:08:54] still doing stuff [19:09:01] seems slow...as if '--exclude=**/cache/l10n/*.cdb' isn't applied or something [19:09:18] I dunno [19:09:42] It pauses for a few seconds every once in a while then spits out 5-6 lines [19:09:47] then pauses for a while again [19:11:36] Aaron|home: the message errors definitely coincide with the "Updated 366 CDB file(s)" lines [19:11:51] right before or right after [19:11:53] close [19:13:42] !log manybubbles finished scap: active Update CirrusSearch and Elastica to master [19:13:53] manybubbles: so the index alias switching...is that only done for the jobs added via the maintenance script? [19:13:53] scap completed in 41m 48s. 
[19:13:56] Logged the message, Master [19:14:37] RECOVERY - RAID on db1001 is OK: OK: optimal, 1 logical, 2 physical [19:15:36] ^d: test2wiki is looking good so I'm going to sync out the config change [19:15:51] (03CR) 10Manybubbles: [C: 032] Cirrus for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104134 (owner: 10Manybubbles) [19:16:03] (03Merged) 10jenkins-bot: Cirrus for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104134 (owner: 10Manybubbles) [19:16:52] !log manybubbles synchronized wmf-config/InitialiseSettings.php [19:17:09] Logged the message, Master [19:17:42] ^d: terbium not happy? [19:18:48] <^d> Define unhappy? [19:19:43] Class 'CirrusSearchConnection' not found in [19:19:48] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 19:19:44 UTC 2013 [19:19:54] everything else is fine [19:20:02] not more exceptions from us [19:20:15] but mwscript seems unhappy [19:20:27] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:19:44 PM UTC [19:22:32] <^d> manybubbles: wfm? [19:23:09] ^d: I see [19:23:14] Transient? [19:24:10] (03PS1) 10Manybubbles: Actually turn cirrus on for new wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104546 [19:24:30] ^d: ^^^^ [19:24:32] just stupid [19:24:37] my stupidity [19:24:42] Reedy: We're just about done [19:24:48] I'm not sure about scap, though. 
[19:25:00] I think it is causing some errors [19:25:01] (03CR) 10Chad: [C: 032 V: 032] Actually turn cirrus on for new wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104546 (owner: 10Manybubbles) [19:25:30] 2 Fatal error: LuaSandboxFunction::call() [luasandboxfunction.call]: PANIC: unprotected error in call to Lua API (not enoug [19:25:30] h memory) in /usr/local/apache/common-local/php-1.23wmf7/extensions/Scribunto/engines/LuaSandbox/Engine.php on line 158 [19:25:49] !log manybubbles synchronized wmf-config/InitialiseSettings.php [19:26:05] Logged the message, Master [19:26:28] ^d: working now [19:26:32] much better [19:26:44] Reedy: that one looks "fun" [19:27:03] I'll just BZ it [19:27:07] If it is an OOM... [19:27:38] Reedy: I'm off tin [19:27:40] greg-g: ^^^ [19:27:45] manybubbles: sweet, thanks [19:28:07] omg RSYNC_ARGS=("${MW_RSYNC_ARGS[@]}") [19:28:11] so, anyone want to give a quick run down of what changed and why it broke? [19:28:18] * andrewbogott was already not very tempted to write things in bash [19:33:07] andrewbogott: don't. ever. [19:35:47] !log reedy updated /a/common to {{Gerrit|Ib1ce6e204}}: Actually turn cirrus on for new wikis [19:35:52] (03PS1) 10Reedy: All non wikipedias to 1.23wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104547 [19:36:05] Logged the message, Master [19:37:07] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 19:36:57 UTC 2013 [19:37:18] ^d: test2wiki is able to serve out file contents patches properly. 
I'm going to start the "inpace" reindex of everything else [19:37:37] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 19:37:28 UTC 2013 [19:37:37] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:36:57 PM UTC [19:37:41] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.23wmf8 [19:37:57] Logged the message, Master [19:38:00] <^d> manybubbles: Sweet [19:38:17] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:37:28 PM UTC [19:39:06] <^d> Oh man. [19:39:10] <^d> That's freaking awesome. [19:39:30] ^d: what? [19:39:32] this? https://test2.wikipedia.org/w/index.php?search=file%3Alinux&title=Special%3ASearch [19:39:42] <^d> Yep, that. [19:39:52] <^d> It's great seeing it in action outside my local install :) [19:40:18] 16 Catchable fatal error: Argument 1 passed to Language::commaList() must be an array, null given, called in /usr/local/apache/common-local/php-1.23wmf8/extension [19:40:18] s/CentralAuth/specials/SpecialCentralAuth.php on line 501 and defined in /usr/local/apache/common-local/php-1.23wmf8/languages/Language.php on line 3254 [19:40:42] akosiaris: do you know how /var/lib/ganglia/xmlcache/hosts gets created and populated on neon for icinga's check_ganglios_generic_value? [19:40:49] andrewbogott: going to replace the raid controller on virt1007 now...okay? [19:41:01] cmjohnson1: yes, thank you! [19:41:17] cmjohnson1: btw, other virt100x boxes have been throwing puppet alerts today. Is that you? [19:41:26] no [19:41:29] hm [19:41:32] i assumed it was you [19:41:44] hmm, actually i think i'm finding it [19:41:45] Might be ryan or mike as well… I emailed them, haven't heard back. 
[19:41:46] ganglia_parser [19:41:47] hmm [19:41:54] URL: http://be.wikibooks.org/wiki/Адмысловае:CentralAuth/Ryan_lane [19:42:17] wtf [19:42:19] They're all Ryan_Lane [19:42:42] Ryan_lane is everywhere [19:42:45] ottomata: unaware unfortunately [19:42:49] return htmlspecialchars( $this->getLanguage()->commaList( $row['groups'] ) ); [19:43:44] so, akosiaris, i think there is a problem with ganglia value checks right now [19:44:12] there is a cron job on neon that runs /usr/sbin/ganglia_parser [19:44:22] some of the docs in that code say [19:44:28] This will only [19:44:28] work when the ganglia_parser script is run on the same host as the ganglia [19:44:28] web UI and aggregator host. [19:44:35] neon != nickel [19:44:42] but, that's fine, all that matters is that gmetad.conf is the same [19:44:45] but, it isn [19:44:46] 't [19:44:53] it looks like maybe once it used to be? [19:44:57] but barely [19:45:01] it is missing a lot of ganglia aggregators from the list [19:45:06] i don't think it is puppetized properly [19:45:14] which is causing some ganglios value checks to fail [19:45:27] if the host's proper aggregator is not listed in gmetad.conf on neon [19:46:56] nice....
[19:47:37] (03PS2) 10Aaron Schulz: Removed fileJournal config; this has been used for some time [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104316 [19:47:47] hmm, nope, hm, gmetad is running on neon [19:47:48] hmmm [19:47:51] but the file is not right, hm [19:48:26] ohhhh [19:48:30] # neon runs gmetad for ganglios [19:48:30] /^neon$/: { [19:48:38] there is special config for it [19:48:39] hmm [19:48:42] why not use the production one [19:48:43] hm [19:48:47] PROBLEM - Host virt1007 is DOWN: PING CRITICAL - Packet loss = 100% [19:49:17] !log reedy synchronized php-1.23wmf8/extensions/CentralAuth/ [19:49:34] Logged the message, Master [19:49:37] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 19:49:29 UTC 2013 [19:50:13] Reedy: something is angry [19:50:27] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 07:49:29 PM UTC [19:50:55] ? [19:51:24] just logs [19:51:32] Call to undefined method SpecialCentralAuth::getPageTitle [19:52:16] Hmm [19:53:21] Oh [19:53:34] Exception from line 375 of /usr/local/apache/common-local/php-1.23wmf8/includes/specialpage/SpecialPage.php: Call to undefined method SpecialCentralAuth::getPageTitle [19:55:28] https://github.com/wikimedia/mediawiki-extensions-CentralAuth/commit/e34f2bcdc9aa430700f300c38cf87c58d3681a99 [19:55:30] legoktm: ^^ [19:57:14] (03CR) 10Aaron Schulz: [C: 032] Configured $wgJobBackoffThrottling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103524 (owner: 10Aaron Schulz) [19:57:54] (03Merged) 10jenkins-bot: Configured $wgJobBackoffThrottling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103524 (owner: 10Aaron Schulz) [19:58:16] ^d: the elasticsearch cluster's load spiked nicely when I started building those indexes. [19:58:19] ~20% [19:58:46] hmm [19:58:46] Author: lcarr [19:58:47] Date: Thu Mar 28 12:45:09 2013 -0700 [19:58:47] deactivating ganglios [19:58:50] LeslieCarr: ? 
[19:58:52] why no ganglios? [20:00:06] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 20:00:05 UTC 2013 [20:00:37] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:00:05 PM UTC [20:01:59] Ok, i'm going to reenable ganglios, i don't know why it isn't on, LeslieCarr I will put you as reviewer, but probably merge anyway [20:02:03] feel free to revert if need [20:02:06] !log aaron synchronized wmf-config/CommonSettings.php 'Configured $wgJobBackoffThrottling' [20:02:21] Logged the message, Master [20:02:30] that commit didn't disable ganglios anyway, it just keeps gmetad.conf from being updated properly [20:03:44] (03PS1) 10Ottomata: Re-including ganglios on neon for icinga ganlios checks [operations/puppet] - 10https://gerrit.wikimedia.org/r/104644 [20:04:02] (03CR) 10Ottomata: [C: 032 V: 032] Re-including ganglios on neon for icinga ganlios checks [operations/puppet] - 10https://gerrit.wikimedia.org/r/104644 (owner: 10Ottomata) [20:06:47] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 20:06:41 UTC 2013 [20:07:12] Out of interest; Do you know of the MWException error with CentralAuth? [20:07:16] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:06:41 PM UTC [20:07:26] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 20:07:22 UTC 2013 [20:07:36] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:07:22 PM UTC [20:07:55] ori, ^d, manybubbles and/or other folks involved in deployment -- something (Scap or deployment?) is failing a million sudo attempts on beta. [20:07:55] anyone know what that's about? 
[20:07:55] 3 incorrect password attempts ; TTY=unknown ; PWD=/home/jenkins-deploy/workspace/beta-code-update ; USER=l10nupdate ; COMMAND=/usr/local/bin/refreshCdbJsonFiles --directory=/a/common/php-master/cache/l10n --threads=0 [20:08:07] It's causing a million alert spams… [20:08:22] andrewbogott: hashar I think [20:08:37] yeah, I guess he's out today [20:08:38] I didn't think we used scap in beta [20:08:49] though, it looks like that is part of scap [20:09:01] Aaron|home: do you know anything about ^^ [20:09:20] not really [20:09:36] I'm wondering if each time l10n updates happen on production, beta makes a shadowy, failed attempt by accident [20:11:02] might be [20:11:25] ottomata: so ganglia needs to be running on the machine running ganglios [20:11:31] which does have some serious performance issues [20:11:33] so... [20:11:57] ah ok [20:11:58] hrm, trying to remember why i turned that off on neon [20:12:02] does it actually have to be running though? [20:12:09] it looks like maybe it just needs the gmetad config file [20:12:30] it just happens that our puppetization of gmetad does the config and service no matter what [20:12:45] ganglia needs to be running [20:12:52] it reads from the metrics [20:13:14] hmm [20:13:31] it looks like the ganglios source just reads gmetad.conf for aggregator sources [20:13:37] and then queries each of them for their xmldata [20:13:59] https://bitbucket.org/maplebed/ganglios/src/8c015e9b6953c7fa844b01c7c0ca903a19b84a2b/src/ganglia_parser?at=default#cl-112 [20:14:01] hrm [20:14:17] line 112, and then later line 129 [20:14:25] i seem to remember it reading from data but.... haven't looked at it in a while -- could be misremembering [20:14:27] and line 133 [20:15:03] gmetadconf = open('/etc/ganglia/gmetad.conf') [20:15:03] # read hosts out of file [20:15:03] then [20:15:03] for each aggregator host [20:15:03] s.connect((host, 8649)) [20:15:03] ...
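The flow pasted just above (read the aggregator hosts out of gmetad.conf, then connect to each one on port 8649 and read back its XML) can be sketched roughly as follows. This is a minimal illustration and not the ganglios source: the function names `parse_data_sources` and `fetch_xml` are made up here, and the parser assumes the usual `data_source "name" [interval] host[:port] ...` line format.

```python
import shlex
import socket

DEFAULT_PORT = 8649  # the port used in the paste above


def parse_data_sources(conf_text):
    """Return {cluster_name: [(host, port), ...]} from gmetad.conf text."""
    sources = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line.startswith("data_source"):
            continue  # skips comments and unrelated settings
        tokens = shlex.split(line)  # copes with the quoted cluster name
        name, rest = tokens[1], tokens[2:]
        # An optional polling interval (seconds) may precede the hosts.
        if rest and rest[0].isdigit():
            rest = rest[1:]
        hosts = []
        for item in rest:
            host, _, port = item.partition(":")
            hosts.append((host, int(port) if port else DEFAULT_PORT))
        sources[name] = hosts
    return sources


def fetch_xml(host, port=DEFAULT_PORT, timeout=10.0):
    """Connect to one aggregator and read its XML dump until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as s:
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Under this reading, a host whose aggregator is missing from the local gmetad.conf is simply never queried, which would be consistent with the failing value checks described earlier in the conversation.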
[20:15:18] that's also how to get xml data from ganglia [20:15:22] you can't connect to gmetad to get it [20:15:27] you have to connect to aggregators [20:15:35] (pretty sure anyway) [20:15:54] anyway, I can see if I can separate the gmetad config from the service in puppet [20:15:59] and just include the config on neon [20:16:23] andrewbogott: did you get virt1006 working [20:16:45] cmjohnson1: Wasn't 1006 the one you thought we should scrap? [20:16:47] oof, there are ganglios package problems anyway [20:16:47] ganglios : Depends: python (< 2.7) but 2.7.3-0ubuntu2.2 is to be installed [20:17:05] cmjohnson1: The one you worked on last week and it didn't survive the surgery? [20:17:24] oh the server is fine...i broke the spare [20:17:35] Oh! Hm… dunno then. [20:17:40] I'll give it another try now. [20:18:26] cmjohnson1: you put a new disk controller in 1007 just now? If so I'll try that one too [20:18:53] not yet...i am poking at it [20:19:06] Anyone see my previous message? [20:19:30] cmjohnson1: 'k [20:19:38] (03PS2) 10Reedy: All non wikipedias to 1.23wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104547 [20:19:53] (03CR) 10Reedy: [C: 032] All non wikipedias to 1.23wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104547 (owner: 10Reedy) [20:20:36] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 20:20:27 UTC 2013 [20:21:00] !log reedy synchronized php-1.23wmf8/extensions/CentralAuth/ [20:21:14] Logged the message, Master [20:21:26] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:20:27 PM UTC [20:22:16] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 20:22:15 UTC 2013 [20:22:22] wtf is Jenkins doing [20:22:26] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:22:15 PM UTC [20:22:37] (03PS1) 10Ottomata: Factoring out ganglia collector config from ganglia::collector 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/104657 [20:22:48] (03PS1) 10Hashar: beta: sudo policy for refreshCdbJsonFiles [operations/puppet] - 10https://gerrit.wikimedia.org/r/104658 [20:22:58] apergos: OK, something interesting is happening. Note the last three warnings about virt1002 [20:23:23] (03PS2) 10Ottomata: Factoring out ganglia collector config from ganglia::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/104657 [20:23:49] Hm… who understands icinga these days? No one? [20:24:23] me some? [20:24:27] (03PS3) 10Ottomata: Factoring out ganglia collector config from ganglia::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/104657 [20:24:28] just because i've been poking at it [20:24:30] but not much [20:25:13] (03CR) 10Reedy: [V: 032] All non wikipedias to 1.23wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104547 (owner: 10Reedy) [20:25:29] some service defns disappeared from nagios [20:25:35] it's not that interesting [20:26:13] let's wait for the next puppet run on neon [20:26:19] (03PS2) 10Reedy: Enable EducationProgram on arwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102484 [20:26:24] due to start in one minute [20:26:29] (03CR) 10Andrew Bogott: [C: 032] beta: sudo policy for refreshCdbJsonFiles [operations/puppet] - 10https://gerrit.wikimedia.org/r/104658 (owner: 10Hashar) [20:27:28] apergos: you propose a very easy solution :) [20:27:57] (03CR) 10Ottomata: [C: 032 V: 032] Factoring out ganglia collector config from ganglia::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/104657 (owner: 10Ottomata) [20:28:21] (03CR) 10Reedy: [C: 032] Enable EducationProgram on arwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102484 (owner: 10Reedy) [20:29:38] after that if there's still an issue it will be easy to see which settings are at fault [20:33:24] (03PS1) 10Reedy: Revert "Enable CAPTCHA for all edits of non-confirmed users on 
pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104660 [20:33:28] (PS2) Reedy: Revert "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104660 [20:33:42] (CR) Reedy: [C: 2] Revert "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104660 (owner: Reedy) [20:34:53] \o/ [20:36:23] :) [20:37:06] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 20:36:57 UTC 2013 [20:37:11] cmjohnson1: virt1006 actually seems kind of better today. That's just from you opening and closing the case and changing nothing? [20:37:16] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 20:37:07 UTC 2013 [20:37:16] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:36:57 PM UTC [20:37:36] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:37:07 PM UTC [20:37:41] Or maybe you just placed your hands on the front panel and chanted a lot? [20:37:44] What the hell is jenkins doing? [20:37:57] "The power of Chris compels you! The power of Chris compels you!" [20:39:03] Error: Could not find any host matching 'db50' (config file '/etc/icinga/puppet_services.cfg', starting on line 36827) [20:39:04] nice [20:39:14] guess it's not fully decommissioned [20:39:19] that will be a problem [20:39:51] Reedy: l10n-bot it seems [20:39:58] Gah [20:41:11] cmjohnson1: can you see what's up with db50?
apparently it has some storedconfig entries [20:41:29] but it's listed in decommissioning.pp and that makes icinga unhappy [20:41:49] andrewbogott: hahaha...wish'd that worked more often [20:42:02] i reseated all the cables ...maybe that was it [20:42:06] (03CR) 10Reedy: [V: 032] Enable EducationProgram on arwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102484 (owner: 10Reedy) [20:42:22] (03PS3) 10Reedy: Revert "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104660 [20:42:36] (03CR) 10Reedy: [C: 032 V: 032] Revert "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104660 (owner: 10Reedy) [20:42:39] andrewbogott: powering up virt1007 now [20:43:22] apergos: will check it in a few [20:43:24] thanks [20:43:46] !log reedy synchronized wmf-config/InitialiseSettings.php [20:44:03] Logged the message, Master [20:44:34] cmjohnson1: ready for me to try partman? [20:44:55] give it a go [20:45:05] fingers crossed [20:45:25] (03PS2) 10Reedy: Remove duplicate configuration from CommonSettings-labs.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102510 [20:45:38] 10.0.6.60 this host right? 
still up and running [20:48:29] (03PS1) 10Ottomata: Including mobile cache ganglia aggregators on neon's gmetad.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/104661 [20:48:45] (03CR) 10Ottomata: [C: 032 V: 032] Including mobile cache ganglia aggregators on neon's gmetad.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/104661 (owner: 10Ottomata) [20:48:48] and it's still running puppet [20:49:28] could be the other recently decommed dbs are all like that too [20:50:07] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 20:49:58 UTC 2013 [20:50:26] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:49:58 PM UTC [20:50:30] !log aaron started scap: php-1.23wmf6 Scap timing test [20:50:47] Logged the message, Master [20:53:01] andrewbogott: if you have access to icinga admin which I believe you do, you can check to se if active checks are turned on for puppet when you see that behavior [20:53:12] 'k [20:53:12] usually that's a symptom of something else set up wrong (not icinga's fault) [20:53:28] like 'it's in decom and it's still running puppet' or some crazy thing like that [20:53:39] anyways that was the problem here: active checks turned on for whatever reason [20:53:42] I turned them off [20:53:46] for virt1002 [20:53:47] with puppet freshness, though, there isn't a list of servers anywhere is there? Isn't it just that once a server contacts icinga about puppet it monitors it from then on? [20:54:00] there is indeed a list of servers and services [20:54:04] Oh, ok, I guess I don't know how this works :) [20:54:13] apergos: the db's are still powered on but all dns entries have been removed [20:54:18] and if icinga doesn't hear good things withing a certain timeframe it whines [20:54:34] ok well they need to be powered off [20:54:48] remember how there are those steps that once you start you need to keep going till power off? 
[20:54:55] including puppet disable, [20:54:56] talked to sbernardin he will be in later today and will power them down [20:55:02] remove storedconfigs, etc [20:55:12] apergos: i did all that [20:55:32] well db50 (example) still has storedconfigs [20:56:07] and puppet is running over there [20:56:12] so it wasn't disabled it seems [20:56:49] can you doublecheck the dbs and make sure of those two steps? that will let icinga recover (right now its config is broken) [20:57:04] mm unless you are on your way out the door, I dunno if you are travelling today [20:57:29] (03CR) 10Reedy: [C: 032] Remove duplicate configuration from CommonSettings-labs.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102510 (owner: 10Reedy) [20:57:35] no not travelling today [20:57:46] !g 104580,1 [20:57:47] https://gerrit.wikimedia.org/r/#q,104580,1,n,z [20:57:52] RECOVERY - Host virt1007 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [20:58:01] just odd...cuz I did that and now it's whining [20:58:07] !log aaron finished scap: php-1.23wmf6 Scap timing test [20:58:21] Logged the message, Master [20:58:55] well another puppet run just concluded over there so [20:59:26] I gotta afk (well afk this channel), I have a family phone call [20:59:28] sorry about this [20:59:43] ping me if you want me to double check anything later [20:59:50] ok [21:00:31] (03PS1) 10Ottomata: Adding icinga checks for Kafka Broker MessagesIn and Varnishkafka drerr [operations/puppet] - 10https://gerrit.wikimedia.org/r/104662 [21:01:37] (03PS2) 10Ottomata: Adding icinga checks for Kafka Broker MessagesIn and Varnishkafka drerr [operations/puppet] - 10https://gerrit.wikimedia.org/r/104662 [21:01:46] (03CR) 10Ottomata: [C: 032 V: 032] Adding icinga checks for Kafka Broker MessagesIn and Varnishkafka drerr [operations/puppet] - 10https://gerrit.wikimedia.org/r/104662 (owner: 10Ottomata) [21:04:21] cmjohnson1: virt1007 is still unhappy :( [21:05:48] well shit [21:05:58] that is the same issue we had 
with an1007 [21:06:06] maybe we should rename to something other than 1007 [21:06:13] it may just work :-P [21:06:49] heh, yeah [21:06:51] https://dpaste.de/rOQr [21:07:10] I don't know really how to read this logfile… it looks like /all/ of the drives are failing? [21:07:23] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 21:07:11 UTC 2013 [21:07:23] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 08:36:57 PM UTC [21:07:23] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 21:07:17 UTC 2013 [21:08:12] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 09:07:17 PM UTC [21:08:27] cmjohnson1: Is there any chance that we swapped 'em and this /is/ analytics 1007? [21:08:49] no chance [21:09:00] an1007 is still in the rack filling space [21:09:38] ok [21:10:09] What's the best way to go forward here? Is it possible to detect which drive is broken and work around it? Or is our best bet to just scrap it? [21:10:26] Without it we still have enough ciscos to keep labs going for quite a while. [21:10:43] I'll just be sad that they're nonsequential ;( [21:11:22] i have a suggestion ...we can make wmf5710 virt1006 [21:11:44] (03PS1) 10CSteipp: Central OAuth wiki for Labs (metawiki) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104666 [21:11:55] and i can poke at the current virt1006 later server later [21:12:57] andrewbogott ^ [21:13:48] cmjohnson1: you mean virt1007 in both those last messages, right? [21:13:56] yes [21:13:57] sorry [21:14:01] what is/was wmf5710? [21:14:12] it's a cisco hanging out on the shelf [21:14:17] just above virt1007 atm [21:14:37] Oh, excellent :) No rush, but if you want to rename and renetwork it let me know. 
[21:14:45] (03PS1) 10Ottomata: Removing unused misc/analytics.pp file [operations/puppet] - 10https://gerrit.wikimedia.org/r/104668 [21:15:19] (03CR) 10Ottomata: [C: 032 V: 032] Removing unused misc/analytics.pp file [operations/puppet] - 10https://gerrit.wikimedia.org/r/104668 (owner: 10Ottomata) [21:15:19] Cool, then I will get that done. Going to put in RT for it [21:19:42] RECOVERY - Puppet freshness on virt1002 is OK: puppet ran at Mon Dec 30 21:19:34 UTC 2013 [21:20:49] cmjohnson1: virt1006 is now running and puppetized and everything. Seems totally (mysteriously) fine. [21:21:26] very odd [21:21:26] (03CR) 10Andrew Bogott: "Jenkins, so good to see you!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/104658 (owner: 10Hashar) [21:23:47] ori: maybe avoid agent forwarding for scap is a good idea...it sucks that performance depends on the client too (e.g. pageant eating cpu) [21:24:36] cmjohnson1: I actually have no frame of reference for this… if we were buying, how much would one of these ciscos cost? Are we investing $thousands of labor in a $200 piece of hardware? [21:25:12] PROBLEM - Host virt1007 is DOWN: PING CRITICAL - Packet loss = 100% [21:30:04] andrewbogott: they're pretty inexpensive servers 1500-1800 USD [21:30:25] (03Merged) 10jenkins-bot: Remove duplicate configuration from CommonSettings-labs.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102510 (owner: 10Reedy) [21:33:20] cmjohnson1: Ah, ok -- makes sense to abandon them pretty quickly then when they misbehave. 
[21:36:24] yeah that and they were donations and the support has been terrible so far [21:36:42] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 21:36:38 UTC 2013 [21:37:22] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 09:36:38 PM UTC [21:37:32] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 21:37:29 UTC 2013 [21:38:12] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 09:37:29 PM UTC [21:41:28] (03CR) 10Hashar: "ran puppet on deployment-bastion , that should stop the cron spam" [operations/puppet] - 10https://gerrit.wikimedia.org/r/104658 (owner: 10Hashar) [21:45:42] (03CR) 10Reedy: "You mean https://gerrit.wikimedia.org/r/#/c/104190/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104191 (owner: 10Aaron Schulz) [21:47:05] (03PS3) 10Reedy: Update favicon spcom.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103203 (owner: 10Murfel) [21:47:12] (03CR) 10Reedy: [C: 032] Update favicon spcom.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103203 (owner: 10Murfel) [21:47:32] (03PS1) 10Cmjohnson: updating virt1007 to reflect new mac [operations/puppet] - 10https://gerrit.wikimedia.org/r/104671 [21:47:46] (03Merged) 10jenkins-bot: Update favicon spcom.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103203 (owner: 10Murfel) [21:48:23] (03PS6) 10Reedy: Fix en.wiktionary favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103183 (owner: 10Sn1per) [21:48:28] (03CR) 10Reedy: [C: 032] Fix en.wiktionary favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103183 (owner: 10Sn1per) [21:48:38] (03Merged) 10jenkins-bot: Fix en.wiktionary favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103183 (owner: 10Sn1per) [21:48:47] (03PS5) 10Reedy: Update favicon wikipedia.ico 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99768 (owner: 10M4tx) [21:49:02] (03CR) 10Cmjohnson: [C: 032] updating virt1007 to reflect new mac [operations/puppet] - 10https://gerrit.wikimedia.org/r/104671 (owner: 10Cmjohnson) [21:49:03] (03CR) 10Reedy: [C: 032] Update favicon wikipedia.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99768 (owner: 10M4tx) [21:49:18] (03Merged) 10jenkins-bot: Update favicon wikipedia.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99768 (owner: 10M4tx) [21:49:57] andrewbogott: the switch has been made and I updated the dhcpd file. hopefully should work [21:49:57] (03PS2) 10Reedy: Enable $wgImportSources for Hebrew Wikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103118 (owner: 10Odder) [21:50:03] (03CR) 10Reedy: [C: 032] Enable $wgImportSources for Hebrew Wikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103118 (owner: 10Odder) [21:50:12] (03Merged) 10jenkins-bot: Enable $wgImportSources for Hebrew Wikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103118 (owner: 10Odder) [21:50:47] (03PS2) 10Reedy: brwikimedia: fix import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103730 (owner: 10Jeremyb) [21:50:55] (03CR) 10Reedy: [C: 032] brwikimedia: fix import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103730 (owner: 10Jeremyb) [21:51:03] (03Merged) 10jenkins-bot: brwikimedia: fix import sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103730 (owner: 10Jeremyb) [21:51:12] (03PS3) 10Reedy: fix import sources for all chapter wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103731 (owner: 10Jeremyb) [21:51:22] reedy-spam [21:51:30] (03CR) 10Reedy: [C: 032] fix import sources for all chapter wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103731 (owner: 10Jeremyb) [21:51:31] (03CR) 
10Hashar: "apparently that made beta able to generate json files :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/104658 (owner: 10Hashar) [21:51:39] (03Merged) 10jenkins-bot: fix import sources for all chapter wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103731 (owner: 10Jeremyb) [21:52:04] (03PS2) 10Reedy: Enable local TimedText on Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103861 (owner: 10Odder) [21:52:08] (03CR) 10Reedy: [C: 032] Enable local TimedText on Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103861 (owner: 10Odder) [21:52:19] (03Merged) 10jenkins-bot: Enable local TimedText on Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103861 (owner: 10Odder) [21:53:13] (03PS3) 10Reedy: Add file and translate NS to Wikibase Client excludeNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103394 (owner: 10Hoo man) [21:53:21] (03CR) 10Reedy: [C: 032] Add file and translate NS to Wikibase Client excludeNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103394 (owner: 10Hoo man) [21:53:32] (03Merged) 10jenkins-bot: Add file and translate NS to Wikibase Client excludeNamespaces [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103394 (owner: 10Hoo man) [21:54:05] (03PS8) 10Reedy: Clean up wgSiteName in InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86418 (owner: 10TTO) [21:54:24] (03CR) 10Reedy: [C: 032] Clean up wgSiteName in InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86418 (owner: 10TTO) [21:55:48] (03Merged) 10jenkins-bot: Clean up wgSiteName in InitialiseSettings [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86418 (owner: 10TTO) [21:55:58] spammer! 
[21:57:01] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: STALE [21:57:21] PROBLEM - Varnishkafka Delivery Errors on cp4019 is CRITICAL: STALE [21:57:28] !log reedy synchronized wmf-config/ [21:57:31] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: STALE [21:57:31] PROBLEM - Varnishkafka Delivery Errors on cp4011 is CRITICAL: STALE [21:57:41] PROBLEM - Varnishkafka Delivery Errors on cp4012 is CRITICAL: STALE [21:57:41] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: STALE [21:57:43] Logged the message, Master [21:58:32] !log reedy updated /a/common to {{Gerrit|I028589438}}: Clean up wgSiteName in InitialiseSettings [21:58:36] (03PS1) 10Reedy: Remove duplicate wgCopyUploadsDomains from labs InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104673 [21:58:49] Logged the message, Master [21:58:51] (03CR) 10Reedy: [C: 032] Remove duplicate wgCopyUploadsDomains from labs InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104673 (owner: 10Reedy) [21:59:00] (03Merged) 10jenkins-bot: Remove duplicate wgCopyUploadsDomains from labs InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104673 (owner: 10Reedy) [21:59:10] ottomata1: some kafka related whine from icinga above :( [21:59:22] awesoome! 
[21:59:25] hmm [21:59:29] that is a problem with ganglios [21:59:32] not an actual problem [21:59:35] but at least it is working [21:59:40] i'll mark them as acknowledged in icinga [21:59:43] i gotta run right now [21:59:48] :-) [22:00:42] oh poo i can't edit these [22:00:43] oh well [22:00:45] will fix later [22:00:50] i'll send an email to ops@ [22:01:24] (03PS2) 10Reedy: Submit a new Apple Touch icon for MediaWiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103117 (owner: 10Odder) [22:01:31] (03CR) 10Reedy: [C: 032] Submit a new Apple Touch icon for MediaWiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103117 (owner: 10Odder) [22:01:39] (03Merged) 10jenkins-bot: Submit a new Apple Touch icon for MediaWiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103117 (owner: 10Odder) [22:02:29] https://gerrit.wikimedia.org/r/103203 [22:02:33] (03PS1) 10Andrew Bogott: Give myself some icinga privs [operations/puppet] - 10https://gerrit.wikimedia.org/r/104674 [22:02:36] apergos: ^ [22:02:56] MatmaRex: ^^ [22:03:10] lookin [22:04:25] add yerself to the rest too please [22:04:25] PROBLEM - Puppet freshness on db34 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [22:04:43] !log reedy synchronized docroot and w [22:04:53] andrewbogott: [22:04:59] Logged the message, Master [22:05:05] PROBLEM - Puppet freshness on db54 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [22:05:15] PROBLEM - Puppet freshness on db49 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC [22:05:20] * apergos grits teeth [22:05:47] (03PS2) 10Andrew Bogott: Give myself some icinga privs [operations/puppet] - 10https://gerrit.wikimedia.org/r/104674 [22:06:01] not gritting them at you [22:06:15] apergos: I know I disabled checks on the entire lot [22:06:32] twkozlowski: yay [22:06:55] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 
22:06:53 UTC 2013 [22:07:05] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 22:07:04 UTC 2013 [22:07:15] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 10:06:53 PM UTC [22:07:35] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 10:07:04 PM UTC [22:07:55] PROBLEM - Host db54 is DOWN: PING CRITICAL - Packet loss = 100% [22:08:05] PROBLEM - Host db49 is DOWN: PING CRITICAL - Packet loss = 100% [22:08:27] apergos: Maybe you already answered this… why isn't icinga updating the 'last successful' when it sends that 'OK' message? [22:08:40] (03PS4) 10Reedy: annotating-domain-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102739 (owner: 10Dan-nl) [22:09:04] (03CR) 10Reedy: [C: 032] annotating-domain-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102739 (owner: 10Dan-nl) [22:09:12] (03Merged) 10jenkins-bot: annotating-domain-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102739 (owner: 10Dan-nl) [22:09:35] PROBLEM - Host db34 is DOWN: PING CRITICAL - Packet loss = 100% [22:09:35] PROBLEM - Host db50 is DOWN: PING CRITICAL - Packet loss = 100% [22:09:45] PROBLEM - Host db57 is DOWN: PING CRITICAL - Packet loss = 100% [22:10:17] MatmaRex: a happy yay! [22:10:22] (03PS2) 10Reedy: Disable interwiki magic for wikimedia (chapter) sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103649 (owner: 10TTO) [22:10:50] it doesn't say last successful because ... well why? [22:10:59] I mean ok means 'hey it just ran. 
just now' [22:11:23] (03CR) 10Reedy: [C: 032] Disable interwiki magic for wikimedia (chapter) sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103649 (owner: 10TTO) [22:11:25] andrewbogott: change looks good to me [22:11:32] (03Merged) 10jenkins-bot: Disable interwiki magic for wikimedia (chapter) sites [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103649 (owner: 10TTO) [22:12:00] Wait, so 'puppet freshness is OK' is a different thing from a 'successful Puppet run'? [22:12:01] (03PS2) 10Reedy: Add massmessage-sender user group to enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103652 (owner: 10Odder) [22:12:15] (03CR) 10Andrew Bogott: [C: 032] Give myself some icinga privs [operations/puppet] - 10https://gerrit.wikimedia.org/r/104674 (owner: 10Andrew Bogott) [22:12:24] ok so we need two more things [22:12:29] (03CR) 10Reedy: [C: 032] Add massmessage-sender user group to enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103652 (owner: 10Odder) [22:12:32] maybe noe more [22:12:36] try a puppet run on neon [22:12:38] (03Merged) 10jenkins-bot: Add massmessage-sender user group to enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103652 (owner: 10Odder) [22:12:39] *one more [22:12:45] PROBLEM - Host db33 is DOWN: PING CRITICAL - Packet loss = 100% [22:12:54] * andrewbogott runs puppet on neon [22:12:55] RECOVERY - RAID on db1047 is OK: OK: optimal, 3 logical, 6 physical [22:13:06] PROBLEM - Host db31 is DOWN: PING CRITICAL - Packet loss = 100% [22:13:24] cmjohnson1 and sbernardin : thanks for putting the final nail in the coffin of those db hosts [22:13:26] PROBLEM - Host db36 is DOWN: PING CRITICAL - Packet loss = 100% [22:13:26] PROBLEM - Host db47 is DOWN: PING CRITICAL - Packet loss = 100% [22:13:26] PROBLEM - Host db37 is DOWN: PING CRITICAL - Packet loss = 100% [22:13:26] PROBLEM - Host es1 is DOWN: PING CRITICAL - Packet loss = 100% [22:14:06] (03PS3) 
10Jeremyb: import sources: move chapter wikis to own section [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103732 [22:14:55] PROBLEM - Host es2 is DOWN: PING CRITICAL - Packet loss = 100% [22:14:56] (03CR) 10Reedy: Removed fileJournal config; this has been unused for some time (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104316 (owner: 10Aaron Schulz) [22:15:02] andrewbogott: if you see errors lemme know, I'll have a look [22:15:09] in particular errors with icinga restart [22:15:11] jeremyb: don't we use 'w' for wikipedia in $wgImportSources? [22:17:32] (03CR) 10Reedy: [C: 04-1] "Annotations were merged in https://gerrit.wikimedia.org/r/#/c/102739/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104482 (owner: 10Dan-nl) [22:18:03] !log reedy synchronized wmf-config/ [22:18:22] Logged the message, Master [22:18:46] twkozlowski: i would mostly leave that up to the discretion of the wiki. most important is that the source is functional. (that we don't add a non-working source to the list for a given wiki) and second most important is that the people using it know where a given source actually points. some cases i changed recently because "this should work at all" and "this should not work just because it was an accident that there was a matching lang" [22:18:56] > some cases i changed recently because "this should work at all" and "this should not work just because it was an accident that there was a matching lang" trumps the local wiki's desires [22:19:00] ah here it goes [22:19:09] twkozlowski: is there a reason to prefer 'w' generally? [22:19:38] Just habit, I think [22:19:41] (03PS5) 10Reedy: Make missing.php aware of interwiki prefixes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94716 (owner: 10TTO) [22:19:49] I never linked to a Wikipedia page with 'wikipedia:xyz' [22:20:04] twkozlowski: did you read some of the commit msgs? or the inline comments between nemo and i? 
[22:20:11] (03Abandoned) 10Reedy: Allow 'crats on test2wiki to give oversight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101030 (owner: 10Reedy) [22:20:28] (03PS3) 10Reedy: Simplify Drafts related TitleQuickPermissions hook subscriber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 [22:20:58] jeremyb: Nemo said 'no opinion' as far as I can see [22:20:59] oh, this week is alex, cool. /me will try him tomorrow morning :P [22:21:03] ok so andrewbogott, it looks good, try logging in [22:21:07] twkozlowski: look harder :) [22:21:39] No. [22:22:07] apergos: do I need to restart icinga by hand or did puppet do that already? [22:22:15] done for you [22:22:42] it would only be a problem if we had config errors but those have been fixed up now, yay [22:22:46] (03PS4) 10Reedy: Simplify Drafts related TitleQuickPermissions hook subscriber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 [22:23:14] (03PS3) 10Reedy: Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [22:23:19] (03CR) 10Reedy: [C: 032] Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [22:23:21] well, if I can figure out how to log out so I can log back in... [22:23:30] (03Merged) 10jenkins-bot: Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [22:23:31] you gave the new url? 
[22:23:43] (03PS3) 10Reedy: Logo configuration for ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103184 (owner: 10Ebrahim) [22:23:51] (03CR) 10Reedy: [C: 032] Logo configuration for ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103184 (owner: 10Ebrahim) [22:24:01] (03Merged) 10jenkins-bot: Logo configuration for ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103184 (owner: 10Ebrahim) [22:24:05] twkozlowski: oh, sorry, wasn't inline. https://gerrit.wikimedia.org/r/103730 (12-26 01:26) [22:24:09] andrewbogott: [22:24:24] apergos: https://icinga-admin.wikimedia.org/icinga/ you mean? [22:24:28] yep [22:24:40] you should not need to log out [22:24:47] (03PS4) 10Reedy: import sources: move chapter wikis to own section [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103732 (owner: 10Jeremyb) [22:24:49] unless you gave your wikitech name in uppercase or something [22:24:54] (03CR) 10Reedy: [C: 032] import sources: move chapter wikis to own section [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103732 (owner: 10Jeremyb) [22:25:12] apergos: yeah, I'm logged in with mixed case [22:25:18] (03Merged) 10jenkins-bot: import sources: move chapter wikis to own section [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/103732 (owner: 10Jeremyb) [22:25:19] ah [22:25:21] hah, Reedy rerebased. (a few mins after i did) [22:25:29] btw, thanks for all the merges :) [22:26:50] andrewbogott: you could go explicitly to https://username@icinga-admin.wikimedia.org/icinga/ ? [22:27:03] you're going to have to exit your browser to log out [22:27:08] with the right username or even a bogus name (literally username@) [22:27:13] basic auth, see [22:27:15] and then it will maybe reprompt you [22:27:38] oh, huh, I haven't tried that, dunno [22:27:40] !log reedy synchronized wmf-config/ [22:27:57] Logged the message, Master [22:28:17] apergos: that worked! 
So now maybe icinga will simmer down [22:28:23] great [22:28:26] (CR) Aaron Schulz: Removed fileJournal config; this has been unused for some time (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104316 (owner: Aaron Schulz) [22:28:32] I still don't understand those warnings, though, puppet was working just fine on those boxes :( [22:28:45] virt100x eh? [22:28:49] yeah [22:28:58] well, 1007 is broken, but the others look just fine when I log in [22:29:05] 'virt1001', #3687 renamed virt1001-3 [22:29:05] 'virt1002', [22:29:05] 'virt1003', [22:29:10] these are in decommissioned.pp! [22:29:12] wtf [22:29:18] so... fix that, fix icinga :-D [22:29:26] hm, ok... [22:29:36] I mean they aren't decommissioned any more are they? [22:29:48] nope! [22:29:53] all righty then! [22:31:00] after you toss those the cleanup cron job will stop tossing their exported resources which puppet runs on the hosts keep trying to put back in [22:31:08] (PS1) Andrew Bogott: Recomission virt1001, 1002, 1003. [operations/puppet] - https://gerrit.wikimedia.org/r/104676 [22:31:20] and then icinga will behave properly, just like clockwork :-) [22:34:03] apergos, ok, ^^. I'm about to go, will turn icinga back on for those hosts when I'm around to deal with the fallout. [22:34:20] all up to you [22:34:23] happy trails! [22:36:48] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 22:36:40 UTC 2013 [22:37:17] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 22:37:10 UTC 2013 [22:37:17] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 10:06:53 PM UTC [22:37:28] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 30 Dec 2013 10:36:40 PM UTC [22:41:53] and we wait for neon puppet to run again ....
[22:43:29] after puppet runs on virt1001 and virt1003 again [22:43:35] * apergos wanders off, too bored to wait [22:44:24] !log jenkins job mediawiki-core-lint is broken since 20:35 UTC roughly [22:44:42] Logged the message, Master [22:53:25] (Abandoned) Dan-nl: adding '*.raa.se' to the wgCopyUploadsDomains array. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104482 (owner: Dan-nl) [22:57:07] (PS1) Dan-nl: adding '*.raa.se' to the wgCopyUploadsDomains array. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104683 [22:57:37] !log Jenkins on gallium, /var/lib/jenkins/jobs/mediawiki-core-lint/builds eventually reached the maximum possible number of file entries. Deleting a bunch [22:57:53] Logged the message, Master [23:05:39] !log Jenkins purged build history of mediawiki-core-lint from before July 2013. [23:05:57] Logged the message, Master [23:06:48] RECOVERY - Puppet freshness on virt1001 is OK: puppet ran at Mon Dec 30 23:06:39 UTC 2013 [23:06:49] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Mon Dec 30 23:06:44 UTC 2013 [23:10:38] !log Jenkins / gallium deleting build history from 2012 for operations* and mediawiki* jobs. That is running in a screen. [23:10:56] Logged the message, Master [23:14:01] (PS3) Aaron Schulz: Removed fileJournal config; this has been unused for some time [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104316 [23:14:23] (CR) Aaron Schulz: [C: 2] Removed fileJournal config; this has been unused for some time [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104316 (owner: Aaron Schulz) [23:14:38] (Merged) jenkins-bot: Removed fileJournal config; this has been unused for some time [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104316 (owner: Aaron Schulz) [23:15:50] is there any way to see the queries run for a page in production?
Trying to debug some watchlist issues that dont happen locally or on our labs instance, but are happening on test wiki [23:16:10] tried using the ?forcetrace=1 parameter, but because the watchlist query is so long it gets truncated inside the field list [23:16:33] db queries? [23:16:43] yes, specifically the main watchlist query [23:17:08] !log aaron synchronized wmf-config/filebackend.php 'Removed fileJournal config; this has been unused for some time' [23:17:10] everything looks to be correct, but its not finding the right answer. Hoping that by looking at the query it issues can figure out whats different [23:17:24] Logged the message, Master [23:21:41] ebernhardson: there's https://ishmael.wikimedia.org/ [23:21:55] since you're a deployer, you also have raw sql access to the production databases [23:27:10] ori: peeked at the prod db for testwiki already, it looks correct and my test queries of what i think it should be are returning the correct result. Its quite the odd bug [23:27:30] hmm, i dont seem to have access to ishmael, its not accepting my login (it says wiki id, so same as office wiki?) [23:27:38] nope, labs [23:30:38] hmm, doesn't like my labs login either? odd. 
'The server says: WMF Labs (use wiki login name not shell)', but its not accepting "EBernhardson (WMF)" (wiki login) or "ebernhardson" (labs) [23:32:09] you might not be in the list [23:32:47] jgage: so in octoberish we reverted a change that had been sending out all needed fonts to view the main www.wikipedia.org [23:32:58] because that meant that everyone going there got every font known to humans [23:33:01] which was a lot of traffic [23:33:20] oh [23:33:23] it was around octoberish [23:33:24] wikitech username [23:33:25] oh wow [23:33:30] uh [23:34:16] in the graphs the colors change as well a few times due to moving ports around - we have switched ams-ix from our kenniset (AMS) to our evoswitch (also AMS) locations, and turned it into a larger port [23:34:28] ebernhardson: do you have a wikitech account? [23:34:28] gotcha [23:34:35] because that's what needs to happen [23:35:08] and hopefully after we get the new transit and transport links hooked up, the colors will shift again as we move traffic back to ulsfo [23:35:19] we currently cannot, since our transit and transport there have been incredibly unstable [23:35:33] huh. lame. [23:35:44] yeah [23:35:46] apergos: yup, as "EBernhardson" [23:36:10] apergos: but that doesn't log into ishamel either, i may just not be setup with access yet [23:36:26] plus, this new transit provider also has a lot better routes to asia than the other one [23:36:38] we have some peering as well there (need to work on a bit more of that!) [23:36:53] and joel is working on actually getting netflow data using pmacct [23:37:53] Reedy: how hard would it be to automatically give all extensions wmf branches with new versions? [23:37:59] pmacct sounds cool [23:38:26] that way the cherry-pick-to button could work on them [23:39:15] Aaron|home: What do you mean? [23:39:37] You mean, for every extension we use, make a deployment branch for each mediawiki version? 
[23:40:14] yes [23:40:37] That's what $branchedExtensions is for [23:41:11] We could update the code to remove that if we no longer need it, and make $normalExtensions branch [23:41:22] Leaving $specialExtensions as is etc [23:41:28] I wondeer what ldap groups you're in [23:41:34] mayb not the right ones [23:41:41] I just want it to be easier to backport changes [23:42:13] Mmmm [23:42:29] Shouldn't need much code changes to make-wmf-branch [23:44:22] ori: hmm, MergeCdbFileUpdates would be a bit faster if it just compared .json and .cdb file timestamp and the perms to set them to match... [23:44:46] that's what rsync does (well, it also checks size for good measure) [23:44:55] that would also eliminate the md5 files [23:45:18] or at least regulate their use to post-build sanity checks [23:45:41] you're reimplementing git :P [23:46:46] ebernhardson: you're not in the wmf ldap group as it turns out [23:49:27] apergos: ahha, that would do it :) who should i email to get that sorted out? [23:49:58] I'm looking at it [23:52:22] duly added [23:53:00] apergos: and it works now, thanks! [23:53:05] sweet [23:55:48] bedtime for bonzo [23:55:49] see yas!
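Annotation on the [23:44:22] suggestion: the idea of skipping the md5 files and deciding from file metadata alone can be sketched roughly as below. This is only an illustration under assumptions, not the actual MergeCdbFileUpdates code: the function names needs_rebuild and mark_in_sync are hypothetical, and it reads "timestamp and the perms to set them to match" as "after rebuilding, copy the .json's mtime and permission bits onto the .cdb". The size check mentioned for rsync is left out because a .json and its .cdb are different formats and would not have equal sizes.

```python
import os


def needs_rebuild(json_path, cdb_path):
    """Return True when the .cdb is missing or its mtime differs from the .json.

    Hypothetical helper: a metadata-only check in the spirit of rsync's
    default comparison, used here instead of checksumming file contents.
    """
    try:
        cdb_stat = os.stat(cdb_path)
    except FileNotFoundError:
        return True
    json_stat = os.stat(json_path)
    # Compare whole seconds so filesystems with coarse timestamps still match.
    return int(json_stat.st_mtime) != int(cdb_stat.st_mtime)


def mark_in_sync(json_path, cdb_path):
    """After a rebuild, copy the .json's timestamps and permission bits to the .cdb."""
    json_stat = os.stat(json_path)
    os.utime(cdb_path, (json_stat.st_atime, json_stat.st_mtime))
    os.chmod(cdb_path, json_stat.st_mode & 0o7777)
```

A caller would rebuild the .cdb only when needs_rebuild() returns True and then call mark_in_sync(), so that the next run can skip the file; checksums would then only be needed for the post-build sanity checks mentioned at [23:45:18].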