[00:14:13] (CR) Ori.livneh: [C: 2] role::performance: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163368 (owner: BryanDavis) [00:14:58] (CR) Ori.livneh: [C: 2] ocg: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163375 (owner: BryanDavis) [00:24:53] (PS4) Ori.livneh: role::parsoid: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163373 (owner: BryanDavis) [00:24:58] (CR) Ori.livneh: [C: 2 V: 2] role::parsoid: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163373 (owner: BryanDavis) [00:33:46] (PS4) Ori.livneh: role::elasticsearch: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163364 (owner: BryanDavis) [00:34:02] (CR) Ori.livneh: [C: 2 V: 2] role::elasticsearch: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163364 (owner: BryanDavis) [00:34:05] go ori go :) [00:37:29] (CR) Ori.livneh: [C: 2 V: 2] role::analytics: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163366 (owner: BryanDavis) [00:38:45] (PS4) Ori.livneh: role::ci::slave: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163365 (owner: BryanDavis) [00:41:32] (CR) Ori.livneh: [C: 2] role::ci::slave: Convert to package<|provider==trebuchet|> [puppet] - https://gerrit.wikimedia.org/r/163365 (owner: BryanDavis) [00:43:51] (PS4) Ori.livneh: Remove deployment::target [puppet] - https://gerrit.wikimedia.org/r/163376 (owner: BryanDavis) [00:43:58] (CR) Ori.livneh: [C: 2 V: 2] Remove deployment::target [puppet] - https://gerrit.wikimedia.org/r/163376 (owner: BryanDavis) [00:44:57] bd808: it's gone! thanks [00:45:10] w00t! 
[00:45:11] i applied each of those on the relevant hosts to confirm [00:51:14] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 313 seconds [00:51:47] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 348 seconds [00:52:47] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds [00:53:27] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:54:29] humph [01:16:13] (PS1) Dzahn: ssl_ciphersuite - add new compat mode [puppet] - https://gerrit.wikimedia.org/r/166710 [02:15:32] !log LocalisationUpdate completed (1.25wmf2) at 2014-10-15 02:15:31+00:00 [02:15:41] Logged the message, Master [02:28:00] !log LocalisationUpdate completed (1.25wmf3) at 2014-10-15 02:28:00+00:00 [02:28:09] Logged the message, Master [02:32:26] (PS2) Ori.livneh: mediawiki::monitoring::webserver: tidy [puppet] - https://gerrit.wikimedia.org/r/166196 [02:32:34] (CR) Ori.livneh: [C: 2 V: 2] mediawiki::monitoring::webserver: tidy [puppet] - https://gerrit.wikimedia.org/r/166196 (owner: Ori.livneh) [03:34:50] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 15 03:34:50 UTC 2014 (duration 34m 49s) [03:34:58] Logged the message, Master [06:22:00] PROBLEM - Apache HTTP on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:22:20] PROBLEM - HHVM rendering on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:50] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.050 second response time [06:24:11] RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 67869 bytes in 0.185 second response time [06:28:29] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [06:28:30] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail [06:28:40] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: puppet fail [06:28:50] 
PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: puppet fail [06:28:51] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail [06:29:09] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:21] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:29] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:33] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:39] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:40] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:40] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:50] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:00] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:00] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:19] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:20] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:20] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:21] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:29] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:30] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:39] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:39:10] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: Puppet has 1 failures [06:43:09] PROBLEM - Host mw1205 is DOWN: PING CRITICAL - Packet loss = 100% [06:45:09] ACKNOWLEDGEMENT - RAID on db1051 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean 
Pringle RT 8650 [06:45:36] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:45:39] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:45:49] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:46:11] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:12] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:12] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:46:33] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:46:33] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:46:34] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:46:35] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:46:39] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:46:39] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [06:46:39] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:46:40] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:46:50] RECOVERY - puppet last run on db2036 is OK: 
OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [06:46:59] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:47:00] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:47:09] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:47:10] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [06:47:19] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:47:20] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:48:30] (PS1) ArielGlenn: beta: bastion ssh rule needs the class that defines bastion ip [puppet] - https://gerrit.wikimedia.org/r/166717 [06:49:41] (CR) ArielGlenn: [C: 2] beta: bastion ssh rule needs the class that defines bastion ip [puppet] - https://gerrit.wikimedia.org/r/166717 (owner: ArielGlenn) [06:55:50] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:58:43] <_joe_> !log restarting hhvm on mw1114 to avoid memory exhaustion [06:58:50] Logged the message, Master [07:26:55] (CR) TTO: "No results for `git grep langlist` in operations/puppet, and it looks like operations/dns has its own language list [1]. I couldn't see a " [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: TTO) [07:39:19] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: puppet fail [07:40:19] _joe_: memleak ftw! 
[07:41:25] <_joe_> paravoid: :/ [07:52:13] !log ongoing schema changes rev_content_(model|format) multiple shards, ok to kill osc_host.sh jobs on terbium in emergency [07:52:23] Logged the message, Master [07:55:12] (CR) Giuseppe Lavagetto: [C: 1] "LGTM, but care to explain the changes in the chiphersuite?" [puppet] - https://gerrit.wikimedia.org/r/166710 (owner: Dzahn) [07:59:00] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 65 seconds ago with 0 failures [08:01:11] (CR) Giuseppe Lavagetto: [C: -1] "One small implementation detail, but also - we should be careful with this on the API cluster, where I have repeatedly seen servers in a b" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/130296 (owner: ArielGlenn) [08:06:32] !log Jenkins: upgrading Gearman plugin to Patchset 9 of https://review.openstack.org/#/c/125755/ [08:06:40] Logged the message, Master [08:09:40] !log restarting Jenkins [08:09:45] Logged the message, Master [08:10:26] (PS4) Giuseppe Lavagetto: rsync: qualify vars [puppet] - https://gerrit.wikimedia.org/r/159462 (owner: Matanya) [08:40:57] (PS5) Giuseppe Lavagetto: rsync: qualify vars [puppet] - https://gerrit.wikimedia.org/r/159462 (owner: Matanya) [08:59:04] (CR) Giuseppe Lavagetto: [C: 2] "http://puppet-compiler.wmflabs.org/419/change/159462/html/" [puppet] - https://gerrit.wikimedia.org/r/159462 (owner: Matanya) [09:11:53] (CR) Filippo Giunchedi: [C: 1] ssl_ciphersuite - add new compat mode [puppet] - https://gerrit.wikimedia.org/r/166710 (owner: Dzahn) [09:13:55] (CR) Filippo Giunchedi: [C: -1] "shouldn't this be already in the debian package?" [puppet] - https://gerrit.wikimedia.org/r/166686 (owner: John F. 
Lewis) [09:15:18] paravoid: re: the ms-be1013 issues on friday, it turns out sdl was failing too :( [09:24:19] !log enable container sync for commons containers [09:24:26] Logged the message, Master [09:38:05] godog: :( [09:44:12] (CR) Hashar: [C: -1] import LogFormat s from apache2 package [puppet] - https://gerrit.wikimedia.org/r/162541 (owner: Jeremyb) [09:54:52] (PS2) Filippo Giunchedi: Fix doc header for ensure_link [puppet] - https://gerrit.wikimedia.org/r/159174 (owner: BryanDavis) [09:54:59] (CR) Filippo Giunchedi: [C: 2 V: 2] Fix doc header for ensure_link [puppet] - https://gerrit.wikimedia.org/r/159174 (owner: BryanDavis) [10:27:28] <_joe_> !log depooling mw1114, stopping puppet for debugging purposes [10:27:35] Logged the message, Master [10:35:28] (PS1) ArielGlenn: Revert "Remove deployment::target" [puppet] - https://gerrit.wikimedia.org/r/166720 [10:35:34] PROBLEM - Apache HTTP on mw1114 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.031 second response time [10:35:42] (PS2) ArielGlenn: Revert "Remove deployment::target" [puppet] - https://gerrit.wikimedia.org/r/166720 [10:38:35] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.066 second response time [10:40:04] (CR) ArielGlenn: [C: 2] Revert "Remove deployment::target" [puppet] - https://gerrit.wikimedia.org/r/166720 (owner: ArielGlenn) [10:40:52] ew on that commit message :) [10:40:59] not line-wrapped [10:47:16] <_joe_> !log repooled mw1114 with reduced load, using jemalloc with prof_leak enabled for sampling. 
will depool again soon [10:47:21] Logged the message, Master [10:48:48] (PS1) ArielGlenn: use the old deployment::target class for trebuchet test repo [puppet] - https://gerrit.wikimedia.org/r/166729 [10:51:01] (CR) ArielGlenn: [C: 2] use the old deployment::target class for trebuchet test repo [puppet] - https://gerrit.wikimedia.org/r/166729 (owner: ArielGlenn) [10:52:23] PROBLEM - HHVM rendering on mw1114 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.007 second response time [10:53:12] PROBLEM - Apache HTTP on mw1114 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.011 second response time [10:54:03] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.138 second response time [10:54:35] RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 67869 bytes in 0.585 second response time [10:59:50] !log AMS-IX renumbering: adding second IP, peering with RS1 [10:59:56] Logged the message, Master [11:38:04] <_joe_> !log depooling mw1114 again [11:38:12] Logged the message, Master [11:51:42] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [11:52:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [12:05:54] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [12:11:23] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [12:15:18] (CR) Ori.livneh: "This aspect of Trebuchet's design is pretty dumb. Rather than map a deployment target to an arbitrary string that is usually the same as t" [puppet] - https://gerrit.wikimedia.org/r/166720 (owner: ArielGlenn) [12:21:16] (CR) Ori.livneh: "(This becomes pretty obvious when you look at the config file: . 
It's pretty useless.)" [puppet] - https://gerrit.wikimedia.org/r/166720 (owner: ArielGlenn) [12:29:41] (PS1) Ori.livneh: trebuchet: derive the grain name from the repo name [puppet] - https://gerrit.wikimedia.org/r/166736 [12:29:59] ^ apergos [12:30:24] that means people cannot use one grain for several repos [12:30:49] which was a feature of the previous setup [12:30:55] ori_: [12:31:12] not really a feature, if you ask me [12:31:27] how's the offsite, ori? [12:31:27] where is it anyway? :) [12:31:36] well hashar uses it, as an example [12:31:51] san diego [12:31:51] mark: so-so, tbh [12:32:20] several integration repos, one grain to cover them instead of a grain for each [12:32:23] apergos: because the relationship between the repositories ends up being declared in a giant hash blob in the configuration of trebuchet in role::deployment. the proper way to do it would be to enclose them in the role where they belong. [12:32:25] ah, california, but at least not san francisco [12:33:26] yes, having them in the role would be ok, as long as the same flexbility is there [12:33:35] apergos: what makes more sense: to have role::apt, in which you declare apt::virtual_package { 'integration': packages => ['foo', 'bar', buzz'] } [12:33:43] (CR) CSteipp: [C: 1] ssl_ciphersuite - add new compat mode [puppet] - https://gerrit.wikimedia.org/r/166710 (owner: Dzahn) [12:33:56] or to have role::integration, in which you have: package { [ 'foo', 'bar', buzz' ]: } [12:34:05] so your offsite is still going on at 5am? [12:34:33] !log Zuul frozen \O/ [12:34:39] Logged the message, Master [12:35:20] my gut feeling is to go for the second tbh [12:35:34] yeah, i think so too [12:35:37] but you should probbly check with a couple ctive users of the system [12:37:17] I see the '' key is bck to its old tricks... [12:37:22] * apergos glres at the keyboard [12:37:22] do you have any spare time to help me with that? 
i'm not sure i have the time to do that adequately [12:37:52] ori_: I am happy to a) chat about it with you, b) review stuff, test stuff [12:38:01] if you need a c) you will hve to sk [12:38:43] lso do you want a bz report for this? [12:38:46] c) but you should probbly check with a couple ctive users of the system [12:39:01] yep I cn do that also [12:39:15] what do you think is appropriate? imo, we can just perform this migration [12:39:35] if it's not going to tke more than a day I sy don't bother with the report [12:39:44] long as it's documented in gerrit, which it is [12:39:58] it's back! (the 'a') [12:40:17] let me not push my luck :-D [12:40:24] chasemp: hah, we can't actually use shinken autodiscovery, since that's with 2.0 and we don't have packages for that... [12:40:30] 2.0 package just landed on sid... [12:40:42] !log disabled/reenabled gearman plugin at https://integration.wikimedia.org/ci/manage [12:40:50] apergos: george perec wrote an entire book in french without using the letter 'e', and someone translated it to english http://en.wikipedia.org/wiki/A_Void [12:40:50] Logged the message, Master [12:41:52] "Augustus, who has had a bad night, sits up blinking and purblind. Oh what was that word (is his thought) that ran through my brain all night, that idiotic word that, hard as I'd try to pun it down, was always just an inch or two out of my grasp - fowl or foul or Vow or Voyal? - a word which, by association, brought into play an incongruous mass and magma of nouns, idioms, slogans and sayings, a confusing, amorphous outp [12:41:52] ouring which I sought in vain to control or turn off but which wound around my mind a whirlwind of a cord" ... [12:42:14] might have to read that! 
an odd variant is to write without any gender pronouns and have it be natural [12:42:51] nice that the translators stuck to the same rule [12:43:13] yes, it's almost a greater achievement, because it adds an additional constraint [12:43:30] (you have to be faithful to the original *and* avoid the letter) [12:44:34] yep [12:44:51] mark: we were supposed to come up with a mission statement on day 1, day 2 ended without success and arthur gave up and told us to do it at home [12:45:11] it's now homework for robla [12:45:19] * ori_ isn't making this up [12:46:13] ok [12:46:20] while True should always be banned [12:46:24] i see [12:46:29] wrong chan [12:46:32] RECOVERY - RAID on db1051 is OK: OK: optimal, 1 logical, 2 physical [12:49:27] 'while 1:' is faster, but i doubt that's what you meant [12:49:43] clearly you must recurse upon yourself and make sure your language supports tail calls [12:49:55] err, optimizes tail calls [12:51:24] ocaml! [12:53:04] assembly! [12:55:11] YuviPanda: it really is faster (at least in cpython), if only nominally: https://dpaste.de/pLXj/raw [12:55:29] ori_: don't you need to change the custom provider to add the grain with the repo name instead of the base name? [12:55:53] haha :D [12:55:59] apergos: probably [12:56:59] YuviPanda: it's because you can reassign True in the loop [12:57:08] so it has to check at every iteration [12:57:11] wait, you can re-assign True in python? [12:57:16] yes [12:57:22] True = False [12:57:23] oh god [12:57:24] is valid [12:57:25] I didn't realize [12:57:27] just tried [12:58:03] ori_: can't re-assign 1, I guess. [12:58:04] hence LOAD_GLOBAL, POP_JUMP_IF_FALSE on each iteration [12:58:42] right, that makes sense... [12:58:49] but True being assignable, not so much... [12:59:00] hmm, so if I set True = False, I can't get True back... 
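A minimal sketch of the `while True` versus `while 1` point made above, written for Python 3 rather than the CPython 2 under discussion. In Python 3 `True` is a keyword and can no longer be rebound, so the module-level name `FLAG` stands in for Python 2's reassignable `True`; `FLAG` and all helper names here are hypothetical and chosen only for illustration.

```python
import dis

# Stand-in for Python 2's global, reassignable name `True`.
FLAG = True


def loop_global():
    """Loop whose condition is a global name, looked up on every iteration."""
    n = 0
    while FLAG:
        n += 1
        if n >= 3:
            break
    return n


def loop_const():
    """Loop whose condition is the constant 1, which needs no name lookup."""
    n = 0
    while 1:
        n += 1
        if n >= 3:
            break
    return n


def opnames(func):
    """Return the bytecode instruction names of a function."""
    return [ins.opname for ins in dis.get_instructions(func)]


def can_rebind_true():
    """Report whether `True = False` even compiles on this interpreter."""
    try:
        compile("True = False", "<demo>", "exec")
    except SyntaxError:
        return False
    return True
```

Comparing `opnames(loop_global)` with `opnames(loop_const)` shows the global lookup in the first loop only, which is the per-iteration cost mentioned in the channel.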
[12:59:01] oh wait [12:59:02] of course I can [12:59:09] True = (True == False) [12:59:11] would get that back [12:59:22] and it does [12:59:55] or 'True = not True' [13:00:45] we need an icinga alert for that ;) [13:01:45] CRITICAL: I don't even [13:02:38] apergos: i'll amend the patch when i'm actually awake [13:02:41] SANITY is CRITICAL: world ending [13:03:07] I can't believe you're even on line now [13:25:33] (CR) ArielGlenn: "also needs modification of the trebuchet provider to use the repo name for the grain instead of the base." [puppet] - https://gerrit.wikimedia.org/r/166736 (owner: Ori.livneh) [13:29:30] !log dist-upgrade and reboot indium [13:29:38] Logged the message, Master [13:32:33] any SWATters around? [13:32:46] I'm unsure how to cherry-pick two patches that are dependent on each other to the branch... [13:32:50] I suppose I could just squash them [13:33:10] ? [13:33:18] just cherry pick them both? [13:33:45] Reedy: I cherry picked https://gerrit.wikimedia.org/r/#/c/166543/ but attempting to cherry-pick https://gerrit.wikimedia.org/r/#/c/166673/ shows a merge conflict [13:34:06] do it locally? [13:34:14] lazy :P [13:34:15] but fine [13:34:39] you might not even have to rebase or anything locally [13:34:44] if it's just jgit being shit [13:35:02] Reedy: I'll need to give 'em new change-ids tho [13:35:04] but that's not too hard [13:35:09] No you won't [13:35:36] oh? [13:35:53] if it's on a different branch/repo it can share a change-id [13:36:23] https://gerrit.wikimedia.org/r/#/q/I85d857f018e759cec4fc0e04ee6f195242096704,n,z [13:37:53] Reedy: yeah, done. thanks [13:40:22] Reedy: since you're around (what is it with you guys, all on line at ridiculous-o-clock?)... any idea who manages deployment-memc02 and 03 in beta? 
asking you cause your nme was on a changeset with them in it :-P [13:40:41] apergos: I'm at home [13:40:48] I didn't get to go to the core offsite :( [13:40:56] ah, ugh [13:40:57] mark___: When are you doing the Ams-ix renumbering thing? You're one of the few peers still in the old range [13:41:15] multichill|work: paravoid did some stuff earlier [13:41:44] we are in the process of renumbering, yes [13:42:07] what do you mean "still in the old range" though [13:42:07] apergos: Anyway. I don't think anyone specifically manages them. Just a handful of people that will do stuff if necessary. Wassup? [13:42:10] we're not, we like the old range better [13:42:15] most peers are in both ranges atm [13:42:23] I see they hve a salt mster running on them, not that their minions talk to it... [13:42:28] seemed pretty bizarre to me [13:42:29] I know, I'm in both too [13:43:03] apergos: Sounds like someone got box checking happy or something [13:43:07] On a Juniper the change is very easy [13:43:21] you who? [13:43:21] apergos: With the dedicated salt/puppet master, seems a bit redundant for them to be salt masters too [13:43:24] I would just shoot it and deinstall tht package (I looked at the configure list .. course, if someone checked it and then unchecked it the package would be round forever) [13:43:58] I'm gonna go ahead and toss then [13:44:05] WFM [13:44:49] Who are you refering to paravoid? I'm 195.69.144.252 / 80.249.208.252 / AS1126 ;-) [13:45:15] aka "knams" to us ;) [13:45:21] ah, I didn't know [13:45:43] although that's a -really- misleading name now [13:46:03] For a change I'm actually sitting in the DC. 
It's still in Amsterdam and Kennisnet is still a customer [13:46:39] true, although we have nothing to do with kn anymore [13:55:42] mark: You're welcome to rename it to vaams :P [13:55:46] We call it asd01 [13:55:54] we don't rename ever ;) [13:56:10] too much work for too little gain, hehe [14:00:46] Hi all, it looks like some e-mail sent to our OTRS queue get lost. Does someone know if there's any way to figure out what is wrong? [14:01:21] Which queue is that? [14:01:28] Reedy: wm-cz@wikimedia.org [14:01:49] Do you know of any specific emails that are missing? ie from a certain address maybe? (you don't need to post it publicly) [14:02:50] Reedy: reportedly this one: Oct 14 23:10:56 pravy postfix/smtp[30275]: 35C30400B7: to=, orig_to=, relay=polonium.wikimedia.org[208.80.154.90]:25, delay=2.4, delays=0.02/0.01/1.3/1.1, dsn=2.0.0, status=sent (250 OK id=1Xe9Mt-0001j2-QK) [14:03:20] Reedy: (the timezone is UTC+2) [14:04:37] Reedy: these are relayed from our mailserver through info@wikimedia.cz address [14:06:03] mark: True, we had to rename .sara.nl to vancis.net, I just put all the new stuff in the new domain and kept the old stuff in the old one [14:06:17] that's what we did too [14:06:30] i think we just finally got rid of the .knams. that were actually moved to esams back in 2008 ;) [14:07:01] I remember that. Around christmas and some battle scars? [14:07:26] yep [14:09:43] Something else Mark. Talked with Sillke the other day when I was in Berlin. She doesn't really have a good donor for the recent servers. You any ideas or want to claim it? [14:09:53] Three decent recent Dell servers + storage [14:10:50] i already replied to her once [14:11:04] yeah those recent ones we could take, even though we can'd do all that much in amsterdam with it [14:11:06] but can't hurt to mount it there [14:11:12] the rest is... 
scrap metal [14:11:33] <_joe_> !log powercycling mw1205, down since this morning, console blank [14:11:40] Logged the message, Master [14:12:16] !log reimaging mw1035 for great justice!!! (HHVM) [14:12:21] Logged the message, Master [14:13:17] In de Waarderpolder there's a company that buys old hardware, used them once or twice. Could be used for the scrap metal so at least it won't cost any money [14:14:23] RECOVERY - Host mw1205 is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms [14:14:30] yup, i offered her to take care of that [14:14:38] we have some old equipment that needs to go too [14:14:56] Good to bundle, you might even get some money or toys from it [14:15:18] i was waiting for the toolserver stuff anyway [14:15:22] might as well do it all at once [14:15:26] it's just sitting in a rack now [14:16:36] !log AMS-IX renumbering: peering with (renumbered) top-10 ASNs + ASNs with large number of prefixes [14:16:45] Logged the message, Master [14:19:29] The old Dell squid servers? [14:19:38] those too indeed [14:19:41] some of them [14:20:06] !log Not reimaging mw1035 after all; hhvm is in our base, killing our ramz. [14:20:11] Logged the message, Master [14:20:17] Damn, classics! Making room for some new caching servers? [14:22:14] not really room we need soon [14:32:53] !log AMS-IX renumbering: move all remaining ASNs to the new space [14:33:01] Logged the message, Master [14:35:36] (PS1) Filippo Giunchedi: ganglia: use install2001 public name [puppet] - https://gerrit.wikimedia.org/r/166746 [14:36:07] (CR) Filippo Giunchedi: [C: 2 V: 2] ganglia: use install2001 public name [puppet] - https://gerrit.wikimedia.org/r/166746 (owner: Filippo Giunchedi) [14:37:44] OK...let's see what's on the SWATdar today [14:38:13] * YuviPanda waves at marktraceur [14:38:20] I've a few things for SWAT, but they're for wikitech! 
[14:38:30] I have no idea what that means [14:38:32] hopefully andrewbogott comes by in the meantime so he can run sync-common [14:38:37] Ah. [14:38:47] So I can get it up but it won't do anything, basically? [14:38:53] yeah, but that's ok. [14:39:00] if andrew doesn't turn up I can always bug Core.n [14:39:23] Looks like editing have been busy. [14:39:33] Nothing too complicated [14:40:37] PROBLEM - Host payments1003 is DOWN: PING CRITICAL - Packet loss = 100% [14:41:56] oops, forgot to enable down-for-maintenance before rebooting payments1003, pls ignore ^^^ [14:42:14] ok [14:45:29] RECOVERY - Host payments1003 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms [14:52:24] (CR) Alexandros Kosiaris: [C: 2] Beta: Define cxserver port [puppet] - https://gerrit.wikimedia.org/r/166542 (owner: KartikMistry) [14:52:53] marktraceur: I'm stepping away for a few mins but should be back in 10 [14:53:07] KK [14:53:13] I can do James's stuff first [14:54:35] (CR) Alexandros Kosiaris: [C: 2 V: 2] Added initial Debian packaging [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/165471 (owner: KartikMistry) [14:59:13] All right, it's time to play everyone's favorite game show: Sync or Swim! I'm your host marktraceur, here with the lovely Scap Dancers, ready for a fun morning ahead. [14:59:44] Our first contestant is James_F|Away. Oh, dear, has he stepped out to the restroom? [14:59:54] I was going to ask you to do yuvi's patches first, but maybe James has dibs [15:00:05] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141015T1500). Please do the needful. 
[15:00:09] andrewbogott: YuviPanda's not here for about 2 more minutes [15:00:23] marktraceur: I'll stand in for James [15:00:26] But right now neither of...aha [15:00:29] I wrote the patches anyway :) [15:00:32] OK, RoanKattouw first [15:00:43] Would it help if I told you that I was wearing a brightly colored printed shirt and showed great enthusiasm during the pre-show interview? [15:00:55] andrewbogott: heya! SWAT is now, I think, and I've the api patches set up for it. [15:00:58] andrewbogott: Yes, but RoanKattouw is Dutch, and that's weird. [15:01:07] I'm not listed because I'm not usually awake at 8am ever, but jetlag [15:01:10] fair point [15:01:49] All right, +2 in, waiting on Jenkins [15:03:10] andrewbogott: pre-show interview? [15:04:46] Based on my exuberant game show entrance [15:05:00] aaaah, I think my network made me miss that [15:07:35] Hm… is Jenkins broken? [15:07:35] mark: Hey. [15:07:37] Bah. [15:07:39] marktraceur: Hey. [15:07:41] marktraceur: Did you not get my text? [15:07:51] Oh, er, no [15:07:57] obviously, I got your text [15:08:08] Damn tab-complete phones [15:08:11] mark: Sorry for mis-ping. [15:08:13] ;) [15:08:28] OK well, James_F and RoanKattouw, I'm doing yours now, it got merged finally [15:08:35] Thanks marktraceur. [15:09:20] Has anyone ever actually had a security patch in an extension? [15:09:30] I check every time but always disappointments [15:09:42] our extensions are always secure [15:09:55] CA does semi regularly [15:10:00] James_F: Oh, got it now [15:10:10] marktraceur: Helpful. [15:10:51] First I'll do the core changes, then VE, sound sane James_F? [15:11:03] marktraceur: It's one patch for a reason. No need to split it. [15:11:09] Ugh [15:11:26] I guess sync-dir for the entire core directory, then [15:11:29] marktraceur: But the core change is fine on its own if you push that first. [15:11:47] I'd rather, yeah. Maybe just symbolic. :) [15:12:10] are you making a link? 
:) [15:12:26] Booooo [15:12:53] !log marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 06s) [15:13:01] Logged the message, Master [15:13:42] !log marktraceur Synchronized php-1.25wmf3/extensions/VisualEditor/modules/ve-mw/ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 05s) [15:13:45] Much faster [15:13:47] Logged the message, Master [15:13:48] Or something. [15:13:50] James_F: Testy test! [15:13:55] YuviPanda: You're next [15:14:08] * YuviPanda touches up make up, paints nails again quickly [15:14:17] OK then. [15:14:28] *five* patches? [15:14:53] I think I can do them all in two syncs [15:15:04] Oh, no, because submodule bump [15:15:07] marktraceur: yeah. [15:15:15] one of them is a security fix! Sort of [15:15:20] YuviPanda: What have you wrought?! ;-) [15:15:26] James_F: wikitech :) [15:15:34] marktraceur: wmf3 is a no-op as well, since wikitech is on wmf2 [15:15:39] Super duper [15:15:42] Oh dear. [15:15:45] YuviPanda: But it *will* be. [15:15:50] Four syncs, awayyyyyy [15:15:59] marktraceur: indeed, which is why I submitted there as well [15:16:02] ah, godog, you already got python-elasticsearch in for trusty, eh? [15:16:03] cool. [15:16:10] i was going to work on that today (or tomorrwo :/) [15:16:11] thanks! [15:16:58] marktraceur: BTW, confirmed that it works fine. [15:17:13] James_F: Sweet, thanks [15:17:19] andrewbogott: the memcached thing? well, the thing it would potentially be a security fix to is also getting deployed just now... [15:17:39] YuviPanda: I guess actually I might as well sync the whole bloody thing, so two syncs, one for each branch [15:18:00] marktraceur: hmm, I don't know if you can sync to wikitech, I thought only andrewbogott can do that [15:18:20] No, but if I merge things I have to sync them to the cluster. 
[15:18:25] Which is why I'm even doing this [15:18:32] ah, of course [15:19:48] gwicke: i'm thikning about working on some cassandra puppetization stuff soon! just checking in with you before I start on that [15:20:33] (03PS8) 10coren: Autogenerate chained certificates [puppet] - 10https://gerrit.wikimedia.org/r/163798 [15:21:06] bblack: ^^ rebased for great justice. [15:22:03] let there be hope for our future! [15:22:05] ottomata: yup I merged Chad's scripts but at that point was breaking puppet while trying to install python-elasticsearch, was straightforward enough :) [15:22:10] !log AMS-IX renumbering: remove old IP from interface, migration over; > 75% of total peers migrated, accounting for much more bandwidth/routes [15:22:19] Logged the message, Master [15:22:31] bblack: One thing to note is that it is probably that the generated chains don't quite match the sums of the older ones for two reasons: I insert a nl after every cert to make sure we never lack for a trailing nl (as caused a bug previously, see comments on patch 6); and I never include the root CA [15:22:52] ok [15:22:59] nl after every cert? [15:23:03] so empty newlines in the chains? [15:23:19] don't do that, not all implementations will be happy about that [15:23:33] cool, thanks godog [15:23:36] paravoid: I /could/ make it conditional for the pretties, but you really think that there are extant bits of code that would complain? [15:23:43] paravoid: that was my suggestion, based on a bug that I think is linked... [15:23:51] there are always extant bits of code that complain about everything :( [15:23:56] Heh. [15:23:56] yes [15:24:07] Well, lemme make a quick patch to only add the nl if it is actually missing. [15:24:11] perl? [15:24:15] shelling out to openssl? [15:24:32] ottomata: that's great news! 
[15:24:41] I thought we've previously agreed to use python as our preferred language for scripts [15:24:42] paravoid: In controled circumstances (basically, it can only be invoked with cert files as a parameter) [15:24:56] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 1 failures [15:25:17] Krinkle|detached: Are you around to do your patch? [15:25:32] andrewbogott: want to run sync? [15:25:51] YuviPanda: is it done? I was still waiting for Jenkins to log... [15:25:54] oh, nevermind [15:25:57] (and it's not pretty perl code either) [15:26:02] gwicke: ja so, just checking [15:26:03] I saw the +2, forgot about jenkins [15:26:09] perl -e 'system("/bin/bash -c /usr/bin/python ...")' [15:26:14] your status is that you have a small manually configure cassandra cluster right now, right? [15:26:16] not puppetized at all? [15:26:26] bblack: not that different from my $dn = `/usr/bin/openssl x509 -subject -in "$cert" -noout|sed -e 's/subject=\\s*//'`; [15:26:33] ottomata: I have some old notes at https://www.mediawiki.org/wiki/User:GWicke/Notes/Storage/Cassandra_testing; some of those might no longer be relevant, but it might give you a good idea on which files we'll need to tweak [15:26:40] ok cool [15:26:46] that will help for sure [15:26:53] ottomata: yes, right now nothing is puppetized [15:27:03] it's debianized, but we'll want to change at least two files [15:27:09] Wait [15:27:13] in /etc/cassandra [15:27:22] YuviPanda: You put patches for the extension into the SWAT queue [15:27:29] I totally didn't notice what repository it was in [15:27:44] hmm? they're just cherry picks [15:27:52] ok great [15:27:58] YuviPanda: Did the submodule bump include the patches you were linking to in the SWAT summary? [15:28:02] ottomata: there is a good amount of GC tuning in /etc/cassandra/cassandra-env.sh [15:28:08] marktraceur: yes [15:28:13] OK, well, there you go. 
[15:28:19] the per-cluster config is in /etc/cassandra/cassandra.yaml [15:28:26] I merged *something* into wmf2, not sure what at this point [15:28:58] ottomata: should we write down what we'll need to parametrize somewhere? [15:29:31] marktraceur: hmm? I'm updating my local check out to verify [15:29:36] KK [15:29:55] gwicke: ja, or, just give me copies of the files you have modified from the .deb's version [15:29:57] and I can figure it out [15:30:09] oh, gwickei can cehck from the files you just mentioned [15:30:11] what machines? [15:30:21] praseodymium, xenon and cerium [15:30:29] I think they haven't been wiped yet [15:30:30] k danke [15:31:07] ottomata: I changed a lot of things in there as part of my testing, and not all of that makes sense [15:31:26] marktraceur: you merged https://gerrit.wikimedia.org/r/#/c/166744/, which submodule bumps two commits. seems ok to me? [15:31:30] k, if you have more epxlicit instructions of what needs changed [15:31:32] email me with a list :) [15:31:33] OK fair enough [15:31:36] marktraceur: what're you confused about, if you're confused about anything at all? or am I confused? [15:31:36] YuviPanda: I'm going for wmf2 [15:31:49] YuviPanda: I merged two core commits, which were submodule bumps, which is fine [15:31:55] ok [15:32:00] YuviPanda: But I also merged three extension commits for some reason [15:32:08] They were linked from the calendar [15:32:11] ottomata: okay [15:32:18] If they weren't merged yet, how are they in the submodule update? [15:32:33] marktraceur: ah, that's because I can't merge into wmf branches? [15:32:38] (03CR) 10John F. Lewis: "@Daniel: Translations are from MetaWiki (https://meta.wikimedia.org/wiki/Mailing_lists/List_info)" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [15:32:46] Aha. [15:32:50] marktraceur: so they were merged into master, and cherry picks were the ones you merged. 
[15:33:01] KK [15:33:09] marktraceur: and you can submodule bump without having to merge them because git is magic like that :) [15:33:59] (03CR) 10Faidon Liambotis: [C: 04-1] "First off, this needs to be in Python -- that's our agreement regarding ops scripts. (this isn't pretty Perl code anyway, calling out to b" [puppet] - 10https://gerrit.wikimedia.org/r/163798 (owner: 10coren) [15:34:56] Apparently [15:35:33] OK, two sync-dirs coming your way [15:36:23] !log marktraceur Synchronized php-1.25wmf2/extensions/OpenStackManager/: [SWAT] [wmf2] Make list=novainstances available to anons (duration: 00m 05s) [15:36:31] Logged the message, Master [15:36:38] !log marktraceur Synchronized php-1.25wmf3/extensions/OpenStackManager/: [SWAT] [wmf3] Make list=novainstances available to anons (duration: 00m 06s) [15:36:42] YuviPanda: Make sure I didn't ruin everything? :) [15:36:44] Logged the message, Master [15:36:50] Krinkle|detached: Oi [15:36:53] marktraceur: we can find out after andrewbogott syncs :) [15:37:04] is that all of them? [15:37:13] James_F: Our good friend Krinkle appears not to be around, do you have any clues? [15:37:18] andrewbogott: Krinkle still has a patch, standby [15:37:29] Oh, unless you mean the patches for wikitech, in which case yes. [15:38:08] andrewbogott: I think so, yeah. I've no idea why the log message just lists that message, tho [15:38:15] yeah, that's what I meant :) I will sync! [15:38:28] !log running sync-common on virt1000 [15:38:34] Logged the message, Master [15:38:37] marktraceur: What? [15:39:03] James_F: https://gerrit.wikimedia.org/r/#q,166716,n,z from Krinkle, wants me to deploy it [15:39:06] But he not here [15:39:20] marktraceur: Oh. He didn't mention it to me. Punt it to the 16:00 one? 
[15:39:22] Wondering if you've seen him, or something [15:39:27] Yeah I figure [15:39:35] I'll give him the rest of the window [15:39:41] Not like I'm doing anything else [15:40:20] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [15:42:14] andrewbogott: yay, that seems to work fine :) [15:42:36] marktraceur: Thanks. [15:42:42] YuviPanda: Great! [15:42:44] ty, marktraceur [15:42:54] YuviPanda: No problemo [15:42:56] Now I'm going for breakfast. Back soon. [15:43:15] andrewbogott: ok! Will have patches for you when you come back :) [15:43:24] That doesn't shock me [15:44:20] heh [15:45:34] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Added initial Debian packaging [debs/contenttranslation/apertium-pt-ca] - 10https://gerrit.wikimedia.org/r/165475 (owner: 10KartikMistry) [15:46:45] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add .gitreview file [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/166396 (owner: 10KartikMistry) [15:51:19] Krinkle|detached: Well, it's ten minutes to the hour and you're not here, I'm officially (and symbolically) admonishing you for putting a patch on the SWAT list and not being around to test it, and pushing it back to the 23:00 UTC slot. [15:51:34] ottomata: first pass at https://wikitech.wikimedia.org/wiki/Cassandra [15:57:25] when there's an ottomata around with a few minutes I'd like to get their input on a minor trebuchet change affecting users of it [15:58:34] sure, apergos, in an interview right now...done in 45ish [15:59:08] So how many people are getting HHVM on the Wikipedias? [16:02:17] Bsadowski1: I think it got enabled by default for newbies. [16:02:28] (03CR) 10Hashar: [C: 04-1] "At first I was wondering of the implications to have Parsoid configuration hosted in the /deploy/ git repository. 
But since that is alrea" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/166610 (owner: 10Subramanya Sastry) [16:02:36] ragesoss: Bsadowski1 5% of anons at the moment, actually. [16:02:40] because I'm seeing every student who went through the on-wiki training in the last couple days is tagged HHVM. [16:02:57] YuviPanda: plus on for new users? [16:03:09] I don't think it's on as such for new users [16:04:04] YuviPanda: I'm seeing about half of the students who do the training (usually right after they create accounts) are using HHVM lately. [16:04:26] anyhow, nice to see things moving toward full deployment! [16:04:35] ragesoss: ah, hmm. hopefully they aren't seeing crashes, etc :) [16:04:50] indeed! [16:04:54] ragesoss: I think the cookie is probably set when they view, and then it carries on? I'm unsure. [16:04:58] <_joe_> YuviPanda: or memleaks :/ [16:05:19] _joe_: yeah, vaguely following that. bad enough that it dumps core / dies after a while? [16:05:30] YuviPanda: ah, that's an interesting theory... so the people who first hit edit before logging in end up on HHVM? [16:05:38] <_joe_> YuviPanda: worse, performance degrades [16:05:40] ragesoss: that's mere speculation on my part :) [16:05:42] _joe_: ow :( [16:05:57] <_joe_> YuviPanda: then there is a crash involved for sure [16:06:02] YuviPanda: but if true, it makes for an interesting natural experiment. [16:06:12] ragesoss: _joe_ would know better, I think (the logic for getting people hhvm) [16:06:15] <_joe_> which is a pity, given how well it works for the first few hours of operation [16:06:48] _joe_: yeah... [16:06:55] * YuviPanda crons a restart script for every 2h [16:07:12] <_joe_> YuviPanda: lol [16:07:18] _joe_: it's the PHP way [16:10:12] hmm, where should scripts we install go? [16:10:26] /usr/bin I presume, or /usr/local/bin [16:10:30] * YuviPanda goes to check the standard [16:10:37] <_joe_> YuviPanda: which scripts? 
[16:10:51] _joe_: I'm writing one for labs that 'archives' deleted instance metrics [16:11:27] /usr/local/bin I presume [16:19:42] _joe_: i'll debug today [16:19:58] <_joe_> ori_: oh, hi! [16:20:04] hey [16:20:09] thanks for debugging [16:20:14] <_joe_> I just wrote you a braindump in private [16:20:16] <_joe_> :P [16:20:19] * ori_ reads [16:24:20] (03PS2) 10Subramanya Sastry: Get betalabs localsettings.js file from deploy repo (just like prod) [puppet] - 10https://gerrit.wikimedia.org/r/166610 [16:28:16] PROBLEM - very high load average likely xfs on ms-be1007 is CRITICAL: CRITICAL - load average: 225.70, 106.26, 52.47 [16:31:54] marktraceur: Thx. I stayed up longer and slept in over that slot. [16:31:56] Thx for moving [16:40:36] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [16:46:15] Krinkle: Up late working on https://bugzilla.wikimedia.org/72063 ? You're too good for us. :-) [16:46:40] James_F: yeah, not finished yet though. [17:11:12] _joe_: ygpm [17:12:00] <_joe_> ori_: you too :) [17:21:21] ottomata: interview done yet? [17:26:59] apergos: yes, but just ate lunch and now Scrum of Scrums [17:27:19] apergos: is there a changeset i can look at? or you just wanna discuss? [17:27:32] just chat [17:27:48] i's a "do you like A better or B' [17:28:06] k [17:28:08] so go scrum and holler when you have free time, if i"m here then I'll answer [17:28:11] ok [17:38:40] (03CR) 10coren: "yeah, I'll cave to the python fanbois and rewrite it there." [puppet] - 10https://gerrit.wikimedia.org/r/163798 (owner: 10coren) [17:53:24] Krinkle: hah, nice work on nagf [17:53:46] paravoid: Yeah, I just needed to get my ganglia graphs back [17:53:51] Krinkle: one comment would be that you display too many metrics by default [17:53:55] e.g. 
https://tools.wmflabs.org/nagf/?project=analytics is huge [17:54:02] So I dug deep into ganglia and tried to map it to diamond metrics with graphite queries [17:54:21] besides being a bit bad for UX, this might put an unreasonable strain on the server [17:54:27] paravoid: My main use case is project=integration. I made it generic because it was easy to do. [17:54:40] that's long as well :) [17:54:57] well, long but I usually go through all of them :P [17:55:12] I don't expect to have much cycles to make it better, but I'll address any bug reports files on github, and will review any pull requests. [17:55:19] paravoid: How would you fragment it? [17:55:56] probably similar to ganglia [17:56:00] note you can jump to a graph with the second dropdown menu [17:56:23] a cluster view by default, a menu for invidivual hosts, plus a way to aggregate one metric over all hosts [17:56:32] or something like that, dunno :) [17:57:01] Yeah [17:57:32] !log installed lua5.1 on mw1114 so i can switch scribunto to luastandalone and thus potentially isolate the leak to luasandbox [17:57:34] I think I'll add a split between node and graphs, so there's three dropdown menus: project, node, graph. The first two being pages, and the third one just a section jumper. [17:57:41] Logged the message, Master [17:57:50] and a magic node=overview for the cluster view (default) [17:58:08] ok, apergos what's up? [17:58:30] ah so [17:58:42] I guess you use trebuchet for deployment of something or other? 
[17:59:09] basically the old setup was, you have a directory with some repos under it, then you assign grains to each of these repos, it might be the same grain for several or whatever [17:59:28] then you put that grain or those grains on the host you want to have that repo/those repos [17:59:28] (03Draft1) 10Ori.livneh: publichtml: tidy [puppet] - 10https://gerrit.wikimedia.org/r/166688 [17:59:35] (03PS2) 10Ori.livneh: publichtml: tidy [puppet] - 10https://gerrit.wikimedia.org/r/166688 [17:59:44] so turns out that's not really workable in the new refactored world [17:59:55] one grain per repo, that's how it's going to be [18:00:04] yurik: Respected human, time to deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141015T1800). Please do the needful. [18:00:35] and you would not ever specifiy the grain yourself, in order to get your host to be a repo target you would require the Package[repo-name-here] [18:01:11] so in order to give that same flexibility we had before (you could use one puppet line to get several repos nto your host) we could either [18:01:43] qq, this is to replace deployment::target with package provider => trebuchet? [18:01:48] (03PS2) 10Andrew Bogott: ldap: Move ldapsupportlib.py to standard location [puppet] - 10https://gerrit.wikimedia.org/r/144848 (owner: 10Tim Landscheidt) [18:02:09] have something like this: role::apt, in which you declare apt::virtual_package with the packages listed there [18:02:34] or have a role which just lists the packages and you include that role o the host [18:02:53] question is what feels gut right to you as a user of the system [18:04:48] apt....? [18:04:56] ottomata: yes that's happened already, it's just broken-ish righ tnow [18:05:06] til thi grain thinig gets sorted [18:05:15] are we talking about .deb packages or git deployed code? 
[18:05:21] git dpeloyed code [18:05:38] so the package provider for trebuchet 'packages' is 'here's a repo name, deploy it' [18:06:11] but if we defined vitual packages someplace that were lists of like say the three repos you always have o X type of host [18:06:19] you would pass that in instead (maybe) [18:06:20] i think i'm not following, shouldn't I be able to do: [18:06:42] package { 'analytics-refinery': [18:06:42] provider => 'trebuchet', [18:06:42] } [18:06:50] in the relevant class(es)? [18:07:01] or something like that? [18:08:01] well the package name will be not the grain any more but th repo name [18:08:10] which I guess would be analytic/refinery in the example [18:08:17] *analytics [18:08:28] hm, apergos, i haven't seen any cases (at least I don't use them), where the same grain is used for multiple repos [18:08:50] ok so as far as you're concerned it doesn't matter [18:09:04] hm, ok, eventlogging is used as a grain twice [18:09:05] I need to pitch hashar, he actually does use one grain for several repos [18:09:09] oh. ah hah [18:09:19] ah, and, contint-production-slaves [18:09:21] hm [18:09:22] yep [18:09:27] (03PS1) 10Hoo man: Wikidata: Also search in NS_PROPERTY per default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166785 [18:09:44] ok, apergos, i think people won't mind having to specific each repo they want to deploy in their classes [18:09:46] I have an opinion on the issue but I want someone who uses it that way to chime in (or who might) [18:09:52] if they need them to be grouped to gether [18:09:56] they can do so with some wrapper class [18:09:59] just shove em in a class. 
[18:10:02] rather than relying on the same grain matching all repos [18:10:02] ja [18:10:06] that's the puppet way is't it anyhow [18:10:09] yeah [18:10:18] it actually gives more flexibility this way because [18:10:26] yeah, you can choose one if you need to [18:10:27] if I want repos a b c on one set of hosts [18:10:31] and b c d on another [18:10:32] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: puppet fail [18:10:38] with the config we had, not very easy [18:10:49] witht he new setup, role classes and done [18:10:51] aye [18:10:55] aye, yeah, i'm pretty sure nik won't care, he'll like it [18:11:05] maybe ask hashar about the contint ones? [18:11:09] ok, thanks, that's all I had, I'll chat with hashar about it tomorrow mrning and then done [18:11:11] yep [18:11:12] cool [18:11:14] sounds good to me! [18:11:16] thanks [18:11:20] he hd already checked out this afternoon when I tried to poll him [18:11:25] thanks you too [18:15:55] (03CR) 10Andrew Bogott: [C: 032] ldap: Move ldapsupportlib.py to standard location [puppet] - 10https://gerrit.wikimedia.org/r/144848 (owner: 10Tim Landscheidt) [18:21:17] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [18:25:19] (03CR) 10Andrew Bogott: "This works fine in my tests. I agree both that a) this is how things are done and b) it seems weird that it's only in the dir for one pyt" [puppet] - 10https://gerrit.wikimedia.org/r/144848 (owner: 10Tim Landscheidt) [18:29:06] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [18:34:27] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:46:47] !log adjusting pybal weight for mw1114 back up to 20 to confirm that leak is in luasandbox [18:46:56] Logged the message, Master [19:10:16] (03CR) 10Ori.livneh: "ping! I applied this on deployment-bastion.eqiad.wmflabs, if you want to check it out." 
[puppet] - 10https://gerrit.wikimedia.org/r/165779 (owner: 10Ori.livneh) [19:10:43] ^ paravoid [19:16:20] mutante: you there? [19:26:14] (03PS1) 10Yurik: Moved zerowiki to group0 depl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166800 [19:26:42] gwicke: do you have a labs project for your restbase or cassandra stuff? [19:26:52] want to fire up an instance somewhere [19:27:34] (I want to) [19:28:59] (03PS1) 10Hoo man: Define client CSS classes for new wikidata badges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 [19:29:06] (03CR) 10jenkins-bot: [V: 04-1] Define client CSS classes for new wikidata badges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 (owner: 10Hoo man) [19:29:11] lool [19:29:32] :P [19:30:03] (03CR) 10Reedy: [C: 04-1] Define client CSS classes for new wikidata badges (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 (owner: 10Hoo man) [19:30:14] You don't say :D [19:30:31] (03PS2) 10Hoo man: Define client CSS classes for new wikidata badges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166801 [19:33:31] (03CR) 10Lydia Pintscher: [C: 031] "Seems sensible to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166785 (owner: 10Hoo man) [19:36:23] paravoid: did you add the current cassandra package to our apt? [19:36:32] yes [19:36:51] by "current" you don't mean upgrade it to the last version, right? [19:37:11] i'm just starting to puppetize it, and am using trusty, i assume there isn't a trusty dist built? 
[19:37:29] by current i mean what is in apt now :) [19:37:36] what's in apt comes by reprepro updates [19:37:44] oh easy, danke [19:37:49] !log yurik Synchronized php-1.25wmf2/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 07s) [19:37:50] cassandra has their own suites [19:37:54] it has no dependencies iirc [19:37:54] Logged the message, Master [19:38:13] so there's no reason why it wouldn't work on trusty I think [19:38:42] Suite: 20x [19:38:48] that may have to be converted to 21x [19:38:51] (for 2.1) [19:39:12] gabriel is asking for 2.0.10, mind if I update to that? [19:39:25] gwicke: why not 2.1? :) [19:39:49] from https://wikitech.wikimedia.org/wiki/Cassandra (for me I think): Cassandra has a pretty good https://wiki.apache.org/cassandra/DebianPackaging. We want Cassandra 2.0.10 for now, until 2.1.x has seen more testing [19:39:49] (03CR) 10Chmarkine: "Instead of creating a new compat mode, how about changing the existing compat mode to disable SSL 3.0? I think if we introduce "compatnoss" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [19:40:14] gabriel going for anything less than bleeding edge? [19:40:18] I'm shocked! :P [19:40:34] hah [19:47:00] !log yurik Synchronized php-1.25wmf3/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 11s) [19:47:06] Logged the message, Master [19:54:39] greg-g, no functionality changes or bug fixes to deploy .. just internal code cleanup, changes to dev tools, etc. so, skipping parsoid deploy today. [19:58:57] (03CR) 10Faidon Liambotis: [C: 04-1] "I've said this months ago on IRC, but mea culpa for not putting in Gerrit:" [puppet] - 10https://gerrit.wikimedia.org/r/117021 (https://bugzilla.wikimedia.org/41754) (owner: 10Nemo bis) [20:00:05] gwicke, cscott, subbu: Respected human, time to deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141015T2000). Please do the needful. 
[20:01:08] _joe_, arlolra is going to restart the pdf service till he gets a chance to test out the other fixes. sound good to you? [20:01:38] <_joe_> subbu: yes! [20:01:38] <_joe_> subbu: I was going to restart it tomorrow [20:01:52] gwicke, cscott_a` subbu - almost done with my depl [20:02:00] ah, arlolra got deploy rights recenlty .. so, this might be a good time to test them out. [20:02:12] yurikR1, no problem, we aren't deploy parsoid today. [20:02:26] *deploying [20:02:31] thx ) [20:02:41] i will keep on debugging - we are having some minor issues [20:02:44] !log yurik Synchronized php-1.25wmf2/extensions/ZeroPortal: updatidng ZeroPortal to master (duration: 01m 15s) [20:02:50] Logged the message, Master [20:03:51] Reedy, i think the ZeroPortal ext repo setup on wmf3 on tin is broken -- git submodule update fails [20:04:32] is it set up differently from other exts with the depl scripts? [20:04:36] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [20:05:11] Reedy, it also fails jenkins - https://gerrit.wikimedia.org/r/#/c/166794/ [20:05:16] works fine on wmf-2 [20:06:18] (03CR) 10Nemo bis: "Oh, I totally missed this point. Sure, let's do that if you think it's simpler. We're not going to break any history/incoming links are we" [puppet] - 10https://gerrit.wikimedia.org/r/117021 (https://bugzilla.wikimedia.org/41754) (owner: 10Nemo bis) [20:10:03] yurikR2: Jenkins failure looks unrelated [20:11:45] Locally it doesn't look like ZeroPortal has been updated [20:11:46] extensions/Collection | 2 +- [20:11:46] extensions/OpenStackManager | 2 +- [20:11:46] extensions/VisualEditor | 2 +- [20:11:46] extensions/ZeroBanner | 2 +- [20:12:04] (03CR) 10Dzahn: "CN=virt*.pmtpa.wmnet. 
note how there is the same but for eqiad still here" [puppet] - 10https://gerrit.wikimedia.org/r/164696 (owner: 10Dzahn) [20:12:13] Reedy, by git pull? [20:12:17] yeah [20:12:21] That change isn't submitted [20:12:27] (03PS4) 10Dzahn: remove virt-star.pmtpa SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/164696 [20:12:40] (03CR) 10Dzahn: [C: 032] remove virt-star.pmtpa SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/164696 (owner: 10Dzahn) [20:13:30] !log restarted ocg service [20:13:34] Reedy, yes, it doesn't make it past jenkins for some very strange reason. But when I do "git submodule update extensions/ZeroPortal", it always shows me the same message for wmf-2, whereas in wmf-3 it outputs nothing [20:13:37] Logged the message, Master [20:14:07] I just submitted it [20:14:19] need to git pull then git submodule update [20:15:44] Updating 6262d20..1b549d7 [20:15:54] Submodule path 'extensions/ZeroPortal': checked out '3215c4598eb0f6ee21e57cb1c97c4e986a277c26' [20:16:17] Reedy, thx, done, syncing... [20:16:31] what was the reason why it wasn't merging? [20:16:58] because that test is failing [20:17:04] !log yurik Synchronized php-1.25wmf3/extensions/ZeroPortal: updatidng ZeroPortal to master (duration: 01m 11s) [20:17:05] I guess it should be non voting as it's under development [20:17:10] Logged the message, Master [20:17:10] !log Deleted ten orphan wb_entity_per_page rows on wikidata [20:17:16] Logged the message, Master [20:17:20] oh, it is non voting [20:17:29] Maybe just zuul/jenkins being sucky [20:19:02] Reedy, thx. I tihnk it failed here - https://integration.wikimedia.org/ci/job/mediawiki-core-bundle-rubocop/17/console [20:21:25] (03CR) 10Dzahn: "it could only work in labs if it was in wmflabs.org but this is .wikimedia.org ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/164690 (owner: 10Dzahn) [20:21:27] (03PS1) 10Yurik: Disabled ZeroBanner img font [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166865 [20:21:36] (03CR) 10Yurik: [C: 032] Disabled ZeroBanner img font [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166865 (owner: 10Yurik) [20:21:43] (03Merged) 10jenkins-bot: Disabled ZeroBanner img font [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166865 (owner: 10Yurik) [20:22:08] PROBLEM - MySQL Replication Heartbeat on db1026 is CRITICAL: CRIT replication delay 311 seconds [20:22:37] PROBLEM - MySQL Slave Delay on db1026 is CRITICAL: CRIT replication delay 345 seconds [20:22:56] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active [20:22:59] RECOVERY - MySQL Replication Heartbeat on db1026 is OK: OK replication delay 83 seconds [20:23:36] RECOVERY - MySQL Slave Delay on db1026 is OK: OK replication delay 0 seconds [20:24:38] (03CR) 10Dzahn: [C: 031] "ishmael.wikimedia.org is an alias for misc-web-lb.eqiad.wikimedia.org." [puppet] - 10https://gerrit.wikimedia.org/r/164697 (owner: 10Dzahn) [20:25:04] (03PS3) 10Dzahn: remove ishmael SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/164697 [20:25:39] !log yurik Synchronized wmf-config/mobile.php: Disable font for ZeroBanner (duration: 01m 05s) [20:25:44] Logged the message, Master [20:30:43] who grants sysop on meta for WMF? [20:32:05] yurikR2: isn't it granted via +staff already ? 
[20:33:03] (03PS1) 10Dzahn: ishmael - remove SSL config remnants [puppet] - 10https://gerrit.wikimedia.org/r/166868 [20:33:16] (03CR) 10Dzahn: [C: 04-1] "heh, first stop installing it via puppet :p" [puppet] - 10https://gerrit.wikimedia.org/r/164697 (owner: 10Dzahn) [20:33:57] hashar, it was, until my acct was demoted and a new, (wmf) was created [20:34:34] yurikR2: then poke James Alexander :-] [20:35:07] (03CR) 10Dzahn: [C: 032] "the Apache config does not have an SSL part" [puppet] - 10https://gerrit.wikimedia.org/r/166868 (owner: 10Dzahn) [20:35:24] yurikR2: he did the user rights tweaks ( see https://meta.wikimedia.org/w/index.php?title=Special%3ALog&type=rights&user=Jalexander-WMF ) [20:39:53] (03CR) 10Dzahn: "Chmarkine: the idea was being able to switch some services, especially Gerrit, to use this new setting while not having to exclude IE6 use" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [20:40:45] !log restarting apache on mw1115 to test luasandbox [20:40:52] Logged the message, Master [20:41:23] !log Deleted 147 orphan wb_terms entries (bug 71914) [20:41:27] Logged the message, Master [20:43:08] (03CR) 10Dzahn: "Giuseppe: i did not intend to change the cipher list, it was supposed to just like compat just minus SSLv3" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [20:44:00] (03PS4) 10Dzahn: remove ishmael SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/164697 [20:44:12] (03CR) 10Dzahn: [C: 032] "removed in Change-Id: Ib3f83dc1cf160464da91" [puppet] - 10https://gerrit.wikimedia.org/r/164697 (owner: 10Dzahn) [20:48:57] !log deleting/shredding ishmael cert/keys from neon [20:49:03] Logged the message, Master [20:50:52] (03CR) 10Dzahn: [C: 032] "this was the Tampa serial console server" [puppet] - 10https://gerrit.wikimedia.org/r/159439 (owner: 10Dzahn) [20:55:44] (03CR) 10Dzahn: [C: 032] publichtml: tidy [puppet] - 10https://gerrit.wikimedia.org/r/166688 (owner: 10Ori.livneh) [20:58:09] (03CR) 10Dzahn: 
"yep, noop on terbium. and those things are right (for example 'ensure' always first, moving the includes up to top..)" [puppet] - 10https://gerrit.wikimedia.org/r/166688 (owner: 10Ori.livneh) [20:59:23] (03CR) 10Dzahn: "since we already have those for other languages i don't think we should block adding a couple more" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [21:00:20] (03CR) 10Dzahn: "i'm not sure though if there was some manual part on the server involved. vaguely remember AndrewBogott and you doing this and mentioning " [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. Lewis) [21:02:42] !log running mwscript extensions/CentralAuth/maintenance/migrateAccount.php on terbium for broken accounts (bug 61876) [21:02:51] Logged the message, Master [21:02:56] Nemo_bis: ^ [21:03:22] thanks [21:08:37] (03Abandoned) 10Dzahn: remove entire $ORIGIN pmtpa. from wmnet [dns] - 10https://gerrit.wikimedia.org/r/165414 (owner: 10Dzahn) [21:10:47] (03PS1) 10Dzahn: remove Tampa appserver DNS entries [dns] - 10https://gerrit.wikimedia.org/r/166881 [21:13:35] (03PS1) 10Dzahn: remove Tampa power distribution unit entries [dns] - 10https://gerrit.wikimedia.org/r/166882 [21:16:57] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136 [21:17:07] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.136:9200/_cluster/health error while fetching: Request timed out. 
[21:17:53] Bleugh [21:18:07] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 112, initializing_shards: 0, number_of_data_nodes: 3 [21:18:22] log on, it fixes itself [21:18:23] gj [21:18:27] @logstash1003:~# /etc/init.d/logstash status [21:18:27] logstash is running [21:18:28] yea [21:18:34] PHP Fatal error: Base lambda function for closure not found in /srv/mediawiki/php-1.25wmf3/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php on line 18 [21:18:46] logstash was slow as hell btw [21:19:04] ...and is still slow [21:19:04] MaxSem: which mw? [21:19:12] 1115 [21:19:38] mutante: It's on about elasticsearch on logstash1003, not logstash :) [21:20:07] !log Gracefulled apache on mw1115 [21:20:12] Logged the message, Master [21:20:28] We really need to remove thta stupd closure [21:20:40] or APC :P [21:20:52] Reedy: :p true. but that is also "is running" [21:20:53] or both [21:21:22] hoo: how did you do that, apparently that change was merged to let developers do this? [21:21:31] eh, deployers [21:21:37] mutante: No, I abandoned it... we can do it anyway [21:21:49] just needed to look at stuff a bit more [21:21:55] .. then how did you just do it [21:22:07] The rule already exists [21:22:11] ah :) [21:22:15] so no need to add it :D [21:22:21] gotcha, ok [21:22:29] sudo apache2ctl graceful [21:22:54] that [21:23:00] yea, it's not that long since it was suggested to _add_ that [21:23:57] after all, if we can kill apaches with whatever the crap we deploy we should be allowed to restart them too :] [21:24:08] <_joe_> mutante: DON'T [21:24:23] _joe_: don't what? [21:24:33] MaxSem: Workaround: Just make them all segfault... 
also restarts them [21:24:33] <_joe_> mutante: gracefullying the api servers has mixed consequences [21:25:51] (03PS1) 10Yurik: Disable ZeroPortal ext on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166886 [21:27:07] ok, so first of all.. i did not graceful anything, second i just asked if deployers already can, and it seems that they do, also that is new that nobody is supposed to graceful, because it is needed all the time.. [21:27:29] hoo: ^ i don't know [21:27:38] if stuff fatals the only other solution would be to depool the apache [21:27:46] and then do something with it [21:27:56] like hard kill [21:29:36] (03PS1) 10Nemo bis: Enable uploads on he.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166887 (https://bugzilla.wikimedia.org/72060) [21:30:48] (03PS1) 10Ottomata: [WIP] Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [21:33:07] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:09] mutante: What's the proper way of adding myself to CC of an RT ticket (E.g. "watching" it) [21:33:36] Krinkle: Click people. Then add a watcher [21:33:43] Ala https://rt.wikimedia.org/Ticket/ModifyPeople.html?id=8007 [21:34:06] (03CR) 10Ottomata: "Tested locally in mediawiki-vagrant with hiera. Needs some documentation and cleanup." [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [21:34:45] Krinkle: try AdminCC , that's what others like jeremyb use all the time [21:35:09] Krinkle: ops are auto bcc, so we don't usually do it [21:35:11] (03CR) 10Ottomata: "Oh, also TODO: Gabriel mentioned that we might want to consider some security settings. Want to talk about this and see how necessary i" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [21:35:18] mutante: I'm looking but don't see that [21:35:24] where do I go from e.g. 
https://rt.wikimedia.org/Ticket/Display.html?id=8655 [21:35:44] the "People" tab [21:36:01] then there's a drop-down on the left, Add new watchers [21:36:27] Right [21:36:29] and you can select the type of it [21:36:36] requestor, cc or admincc [21:36:40] and I should leave all the opsen bcc enabled right? Or is that just for the notification of my cc? [21:37:03] !log restarting hhvm on mw1114, this time with luasandbox [21:37:06] yea, don't change the ops bcc [21:37:10] Logged the message, Master [21:37:15] just add yourself above and hit save [21:37:52] mutante: will that notify people? [21:38:30] Loads of things in RT notify everyone [21:39:02] Krinkle: no, adding a watcher should not create an extra mail [21:39:24] well, i say that because when i see jeremy adding himself on other tickets there is no "outgoing mail recorded"-comment after that [21:40:50] okay! [21:45:27] (03CR) 10GWicke: [WIP] Initial commit of Cassandra puppet module (034 comments) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [21:51:27] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:59:41] (03PS1) 10Dzahn: remove exim 'imap_accounts' file [puppet] - 10https://gerrit.wikimedia.org/r/166894 [22:27:36] (03CR) 10Dzahn: [C: 031] Enable uploads on he.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166887 (https://bugzilla.wikimedia.org/72060) (owner: 10Nemo bis) [22:38:15] (03CR) 10GWicke: "Re security: I think we'll at the very least want to protect our cassandra access with passwords." [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [22:39:12] (03CR) 10John F. Lewis: "@Daniel: Yeah - what happened with Andrew first time was there was a symlink (the usual template directory to /etc/mailman/) which was not" [puppet] - 10https://gerrit.wikimedia.org/r/166686 (owner: 10John F. 
Lewis) [22:40:23] (03CR) 10Chmarkine: [C: 031] "Thank you for your explanation, Dzahn!" [puppet] - 10https://gerrit.wikimedia.org/r/166710 (owner: 10Dzahn) [22:40:32] (03PS1) 10Yuvipanda: graphite: Add labs archiver script [puppet] - 10https://gerrit.wikimedia.org/r/166902 [23:00:05] RoanKattouw, ^d, marktraceur, MaxSem, Krinkle: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141015T2300). Please do the needful. [23:00:15] mhm [23:00:20] * MaxSem looks around [23:00:32] * Krinkle is here [23:00:46] okayyy [23:00:55] let me deploy then [23:00:59] I think that bot is a bit too respectful, it scares me [23:01:07] yurikR: [23:01:17] Krinkle [23:01:30] yurikR: swat time, you need to be around. bot didn't ping you because syntax https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=131236&oldid=131235 [23:01:48] jamesofur, "Get your lazy asses up, time to fix your crap"? :P [23:01:51] silly bot :) [23:01:59] MaxSem: that would be much better yup ;) [23:03:13] (03CR) 10MaxSem: [C: 032] Disable ZeroPortal ext on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166886 (owner: 10Yurik) [23:03:25] (03Merged) 10jenkins-bot: Disable ZeroPortal ext on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166886 (owner: 10Yurik) [23:04:46] !log maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/166886/ (duration: 00m 05s) [23:04:53] Logged the message, Master [23:04:59] yurikR, ^ [23:08:31] MaxSem, thx. 
Seems like when we delete namespace, logs could become a bit confusing: https://meta.wikimedia.org/w/index.php?title=Special%3ALog&type=delete&user=&page=&year=&month=-1&tagfilter= [23:10:03] !log maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s) [23:10:10] Logged the message, Master [23:10:15] Krinkle, ^ [23:17:11] (03PS6) 10Dzahn: [RFC] add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 [23:20:08] MaxSem: confirmed fix [23:23:36] (03PS1) 10Dzahn: wikimedia.org service aliases - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/166914 [23:25:09] (03CR) 10jenkins-bot: [V: 04-1] wikimedia.org service aliases - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/166914 (owner: 10Dzahn) [23:29:34] (03CR) 10Jorm: [C: 031] [RFC] add annual.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn) [23:35:08] (03CR) 10Jforrester: "If this is for WMF not Wikimedia it should ideally be under the WMF domain, but oh well." [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn) [23:41:51] (03CR) 10MZMcBride: "What's wrong with ? You can omit the "foundation" part if you'd like ( hi what are you doing today [23:44:05] _2_Brandon: Hi! Can we help you with something? [23:53:06] <_2_Brandon> yes because my dad he said to my mom stupid and fuck you [23:53:19] !ops [23:53:24] Oh, christ. [23:54:01] <_2_Brandon> what is your name [23:57:59] (03CR) 10Eloquence: "Please see the linked to Phabricator ticket for discussion and let's please keep it focused there." [dns] - 10https://gerrit.wikimedia.org/r/165927 (owner: 10Dzahn)