[00:00:02] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/258/change/153986/diff/palladium.eqiad.wmnet.diff.formatted" [operations/puppet] - https://gerrit.wikimedia.org/r/153987 (owner: Dzahn)
[00:03:20] (CR) Dzahn: puppetmaster Apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/153987 (owner: Dzahn)
[00:03:46] (CR) Dzahn: "oops, wrong change. delete comments from gerrit ?:)" [operations/puppet] - https://gerrit.wikimedia.org/r/153987 (owner: Dzahn)
[00:04:08] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/258/change/153986/diff/palladium.eqiad.wmnet.diff.formatted" [operations/puppet] - https://gerrit.wikimedia.org/r/153986 (owner: Dzahn)
[00:04:44] (PS3) Dzahn: puppetmaster Apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/153987
[00:11:36] (CR) Dzahn: [C: 2] mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn)
[00:18:48] (CR) Dzahn: "another class that is not actually used??" [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn)
[00:22:22] (PS1) Dzahn: mw-rc-irc - actually use the Apache class [operations/puppet] - https://gerrit.wikimedia.org/r/156740
[00:23:26] (CR) Hashar: [C: -1] "On labs we can't use deployment::target but rely on puppet to update git repositories via git::clone()." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156669 (owner: Chad)
[00:26:20] (CR) Dzahn: "this could have never worked before. it uses path module/downloads but there is no such module, only download, singular" [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[00:29:07] (PS4) Dzahn: download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817
[00:29:20] (PS5) Dzahn: download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817
[00:32:17] (CR) Dzahn: [C: 1] "grep "role::down" site.pp - not even used" [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[00:38:34] (CR) Dzahn: [C: 2] salt - minion.erb - fix compiler warnings [operations/puppet] - https://gerrit.wikimedia.org/r/154347 (owner: Dzahn)
[00:45:49] (CR) Dzahn: "yes, this potentially affected each and every host because touching salt minion, but - NOP - double checked on argon and also on deploymen" [operations/puppet] - https://gerrit.wikimedia.org/r/154347 (owner: Dzahn)
[00:47:17] (CR) Dzahn: "removes a bunch of noise from every single puppet compiler run ( i know you don't care:)" [operations/puppet] - https://gerrit.wikimedia.org/r/154347 (owner: Dzahn)
[00:50:07] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[01:18:35] (CR) Chad: "We can't? We use git-deploy to deploy it to the deployment-elastic* boxes." [operations/puppet] - https://gerrit.wikimedia.org/r/156669 (owner: Chad)
[01:38:04] (CR) Chad: [C: 2] Use __DIR__ instead of the CWD [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156704 (owner: Aaron Schulz)
[01:38:12] (Merged) jenkins-bot: Use __DIR__ instead of the CWD [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156704 (owner: Aaron Schulz)
[01:38:40] !log demon Synchronized rpc/RunJobs.php: __DIR__ instead of cwd (duration: 00m 05s)
[01:40:46] (CR) Chad: [C: 1] gerrit: allow . in Jenkins jobs names [operations/puppet] - https://gerrit.wikimedia.org/r/156103 (owner: Hashar)
[01:42:07] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[01:43:59] (CR) Chad: [C: 1] "Let's go ahead and try this like tomorrow :)" [operations/puppet] - https://gerrit.wikimedia.org/r/156260 (owner: Filippo Giunchedi)
[01:51:49] (CR) Dzahn: "is this missing the actual check_elasticsearch.py in the module files/nagios/ ?" [operations/puppet] - https://gerrit.wikimedia.org/r/156260 (owner: Filippo Giunchedi)
[02:11:30] (PS1) Dzahn: fix updating of newly added W3C wikis [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/156745
[02:17:21] !log LocalisationUpdate completed (1.24wmf17) at 2014-08-28 02:16:17+00:00
[02:22:55] (CR) MZMcBride: Make permenant the recovery concurrent stream throttle (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156238 (owner: Chad)
[02:25:01] (PS1) Dzahn: fix outgoing links in table for W3C wikis [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/156746
[02:30:46] (PS1) Dzahn: wikistats - add cron for w3c table updates [operations/puppet] - https://gerrit.wikimedia.org/r/156747
[02:31:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:32:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:34:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:35:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:39:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 15 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:39:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:40:48] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:42:48] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:44:11] !log LocalisationUpdate completed (1.24wmf18) at 2014-08-28 02:43:08+00:00
[02:45:00] (CR) Dzahn: [C: 2] wikistats - add cron for w3c table updates [operations/puppet] - https://gerrit.wikimedia.org/r/156747 (owner: Dzahn)
[02:49:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:49:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:51:07] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[02:53:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:54:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:54:28] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[02:58:47] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[03:01:47] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[03:09:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:10:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[03:14:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 15 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[03:14:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:29:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[03:29:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:35:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[03:35:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:36:42] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 28 03:35:36 UTC 2014 (duration 35m 35s)
[03:43:07] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[03:49:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[03:49:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:54:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
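The install2001 alert gives the last successful run as "Thu 01 Jan 1970 00:00:00 UTC". That is how a Unix timestamp of 0 renders, so the alert most likely means "no successful run recorded" rather than a real date. Below is a minimal Python sketch of a freshness classification that treats 0 this way. It assumes the last-success timestamp is already available; the log does not show how the real check obtains it. The function name `puppet_freshness` is hypothetical. The 14400-second default comes from the separate "puppet last run" alert for analytics1020 ("expected 14400") and serves only as an example threshold.

```python
from datetime import datetime, timezone

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def puppet_freshness(last_success, now, max_age=14400):
    """Classify the time since the last successful Puppet run (hypothetical helper).

    last_success and now are Unix timestamps in seconds. A value of 0 (or
    less) is treated as "no successful run on record" instead of being
    rendered as a real date in 1970.
    """
    if last_success <= 0:
        return CRITICAL, "No successful Puppet run on record (timestamp is 0)"
    stamp = datetime.fromtimestamp(last_success, tz=timezone.utc)
    human = stamp.strftime("%a %d %b %Y %H:%M:%S UTC")
    age = int(now - last_success)
    if age > max_age:
        return CRITICAL, (f"Last successful Puppet run was {human} "
                          f"({age} seconds ago, expected {max_age})")
    return OK, f"Last successful Puppet run was {human} ({age} seconds ago)"


if __name__ == "__main__":
    ts = datetime(2014, 8, 27, 6, 41, 47, tzinfo=timezone.utc).timestamp()
    print(puppet_freshness(0, ts))
    print(puppet_freshness(ts, ts + 14877))
```

Separating "never ran" from "ran long ago" keeps the alert text honest, so it no longer shows a misleading 1970 date.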
[03:54:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[03:55:54] springle: thank you for renaming that!
[03:56:31] I just now logged on to suggest that I merge the config patch now, but it occurs to me that I don't actually know how to properly merge after doing that, so probably I should wait until tomorrow and get advice from roan.
[03:57:21] (CR) Chad: Make permenant the recovery concurrent stream throttle (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156238 (owner: Chad)
[03:58:48] <^d> andrewbogott: Your labswiki change?
[03:59:17] ^d: this one: https://gerrit.wikimedia.org/r/#/c/156695/
[03:59:34] springle renamed labswiki to deploymentwiki so now that can be merged.
[03:59:37] <^d> Ah.
[03:59:42] ...maybe
[03:59:43] <^d> That should go live automatically by jenkins.
[04:00:04] (CR) Andrew Bogott: "retest" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[04:00:45] <^d> I'm not sure if it should go in deleted.dblist though.
[04:00:51] <^d> If we're going to be adding it in prod.
[04:01:00] oh, good point
[04:01:31] <^d> Other than that lgtm.
[04:02:15] (PS5) Andrew Bogott: Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695
[04:02:22] (CR) jenkins-bot: [V: -1] Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[04:02:26] It's not as simple as just merging, is it? Doesn't a sync need running afterwards?
[04:02:37] <^d> Not for beta.
[04:02:43] <^d> It'll deploy by jenkins every 5mins.
[04:03:01] <^d> Oh, guess we'll need to pull to tin so icinga doesn't complain.
[04:03:04] andrewbogott: yw
[04:03:33] hm, ^d, do you understand the test failure that I'm getting now?
[04:05:06] <^d> hmm
[04:08:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:08:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:09:30] <^d> andrewbogott: I think that test will be broken until you merge.
[04:09:44] ^d: Well, that kind of defeats the purpose :)
[04:10:19] <^d> Yeah, it's kind of a silly test.
[04:10:27] <^d> Well, maybe not so silly if database names don't change.
[04:11:13] Anyway I guess there's no real reason to merge this now since springle left 'labswiki' working. I don't want to stay up all night if something goes horribly wrong.
[04:12:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:13:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:15:27] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Epic puppet fail
[04:28:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:28:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:31:07] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: Puppet last ran 14877 seconds ago, expected 14400
[04:31:19] analytics1020 is me
[04:32:07] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[04:33:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:33:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:34:27] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[04:44:55] (PS1) Springle: Various config changes for s1 test boxes (db107[23]) [operations/puppet] - https://gerrit.wikimedia.org/r/156752
[04:49:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:49:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:51:06] (CR) Springle: [C: 2] Various config changes for s1 test boxes (db107[23]) [operations/puppet] - https://gerrit.wikimedia.org/r/156752 (owner: Springle)
[04:52:07] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[04:52:42] is beta unreachable or is the issue on my end?
[04:52:55] https://en.wikipedia.beta.wmflabs.org/ gives me "unable to connect" atm
[04:53:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:53:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[04:58:11] !log xtrabackup clone db1051 to db1072
[04:59:24] no bot
[05:10:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:10:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:13:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:14:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:28:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:28:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:32:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:33:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:34:52] !log icinga - delay notification for mailman_ctl an mailman_qrunner on sodium for 6 hours - process check args needs adjustment
[05:42:39] mutante: morebots is dead. do you have access to tools-login?
[05:44:07] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[05:48:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:48:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:53:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:53:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:09:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:09:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:11:55] !log xtrabackup clone db1051 to db1072
[06:12:03] Logged the message, Master
[06:14:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:14:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:27:27] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:18] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:28:38] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:28:48] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:33:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:36:27] RECOVERY - Disk space on ms1004 is OK: DISK OK
[06:39:27] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 97 MB (0% inode=94%): /var/lib/ureadahead/debugfs 97 MB (0% inode=94%):
[06:46:48] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:47:28] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:49:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:49:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[06:53:07] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[06:53:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:53:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:08:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:08:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:13:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:13:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:27:26] (PS3) Giuseppe Lavagetto: hadoop: Give deskana access to Hive [operations/puppet] - https://gerrit.wikimedia.org/r/155137 (owner: Yuvipanda)
[07:28:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:28:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:32:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:33:06] (CR) Filippo Giunchedi: "minor stuff, LGTM" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/156303 (owner: Giuseppe Lavagetto)
[07:33:18] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:45:07] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[07:48:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[07:49:04] I'm taking a look at sodium
[07:49:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:53:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[07:53:37] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[08:01:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:01:37] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[08:02:18] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[08:03:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:03:17] !log killed stray mailman processes on sodium (no pid file) and restarted mailman
[08:03:23] Logged the message, Master
[08:04:12] <_joe_> godog: again?
[08:04:19] <_joe_> it happened yesterday as well
[08:04:24] <_joe_> something is not right
[08:05:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:07:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:07:19] _joe_: yep, one of the problem is that puppet refreshes mailman at each run and calls dpkg-reconfigure mailman
[08:07:46] <_joe_> ??
[08:07:53] <_joe_> why is that?
[08:09:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:10:33] no idea yet
[08:10:58] in other news, puppet agent --disable "reason" is nice
[08:11:03] (CR) Alexandros Kosiaris: [C: 2] gerrit: allow . in Jenkins jobs names [operations/puppet] - https://gerrit.wikimedia.org/r/156103 (owner: Hashar)
[08:11:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:12:37] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[08:12:53] godog: yes it is... we can even parse that reason and add it to the icinga check's message
[08:13:05] and of course I don't do it...yet
[08:13:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:13:40] <_joe_> akosiaris: I was thinking of it the other day
[08:13:45] akosiaris: yep I think that'd be very nice to have!
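The suggestion above is to surface the message given to `puppet agent --disable "reason"` in the Icinga check output. Below is a minimal Python sketch of that parsing step. It rests on an assumption to verify against the Puppet version in use: that Puppet 3.x stores the message as JSON under a `disabled_message` key in the file named by the `agent_disabled_lockfile` setting. The default path and the helper names `disabled_reason` and `enabled_clause` are hypothetical. The "Puppet is currently enabled" wording copies the existing "puppet last run" alerts, and the ", message: ..." suffix is invented for illustration.

```python
import json

# Assumed default for Puppet 3.x on Debian/Ubuntu; confirm on a host with
# `puppet agent --configprint agent_disabled_lockfile`.
DEFAULT_LOCKFILE = "/var/lib/puppet/state/agent_disabled.lock"


def disabled_reason(lockfile=DEFAULT_LOCKFILE):
    """Return None if the agent is enabled, else the disable message.

    An empty string means "disabled, but no readable message" (for example
    an empty or non-JSON lock file).
    """
    try:
        with open(lockfile) as fh:
            raw = fh.read()
    except FileNotFoundError:
        return None  # no lock file: the agent is enabled
    except OSError:
        return ""  # lock file exists but cannot be read
    try:
        data = json.loads(raw)
    except ValueError:
        return ""  # empty or non-JSON lock file
    if not isinstance(data, dict):
        return ""
    return str(data.get("disabled_message", ""))


def enabled_clause(lockfile=DEFAULT_LOCKFILE):
    """Build the leading clause of a 'puppet last run' style status line."""
    reason = disabled_reason(lockfile)
    if reason is None:
        return "Puppet is currently enabled"
    if reason:
        return f"Puppet is currently disabled, message: {reason}"
    return "Puppet is currently disabled"
```

The sketch returns `None` for "enabled" and `""` for "disabled without a usable message", so the check can still say the agent is disabled even when the lock file cannot be parsed.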
[08:14:11] <_joe_> when I disabled puppet on the appservers with some not publicly reproducible reason :)
[08:14:19] <_joe_> out of frustration of course
[08:15:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:15:31] (CR) Alexandros Kosiaris: [C: 2] Comment updates from default config [operations/puppet] - https://gerrit.wikimedia.org/r/156689 (owner: Chad)
[08:17:17] PROBLEM - Puppet freshness on es1003 is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 07:58:23 UTC
[08:17:57] RECOVERY - Puppet freshness on es1003 is OK: puppet ran at Thu Aug 28 08:17:51 UTC 2014
[08:32:30] (CR) Alexandros Kosiaris: "Please no /run/shm. It is a tmpfs filesystem but strictly speaking it is intended as storage for programs using the POSIX Shared Memory AP" [operations/puppet] - https://gerrit.wikimedia.org/r/156673 (owner: Ottomata)
[08:44:18] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:44:57] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:07] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:18] PROBLEM - Apache HTTP on mw1131 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:28] PROBLEM - Apache HTTP on mw1120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:28] PROBLEM - RAID on mw1120 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:28] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:37] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:47] PROBLEM - Apache HTTP on mw1126 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:47] PROBLEM - Apache HTTP on mw1137 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:47] PROBLEM - RAID on mw1148 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:57] PROBLEM - Apache HTTP on mw1145 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:57] PROBLEM - puppet last run on mw1116 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:57] PROBLEM - RAID on mw1144 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:57] PROBLEM - RAID on mw1147 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:58] PROBLEM - RAID on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:58] PROBLEM - puppet last run on mw1148 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:45:58] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:58] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:58] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:59] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:59] PROBLEM - Apache HTTP on mw1135 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:07] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:07] PROBLEM - RAID on mw1122 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:07] PROBLEM - Apache HTTP on mw1129 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:07] PROBLEM - Apache HTTP on mw1139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:17] PROBLEM - Apache HTTP on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:18] PROBLEM - Apache HTTP on mw1122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:18] PROBLEM - Apache HTTP on mw1148 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:37] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:46:47] PROBLEM - puppet last run on mw1131 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:47] RECOVERY - RAID on mw1144 is OK: OK: no RAID installed
[08:46:48] RECOVERY - RAID on mw1147 is OK: OK: no RAID installed
[08:46:48] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 942 seconds ago with 0 failures
[08:46:48] RECOVERY - RAID on mw1143 is OK: OK: no RAID installed
[08:46:48] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 905 seconds ago with 0 failures
[08:46:57] RECOVERY - Apache HTTP on mw1139 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.207 second response time
[08:46:57] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:47:04] uh oh, that's the api servers
[08:47:37] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 703 seconds ago with 0 failures
[08:47:37] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 9.864 second response time
[08:47:37] RECOVERY - Apache HTTP on mw1126 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.468 second response time
[08:47:40] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=mem_report&s=by+name&c=API+application+servers+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4
[08:47:48] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.474 second response time
[08:47:48] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.726 second response time
[08:47:57] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 1.892 second response time
[08:47:57] PROBLEM - RAID on mw1137 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:47:57] RECOVERY - RAID on mw1122 is OK: OK: no RAID installed
[08:47:57] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 4.311 second response time
[08:47:57] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.278 second response time
[08:47:57] RECOVERY - Apache HTTP on mw1129 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.129 second response time
[08:48:07] RECOVERY - Apache HTTP on mw1140 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.134 second response time
[08:48:07] RECOVERY - Apache HTTP on mw1122 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.073 second response time
[08:48:07] RECOVERY - Apache HTTP on mw1148 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.146 second response time
[08:48:07] RECOVERY - Apache HTTP on mw1131 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.150 second response time
[08:48:27] RECOVERY - RAID on mw1120 is OK: OK: no RAID installed
[08:48:27] RECOVERY - Apache HTTP on mw1120 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 5.494 second response time
[08:48:27] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.734 second response time
[08:48:27] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 2.981 second response time
[08:48:28] PROBLEM - check if dhclient is running on mw1117 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:37] RECOVERY - Apache HTTP on mw1137 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.079 second response time
[08:48:37] RECOVERY - RAID on mw1148 is OK: OK: no RAID installed
[08:48:47] RECOVERY - RAID on mw1137 is OK: OK: no RAID installed
[08:48:47] RECOVERY - Apache HTTP on mw1145 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.196 second response time
[08:48:48] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.332 second response time
[08:48:48] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.602 second response time
[08:48:48] RECOVERY - Apache HTTP on mw1135 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.073 second response time
[08:49:07] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.246 second response time
[08:49:17] RECOVERY - check if dhclient is running on mw1117 is OK: PROCS OK: 0 processes with command name dhclient
[08:50:28] PROBLEM - HTTP error ratio anomaly detection on labmon1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[08:50:37] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[08:51:37] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:51:47] PROBLEM - DPKG on mw1129 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
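Several checks in this log count matching processes and compare the count with an expected range. The mailman checks count processes owned by UID 38 whose arguments match `/mailman/bin/qrunner` or `/mailman/bin/mailmanctl`. Counts of 8 and 1 read OK. Counts of 0, 15 to 16, and 2 read CRITICAL; the higher counts fit the stray processes killed at 08:03. The dhclient check inverts the expectation, and 0 processes is OK. Below is a Python sketch of the Nagios plugin threshold-range convention such checks use ("10", "10:", "~:10", "a:b", and "@a:b" to alert inside the range). The thresholds in the test ("1:1" for mailmanctl, "0:0" for dhclient) are assumptions inferred from the alerts, not the real Icinga configuration. Processes are passed in as `(uid, args)` pairs instead of being read from `/proc`, and every name is hypothetical.

```python
import math
import re

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios/Icinga plugin exit codes


def parse_range(spec):
    """Parse a Nagios range like '10', '10:', '~:10', '1:1' or '@1:5'.

    Returns (start, end, alert_inside). By default a value alerts when it
    falls OUTSIDE [start, end]; a leading '@' inverts that.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        lo, hi = spec.split(":", 1)
        start = -math.inf if lo == "~" else float(lo or 0)
        end = math.inf if hi == "" else float(hi)
    else:
        start, end = 0.0, float(spec)
    return start, end, inside


def alerts(value, spec):
    """True when value should raise an alert for the given range spec."""
    start, end, inside = parse_range(spec)
    within = start <= value <= end
    return within if inside else not within


def check_procs(procs, uid, pattern, crit, warn=None, what=""):
    """Count (uid, args) pairs owned by uid whose args match the regex."""
    regex = re.compile(pattern)
    count = sum(1 for p_uid, args in procs if p_uid == uid and regex.search(args))
    if alerts(count, crit):
        code, state = CRITICAL, "CRITICAL"
    elif warn is not None and alerts(count, warn):
        code, state = WARNING, "WARNING"
    else:
        code, state = OK, "OK"
    noun = "process" if count == 1 else "processes"
    return code, f"PROCS {state}: {count} {noun} {what}".rstrip()
```

With a strict range such as "1:1", a duplicate master process alerts just as a missing one does, which matches the flapping between 1 and 2 mailmanctl processes above.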
[08:51:48] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [08:51:48] PROBLEM - nutcracker process on mw1129 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:51:57] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.260 second response time [08:51:57] PROBLEM - RAID on mw1129 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:51:58] !log restarted apache on mw1134 [08:52:05] Logged the message, Master [08:52:27] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 2.320 second response time [08:52:37] RECOVERY - DPKG on mw1129 is OK: All packages OK [08:52:38] RECOVERY - nutcracker process on mw1129 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [08:52:40] (03PS1) 10Alexandros Kosiaris: Cleanup dhcpd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/156760 [08:53:47] RECOVERY - RAID on mw1129 is OK: OK: no RAID installed [08:54:07] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC [08:55:47] PROBLEM - MySQL InnoDB on db1038 is CRITICAL: CRIT longest blocking idle transaction sleeps for 630 seconds [08:55:48] PROBLEM - MySQL Idle Transactions on db1038 is CRITICAL: CRIT longest blocking idle transaction sleeps for 635 seconds [08:57:47] RECOVERY - MySQL InnoDB on db1038 is OK: OK longest blocking idle transaction sleeps for 0 seconds [08:57:47] RECOVERY - MySQL Idle Transactions on db1038 is OK: OK longest blocking idle transaction sleeps for 0 seconds [09:01:28] PROBLEM - HTTP error ratio anomaly detection on labmon1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds [09:01:37] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 39 data above and 0 below the confidence bounds [09:01:37] PROBLEM - Number of mediawiki jobs queued on 
tungsten is CRITICAL: CRITICAL: Anomaly detected: 39 data above and 0 below the confidence bounds
[09:01:38] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[09:03:48] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:04:48] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[09:13:03] (PS1) Giuseppe Lavagetto: beta: manage virtualhosts via puppet [operations/puppet] - https://gerrit.wikimedia.org/r/156762
[09:14:09] <_joe_> hashar: ^^
[09:14:15] <_joe_> tell me what you think
[09:15:30] _joe_: ah yeah I noticed yesterday that we no more use operations/apache-config.git for production
[09:15:41] beta has some files in the 'betacluster' branch of that repo
[09:16:32] <_joe_> I imported them
[09:16:39] <_joe_> I basically got all.conf
[09:16:58] <_joe_> and transformed it to a puppet class
[09:18:50] and there are some live hack in the conf
[09:18:53] not even committed
[09:19:41] <_joe_> eww
[09:19:45] <_joe_> let me check that
[09:20:34] <_joe_> there is just one hack
[09:20:50] <_joe_> one line, it's not that hard to port :)
[09:21:48] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[09:21:59] _joe_: I have reset the repo
[09:22:41] the apache configuration files are mostly a copy paste from the files we had in production 2 years+ ago
[09:22:52] with a few more hacks to adjust them for beta
[09:23:03] _joe_: may i trouble you with some Italian or is it out of scope for ops ? :)
[09:23:52] <_joe_> matanya: absolutely not
[09:23:56] <_joe_> in query maybe
[09:23:57] <_joe_> :)
[09:28:10] (PS2) Giuseppe Lavagetto: beta: manage virtualhosts via puppet [operations/puppet] - https://gerrit.wikimedia.org/r/156762
[09:30:16] <_joe_> hashar: the "hack" is in the patch.
[09:30:36] <_joe_> I was thinking of merging it and test it on deployment-mediawiki01 first
[09:30:50] <_joe_> s/merging/cherry-picking/
[09:33:34] _joe_: I can't remember how we load those confs
[09:33:41] we have all.conf site.conf labs.conf and such
[09:33:45] <_joe_> hashar: I checked
[09:33:53] <_joe_> it's the apache::site I removed
[09:33:58] <_joe_> that included all.conf
[09:34:00] <_joe_> from nfs
[09:46:07] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[09:47:28] RECOVERY - HTTP error ratio anomaly detection on labmon1001 is OK: OK: No anomaly detected
[09:47:37] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[09:48:23] (PS5) Alexandros Kosiaris: Mathoid role for production [operations/puppet] - https://gerrit.wikimedia.org/r/156542 (https://bugzilla.wikimedia.org/69989) (owner: Physikerwelt)
[09:50:57] (CR) Alexandros Kosiaris: [C: 2] "With some minor cosmetic changes on my part, this is ready.
Merging" [operations/puppet] - https://gerrit.wikimedia.org/r/156542 (https://bugzilla.wikimedia.org/69989) (owner: Physikerwelt)
[09:53:04] (CR) Alexandros Kosiaris: [C: 2] Cleanup dhcpd.conf [operations/puppet] - https://gerrit.wikimedia.org/r/156760 (owner: Alexandros Kosiaris)
[10:18:16] !log restarting mailman on sodium
[10:18:23] Logged the message, Master
[10:20:17] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[10:20:47] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[10:22:14] icinga-wm: shush
[10:22:45] it'd be actually nice if we could tell it "shush for 10m"
[10:23:17] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[10:23:47] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[10:36:31] (PS1) Filippo Giunchedi: mailman: leave used_languages alone, use site_languages [operations/puppet] - https://gerrit.wikimedia.org/r/156766
[10:37:37] that should fix sodium going forward
[10:37:46] mutante: ^
[10:44:18] !log xtrabackup clone db1051 to db1073
[10:44:25] Logged the message, Master
[10:46:15] <_joe_> godog: tricky indeed
[10:46:48] <_joe_> this may be one of the cases where creating a custom type was a good idea
[10:48:23] (PS4) Giuseppe Lavagetto: hadoop: Give deskana access to Hive [operations/puppet] - https://gerrit.wikimedia.org/r/155137 (owner: Yuvipanda)
[10:48:33] (CR) Giuseppe Lavagetto: [C: 2] hadoop: Give deskana access to Hive [operations/puppet] - https://gerrit.wikimedia.org/r/155137 (owner: Yuvipanda)
[10:50:57] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Epic puppet fail
[10:54:29] _joe_: ye, I couldn't figure out when it broke and/or if it has ever worked
[10:54:36] PROBLEM - Puppet
freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[11:09:06] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[11:21:25] (CR) TTO: "Could you explain how you decided what to move and what not to move? I can't see any rhyme or reason here." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156081 (https://bugzilla.wikimedia.org/58247) (owner: Withoutaname)
[11:46:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[11:49:56] <_joe_> that looks promising
[11:50:21] <_joe_> (I guess snmp is not getting routed correctly)
[11:50:46] (CR) Alexandros Kosiaris: [C: 1] "I have tested this and it is working fine. However I had a couple of questions about the branch approach, which I seem to have answered th" [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 (owner: Hashar)
[12:08:36] PROBLEM - Puppet freshness on sodium is CRITICAL: Last successful Puppet run was Thu 28 Aug 2014 10:08:14 UTC
[12:35:37] (CR) TTO: "Anything that allows us to easily rename wikis (i.e. move subdomains) makes me happy, even if it is slightly evil!"
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[12:49:55] (PS2) Giuseppe Lavagetto: hadoop: grant hive access to Bernd Sitzmann [operations/puppet] - https://gerrit.wikimedia.org/r/156253
[12:49:57] (PS2) Giuseppe Lavagetto: hadoop: grant access to Dimitry Brant [operations/puppet] - https://gerrit.wikimedia.org/r/156254
[12:50:35] (CR) Giuseppe Lavagetto: [C: 2] hadoop: grant hive access to Bernd Sitzmann [operations/puppet] - https://gerrit.wikimedia.org/r/156253 (owner: Giuseppe Lavagetto)
[12:50:44] (CR) jenkins-bot: [V: -1] hadoop: grant access to Dimitry Brant [operations/puppet] - https://gerrit.wikimedia.org/r/156254 (owner: Giuseppe Lavagetto)
[12:51:08] <_joe_> damn
[12:52:26] (PS3) Giuseppe Lavagetto: hadoop: grant access to Dimitry Brant [operations/puppet] - https://gerrit.wikimedia.org/r/156254
[12:52:52] (CR) Giuseppe Lavagetto: [C: 2] hadoop: grant access to Dimitry Brant [operations/puppet] - https://gerrit.wikimedia.org/r/156254 (owner: Giuseppe Lavagetto)
[12:55:36] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[12:59:46] RECOVERY - Puppet freshness on sodium is OK: puppet ran at Thu Aug 28 12:59:41 UTC 2014
[13:12:28] (PS2) Giuseppe Lavagetto: Add oit to lead and polonium RT8214 [operations/puppet] - https://gerrit.wikimedia.org/r/156354 (owner: Jkrauska)
[13:12:51] (CR) Giuseppe Lavagetto: [C: 2] Add oit to lead and polonium RT8214 [operations/puppet] - https://gerrit.wikimedia.org/r/156354 (owner: Jkrauska)
[13:18:40] !log Reactivated cr2-eqiad AS3257 transit link
[13:18:46] Logged the message, Master
[13:21:51] (CR) Giuseppe Lavagetto: [C: -2] "I think ori moved everything out to the role already."
[operations/puppet] - https://gerrit.wikimedia.org/r/148041 (owner: Hashar)
[13:24:25] (Abandoned) Giuseppe Lavagetto: Load nutcracker server list from realm-specific yaml file [operations/puppet] - https://gerrit.wikimedia.org/r/147830 (owner: Ori.livneh)
[13:25:57] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: Epic puppet fail
[13:29:17] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Epic puppet fail
[13:30:11] <_joe_> mmmh
[13:30:38] <_joe_> that's on me
[13:31:12] <_joe_> inclusion order fail
[13:31:49] * _joe_ ♥ puppet
[13:32:41] <_joe_> mmmh not really
[13:35:19] hashar and/or hashar_: Are you working at all today? If so… springle renamed labsdb last night, so i'd like to merge https://gerrit.wikimedia.org/r/#/c/156695/
[13:35:33] andrewbogott: yeah working as usual :-]
[13:36:19] Do you have a moment to review that, and then stand by while I merge it to see if beta survives?
[13:36:57] <_joe_> hey hey what about beta?
[13:38:03] _joe_: this one isn't interesting… it just turns out that beta has an (unrelated) db with the same name as wikitech. springle renamed it overnight, now we need to catch up with the new name.
[13:38:33] <_joe_> oh ok, just "average noise" then
[13:47:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[13:52:13] andrewbogott: I am unlikely to be able to review / tweak that today :-(
[13:52:20] andrewbogott: I am busy refactoring the Wikidata Jenkins jobs with addshore
[13:52:36] hashar: ok, I'll just merge it and cross my fingers :)
[13:53:15] yeah that is a strategy :]
[13:53:24] I guess with sean view hack, that might just work
[13:53:49] I am not sure whether there is any impact to prod though
[13:54:14] such as deleted.dblist, not sure what it is used for
[13:55:15] that might probably screw up CentralAuth somehow but we can fix it
[14:02:54] (PS1) Giuseppe Lavagetto: puppet: fix duplicate declarations [operations/puppet] - https://gerrit.wikimedia.org/r/156785
[14:03:16] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppet: fix duplicate declarations [operations/puppet] - https://gerrit.wikimedia.org/r/156785 (owner: Giuseppe Lavagetto)
[14:04:40] (CR) Ottomata: "Ok cool. Will mount a tmpfs for this." [operations/puppet] - https://gerrit.wikimedia.org/r/156673 (owner: Ottomata)
[14:08:17] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[14:25:06] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[14:41:27] (CR) Alexandros Kosiaris: [C: 1] "Good catch!" [operations/puppet] - https://gerrit.wikimedia.org/r/156766 (owner: Filippo Giunchedi)
[14:47:32] akosiaris: Is production using ubuntu 12 or 14? I think it would be good to use ubuntu 14 for mathoid. That's the same OS that was tested in beta.
[14:54:11] physikerwelt: we now install 14 by default as of yesterday
[14:54:15] _joe_: yay, auto hdfs user directories work!
[14:54:21] that was the first real test of that
[14:54:31] thanks for merging those, looks like everything worked as it should
[14:54:53] <_joe_> I hoped so :)
[14:55:57] (PS1) Giuseppe Lavagetto: varnish: add a php engine token [operations/puppet] - https://gerrit.wikimedia.org/r/156793
[14:56:36] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[14:56:37] Coren: The discussion in RT 6077 is about which production server to use for mathoid. Since Mathoid is a new product I think it would be good if we had to maintain support for one OS only
[14:56:40] akosiaris: preference on where this tmpfs dir should be mounted?
[14:56:47] physikerwelt: production is still using nothing. The servers I recommended though are 12.04. But that requirement has me puzzled to be honest
[14:56:51] /mnt/webstats? /run/webstats?
[14:57:04] ottomata: /run/webstats
[14:57:14] k
[14:57:18] I think _joe_ has created /run/hhvm so let's be consistent
[14:57:20] akosiaris: I think he meant "production in general" not mathoid specifically. :-)
[14:57:58] Coren: then the answer is both, though I doubt that is of any help
[14:59:01] Well, strictly speaking, the answer is both, with some stragglers on Lucid. :-)
[14:59:12] (At least we have no Hardy left)
[14:59:36] Coren: you'd think
[14:59:52] pdf service is hardy.. and is considered production (kind of)
[15:00:04] manybubbles, anomie, Reedy: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140828T1500).
[15:00:05] pdf is dead man walking though.
:-)
[15:00:12] thankfully
[15:00:28] (PS1) BBlack: Add missing eqiad LVS eth3 forward DNS [operations/dns] - https://gerrit.wikimedia.org/r/156794
[15:00:30] (PS1) BBlack: Add codfw LVS cross-connects [operations/dns] - https://gerrit.wikimedia.org/r/156795
[15:00:36] can't wait for ocg to be declared production ready
[15:01:09] (CR) BBlack: [C: 2] Add missing eqiad LVS eth3 forward DNS [operations/dns] - https://gerrit.wikimedia.org/r/156794 (owner: BBlack)
[15:01:24] Yay, jouncebot is jouncing again! Nothing for SWAT today, though.
[15:01:34] akosiaris: ubuntu 14 is not a requirement. mathoid.eqiad.wmflabs uses ubuntu 12 and is now up for 153 days. but if the same os is used for beta and production this would simplify the deployment process
[15:01:41] bd808: can I get a review of https://gerrit.wikimedia.org/r/#/c/156695/ and help babysitting it when I merge?
[15:02:19] andrewbogott: I will review this morning. When do you want to merge?
[15:02:55] physikerwelt: true, problem is we need to find some ubuntu 14.04 boxes to assign mathoid to...
[15:02:59] bd808: Any time today is fine. I'm about to vanish for ~an hour anyway
[15:03:06] (CR) BBlack: [C: 2] Add codfw LVS cross-connects [operations/dns] - https://gerrit.wikimedia.org/r/156795 (owner: BBlack)
[15:03:24] perhaps we can work around this in another way... assign boxes to mathoid and start migrating parsoid to those as well...
[15:03:33] I wonder if gwicke has done any tests for parsoid on ubuntu 14.04
[15:04:26] akosiaris: I run parsoid on Trusty, so it /works/. Dunno how well tested it may be though.
[15:04:43] I did some testing for parsoid on ubuntu 14. that's how I learned how to migrate mathoid to ubuntu 14
[15:05:12] anomie: we restarted jouncebot yesterday. I now have access to the tool to restart and greg-g does as well. I can totally add you if you are interested in keeping it alive.
[15:05:53] bd808: No thanks, I was just glad to see it alive again
[15:05:57] physikerwelt: would you say it is an achievable goal ? as in this year/quarter/month/week/day/hour (:P) ?
[15:06:40] not that we got much choice. We must migrate parsoid to ubuntu 14.04 at some point anyway
[15:07:02] For me it took one day to get parsoid running on ubuntu 14. I documented the steps at https://www.mediawiki.org/w/index.php?title=Parsoid%2FSetup&diff=1075318&oldid=1074292
[15:07:42] akosiaris: node-gyp is the crucial point
[15:09:10] wth is node-gyp... /me educating myself
[15:09:46] ah node-waf replacement
[15:11:17] <_joe_> I thought that was node-wtf
[15:12:36] akosiaris: there was a bug in the debian package for node-gyp so we needed to install the nonofficial packages for parsoid
[15:13:11] ^^ I had to install... this has not been done by the parsoid team to the best of my knowledge
[15:13:56] * physikerwelt is looking for the bug number
[15:15:06] (PS1) BBlack: lvs200x initial puppet config [operations/puppet] - https://gerrit.wikimedia.org/r/156797
[15:16:04] (CR) BBlack: [C: -1] "Not quite ready yet, as noted in commit msg" [operations/puppet] - https://gerrit.wikimedia.org/r/156797 (owner: BBlack)
[15:16:08] physikerwelt: would it help if nodejs 0.10.x made it in production for ubuntu 12.04 ?
[15:16:34] akosiaris: this was the bug we run into while trying to install jsdom https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742347
[15:17:47] akosiaris: The problem is not nodejs itself but jsdom and as far as I know this problem boils down to issues in connection with node-gyp
[15:18:05] physikerwelt: yes I figured that reading the bug report
[15:18:17] (PS1) Ottomata: Run webstats collector in a dedicated tmpfs directory [operations/puppet] - https://gerrit.wikimedia.org/r/156801
[15:18:20] akosiaris: ^
[15:19:46] akosiaris: currently mathoid and parsoid both use binary node packages that are specific to the OS.
Those binary packages are shipped in the node_modules folder via git
[15:20:53] native nodejs modules... I was hoping that would never happen but why should it not ?
[15:21:13] 158 node modules packages in 14.04 ...
[15:21:43] akosiaris: I think it would be good to discuss that with the parsoid team
[15:22:10] physikerwelt: I was going to suggest pretty much the same. Let's rethink this a bit
[15:22:36] parsoid should anyway upgrade to 14.04 at some point, it is probably a good time to start discussing it
[15:22:41] akosiaris: my goal was to use the same mechanism that are used in parsoid
[15:23:07] (CR) Ottomata: [C: 1] varnish: add a php engine token (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156793 (owner: Giuseppe Lavagetto)
[15:24:46] gwicke: ping
[15:26:09] (CR) Alexandros Kosiaris: [C: 2] Run webstats collector in a dedicated tmpfs directory [operations/puppet] - https://gerrit.wikimedia.org/r/156801 (owner: Ottomata)
[15:26:32] thanks
[15:30:15] to some degree the binary node packages can be built for multiple platforms. it may well be possible to commit binary packages for both ubuntu 12 and ubuntu 14 into the deploy repos, prior to the changeover.
[15:30:38] quoting from #mediawiki-parsoid:
[15:31:02] cscott-free: physikerwelt: generally 'npm install' from the parsoid directory should be sufficient
[15:31:02] cscott-free: that will install all the required modules, rebuilding from source any which have binary dependencies (i don't think any binary dependencies are required, technically)
[15:31:02] cscott-free: our deploy process in production uses a separate repo, mediawiki/services/parsoid/deploy, which has the dependencies prebuilt for the production environment. that's only useful to you if your production environment exactly matches the wmf production environment.
[15:31:02] physikerwelt: cscott: the question is how this should be done once the production cluster is updated to ubuntu 14
[15:31:03] cscott-free: we will update mediawiki/services/parsoid/deploy to match the new production environment.
[15:31:03] physikerwelt: cscott: could you explain that to akosiaris on #wikimedia-operations
[15:31:46] jsdom is indeed the problematic package, but fwiw we don't actually use the binary component there.
[15:31:48] (CR) BryanDavis: [C: -1] "Test is failing because of line 146 in multiversion/MWMultiVersion.php where deployment.wikimedia.beta.wmflabs.org is mapped to 'labs'. Th" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:32:04] it's listed as a dependency of the html5 lib, although it is actually only an *optional* dependency, and parsoid doesn't use it.
[15:32:37] but other binary dependencies other than jsdom might have crept in. the OCG service has a number of binary dependencies, including sqlite3.
[15:32:46] (PS6) BryanDavis: Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:33:09] (CR) Reedy: Rename labswiki to deploymentwiki (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:33:17] but anyway, that's the whole point of the */deploy repo. it contains packages built for production. it should match whatever is in production. we can branch it if we need to for testing.
[15:34:00] cscott: I would prefer to use debian packages to install the required binary packages
[15:34:07] cscott: that sure explains some stuff
[15:34:30] physikerwelt: mathoid would be the first service to actually do that btw
[15:34:49] all others bundle their modules in the deploy IIRC
[15:35:33] akosiaris: ok I think it's best to do whatever is the well practiced standard
[15:36:44] The problem with using node packages from apt is that 1.5 years from now you will still have the same old node packages
[15:37:11] LTS means no updates or bring in another apt repo
[15:38:09] (PS2) Filippo Giunchedi: add ssh-based uploads to releases.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/150851
[15:38:40] imho, given the state of such things, I don't think app-layer software should come from the OS repos at all, really (or be installed in OS paths)
[15:38:42] (CR) Filippo Giunchedi: [C: 2 V: 2] add ssh-based uploads to releases.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/150851 (owner: Filippo Giunchedi)
[15:39:02] (PS7) BryanDavis: Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:39:07] e.g. in this case node.js and npm's should be in some app-layer directory and installed independently just for the app that needs them
[15:39:08] bd808: OK. I can live with the binary node modules even though I have the feeling that this is not very elegant... but I want to focus on the math aspect
[15:39:25] but we still have to find a way to explicitly manage versions, and log any updates to those versions!
[15:39:31] bblack: +1
[15:39:40] bblack: bd808 good luck with that
[15:39:46] :)
[15:40:13] I'm just spouting opinion here, I have no intention of working on the problem
[15:40:19] ahaha
[15:41:17] lxc containers :)
[15:41:51] bd808: how would that solve the version management issue ? And security updates?
[15:42:06] the actual tracking of them, to be clear
[15:42:42] It doesn't solve that part, but it solves "some app-layer directory and installed independently just for the app that needs them"
[15:43:21] There is a burden on "maintenance team" (which we don't really have) for the security updates
[15:43:45] well that is us, isn't it ?
[15:43:50] at least to some extent
[15:44:19] but we can't track down security updates for things we don't have a way of tracking down...
[15:44:20] I agree that this is a problem that needs to be reasonably solved. The same thing is currently an open question for the Composer in prod RFC
[15:45:17] bd808: are there os dependent binaries for composer too?
[15:45:43] physikerwelt: I think we will be going with two new ubuntu 14.04 boxes for mathoid, as it seems
[15:45:58] akosiaris: That's great news.
[15:46:44] For Composer we found that there is some level of tracking provided by https://github.com/sensiolabs/security-advisories
[15:46:53] akosiaris: From a conceptual perspective, I think it's a good idea to use the same setup that was tested in beta labs
[15:48:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[15:48:40] (PS1) Faidon Liambotis: Fix RIPE Atlas eqiad DNS records [operations/dns] - https://gerrit.wikimedia.org/r/156808
[15:48:57] akosiaris: That means I can abandon https://gerrit.wikimedia.org/r/#/c/156576/ ?
[15:49:23] (CR) BryanDavis: Rename labswiki to deploymentwiki (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:49:38] physikerwelt: no, amend it when we got the relevant info
[15:49:55] I added me as a reviewer btw
[15:50:26] (PS8) BryanDavis: Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[15:50:43] (PS2) Faidon Liambotis: Fix RIPE Atlas eqiad DNS records [operations/dns] - https://gerrit.wikimedia.org/r/156808
[15:51:11] bd808: I'm back; thank you for tidying up!
[15:51:23] (CR) Faidon Liambotis: [C: 2] Fix RIPE Atlas eqiad DNS records [operations/dns] - https://gerrit.wikimedia.org/r/156808 (owner: Faidon Liambotis)
[15:51:27] Time to merge, or are there more pieces left?
[15:51:59] bd808: my primary concern is really repeatability. if some random server dies and we re-puppet/deploy from scratch, it needs to end up with the same bits on it
[15:52:18] (as it had before)
[15:52:23] excuse my impatience... but what's the source for the missing information?
[15:52:59] physikerwelt: the names of the two new boxes
[15:53:18] akosiaris: who will know that?
[15:53:29] physikerwelt: me
[15:53:38] physikerwelt: but I need to get the boxes assigned first
[15:53:52] it is going to take some time, please be patient
[15:55:31] bblack: As it had or as the rest of the cluster has? I would vote for the latter.
[15:55:50] the two should be identical to begin with :)
[15:56:01] akosiaris: OK thank you. It's not a problem for me to wait as long as I know who is responsible for the next step... The only thing that bothers me is if there is a change submitted for review and you don't know if it will be reviewed some day
[15:56:14] bblack: Sure.
modulo what happens while the box is dead
[15:56:16] we can't have a flow that installs "the latest foo" whenever foo gets installed, because that can change out from under us
[15:56:28] Agreed.
[15:57:12] generally that's tricky with npm (or gem, or cpan, etc), because they autoinstall dependencies based on >= version, etc
[15:57:45] and you need to map out the version of everything manually, or alternatively you work with the >= version deps directly, but snapshot the whole upstream repo, so that you've got whatever the tool installs with the repo on a fixed date, and upgrade by repo date
[15:59:26] bblack: I have done the last thing before and that's what I'm recommending for Composer components in prod
[15:59:37] We make a frozen snapshot and deploy that
[15:59:46] given the interconnectedness of packages in those kinds of repos and the explosion of dependencies, generally only the latter is maintainable (by repo snapshot date)...
[15:59:48] updating is a manual (or Jenkins automated) process
[15:59:54] but then you run into issues with quick security updates
[16:00:12] because the last snapshot was 2 months ago, and nobody's tested the 34 updates since then, but we need 1 update "right now"
[16:01:12] (CR) Filippo Giunchedi: [C: 1] download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[16:01:58] !log restarted webstats-collector on gadolinium
[16:02:05] Logged the message, Master
[16:03:45] (CR) BryanDavis: [C: 1] "This looks good to me.
The string "labswiki" only appears as a substring in 5 lines of deleted.dblist now and the tests (such as they are)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[16:13:41] (CR) Andrew Bogott: [C: 2] Rename labswiki to deploymentwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156695 (owner: Andrew Bogott)
[16:23:29] _joe_, can you join #wikimedia-labs if you are not there already?
[16:23:36] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/).
[16:24:19] bd808: any idea why we have no recentchanges page on beta? http://wikidata.beta.wmflabs.org/wiki/Special:RecentChanges
[16:24:28] ori: ?
[16:24:35] (CR) Ottomata: "I doubt he is blocked by it, he probably just got the pw from another researcher. Aaron?" [operations/puppet] - https://gerrit.wikimedia.org/r/155452 (owner: Dzahn)
[16:24:42] http://wikidata.beta.wmflabs.org/wiki/Special:Random
[16:24:49] nothing
[16:25:01] http://wikidata.beta.wmflabs.org/ no configured
[16:25:14] aude: join us in #wikimedia-labs, this may be part of the thing discussed there.
[16:25:44] (CR) Ottomata: [C: 1] delete blog SSL certificates [operations/puppet] - https://gerrit.wikimedia.org/r/153228 (owner: Dzahn)
[16:26:20] (PS2) Ottomata: Run refinery-drop-webrequest-partitions every 4 hours [operations/puppet] - https://gerrit.wikimedia.org/r/156111
[16:26:26] (CR) Ottomata: [C: 2 V: 2] Run refinery-drop-webrequest-partitions every 4 hours [operations/puppet] - https://gerrit.wikimedia.org/r/156111 (owner: Ottomata)
[16:26:41] <_joe_> aude: I think I know why
[16:26:56] <_joe_> aude: let me check
[16:27:17] being discussed in #wikimedia-labs
[16:27:19] _joe_:
[16:27:46] <_joe_> aude: 1 min please :)
[16:28:35] k
[16:39:35] Reedy (greg-g), the patch to 1.24wmf18 to freeze Echo and Flow at their current versions for group2 wikis is https://gerrit.wikimedia.org/r/#/c/156678
[16:45:21] (CR) BryanDavis: [C: -1] "Needs to get the vhosts that are defined in https://github.com/wikimedia/operations-apache-config/blob/betacluster/site.conf" [operations/puppet] - https://gerrit.wikimedia.org/r/156762 (owner: Giuseppe Lavagetto)
[16:57:22] Reedy: is it safe to merge https://gerrit.wikimedia.org/r/#/c/156678/ for freezing echo deploy?
[16:57:37] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[16:57:47] * ebernhardson just noticed spage said that 20 minutes ago, you probably know already then :P
[16:57:59] I'm only just back at my PC
[16:58:16] Are you wanting me to deploy it? Don't mind as long as I know ;)
[16:58:47] ebernhardson: spagewmf ^^
[16:58:58] Reedy: yes with the group2 bump
[17:05:47] <^d> ottomata: Gah! We forgot to drain elastic1016!
[17:07:41] <^d> I can do that now if we're still wanting 1016 for disk testing.
[17:10:48] ^d, let's do it, cmjohnson has removed the new SSDs from 1019
[17:10:54] so i think he will be able to put those in 1016 if we are ready
[17:10:56] <^d> Ok, doing.
[17:11:04] but ja, 1018 is up and running, so i think we should be fine without 1016
[17:12:22] <^d> Ok, stuff's moving off 16.
[17:13:43] <^d> !log elastic: excluded the elastic1016 node from shard allocation, shards draining so we can take it down for disk testing
[17:13:50] Logged the message, Master
[17:18:23] <^d> ottomata: `watch 'curl -s localhost:9200/_cat/allocation/elastic1016?v; curl -s localhost:9200/_cat/health?v'` if you care to keep an eye on progress in a spare tab from any elastic box.
[17:19:12] <^d> pretty boring though, as expected.
[17:22:28] (CR) BBlack: "Putting this as a vcl_deliver directly in hhvm.inc is going to end up setting it multiple times per request (e.g. t2-fe -> t2-be -> t1-be)" [operations/puppet] - https://gerrit.wikimedia.org/r/156793 (owner: Giuseppe Lavagetto)
[17:25:16] (CR) Aaron Schulz: [C: 2] Recognize ::1 as loopback too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156706 (owner: Aaron Schulz)
[17:25:23] (Merged) jenkins-bot: Recognize ::1 as loopback too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156706 (owner: Aaron Schulz)
[17:26:13] !log aaron Synchronized rpc: 9564e93ecd4953126d91b99d7728f63401a4dc86 (duration: 00m 07s)
[17:26:20] Logged the message, Master
[17:26:36] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[17:33:31] cool, ^d, we are waiting for shards to drop to 0?
[17:33:41] on 1016?
[17:34:04] <^d> Yep.
[17:34:04] <^d> Down to 180 now.
[17:34:19] cool
[17:40:34] (PS1) Gerrit Patch Uploader: Fix typos in various localizations of dvwiki configurations [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821
[17:40:36] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821 (owner: Gerrit Patch Uploader)
[17:41:04] bd808: hi. you around ?
[17:48:31] (PS2) Glaisher: Fix typos in various localizations of dvwiki configurations [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821 (owner: Gerrit Patch Uploader)
[17:49:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[17:51:09] so screen is not setuid-root on bastion. was trying to screenshare with reedy to watch him do a deploy. is something like hangouts our only option or is there a blessed way to observe a terminal session on bastion?
[17:51:58] bblack: you might know ^^
[17:52:07] andrewbogott: or you ;) ^^
[17:52:12] andrewbogott: re liason duties ;)
[17:52:29] why would screen want to be setuid-root?
[17:52:34] that seems like a horrible idea :)
[17:52:37] to do a shareed scrreen session
[17:53:15] there has to be another way to do that
[17:53:28] I use dtach
[17:53:32] that's why I was asking in here
[17:54:11] google suggests to use screen -d -m -S shared
[17:54:19] googling that is
[17:54:50] "NOTE: Screen sharing with another account requires that the screen command be suid root." -- http://wiki.networksecuritytoolkit.org/nstwiki/index.php/HowTo_Share_A_Terminal_Session_Using_Screen#Sharing_A_Screen_Session_With_Another_User
[17:55:01] dtach: dtach -c /tmp/foo bash
[17:55:12] Maybe we should document this on wikitech
[17:55:17] and attachL dtach -a /tmp/foo
[17:55:25] bd808: yeah still, that seems like the least-excusable reason ever to make something suid root
[17:55:51] It's really about permissions on a socket I think
[17:56:21] (CR) Florianschmidtwelzow: [C: -1] "I think there is no need to add a backup file (*.orig)." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821 (owner: Gerrit Patch Uploader)
[17:56:27] you can make a special user just to share, both put your SSH keys in its home, login as the same user, first one types screen, second one types screen -x
[17:56:54] that's what i did with friends before to actually type in the same session
[17:56:54] sounds like the shared deploy user idea ;)
[17:57:03] sudo -u mwdeploy screen -r
[17:58:27] sshhh don't tell them we can share screens easily, or they'll start making us do pair-operating as part of some Xtreme Operations thing :p
[17:59:03] hahaa
[17:59:14] aka "xoxo"
[17:59:18] * ebernhardson always cheated with sudo to share screens, didn't even realize
[17:59:22] replaces etherpad with shared vim
[18:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140828T1800).
[18:01:28] XtremeOps will be the next thing after DevOps now bblack
[18:08:09] (PS1) Filippo Giunchedi: releases: setup tin to upload [operations/puppet] - https://gerrit.wikimedia.org/r/156828
[18:09:43] re ^ I'm doing some testing with gwicke
[18:09:44] (CR) GWicke: [C: 1] releases: setup tin to upload [operations/puppet] - https://gerrit.wikimedia.org/r/156828 (owner: Filippo Giunchedi)
[18:10:01] (CR) Filippo Giunchedi: [C: 2 V: 2] releases: setup tin to upload [operations/puppet] - https://gerrit.wikimedia.org/r/156828 (owner: Filippo Giunchedi)
[18:11:42] * bblack is filing a trademark for MegaUltraTurboOps to get ahead of the curve and cash in
[18:12:43] * Reedy brokers bblack a .biz domain
[18:13:55] counters with a .consultant ".consulting says knowledge."
[18:15:22] Reedy: mediawiki.expert 59.99
[18:16:48] can we have http://mediawiki.Title and http://mw.loader.using ?
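Whether screen can share a session across accounts comes down to whether the binary carries the setuid bit and is owned by root. A small sketch for checking that on a host, using only `os.stat` and the `stat` module; the path `/usr/bin/screen` is an assumption about where screen is installed on the bastion. The dtach socket and shared-user `screen -x` approaches suggested above do not need that bit at all.

```python
import os
import stat


def is_setuid_root(path):
    """True if path is owned by root (uid 0) and has the setuid bit set."""
    st = os.stat(path)
    return st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)


if __name__ == "__main__":
    screen = "/usr/bin/screen"  # assumed install path; adjust per host
    if os.path.exists(screen):
        print(screen, "setuid-root:", is_setuid_root(screen))
```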
[18:16:58] mutante, Reedy time to take a look into an exim patch - https://gerrit.wikimedia.org/r/#/c/155753/16/templates/exim/exim4.conf.SMTP_IMAP_MM.erb. I wanted to know whether the realm switch I used there is correct.
[18:17:56] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[18:17:58] (CR) Krinkle: "ping(2)" [operations/puppet] - https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: Ori.livneh)
[18:18:14] icinga-wm: shush!
[18:18:57] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[18:21:01] bblack: re: krinkle's ping -- any news, re: svgs?
[18:21:13] huh?
[18:21:31] https://bugzilla.wikimedia.org/54291
[18:21:39] err, https://gerrit.wikimedia.org/r/#/c/108484/ rather
[18:22:08] oh gzip
[18:22:22] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156831
[18:22:24] (PS1) Reedy: testwiki to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156832
[18:22:26] (PS1) Reedy: wikipedias to 1.24wmf18 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156833
[18:22:28] (PS1) Reedy: group0 to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156834
[18:22:31] honestly that's like 43,726 items down my stack of backlogged things to go look at and/or fix
[18:22:53] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156831 (owner: Reedy)
[18:22:57] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156831 (owner: Reedy)
[18:23:01] (CR) Reedy: [C: 2] testwiki to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156832 (owner: Reedy)
[18:23:06] (Merged) jenkins-bot: testwiki to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156832 (owner: Reedy)
[18:23:48] bblack: FIFO or LIFO? ;)
[18:24:03] !log reedy Started scap: testwiki to 1.24wmf19
[18:24:05] MIFO
[18:24:11] Logged the message, Master
[18:25:03] * ori wonders if he should look that up in urbandictionary.com
[18:25:05] <^d> ottomata: 32 shards to go.
[18:25:17] I think the answer to that question is always no :)
[18:25:45] <^d> ori: Add a new entry. "fifo, lifo, or gtfo" :)
[18:25:50] !log install build-essential and fakeroot on tin
[18:25:55] (CR) Ori.livneh: "bblack ack'd that it's still on his radar but he has more urgent things to worry about atm" [operations/puppet] - https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: Ori.livneh)
[18:25:58] Logged the message, Master
[18:26:00] MIFO -> Most Important First Out :)
[18:26:54] I honestly don't expect I'll even investigate gzip on varnish3, it's a waste of time since we'll have to validate all this crap for v4 anyways
[18:27:15] when's v4?
[18:27:35] in exactly 78.6215 days
[18:27:57] * ori isn't nagging, just curious!
[18:28:03] I'm curious too!
[18:28:12] <^d> Well the 78 was easy to wait.
[18:28:12] <^d> It was the remaining .6215 that killed me.
[18:29:26] RECOVERY - Puppet freshness on labsdb1006 is OK: puppet ran at Thu Aug 28 18:29:21 UTC 2014
[18:30:10] varnish4 is probably after varnish storage sanitizing, backlogged LVS cleanup in eqiad, codfw infra setup, ipv6 SLAAC fixups, gdnsd2 release. Most of which have been the next-most-important things on my list that I'm gonna do this week for several weeks now
[18:30:15] because something else always comes up first
[18:30:51] bblack: ah, wow. yeah, that's a lot.
[18:32:08] the DNS refactor/cleanup for the zonefiles and dynamic stuff is probably ahead of it, too
[18:42:43] (CR) 01tonythomas: "Can someone take a look into the real::switch I used in the exim template to ensure its working ?" [operations/puppet] - https://gerrit.wikimedia.org/r/155753 (owner: 01tonythomas)
[18:46:43] (Abandoned) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/153792 (owner: Ori.livneh)
[18:51:03] bblack: ori: MIFO is middle in first out, yeah?
[18:51:18] http://personalitycafe.com/intp-forum-thinkers/142125-fifo-filo-lilo-mimo-lifo.html
[18:52:08] heh
[18:52:42] Well lately my backlog seems to only gain items, never lose them.
[18:53:03] so maybe it's more like WINO - Whenever In, Never Out
[18:58:19] (PS1) Cmjohnson: Dns changes for ms-fe1003 DO NOT MERGE YET [operations/dns] - https://gerrit.wikimedia.org/r/156841
[18:58:36] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[19:00:23] (PS1) Ori.livneh: mediawiki: add packages::fonts and packages::multimedia [operations/puppet] - https://gerrit.wikimedia.org/r/156842
[19:00:25] (PS1) Ori.livneh: mediawiki::packages: remove libtidy-0.99-0 [operations/puppet] - https://gerrit.wikimedia.org/r/156843
[19:02:58] scap-rebuild-cdbs: 99% (ok: 226; fail: 0; left: 1)
[19:03:00] nearly there
[19:03:16] fenari I bet
[19:03:43] eh for rebuild could be any
[19:04:43] 40 minutes so far this time
[19:04:48] (PS5) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/154710
[19:06:00] ori: The svg patch itself could also benefit a clarification on the regex pattern thing
[19:06:08] ori: Once that's cleared, I reckon most opsen could merge it for us.
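The FIFO/LIFO/MIFO banter maps onto three standard queue disciplines. A toy sketch with a made-up backlog (the ticket names and priorities are invented, not taken from bblack's list) showing the pop order of each: a FIFO queue via `collections.deque.popleft`, a LIFO stack via `list.pop`, and "Most Important First Out", which is simply a priority queue via `heapq`.

```python
import heapq
from collections import deque

# Hypothetical backlog: (priority, name); a lower number means more important.
BACKLOG = [(3, "ticket-A"), (1, "ticket-B"), (2, "ticket-C")]


def fifo_order(items):
    """First in, first out: pop from the front of a deque."""
    q = deque(name for _, name in items)
    return [q.popleft() for _ in range(len(q))]


def lifo_order(items):
    """Last in, first out: pop from the end of a list used as a stack."""
    stack = [name for _, name in items]
    return [stack.pop() for _ in range(len(stack))]


def mifo_order(items):
    """Most important first out: a min-heap keyed on priority."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

With the backlog above, `fifo_order` yields A, B, C; `lifo_order` yields C, B, A; and `mifo_order` yields B, C, A.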
[19:06:23] It's pretty straight forrward
[19:07:03] !log reedy Finished scap: testwiki to 1.24wmf19 (duration: 43m 00s)
[19:07:10] Logged the message, Master
[19:09:02] (CR) Reedy: [C: 2] wikipedias to 1.24wmf18 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156833 (owner: Reedy)
[19:09:04] (CR) Dzahn: [C: 1] Add mobile subdomains for Wikimedia chapter wikis [operations/dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[19:09:08] (Merged) jenkins-bot: wikipedias to 1.24wmf18 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156833 (owner: Reedy)
[19:09:44] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf18
[19:09:50] Logged the message, Master
[19:11:33] (CR) Reedy: [C: 2] group0 to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156834 (owner: Reedy)
[19:11:38] (Merged) jenkins-bot: group0 to 1.24wmf19 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156834 (owner: Reedy)
[19:11:46] (PS2) Krinkle: Work in progress: set .svg and .ico files to be compressed on bits.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: Brion VIBBER)
[19:12:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf19
[19:12:17] Logged the message, Master
[19:12:29] (CR) Krinkle: [C: -1] "Rebased for merge conflict. Re-add -1 due to concerns about + having special meaning in regex, should be escaped somehow." [operations/puppet] - https://gerrit.wikimedia.org/r/113687 (https://bugzilla.wikimedia.org/61442) (owner: Brion VIBBER)
[19:15:30] (CR) Dzahn: "well, pl really needs it because they have MobileFrontend already on, the others can still be discussed on the other change which i will r" [operations/dns] - https://gerrit.wikimedia.org/r/156597 (owner: Dzahn)
[19:16:11] (CR) Dzahn: [C: 2] add pl.m CNAME for PL chapter mobile frontend [operations/dns] - https://gerrit.wikimedia.org/r/156597 (owner: Dzahn)
[19:17:22] grrr.unwanted dependency
[19:17:38] (CR) Reedy: "This is version control, ofc there's no need to include an old version :P" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821 (owner: Gerrit Patch Uploader)
[19:18:08] (PS2) Reedy: Enable job queue to process notification on mediawikiwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155169 (owner: Bsitu)
[19:18:13] (CR) Reedy: [C: 2] Enable job queue to process notification on mediawikiwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155169 (owner: Bsitu)
[19:18:17] (Merged) jenkins-bot: Enable job queue to process notification on mediawikiwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155169 (owner: Bsitu)
[19:18:55] (PS2) Dzahn: add pl.m CNAME for PL chapter mobile frontend [operations/dns] - https://gerrit.wikimedia.org/r/156597
[19:20:15] (CR) Dzahn: "pl.m.wikimedia.org is an alias for m.wikimedia.org." [operations/dns] - https://gerrit.wikimedia.org/r/156597 (owner: Dzahn)
[19:20:27] (CR) Reedy: CommonSettings.php: Remove some dated cruft (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154442 (owner: PleaseStand)
[19:21:38] (PS2) Reedy: Use a different profile ID for job requests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155778 (owner: Aaron Schulz)
[19:21:42] (CR) Reedy: [C: 2] Use a different profile ID for job requests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155778 (owner: Aaron Schulz)
[19:21:47] (Merged) jenkins-bot: Use a different profile ID for job requests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155778 (owner: Aaron Schulz)
[19:24:10] (PS3) Reedy: Set wgUploadNavigationUrl for eowiki to Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154448 (https://bugzilla.wikimedia.org/69055) (owner: TTO)
[19:24:13] (CR) Reedy: [C: 2] Set wgUploadNavigationUrl for eowiki to Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154448 (https://bugzilla.wikimedia.org/69055) (owner: TTO)
[19:24:17] (Merged) jenkins-bot: Set wgUploadNavigationUrl for eowiki to Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154448 (https://bugzilla.wikimedia.org/69055) (owner: TTO)
[19:24:54] (PS4) Reedy: Use the hash preprocessor when using HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155594 (owner: Aaron Schulz)
[19:25:59] (CR) Reedy: [C: 2] Use the hash preprocessor when using HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155594 (owner: Aaron Schulz)
[19:26:04] (Merged) jenkins-bot: Use the hash preprocessor when using HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155594 (owner: Aaron Schulz)
[19:26:38] (PS2) Reedy: cswikinews: Remove unused custom namespace [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155893 (owner: Danny B.)
[19:26:42] (CR) Reedy: [C: 2] cswikinews: Remove unused custom namespace [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155893 (owner: Danny B.)
[19:27:13] (Merged) jenkins-bot: cswikinews: Remove unused custom namespace [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155893 (owner: Danny B.)
[19:27:36] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 5 unmerged changes in mediawiki_config (dir /a/common/).
[19:27:59] (PS2) Reedy: Enable webfonts by default for Divehi (dv) wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155741 (https://bugzilla.wikimedia.org/69860) (owner: KartikMistry)
[19:28:07] (CR) Reedy: [C: 2] Enable webfonts by default for Divehi (dv) wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155741 (https://bugzilla.wikimedia.org/69860) (owner: KartikMistry)
[19:28:09] (Merged) jenkins-bot: Enable webfonts by default for Divehi (dv) wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155741 (https://bugzilla.wikimedia.org/69860) (owner: KartikMistry)
[19:28:17] (PS1) MaxSem: Keep beta enabled [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156879
[19:28:31] (PS3) Reedy: Add namespace alias on ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154239 (https://bugzilla.wikimedia.org/69594) (owner: Calak)
[19:28:33] (CR) Krinkle: "refreshDomainRedirects.php is now in this repo as of I42a07252bf6ed37231." [operations/puppet] - https://gerrit.wikimedia.org/r/138292 (owner: Ori.livneh)
[19:28:35] (CR) Reedy: [C: 2] Add namespace alias on ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154239 (https://bugzilla.wikimedia.org/69594) (owner: Calak)
[19:28:43] (Merged) jenkins-bot: Add namespace alias on ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154239 (https://bugzilla.wikimedia.org/69594) (owner: Calak)
[19:29:02] mutante: I checked yesterday, at least ar has also mobilefrontend enabled
[19:29:06] (CR) Dzahn: [C: 2] download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817 (owner: Dzahn)
[19:29:19] Reedy, and https://gerrit.wikimedia.org/r/156879 please:)
[19:29:25] (PS2) Reedy: Phase out $wgRateLimitLog in favor of debug bucket [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154142 (owner: Hashar)
[19:29:29] (CR) Reedy: [C: 2] Phase out $wgRateLimitLog in favor of debug bucket [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154142 (owner: Hashar)
[19:29:34] (Merged) jenkins-bot: Phase out $wgRateLimitLog in favor of debug bucket [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154142 (owner: Hashar)
[19:30:00] Reedy: thank you! :-)
[19:30:05] (PS2) Reedy: Keep beta enabled [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156879 (owner: MaxSem)
[19:30:07] lazowik: ah, there you are:) so PL works now for you on mobile?
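Krinkle's -1 on change 113687 is about `+` being a regex metacharacter. Assuming the pattern in question matches the `image/svg+xml` content type (the log does not show the VCL itself), the effect is easy to reproduce with Python's `re`, which treats `+` the same way as the PCRE regexes Varnish uses. Unescaped, `svg+xml` means "sv, then one or more g, then xml", so it never matches the literal type string. `matches` is a hypothetical helper for illustration.

```python
import re

CONTENT_TYPE = "image/svg+xml"


def matches(pattern, text=CONTENT_TYPE):
    """True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


unescaped = r"^image/svg+xml$"          # '+' quantifies the preceding 'g'
escaped_by_hand = r"^image/svg\+xml$"   # '\+' is a literal plus sign
escaped_by_lib = "^" + re.escape(CONTENT_TYPE) + "$"  # let re.escape do it
```

The unescaped pattern fails on `image/svg+xml` yet matches the unintended string `image/svggxml`; both escaped forms match the real content type.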
[19:30:09] (CR) Reedy: [C: 2] Keep beta enabled [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156879 (owner: MaxSem)
[19:30:15] (Merged) jenkins-bot: Keep beta enabled [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156879 (owner: MaxSem)
[19:30:20] yeah
[19:30:22] Reedy, cheers Sam:)
[19:30:41] well, on dekstop after switching to mobile
[19:30:46] checking actual mobile now
[19:31:10] http://pl.m.wikimedia.org/wiki/Strona_g%C5%82%C3%B3wna wfm, yep
[19:31:44] tables <3
[19:31:57] (PS4) Reedy: Change FlaggedRevs configuration on fa.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154473 (https://bugzilla.wikimedia.org/69668) (owner: Calak)
[19:32:02] (CR) Reedy: [C: 2] Change FlaggedRevs configuration on fa.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154473 (https://bugzilla.wikimedia.org/69668) (owner: Calak)
[19:32:07] (Merged) jenkins-bot: Change FlaggedRevs configuration on fa.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154473 (https://bugzilla.wikimedia.org/69668) (owner: Calak)
[19:32:34] lazowik: Max pointed out we need to enabled actual redirection
[19:32:51] * lazowik just hit typosquatting on pl.wkimedia.org
[19:33:11] (PS2) Reedy: Flagged Revisions configuration for uk.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155219 (https://bugzilla.wikimedia.org/67748) (owner: Calak)
[19:33:16] (CR) Reedy: [C: 2] Flagged Revisions configuration for uk.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155219 (https://bugzilla.wikimedia.org/67748) (owner: Calak)
[19:33:20] mutante: yeah, expected that
[19:33:20] (Merged) jenkins-bot: Flagged Revisions configuration for uk.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155219 (https://bugzilla.wikimedia.org/67748) (owner: Calak)
[19:34:06] (PS2) Reedy: Enable DynamicPageList on mediawiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156113 (https://bugzilla.wikimedia.org/69974) (owner: Nemo bis)
[19:34:11] (CR) Reedy: [C: 2] Enable DynamicPageList on mediawiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156113 (https://bugzilla.wikimedia.org/69974) (owner: Nemo bis)
[19:34:14] (Merged) jenkins-bot: Enable DynamicPageList on mediawiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156113 (https://bugzilla.wikimedia.org/69974) (owner: Nemo bis)
[19:36:26] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures
[19:36:27] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
[19:36:33] Logged the message, Master
[19:36:36] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[19:37:56] (CR) Florianschmidtwelzow: "I forgot my smily ":P" :D" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156821 (owner: Gerrit Patch Uploader)
[19:37:58] (PS1) MaxSem: Enable mobile redirect fo pl.wm.org and wikimania2015 [operations/puppet] - https://gerrit.wikimedia.org/r/156881
[19:38:02] mutante, ^^^
[19:39:46] lazowik: 12:39 < MaxSem> mutante, all wikis but wikidata have MF. the question is whether they have an adequate mobile main page
[19:39:59] MaxSem: cool:) tx
[19:40:05] lazowik: ^
[19:41:35] mhm
[19:41:40] (CR) Dzahn: "< MaxSem> mutante, all wikis but wikidata have MF. the question is whether they have an adequate mobile main page" [operations/dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[19:42:10] mutante: but not all domains in *.wikimedia.org are wikis
[19:42:23] some are redirects to e.g. wikimedia.*
[19:42:32] yup, that's why we maintain a manual list for *.wikimedia.org
[19:43:24] does that need any extra steps besides merging to deploy?
[19:43:59] nope
[19:44:37] e.g. uk.mwikimedia.org
[19:44:40] * uk.m.wikimedia.org
[19:44:43] is not needed
[19:44:59] as uk.wikimedia.org redirects to wikimedia.org.uk
[19:45:11] mutante: ^
[19:45:37] (PS6) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/154710
[19:47:20] em, what is pa_us ?
[19:47:38] (CR) Ori.livneh: [C: 2] mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/154710 (owner: Ori.livneh)
[19:47:41] (PS1) Rush: phabricator elastic search logic [operations/puppet] - https://gerrit.wikimedia.org/r/156886
[19:47:47] lazowik: Pennsylvania, united states.
[19:47:58] but subdomain?
[19:48:02] with underscore?
[19:49:12] (CR) PleaseStand: CommonSettings.php: Remove some dated cruft (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/154442 (owner: PleaseStand)
[19:50:00] (CR) Dzahn: [C: 2] Enable mobile redirect fo pl.wm.org and wikimania2015 [puppet] - https://gerrit.wikimedia.org/r/156881 (owner: MaxSem)
[19:50:16] those unprefixed names :(
[19:50:20] strange though it may seem, but underscores work:P
[19:50:29] (CR) Michał Łazowik: "What with non-wiki domains? See inline." (2 comments) [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[19:50:35] however, the actual wiki is http://pa-us.wikimedia.org/wiki/Main_Page
[19:50:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[19:50:45] legoktm: heh
[19:50:56] the US chapter URL discussion?
[19:50:57] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Epic puppet fail
[19:50:57] legoktm: are you confusing it with mediawiki/dns? :)
[19:51:06] been there, suggested they all use wikimedia.us to solve it
[19:51:18] YuviPanda: no, with uncyclopedia/mediawiki-config
[19:51:23] uncyclomedia*
[19:51:24] legoktm: haha :)
[19:51:29] chapter.wikimedia.us is natural to me.. but people didn't really want it
[19:51:30] (PS2) Rush: phabricator elastic search logic [puppet] - https://gerrit.wikimedia.org/r/156886
[19:51:34] legoktm: that doesn't put out here, tho
[19:51:42] (PS1) RobH: install params for bast2001, also updating domain_search bastions [puppet] - https://gerrit.wikimedia.org/r/156890
[19:51:56] (PS3) Rush: phabricator elastic search logic [puppet] - https://gerrit.wikimedia.org/r/156886
[19:51:58] maybe it should!
[19:52:02] #uncyclomedia-operations
[19:52:21] lazowik: see, it made sense to just get pl. going while the others are still discussed?
[19:52:40] merged the redirect now
[19:52:43] mutante: ok, next one m.uncyclomedia.org
[19:52:57] you could check on mobile device now if you want
[19:53:09] (CR) Rush: [C: 2] "setup logic" [puppet] - https://gerrit.wikimedia.org/r/156886 (owner: Rush)
[19:53:18] lazowik: wait, uncyclomedia is not really WMF :)
[19:53:26] Then it moved to pa-us
[19:53:28] u don't say :D
[19:53:36] haha
[19:53:52] Krenair: wildcard only works for one level?
[19:54:03] and then another chapter wanted us-something instead of something-us
[19:54:04] mutante: mobile doesn't redirect :(
[19:54:08] and we said not a good idea
[19:54:09] and then drama
[19:54:18] lazowik: yes
[19:54:26] only one level
[19:54:28] And people still aren't happy
[19:54:29] https://bugzilla.wikimedia.org/show_bug.cgi?id=64557
[19:54:58] they are never happy, that is what people are for
[19:55:04] lazowik, it will take up to 30 minutes to update
[19:55:07] Krenair: ? but they abandoned the gerrit changes? i had removed my -1
[19:55:18] * Krenair shrugs
[19:55:21] ok
[19:55:26] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[19:55:33] Also we have nyc.wikimedia.org
[19:55:40] Not nyc-us or us-nyc, just nyc
[19:55:43] (PS2) Ori.livneh: mediawiki: add packages::fonts and packages::multimedia [puppet] - https://gerrit.wikimedia.org/r/156842
[19:55:46] nice
[19:55:55] wikimedia.us would solve it, and we have it anyways.. but shrug
[19:55:59] 65 COMMENTS oO
[19:56:00] yeah
[19:56:08] sorry, had to shout
[19:56:22] (CR) Reedy: "I just copied the lot from the Wikimedia dblist for ease" [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[19:56:28] So since I'm touching the domain search strings for every bastion
[19:56:29] https://gerrit.wikimedia.org/r/#/c/156890/1
[19:56:30] (on bugzilla)
[19:56:32] someone else please +1 that
[19:56:33] (PS3) Ori.livneh: mediawiki: add packages::fonts and packages::multimedia [puppet] - https://gerrit.wikimedia.org/r/156842
[19:56:41] (so im not taking shit down on a typo ;)
[19:56:46] (PS1) Rush: point phabricator to main es [puppet] - https://gerrit.wikimedia.org/r/156891
[19:57:22] lazowik, https://bugzilla.wikimedia.org/show_bug.cgi?id=62598
[19:57:25] (CR) RobH: [C: 1] "I'm going to try to hunt down a second party to review so I don't go crashing bastion search domains on a typo." [puppet] - https://gerrit.wikimedia.org/r/156890 (owner: RobH)
[19:57:49] someone review my patchset and I'll owe them a similar easy review ;]
[19:58:00] Krenair: …
[19:58:05] * robh pokes cmjohnson1
[19:58:13] oh, he may be actually in eqiad...
[19:58:20] lazowik, If we had https://bugzilla.wikimedia.org/show_bug.cgi?id=50422 I could find the biggest
[19:58:34] bah, he is
[19:58:39] cmjohnson1: disregard, sorry
[19:58:46] jgage: check my patchset pls?
[19:58:55] Krenair: hah
[19:59:09] robh: it's never "esams.wmnet"?
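The pa.us to pa-us rename follows from how wildcard certificates are matched: `*` covers exactly one DNS label, the leftmost one. A `*.wikimedia.org` certificate (assumed here to be what served the chapter wikis) therefore covers `pa-us.wikimedia.org` but not `pa.us.wikimedia.org`. Below is a simplified sketch of the RFC 6125 rule; it deliberately ignores partial-label wildcards such as `w*.example.org`, and `wildcard_matches` is a hypothetical name.

```python
def wildcard_matches(pattern, hostname):
    """Simplified RFC 6125 check: '*' may only be the entire leftmost label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels
```

Underscored names like pa_us sit in a single label, so they pass this check; whether every client accepts underscores in hostnames is a separate question.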
[19:59:16] that is being removed from hooft
[19:59:25] (CR) Reedy: [C: 1] "Looks sane..." [puppet] - https://gerrit.wikimedia.org/r/156890 (owner: RobH)
[19:59:30] oh
[19:59:35] mutante: indeed, you are right
[19:59:36] i fucked up
[19:59:42] so ONLY hooft can route esams.wmnet
[19:59:47] as that mgmt network isnt bridged
[19:59:49] lazowik, but yeah, controversy on bugzilla tends to generate 50+ comments
[19:59:52] and i entirely overlooked it =P
[20:00:00] and i put in shit it cannot touch, ack
[20:00:11] ew, why is that not an array
[20:00:30] (CR) Dzahn: [C: -1] "removes esams.wmnet from hooft, but 13:01 < robh> so ONLY hooft can route esams.wmnet" [puppet] - https://gerrit.wikimedia.org/r/156890 (owner: RobH)
[20:00:40] i didnt really think about that correctly
[20:00:44] =P
[20:00:52] was all 'standardization happy'
[20:01:12] and this is why we don't self review anymore.
[20:01:21] (PS2) RobH: install params for bast2001, also updating domain_search bastions [puppet] - https://gerrit.wikimedia.org/r/156890
[20:01:49] (PS1) Ori.livneh: Allow $domain_search to be an array of items [puppet] - https://gerrit.wikimedia.org/r/156892
[20:01:51] feels ori is going to turn that into
[20:01:55] haha
[20:01:56] haha, that ^ :)
[20:02:00] (CR) Chad: phabricator elastic search logic (2 comments) [puppet] - https://gerrit.wikimedia.org/r/156886 (owner: Rush)
[20:02:00] ^d, looks like 1016 is ready to be taken offline, right?
[20:02:04] :P
[20:02:13] robh: quick, merge and cause a merge conflict
[20:02:14] ;)
[20:02:15] ottomata: Yessir
[20:02:18] It's drained.
[20:02:30] or not quite
[20:02:47] nah, no conflict, my patch allows the current format
[20:03:04] chad you look...different!
[20:03:18] That's boring
[20:03:20] i learned the Array() trick from ottomata actually
[20:04:06] (CR) Chad: "Ignore previous comments. Following up on followup change." [puppet] - https://gerrit.wikimedia.org/r/156886 (owner: Rush)
[20:04:28] anyways, Reedy, mutante and MaxSem: thanks :)
[20:04:39] (CR) Dzahn: [C: 1] install params for bast2001, also updating domain_search bastions [puppet] - https://gerrit.wikimedia.org/r/156890 (owner: RobH)
[20:04:48] mutante: thx dude
[20:05:05] (CR) Chad: point phabricator to main es (1 comment) [puppet] - https://gerrit.wikimedia.org/r/156891 (owner: Rush)
[20:05:18] (CR) RobH: [C: 2] "Well if this patchset wasn't a good example of why we stopped self-reviewing our own stuff, I dunno what is." [puppet] - https://gerrit.wikimedia.org/r/156890 (owner: RobH)
[20:06:08] (CR) Dzahn: [C: 1] Allow $domain_search to be an array of items [puppet] - https://gerrit.wikimedia.org/r/156892 (owner: Ori.livneh)
[20:06:27] hey ori
[20:06:32] hey MatmaRex
[20:06:39] bbiaw
[20:06:42] ori: remember that hack for mediawiki and jquery being raw modules you added last week?
[20:06:51] ori: did a fix for that end up in master?
[20:07:05] because it seems we're getting the same errors again :D
[20:07:17] i don't know. Krinkle, legoktm?
[20:07:28] I didn't end up in master
[20:07:34] so it got undeployed this morning
[20:07:35] (PS2) Ori.livneh: Allow $domain_search to be an array of items [puppet] - https://gerrit.wikimedia.org/r/156892
[20:07:38] it*
[20:07:40] (CR) Ori.livneh: [C: 2 V: 2] Allow $domain_search to be an array of items [puppet] - https://gerrit.wikimedia.org/r/156892 (owner: Ori.livneh)
[20:07:44] * mchwmf puts legoktm into master
[20:07:59] and my fix has a -1 that I haven't gotten around to fixing yet
[20:08:01] but I can do that right now
[20:08:04] ori: https://bugzilla.wikimedia.org/show_bug.cgi?id=69924
[20:08:23] I don't think that patch is the same bug
[20:08:29] ok
[20:08:38] jquery|mediawiki being ready doesn't unbreak the condition wrap, right?
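Ori's change makes `$domain_search` accept either a string or an array. The `Array()` trick robh mentions is, in the Ruby/ERB templates Puppet renders, `Kernel#Array`: it wraps a scalar in a one-element array and returns an array unchanged. The same normalization is sketched below in Python, feeding a resolv.conf `search` line. The template's actual code is not shown in the log; `to_list` and `search_line` are hypothetical names, and splitting space-separated strings is an extra assumption added so the old string format keeps working.

```python
def to_list(value):
    """Mimic Ruby's Array(): None -> [], list/tuple -> list, scalar -> [scalar]."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def search_line(domain_search):
    """Render a resolv.conf 'search' directive from a string or a list.

    Strings are also split on whitespace, so an old-style
    "a.example b.example" value and a list produce the same line.
    """
    domains = [d for item in to_list(domain_search) for d in str(item).split()]
    return "search " + " ".join(domains)
```

For example, `search_line("wikimedia.org eqiad.wmnet")` and `search_line(["wikimedia.org", "eqiad.wmnet"])` render the same line (the domain values are illustrative).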
[20:08:57] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[20:08:57] ori: What was the appending of loader.state addressing?
[20:09:01] (PS2) BBlack: lvs200x initial puppet config [puppet] - https://gerrit.wikimedia.org/r/156797
[20:09:48] (PS3) BBlack: lvs200x initial puppet config [puppet] - https://gerrit.wikimedia.org/r/156797
[20:09:49] Krinkle: ResourceLoader / JavaScript are such a context-switch from what I'm looking at currently that I'm having a hard time remembering the details
[20:10:07] Krinkle: the problem is things depending on the mediawiki and jquery modules
[20:10:20] Krinkle: since the modules are raw, RL never registers that they are already loaded, and loads them again
[20:10:27] which clobbers the state of mw.* and $.*
[20:10:47] which causes weird errors, "undefined mw.config" being the most common
[20:11:06] ori's hack marked these modules as ready so that they wouldn't be reloaded
[20:11:23] MatmaRex: I remember now
[20:11:36] MatmaRex: ori: the problem was gadgets explicitly depending on jquery or mediawiki
[20:11:38] which is illegal
[20:11:45] not the conditional wrap
[20:11:49] thats unrelated
[20:11:59] and also not affecting the same thing (user scripts vs proper modules)
[20:12:03] yes
[20:12:16] but right now i have no idea what's causing the reloading
[20:12:27] i see the errors on pl.wp, but i also think i eradicated all problematic dependencies
[20:12:31] (CR) Chad: point phabricator to main es (1 comment) [puppet] - https://gerrit.wikimedia.org/r/156891 (owner: Rush)
[20:12:37] well, if something explicitly depends on it, mw.loader will load a fresh copy of jquery and blow away any plugins
[20:12:45] same with meidawiki, blow away any attached classes and properties
[20:13:17] MatmaRex: Hm.. would be nice to search gadgets-definitions on all wikis
[20:14:17] (PS1) Ori.livneh: $domain_search: convert to an array of values [puppet] - https://gerrit.wikimedia.org/r/156897
[20:14:21] robh: ^
[20:14:48] (CR) BBlack: [C: 2] lvs200x initial puppet config [puppet] - https://gerrit.wikimedia.org/r/156797 (owner: BBlack)
[20:14:59] ohh
[20:15:02] that reads nicely
[20:15:35] ori: why the empty space after the start, is that standard?
[20:15:59] !log temporarily disable puppet on gadolinium
[20:16:06] eg: domain search line, then empty line, then the array
[20:16:06] i think that's just the diff highlighting
[20:16:06] Logged the message, Master
[20:16:10] k
[20:16:52] robh: i'm running the catalog compiler for the relevant nodes just to be extra safe: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/262/
[20:17:24] this looks good to me, but i like testing ^_^
[20:17:40] Krinkle: i'm going to fix https://gerrit.wikimedia.org/r/#/c/155647/ , let's merge and backport afterwards?
[20:17:50] oh, legoktm did it
[20:17:53] oh, I just updated it
[20:18:48] hrm, failed with 'grep: writing output: No space left on device '
[20:19:12] no bueno
[20:19:40] it's the puppet compiler labs instance again
[20:19:52] its prolly ok to merge as long as we arent in a deployment window
[20:19:58] yeah, it's borked (the instance)
[20:19:59] so i can watch it merge on tin, and ensure it works
[20:20:03] MatmaRex: Cool. Is there a bug we can refer to in the code comment?
[20:20:04] cool, go for it
[20:20:13] Krinkle: don't think so
[20:20:16] (cuz i dunno if every script for deployement uses fqdn or not, i assume not)
[20:20:43] (PS2) RobH: $domain_search: convert to an array of values [puppet] - https://gerrit.wikimedia.org/r/156897 (owner: Ori.livneh)
[20:20:46] rebassseeee
[20:21:09] Krinkle: we can refer to https://gerrit.wikimedia.org/r/#/c/155643/ i guess
[20:21:11] MatmaRex: Gonna make a small change to the comment to document something and namedrop the word "illegal" for consistency elsewhere with how we refer to qjeruy/mediawiki
[20:21:20] is anyone looking at the exceptions in the error logs?
[20:21:20] right
[20:21:25] https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&x=0.5&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception&gtype=stack&glegend=show&aggregate=1&embed=1
[20:22:47] exception log on fluorine is quiet
[20:23:00] (CR) RobH: [C: 2] "array trumps string" [puppet] - https://gerrit.wikimedia.org/r/156897 (owner: Ori.livneh)
[20:23:24] First one being
[20:23:24] 2014-08-28 20:23:12 mw1050 commonswiki: [9cb05d8c] /wiki/File:Flag_of_Mexico_(1864-1867).svg Exception from line 1871 of /usr/local/apache/common-local/php-1.24wmf18/includes/filerepo/file/LocalFile.php: Could not acquire lock for 'Flag_of_Mexico_(1864-1867).svg.'
[20:26:41] (PS1) Ottomata: Set uid, gid and mode options for webstats-collector tmpfs mount [puppet] - https://gerrit.wikimedia.org/r/156899
[20:28:13] (CR) QChris: [C: 1] Set uid, gid and mode options for webstats-collector tmpfs mount [puppet] - https://gerrit.wikimedia.org/r/156899 (owner: Ottomata)
[20:30:32] (PS2) Ottomata: Set uid, gid and mode options for webstats-collector tmpfs mount [puppet] - https://gerrit.wikimedia.org/r/156899
[20:30:34] (CR) Ottomata: [C: 2 V: 2] Set uid, gid and mode options for webstats-collector tmpfs mount [puppet] - https://gerrit.wikimedia.org/r/156899 (owner: Ottomata)
[20:30:40] cmjohnson1: shards are now all off of elastic1016
[20:30:47] you can take it down and put the new SSDs in it
[20:31:03] let's do it now...so if doesn't work we can get it back online asap
[20:31:07] ^d, I'm going to depool 1016, s'ok?
[20:31:20] legoktm: MatmaRex: Is one of you going to backport? I can do it, just checking if you were planning to already
[20:31:22] ottomata: depool and shutdown..ping when you're ready
[20:31:30] I wasn't
[20:31:35] Krinkle: i couldn't deploy it anyway
[20:31:57] MatmaRex: SWAT deploys for you, you'd request a slow and add it to the queue/page
[20:32:02] (and i think it'd be better to deploy now)
[20:32:03] and be in this channel in 3 hours
[20:32:09] Yeah, I'll do it now
[20:32:09] manybubbles: anything else I need to do to take 1016 down?
[20:32:12] shards are not on it
[20:32:13] (unless there's a reason not to)
[20:32:14] i've depooled it
[20:32:24] <^d> cmjohnson1, ottomata: Go for it.
[20:32:27] ottomata: nothing
[20:32:45] greg-g: I'm going to deploy a backport for resourceloader fix. There was a hotpatch for it in the previous wmf branch, but since the new branch doesn't have it, have to re-do it for the new one ASAP. https://gerrit.wikimedia.org/r/#/c/155647/
[20:32:46] ok, shutting it down
[20:33:04] greg-g: double checking if it's okay to do now.
[20:33:28] !log shutting down elastic1016
[20:33:34] Logged the message, Master
[20:35:09] cmjohnson1:
[20:35:10] go ahead
[20:35:21] it might still be powering off, but I've lost my ssh connection to it
[20:35:26] PROBLEM - Host elastic1016 is DOWN: PING CRITICAL - Packet loss = 100%
[20:35:30] i acked that!
[20:35:34] i mean, scheduled downtime
[20:35:35] i think...
[20:36:11] actually, i'm not sure where to ack that
[20:36:19] ah found it
[20:36:29] merged and works (domain search)
[20:42:28] (PS1) BBlack: move lvs200x to hw raid setup [puppet] - https://gerrit.wikimedia.org/r/156946
[20:42:46] (CR) BBlack: [C: 2 V: 2] move lvs200x to hw raid setup [puppet] - https://gerrit.wikimedia.org/r/156946 (owner: BBlack)
[20:47:15] ottomata: booting up now...let's hope it installs
[20:47:25] mutante: ok, redirect works now, thanks again
[20:48:18] oook, fingers and toes crossed cmjohnson1!
[20:49:16] ottomata: success it's installing
[20:49:16] !
[20:49:24] ok, so sounds like a weird vlan problem for 1019, eh?
[20:49:28] if 1018 and 1019 have the same deal
[20:49:32] but 1016 does not
[20:49:49] hm, but
[20:49:51] or h/w ..they're a different type of server
[20:49:53] yeah
[20:49:56] 1016 and 1019 are in the same rack
[20:50:03] meaning, they shoudl be on the same vlan, rigth?
[20:50:16] it could be the h310 controller card screwing it up
[20:50:21] hm, could be, ja
[20:50:26] they are same rack and everything
[20:50:29] i reinstalled them before though...
[20:50:45] i know, i installed once b4 as well...not sure what would've changed
[20:50:50] that's the baffling part
[20:50:57] yeah
[20:51:01] Krinkle: deployed/deploying?
[20:51:06] PROBLEM - Disk space on elastic1009 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19627 MB (3% inode=99%):
[20:51:16] RECOVERY - Host elastic1016 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms
[20:51:40] Nope, waiting for Math.max( jenkins merging, greg-g responding w/ timeout of 20 min, my bacon not burning to a crisp)
[20:52:30] min()
[20:52:32] erp
[20:53:06] PROBLEM - Disk space on elastic1009 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20141 MB (3% inode=99%):
[20:53:12] Krinkle: the first two already happened. dunno about your bacon though.
[20:54:01] MatmaRex: [1] only just this minute though
[20:54:30] a timeout's a timeout, if you ask me ;)
[20:54:33] psshh
[20:55:21] pdssh
[20:55:57] MatmaRex: deploying now
[20:57:30] !log krinkle Synchronized php-1.24wmf19/includes/resourceloader/ResourceLoaderStartUpModule.php: fd5b963458c19 (duration: 00m 06s)
[20:57:36] Logged the message, Master
[20:58:06] https://bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector123
[20:58:07] ottomata: install is finished
[20:58:08] "jquery" 0 matches
[20:58:09] :)
[20:58:11] MatmaRex:
[20:58:12] legoktm:
[20:58:14] ori:
[20:58:44] :D
[20:59:36] yay
[20:59:36] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[20:59:48] cmjohnson1: amazing
[21:00:06] RECOVERY - Disk space on elastic1009 is OK: DISK OK
[21:00:08] ^d: is it ok to just run puppet and bring this thing back into the cluster?
[21:00:17] yep..so that confirms it's the h/w on those 2 probably 3 servers
[21:00:22] ja
[21:00:45] <^d> ottomata: Yeah you can run puppet. It won't repool it though.
[21:00:51] <^d> Have to de-ban it from shard allocation.
[21:00:59] oh you banned it?
[21:01:00] oh right
[21:01:02] when you removed the shards
[21:01:03] hmmm
[21:01:07] ok, i think that's good
[21:01:20] hm, ^d, can we de-ban it today, and then repool it tomorrow?
[21:01:20] Krinkle: sorry, was in a meeting, if the question is "is it clear?" https://wikitech.wikimedia.org/wiki/Deployments says yes
[21:01:33] that way we won't serve traffic on it while I am away (i'm almost done for the day)
[21:02:01] <^d> Let's start re-sharding to it now.
[21:02:09] ok, lemme get puppet up then
[21:02:26] Krinkle: hm
[21:02:37] Krinkle: can you reproduce weird errors on https://pl.wikipedia.org/wiki/Czesław_Robakowski ? (wmf18)
[21:02:59] oh boy
[21:03:00] trusty.
[21:03:01] right.
[21:03:34] um, ^d, are there good reasons to not do trusty?
[21:03:42] this just got reinstalled as trusty, rob recently made it the default
[21:03:48] <^d> Haven't tested :)
[21:03:50] haha
[21:03:51] yeah
[21:03:51] <^d> First time for everything!
[21:04:14] well i mean, are the packages even available for trusty now?
[21:05:10] <^d> I think so? We'll find out when puppet runs.
[21:05:17] checking manually...
[21:05:53] hm i think so...
[21:05:55] ok
[21:06:01] welp, let's try it i guess
[21:06:52] running puppet
[21:07:26] Krinkle: D:
[21:07:37] Krinkle: there is no fix/workaround on wmf18
[21:07:42] ori's is on 17
[21:07:43] MatmaRex: bacon almost crisp, brb post-diner
[21:07:44] yours is on 19
[21:07:50] oriiiii
[21:07:52] OK. let's go 18 as well then
[21:08:00] wil need to find someone else or wait 1h
[21:08:18] hm?
[21:08:22] what's up?
[21:08:25] i'm here?
[21:08:30] er, that last one should be a statement
[21:08:33] what do you need me to do?
[21:08:42] ori: https://gerrit.wikimedia.org/r/#/c/155647/ in wmf18
[21:08:55] (it is in wmf19 already)
[21:09:02] bd808: do you have a specific reason to suggest ::mediawiki::sync rather than mediawiki or role::mediawiki::common?
[21:09:17] MatmaRex: https://gerrit.wikimedia.org/r/#/c/156953/ ; i'll merge
[21:09:23] MatmaRex: i'll deploy rather
[21:09:25] if you +1/+2
[21:09:38] Just that the scap bits are in ::mediawiki::sync
[21:09:44] ori: +1'd, can't +2
[21:10:37] (CR) Chad: [C: 1] point phabricator to main es [puppet] - https://gerrit.wikimedia.org/r/156891 (owner: Rush)
[21:11:26] andrewbogott: If your going for the whole enchilada, role::mediawiki::webserver is the right role
[21:11:49] well. that's more than you want too
[21:12:02] (PS2) Rush: point phabricator to main es [puppet] - https://gerrit.wikimedia.org/r/156891
[21:12:08] (CR) Rush: [C: 2 V: 2] point phabricator to main es [puppet] - https://gerrit.wikimedia.org/r/156891 (owner: Rush)
[21:12:12] (PS1) Andrew Bogott: Add ::mediawiki::sync to virt1000. [puppet] - https://gerrit.wikimedia.org/r/156956
[21:12:18] bd808: ^
[21:14:02] (CR) BryanDavis: [C: 1] "This is necessary for scap/sync-common. I'm not yet sure if it is sufficient but we can always add more config if needed." [puppet] - https://gerrit.wikimedia.org/r/156956 (owner: Andrew Bogott)
[21:14:47] !log ori Synchronized php-1.24wmf18/includes/resourceloader/ResourceLoaderStartUpModule.php: (no message) (duration: 00m 03s)
[21:14:48] bd808: I will merge and we'll see what happens :)
[21:14:53] Logged the message, Master
[21:15:07] !log last sync was of Iac37a2369: resourceloader: Don't register raw modules client-side
[21:15:11] MatmaRex: ^
[21:15:14] Logged the message, Master
[21:15:25] !log i never admin logged when install2001.wikimedia.org went online the other day, opps.
[21:15:30] Logged the message, Master
[21:15:34] !log bast2001.wikimedia.org now online in codfw.
[21:15:40] Logged the message, Master
[21:16:14] * bd808 was tricked by that !log -- Permission denied (publickey).
[21:16:40] ori: thanks
[21:18:23] (PS2) Andrew Bogott: Add ::mediawiki::sync to virt1000. [puppet] - https://gerrit.wikimedia.org/r/156956
[21:19:18] (CR) Andrew Bogott: [C: 2] Add ::mediawiki::sync to virt1000. [puppet] - https://gerrit.wikimedia.org/r/156956 (owner: Andrew Bogott)
[21:19:40] ok ^d, looking good on 1016
[21:19:42] i think
[21:19:45] i deployed plugins
[21:19:49] elasticsearch is running
[21:19:54] ready for shards to come back
[21:20:08] <^d> Okie dokie
[21:22:12] <^d> unbanned.
[21:22:56] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail
[21:24:39] (PS1) Andrew Bogott: Moved ::mediawiki::sync to the Openstack manager class. [puppet] - https://gerrit.wikimedia.org/r/156957
[21:24:57] (PS2) Rush: mw-rc-irc - actually use the Apache class [puppet] - https://gerrit.wikimedia.org/r/156740 (owner: Dzahn)
[21:25:04] (CR) Rush: [C: 1] "nice thanks" [puppet] - https://gerrit.wikimedia.org/r/156740 (owner: Dzahn)
[21:25:31] (CR) Rush: "what's the thought here? go for bold?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: Aklapper)
[21:25:33] (CR) Ori.livneh: "If you can avoid using /a, do -- we're trying desperately to kill it in ::mediawiki::sync, so it's a bit of a bummer to see it spread." [puppet] - https://gerrit.wikimedia.org/r/156957 (owner: Andrew Bogott)
[21:26:14] <^d> ottomata: Data's moving back to 1016. It'll probably take a couple hours for it all to smoothen out again.
[21:27:06] (CR) Dzahn: [C: 2] Work around Bugzilla XML RPC bug with special Unicode characters [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: Aklapper)
[21:27:08] (CR) Andrew Bogott: "I don't think I'm spreading it, just maintaining the status quo. It's already present on virt1000 but the duplicate definition in ::sync " [puppet] - https://gerrit.wikimedia.org/r/156957 (owner: Andrew Bogott)
[21:27:37] (PS1) RobH: adding admin class to bast2001 [puppet] - https://gerrit.wikimedia.org/r/156959
[21:27:43] (CR) Dzahn: [V: 2] Work around Bugzilla XML RPC bug with special Unicode characters [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: Aklapper)
[21:28:45] chasemp: andre__ ^
[21:28:50] mutante: thanks
[21:28:55] what borke?
[21:29:13] ah, you merged?
[21:29:20] andre__: yes, merged your patch
[21:29:29] mutante: cool. let's see what breaks :P
[21:29:41] chasemp: I'll try your limited testcase script in a few minutes
[21:29:45] try fetching something via XMLRPC?
[21:29:54] ok,cool
[21:29:56] will do in a few min, need to fix something else first
[21:30:04] * mutante nods
[21:30:18] chasemp: you're of course also welcome to have a testrun of exporting data to fabexim and see if there's weirdness
[21:30:44] poking at something else atm, not in a good place to test
[21:30:49] and fabexim is destroyed :)
[21:30:55] cool, i see shards coming back ^d
[21:30:59] ok, will leave that as is until tomorrow
[21:31:02] i gotta head home soon
[21:31:04] seeyaaaaa
[21:31:08] (CR) Andrew Bogott: [C: 2] Moved ::mediawiki::sync to the Openstack manager class. [puppet] - https://gerrit.wikimedia.org/r/156957 (owner: Andrew Bogott)
[21:31:10] (CR) Dzahn: [C: 1] adding admin class to bast2001 [puppet] - https://gerrit.wikimedia.org/r/156959 (owner: RobH)
[21:31:17] (PS2) Dzahn: adding admin class to bast2001 [puppet] - https://gerrit.wikimedia.org/r/156959 (owner: RobH)
[21:32:20] (CR) RobH: [C: 2] adding admin class to bast2001 [puppet] - https://gerrit.wikimedia.org/r/156959 (owner: RobH)
[21:37:33] (PS1) BBlack: macaddrs for lvs200[456] [puppet] - https://gerrit.wikimedia.org/r/156964
[21:37:50] (CR) BBlack: [C: 2 V: 2] macaddrs for lvs200[456] [puppet] - https://gerrit.wikimedia.org/r/156964 (owner: BBlack)
[21:39:22] (PS5) Dzahn: Add mobile subdomains for Wikimedia chapter wikis [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[21:40:12] chasemp: using your minimal script against BZ now it doesn't crash anymore for #9444 after merging that workaround
[21:40:28] does it still grab correct output for other tickets :D
[21:40:56] running in verbose mode now to take a look at the XML
[21:41:11] (CR) Dzahn: [C: 2] mw-rc-irc - actually use the Apache class [puppet] - https://gerrit.wikimedia.org/r/156740 (owner: Dzahn)
[21:41:26] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:41:36] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 2124 MB (3% inode=81%):
[21:41:52] * andre__ should probably give xmllint a shot
[21:42:03] (PS1) Yuvipanda: graphite: Only send data from betacluster to graphite.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/156966
[21:42:07] chasemp, yeah, looks good
[21:42:11] andrewbogott: ^ disk space virt1000 ?
[21:42:12] greg-g: ^ should make graphite.wmflabs.org a lot more stable or betacluster
[21:42:13] *for
[21:42:14] andre__: sweet that is good stuff
[21:42:25] mutante: I'm watching… not sure what the problem is yet.
[21:42:36] * YuviPanda waves at chasemp
[21:43:15] *flaps arms wildly*
[21:43:22] hashar: if you want, I can put integration back in - since tools is the one that uses most cpu / disk anyway
[21:43:24] and hey YuviPandal long time no talk
[21:43:29] chasemp: indeed
[21:43:45] I see you knocking things out graphite side in labs, seems good
[21:44:00] * andre__ sees YuviPanda and thinks of making him comment on http://fab.wmflabs.org/T221 ;)
[21:44:03] chasemp: indeed, our graphite stack now runs on trusty as well
[21:44:14] (CR) Dzahn: "alright, this installed apache defaults, site config and more on argon." [puppet] - https://gerrit.wikimedia.org/r/156740 (owner: Dzahn)
[21:44:22] (CR) Greg Grossmeier: [C: 1] "Thank you, Yuvi. I hope we can get the general toollabs info back in here (or another graphite) later, but for now, I think the health of " [puppet] - https://gerrit.wikimedia.org/r/156966 (owner: Yuvipanda)
[21:44:25] andre__: :) But... but... PHP! :)
[21:44:27] YuviPanda: it was good of you to volunteer to make graphite in prod trusty
[21:44:30] much appreciated :D
[21:44:34] hehe
[21:44:42] bd808: so… running that puppet change seems to be gobbling up disk space like mad. Any idea what that's about? Is it likely to be syncing something now, preemptively?
[21:44:50] chasemp: :D labmon1001 is running graphite + txstatsd in trusty as well. just needs ports to be opened up so we can start sending data
[21:45:05] YuviPanda: integration can wait. Pool it back whenever graphite.labs is robust enough :]
[21:45:12] grr, and now virt1000 is going to die
[21:45:12] hashar: \o/ cool
[21:45:18] andrewbogott: Yeah it will pull over the /usr/local/apache/common content
[21:45:27] shit, ok...
[21:45:31] apparently that was more than there was room for
[21:45:34] sleeping time
[21:45:37] (PS2) Yuvipanda: graphite: Only send data from betacluster to graphite.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/156966
[21:45:39] chasemp: wanna merge ^?
[21:45:56] Krinkle: note that cvn -> graphite is being disabled atm, to ensure that the labs box stays up for betacluster.
[21:45:59] andrewbogott: gah. sorry I didn't make that more clear
[21:46:14] (PS3) Rush: graphite: Only send data from betacluster to graphite.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/156966 (owner: Yuvipanda)
[21:46:15] YuviPanda: OK. It's only like 3 instances though
[21:46:20] but OK..
[21:46:21] YuviPanda: assuming you clearned w/ ppl you are turning off?
[21:46:27] bd808: not your fault, there should have been room… it's much bigger than I expected.
[21:46:36] rm'ing for the moment...
[21:46:36] it's huge
[21:46:39] bd808: how huge?
[21:46:42] (CR) Rush: [C: 1] "if greg likes it I hate it. but then again whatever, seems great!" [puppet] - https://gerrit.wikimedia.org/r/156966 (owner: Yuvipanda)
[21:46:52] chasemp: yeah. quarry is me, toollabs is me, integration is hashar and cvn is Krinkle
[21:46:53] (CR) Rush: [C: 2 V: 2] "if greg likes it I hate it. but then again whatever, seems great!" [puppet] - https://gerrit.wikimedia.org/r/156966 (owner: Yuvipanda)
[21:47:00] (PS1) BBlack: macaddr for lvs2003 [puppet] - https://gerrit.wikimedia.org/r/156967
[21:47:10] YuviPanda: oh graphite, not ganglia..
[21:47:12] andrewbogott: running du on tin to get a real number
[21:47:18] YuviPanda: got it, your gtg
[21:47:24] (PS2) BBlack: macaddr for lvs2003 [puppet] - https://gerrit.wikimedia.org/r/156967
[21:47:24] chasemp: \o/ cool.
[21:47:29] (CR) BBlack: [C: 2 V: 2] macaddr for lvs2003 [puppet] - https://gerrit.wikimedia.org/r/156967 (owner: BBlack)
[21:47:36] RECOVERY - Disk space on virt1000 is OK: DISK OK
[21:47:39] greg-g: you can use graphite.wmflabs.org now, should be fairly stable
[21:47:53] yay
[21:47:55] it will take a cycle for that to catch up
[21:47:56] PROBLEM - Redis on virt1000 is CRITICAL: Connection refused
[21:48:03] and stop the offenders
[21:48:11] chasemp: I've a whitelist on the machine (unpuppetized) as well :P
[21:48:13] andrewbogott: Current size of /a/common on tin is 31G
[21:48:15] chasemp: I can confirm you can turn off graphite.labs for 'integration' project.
[21:48:26] bd808: /dev/mapper/vg1-root 65G 56G 5.9G 91% /
[21:48:27] thanks hashar :)
[21:48:42] What is all that? It's way bigger than my mw install
[21:48:43] chasemp: I have a few days of data already, that is more than enough for my use case.
[21:49:13] (PS3) Rush: Disable darkconsole on legalpad.wm.o [puppet] - https://gerrit.wikimedia.org/r/156291 (owner: Chad)
[21:49:19] (CR) Rush: [C: 2 V: 2] Disable darkconsole on legalpad.wm.o [puppet] - https://gerrit.wikimedia.org/r/156291 (owner: Chad)
[21:49:26] andrewbogott: It's 14 mw installs plus l10n cache for some subset of that number
[21:49:34] (CR) Dzahn: "added a dba :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[21:49:50] chasemp: also, you're a jerk
[21:49:53] bd808: can I configure where it syncs to? I have tons of space in /a
[21:49:57] /dev/mapper/vg1-data1 745G 123G 623G 17% /a
[21:50:16] greg-g: wha? you shock me sir
[21:50:39] chasemp: get used to it. I'm all shock and awe.
[21:50:56] you mean shock and awwwwwwww
[21:51:30] that's just my cuddly beard
[21:51:33] andrewbogott: You can make /usr/local/apache/common-local a symlink to somewhere else and the code will end up there
[21:51:36] PROBLEM - Puppet freshness on install2001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC
[21:52:05] The puppet shouldn't mess with that because there is no force
[21:52:14] bd808: that's terrible, but, yeah, that's probably what I'll do
[21:53:04] Also shame on you for havin /a instead of /srv :)
[21:53:08] greg-g: good news! I can setup icinga alerts for betacluster *now*, and it's trivial to do.
[21:53:14] Hey, I didn't build this server, it was like this when I got here
[21:53:16] greg-g: you just have to tell me what to set alerts for
[21:53:41] andrewbogott: ori doesn't let me off for that excuse so I won't let you off either ;)
[21:53:46] fair
[21:54:15] In related news, Reedy there are 14 wmf branches on tin :(
[21:54:23] so, /a/common will be linked to /usr/local/apache/common-local which will be linked to /a/common-local \o/
[21:54:35] bd808: Yeah, need clean up again
[21:54:53] YuviPanda: that's awesome! how?
[21:54:56] andrewbogott: perfect!
[21:55:18] YuviPanda: just betacluster or any labs service?
[21:55:18] mutante: monitor_graphite_threshold, with graphite url
[21:55:25] Reedy: I would really recommend dropping a branch every tuesday
[21:55:26] mutante: set to graphite.wmflabs.org
[21:55:36] YuviPanda: i'll have to look at an example soon to copy it
[21:55:47] mutante: well, only betacluster for now since that's all I have enabled for graphite.wmflabs.org
[21:55:57] ok
[21:56:12] andrewbogott: On the plus side, we just validated that all you needed to sync was that puppet class
[21:56:18] YuviPanda: what are my options?! neat!
[21:56:38] greg-g: explore deployment-prep on graphite.wmflabs.org, and we can set threshold based alerts for any of them
[21:56:39] YuviPanda: anything in graphite?
[21:56:43] * greg-g nods
[21:57:15] greg-g: pretty much
[21:57:59] YuviPanda: puppetagent::time_since_last_run !!:)
[21:58:09] mutante: that's actually useless. you want 'failed_events'
[21:58:14] :p ok
[21:58:21] mutante: time_since_last_run only detects puppet disabled
[21:58:23] well, you can have no failures
[21:58:27] if you dont run it at all
[21:58:35] indeed, so you want both
[21:58:35] !log Jenkins: deployment-bastion slave was no more processing jobs due to executors being stalled somehow. Marked the node offline and bring it back online to have the executors killed and recreated. Beta cluster is updating again (has been frozen for 2:30 hours).
[21:58:39] but the former is more interesting
[21:58:42] yea, i'd say both
[21:58:59] (PS1) RobH: adding in install2001 as a aggregator data source [puppet] - https://gerrit.wikimedia.org/r/156972
[21:58:59] indeed
[21:59:03] morebots: ??
[21:59:03] I am a logbot running on tools-exec-11.
[21:59:03] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[21:59:04] To log a message, type !log .
[21:59:14] (PS2) RobH: adding in install2001 as a aggregator data source [puppet] - https://gerrit.wikimedia.org/r/156972
[21:59:28] what's "wildcat" in beta?
[21:59:35] (CR) RobH: [C: 2] adding in install2001 as a aggregator data source [puppet] - https://gerrit.wikimedia.org/r/156972 (owner: RobH)
[22:00:28] mutante: there's a 'wildcat' project in labs, is that what you mean?
[22:00:28] (PS1) Andrew Bogott: Link /usr/local/apache/common-local to /a/common-local [puppet] - https://gerrit.wikimedia.org/r/156975
[22:01:17] greg-g: although note that it should be considered a 'hack' until labmon1001 is ready.
[22:01:26] * greg-g notes
[22:01:45] andrewbogott: it shows up in the labs graphite and Yuvi said "only beta"?
[22:02:12] ok, I don't know what that means
[22:02:17] YuviPanda: that looks like much more than just beta?
[22:02:25] mutante: the rest is all bogus data :P
[22:02:43] andrewbogott: http://graphite.wmflabs.org/
[22:02:44] mutante: it was being collected at some point in the past, but that kills the instance once a day or so, so I scaled waaaaay back
[22:03:00] mutante: I mean, I don't know what 'only beta' means
[22:03:02] andrewbogott: Yuvi can monitor stuff in there now
[22:03:17] mutante: only beta means only deployment-prep data is being collected now
[22:03:38] so how do you add/remove projects?
[22:03:47] bd808: https://gerrit.wikimedia.org/r/#/c/156975/1 who should own that directory?
[22:04:19] andrewbogott: beta as in the project name "beta"
[22:04:30] andrewbogott: mwdeploy:mwdeploy
[22:04:38] mutante: there's a whitelist in /srv/carbon/whitelist.conf and also a whitelist in role/diamond
[22:04:42] andrewbogott: To match https://github.com/wikimedia/operations-puppet/blob/production/modules/mediawiki/manifests/sync.pp#L24-L29
[22:04:53] mutante: however, I just removed a bunch of roles. the labs machine can't handle that much graphite
[22:05:09] mutante: see my last comment in https://bugzilla.wikimedia.org/show_bug.cgi?id=63362
[22:05:13] oh, but puppetizing that link will cause a collision :(
[22:05:19] cmjohnson1: I'm good to start, let me know when!
[22:05:34] YuviPanda: aha, that whitelist would be edited via a gerrit change already?
[22:05:42] reads comments
[22:05:47] godog when!
[22:05:48] mutante: read the last one
[22:06:27] go ahead and depool fe-1003 (godog)
[22:06:38] cmjohnson1: yep
[22:06:46] YuviPanda: yea, i know .. labmon1001
[22:07:05] mutante: yup, so until that shows up I don't think we should whitelist more projects
[22:07:20] (PS2) Andrew Bogott: Link /usr/local/apache/common-local to /a/common-local [puppet] - https://gerrit.wikimedia.org/r/156975
[22:07:26] !log depool ms-fe1003
[22:07:47] YuviPanda: yep :) 64 bytes from labmon1001.eqiad.wmnet
[22:07:55] no acknowledgment? :(
[22:08:07] mutante: hmm? '64 bytes from'?
[22:08:11] mutante: did you ping? :)
[22:08:14] morebots: ping
[22:08:14] I am a logbot running on tools-exec-11.
[22:08:15] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[22:08:15] To log a message, type !log .
[22:08:18] YuviPanda: yes
[22:08:28] mutante: yeah, all packets filtered out :)
[22:08:46] YuviPanda: i actually wanted to say "it's up" by that
[22:08:56] mutante: ah, yeah :) it's up and fully provisioned as well
[22:09:03] !log depool ms-fe1003
[22:09:18] !log can haz logging?
[22:09:20] doesn't look like morebots is working?
[22:09:56] (PS3) Andrew Bogott: Link /usr/local/apache/common-local to /a/common-local [puppet] - https://gerrit.wikimedia.org/r/156975
[22:10:37] (CR) Cmjohnson: [C: 2] Dns changes for ms-fe1003 DO NOT MERGE YET [dns] - https://gerrit.wikimedia.org/r/156841 (owner: Cmjohnson)
[22:10:56] (CR) Andrew Bogott: [C: 2] Link /usr/local/apache/common-local to /a/common-local [puppet] - https://gerrit.wikimedia.org/r/156975 (owner: Andrew Bogott)
[22:11:10] godog: wanna restart it? it should be on tools-login
[22:11:12] https://wikitech.wikimedia.org/wiki/Morebots
[22:11:18] cmjohnson1: good to go
[22:11:56] godog...good ..just merged dns and updated network on the server...moving and adding card now
[22:12:01] cmjohnson1: didn't change /etc yet though, happy to do that
[22:12:03] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail
[22:12:15] godog: done
[22:12:19] i'll do it
[22:12:49] cmjohnson1: cool!
[22:12:54] RECOVERY - Redis on virt1000 is OK: TCP OK - 0.002 second response time on port 6379
[22:13:10] mutante: thanks :) in the middle of this thing
[22:13:24] !log test
[22:13:32] Logged the message, Master
[22:13:43] mutante: bah, how do I enable access to graphite.wmflabs.org from labmon1001? I know you can't access labs from prod, but this seems like a good idea to do, since we can't setup icinga checks otherwise (/cc greg-g)
[22:13:45] !log ms-fe1003 down for relocation
[22:14:04] !log killing tools.morebots production instance
[22:15:06] !log restarted tools.morebots production instance - can i log now?
[22:15:12] Logged the message, Master
[22:15:13] PROBLEM - Host ms-fe1003 is DOWN: PING CRITICAL - Packet loss = 100%
[22:15:17] godog: ^
[22:15:34] mutante: sweet, thanks :))
[22:16:25] YuviPanda: that should probably be a ticket or ops list post
[22:16:32] mutante: right, ok
[22:26:59] (PS1) Dzahn: allow codfw networks to send snmtraps to icinga [puppet] - https://gerrit.wikimedia.org/r/156985
[22:27:16] bd808: ok, now it's syncing into /a -- should be safe this time.
[22:28:05] (CR) Dzahn: "you can compare to manifests/network.pp how those variable names are derived from the hash" [puppet] - https://gerrit.wikimedia.org/r/156985 (owner: Dzahn)
[22:28:06] andrewbogott: cool.
[22:29:57] (03CR) 10Dzahn: "for example:" [puppet] - 10https://gerrit.wikimedia.org/r/156985 (owner: 10Dzahn) [22:30:58] (03CR) 10Dzahn: [C: 032] allow codfw networks to send snmtraps to icinga [puppet] - 10https://gerrit.wikimedia.org/r/156985 (owner: 10Dzahn) [22:31:32] (03PS2) 10Dzahn: allow codfw networks to send snmptraps to icinga [puppet] - 10https://gerrit.wikimedia.org/r/156985 [22:35:19] (03PS3) 10Dzahn: allow codfw networks to send snmptraps to icinga [puppet] - 10https://gerrit.wikimedia.org/r/156985 [22:35:26] mutante: im reviewing that now [22:36:01] (03CR) 10Dzahn: [C: 032] allow codfw networks to send snmptraps to icinga [puppet] - 10https://gerrit.wikimedia.org/r/156985 (owner: 10Dzahn) [22:36:04] (03CR) 10RobH: [C: 031] allow codfw networks to send snmptraps to icinga [puppet] - 10https://gerrit.wikimedia.org/r/156985 (owner: 10Dzahn) [22:36:12] robh: thanks:) [22:36:20] welcome [22:36:25] i'm waiting for jenkins [22:37:24] hmm, suspicious, jenkins didnt talk at all yet on this one [22:39:03] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:42:03] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:45:58] bd808: well, and after the sync, puppet broke my symlink and made common-local a local dir again, thus setting the stage for disaster on the next puppet run [22:46:10] blerg [22:46:47] You can link one level higher. /usr/local/apache -> /a/apache-local [22:46:57] yeah [22:46:58] or some lame crap like that [22:47:17] The symlink turned into a dir because of rsync I bet [22:47:43] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1798 MB (2% inode=79%): [22:48:05] dammit, already puppet? [22:49:43] RECOVERY - Disk space on virt1000 is OK: DISK OK [22:50:19] chasemp: 1) thanks a ton for that phab day 1 email summary. 
2) I was about to reply calling out some needs, but that might be unwanted: do people like legal know you're waiting on them?
[22:51:38] Legal is in fab so should...and Andre sent them a note said, not sure who else you mean but do your thing
[22:51:53] ^he said
[22:52:02] yeah
[22:52:09] dropped a line to Luis today
[22:52:16] !log aaron Synchronized php-1.24wmf18/maintenance/findMissingFiles.php: (no message) (duration: 00m 07s)
[22:52:22] Logged the message, Master
[22:52:54] ...but I won't stop greg-g from making a visit on 6th floor together with some grumpy looking people to say "do reply to this. now."
[22:53:17] chasemp: nothing else I guess :)
[22:54:51] The more I think on it the less I want to try to uproot Bugzilla when Andre is on half time, and frankly there is enough to do otherwise. Andre is there anyone who can play second fiddle for Bugzilla?
[22:55:02] (PS1) Andrew Bogott: More jiggering of symlinks to make room for scap [puppet] - https://gerrit.wikimedia.org/r/156991
[22:55:13] * ori pretends he didn't see that
[22:55:26] local-apache?!
[22:55:27] yes ori please look away
[22:55:30] andrewbogott: what?!
[22:55:38] what?!
[22:55:42] what is "local-apache"?
[22:55:44] I can call it anything you like
[22:55:50] /a is big; / is not
[22:56:00] but I don't want to burn down and rebuild this server anytime soon
[22:56:12] You can link one level higher. /usr/local/apache -> /a/apache-local
[22:56:20] i'm not friends with you any more
[22:56:31] hey, I welcome alternatives
[22:56:34] that don't involve repartitioning
[22:56:36] chasemp, hmm. well, mutante helps out sometimes with low level stuff, but I cannot really tell who worked on stuff (if at all) before I joined WMF and if people are still around
[22:56:38] * bd808 waits for ori to stop sulking
[22:56:57] seriously, what is that?
[22:57:02] can you give me a bit of context?
[22:57:10] chasemp, let me reply to your message tomorrow, probably a good idea to go for RT first then, but today I'm slowly fading out here (1AM)
[22:57:15] virt1000 getting scap pushes
[22:57:22] ori: So, trying to get wikitech to use the deployment system.
[22:57:24] andre__: g'night, go sleep
[22:57:33] not yet, still stuff to do, sigh :)
[22:57:36] but thanks :)
[22:57:36] The deployment system, step one, is 'copy 30 gigs to stuff to /usr/local/apache/blahblah'
[22:57:38] great, why does it need to deploy to its own directory, unlike the rest of the cluster?
[22:57:40] And I don't have 30 gigs in /usr
[22:58:19] just copy over everything in /usr to the bigger disk
[22:58:22] and mount that as /usr
[22:58:27] how long does that take? probably 5-10 minutes
[22:59:10] andrewbogott: i know i must seem like an incredible ass right now, but it's "just another symlink" to you only because you've been spared the details of the fight against the cluster-fuck of symlinks that is mediawiki over the past several months
[22:59:22] for the love of christ let's not add any
[22:59:55] and bd808 should really know better than to advise you to add them
[23:00:04] RoanKattouw, ori, MaxSem, legoktm: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140828T2300).
[23:00:06] I'll do it
[23:00:14] o/
[23:00:21] We both know better. I'm just not confident that moving /usr to a different volume is as trivial as you suggest
[23:00:33] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 27 Aug 2014 06:41:47 UTC
[23:01:18] andrewbogott: why wouldn't it be?
[23:01:46] Because virt1000 is a tangle of bobby pins and bailing wire and I try not to mess with it
[23:01:58] One other thing we could try is adding a custom /etc/scap.cfg on virt1000 to target a different directory.
I'm not 100% sure that we wouldn't still need a symlink for docroot stuff but I can check
[23:02:03] grrrit-wm: helo?
[23:02:09] why would we target a different directory?
[23:02:18] how is it not a bad idea to make it "special" in yet another way?
[23:02:25] because there is not space on /usr
[23:02:34] that's a solvable problem
[23:02:50] ln -s to /data/storage?
[23:03:04] eh, ./project i mean
[23:03:06] ori: You have root on virt1000, right? I'm totally down with you rearranging the drives. I'm a bit too close to the end of my day and a bit to insecure with my fstab skills to want to do it just now.
[23:03:22] It is /probably/ safe :)
[23:03:24] andrewbogott: ok, let me take a stab (heh) at it
[23:03:30] an f-stab ?:)
[23:03:42] * greg-g groans
[23:03:51] andrewbogott, bd808: http://mywiki.wooledge.org/XyProblem
[23:04:04] * spagewmf JIT production of Flow-Thanks-Echo backports :)
[23:04:58] (CR) Alex Monk: "Can we also use this to move all *.wikipedia.org databases to *wikipedia, and similar for *.wikimedia.org?" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[23:06:10] ori: note that there is some stuff in /a that is unrelated to scap and needs to persist
[23:06:19] or, if moved, some puppet changes will be required, I don't know exactly what.
[23:07:05] RoanKattouw: is Mon/Wed only still valid for you for swats?
[23:07:13] RECOVERY - Puppet freshness on install2001 is OK: puppet ran at Thu Aug 28 23:07:08 UTC 2014
[23:07:20] robh: bblack ^
[23:07:38] [1409267233] SERVICE NOTIFICATION: irc;install2001;Puppet freshness;OK;notify-service-by-irc;puppet ran at Thu Aug 28 23:07:08 UTC 2014
[23:07:38] spagewmf, you have no patches for swat?
[23:07:43] welcome codfw
[23:07:58] greg-g: No it's not, feel free to put me in whatever windows you need as long as they're not at 8am
[23:07:59] OuKB: three coming up, first is ready
[23:08:05] mutante: nice!
[23:08:07] greg-g: I'm gradumacated now :)
[23:08:08] RoanKattouw: :) will do, thank you sir
[23:08:11] so was the rules eh?
[23:08:17] (CR) Dzahn: "[1409267233] SERVICE NOTIFICATION: irc;install2001;Puppet freshness;OK;notify-service-by-irc;puppet ran at Thu Aug 28 23:07:08 UTC 2014" [puppet] - https://gerrit.wikimedia.org/r/156985 (owner: Dzahn)
[23:08:22] RoanKattouw: yeah, just didn't want to presume you didn't schedule something else there
[23:08:26] robh: yes, ferm rules on neon
[23:08:33] awesome, thanks for fixing that
[23:08:45] more codfw progress
[23:08:48] (it counts)
[23:08:57] yep:) yw
[23:09:09] greg-g: Oh can you try not to put me on Monday? I have a 4pm meeting on Monday (I'm optional for it and I don't always attend, but I'd like to be able to attend it when I do want to)
[23:09:45] (CR) Krinkle: "Maybe a static array instead of a switch case? Gonna be boilerplate of switch/return otherwise." [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[23:10:19] (PS1) BBlack: Escape user controlled text in error page [puppet] - https://gerrit.wikimedia.org/r/156997 (https://bugzilla.wikimedia.org/70008)
[23:10:44] (CR) Krinkle: "@Krenari: Maybe, though, while dbname is internal, it is exposed a fair bit in js, and especially on more public wikis, it's used a fair b" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[23:11:04] csteipp: ^
[23:11:21] RoanKattouw: will (not) do
[23:12:47] (CR) Krinkle: "Also, actually, doesn't rename the dbname. Only the lang/site combo for config files and subdomain matching." [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[23:12:59] (CR) CSteipp: [C: 1] Escape user controlled text in error page [puppet] - https://gerrit.wikimedia.org/r/156997 (https://bugzilla.wikimedia.org/70008) (owner: BBlack)
[23:13:06] Cool thanks
[23:13:37] who's doing SWAT?
The Flow and Thanks wmf19 bumps are ready
[23:13:59] spagewmf: OuKB is (who is Max)
[23:14:17] (CR) BBlack: [C: 2] Escape user controlled text in error page [puppet] - https://gerrit.wikimedia.org/r/156997 (https://bugzilla.wikimedia.org/70008) (owner: BBlack)
[23:14:19] I thought MaxSem was Max
[23:14:31] WE ARE LEGION
[23:15:53] legoktm: Thanks bump is https://gerrit.wikimedia.org/r/157000 if you're around to sanity check
[23:15:54] https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140828T2300
[23:16:00] (PS1) Dzahn: icinga - add monitor group for misc codfw servers [puppet] - https://gerrit.wikimedia.org/r/157003
[23:16:27] my bouncer died
[23:16:38] andrewbogott: what is /var/lib/glance/images/foo ?
[23:16:59] ori-l you still haven't billed me for 18 months of ZNC
[23:17:09] spagewmf: The Thanks change looks good to me, FWWI.
[23:17:17] OuKB: lolwat https://gerrit.wikimedia.org/r/#/c/156999/ is on wmf15?
[23:17:26] yup, slight typo
[23:17:28] spagewmf: the cost is negligible; enjoy
[23:17:34] ori-l: I don't know, but it can almost certainly be deleted. Give me a second to verify.
[23:17:51] (CR) Dzahn: [C: 2] icinga - add monitor group for misc codfw servers [puppet] - https://gerrit.wikimedia.org/r/157003 (owner: Dzahn)
[23:18:23] andrewbogott: the other files in that directory are all from february/may
[23:18:32] ehm, there's one from june i guess
[23:18:46] it's 34G altogether
[23:19:05] /var/lib/glance/images is important
[23:19:26] But /foo is probably a backup leftover from when I migrated from virt0
[23:19:55] andrewbogott: it's set via 'filesystem_store_datadir'
[23:19:58] * OuKB bites Krinkle for not doing a submodule commit
[23:20:03] legoktm: I guess you ran user requests for GlobalCssJs requesting revision limit override also via your user hack
[23:20:03] andrewbogott: why don't we set that to /a/glance ?
[23:20:18] Krinkle: no, I didn't run any of those yet
[23:20:26] andrewbogott: templates/openstack/havana/glance/glance-api.conf.erb:177:filesystem_store_datadir = /var/lib/glance/images/
[23:20:34] legoktm: k, I ran it for myself earlier today but got bitten by the lack of that
[23:20:39] legoktm: saw you only added that feature earlier today
[23:20:42] backported it now
[23:20:48] OuKB: Sorry :-/
[23:20:51] thanks
[23:20:57] OuKB, legoktm I think 157000 is the right Thanks commit
[23:21:08] andrewbogott: imo just resize the disk, but i don't have console access for that
[23:21:18] OuKB: I was going to actually, I only added it a minute ago. Want me to create it?
[23:21:29] * ori-l_ now
[23:21:33] moving glance stuff is probably ok...
[23:21:38] meh, doing myself
[23:21:46] k
[23:22:59] (PS1) Aude: re-enable wikibase badges css setting [mediawiki-config] - https://gerrit.wikimedia.org/r/157006
[23:23:37] spagewmf, I already did Thanks in https://gerrit.wikimedia.org/r/#/c/157004/
[23:23:42] ori: Any such change should probably be done on virt0 as well, even though it's headed for the scrap-heap in a month...
[23:24:45] I will copy and hotfix to make sure moving doesn't upset glance.
[23:25:01] andrewbogott: common-local isn't actually 33g; .git gets excluded by rsync
[23:25:15] if you look at a deployment target you'll see it's 19G
[23:25:19] OuKB: oh, thanks. I thought submitters were supposed to do the backports. I just prepared https://gerrit.wikimedia.org/r/157008 for the Echo backport
[23:25:29] andrewbogott: so with the 'foo' image removed you should be good
[23:25:48] ori: yeah, although now you've inspired me since that box was low on space anyway :)
[23:25:57] spagewmf, they are, but before the window start
[23:25:59] ;)
[23:26:03] Just yesterday I cleared out some obsolete images to pry out an extra gig. Pretty tedious.
[23:26:17] andrewbogott: resize it; the cost of a diff of config is bigger than that of resizing a partition once
[23:26:59] OuKB: yeah, sorry sorry!
[23:28:22] ori: When you say 'resize a partition' I hear '60% chance of unrecoverable disk error'
[23:28:26] maybe I'm stuck in the 90's
[23:28:52] (PS2) Aude: re-enable wikibase badges css setting [mediawiki-config] - https://gerrit.wikimedia.org/r/157006
[23:29:41] (CR) Aude: [C: -2] "not to deploy until lydia and/or bene* approve" [mediawiki-config] - https://gerrit.wikimedia.org/r/157006 (owner: Aude)
[23:30:05] !log maxsem Synchronized php-1.24wmf18/extensions/GlobalCssJs/: https://gerrit.wikimedia.org/r/#/c/157009/ (duration: 00m 04s)
[23:30:16] Logged the message, Master
[23:32:30] (Abandoned) Andrew Bogott: More jiggering of symlinks to make room for scap [puppet] - https://gerrit.wikimedia.org/r/156991 (owner: Andrew Bogott)
[23:33:08] Krinkle, ^^
[23:33:49] Thx
[23:34:21] !log maxsem Synchronized php-1.24wmf19/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/156968/ (duration: 00m 05s)
[23:34:30] Logged the message, Master
[23:35:21] !log maxsem Synchronized php-1.24wmf19/extensions/Thanks/: https://gerrit.wikimedia.org/r/#/c/156898/ (duration: 00m 04s)
[23:35:28] legoktm, ^^
[23:35:34] * legoktm tests
[23:35:36] Logged the message, Master
[23:36:28] OuKB: works, thanks!
[23:36:46] !log maxsem Synchronized php-1.24wmf19/extensions/Echo: https://gerrit.wikimedia.org/r/#/c/157008/ (duration: 00m 04s)
[23:37:03] Logged the message, Master
[23:38:05] PROBLEM - puppet last run on virt1000 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:38:11] !log maxsem Synchronized php-1.24wmf19/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/156994/ (duration: 00m 05s)
[23:38:21] Logged the message, Master
[23:38:24] spagewmf, ^^^
[23:39:35] !log maxsem Synchronized php-1.24wmf19/includes/: https://gerrit.wikimedia.org/r/#/c/156979/ (duration: 00m 06s)
[23:39:50] Logged the message, Master
[23:39:53] !log maxsem Synchronized php-1.24wmf19/maintenance/: https://gerrit.wikimedia.org/r/#/c/156979/ (duration: 00m 04s)
[23:40:16] Logged the message, Master
[23:40:24] (CR) TTO: "I don't really see the value in "renaming" enwiki to enwikipedia. However, if this can help with (e.g.) bug 23215, where a wiki has the wr" [mediawiki-config] - https://gerrit.wikimedia.org/r/134962 (owner: Reedy)
[23:41:11] OuKB: <3 for your flexibility. I will give you all the money I saved up to pay Mr. Livneh for ZNC.
[23:41:52] spagewmf, I see a few "Did not load root post Rysujnjdpyb7661r" in fatalmonitor
[23:42:08] seem to be before the deployment though
[23:44:07] OuKB: we'll investigate, thanks
[23:46:01] legoktm: Hm.. https://fr.wiktionary.org/wiki/Utilisateur:Krinkle/common.js wasn't deleted
[23:46:10] frwiktionary: Utilisateur:Krinkle/common.js was deleted.
[23:46:11] frwiktionary: Utilisateur:Krinkle/common.css was deleted.
[23:46:33] hmm
[23:46:51] others were fine such as https://ru.wikiquote.org/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Krinkle/common.css - still checking the rest
[23:47:04] we're not checking the return value of WikiPage::doDeleteArticleReal
[23:47:09] omg are you using IE?
[23:47:15] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Epic puppet fail
[23:47:21] I guess we could be getting caught by the ArticleDelete hook
[23:47:40] user permissions, db error, etc.
[23:47:44] is it doing waitforslaves?
[23:47:59] it's not, should I add that in?
[23:48:04] we should be bypassing user permissions
[23:48:07] the account doesn't even exist
[23:48:20] sure, but I'm just saying that's one of the possible errors from WikiPage
[23:48:43] doDeleteArticleReal doesn't check permissions
[23:48:47] Yeah, add waitforslaves after or before each delete (maybe both) + checking return value before printing that it was deleted.
[23:48:48] if it's doing a lot of deletes, it should probably add a wait for slaves every N actions
[23:49:06] I've ran it three times in total. frwiktionary is still there, so its probably a deteministic error
[23:49:15] Reedy: About 10 pages at most, but foreachwiki
[23:49:28] https://fr.wiktionary.org/wiki/Utilisateur:Legoktm/common.js is still there too.
[23:49:35] guess it wouldn't hurt to stick it at the bottom of the script before exit
[23:49:36] https://github.com/wikimedia/mediawiki-extensions-GlobalCssJs/blob/master/removeOldManualUserPages.php
[23:49:39] yeah
[23:49:46] I can't view the abuselog.
[23:49:47] Reedy: btw, is foreachwiki supposed to be broken?
[23:49:54] I had to run it with sudo -u apache
[23:50:01] mwscript on the other hand works fine
[23:50:02] that's been the case for a long time
[23:50:05] ok
[23:50:07] yes, was like that for ages
[23:50:10] not doucented anywhere
[23:50:11] ?
[23:50:12] script should probably be updated
[23:50:13] foreachwiki is broken?
[23:50:19] oh
[23:50:21] I just got 800 errors when I ruan it
[23:50:29] and then re-ran it with sudo
[23:50:33] :-/
[23:50:33] oh, I run it with sudo -u apache since it told me to
[23:50:38] not broken, just wasn't updated when apache user was made a requirement
[23:51:05] running a script per its documented call signature throwing an error is broken by my definitino.
[23:51:45] !log repooling ms-fe1003
[23:51:45] {{sofixit}}
[23:51:46] :)
[23:51:47] legoktm: enwiktionary: User:Krinkle/common.js was deleted.
https://en.wiktionary.org/wiki/User:Krinkle/common.js
[23:51:51] Logged the message, Master
[23:52:22] Krinkle: it's an abusefilter
[23:52:40] string(129) "{{Avertissement filtre
[23:52:40] |niveau = information
[23:52:41] |texte = {{:MediaWiki:abusefilter-warning-page-utilisateur-tierce}}
[23:52:41] |filtre = 14
[23:52:41] }}"
[23:52:41] Yeah, I figured.
[23:53:02] (PS1) Andrew Bogott: Move glance images to /a where there's more room. [puppet] - https://gerrit.wikimedia.org/r/157012
[23:53:04] I can't see the filter though
[23:53:09] I get the same sometimes when doing globalinterface edits on wikis that have a stupid filter for something that's already in mediawiki core
[23:53:16] (namely to disallow editing of other users jscss pages)
[23:53:25] You do not have permission to view abuse filters, for the following reason:
[23:53:25] The action you have requested is limited to users in one of the groups: Administrators, Patrollers, Autopatrollers, Abuse filter editors.
[23:53:28] ori: can you tolerate that?
[23:53:35] legoktm: wiki dbname and abusefilter id?
[23:53:36] Krinkle: can you edit the filter and add action == edit in front of it?
[23:53:45] it looks like filter 14, frwiktionary
[23:53:56] I can't see the filter log either...
[23:54:05] I can just see the error message
[23:54:09] https://en.wiktionary.org/wiki/Special:AbuseFilter/14
[23:54:15] I can see the filter but dont have edit permission
[23:54:18] will ask a steward
[23:54:25] Krinkle: frwikitionary
[23:54:28] not en
[23:54:30] right
[23:54:38] yeah, can't see it
[23:56:06] (PS1) Reedy: Add sudo -u apache call to foreachwikiindblist [puppet] - https://gerrit.wikimedia.org/r/157013
[23:58:15] legoktm: Did you find What the error on enwikt is?
[23:58:45] !log depool ms-fe1004
[23:58:47] oh, it's broken there too? /me checks
[23:58:52] Logged the message, Master
[23:58:52] 23:35, 28 August 2014: 127.0.0.1 (Talk) triggered filter 24, performing the action "delete" on User:Krinkle/common.css.
Actions taken: Disallow; Filter description: Users touching other users' user pages and subpages (details | examine)
[23:59:14] https://en.wiktionary.org/wiki/Special:AbuseFilter/24
[23:59:16] same thing