[00:00:30] mutante: That page is the perfect example of documentation rot
[00:00:42] (PS2) Ori.livneh: Update remaining references to /u/l/a/common-local [mediawiki-config] - https://gerrit.wikimedia.org/r/159635
[00:01:50] bd808: yea, "type apt-get install && reboot"
[00:07:34] (CR) Ori.livneh: [C: 2] Update remaining references to /u/l/a/common-local [mediawiki-config] - https://gerrit.wikimedia.org/r/159635 (owner: Ori.livneh)
[00:07:56] !log ori updated /a/common to {{Gerrit|Id607bf36d}}: Update remaining references to /u/l/a/common-local
[00:08:01] (CR) Dzahn: [C: 2] delete "check_bad_apaches" monitoring [puppet] - https://gerrit.wikimedia.org/r/159619 (owner: Dzahn)
[00:08:02] Logged the message, Master
[00:08:16] * ori tries that config change on mw1017
[00:09:18] grrr, terbium wtf
[00:10:23] Jeff_Green: what's up?
[00:10:39] insane cronspam
[00:11:06] apache : user NOT in sudoers
[00:11:09] ooh?
[00:11:15] that's the one yeah
[00:11:19] fail
[00:11:41] i bet that is
[00:11:43] Reedy: i think that's your change
[00:11:45] https://gerrit.wikimedia.org/r/#/c/157013/
[00:12:16] apache : user NOT in sudoers ; TTY=unknown ; PWD=/var/www ; USER=apache ; COMMAND=/usr/bin/php ...
[00:12:22] yea, must be from deployment
[00:12:47] Reedy: can you amend that to check that the user is not already apache?
[00:12:50] lol
[00:12:52] Poor apache can't even sudo to himself :(
[00:12:52] one mail per wiki :)
[00:12:53] ...
[00:12:57] :D
[00:13:21] Jeff_Green: sorry 'bout that
[00:13:41] no worries
[00:14:05] bye!
[00:14:16] Is there a way of doing this without having to duplicate the whole line in an if statement?
[00:14:38] put the whole line in an if statement, don't get fancy now
[00:14:45] oh, just $SUDO at the start or something?
[00:15:27] $sudo = ''; if ( user != apache ) { $sudo = 'sudo -u apache'; } $SUDO php -ddisplay_errors=On $MW_COMMON/multiversion/MWScript.php $CMD --wiki=$x "
[00:15:27] ${@}" | sed -u "s/^/$x: /"
[00:15:29] that should work
[00:15:38] noting that isn't bash syntax, yada yada
[00:16:07] (CR) RobH: [C: 1] blog/techblog - TTL back to 1H [dns] - https://gerrit.wikimedia.org/r/158277 (owner: Dzahn)
[00:16:34] sudo=
[00:16:36] if groups | grep -Ewq 'sudo|wikidev|root'; then
[00:16:42] sudo=sudo -u apache
[00:16:44] fi
[00:16:46] :P
[00:17:02] or just $USER
[00:17:23] mwscript is more fancy
[00:17:27] and rpobably for a reason
[00:17:53] eg. snapshots use dataset instead of apache for stuff
[00:18:31] so a sole check on $USER might lead to subtle breakage
[00:18:59] Reedy: [[ "$(id -u)" != "$(id -u apache)" ]] || [[ "$(groups)" == *wikidev* ]]
[00:19:15] $waysToSkinACat++;
[00:19:17] yes, it assumes there no groups with 'wikidev' in their name other than 'wikidev'
[00:20:54] (PS2) Dzahn: blog/techblog - TTL back to 1H [dns] - https://gerrit.wikimedia.org/r/158277
[00:21:05] (CR) jenkins-bot: [V: -1] blog/techblog - TTL back to 1H [dns] - https://gerrit.wikimedia.org/r/158277 (owner: Dzahn)
[00:21:23] lol
[00:22:38] !log ori Synchronized docroot and w: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 04s)
[00:22:42] Logged the message, Master
[00:23:57] (PS3) Dzahn: blog/techblog - TTL back to 1H [dns] - https://gerrit.wikimedia.org/r/158277
[00:24:04] Reedy: what is this [2014-09-11 00:23:53] Fatal error: Base lambda function for closure not found at /usr/local/apache/common-local/php-1.24wmf20/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php on line 18 ? already reported/debugged?
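Editor's note: the sudo check discussed between 00:14 and 00:19 can be sketched in shell roughly as follows. This is a minimal illustration, not the change that was later merged; the function name `sudo_prefix` and its three arguments are assumptions made for this example. It skips sudo when the caller already is the target user, and it compares group names exactly, which avoids the substring pitfall raised at 00:19:17.

```shell
# Hypothetical helper (not from the log): print the sudo prefix to use, or
# nothing when the caller already is the target user or is in none of the
# groups mentioned in the discussion (sudo, wikidev, root).
# Arguments: current user, target user, space-separated list of group names.
sudo_prefix() {
    local current_user target_user group_list g
    current_user="$1"
    target_user="$2"
    group_list="$3"

    # Already running as the target user: sudo is unnecessary.
    if [ "$current_user" = "$target_user" ]; then
        return 0
    fi

    # Compare whole group names so that a group such as "wikidev-old"
    # does not count as "wikidev".
    for g in $group_list; do
        case "$g" in
            sudo|wikidev|root)
                printf 'sudo -u %s' "$target_user"
                return 0
                ;;
        esac
    done
}

# Possible use in a wrapper script (left commented out, since it needs php):
#   prefix="$(sudo_prefix "$(id -un)" apache "$(id -Gn)")"
#   $prefix php -ddisplay_errors=On "$MW_COMMON/multiversion/MWScript.php" "$@"
```

The prefix is expanded unquoted in the commented usage so that an empty result contributes no words to the command line.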
[00:24:10] https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/mediawiki/templates/multiwiki/foreachwiki.erb#L4
[00:24:16] ori: It's an intermittant APC issue
[00:24:20] ori: That's the APC f.ck up
[00:24:25] Gracefulling the apache "fixes" it
[00:24:35] (CR) Dzahn: [C: 2] blog/techblog - TTL back to 1H [dns] - https://gerrit.wikimedia.org/r/158277 (owner: Dzahn)
[00:24:40] didn't joe do a graceful-all a little bit ago?
[00:24:49] Yeah, it started just after
[00:24:58] mutante gracefulled anothr apache
[00:25:03] just a single one
[00:25:23] 10.64.32.41
[00:25:31] 10.64.32.57
[00:25:32] atm it seems
[00:25:38] !log ori Synchronized multiversion: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 04s)
[00:25:42] !log ori Synchronized wmf-config: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 03s)
[00:25:43] Logged the message, Master
[00:25:48] Logged the message, Master
[00:26:34] yup, looks to be just those 2
[00:27:11] Inbox(416)
[00:27:29] Reedy: they need graceful too?
[00:27:42] please
[00:28:12] done
[00:28:24] !log graceful'ed Apaches on mw1171, mw1187
[00:28:29] Logged the message, Master
[00:31:14] Reedy: i got the user check thing
[00:35:57] (PS1) Ori.livneh: foreachwikiindblist: check groups before attempting to sudo [puppet] - https://gerrit.wikimedia.org/r/159643
[00:36:11] w/in 37
[00:36:13] mutante: ^
[00:36:30] mutante: 37 what?
[00:37:23] (CR) Hoo man: [C: 1] "That's one way to do it :P" [puppet] - https://gerrit.wikimedia.org/r/159643 (owner: Ori.livneh)
[00:37:24] 37 windows in my IRC client :o
[00:37:33] oh, heh.
[00:39:21] (CR) Dzahn: [C: 1] "yes please, it's like mwscript. confirmed.
and this should stop current mail spam to roots" [puppet] - https://gerrit.wikimedia.org/r/159643 (owner: Ori.livneh)
[00:41:14] (CR) Ori.livneh: [C: 2] foreachwikiindblist: check groups before attempting to sudo [puppet] - https://gerrit.wikimedia.org/r/159643 (owner: Ori.livneh)
[00:43:16] (PS1) Ori.livneh: Remove MW_DBLISTS config vars [puppet] - https://gerrit.wikimedia.org/r/159645
[00:43:17] mutante: merged and ran puppet on terbium
[00:43:22] mutante: are the alerts better now?
[00:45:13] ori: i think so, yea, the last one i got was now , curiously
[00:45:20] Mail Delivery Subsystem
[00:45:27] Delivery to the following recipient failed permanently:...
[00:45:34] " Internal Message-ID collision"
[00:51:08] yea, has stopped. thanks
[01:03:39] (PS1) Ori.livneh: Replace remaining references to /u/l/a/common [mediawiki-config] - https://gerrit.wikimedia.org/r/159650
[01:06:25] (PS1) Jforrester: Move Parsoid extension-list pointer now the code has moved [mediawiki-config] - https://gerrit.wikimedia.org/r/159651
[01:06:27] (PS1) Jforrester: Move Parsoid pointer in extension-list now the repo has been restructured [mediawiki-config] - https://gerrit.wikimedia.org/r/159652
[01:07:49] (PS2) Jforrester: Move Parsoid pointer in Labs extension-list as the repo has been restructured [mediawiki-config] - https://gerrit.wikimedia.org/r/159651
[01:08:07] (CR) MZMcBride: "Ottomata: People are sharing passwords? I don't follow."
[puppet] - https://gerrit.wikimedia.org/r/155452 (owner: Dzahn)
[01:39:30] (PS1) Ori.livneh: Add grafana.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/159655
[01:39:39] brgh
[01:39:47] (Abandoned) Ori.livneh: Add grafana.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/159655 (owner: Ori.livneh)
[01:40:28] (PS2) Ori.livneh: Add grafana.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/133275
[01:40:30] (PS4) Dzahn: webserver - use ssl_ciphersuite in generic_vhost [puppet] - https://gerrit.wikimedia.org/r/153971
[01:41:15] (CR) Ori.livneh: "@mutante: definitely still desired! the dependent change () needs a friend, too." [dns] - https://gerrit.wikimedia.org/r/133275 (owner: Ori.livneh)
[01:41:16] (CR) Dzahn: "_joe_:" [puppet] - https://gerrit.wikimedia.org/r/153971 (owner: Dzahn)
[01:49:46] (CR) Dzahn: [C: 2] StrictTransportSecurity for lists.wm.org [puppet] - https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[01:50:50] (CR) Dzahn: [V: 2] StrictTransportSecurity for lists.wm.org [puppet] - https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[01:54:18] (CR) Dzahn: "curl -s -D- https://lists.wikimedia.org/mailman/listinfo | grep Strict" [puppet] - https://gerrit.wikimedia.org/r/145500 (https://bugzilla.wikimedia.org/38516) (owner: Dzahn)
[02:05:34] (CR) Catrope: [C: 2] Move Parsoid pointer in Labs extension-list as the repo has been restructured [mediawiki-config] - https://gerrit.wikimedia.org/r/159651 (owner: Jforrester)
[02:05:39] (Merged) jenkins-bot: Move Parsoid pointer in Labs extension-list as the repo has been restructured [mediawiki-config] - https://gerrit.wikimedia.org/r/159651 (owner: Jforrester)
[02:05:41] (PS2) Catrope: Move Parsoid pointer in extension-list now the repo has been restructured [mediawiki-config] - https://gerrit.wikimedia.org/r/159652 (owner:
Jforrester)
[02:06:02] (PS1) Yurik: zerowiki: Removed zeroadmin group, rights adjustments [mediawiki-config] - https://gerrit.wikimedia.org/r/159656
[02:09:11] (CR) Jforrester: "Must be merged and deployed alongside 83ff3f54c." [mediawiki-config] - https://gerrit.wikimedia.org/r/159652 (owner: Jforrester)
[02:10:30] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3615 MB (3% inode=99%):
[02:14:49] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/).
[02:15:13] (PS1) Dzahn: add stats table for sourceforge wikis [debs/wikistats] - https://gerrit.wikimedia.org/r/159661 (https://bugzilla.wikimedia.org/58396)
[02:16:47] greg-g: OK for me to schedule a follow-on deploy after the train tomorrow for Roan to fix up Parsoid? The repo structure changed and we need to move the entry in extension-list and push the new Parsoid cherry-pick at the same time.
[02:23:30] !log LocalisationUpdate completed (1.24wmf15) at 2014-09-11 02:23:29+00:00
[02:23:40] Logged the message, Master
[02:28:55] greg-g: Never mind; Roan and I found that Reedy is wonderful, and all he needs to do is push a config change as part of the train instead.
[02:29:20] (CR) Jforrester: [C: 1] "Actually, this is good to go now; must be done before wmf21 train."
[mediawiki-config] - https://gerrit.wikimedia.org/r/159652 (owner: Jforrester)
[02:32:43] (PS1) Dzahn: add stats table for orain.org wikis [debs/wikistats] - https://gerrit.wikimedia.org/r/159663 (https://bugzilla.wikimedia.org/70309)
[02:35:51] (PS1) Yurik: Zero: fix X-CS tagging for the zero.wp and m.wp redirect pages [puppet] - https://gerrit.wikimedia.org/r/159664
[02:35:59] bblack, when you have a sec ^
[02:36:01] minor fix
[02:36:02] (PS1) Dzahn: wikistats-crons for updates of sourceforge, orain [puppet] - https://gerrit.wikimedia.org/r/159665 (https://bugzilla.wikimedia.org/70309)
[02:36:37] !log LocalisationUpdate completed (1.24wmf19) at 2014-09-11 02:36:37+00:00
[02:36:42] Logged the message, Master
[02:36:50] (CR) Dzahn: [C: 2] add stats table for sourceforge wikis [debs/wikistats] - https://gerrit.wikimedia.org/r/159661 (https://bugzilla.wikimedia.org/58396) (owner: Dzahn)
[02:37:06] (CR) Dzahn: [C: 2] add stats table for orain.org wikis [debs/wikistats] - https://gerrit.wikimedia.org/r/159663 (https://bugzilla.wikimedia.org/70309) (owner: Dzahn)
[02:38:46] (CR) BBlack: [C: 2] Zero: fix X-CS tagging for the zero.wp and m.wp redirect pages [puppet] - https://gerrit.wikimedia.org/r/159664 (owner: Yurik)
[02:38:53] thx )
[02:39:09] np
[02:45:33] (PS1) Dzahn: bump version up to 2.8 [debs/wikistats] - https://gerrit.wikimedia.org/r/159666
[02:46:11] (CR) Dzahn: [C: 2] bump version up to 2.8 [debs/wikistats] - https://gerrit.wikimedia.org/r/159666 (owner: Dzahn)
[02:46:19] (CR) Dzahn: [V: 2] bump version up to 2.8 [debs/wikistats] - https://gerrit.wikimedia.org/r/159666 (owner: Dzahn)
[02:46:51] (CR) Dzahn: [C: 2] wikistats-crons for updates of sourceforge, orain [puppet] - https://gerrit.wikimedia.org/r/159665 (https://bugzilla.wikimedia.org/70309) (owner: Dzahn)
[02:47:08] (PS1) Jforrester: Follow-up I51abd7c: Enable Commons use for wikitech (labswiki)
[mediawiki-config] - https://gerrit.wikimedia.org/r/159667
[02:49:17] (CR) Jforrester: Mark out a bunch of code for wikitech. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/158138 (owner: Andrew Bogott)
[02:49:26] !log LocalisationUpdate completed (1.24wmf20) at 2014-09-11 02:49:26+00:00
[02:49:31] Logged the message, Master
[02:50:06] James_F: https://gerrit.wikimedia.org/r/#/c/158313/ :p
[02:50:16] Bleurgh.
[02:50:27] hehe
[02:50:42] (CR) Jforrester: [C: -1] "The convention is intentionally tabs, not spaces. This is definitely the wrong way to go…" [mediawiki-config] - https://gerrit.wikimedia.org/r/158313 (owner: Dzahn)
[02:54:32] (CR) Catrope: [C: -1] "We use tabs for all PHP code in the MediaWiki ecosystem, this repo shouldn't be different." [mediawiki-config] - https://gerrit.wikimedia.org/r/158313 (owner: Dzahn)
[03:00:19] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:25:22] (PS1) Dzahn: orain actually has a /w/ style URL, the default [debs/wikistats] - https://gerrit.wikimedia.org/r/159668
[03:25:57] (CR) Dzahn: [C: 2] orain actually has a /w/ style URL, the default [debs/wikistats] - https://gerrit.wikimedia.org/r/159668 (owner: Dzahn)
[03:26:03] (CR) Dzahn: [V: 2] orain actually has a /w/ style URL, the default [debs/wikistats] - https://gerrit.wikimedia.org/r/159668 (owner: Dzahn)
[03:37:25] (CR) Ori.livneh: "@mutante: yes, that should work."
[puppet] - https://gerrit.wikimedia.org/r/153971 (owner: Dzahn)
[03:41:06] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 11 03:41:03 UTC 2014 (duration 41m 2s)
[03:41:13] Logged the message, Master
[04:24:18] (CR) BryanDavis: "The problem that led to introducing this flag was that virt1001 where labswiki/wikitech runs cannot communicate with the production databa" [mediawiki-config] - https://gerrit.wikimedia.org/r/159667 (owner: Jforrester)
[05:08:57] (CR) Chmarkine: "Note that now the connection to http://ishmael.wikimedia.org no longer redirects to HTTPS." [puppet] - https://gerrit.wikimedia.org/r/154969 (owner: Dzahn)
[06:15:53] (PS1) Ori.livneh: mediawiki::sync: re-declare deployment paths [puppet] - https://gerrit.wikimedia.org/r/159674
[06:28:11] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:42] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:01] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:02] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:11] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:05] it's mod_passenger o'clock
[06:39:35] <_joe_> yes
[06:39:38] <_joe_> 8:30 here
[06:39:55] <_joe_> so 23:30 PT I guess
[06:40:24] <_joe_> or PDT, more probably
[06:40:51] RECOVERY - Disk space on ms1004 is OK: DISK OK
[06:43:51] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=94%): /var/lib/ureadahead/debugfs 0 MB (0% inode=94%):
[06:45:22] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:45:52] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:46:21] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last
run 26 seconds ago with 0 failures
[06:46:21] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:46:22] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[07:01:59] _joe_: i merged tim's patch, so luasandbox should be good to go
[07:02:09] <_joe_> ok
[07:02:23] i have a good feeling about this
[07:02:29] <_joe_> I'll package it and I'll deploy it on hhvm everywhere
[07:02:48] sweet. yeah, it's safe
[07:02:51] <_joe_> Do I have a way to confirm it works?
[07:02:57] the behavior it's replacing is totally broken
[07:02:59] sure, yeah, just a sec
[07:03:01] <_joe_> I'd like to restart the JR in case
[07:09:04] _joe_: ok so, osmium:/root/bug70177.py . evidence of bug:
[07:09:13] (Usage: bug70177.py HOST NUM_REQS, btw)
[07:09:23] <_joe_> ok
[07:10:11] # python bug70177.py mw1019 1
[07:10:11] limitreport-cputime: 7.927
[07:10:11] limitreport-walltime: 7.972
[07:10:13] scribunto-limitreport-timeusage: 1.962
[07:10:15] ---
[07:10:24] but with three requests:
[07:10:58] https://dpaste.de/dJ8W/raw
[07:12:10] <_joe_> on the JR, do you remember which errors woud appear in the logs?
[07:12:17] <_joe_> it may be in the bug
[07:12:20] note how scribunto-limitreport-timeusage balloons to >5 secs
[07:12:40] but three reqs against a zend host: https://dpaste.de/qLBt/raw
[07:12:58] <_joe_> heh
[07:13:38] don't pay attention to cputime and walltime; they'll be busted on prod until https://gerrit.wikimedia.org/r/#/c/158550/ rolls out
[07:14:51] _joe_: the errors weren't in the logs, because HHVM wouldn't fatal or throw an exception; as far as it was concerned it was doings its job enforcing the lua time limit on misbehaving scripts.
[07:15:04] <_joe_> oh my
[07:15:27] we saw it on the job runner because there's actually some parallelism there
[07:15:30] <_joe_> it wouldn't even print a notice somewhere?
[07:15:42] the script error is part of the page output
[07:16:11] e.g.: https://test.wikipedia.org/wiki/BusyLoop
[07:16:22] (if you click on 'Script Error' you'll get the details)
[07:16:27] <_joe_> yes I've seen it
[07:16:43] <_joe_> Ihoped something was in the logs as well
[07:20:40] well, there's https://test.wikipedia.org/w/index.php?title=Category:Pages_with_script_errors
[07:20:48] which is automatically populated/updated
[07:21:00] does that count? :)
[07:22:10] <_joe_> eheh kind of
[07:22:12] <_joe_> :)
[07:26:13] <_joe_> ok thanks - I think we can 'announce' the worst-kept secret of the world to engineering@ and in other places once I'm done with this
[07:30:41] PROBLEM - DPKG on rubidium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:32:22] PROBLEM - DPKG on mexia is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:32:41] RECOVERY - DPKG on rubidium is OK: All packages OK
[07:33:18] ^ that's me
[07:33:22] RECOVERY - DPKG on mexia is OK: All packages OK
[07:33:34] apparently long-running apt-get upgrade -> spam irc
[07:33:43] new gdnsd?
[07:33:51] that too, yeah
[07:34:10] I built some one off 1.11.5~precise1 and ~trusty1 for our nsX and installed them
[07:34:30] .5 being the pure fix?
[07:34:41] yeah, and misc other little things
[07:35:05] ok, I'll upload into Debian too then
[07:35:32] PROBLEM - DPKG on eeden is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:35:42] thanks :)
[07:36:26] today I finally figured out how to use git-pbuilder to do chroot builds for multiple distros
[07:36:35] RECOVERY - DPKG on eeden is OK: All packages OK
[07:36:36] it wasn't nearly as difficult as I thought it would be :)
[07:36:59] <_joe_> cool
[07:38:08] and it only took about 800M disk space on my labs instance to do both precise and trusty
[07:38:38] the basic idea is to do "DIST=trusty ARCH=amd64 git-pbuilder create" to set up the environment
[07:39:01] and then in your git checkout of debian branch, e.g.
"git-buildpackage --git-pbuilder --git-dist=trusty --git-arch=amd64 -us -uc"
[08:04:00] (CR) Filippo Giunchedi: [C: 1] Remove MW_DBLISTS config vars [puppet] - https://gerrit.wikimedia.org/r/159645 (owner: Ori.livneh)
[08:18:48] (CR) Alexandros Kosiaris: [C: 2] "Ran it through compiler, noop" [puppet] - https://gerrit.wikimedia.org/r/159460 (owner: Matanya)
[08:34:24] <_joe_> !log updating php-pear php5 php5-cli php5-common php5-curl php5-dev php5-intl php5-mysql php5-xmlrpc libapache2-mod-php5 on mw1018, see USN 2344-1
[08:34:29] Logged the message, Master
[08:35:25] <_joe_> akosiaris: mmm we need to rebuild our packages
[08:35:39] <_joe_> (I didn't remmember we make our own php packages)
[08:36:00] yeah we do.. due to 2 patches we incorporated for zend memory cleanup
[08:36:09] somehow our test suite crashes php
[08:36:15] <_joe_> akosiaris: ok, what's the project?
[08:36:31] ?
[08:36:37] I am almost done building them
[08:36:40] <_joe_> in gerrit, I mean
[08:36:43] <_joe_> oh ok
[08:36:46] <_joe_> d'oh
[08:36:49] will put them in carbon in a few
[08:36:54] <_joe_> ok
[08:45:50] apergos: is there anyone I should poke for https://rt.wikimedia.org/Ticket/Display.html?id=8253 ?
[09:00:34] !log upgrading php5 to 5.3.10-1ubuntu3.14+wmf1 on mw1212
[09:00:39] Logged the message, Master
[09:01:01] _joe_: ^ let's see what happens
[09:02:06] <_joe_> akosiaris: ok, I'm actually working on testwiki to see if the luasandbox update works
[09:02:12] PROBLEM - DPKG on mw1212 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:04:03] <_joe_> \o/
[09:04:10] <_joe_> it _works_
[09:04:12] RECOVERY - DPKG on mw1212 is OK: All packages OK
[09:21:16] <_joe_> !log upgrading hhvm and hhvm-luasandbox across the production cluster
[09:21:21] Logged the message, Master
[09:32:44] akosiaris: i looked at the admin module, it calls validate_ensure which comes from modules/wmflib/lib/puppet/parser/functions/validate_ensure.rb but i can't see how the that function is called i took it into a VM i have. and puppet fails for not finding the .rb although i added it. it is in line 56 of user.pp i get: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Unknown function validate_ensure at
[09:32:44] /etc/puppet/modules/admin/manifests/user.pp:56
[09:33:09] can you please educate me ? :)
[09:36:07] I need an op to graceful apache on mw1200 mw1196 and mw1186
[09:36:30] _joe_: ^
[09:36:51] <_joe_> hoo: why is it so?
[09:37:00] _joe_: APC mess up again
[09:37:15] no idea why we have that so frequently now, but we do
[09:37:41] <_joe_> can I take a look at APC stats before gracefulling them
[09:37:42] <_joe_> ?
[09:37:50] sure
[09:37:52] <_joe_> we may be able to set up some alert maybe'
[09:38:01] but they're currently flooding the fatal logs
[09:38:07] <_joe_> ok
[09:38:14] <_joe_> so better make it quick
[09:38:27] <_joe_> (or, we set a metric on the number of fatals)
[09:38:51] matanya: not sure if I understand correctly but validate_ensure is defined in the wmflib module.
That means that the puppetmaster needs to have the wmflib module in the puppet modules directory to be able to access it
[09:38:55] <_joe_> !log gracefulling mw1200 mw1196 and mw1186 as they have APC issues
[09:39:00] Logged the message, Master
[09:39:17] matanya: after that is done, it is a "global" function. it can be used by any module
[09:39:27] does this answer your question ?
[09:39:34] akosiaris: my question is basically how puppet know to find it in that module ?
[09:39:49] "that" being ?
[09:39:56] wmflib
[09:40:35] <_joe_> matanya: include wmflib
[09:40:47] <_joe_> like you do with stdlib
[09:41:11] ah, it parses modules//lib/puppet/parser/functions
[09:41:27] _joe_: no, he speaks about the functions
[09:41:33] <_joe_> hoo: done, fatal logs don't show anything more
[09:41:43] <_joe_> akosiaris: include ;)
[09:41:49] so if i put the function in modules/admin/lib it should work as well, and it doesn't
[09:42:08] <_joe_> akosiaris: once you include the module, all its functions are available to the puppet master when compiling the manifest
[09:42:15] _joe_: Any ideas about the root cause?
[09:42:27] matanya: you should read carefully https://docs.puppetlabs.com/guides/custom_functions.html
[09:42:28] <_joe_> hoo: no, let me look at the metrics
[09:42:38] _joe_: include where ?
[09:42:39] thank you akosiaris
[09:42:57] <_joe_> akosiaris: if you want to use wmflib functions in another module
[09:43:03] <_joe_> you have to include it
[09:43:05] <_joe_> AFAIR
[09:43:16] <_joe_> but this may have changed across puppet versions
[09:43:55] I don't think so.. I 've never included a module for its functions
[09:44:02] and they are always available to me
[09:44:21] matanya: Put custom functions in the lib/puppet/parser/functions subdirectory of your module as it says in that doc
[09:44:38] it is there :/
[09:44:56] <_joe_> matanya: what's the name of the function file, and what's the content?
[09:44:57] matanya: be careful with custom functions...
they have a tendency to not really help
[09:46:00] <_joe_> hoo: I still have no idea about what caused the apc failure
[09:46:16] <_joe_> but I suspect it happened more often in the last couple of days, right?
[09:46:18] :S
[09:46:23] Yep
[09:46:36] <_joe_> that's because we gracefully restarted apache
[09:46:46] <_joe_> thus cleaning the apc cache after a long time
[09:46:53] we had that before, but never on such a wide scale
[09:46:55] <_joe_> and well, we _fixed_ something
[09:47:21] i;m checking something a sec
[09:47:29] <_joe_> http://graphite.wikimedia.org/render/?width=586&height=308&_salt=1410428733.953&target=servers.mw1200.mw_apcCollector.cache_frag_pcnt.value&from=-240hours
[09:47:52] <_joe_> hoo: the apc cache was completely fragmented, thus very inefficient, for quite some time
[09:48:24] ok, _joe_ akosiaris Thanks! the issue was file permission on that function
[09:48:37] <_joe_> and apc was basically not doing GC in that condition
[09:48:57] i copied it into the admin module from wmflib module without chowning
[09:50:09] _joe_: mh... how can that be prevented? Auto graceful on high fragmention (that seems dirty)
[09:50:18] <_joe_> hoo: eh
[09:50:39] <_joe_> hoo: btw, GC seems to have had not-so-positive effects on cache miss ratios
[09:52:58] <_joe_> I guess our solution is... hhvm
[09:53:12] someone rang?
[09:53:52] (PS3) Alexandros Kosiaris: Purge the amanda-server packages/configurations [puppet] - https://gerrit.wikimedia.org/r/159283
[09:53:55] <_joe_> uh, no, not really
[09:54:07] <_joe_> kill -SLEEP ori
[09:54:11] heh
[09:54:33] <_joe_> ori: if you're unable to sleep, I'm about to re-activate the jobrunner on mw1053
[09:54:36] _joe_: https://gerrit.wikimedia.org/r/159636
[09:54:51] _joe_: \o/
[09:57:22] the profiler timing data on labs is correct too, which means the php side of things works as well
[09:58:08] <_joe_> ori: did you update beta?
[09:58:25] not luasandbox
[09:58:36] but it's not dependent on that; it's the getrusage thing
[09:58:43] which we already upgraded beta for, iirc
[09:58:47] i can upgrade it now
[09:58:53] (PS4) Alexandros Kosiaris: Purge the amanda-server packages/configurations [puppet] - https://gerrit.wikimedia.org/r/159283
[09:59:03] since prod is looking stable, we can move forward to beta
[09:59:28] <_joe_> :P
[09:59:34] (CR) jenkins-bot: [V: -1] Purge the amanda-server packages/configurations [puppet] - https://gerrit.wikimedia.org/r/159283 (owner: Alexandros Kosiaris)
[09:59:52] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet last ran 639454 seconds ago, expected 14400
[10:00:04] <_joe_> !log enabled puppet on mw1053
[10:00:09] Logged the message, Master
[10:00:31] RECOVERY - Puppet freshness on mw1053 is OK: puppet ran at Thu Sep 11 10:00:27 UTC 2014
[10:01:54] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[10:02:15] <_joe_> ori: 2014-09-11T10:01:13+0000: Initialized loop 0 with 20 runner(s)
[10:02:37] <_joe_> oh it's 9/11 I didn't even notice
[10:02:57] <_joe_> which middle-east country will the US bomb this year for commemoration?
[10:03:54] <_joe_> (joking aside, it's really sad that NATO decides to go to war exactly on the anniversary of a tragic terrorist attack)
[10:09:05] i think it's the first time that there are no known issues with HHVM
[10:09:13] i'm sure it'll be over in 90 seconds when new bugs come in
[10:09:31] but i feel pretty happy about it right now
[10:09:54] quick, shut down bugzilla! :P
[10:10:41] <_joe_> done
[10:11:03] * ori high-fives _joe_
[10:11:20] <_joe_> :)
[10:12:52] <_joe_> hhvm _is_ impressive btw
[10:13:48] well, you know, i wrote it myself
[10:13:55] it wasn't easy!
[10:16:20] * ori should really sleep
[10:16:33] but not before making alexandros "i hate custom functions" kosiaris sad
[10:16:47] (PS1) Ori.livneh: wmflib: add to_milliseconds() / to_seconds() [puppet] - https://gerrit.wikimedia.org/r/159692
[10:17:00] nice and frivolous
[10:17:43] <_joe_> lol
[10:18:02] <_joe_> I hate custom functions as well if they're not needed
[10:19:45] dismiss_objections()
[10:19:58] bye! _joe_ thanks for the packages
[10:23:06] (PS5) Alexandros Kosiaris: Purge the amanda-server packages/configurations [puppet] - https://gerrit.wikimedia.org/r/159283
[10:25:11] (PS1) Yuvipanda: labmon: Use a common prefix for betalabs icinga checks [puppet] - https://gerrit.wikimedia.org/r/159693
[10:25:13] (PS1) Yuvipanda: labmon: Add low space check for / on betalabs [puppet] - https://gerrit.wikimedia.org/r/159694 (https://bugzilla.wikimedia.org/70141)
[10:26:24] ori: btw, I added some python code to check_graphite in https://gerrit.wikimedia.org/r/#/c/159473/ (since merged). Critique / CR welcome
[10:26:29] (whenever you have the time)
[10:27:01] you should really sleep as well, yea
[10:32:24] (CR) Alexandros Kosiaris: [C: 2] Purge the amanda-server packages/configurations [puppet] - https://gerrit.wikimedia.org/r/159283 (owner: Alexandros Kosiaris)
[10:34:46] godog: I guess that's a mark approval thing (and he can give access too)
[10:35:09] sorry, I was off in a 'clean up tridge' window... it's been real exciting. for bad vallues of exciting
[10:37:24] apergos: amanda directory is about to go away. I saw you did some great job btw on /data on tridge
[10:38:41] I'm still working on it :-D
[10:39:12] jsut found an old 'here's the new X password, plese delete when you're done' notice to someone... from 6 years ago. I bet they deleted it after a few hours and
[10:39:19] unfortunately, after the rsync :-D
[10:39:36] apergos: ack, thanks! anyone else I could poke?
I'm sure mark is busy already :|
[10:40:20] godog, for approval, I'm not sure, I think akosiaris also may be able to give net device access
[10:40:20] and surely para void
[10:40:20] (who is formally not here but still)
[10:40:35] context ?
[10:40:58] I only know what's on the ticket https://rt.wikimedia.org/Ticket/Display.html?id=8253 which ain't much
[10:41:12] akosiaris: network device access for me, https://rt.wikimedia.org/Ticket/Display.html?id=8253
[10:41:20] yeah I can
[10:41:22] I 'll do it
[10:41:33] it's definitely well past 3 days and yer ops so...
[10:41:45] thanks, a kosiaris
[10:41:51] thanks! showing seniority already akosiaris
[10:42:07] lol
[10:42:17] <_joe_> godog: I guess how many white hair he got with that
[10:42:32] if that is a degree of seniority I am doomed...
[10:42:55] I am not getting white hair.. I am actually getting no hair at all...
[10:43:00] _joe_: worst case I can send some, my beard is turning white/grey
[10:43:06] <_joe_> I just discovered my first completely white hair in my beard
[10:43:48] akosiaris: ah the transparent hair! same boat
[10:44:01] * YuviPanda probably will never get white hair
[10:44:09] at the rate I'm losing hair, that is
[10:44:38] i'm making a command decsion and tossing these stafford backups from 2011. no one has ever asked for them and why would they
[10:44:41] (on tridge)
[10:45:01] YuviPanda: same boat man...
[10:45:06] apergos: please do
[10:45:20] heh
[10:45:35] * YuviPanda should cut it all off at some point, when he still has the choice
[10:46:09] isidore from 2010 also going, we have new incarnations of everything that's in there
[10:48:12] killing project2... 2011
[10:49:45] anybody been missing formey recently?
because if not, that dir's going as well
[10:50:04] ldap was the last thing to move off of there I think and I have ever heard a complaint about it
[10:50:53] not ldap, more like ldap-client
[10:51:08] used to modify the ldap due to its /etc/ldap/ldap.conf IIRC
[10:51:31] but I have modified the modify-ldap* scripts and they no longer rely on it
[10:51:32] yes, the client
[10:51:40] oh that and having pam_ldap installed
[10:51:47] yikes that was ugly
[10:51:58] eww
[10:51:59] I was happy to drop that dependency
[10:52:01] (PS1) Filippo Giunchedi: swift: fix swift-labs-ring unbound variable [puppet] - https://gerrit.wikimedia.org/r/159699
[10:52:18] well perhaps you want to do the honors of delting the vestiges on tridge (if not, I'll jut hit 'enter' :-D)
[10:53:19] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: fix swift-labs-ring unbound variable [puppet] - https://gerrit.wikimedia.org/r/159699 (owner: Filippo Giunchedi)
[10:54:16] 5...
[10:54:17] 4...
[10:54:44] 3..2..1 (impatient) done :-P
[10:57:54] sud-thingie is going, 2008 copies of wp deployment dir plus cofs plus old mysql root paswords etc, not loving it
[10:59:03] who's touched otrs in the last year? I need to ask them about old data I guess
[10:59:07] (PS2) Giuseppe Lavagetto: pybal: serve the virtualhost with pybal lb files with a dedicated vhost [puppet] - https://gerrit.wikimedia.org/r/159495
[11:04:41] apergos: JeffGreen I think
[11:04:47] (I dunno if I got his name right)
[11:05:04] ah sweet, I'll check with him later then, thanks!
[11:06:02] :) [11:07:42] ragweed's been out of service for a couple years now, we don't need those backups either [11:09:57] (PS1) Yuvipanda: labmon: Add puppet freshness check for betalabs [puppet] - https://gerrit.wikimedia.org/r/159701 (https://bugzilla.wikimedia.org/70141) [11:17:11] RECOVERY - Disk space on dataset1001 is OK: DISK OK [11:19:51] (PS1) Giuseppe Lavagetto: dns: add pybal-config CNAME [dns] - https://gerrit.wikimedia.org/r/159702 [11:22:17] folks were talking about wikitech earlier in this channel iirc [11:22:29] some sort of update? [11:23:24] apergos: wikitech is now deployed with scap, etc, rather than by hand [11:24:05] great, which means I can toss the old copy from feb 2013 [11:31:11] (PS1) Filippo Giunchedi: swift: remove ganglia stats via ganglia-logtailer [puppet] - https://gerrit.wikimedia.org/r/159705 [12:13:02] (PS1) Yuvipanda: icinga: Simplify check_graphite series threshold messages [puppet] - https://gerrit.wikimedia.org/r/159708 [12:15:38] anyone around to merge a bunch of mostly trivial puppet patches? [12:22:09] * YuviPanda pokes Coren [12:22:14] wanna merge some monitoring patches?
:) [12:22:36] nothing complex, just adding more monitors [12:22:39] (for betalabs) [12:24:09] (PS36) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [puppet] - https://gerrit.wikimedia.org/r/155753 [12:29:31] (Abandoned) Giuseppe Lavagetto: dns: add pybal-config CNAME [dns] - https://gerrit.wikimedia.org/r/159702 (owner: Giuseppe Lavagetto) [12:32:35] (PS1) Yuvipanda: labmon: Add basic monitoring for toollabs [puppet] - https://gerrit.wikimedia.org/r/159709 [12:38:49] (PS1) Yuvipanda: icinga: Minor formatting fix for check_graphite [puppet] - https://gerrit.wikimedia.org/r/159711 [12:41:02] (PS1) Giuseppe Lavagetto: dns: add pybal-config CNAME [dns] - https://gerrit.wikimedia.org/r/159712 [12:41:54] (CR) Giuseppe Lavagetto: [C: 2] icinga: Minor formatting fix for check_graphite [puppet] - https://gerrit.wikimedia.org/r/159711 (owner: Yuvipanda) [12:42:31] _joe_: \o/ it has a ton of dependencies, tho. I should probably re-arrange that series [12:42:41] <_joe_> lol ok [12:42:51] _joe_: let me re-arrange the truly trivial ones [12:43:55] (PS2) Yuvipanda: icinga: Simplify check_graphite series threshold messages [puppet] - https://gerrit.wikimedia.org/r/159708 [12:44:11] (PS2) Yuvipanda: labmon: Add basic monitoring for toollabs [puppet] - https://gerrit.wikimedia.org/r/159709 [12:44:27] (PS2) Yuvipanda: icinga: Minor formatting fix for check_graphite [puppet] - https://gerrit.wikimedia.org/r/159711 [12:44:57] _joe_: trivial: https://gerrit.wikimedia.org/r/#/c/159693/ https://gerrit.wikimedia.org/r/#/c/159708/ and https://gerrit.wikimedia.org/r/#/c/159711/ [12:45:10] <_joe_> YuviPanda: gimme one min [12:45:14] _joe_: cool [12:48:13] (CR) Giuseppe Lavagetto: [C: 2] icinga: Minor formatting fix for check_graphite [puppet] - https://gerrit.wikimedia.org/r/159711 (owner: Yuvipanda) [12:49:05] (PS3) Giuseppe Lavagetto: icinga: Simplify check_graphite series threshold
messages [puppet] - https://gerrit.wikimedia.org/r/159708 (owner: Yuvipanda) [12:49:15] (CR) Giuseppe Lavagetto: [C: 2] icinga: Simplify check_graphite series threshold messages [puppet] - https://gerrit.wikimedia.org/r/159708 (owner: Yuvipanda) [12:49:43] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Epic puppet fail [12:55:49] (PS1) Yuvipanda: icinga: Clarify boolean check for _total key [puppet] - https://gerrit.wikimedia.org/r/159714 [12:57:37] (CR) Giuseppe Lavagetto: [C: 2] dns: add pybal-config CNAME [dns] - https://gerrit.wikimedia.org/r/159712 (owner: Giuseppe Lavagetto) [12:59:20] (PS1) Yuvipanda: icinga: Consistently use single quotes in check_graphite [puppet] - https://gerrit.wikimedia.org/r/159715 [13:02:40] (PS1) Alexandros Kosiaris: Ensure cron absent for misc::nfs-server::home::backup [puppet] - https://gerrit.wikimedia.org/r/159716 [13:04:03] !log uploaded php5_5.3.10-1ubuntu3.14+wmf1 on apt.wikimedia.org [13:04:09] Logged the message, Master [13:10:04] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [13:14:08] (PS3) Giuseppe Lavagetto: pybal: serve the virtualhost with pybal lb files with a dedicated vhost [puppet] - https://gerrit.wikimedia.org/r/159495 [13:14:46]