[00:00:05] RoanKattouw, ^d, ebernhardson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T0000). Please do the needful. [00:00:08] pong [00:00:11] pong [00:00:13] pong [00:00:29] <^d> MatmaRex: Started yours merging [00:00:38] <^d> mutante: Yours is first, easiest [00:00:47] (03CR) 10Chad: [C: 032] noc/dbtree: remove the entire dbtree directory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [00:01:01] :) and note it was already broken before, so no worries [00:01:20] actually, last mwdeploy reverted to old dbtree [00:01:34] pong [00:03:23] springle: hello, i merged the dbtree things, puppet clones and deploys the config.php [00:03:28] (03CR) 10Chad: [C: 032] Enable CORS support logging on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182767 (https://phabricator.wikimedia.org/T507) (owner: 10Gergő Tisza) [00:03:45] springle: but maybe the "tendril_web" is limited to connect from neon? [00:07:01] (03Merged) 10jenkins-bot: noc/dbtree: remove the entire dbtree directory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [00:07:38] !log demon Synchronized docroot/noc/: rm dbtree (duration: 00m 06s) [00:07:40] <^d> mutante: Done yours [00:07:44] Logged the message, Master [00:08:13] ^d: thanks, that should have only done things on terbium [00:09:30] !log demon Synchronized php-1.25wmf19/includes/Linker.php: (no message) (duration: 00m 05s) [00:09:32] <^d> MatmaRex: ^^ [00:09:33] Logged the message, Master [00:09:53] ^d: verified, thanks [00:09:59] good night! [00:10:28] well, let's say terbium is the only place we use docroot/noc and in all other places it never did anything. 
so saves a bit of time and space [00:10:59] !log demon Synchronized php-1.25wmf18/extensions/VisualEditor: (no message) (duration: 00m 05s) [00:11:02] Logged the message, Master [00:11:10] !log demon Synchronized php-1.25wmf19/extensions/VisualEditor: (no message) (duration: 00m 06s) [00:11:13] Logged the message, Master [00:11:14] <^d> Krenair: Done ^^ [00:11:28] checking [00:12:45] yeah looks good [00:12:52] thanks ^d [00:13:06] <^d> yw [00:13:45] (03PS1) 10BBlack: fix sysctl application on jessie [puppet] - 10https://gerrit.wikimedia.org/r/194000 [00:14:35] (03PS2) 10BBlack: fix sysctl application on jessie [puppet] - 10https://gerrit.wikimedia.org/r/194000 [00:14:53] (03Merged) 10jenkins-bot: Enable CORS support logging on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182767 (https://phabricator.wikimedia.org/T507) (owner: 10Gergő Tisza) [00:15:27] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 2 failures [00:15:48] !log demon Synchronized wmf-config/: cors logging for beta (duration: 00m 06s) [00:15:50] Logged the message, Master [00:15:51] <^d> tgr: Done ^^ [00:16:00] <^d> ebernhardson: Almost done with yours, just doing submodule update [00:16:01] the puppet fail on terbium is me, handling it [00:17:39] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:18:31] ^d, Notice: Undefined variable: wmgImageMetricsCorsSamplingFactor in /srv/mediawiki/wmf-config/CommonSettings.php on line 1920 [00:19:15] <^d> Hmm? [00:19:16] <^d> Weird. [00:19:22] did I make a typo there: [00:19:25] ? [00:19:33] <^d> I see it defined in InitialiseSettings [00:19:50] transitional crap? [00:20:47] the patch looks good, so probably transitional, yes [00:21:00] do you still see the notice? 
[00:21:58] <^d> Not for the last minute
[00:22:03] (PS3) Ori.livneh: fix sysctl application on jessie [puppet] - https://gerrit.wikimedia.org/r/194000 (owner: BBlack)
[00:23:11] !log demon Synchronized php-1.25wmf18/extensions/Flow: (no message) (duration: 00m 07s)
[00:23:14] Logged the message, Master
[00:23:15] <^d> ebernhardson: ^^ done
[00:23:21] ^d: thanks, checking
[00:24:45] ^d: all looks in order, thanks
[00:25:32] <^d> yw
[00:25:51] and dbtree is the new version now, fwiw
[00:26:06] (CR) BBlack: [V: 2] fix sysctl application on jessie [puppet] - https://gerrit.wikimedia.org/r/194000 (owner: BBlack)
[00:26:12] Reedy: ^ it's gone from mw-config and all that
[00:26:13] (CR) BBlack: [C: 2] fix sysctl application on jessie [puppet] - https://gerrit.wikimedia.org/r/194000 (owner: BBlack)
[00:36:57] What's the process for requesting SSH access to elastic10nm on wmnet?
[00:37:21] https://wikitech.wikimedia.org/wiki/Requesting_shell_access
[00:37:37] earldouglas: ^ all pretty well documented there
[00:37:52] Thanks
[00:38:03] welcome, fyi there is also the generic landing help page for ops
[00:38:04] https://wikitech.wikimedia.org/wiki/Operations_requests
[00:38:39] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[00:40:35] ^d: works, thanks
[00:40:52] apparently it takes a while for site config changes to propagate through ResourceLoaderGetConfigVars
[00:41:01] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1080595 (Dzahn) Patches have been merged, the mw-config patch has been deployed in SWAT. confirmed on terbium. - dbtree is now in a separate repo in operations/software/dbtre...
[00:41:18] 6operations, 5Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1080599 (10Dzahn) 5Open>3Resolved [00:41:54] 10Ops-Access-Requests, 6operations: Requesting access to elastic10XY for jdouglas - https://phabricator.wikimedia.org/T91348#1080607 (10Jdouglas) 3NEW [00:41:58] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [00:46:38] (03PS1) 10Dzahn: add service name for tendril backend db [dns] - 10https://gerrit.wikimedia.org/r/194005 [00:49:33] 10Ops-Access-Requests, 6operations: Requesting access to elastic10XY for jdouglas - https://phabricator.wikimedia.org/T91348#1080635 (10Jdouglas) [00:54:23] (03PS1) 10Dzahn: dbtree: use service alias instead of server name [puppet] - 10https://gerrit.wikimedia.org/r/194010 [00:55:14] 10Ops-Access-Requests, 6operations: Requesting access to elastic10XY for jdouglas - https://phabricator.wikimedia.org/T91348#1080652 (10Jdouglas) Looks like I already have the necessary access to Bastion. [00:55:25] 10Ops-Access-Requests, 6operations: Requesting access to elastic10XY for jdouglas - https://phabricator.wikimedia.org/T91348#1080653 (10Jdouglas) 5Open>3Resolved [00:55:49] <^d> earldouglas: Whatcha need from elastic* boxen? [00:56:03] Just to run a few queries against them. [00:56:07] i.e. HTTP GET to :9200 [00:56:10] <^d> You don't need shell access to them [00:56:23] Yep, max helped me figure that out. :] [00:56:48] So I'm good to go. [00:57:07] <^d> Okie dokie [01:00:04] AndyRussG, ejegg: Dear anthropoid, the time has come. Please deploy CentralNotice (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T0100). [01:21:04] 6operations: Remove clevel@lists.wikimedia.org - https://phabricator.wikimedia.org/T91323#1080692 (10Dzahn) I unsubscribed all members and then deleted the list with "./rmlist clevel" in /var/lib/mailman/bin/" on sodium. 
There were no list archives so nothing needed to be kept. [01:21:46] 6operations: Remove clevel@lists.wikimedia.org - https://phabricator.wikimedia.org/T91323#1080697 (10Dzahn) 5Open>3Resolved "No such list clevel" [01:37:47] (03CR) 10Springle: "@Manybubbles, I guess you found courage and went ahead with the deploy ;-) Perfectly fine for new tables. I only really need to be a bottl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: 10Glaisher) [01:46:50] springle, does it make sense to create geo_tags table everywhere to facilitate https://gerrit.wikimedia.org/r/#/c/156450/ (note that it's already being added to all new wikis indiscriminately)? [01:47:31] MaxSem: probably does, yep [01:47:45] thanks:) [01:56:36] springle, ok - preparing to create them but even though I can whack IF NOT EXISTS on CREATE TABLE, there's no analogs for CREATE INDEX. is that ok for it to spam error logs on a few wikis where these tables are already present? [01:57:46] MaxSem: eww. Why not supply indexes within the CREATE TABLE ? [01:58:39] I know why we require CREATE INDEX generally, but if you're already whacking IF NOT EXISTS in... :) [01:59:00] yeh, I guess I'll just hack it some more:P [01:59:13] MaxSem: that said, spamming error logs won't break stuff [01:59:14] heh [02:00:59] springle, https://gist.github.com/MaxSem/e81bfbf99a4e94ba315f ok? 
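(Editor's sketch, not part of the log.) The suggestion above, to declare the indexes inside the CREATE TABLE rather than in separate CREATE INDEX statements, can be illustrated as follows. This is a minimal sketch that assumes MySQL-style inline KEY clauses; the helper name and the table, column and index names are hypothetical placeholders, not the real geo_tags schema and not the contents of the linked gist.

```python
def create_table_sql(table, columns, indexes):
    """Build one MySQL-style CREATE TABLE IF NOT EXISTS statement.

    columns: list of (name, type) pairs.
    indexes: list of (index_name, [column, ...]) pairs, emitted inline as KEY
             clauses so that no separate CREATE INDEX statement is needed.
    """
    parts = [f"  {name} {ctype}" for name, ctype in columns]
    parts += [f"  KEY {name} ({', '.join(cols)})" for name, cols in indexes]
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(parts) + "\n);"


# Hypothetical schema, for illustration only.
sql = create_table_sql(
    "example_tags",
    [("et_id", "INT UNSIGNED NOT NULL"), ("et_page_id", "INT UNSIGNED NOT NULL")],
    [("et_page", ["et_page_id"])],
)
print(sql)
```

Because the whole definition is a single guarded statement, running it on a wiki where the table already exists should do nothing, instead of logging an error for each index that is already there.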
[02:01:47] MaxSem: looks good
[02:01:56] wee, thanks
[02:03:22] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 02s)
[02:03:27] Logged the message, Master
[02:04:29] !log LocalisationUpdate completed (1.25wmf18) at 2015-03-03 02:03:26+00:00
[02:04:33] Logged the message, Master
[02:04:53] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:04:55] Logged the message, Master
[02:06:00] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-03 02:04:56+00:00
[02:06:04] Logged the message, Master
[02:06:31] !log Creating geo_tags table everywhere it's not yet present
[02:06:34] Logged the message, Master
[02:17:50] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Mar 3 02:16:47 UTC 2015 (duration 16m 46s)
[02:17:55] Logged the message, Master
[02:18:51] !log andyrussg Synchronized php-1.25wmf19/extensions/CentralNotice/: CenralNotice update (duration: 00m 09s)
[02:18:54] Logged the message, Master
[02:20:35] (CR) MaxSem: "Yes, except for labswiki to which I don't have access." [dumps] (ariel) - https://gerrit.wikimedia.org/r/156450 (https://bugzilla.wikimedia.org/51225) (owner: MaxSem)
[02:29:22] !log andyrussg Synchronized php-1.25wmf18/extensions/CentralNotice/: CenralNotice update (duration: 00m 07s)
[02:29:26] Logged the message, Master
[02:38:48] PROBLEM - HHVM queue size on mw1087 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [80.0]
[02:38:58] PROBLEM - HHVM busy threads on mw1087 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [86.4]
[03:31:40] !log codfw replag coming up, schema changes
[03:31:46] Logged the message, Master
[04:18:18] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[04:21:48] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[05:05:14] (PS1) Tim Starling: Remove labs sshd_banner [puppet] - https://gerrit.wikimedia.org/r/194048
[05:12:52] (PS2) Tim Starling: Don't make script files writable by unprivileged users [puppet] - https://gerrit.wikimedia.org/r/193776
[05:13:58] (CR) Tim Starling: [C: 2] Don't make script files writable by unprivileged users [puppet] - https://gerrit.wikimedia.org/r/193776 (owner: Tim Starling)
[05:14:31] (PS3) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665
[05:16:27] (CR) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc (2 comments) [puppet] - https://gerrit.wikimedia.org/r/193665 (owner: BryanDavis)
[05:19:17] (PS4) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665
[05:26:08] (PS5) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665
[05:44:44] (CR) Tim Landscheidt: [C: 1] "Yes. In the past, I had to plea for bastion3 to not use that banner, but if there is consensus that it should go altogether, perfect." [puppet] - https://gerrit.wikimedia.org/r/194048 (owner: Tim Starling)
[05:49:46] (CR) BryanDavis: [C: -1] "Needs a manual rebase to fix merge conflict with I71211bfd24b703e9192d18849645129b2c454990 which also added scanning the wikipedia.dblist " [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[06:03:30] (PS1) KartikMistry: Update apertium package [debs/contenttranslation/apertium] - https://gerrit.wikimedia.org/r/194051
[06:05:46] akosiaris: What can be reason for https://integration.wikimedia.org/ci/job/operations-debs-apertium-debian-glue/2/console ?
[06:05:57] Pretty strange.
[06:21:27] PROBLEM - txstatsd backend instances on graphite1001 is CRITICAL: CRITICAL: Not all configured txstatsd instances are running.
[06:23:37] RECOVERY - txstatsd backend instances on graphite1001 is OK: OK: All defined txstatsd jobs are runnning. [06:25:59] (03CR) 10Yuvipanda: [C: 031] "+1, definitely. However, 'users just page opsen' isn't the best defense, since we don't know how many users ran into the issue and were he" [puppet] - 10https://gerrit.wikimedia.org/r/194048 (owner: 10Tim Starling) [06:26:08] RECOVERY - HHVM busy threads on mw1087 is OK: OK: Less than 30.00% above the threshold [57.6] [06:26:58] RECOVERY - HHVM queue size on mw1087 is OK: OK: Less than 30.00% above the threshold [10.0] [06:28:08] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: puppet fail [06:28:58] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:08] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:27] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:18] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:36] <_joe_> and good morning [06:31:05] :D [06:31:08] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:18] PROBLEM - HHVM queue size on mw1087 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [80.0] [06:31:38] PROBLEM - HHVM busy threads on mw1087 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [86.4] [06:32:05] (03CR) 10Legoktm: [C: 031] Remove labs sshd_banner [puppet] - 10https://gerrit.wikimedia.org/r/194048 (owner: 10Tim Starling) [06:45:28] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:45:39] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [06:45:48] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:45:58] RECOVERY - puppet last run on lvs2004 is OK: OK: 
Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:46:48] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:48] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:08:15] (03PS1) 10Yuvipanda: tools: Add redis slave role [puppet] - 10https://gerrit.wikimedia.org/r/194059 (https://phabricator.wikimedia.org/T91239) [07:08:26] (03CR) 10jenkins-bot: [V: 04-1] tools: Add redis slave role [puppet] - 10https://gerrit.wikimedia.org/r/194059 (https://phabricator.wikimedia.org/T91239) (owner: 10Yuvipanda) [07:10:08] (03PS1) 10KartikMistry: Beta: CX: Add Azerbaijani (az) in target language [puppet] - 10https://gerrit.wikimedia.org/r/194060 (https://phabricator.wikimedia.org/T91230) [07:16:40] (03PS2) 10Yuvipanda: tools: Add redis slave role [puppet] - 10https://gerrit.wikimedia.org/r/194059 (https://phabricator.wikimedia.org/T91239) [07:20:22] (03CR) 10Yuvipanda: [C: 032] tools: Add redis slave role [puppet] - 10https://gerrit.wikimedia.org/r/194059 (https://phabricator.wikimedia.org/T91239) (owner: 10Yuvipanda) [07:42:20] (03PS2) 10Giuseppe Lavagetto: mediawiki: add currently installed codfw memcached [puppet] - 10https://gerrit.wikimedia.org/r/193798 [07:42:54] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: add currently installed codfw memcached [puppet] - 10https://gerrit.wikimedia.org/r/193798 (owner: 10Giuseppe Lavagetto) [07:43:08] (03PS2) 10Giuseppe Lavagetto: mediawiki: add codfw appservers to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/193799 [07:43:56] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: add codfw appservers to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/193799 (owner: 10Giuseppe Lavagetto) [07:49:35] (03PS4) 10KartikMistry: CX: Publish translations to the Main namespace by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193835 [07:53:29] (03CR) 
10Nikerabbit: [C: 031] CX: Publish translations to the Main namespace by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193835 (owner: 10KartikMistry) [08:01:25] hmm [08:01:27] strange redis is strange [08:01:53] <_joe_> YuviPanda: wassup? [08:02:10] _joe_: so I’m trying to set one up to be a slave of another [08:02:19] and I get back an ‘OK’, but it isn’t actually been made a slave [08:02:36] however, toollabs redis is slightly restricted (several commands are disabled) so I’m wondering if that’s causing the issues [08:02:53] <_joe_> YuviPanda: how can you say it isn't a slave [08:02:58] <_joe_> and yes that may count :P [08:03:30] keys * returns empty [08:04:21] and I did a set on the master, and it isn’t there on slave [08:04:34] class toollabs::redis has list of disabled commands [08:06:40] <_joe_> I'll take a peak later [08:06:51] <_joe_> *peek [08:06:53] yeah [08:06:58] I’m off to have some food. [08:10:15] (03PS1) 10Yuvipanda: tools: Puppetize LVM volume for redis data [puppet] - 10https://gerrit.wikimedia.org/r/194068 (https://phabricator.wikimedia.org/T91370) [08:10:31] (03PS1) 10KartikMistry: style: Misc code style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194069 [08:21:18] (03CR) 10Tim Landscheidt: "IIRC this would cause the issue described in T91225 (not tested)." 
[puppet] - 10https://gerrit.wikimedia.org/r/194068 (https://phabricator.wikimedia.org/T91370) (owner: 10Yuvipanda) [08:55:39] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [08:55:40] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [08:55:40] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 1 failures [08:55:48] <_joe_> mmmh [08:56:05] <_joe_> nothing major I'd say [08:59:54] Maybe https://gerrit.wikimedia.org/r/#/c/190379/ would let us know ;) [09:02:18] <_joe_> Nemo_bis: nope [09:07:58] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:07:58] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:12:18] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [09:15:50] 6operations, 10ops-ulsfo: Dear ulsfo@rt.wikimedia.org, Call for Submissions on Various Academic Disciplines - https://phabricator.wikimedia.org/T91380#1081374 (10emailbot) [09:41:46] (03PS1) 10Gilles: Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) [09:50:54] (03PS1) 10MaxSem: Kill some usages of 'wiki' group in mobile-related settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194074 (https://phabricator.wikimedia.org/T91340) [09:50:58] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail [10:00:07] (03PS1) 10MaxSem: Convert some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194075 (https://phabricator.wikimedia.org/T91340) [10:01:54] (03PS1) 10MaxSem: Switch CA icon from 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194084 (https://phabricator.wikimedia.org/T91340) [10:04:09] (03PS1) 
10MaxSem: Switch some usages of 'wiki' to 'wikipedia' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194086 (https://phabricator.wikimedia.org/T91340) [10:09:48] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:12:29] (03PS1) 10Odder: Add namespace aliases for Nepali Wikipedia (newiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194088 (https://phabricator.wikimedia.org/T89817) [10:53:55] (03PS1) 10KartikMistry: Beta: CX: Enable publishing to Main namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194091 [10:54:02] (03CR) 10jenkins-bot: [V: 04-1] Beta: CX: Enable publishing to Main namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194091 (owner: 10KartikMistry) [10:56:06] (03PS2) 10KartikMistry: Beta: CX: Enable publishing to Main namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194091 [10:56:13] 7Puppet, 6operations: Puppet's apache2_test_config_and_restart fails to restart apache - https://phabricator.wikimedia.org/T86652#1081579 (10Joe) I tried to reproduce this repeatedly today but with no luck. I'll try the conversion to mpm_worker again. [10:58:45] (03PS1) 10Giuseppe Lavagetto: mediawiki: transition 10 more servers to mpm_worker [puppet] - 10https://gerrit.wikimedia.org/r/194092 [10:59:45] (03PS2) 10Giuseppe Lavagetto: mediawiki: transition 10 more servers to mpm_worker [puppet] - 10https://gerrit.wikimedia.org/r/194092 [11:00:18] (03CR) 10Nikerabbit: [C: 032] Beta: CX: Enable publishing to Main namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194091 (owner: 10KartikMistry) [11:00:23] (03Merged) 10jenkins-bot: Beta: CX: Enable publishing to Main namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194091 (owner: 10KartikMistry) [11:04:06] <_joe_> Nikerabbit: are you going to deploy? 
[11:04:14] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: transition 10 more servers to mpm_worker [puppet] - https://gerrit.wikimedia.org/r/194092 (owner: Giuseppe Lavagetto)
[11:04:21] * YuviPanda|afk is back
[11:10:47] what's with all the varnishkafka alerts?
[11:10:48] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[11:11:58] PROBLEM - HHVM rendering on mw1030 is CRITICAL: Connection refused
[11:11:58] PROBLEM - HHVM rendering on mw1034 is CRITICAL: Connection refused
[11:12:04] <_joe_> ook
[11:12:12] <_joe_> this ^^ is kind of expected
[11:12:38] PROBLEM - Apache HTTP on mw1030 is CRITICAL: Connection refused
[11:12:49] PROBLEM - Apache HTTP on mw1034 is CRITICAL: Connection refused
[11:13:47] RECOVERY - Apache HTTP on mw1030 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.046 second response time
[11:13:58] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.065 second response time
[11:14:08] RECOVERY - HHVM rendering on mw1030 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.141 second response time
[11:14:08] RECOVERY - HHVM rendering on mw1034 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.148 second response time
[11:14:08] PROBLEM - HHVM rendering on mw1032 is CRITICAL: Connection refused
[11:15:18] RECOVERY - HHVM rendering on mw1032 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.165 second response time
[11:16:00] (CR) Yuvipanda: "Hmm, looks like it would..." [puppet] - https://gerrit.wikimedia.org/r/194068 (https://phabricator.wikimedia.org/T91370) (owner: Yuvipanda)
[11:16:17] (CR) Yuvipanda: [C: -2] "(-2ing for now)" [puppet] - https://gerrit.wikimedia.org/r/194068 (https://phabricator.wikimedia.org/T91370) (owner: Yuvipanda)
[11:18:37] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[11:21:59] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [11:22:08] PROBLEM - HHVM rendering on mw1036 is CRITICAL: Connection refused [11:22:28] PROBLEM - Apache HTTP on mw1036 is CRITICAL: Connection refused [11:23:14] (03PS5) 10KartikMistry: CX: Publish translations to the Main namespace by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193835 [11:23:18] RECOVERY - HHVM rendering on mw1036 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.165 second response time [11:23:38] RECOVERY - Apache HTTP on mw1036 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.049 second response time [11:24:18] PROBLEM - HHVM rendering on mw1038 is CRITICAL: Connection refused [11:24:19] PROBLEM - HHVM rendering on mw1031 is CRITICAL: Connection refused [11:24:48] PROBLEM - Apache HTTP on mw1031 is CRITICAL: Connection refused [11:25:28] RECOVERY - HHVM rendering on mw1038 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.136 second response time [11:25:28] RECOVERY - HHVM rendering on mw1031 is OK: HTTP OK: HTTP/1.1 200 OK - 65649 bytes in 0.154 second response time [11:25:58] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.061 second response time [11:29:09] PROBLEM - Apache HTTP on mw1039 is CRITICAL: Connection refused [11:30:18] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.049 second response time [11:35:01] _joe_: it's for beta only [11:35:10] (03PS1) 10KartikMistry: Update lttoolbox [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/194093 [11:35:29] (03PS2) 10KartikMistry: Update apertium package [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/194051 [11:35:34] <_joe_> Nikerabbit: still, it's been said repeatedly that changes should be merged and distributed [11:36:38] hmm, I wonder how much of my replication problems are caused by the 
fact that these are two different versions of redis [11:36:41] (precise vs trusty) [11:36:55] <_joe_> YuviPanda: that seems like a winner tbh :P [11:37:00] _joe_: I don't remember seeing any notes how to handle beta only changes [11:37:13] _joe_: well, it’s worked for me in the past from precise to trusty, and from precise to jessie. [11:37:19] (when I was testing webproxies) [11:37:34] I thought it was obvious to deployers that they can safely continue if there are changes to *-labs.php [11:39:23] hashar: what is our default system now to build Debian packages? Should I rebuild everything on wheezy? [11:39:27] kart_: pbuilder-satisfydepends-dummy : Depends: liblttoolbox3-3.3-dev (>= 3.3) which is a virtual package. [11:39:51] I suppose that is the reason [11:39:55] akosiaris: right. But, that was earlier too. [11:40:06] yeah, I just got online [11:49:02] _joe_: ^ was the problem. [11:49:03] bah [11:49:25] <_joe_> YuviPanda: what was? [11:49:33] vm_overcommit wasn’t set [11:49:42] so bgsave failed on tools-redis [11:49:49] <_joe_> oh my [11:49:49] and so no replication [11:49:51] <_joe_> yep [11:50:07] https://gerrit.wikimedia.org/r/#/c/194095/ [11:50:17] I’m not sure what the other implications are. [11:50:23] should read up on what exactly overcommit does [11:51:38] (03PS1) 10Yuvipanda: redis: Always have redis machines set vm overcommit to 1 [puppet] - 10https://gerrit.wikimedia.org/r/194095 [11:51:42] (03CR) 10jenkins-bot: [V: 04-1] redis: Always have redis machines set vm overcommit to 1 [puppet] - 10https://gerrit.wikimedia.org/r/194095 (owner: 10Yuvipanda) [11:51:53] (03PS2) 10Yuvipanda: redis: Always have redis machines set vm overcommit to 1 [puppet] - 10https://gerrit.wikimedia.org/r/194095 [11:54:48] kart_: the debian glue Jenkins jobs are now running on Ubuntu Trusty [11:55:04] kart_: and the debian glue script uses the host machine as a default distribution/release whatever. 
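(Editor's sketch, not part of the log.) The Redis diagnosis earlier in this stretch (BGSAVE failing because overcommit was not set, so the replica never synced) could be checked from the text that the Redis INFO command returns. The sketch below assumes the INFO fields `role`, `master_link_status` and `rdb_last_bgsave_status`; the sample text, helper names and messages are hypothetical and are not output from the tools-redis hosts.

```python
def parse_info(text):
    """Parse Redis INFO output ('key:value' lines, '#' section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def replication_problems(info):
    """Return human-readable problems suggested by the parsed INFO fields."""
    problems = []
    if info.get("role") == "slave" and info.get("master_link_status") != "up":
        problems.append("replica is not linked to its master")
    if info.get("rdb_last_bgsave_status") == "err":
        problems.append("last BGSAVE failed (check vm.overcommit_memory)")
    return problems


# Made-up sample resembling a replica whose link is down, and a master whose BGSAVE failed.
replica_info = parse_info("# Replication\nrole:slave\nmaster_link_status:down\n")
master_info = parse_info("# Persistence\nrdb_last_bgsave_status:err\n")
print(replication_problems(replica_info))
print(replication_problems(master_info))
```

Checking the master's persistence status as well as the replica's link status matters here because, as described in the log, the replica accepted the command with an 'OK' while the failure was actually on the master's side.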
[11:55:18] kart_: so the .deb packages build by Jenkins are build against Ubuntu Trusty.
[11:55:19] _joe_: so what's the outcome? Do you want me to deploy now and update documentation?
[12:04:57] (CR) Alexandros Kosiaris: [C: 1] "Yes, please" [puppet] - https://gerrit.wikimedia.org/r/194048 (owner: Tim Starling)
[12:07:24] (CR) Hashar: [C: 1] zuul: Use umask 022 for installing zuul [puppet] - https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) (owner: Krinkle)
[12:07:40] (CR) Hashar: [C: 1] contint: keep 180 min of puppet reports [puppet] - https://gerrit.wikimedia.org/r/193825 (https://phabricator.wikimedia.org/T87484) (owner: Hashar)
[12:11:47] (CR) Giuseppe Lavagetto: [C: 1] Remove labs sshd_banner [puppet] - https://gerrit.wikimedia.org/r/194048 (owner: Tim Starling)
[12:21:06] (PS1) Giuseppe Lavagetto: mediawiki: use mpm worker on mw1026-1029 [puppet] - https://gerrit.wikimedia.org/r/194098
[12:25:53] (CR) Alexandros Kosiaris: [C: 2] Beta: CX: Add Azerbaijani (az) in target language [puppet] - https://gerrit.wikimedia.org/r/194060 (https://phabricator.wikimedia.org/T91230) (owner: KartikMistry)
[12:27:05] (PS2) Giuseppe Lavagetto: mediawiki: use mpm worker on mw1026-1029 [puppet] - https://gerrit.wikimedia.org/r/194098
[12:27:16] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: use mpm worker on mw1026-1029 [puppet] - https://gerrit.wikimedia.org/r/194098 (owner: Giuseppe Lavagetto)
[12:27:49] (CR) Giuseppe Lavagetto: [V: 2] mediawiki: use mpm worker on mw1026-1029 [puppet] - https://gerrit.wikimedia.org/r/194098 (owner: Giuseppe Lavagetto)
[12:28:15] <_joe_> akosiaris: ok to merge?
[12:28:32] Ι was about to aks the same thing
[12:28:39] _joe_: go ahead
[12:28:43] <_joe_> k
[12:28:59] <_joe_> done
[12:31:30] I love internet
[12:31:43] my local university peers with my ISP in Paris :(
[12:48:05] (PS1) Giuseppe Lavagetto: apache: perform the hard restart before refreshing the service [puppet] - https://gerrit.wikimedia.org/r/194099
[12:49:47] (CR) Giuseppe Lavagetto: [C: 2] apache: perform the hard restart before refreshing the service [puppet] - https://gerrit.wikimedia.org/r/194099 (owner: Giuseppe Lavagetto)
[12:51:49] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: puppet fail
[12:52:27] (PS1) Giuseppe Lavagetto: apache: fix typo [puppet] - https://gerrit.wikimedia.org/r/194100
[12:52:41] (CR) Giuseppe Lavagetto: [C: 2] apache: fix typo [puppet] - https://gerrit.wikimedia.org/r/194100 (owner: Giuseppe Lavagetto)
[12:52:49] (CR) Giuseppe Lavagetto: [V: 2] apache: fix typo [puppet] - https://gerrit.wikimedia.org/r/194100 (owner: Giuseppe Lavagetto)
[12:53:38] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: puppet fail
[12:53:38] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: puppet fail
[12:53:48] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: puppet fail
[12:53:58] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: puppet fail
[12:54:28] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: puppet fail
[12:54:48] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: puppet fail
[12:55:02] akosiaris: those pacakging changes are okay otherwise?
[12:55:07] <_joe_> this was my typo ^^
[12:55:08] PROBLEM - puppet last run on mw1105 is CRITICAL: CRITICAL: puppet fail
[12:55:09] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: puppet fail
[12:55:09] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: puppet fail
[12:55:09] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: puppet fail
[12:55:09] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: puppet fail
[12:55:09] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[12:55:18] PROBLEM - puppet last run on mw1122 is CRITICAL: CRITICAL: puppet fail
[12:55:29] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: puppet fail
[12:55:30] _joe_ 's typo, puppet's life :D
[12:55:38] PROBLEM - puppet last run on mw1108 is CRITICAL: CRITICAL: puppet fail
[12:55:48] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: puppet fail
[12:56:07] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: puppet fail
[12:56:18] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail
[12:56:29] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: puppet fail
[13:06:45] kart_: which ones ? the lttoolbox and apertium ones ?
[13:10:12] hmm, who else can look at the redis patch?
[13:10:15] https://gerrit.wikimedia.org/r/#/c/194095/ [13:12:39] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [13:12:48] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:13:37] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:13:48] RECOVERY - puppet last run on mw1105 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:13:58] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [13:14:08] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [13:14:18] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:14:28] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [13:14:38] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [13:15:07] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:16:27] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:16:38] RECOVERY - puppet last run on mw1108 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [13:16:48] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [13:17:08] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [13:17:18] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:17:18] RECOVERY - puppet 
last run on mw1186 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [13:18:27] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [13:19:21] <_joe_> YuviPanda: I'm going to lunch, sorry. If you can wait ~ 1 hour [13:20:15] _joe_: yeah, no hurry at all. I’ve ‘fixed’ it by hand on tools-redis. [13:20:32] _joe_: and our prod redis stuff already has it applied, but in site.pp. I’m just moving it to the module. take your time. [13:20:35] I might not be here in an hour tho [13:21:44] is http://stream.wikimedia.org/ down? [13:21:51] it returns 404 for everything [13:27:27] petan: works for me. It is just that there is no / :) [13:27:41] hashar: so that python script works to you? [13:27:48] https://phabricator.wikimedia.org/T91393 [13:27:49] petan: that ? [13:27:51] ^ [13:28:00] it shows errors to me only [13:28:03] the service works if you look at wikimedia.meteor.com [13:28:15] but indeed there should probably be some content served at the root / [13:28:17] it's nice that meteor say it works, but my tools can't use it :D [13:28:27] it only shows errors [13:28:36] WARNING:root:stream.wikimedia.org:80/socket.io [waiting for connection] unexpected status code (404) [13:31:28] (03CR) 10Alexandros Kosiaris: [C: 032] Update lttoolbox [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/194093 (owner: 10KartikMistry) [13:32:05] (03CR) 10Alexandros Kosiaris: [C: 032] Update apertium package [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/194051 (owner: 10KartikMistry) [13:33:37] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [13:34:08] Package cowdancer is not available, but is referred to by another package.
[13:34:11] that never cease [13:35:05] * hashar misses 'universe' [13:37:22] (03CR) 10Krinkle: "The main reason the old instance was clogging suddenly was due to the dozen extra instances co-existing. Creating twice as many logs :)" [puppet] - 10https://gerrit.wikimedia.org/r/193825 (https://phabricator.wikimedia.org/T87484) (owner: 10Hashar) [13:46:25] akosiaris: I have managed to setup cowbuilder all by myself \o/ [13:49:39] (03CR) 10Phuedx: [C: 04-1] Kill some usages of 'wiki' group in mobile-related settings (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194074 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [13:52:51] hashar: \o/ [14:13:37] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Shouldn't we move this to role::redis instead?" [puppet] - 10https://gerrit.wikimedia.org/r/194095 (owner: 10Yuvipanda) [14:20:29] (03PS1) 10Giuseppe Lavagetto: mediawiki: convert 10 more appservers to mpm worker [puppet] - 10https://gerrit.wikimedia.org/r/194106 [14:25:20] (03CR) 10Alexandros Kosiaris: [C: 031] mailman: SENDER_HEADERS use from only [puppet] - 10https://gerrit.wikimedia.org/r/154846 (https://bugzilla.wikimedia.org/46049) (owner: 10John F. Lewis) [14:27:24] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: convert 10 more appservers to mpm worker [puppet] - 10https://gerrit.wikimedia.org/r/194106 (owner: 10Giuseppe Lavagetto) [14:34:43] hashar: cowbuilder is nice. 
[14:42:59] RECOVERY - Apache HTTP on mw1087 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.072 second response time [14:43:29] RECOVERY - HHVM rendering on mw1087 is OK: HTTP OK: HTTP/1.1 200 OK - 65468 bytes in 0.161 second response time [14:47:23] (03PS1) 10Hoo man: Re-Organize (test)wikidata cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/194108 [14:47:30] (03CR) 10Yuvipanda: "I think this fits better here because arguably BG saving is a core feature of redis and it doesn't work unser any load without this sysctl" [puppet] - 10https://gerrit.wikimedia.org/r/194095 (owner: 10Yuvipanda) [14:50:38] RECOVERY - HHVM queue size on mw1087 is OK: OK: Less than 30.00% above the threshold [10.0] [14:50:47] PROBLEM - check_listener_gc on thulium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string OK not found on https://127.0.0.1:443https://payments-listener.wikimedia.org/globalcollect - 214 bytes in 0.007 second response time [14:50:47] PROBLEM - check_listener_ipn on thulium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 214 bytes in 0.006 second response time [14:50:48] RECOVERY - HHVM busy threads on mw1087 is OK: OK: Less than 30.00% above the threshold [57.6] [14:54:11] Jeff_Green: here? 
[14:54:15] yeah [14:54:18] It's a bit early, but I'll do SWAT, gi11es has a patch in [14:54:31] Jeff_Green: see above, there's also a watchmouse alert [14:54:46] paravoid: yeah sorry, that's me [14:54:52] ah ok :) [14:55:09] I put it in maintenance mode to do a package update on the queue server [14:55:21] didn't realize watchmouse would notice [14:55:38] (03PS1) 10Giuseppe Lavagetto: mediawiki: use mpm worker for all appservers [puppet] - 10https://gerrit.wikimedia.org/r/194110 [14:55:47] PROBLEM - check_listener_gc on thulium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string OK not found on https://127.0.0.1:443https://payments-listener.wikimedia.org/globalcollect - 214 bytes in 0.007 second response time [14:55:47] PROBLEM - check_listener_ipn on thulium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 214 bytes in 0.006 second response time [14:56:05] done now, flipping back to normal operation [14:56:23] since you're here, icinga has some boron alerts for quite a while now [14:56:39] i know, i haven't had a chance to figure out what broke [14:56:51] it has something to do with the fact that we upgraded boron to trusty [14:57:04] something must have changed re. 
passive checks [14:57:18] i guess we should just disable monitoring for now [14:57:34] there's a phab ticket for it already [14:58:30] weird thing is that the passive checks seem to make it to the nsca server, they show up in the stored data file on neon [14:58:32] oh ok [15:00:47] RECOVERY - check_listener_gc on thulium is OK: HTTP OK: HTTP/1.1 200 OK - 248 bytes in 0.012 second response time [15:00:47] RECOVERY - check_listener_ipn on thulium is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.016 second response time [15:10:37] (03PS1) 10Giuseppe Lavagetto: mediawiki: enable mpm worker on api appservers [puppet] - 10https://gerrit.wikimedia.org/r/194113 [15:13:15] marktraceur: I'm around if you need me [15:14:34] gi11es: If it's only a beta patch I won't, I'll just make sure it's OK [15:14:43] The beta config will get updated automatically [15:15:08] (03PS2) 10Giuseppe Lavagetto: mediawiki: use mpm worker for all appservers [puppet] - 10https://gerrit.wikimedia.org/r/194110 [15:16:25] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: use mpm worker for all appservers [puppet] - 10https://gerrit.wikimedia.org/r/194110 (owner: 10Giuseppe Lavagetto) [15:22:25] (03PS2) 10Giuseppe Lavagetto: mediawiki: enable mpm worker on api appservers [puppet] - 10https://gerrit.wikimedia.org/r/194113 [15:25:49] (03CR) 10Glaisher: "wikibook.org is the domain which is not hosted by WMF (not wikibooks.com). See T87375. I mailed legal about this last week. I don't think " [puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: 10Glaisher) [15:26:31] (03PS3) 10Glaisher: Redirect wikibook(s).(org|com) to www.wikibooks.org [puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) [15:26:49] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[15:30:01] <_joe_> ffs puppet-merge [15:30:09] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [15:54:45] (03PS7) 1001tonythomas: Added BounceHandler extension to group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) [15:55:37] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: enable mpm worker on api appservers [puppet] - 10https://gerrit.wikimedia.org/r/194113 (owner: 10Giuseppe Lavagetto) [15:55:48] jouncebot, next [15:55:48] In 0 hour(s) and 4 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T1600) [15:55:52] * twkozlowski wondering if Yuri Astrakhan uses IRC [15:56:11] yurik, ^ [15:56:25] Krenair, pong [15:56:46] twkozlowski wanted to know if you were here :) [15:56:53] he's not around [15:57:04] gi11es, twkozlowski, yurik: Ping for SWAT in 3 minutes [15:57:12] andre__, pong [15:57:16] anomie, pong [15:57:19] sorry andre [15:57:20] yurik: Re: Wikipedia killing the Russian guy. [15:57:32] marktraceur, ^d, manybubbles: Did any of you want SWAT this morning? [15:57:37] I do. [15:57:37] $wgLocaltimezone, yurik [15:57:40] yep, we are all owned by CIA & Massad, didn't you know? :) [15:57:41] marktraceur: ok! [15:57:42] I claimed it an hour ago [15:57:50] Ah, I didn't scroll back far enough. [15:57:53] cool [15:57:53] pong [15:57:53] :) [15:57:55] have fun marktraceur [15:57:58] I sort of just encouraged someone to add something to swat. [15:58:00] Thanks [15:58:05] Hope that'll be okay: https://gerrit.wikimedia.org/r/#/c/191937/ [15:58:13] yurik: set that to Europe/Moscow like ruwikivoyage does, and that's it. [15:58:33] twkozlowski, i doubt ru community would agree - there is a lot of politics behind that [15:58:55] yurik: ru community would not agree to having the default wiki timezone set to Moscow time? [15:58:59] yurik: Where is your extension update patch for SWAT today? 
[15:59:20] (03CR) 10MarkTraceur: [C: 032] Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [15:59:21] marktraceur, i have cherrypicked it, but didn't create a core patch [15:59:34] yurik: Well, please do [15:59:43] yurik: Well, never mind politics. My point was, instead of thinking of silly solutions with JS, just set that variable to Europe/Moscow and you're done :) [15:59:44] I'll do the other two first [16:00:04] manybubbles, anomie, ^d, marktraceur, gi11es: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T1600). Please do the needful. [16:00:11] Thanks jouncebot. [16:00:25] Trying to get gi11es's patch merged. [16:00:26] twkozlowski, we should be showing correct time at the end of the day. IT should not be same for all if the browser knows where it is [16:00:43] <^d> marktraceur: Always on time that jouncebot [16:00:46] in the mean time, we should show UTC [16:00:47] <^d> Never early, never late [16:01:12] ^d: My server seems to think it started 1:22 early, but my server is probably to blame. [16:01:22] (03PS1) 10Aklapper: Put clearer Login/Register instructions on the Phabricator login page [puppet] - 10https://gerrit.wikimedia.org/r/194126 (https://phabricator.wikimedia.org/T545) [16:04:22] marktraceur, https://gerrit.wikimedia.org/r/#/c/194127/ [16:04:31] Thanks! [16:04:41] (03PS1) 10BBlack: service_unit for varnishkafka::instance [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194128 [16:06:13] Is zuul frozen? [16:06:14] https://integration.wikimedia.org/zuul/ [16:06:27] My job finished but zuul is just drooling on the table [16:07:00] bblack, hi, any updates with tagging? 
[16:07:15] marktraceur: Just hit the server with something and it'll fix it right up [16:07:23] Like slap it [16:07:48] (03CR) 10BBlack: [C: 032 V: 032] service_unit for varnishkafka::instance [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194128 (owner: 10BBlack) [16:08:03] yurik: no [16:08:15] bblack, anything i could help with? [16:08:15] (03CR) 10Glaisher: "More bumping. Could someone review/deploy this?" [puppet] - 10https://gerrit.wikimedia.org/r/170925 (https://phabricator.wikimedia.org/T57737) (owner: 10Glaisher) [16:09:02] not really much you can do about me, there's only one of me! If you want to raise the priority of it in the backlog, you could provide more evidence that the problem is more-widespread and urgent I guess, than it sounds initially. [16:11:01] I guess I'll restart zuul. [16:11:56] (03CR) 10Qgil: [C: 031] Put clearer Login/Register instructions on the Phabricator login page [puppet] - 10https://gerrit.wikimedia.org/r/194126 (https://phabricator.wikimedia.org/T545) (owner: 10Aklapper) [16:12:29] hey, can we have https://gerrit.wikimedia.org/r/#/c/191937/ done in today's SWAT ? [16:12:59] tonythomas: Add it to the list! [16:13:11] (03CR) 10MarkTraceur: [C: 031] Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [16:13:16] (03CR) 10MarkTraceur: [C: 032] Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [16:13:35] marktraceur: cool. adding it now [16:14:47] WTF ZUUL.
[16:14:54] Gearman maybe is down [16:15:13] Oh there we bloody go [16:15:26] (03CR) 10MarkTraceur: [C: 031] Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [16:15:30] (03CR) 10Andrew Bogott: [C: 032] Remove labs sshd_banner [puppet] - 10https://gerrit.wikimedia.org/r/194048 (owner: 10Tim Starling) [16:15:33] (03CR) 10MarkTraceur: [C: 032] Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [16:15:58] marktraceur: Don't restart zuul. [16:16:02] (Next time.) [16:16:06] See -releng [16:16:08] (03Merged) 10jenkins-bot: Enable flickr uploads on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194073 (https://phabricator.wikimedia.org/T86120) (owner: 10Gilles) [16:16:13] added. https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0March.C2.A003 [16:16:35] (03PS1) 10BBlack: bump vk mod for service_unit [puppet] - 10https://gerrit.wikimedia.org/r/194129 [16:16:37] (03PS1) 10BBlack: puppet-level dep of vk::instance on v instances [puppet] - 10https://gerrit.wikimedia.org/r/194130 [16:17:00] (03CR) 10BBlack: [C: 032 V: 032] bump vk mod for service_unit [puppet] - 10https://gerrit.wikimedia.org/r/194129 (owner: 10BBlack) [16:17:52] (03CR) 10BBlack: [C: 032 V: 032] puppet-level dep of vk::instance on v instances [puppet] - 10https://gerrit.wikimedia.org/r/194130 (owner: 10BBlack) [16:18:07] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [16:19:52] marktraceur: change works on beta commons, thanks [16:19:57] Sweet. [16:20:03] I'll just sync that file. [16:20:34] twkozlowski, nope, they don't want it to be switched to Moscow time - too many readers (25%) are not even from RU, plus that country spans 10 timezones [16:20:57] kart_: Looks like you merged a beta config change without syncing it? 
[16:21:11] yurik: Yeah, I see. [16:21:41] twkozlowski, it seems a {{CURRENTTIMEZONE}} tag would be better - can be used in the message above [16:22:04] !log marktraceur Synchronized wmf-config/CommonSettings-labs.php: [SWAT] [config] Enable Flickr uploads on betacommons (duration: 00m 06s) [16:22:13] Logged the message, Master [16:22:47] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [16:22:50] !log marktraceur Synchronized wmf-config/InitialiseSettings-labs.php: [SWAT] [config] Kartik forgot to sync this beta config patch for CX (duration: 00m 10s) [16:22:51] yurik: they should do the China solution and make it all one timezone :) [16:22:53] Logged the message, Master [16:23:31] OK, twkozlowski is next. [16:23:49] (03CR) 10MarkTraceur: [C: 032] Add namespace aliases for Nepali Wikipedia (newiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194088 (https://phabricator.wikimedia.org/T89817) (owner: 10Odder) [16:24:11] yay. [16:24:37] PROBLEM - Varnishkafka log producer on cp1070 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [16:26:08] ^ oops, a bunch of those may happen soon, working on it [16:26:08] PROBLEM - Varnishkafka log producer on cp1064 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [16:26:21] Uh oh. [16:28:59] (03Merged) 10jenkins-bot: Add namespace aliases for Nepali Wikipedia (newiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194088 (https://phabricator.wikimedia.org/T89817) (owner: 10Odder) [16:29:37] * twkozlowski wondering why it took jenkins-bot 5 minutes to merge that. [16:29:54] twkozlowski: Something is weird about gallium today I think [16:30:33] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Add namespace aliases for Nepali Wikipedia (newiki) (duration: 00m 08s) [16:30:37] Logged the message, Master [16:30:38] Which means merging a core change will be extra-fun. 
[16:30:44] And to that end, yurik is up. [16:31:49] https://ne.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases [16:32:28] PROBLEM - Varnishkafka log producer on amssq42 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [16:33:00] twkozlowski: Expected values? [16:33:13] Errrm... did I just add a namespace alias to a non-existing namespace. [16:33:18] PROBLEM - Varnishkafka log producer on cp1060 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [16:33:19] (That's a statement, not a question.) [16:33:23] marktraceur: sorry - but I need to go afk for ~15 mins [16:33:27] (03PS1) 10BBlack: correct configfile names under service_unit [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194131 [16:33:38] (03CR) 10BBlack: [C: 032 V: 032] correct configfile names under service_unit [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194131 (owner: 10BBlack) [16:33:40] college lab shutdown :\ [16:33:50] tonythomas: OK, I'll wait on your patch. [16:33:58] twkozlowski: Which alias? [16:34:20] oh. will brb. thanks :) [16:34:22] Oh, fail [16:34:22] (03PS1) 10BBlack: bump vk mod [puppet] - 10https://gerrit.wikimedia.org/r/194132 [16:34:32] marktraceur: the ones leading to NS_PORTAL [16:34:34] (03CR) 10BBlack: [C: 032 V: 032] bump vk mod [puppet] - 10https://gerrit.wikimedia.org/r/194132 (owner: 10BBlack) [16:35:04] stupid me didn't realize they don't have that namespace on ne.wp [16:35:14] Should have checked that :-/ [16:35:17] Ah. [16:35:26] twkozlowski: We can still remove it, not that it's doing any harm [16:35:27] marktraceur: Care to quick merge a patch fixing that? 
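(Editor's note on the siteinfo check above: the query twkozlowski links returns namespaces and namespace aliases side by side, so an alias that points at a namespace id the wiki does not define can be spotted mechanically. The following is only a minimal sketch in Python. It assumes the JSON response has a `query.namespaces` object keyed by id and a `query.namespacealiases` list of `id`/`*` pairs; the function name `dangling_aliases` and the sample data are made up for illustration and are not real newiki output. Whether the live API reports such aliases at all is not confirmed by this log.)

```python
def dangling_aliases(siteinfo):
    """Return alias names whose target namespace id is not defined on the wiki."""
    query = siteinfo.get("query", {})
    # Namespace ids arrive as string keys in the JSON object.
    defined_ids = {int(ns_id) for ns_id in query.get("namespaces", {})}
    return [
        alias["*"]
        for alias in query.get("namespacealiases", [])
        if alias["id"] not in defined_ids
    ]


# Hypothetical, hand-written sample: id 100 (the portal namespace) is absent.
sample = {
    "query": {
        "namespaces": {"0": {"id": 0, "*": ""}, "4": {"id": 4, "*": "Wikipedia"}},
        "namespacealiases": [
            {"id": 4, "*": "WP"},
            {"id": 100, "*": "Portal"},
        ],
    }
}

print(dangling_aliases(sample))
```

(Running a check like this against the parsed API response before merging would have flagged the NS_PORTAL aliases discussed here, under the assumptions stated.)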
[16:35:31] Yup, I'd be happy to [16:35:58] RECOVERY - Varnishkafka log producer on cp1064 is OK: PROCS OK: 1 process with command name varnishkafka [16:36:38] RECOVERY - Varnishkafka log producer on cp1070 is OK: PROCS OK: 2 processes with command name varnishkafka [16:37:38] RECOVERY - Varnishkafka log producer on cp1060 is OK: PROCS OK: 1 process with command name varnishkafka [16:38:03] marktraceur, deployed? [16:38:07] RECOVERY - Varnishkafka log producer on amssq42 is OK: PROCS OK: 1 process with command name varnishkafka [16:38:26] yurik: No, merging [16:38:34] (03PS1) 10Odder: Remove redundant namespace aliases from newiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194133 (https://phabricator.wikimedia.org/T89817) [16:38:38] marktraceur: ^ [16:38:52] twkozlowski: K, will do after yurik. [16:39:21] Thanks, marktraceur [16:39:33] K, merged, going [16:39:33] merged [16:39:36] )) [16:39:47] jenkins needs more speed ) [16:39:57] As ever is the case. [16:41:34] !log marktraceur Synchronized php-1.25wmf19/extensions/Graph/: [SWAT] [wmf19] Update Graph to 25.19 (duration: 00m 07s) [16:41:35] yurik: Test please :) [16:41:38] Logged the message, Master [16:41:46] * yurik testing [16:44:41] tonythomas: Let me know when you're back, I'm doing twkozlowski's config patch next. [16:45:02] tonythomas: Wait, you're enabling an extension on all non-wikipedias? [16:45:22] greg-g: You know about this? [16:45:58] bouncehandler? 
yeah [16:46:00] (03CR) 10MarkTraceur: [C: 032] Remove redundant namespace aliases from newiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194133 (https://phabricator.wikimedia.org/T89817) (owner: 10Odder) [16:46:05] greg-g: OK, fair enough [16:46:05] (03Merged) 10jenkins-bot: Remove redundant namespace aliases from newiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194133 (https://phabricator.wikimedia.org/T89817) (owner: 10Odder) [16:46:28] marktraceur: it would have been in my weekly update email, if I hadn't have gotten a migraine on Friday [16:46:37] s/have/had/ [16:46:46] hadn't had? hadn't. [16:46:51] :) [16:51:41] * twkozlowski just remembered we should get rid of all those 'Wikipedia' => NS_PROJECT aliases [16:53:24] marktraceur, seems to be ok [16:53:42] OK! [16:53:48] twkozlowski: Pushing yours now. [16:53:57] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Remove redundant namespace aliases for Nepali Wikipedia (newiki) (duration: 00m 05s) [16:54:01] Logged the message, Master [16:54:03] twkozlowski: Test it again? [16:54:32] marktraceur: \o/ [16:55:39] yurik: Re: UTC again; so how come Wikivoyage agreed on Europe/Moscow? [16:55:50] (One reasons might be, it's a much smaller community?) [16:57:41] OK, tonythomas, when you get back we can push your patch. [17:00:07] twkozlowski, no idea [17:01:33] marktraceur: back! [17:01:42] have I missed the window ? [17:01:52] Well, technically yes [17:01:57] I don't think anyone is doing anything else though [17:02:04] Not for an hour [17:02:15] greg-g: Cool to go over two minutes? :) [17:02:24] (03CR) 10MarkTraceur: [C: 032] Added BounceHandler extension to group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 1001tonythomas) [17:02:29] yay :) [17:02:34] Jeff_Green: around ? 
[17:02:40] ya [17:02:49] (03Merged) 10jenkins-bot: Added BounceHandler extension to group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) (owner: 1001tonythomas) [17:02:54] great! [17:03:34] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Enable BounceHandler on non-wikipedias (duration: 00m 05s) [17:03:35] tonythomas: Testy test test test. [17:03:40] Logged the message, Master [17:03:47] marktraceur: yup. one sec [17:04:11] * maybe more than few minutes though - we will have to setup a fake email box and test bounce [17:05:02] (03PS1) 10BBlack: add service+systemd -level deps on varnish service [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194137 [17:05:06] That's fine [17:06:30] marktraceur: its showing up in https://en.wikivoyage.org/wiki/Special:Version :) [17:06:58] Sweet. [17:07:32] and not showing up in https://en.wikipedia.org/wiki/Special:Version as expected [17:07:48] Even better. [17:07:49] now. trying out a sample bounce - hope Jeff_Green got his fake mail server at trouser.org ready [17:08:08] oh, it's there. do you need a working account initially? [17:09:16] Jeff_Green: yup ! [17:09:36] (03CR) 10BBlack: [C: 032 V: 032] add service+systemd -level deps on varnish service [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194137 (owner: 10BBlack) [17:09:38] let me sign up in https://en.wikivoyage.org [17:09:50] (03CR) 10Giuseppe Lavagetto: add service+systemd -level deps on varnish service (031 comment) [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/194137 (owner: 10BBlack) [17:11:09] Jeff_Green: account ready ? [17:11:37] yup [17:12:18] nice. creating one mw account now [17:12:31] (03PS1) 10BBlack: use mod/vk varnish_svc_name [puppet] - 10https://gerrit.wikimedia.org/r/194138 [17:14:56] marktraceur: yes :) [17:15:02] greg-g: Thanks. 
[17:16:54] Jeff_Green: email confirmed in wikivoyage - now sending out first mail to the user [17:17:18] ok, let me know when you want me to lock the account [17:17:27] Jeff_Green: can you do a "SELECT * from bounce_records;" [17:17:28] (03CR) 10BBlack: [C: 032] use mod/vk varnish_svc_name [puppet] - 10https://gerrit.wikimedia.org/r/194138 (owner: 10BBlack) [17:17:49] sure, once I figure out where to look again... [17:18:02] Jeff_Green: ah. we should deactivate the account now ( forgot that ) [17:18:03] (03PS3) 10Dzahn: Add joal to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/192810 (https://phabricator.wikimedia.org/T90731) (owner: 10Ottomata) [17:18:08] tonythomas: it won't have bounced right [17:18:11] sec [17:18:28] yup. I can see the email in the trouser inbox [17:18:36] try now [17:19:12] I see the mailbox down in trouser ; sending email now. hope you somehow got into extension2 db [17:19:18] (03PS4) 10Dzahn: Add joal to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/192810 (https://phabricator.wikimedia.org/T90731) (owner: 10Ottomata) [17:19:26] (03CR) 10Dzahn: [C: 032] "has approval and other checklist items already done per bug" [puppet] - 10https://gerrit.wikimedia.org/r/192810 (https://phabricator.wikimedia.org/T90731) (owner: 10Ottomata) [17:19:26] haven't got there yet [17:19:53] hmm. was that in tin or something like that ? [17:20:38] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 1 failures [17:21:07] can someone figure out where to do a query for extension2 ? [17:21:28] i'm out of my element here...sorry [17:22:14] Jeff_Green: I think its deep down there somewhere in my chat logs. [17:22:55] i think i found it [17:23:16] Jeff_Green: yay ! [17:24:43] where's phab bot [17:25:18] joal: hey, you just became a deployer [17:26:13] got it [17:26:23] I think legoktm was kicking wikibugs last I saw. [17:26:32] tonythomas: [br_timestamp] => 20150303171939 [17:26:59] Jeff_Green: yay. cool.
looks recent ? and the "from" address ? [17:27:05] right [17:27:07] or - failed recipient ? [17:27:19] ? [17:27:37] there's a row in the db for today for that user, i forget the exact br_reason... [17:27:45] it was something to the effect of "message returned" [17:28:04] in the latest row - I think you never pasted the 'br_user' field [17:28:25] true, but that timestamp is for the row with our user [17:28:33] its test@trouser.org ! ? [17:28:40] ha of course! [17:28:53] woot ! great. looks like bounce reached there. now un-subscription [17:28:57] sending in bounce#2 [17:29:19] (03CR) 10Dzahn: [C: 031] Add tnegrin to statistics-web-users [puppet] - 10https://gerrit.wikimedia.org/r/193848 (owner: 10Rush) [17:29:58] Jeff_Green: sent ! [17:30:37] (03PS2) 10BBlack: nginx: tmpfs for /var/lib/nginx on jessie [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192826 [17:30:56] Jeff_Green: caught ? [17:31:55] yep [17:32:07] (03PS3) 10BBlack: nginx: tmpfs for /var/lib/nginx on jessie [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192826 [17:32:27] Jeff_Green: cool. now our limit is 5 :\ [17:32:33] sending in 3 more [17:32:37] ok [17:33:04] (03CR) 10BBlack: [C: 032] nginx: tmpfs for /var/lib/nginx on jessie [puppet/nginx] - 10https://gerrit.wikimedia.org/r/192826 (owner: 10BBlack) [17:33:39] Jeff_Green: sent 3,4 and 5 [17:33:59] This user has not specified a valid email address. [17:34:02] yaaaaay ! [17:35:23] Jeff_Green: looks like it ran smoothly Special:Preferences of testuserverp says e-mail not confirmed [17:35:25] (03PS1) 10BBlack: bump nginx mod for tmpfs [puppet] - 10https://gerrit.wikimedia.org/r/194139 [17:35:37] (03CR) 10BBlack: [C: 032 V: 032] bump nginx mod for tmpfs [puppet] - 10https://gerrit.wikimedia.org/r/194139 (owner: 10BBlack) [17:36:32] tonythomas: congrats [17:36:59] Jeff_Green: yay ! looks like we completed 3/4 of our handling. [17:37:05] now only wikipedias left [17:37:32] most excellent!
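(Editor's note on the bounce test above: the behaviour tonythomas and Jeff_Green exercise is a counter with a threshold. Each bounce for an address is recorded, and once the count reaches the limit of 5 mentioned in the log, the address stops being treated as confirmed. The sketch below, in Python, illustrates only that idea. The class `BounceTracker` and its methods are hypothetical and are not the BounceHandler extension's actual code or schema; any time window or bounce classification the real extension may apply is left out.)

```python
from collections import Counter

BOUNCE_LIMIT = 5  # the limit mentioned in the log; everything else is assumed


class BounceTracker:
    """Toy model: un-confirm an address after `limit` recorded bounces."""

    def __init__(self, limit=BOUNCE_LIMIT):
        self.limit = limit
        self.bounces = Counter()   # address -> number of bounces seen
        self.confirmed = set()     # addresses currently treated as confirmed

    def confirm(self, address):
        self.confirmed.add(address)

    def record_bounce(self, address):
        """Count one bounce and return whether the address is still confirmed."""
        self.bounces[address] += 1
        if self.bounces[address] >= self.limit:
            self.confirmed.discard(address)
        return address in self.confirmed


tracker = BounceTracker()
tracker.confirm("test@trouser.org")
results = [tracker.record_bounce("test@trouser.org") for _ in range(5)]
print(results)  # [True, True, True, True, False]
```

(The fifth bounce flips the address to unconfirmed, which matches the sequence in the log: bounces 1 through 5 were sent and only then did the wiki report that the user had no valid email address.)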
[17:38:08] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:38:22] yeah. and out of curiosity - how many rows our bounce_records have as of now ? [17:38:42] 61 [17:38:58] woot ! [17:40:16] Jeff_Green: will brb. [17:42:47] (03PS1) 10Chad: Add myself to the releasers group [puppet] - 10https://gerrit.wikimedia.org/r/194140 [17:43:19] (03PS1) 10Chad: Adding my gpg public key to the list of releasers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194141 [17:44:34] nice use of meeting time, ^d (seriously) ;) [17:45:02] <^d> :) [17:49:34] <^d> I can self-sync that docroot change but it feels like escalation somehow [17:49:49] <^d> Or rather someone who knows my key should +2 [17:50:25] <^d> bd808: We track each other :) [17:50:52] ^d: we should sign keys and be crypto nerds [17:51:10] * ^d has keybase :p [17:51:11] actually, I think it's a good idea to be signed by chris s or tim or something [17:51:18] bah, hipster crypto nerd [17:51:34] <^d> Hey I also export my key to the public rings :) [17:51:44] <^d> I just wish keybase did that automatically and played nice [17:52:13] <^d> s/rings/keyservs/ [17:52:15] (03PS1) 10BBlack: scale upload bigobj cache size [puppet] - 10https://gerrit.wikimedia.org/r/194142 [17:53:30] ^d: I can verify at lunch from my other laptop [17:53:45] * bd808 does not have keybase on his work laptop [17:54:02] <^d> greg-g: You want an invite? :p [17:54:44] ^d: i do, if you have one to spare [17:54:46] I think I have an account... [17:54:49] (03PS2) 10BBlack: scale upload bigobj cache size [puppet] - 10https://gerrit.wikimedia.org/r/194142 [17:54:53] <^d> ori: I do, e-mail? [17:54:58] ori.livneh@gmail.com [17:55:18] <^d> sent! 
[17:56:12] (03CR) 10BBlack: [C: 032] scale upload bigobj cache size [puppet] - 10https://gerrit.wikimedia.org/r/194142 (owner: 10BBlack) [17:57:29] <^d> greg-g: You're clearly not gregg :) [17:57:33] (03CR) 10Andrew Bogott: [C: 032] Remove regex from labs hiera config [puppet] - 10https://gerrit.wikimedia.org/r/193165 (https://phabricator.wikimedia.org/T90466) (owner: 10Thcipriani) [17:57:50] ^d: thanks! [17:57:55] <^d> yw [17:58:00] bblack: shall I merge “Brandon Black: scale upload bigobj cache size” ? [17:58:37] <^d> greg-g: just to be extra hipster, I'm going to sign the MW release with `keybase` :p [17:58:38] (03PS1) 10BBlack: make bigobj min size on jessie labs [puppet] - 10https://gerrit.wikimedia.org/r/194144 [17:58:39] tonythomas: Did it work? [17:58:50] (03CR) 10BBlack: [C: 032 V: 032] make bigobj min size on jessie labs [puppet] - 10https://gerrit.wikimedia.org/r/194144 (owner: 10BBlack) [17:59:11] andrewbogott: can you restart merge and grab both of mine then? [17:59:14] greg-g: gpg --recv-keys F5F6A067 ; gpg --list-sigs F5F6A067 [17:59:19] I had to pause to do a second commit as fixup :) [17:59:20] bblack: yep [17:59:28] done [17:59:31] thanks! [17:59:43] Jeff_Green: Do you know if tonythomas finished testing the patch, and if it worked? [17:59:55] We're about to enter the next deployment window. [18:00:04] maxsem, kaldari: Respected human, time to deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T1800). Please do the needful. 
[18:00:21] he tested and it worked, I think he was done [18:00:24] Neither of them are here -.- [18:00:27] Oh, OK, perfect [18:02:29] marktraceur: that worked great :) [18:03:20] tonythomas: Awesome [18:03:57] the very brief 5xx spike that's about to get alerted here is already over, and was me screwing up with a varnish restart [18:04:10] heh [18:05:18] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [18:05:57] bblack is botblack [18:07:03] (03CR) 10Andrew Bogott: [C: 032] Add hiera yaml for staging, refactor salt class [puppet] - 10https://gerrit.wikimedia.org/r/193545 (https://phabricator.wikimedia.org/T88304) (owner: 10Thcipriani) [18:11:19] (03PS1) 10Chad: More debugging for multiversion weird error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194152 [18:13:24] (03PS2) 10MaxSem: Kill some usages of 'wiki' group in mobile-related settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194074 (https://phabricator.wikimedia.org/T91340) [18:17:18] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:17:59] (03PS2) 10BBlack: zuul: Use umask 022 for installing zuul [puppet] - 10https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) (owner: 10Krinkle) [18:18:05] (03CR) 10BBlack: [C: 032 V: 032] zuul: Use umask 022 for installing zuul [puppet] - 10https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) (owner: 10Krinkle) [18:18:18] RECOVERY - uWSGI web apps on graphite2001 is OK: OK: All defined uWSGI apps are runnning. [18:19:54] (03CR) 10BBlack: [C: 031] mailman: SENDER_HEADERS use from only [puppet] - 10https://gerrit.wikimedia.org/r/154846 (https://bugzilla.wikimedia.org/46049) (owner: 10John F. Lewis) [18:24:31] <^d> andrewbogott: Since you were in the meeting where we talked about it, could you have a look at https://gerrit.wikimedia.org/r/#/c/194140/? 
[18:24:43] yep [18:25:16] ^d: the patch is fine of course, may be subject to a waiting period, I’m not sure. I think mutante is the most up-to-date about that policy. [18:25:39] <^d> Oh dur, I probably need a Phab ticket for this then [18:25:44] * ^d forgets the obvious [18:25:53] also true! [18:28:26] (03CR) 10Andrew Bogott: "This is fine, but needs a phab ticket (and possible a waiting period, not sure about that)" [puppet] - 10https://gerrit.wikimedia.org/r/194140 (owner: 10Chad) [18:30:02] (03PS2) 10Chad: Add myself to the releasers group [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) [18:30:40] <^d> andrewbogott: wikibugs is awol but T91424 filed [18:31:48] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures [18:33:08] Coren: do you know w/not ^d is subject to a waiting period? I can’t even find the policy page at the moment [18:33:57] * Coren reads [18:34:37] <^d> I philed a phab task phor completeness [18:35:37] comphleteness. [18:36:50] (03PS1) 10BBlack: depool cp10(60|65|70),amssq42 for reinstalls [puppet] - 10https://gerrit.wikimedia.org/r/194158 [18:37:18] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [18:37:46] (03CR) 10BBlack: [C: 032 V: 032] depool cp10(60|65|70),amssq42 for reinstalls [puppet] - 10https://gerrit.wikimedia.org/r/194158 (owner: 10BBlack) [18:38:32] andrewbogott: I'm... not sure. Does it make sense to have a waiting period for something which afaict is a subset of rights he already has indirectly? [18:38:46] * Coren hunts the documentation down. 
[18:39:07] !log depooled cp10(60|65|70),amssq42 temporarily in pybal for reinstalls [18:39:12] Logged the message, Master [18:40:16] andrewbogott: Should be https://wikitech.wikimedia.org/wiki/Requesting_shell_access#Escalating_Existing_Shell_Access - and that would seem to answer "yes" [18:40:30] Yeah, I think this counts as ‘escalation’ even though only barely. [18:40:56] but not management approval needed, so the clock starts ticking now. [18:41:01] * Coren handles the ticket. [18:41:07] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [18:41:10] <^d> I have rights to caesium already? [18:41:10] ^d: Who's the manager for that? [18:41:15] So, ^d, you should nag me or Coren on Friday afternoon to merge that patch. [18:41:22] <^d> Coren: cc greg-g [18:47:28] That is SO odd. vim crashes when editing data.yaml on my box. [18:47:56] Only if I scroll below a certain point in the file, too. [18:48:05] * Coren suspects syntax highlighting. [18:48:10] (03CR) 10jenkins-bot: [V: 04-1] More debugging for multiversion weird error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194152 (owner: 10Chad) [18:50:12] (03PS2) 10Chad: More debugging for multiversion weird error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194152 [18:55:09] Thx mutante ! [18:59:42] Coren: sounds vaguely familiar (issue with yaml syntax highlighting in vim) you could move /usr/share/vim/vim73/synxtax/yaml.vim and see if that stops it [18:59:48] joal: you're welcome [19:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150303T1900). Please do the needful. [19:00:17] mutante: Oh, I found an easier way. ':syntax off'. And indeed, it stops breaking. :-) [19:00:39] ah :) ok. 
i think i updated the syntax file for yaml at one point [19:05:06] !log Starting the tuesday train deploy ( Group1 wikis to 1.25wmf19 ) [19:05:10] Logged the message, Master [19:07:22] (03CR) 10Chad: [C: 032] More debugging for multiversion weird error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194152 (owner: 10Chad) [19:11:05] blarg, go me! [19:11:53] (03PS1) 10BBlack: cp1064 -> reinstall as well... [puppet] - 10https://gerrit.wikimedia.org/r/194166 [19:12:12] (03CR) 10BBlack: [C: 032 V: 032] cp1064 -> reinstall as well... [puppet] - 10https://gerrit.wikimedia.org/r/194166 (owner: 10BBlack) [19:12:47] PROBLEM - Host cp1064 is DOWN: PING CRITICAL - Packet loss = 100% [19:14:04] ACKNOWLEDGEMENT - DPKG on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:04] ACKNOWLEDGEMENT - Disk space on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:04] ACKNOWLEDGEMENT - HTTPS on cp1064 is CRITICAL: Return code of 255 is out of bounds Brandon Black reinstalling, missed in downtime [19:14:04] ACKNOWLEDGEMENT - NTP on cp1064 is CRITICAL: NTP CRITICAL: No response from NTP server Brandon Black reinstalling, missed in downtime [19:14:04] ACKNOWLEDGEMENT - RAID on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:05] ACKNOWLEDGEMENT - SSH on cp1064 is CRITICAL: Connection timed out Brandon Black reinstalling, missed in downtime [19:14:05] ACKNOWLEDGEMENT - Varnish HTCP daemon on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:06] ACKNOWLEDGEMENT - Varnish HTTP upload-backend on cp1064 is CRITICAL: Connection timed out Brandon Black reinstalling, missed in downtime [19:14:06] ACKNOWLEDGEMENT - Varnish HTTP upload-frontend on cp1064 is CRITICAL: Connection timed out Brandon Black reinstalling, missed in downtime [19:14:07] 
ACKNOWLEDGEMENT - Varnish traffic logger on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:07] ACKNOWLEDGEMENT - Varnishkafka log producer on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:08] ACKNOWLEDGEMENT - configured eth on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:08] ACKNOWLEDGEMENT - dhclient process on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:09] ACKNOWLEDGEMENT - puppet last run on cp1064 is CRITICAL: Timeout while attempting connection Brandon Black reinstalling, missed in downtime [19:14:48] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [19:15:08] RECOVERY - Host cp1064 is UP: PING OK - Packet loss = 0%, RTA = 1.71 ms [19:16:03] (03Merged) 10jenkins-bot: More debugging for multiversion weird error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194152 (owner: 10Chad) [19:19:25] (03PS8) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [19:19:46] (03PS1) 1020after4: Group1 wikis to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194167 [19:20:27] (03CR) 10Yurik: [C: 031] Kill some usages of 'wiki' group in mobile-related settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194074 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [19:21:28] (03PS2) 1020after4: Group1 wikis to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194167 [19:22:13] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). 
[19:22:27] <^d> twentyafterfour: Sorry, didn't mean to pull in while you had stuff in-flight on tin [19:23:11] ^d: I didn't notice ;) [19:23:20] you want me to wait a minute? [19:24:02] <^d> I rebased my change into your stuff on tin. [19:24:08] <^d> It'll just go out with the scap [19:25:46] (03CR) 1020after4: [C: 032] Group1 wikis to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194167 (owner: 1020after4) [19:25:51] (03Merged) 10jenkins-bot: Group1 wikis to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194167 (owner: 1020after4) [19:26:33] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [19:26:34] (03CR) 10John F. Lewis: [C: 031] "Content looks good, removing the module and moving to apache with the addition to deployment also looks good." [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [19:27:52] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:29:26] ^d: is there a scap? [19:29:35] suppose there can be [19:29:39] <^d> Oh, maybe not? I can sync-file afterwords [19:29:42] <^d> It's just one file for me [19:30:04] normally no scap for group1 [19:30:08] ^d: tuesday deploy doesn't call for a scap [19:30:17] but I can do it if you'd like ;) [19:30:18] <^d> Yeah. Days of week are hard [19:30:38] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf19 [19:30:48] Logged the message, Master [19:30:52] !log demon Synchronized multiversion/MWMultiVersion.php: moar debugging (duration: 00m 07s) [19:30:57] Logged the message, Master [19:34:45] (03PS9) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [19:38:43] ok, so! misc-web! What should a server that sits behind misc-web varnish look like? Just like any other web server? 
[19:39:50] andrewbogott: it should not have a SSL config anymore, only needs to speak HTTP on 80, but if you still want to enforce HTTPs you need a new rewrite rule [19:40:33] ok. And to configure the varnishes… that’s in puppet? [19:40:36] beware of redirect loop when leaving an old "enforce https" rewrite rule in there [19:40:50] I will probably skip https until i have something-or-other working. [19:40:51] yes, templates/varnish/misc.inc.vcl.erb [19:40:57] cool [19:41:04] Oh, unless the front end enforces https [19:41:28] it will support https but not enforce it [19:41:40] Are there any security concerns with me just setting up an ‘It works!’ with http page on misc-web? [19:42:06] i don't think so, no [19:42:20] good, because it would be nice for me to be able to see apache before I start setting up services :) [19:42:23] first you gotta know if your server is already a backend or not [19:42:47] PROBLEM - Varnish HTTP bits on cp1070 is CRITICAL: Connection refused [19:43:09] (03PS1) 10BBlack: fix deps for /var/lib/nginx mount [puppet/nginx] - 10https://gerrit.wikimedia.org/r/194169 [19:43:17] PROBLEM - Varnishkafka log producer on cp1070 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [19:43:43] (03CR) 10BBlack: [C: 032] fix deps for /var/lib/nginx mount [puppet/nginx] - 10https://gerrit.wikimedia.org/r/194169 (owner: 10BBlack) [19:43:47] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: Puppet has 1 failures [19:44:24] (03PS1) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/194170 [19:44:36] (03CR) 10BBlack: [C: 032 V: 032] bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/194170 (owner: 10BBlack) [19:44:36] PROBLEM - HTTPS on cp1070 is CRITICAL: Return code of 255 is out of bounds [19:44:46] andrewbogott: the server you are running on apache on, has it been used before to run something else behind misc-web? 
[19:44:56] nope, fresh install [19:45:01] californium [19:45:03] (03CR) 10Phuedx: [C: 031] Kill some usages of 'wiki' group in mobile-related settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194074 (https://phabricator.wikimedia.org/T91340) (owner: 10MaxSem) [19:45:21] andrewbogott: then first you need to add that in manifests/role/cache.pp see from line 1645 [19:45:37] after that you can use it as a backend in the varnish template [19:45:43] thank you! [19:45:45] yw [19:46:03] with any luck I’ll get this working before Yosemite crashes and I lose my backscroll [19:46:07] RECOVERY - Varnish HTTP bits on cp1070 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.003 second response time [19:46:08] * andrewbogott writes it down instead [19:46:37] RECOVERY - Varnishkafka log producer on cp1070 is OK: PROCS OK: 2 processes with command name varnishkafka [19:46:47] RECOVERY - HTTPS on cp1070 is OK: SSLXNN OK - 36 OK [19:47:06] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [19:48:58] I've observed mail delivery delays ranging from 4 minutes to 4 hours in past 5 days. Headers indicate the delay happens at "localhost -> sodium.wikimedia.org". Known issue or not? [19:49:38] Nikerabbit: https://phabricator.wikimedia.org/T61731 ? [19:49:50] chasemp: exactly, thank you [19:50:34] chasemp: but nothing to do with yahoo this time [19:51:17] hmmm yeah maybe the original issue wasn't as yahoo specific as thought [19:51:17] idk [19:51:31] seems eerily similar [19:52:12] chasemp: yeah it's been reported recently to me by the community [19:52:59] should I comment on the bug or anything I can help with? 
[19:53:28] I would comment and poke a few people if you think it's an ongoing broader scope thing [19:53:40] I really wasn't in the loop on it, just saw it existed [19:53:55] ok [19:55:47] no idea about scope, but I've noticed it two times when I was expecting email, hard to tell how many I've missed [19:59:04] (03PS1) 10BBlack: min_free_kbytes should always be an integer [puppet] - 10https://gerrit.wikimedia.org/r/194171 [19:59:33] (03CR) 10BBlack: [C: 032 V: 032] min_free_kbytes should always be an integer [puppet] - 10https://gerrit.wikimedia.org/r/194171 (owner: 10BBlack) [20:00:06] is wikibugs not reporting things properly? [20:00:11] it's been having issues [20:00:33] I think it's having issues joining channels... [20:03:12] ottomata: about? [20:03:34] (03PS1) 10BBlack: repool cp10(60|64|65|70),amssq42 post-reinstall [puppet] - 10https://gerrit.wikimedia.org/r/194173 [20:03:48] (03CR) 10BBlack: [C: 032 V: 032] repool cp10(60|64|65|70),amssq42 post-reinstall [puppet] - 10https://gerrit.wikimedia.org/r/194173 (owner: 10BBlack) [20:04:25] hiya chasemp yup [20:04:54] ottomata: would you be willing to take on https://phabricator.wikimedia.org/T90927 [20:04:57] he says keep stat [20:05:01] but stat1002.eqiad.wmnet has like 5 groups [20:05:06] and I don't know which is appropriate, etc [20:05:17] my guess is you know exactly what is right there? [20:05:43] !log cp10(60|64|65|70),amssq42 repooled in pybal [20:05:49] Logged the message, Master [20:06:01] chasemp, likely [20:06:06] statistics-privatedata-users [20:06:17] that is the usual one people have if they are on stat1002 [20:06:25] there are others too, but that is the main one [20:06:33] and, the reason people would ahve wanted access to stat1002 [20:06:36] if tfinc's is a very old account [20:06:44] is to be able to do stuff that statistics-privatedata-users is for [20:06:51] hmm ? 
[20:06:56] he does not need to be on stat1001 [20:07:08] oh, that's where you are [20:07:13] tfinc, yeah, i don't think you need to be on stat1001 [20:07:16] that's just a web host [20:07:25] of some datasets, data access should be done via stat1002 or stat1003 [20:07:31] ottomata: yeah all i need is the regular host that i'd need to query for data analytics [20:07:40] ja, so chasemp [20:07:52] tfinc you are not on stat1003? hm. tfinc, when you say query for data analytics [20:07:56] do you mean mysql research slaves? [20:08:08] ottomata: assigning that one to you :D [20:08:12] bah! [20:08:13] ottomata: whatever i would need to query the event logging streams [20:08:20] i will get to it next week! :) [20:08:21] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 9 data above and 45 below the confidence bounds [20:08:24] eventlogging ah [20:08:25] ok stat1003 [20:08:27] heh no worries [20:08:39] statistics-users [20:08:44] ottomata: if you could maybe comment to the final result? [20:08:46] chasemp, can I just use my discretion?
[20:08:57] it's to be treated like a new access-request [20:08:58] AFAIK [20:09:05] this account I guess predated or never went through puppet [20:09:09] ok [20:09:36] commented [20:09:51] tx sir [20:10:53] ottomata: do you know what would cause / how to debug the icinga check that ends up saying "Varnishkafka Delivery Errors per minute [20:11:07] UNKNOWN: No valid datapoints found" [20:11:32] I've been having that on all the jessie test hosts so far, and now it seems to be spreading to some others, probably due to related recent changes in puppet [20:12:16] (the others it spread to now seem to be only the precise caches in esams, for some reason) [20:12:30] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader [20:12:35] tfinc: on https://phabricator.wikimedia.org/T90927 who is the supervisor/team lead / tie wearing person who should approve? [20:14:17] (03PS10) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [20:14:57] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/613/change/177080/html/tin.eqiad.wmnet.html" [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [20:16:47] bblack, have you checked the json stats output from varnishkafka? [20:16:50] in [20:17:17] /var/cache/varnishkafka/stats.json [20:17:44] touches puppet deployment role and watches puppet on tin [20:17:51] the file exists and has json data in it [20:18:37] this is (probably) somehow related to a misnaming of some $name variable substitution somewhere was my best guess, e.g. 
s/webrequest/varnishkafka-webrequest/ or vice-versa, due to changes related to systemd service defs, etc [20:19:29] I think vk in general is delivering data, it's just we're not getting the monitoring on vk dr_err [20:20:18] ori: _joe_: ^ fyi, that apachesync module is killed now and that also deleted apache-graceful-all , and closing that related ticket [20:27:22] (03CR) 10John F. Lewis: [C: 031] Add apache config for m.{project}.org (-wikipedia) [puppet] - 10https://gerrit.wikimedia.org/r/185461 (https://phabricator.wikimedia.org/T78421) (owner: 10Glaisher) [20:27:42] (03PS1) 10Andrew Bogott: Set up horizon.wikimedia.org on misc-web. [puppet] - 10https://gerrit.wikimedia.org/r/194179 [20:27:47] ottomata: can this stuff be removed now? (and could it be related?) https://gerrit.wikimedia.org/r/#/c/187316/ [20:29:17] I notice that was end-of-jan, which was around the same timeframe these first jessie nodes were reinstalling too, so it may be that rather than jessie-specific (that the other nodes have been online the whole time, but these were freshly provisioned circa just after that patch) [20:29:19] (03PS1) 10Andrew Bogott: Point horizon.wikimedia.org to misc-web. [dns] - 10https://gerrit.wikimedia.org/r/194180 [20:29:40] (03CR) 10BryanDavis: [C: 031] "Matches which I know to be Chad's public key." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194141 (owner: 10Chad) [20:29:47] bblack, what stuff be removed? [20:29:52] oh [20:29:55] the code bits [20:29:55] yes. [20:30:10] yeah you can just cut it out now that is ensured absent everywhere [20:31:15] (03CR) 10Dzahn: [C: 031] Point horizon.wikimedia.org to misc-web. [dns] - 10https://gerrit.wikimedia.org/r/194180 (owner: 10Andrew Bogott) [20:31:54] mutante: is that really all it takes? [20:31:57] (03CR) 10Dzahn: [C: 031] Set up horizon.wikimedia.org on misc-web. 
[puppet] - 10https://gerrit.wikimedia.org/r/194179 (owner: 10Andrew Bogott) [20:32:20] (03PS19) 10Andrew Bogott: Add class and role for Openstack Horizon [puppet] - 10https://gerrit.wikimedia.org/r/170340 [20:32:33] andrewbogott: if you have Apache config up and running on the backend and that is just http, yea [20:32:42] cool. [20:32:45] I don’t, but will soon. [20:32:56] After I re-read this code I wrote 100 years ago [20:33:15] it can just be about the order of merging things [20:33:23] but not that critical when adding a new service [20:33:43] you can first do Apache and Varnish but not DNS [20:34:09] could test it by hacking /etc/hosts locally [20:34:22] and when you like you switch DNS over [20:36:48] andrewbogott: copy Apache config snippet from existing templates, like templates/apache/sites/servermon.wikimedia.org.erb [20:37:29] see the Rewrite rules there to enforce https [20:39:51] (03PS1) 10BBlack: remove old vk drerr check [puppet] - 10https://gerrit.wikimedia.org/r/194181 [20:40:34] (03CR) 10BBlack: [C: 032 V: 032] remove old vk drerr check [puppet] - 10https://gerrit.wikimedia.org/r/194181 (owner: 10BBlack) [20:43:53] (03CR) 10Dzahn: [C: 032] Re-Organize (test)wikidata cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/194108 (owner: 10Hoo man) [20:45:37] (03CR) 10Dzahn: "thanks, i also wondered about these in icinga a couple times" [puppet] - 10https://gerrit.wikimedia.org/r/194181 (owner: 10BBlack) [20:46:20] (03CR) 10Dzahn: "Notice: /Stage[main]/Misc::Maintenance::Wikidata/Cron[wikibase-dispatch-changes-test]/minute: minute changed '*/5' to '*/15'" [puppet] - 10https://gerrit.wikimedia.org/r/194108 (owner: 10Hoo man) [20:46:28] chasemp: are you doing something funny with phab ? [20:46:42] am no [20:46:45] t [20:47:03] i'm getting mails from commit i did over a year ago [20:47:12] *commits [20:47:21] i believe ^d has been importing gerrit repos [20:47:27] could that be it? 
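On the redirect-loop warning in the misc-web discussion: once the front end terminates TLS and speaks plain HTTP to the backend, a rule that redirects whenever the backend itself sees HTTP would bounce every request forever. The usual way out is to decide based on the protocol the client used, as reported by the proxy. A minimal sketch of that decision, assuming the front end sets an `X-Forwarded-Proto` request header (an assumption; the log does not say which header misc-web sends) and using a hypothetical function name:

```python
from typing import Mapping, Optional


def https_redirect_target(headers: Mapping[str, str], host: str, path: str) -> Optional[str]:
    """Return the https:// URL to redirect to, or None if no redirect is needed.

    The backend always sees plain HTTP from the proxy, so it has to look at the
    proxy-supplied header rather than at its own connection.
    """
    # Header names are case-insensitive; normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    client_proto = lowered.get("x-forwarded-proto", "").strip().lower()

    if client_proto == "https":
        # The client already used HTTPS; redirecting here would cause a loop.
        return None
    if client_proto == "http":
        return f"https://{host}{path}"
    # Header missing: treated here as a request that bypassed the proxy
    # (an assumption), so leave it alone rather than guess.
    return None


if __name__ == "__main__":
    print(https_redirect_target({"X-Forwarded-Proto": "http"}, "horizon.wikimedia.org", "/"))
```

An old "enforce https" rule that keys on the backend's own connection corresponds to ignoring the header entirely, which is the loop case described above.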
[20:47:32] <^d> yes [20:48:53] https://phabricator.wikimedia.org/settings/panel/emailpreferences/ -> Audit -> Ignore "a commit is created" [20:49:47] thank you all [20:51:58] hrmm, puppet, that's a cronjob with ensure => absent, but you don't remove it. come on [20:52:46] (03CR) 10Dzahn: "puppet did not remove the cron it was supposed to remove:" [puppet] - 10https://gerrit.wikimedia.org/r/194108 (owner: 10Hoo man) [20:54:56] (03CR) 10Dzahn: "i deleted wikibase-dispatch-changes3 manually" [puppet] - 10https://gerrit.wikimedia.org/r/194108 (owner: 10Hoo man) [20:57:19] (03CR) 10Greg Grossmeier: [C: 031] Add myself to the releasers group [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) (owner: 10Chad) [20:57:27] (03CR) 10Greg Grossmeier: [C: 031] Adding my gpg public key to the list of releasers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194141 (owner: 10Chad) [21:00:17] (03CR) 10Dzahn: [C: 031] "approval on T91424" [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) (owner: 10Chad) [21:04:26] (03CR) 10BryanDavis: "Spotted some typos" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/193665 (owner: 10BryanDavis) [21:06:53] ottomata: ok I found a clue: logster has changed [21:06:54] root@amssq42:/var/cache/varnishkafka# /usr/bin/logster -o statsd --statsd-host=localhost:8125 --metric-prefix=varnishkafka.amssq42.webrequest.text JsonLogster /var/cache/varnishkafka/webrequest.stats.json [21:06:59] Usage: logster [options] parser logfile [21:07:01] logster: error: option -o: invalid choice: 'statsd' (choose from 'graphite', 'ganglia', 'stdout') [21:07:07] whereas the equivalent command with same -o works fine on the older nodes [21:07:18] or, we're missing a package perhaps? [21:08:25] I uh wrote the statsd handler for logster and it doesn't require another package [21:08:32] in addition I mean [21:08:41] hmm [21:08:45] but maybe the package is old? 
[21:08:58] where did you get logster, bblack? [21:08:58] yup [21:09:00] maybe it is in jessie? [21:09:02] on the new hosts: ii logster 0.0.1-2 all Generate metrics from logfiles for Graphite and Ganglia [21:09:05] because we have our own [21:09:11] ah, there we go! [21:09:20] http://apt.wikimedia.org/wikimedia/pool/main/l/logster/ [21:09:25] so, I need to repackage our logster for jessie [21:09:51] aye guess so :/ [21:09:52] https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/logster [21:09:54] shoudl be easy though [21:10:33] https://github.com/wikimedia/operations-debs-logster [21:11:00] git-buildpacakge should do it [21:12:50] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 1 failures [21:13:14] still worked at dA then actually https://github.com/wikimedia/operations-debs-logster/commit/68e26f3b58b3b4e2b423076dcbac68c44f5cc944 [21:16:43] Hello, quick q: Is the statsd server down? [21:19:06] bblack: ^ i saw you were on that, right [21:19:12] memeht: yes, i think it is [21:19:44] ah, makes sense. I have some timers and they are all emitting 0 ms [21:21:29] mutante: are the statsd servers for both beta labs and production down? [21:21:58] what I'm looking at is coincidentally related to the same stats, but it's not me [21:25:00] logster's updated on all the jessie caches now, so those 5 should eventually resolve [21:25:41] I still have the no-data-for-vk-drerr for many-but-not-all other caches that just cropped up today, which is probably a separate issue, and could be related to whatever wider statsd problem is going on [21:26:05] what wider statsd problem? [21:26:19] statsd in prod looks fine to me [21:27:04] I don't know, whatever people are talking about above [21:27:14] i just assumed " invalid choice: 'statsd' (choose from 'graphite', 'ganglia', 'stdout')" was related when memeht asked if statsd is down [21:27:22] ah, no, it isn't [21:27:38] statsd isn't down? 
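The failure above came down to the stock logster 0.0.1-2 package on the new hosts lacking the `statsd` output that the Wikimedia build provides. A small sketch of how a preflight check could spot that from the error text alone; it parses only the message format quoted in this log, and the function names are hypothetical:

```python
import re
from typing import List, Optional

# Error line exactly as pasted in the channel.
ERROR_LINE = (
    "logster: error: option -o: invalid choice: 'statsd' "
    "(choose from 'graphite', 'ganglia', 'stdout')"
)

_INVALID_CHOICE = re.compile(
    r"invalid choice: '(?P<wanted>[^']+)' \(choose from (?P<choices>[^)]*)\)"
)


def supported_outputs(error_text: str) -> Optional[List[str]]:
    """List the outputs the installed logster accepts, or None if no such error."""
    match = _INVALID_CHOICE.search(error_text)
    if match is None:
        return None
    return re.findall(r"'([^']+)'", match.group("choices"))


def missing_output(error_text: str) -> Optional[str]:
    """Name the output that was requested but rejected, or None if no such error."""
    match = _INVALID_CHOICE.search(error_text)
    return match.group("wanted") if match else None


if __name__ == "__main__":
    print(missing_output(ERROR_LINE), supported_outputs(ERROR_LINE))
```

Seeing `statsd` absent from the accepted list points at the package version rather than at the statsd service, which matches the conclusion reached in the channel.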
[21:27:41] it was just causing missing stats from 5 specific hosts [21:27:48] no, it's not down [21:28:01] e.g. https://graphite.wikimedia.org/render/?title=navigationStart%20to%20loadEventEnd%20on%20desktop%20sites,%20last%20hour&vtitle=milliseconds&from=-1hour&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=alias(color(frontend.navtiming.totalPageLoadTime.desktop.overall.mean,%22blue%22),%22mean%22)&target=alias(color(frontend.navtiming.totalPageLoadTime.desktop.overall.99percent [21:28:01] ile,%22red%22),%2299th%20percentile%22) [21:28:09] long url is long [21:30:44] ori, but for some reason, all my timers are emitting 0ms, and this began yesterday at noon? [21:31:50] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:32:54] memeht: when is yesterday at noon in UTC terms? [21:33:13] (or hours ago) [21:33:53] some of the (possibly-related) missing stats I'm looking at started 15h ago [21:33:56] bblack, memeht parsoid timer output being broken is a known thing .. we pushed a broken patch y'day. [21:34:44] what I'm seeing isn't parsoid related I don't think, it's missing/undefined data for varnishkafka->txstatsd->graphite [21:34:56] we deployed a new parsoid version @ 1:15 pm pst. [21:35:23] (unless parsoid is somehow involved in varnishkafka graphite data) [21:35:25] bblack: PST + 8hrs = ~9pm [21:35:46] memeht: so your issue is since ~24h ago? [21:36:19] bblack: yes [21:36:39] tjat [21:36:52] that lines up with subbu I think, but not with the other thing I'm looking at [21:37:39] the parsoid vanishes aren't emitting varnishkafka data [21:37:52] afaik, IANAL etc [21:38:19] bblack, yes memeht is talking about parsoid timers .. and we know that we broke parsoid timers as part of y'days deploy. so parsoid's timer issues shouldn't be related to what you are investigating. 
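The timeline matching above ("PST + 8hrs = ~9pm") can be checked mechanically. A sketch using a fixed UTC-8 offset for PST; the calendar dates are inferred from this log's deploy-calendar date (2015-03-03) and "y'day", and the log timestamps are assumed to be UTC, so treat both as assumptions:

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time (no DST handling on purpose).
PST = timezone(timedelta(hours=-8), "PST")

# Parsoid deploy time as stated in the channel: 1:15 pm PST, the previous day.
deploy_local = datetime(2015, 3, 2, 13, 15, tzinfo=PST)
deploy_utc = deploy_local.astimezone(timezone.utc)

# Roughly when the "since ~24h ago?" question was asked, per the log timestamps.
asked_at = datetime(2015, 3, 3, 21, 35, tzinfo=timezone.utc)
elapsed = asked_at - deploy_utc

if __name__ == "__main__":
    print(deploy_utc.isoformat())
    print(elapsed)
```

Under those assumptions the deploy lands at about 21:15 UTC, a little over 24 hours before the question, which fits the parsoid timers and not the varnishkafka gap that started about 15 hours earlier.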
[21:38:38] kk
[21:40:27] (CR) Chad: [C: 2] Adding my gpg public key to the list of releasers [mediawiki-config] - https://gerrit.wikimedia.org/r/194141 (owner: Chad)
[21:45:10] (CR) Hashar: "thanks bblack & krinkle !" [puppet] - https://gerrit.wikimedia.org/r/193836 (https://phabricator.wikimedia.org/T90984) (owner: Krinkle)
[21:45:19] !log re-initializing cassandra on test hosts xenon, praseodymium and cerium for new test run; expect some downtime
[21:45:27] Logged the message, Master
[21:45:41] PROBLEM - Cassandra database on xenon is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:46:58] (Merged) jenkins-bot: Adding my gpg public key to the list of releasers [mediawiki-config] - https://gerrit.wikimedia.org/r/194141 (owner: Chad)
[21:48:31] PROBLEM - Cassandra database on praseodymium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:48:41] PROBLEM - Cassandra database on cerium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:49:33] !log demon Synchronized docroot/mediawiki/keys/: (no message) (duration: 00m 05s)
[21:49:40] Logged the message, Master
[21:50:51] RECOVERY - Cassandra database on cerium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:54:31] RECOVERY - Cassandra database on xenon is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:55:11] RECOVERY - Cassandra database on praseodymium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[21:55:34] (PS2) Andrew Bogott: Set up horizon.wikimedia.org on misc-web. [puppet] - https://gerrit.wikimedia.org/r/194179
[21:55:36] (PS20) Andrew Bogott: Add class and role for Openstack Horizon [puppet] - https://gerrit.wikimedia.org/r/170340
[21:56:44] (CR) Andrew Bogott: [C: 2] Add class and role for Openstack Horizon [puppet] - https://gerrit.wikimedia.org/r/170340 (owner: Andrew Bogott)
[22:01:11] (PS3) Andrew Bogott: Set up horizon.wikimedia.org on misc-web. [puppet] - https://gerrit.wikimedia.org/r/194179
[22:01:13] (PS1) Andrew Bogott: Set up Californium as the Horizon host [puppet] - https://gerrit.wikimedia.org/r/194202
[22:02:15] (CR) jenkins-bot: [V: -1] Set up Californium as the Horizon host [puppet] - https://gerrit.wikimedia.org/r/194202 (owner: Andrew Bogott)
[22:02:57] (CR) Dzahn: "role:horizon -> role::horizon" [puppet] - https://gerrit.wikimedia.org/r/194202 (owner: Andrew Bogott)
[22:03:39] (PS2) Andrew Bogott: Set up Californium as the Horizon host [puppet] - https://gerrit.wikimedia.org/r/194202
[22:03:41] (PS4) Andrew Bogott: Set up horizon.wikimedia.org on misc-web. [puppet] - https://gerrit.wikimedia.org/r/194179
[22:05:15] (CR) Dzahn: [C: 1] Set up Californium as the Horizon host [puppet] - https://gerrit.wikimedia.org/r/194202 (owner: Andrew Bogott)
[22:05:39] (CR) Andrew Bogott: [C: 2] Set up Californium as the Horizon host [puppet] - https://gerrit.wikimedia.org/r/194202 (owner: Andrew Bogott)
[22:07:01] (CR) Andrew Bogott: [C: 2] Set up horizon.wikimedia.org on misc-web. [puppet] - https://gerrit.wikimedia.org/r/194179 (owner: Andrew Bogott)
[22:09:47] (PS1) Andrew Bogott: memcached_ip => ip [puppet] - https://gerrit.wikimedia.org/r/194203
[22:10:21] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: puppet fail
[22:10:52] (CR) Andrew Bogott: [C: 2] memcached_ip => ip [puppet] - https://gerrit.wikimedia.org/r/194203 (owner: Andrew Bogott)
[22:14:24] (PS1) Andrew Bogott: Switched horizon default version to icehouse [puppet] - https://gerrit.wikimedia.org/r/194206
[22:15:59] (CR) Andrew Bogott: [C: 2] Switched horizon default version to icehouse [puppet] - https://gerrit.wikimedia.org/r/194206 (owner: Andrew Bogott)
[22:18:01] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[22:18:49] (PS1) Dzahn: zotero: fix "invalid byte sequence in US-ASCII" [puppet] - https://gerrit.wikimedia.org/r/194209 (https://phabricator.wikimedia.org/T91453)
[22:20:42] (CR) Andrew Bogott: [C: 2] Point horizon.wikimedia.org to misc-web. [dns] - https://gerrit.wikimedia.org/r/194180 (owner: Andrew Bogott)
[22:20:44] (PS2) Dzahn: zotero: fix "invalid byte sequence in US-ASCII" [puppet] - https://gerrit.wikimedia.org/r/194209 (https://phabricator.wikimedia.org/T91453)
[22:21:05] (CR) Ori.livneh: [C: 1] zotero: fix "invalid byte sequence in US-ASCII" [puppet] - https://gerrit.wikimedia.org/r/194209 (https://phabricator.wikimedia.org/T91453) (owner: Dzahn)
[22:21:21] (CR) Dzahn: [C: 2] zotero: fix "invalid byte sequence in US-ASCII" [puppet] - https://gerrit.wikimedia.org/r/194209 (https://phabricator.wikimedia.org/T91453) (owner: Dzahn)
[22:23:36] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail
[22:23:37] PROBLEM - Memcached on californium is CRITICAL: Connection refused
[22:25:56] RECOVERY - Memcached on californium is OK: TCP OK - 0.000 second response time on port 11000
[22:36:40] (PS1) Dzahn: planet: convert UTF-8 to HTML entities [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453)
[22:37:21] (PS2) Dzahn: planet: convert UTF-8 to HTML entities [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453)
[22:37:38] (CR) Dzahn: "sounds minor but actually this will unblock upgrading puppet version in production." [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453) (owner: Dzahn)
[22:38:57] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1084710 (Dzahn)
[22:39:26] mutante: ok, now i’m trying to debug misc-web. What kind of log files should I be looking at?
[22:39:34] (CR) Dzahn: [C: 2] planet: convert UTF-8 to HTML entities [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453) (owner: Dzahn)
[22:39:55] mutante: for context… https://horizon.wikimedia.org/ errors out and I don’t see apache getting hit on californium
[22:40:25] andrewbogott: just a minute, in the middle of a merge
[22:40:33] np
[22:40:52] (PS3) Dzahn: planet: convert UTF-8 to HTML entities [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453)
[22:41:57] andrewbogott: ran puppet on cp1043 and cp1044 ?
[22:42:02] those are the 2 misc boxes
[22:42:11] It’s been half an hour, but… will try
[22:42:11] did they actually get the config change already?
[22:42:27] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[22:43:51] ah, it’s a connectivity issue
[22:45:19] the misc boxes cant talk to californium?
[22:45:25] because private labs vlan?
[22:45:30] mutante (or bblack) any idea how I can get 80/443 connectivity between misc-web and this box that’s on the labs-private subnet?
[22:45:32] uhm..
[22:45:34] mutante: yeah, I assume so
[22:45:59] yea, so we want to separate those networks by design
[22:46:15] i'm not sure about using misc-web then .. hmm
[22:46:53] mutante: One way or another I ultimately to access the rest interface on virt1000. So there has to be a hole someplace
[22:47:18] But yeah, I guess horizon could be a public box inside the labs subnet
[22:48:11] robh: do you know?
[22:55:28] (PS1) Ori.livneh: Add four additional txStatsD backends on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/194218
[22:56:29] (CR) Ori.livneh: [C: 2] Add four additional txStatsD backends on graphite1001 [puppet] - https://gerrit.wikimedia.org/r/194218 (owner: Ori.livneh)
[22:56:44] !log adding four additional txstatsd backends to graphite1001 to cope with load
[22:56:50] Logged the message, Master
[22:57:51] operations, Staging: Package trebuchet-trigger for trusty - https://phabricator.wikimedia.org/T91463#1084831 (thcipriani) NEW
[22:57:53] operations, Analytics: investigate txstatsd error logs - https://phabricator.wikimedia.org/T91464#1084838 (BBlack) NEW a:fgiunchedi
[22:58:56] operations, Staging: Package trebuchet-trigger for trusty - https://phabricator.wikimedia.org/T91463#1084855 (thcipriani)
[22:59:28] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1084857 (Dzahn) a:Dzahn
[23:01:44] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1084905 (Dzahn) see http://projects.puppetlabs.com/issues/20522 and all the subtasks
[23:04:33] andrewbogott: it would also work to let californium have a public IP and still let it be a backend of misc-web
[23:04:51] hm, I suppose so, then it wouldn’t need a cert
[23:04:53] andrewbogott same with zirconium for example (until we remove the public IP)
[23:05:03] yes, you still get free cert and caching
[23:27:17] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 45 below the confidence bounds
[23:29:05] (CR) Dzahn: [C: 1] lvs: init.pp lint [puppet] - https://gerrit.wikimedia.org/r/190689 (owner: Matanya)
[23:33:52] operations, HTTPS-by-default, Patch-For-Review: varnish disk cache auditing/correction - https://phabricator.wikimedia.org/T90583#1085024 (BBlack) The scope of the unknowns here that matter are now down to the cp30xx servers only. Those seem to have several different current partition sizes between the...
[23:34:57] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 45 below the confidence bounds
[23:35:28] operations, HTTPS-by-default: Upgrade all HTTP frontends to Debian jessie - https://phabricator.wikimedia.org/T86648#1085040 (BBlack)
[23:35:30] operations, HTTPS-by-default: jessie kernel vm subsystem issues for upload caches - https://phabricator.wikimedia.org/T88996#1085038 (BBlack) Open>Resolved This is (and many related things are) now resolved, and new automatic installs on jessie are set up with appropriate kernels, VM tuning, filesyst...
[23:35:48] operations, HTTPS-by-default: Upgrade all HTTP frontends to Debian jessie - https://phabricator.wikimedia.org/T86648#973396 (BBlack)
[23:37:37] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[23:38:34] operations, HTTPS-by-default: Upgrade all HTTP frontends to Debian jessie - https://phabricator.wikimedia.org/T86648#1085066 (BBlack) Test installs are now successful and all known issues are resolved for all cache types (e.g. systemd transitions for various daemons, varnishkafka issues, etc). Next step h...
[23:38:56] !log ran puppet on ruthenium (keeps showing up in icinga but then no issue when you run it)
[23:39:01] Logged the message, Master
[23:40:27] ACKNOWLEDGEMENT - HTTPS on antimony is CRITICAL: SSL CRITICAL - Certificate svn.wikimedia.org expired daniel_zahn T88731
[23:41:21] (PS1) Hoo man: Also don't log the output of testwikidata's dispatchChanges [puppet] - https://gerrit.wikimedia.org/r/194228
[23:41:56] mutante: ^ easy one
[23:42:28] Will purge the old logs from terbium after
[23:42:34] about 2GiB by now :P
[23:43:01] ACKNOWLEDGEMENT - Router interfaces on mr1-esams is CRITICAL: CRITICAL: host 91.198.174.247, interfaces up: 36, down: 1, dormant: 0, excluded: 1, unused: 0BRge-0/0/0: down - Core: msw-oe12-esamsBR daniel_zahn T84700
[23:44:42] bblack: ^ last un-acked CRIT in Icinga. actually 0 for once
[23:44:52] hoo: ok, looking
[23:46:13] (CR) Dzahn: [C: 2] "yes, that cron has been deleted already" [puppet] - https://gerrit.wikimedia.org/r/194228 (owner: Hoo man)
[23:49:06] Thanks :)
[23:50:23] mutante: nice :)
[23:52:02] jouncebot, next
[23:52:03] In 0 hour(s) and 7 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T0000)
[23:52:23] meanwhile 2 different ones are back. CirrusSearch-slow.log_line_rate (all the time right, but coming and going) and "Difference between raw and validated EventLogging overall message rates" on graphite (no clue what it tells me)
[23:52:36] <^d> I can take it
[23:52:58] <^d> Krenair, legoktm: Ping for swat in a few minutes
[23:53:04] pong
[23:53:16] ^d: here, going to add CentralAuth submodule updates in a minute
[23:53:21] <^d> Ok
[23:57:40] (PS1) BBlack: kill exec bit on systemd unit files [puppet] - https://gerrit.wikimedia.org/r/194237
[23:59:48] ^d: added the submodule updates
[23:59:54] <^d> +2'd all the changes