[00:00:04] RoanKattouw, ^d: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150129T0000).
[00:00:34] ^lurker: the suprm401 patch is me too
[00:00:40] <^lurker> okie dokie
[00:00:47] (PS1) BryanDavis: Set umask when exec'ing git commands [puppet] - https://gerrit.wikimedia.org/r/187286
[00:00:49] (PS1) Ottomata: Fix variable reference [puppet] - https://gerrit.wikimedia.org/r/187287
[00:01:07] > rror: Failed to apply catalog: Could not find dependency File[undef] for Cron[geowiki-process-db-to-limn] at /etc/puppet/modules/geowiki/manifests/job/limn.pp:28
[00:01:08] ottomata: ^
[00:01:10] on stat1003
[00:01:12] ah
[00:01:13] ok
[00:01:24] (CR) Ottomata: [C: 2 V: 2] Fix variable reference [puppet] - https://gerrit.wikimedia.org/r/187287 (owner: Ottomata)
[00:01:30] (CR) Chad: [C: 2] Add Flooders group on Officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/185663 (https://phabricator.wikimedia.org/T86237) (owner: Mattflaschen)
[00:01:41] (CR) BryanDavis: "Might be worth testing on the integration hosts via cherry-pick to see if it helps or not." [puppet] - https://gerrit.wikimedia.org/r/187286 (owner: BryanDavis)
[00:02:05] <^lurker> ebernhardson: We'll do the officewiki patch first. easiest.
[00:02:18] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[00:03:12] <^lurker> And started the jenkins dance for your core patches
[00:03:16] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[00:03:25] ^lurker, yeah, I'm here.
[00:05:38] (PS1) Ottomata: Fix another variable reference [puppet] - https://gerrit.wikimedia.org/r/187289
[00:05:47] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[00:05:57] (CR) BBlack: "does "exec" even have a "mode" parameter? My initial guess is no, as I can't imagine how it would apply it to arbitary commands. I suspec" [puppet] - https://gerrit.wikimedia.org/r/187286 (owner: BryanDavis)
[00:06:19] (PS2) Ottomata: Fix another variable reference [puppet] - https://gerrit.wikimedia.org/r/187289
[00:06:35] (CR) Alexandros Kosiaris: [C: 2] Remove server beryllium from site.pp [puppet] - https://gerrit.wikimedia.org/r/187283 (owner: Alexandros Kosiaris)
[00:06:43] (PS3) Ottomata: Fix another variable reference [puppet] - https://gerrit.wikimedia.org/r/187289
[00:06:45] (CR) BBlack: "does "exec" even have a "mode" parameter? My initial guess is no, as I can't imagine how it would apply it to arbitary commands. I suspec" [puppet] - https://gerrit.wikimedia.org/r/187286 (owner: BryanDavis)
[00:06:55] (CR) Ottomata: [C: 2 V: 2] Fix another variable reference [puppet] - https://gerrit.wikimedia.org/r/187289 (owner: Ottomata)
[00:07:08] (CR) Chad: [C: 2] Make flow-bot grantable/removable on enwiki, testwiki, test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/181120 (https://phabricator.wikimedia.org/T86403) (owner: Mattflaschen)
[00:07:25] (Merged) jenkins-bot: Add Flooders group on Officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/185663 (https://phabricator.wikimedia.org/T86237) (owner: Mattflaschen)
[00:10:29] paravoid:
[00:10:32] ls -l manifests/misc/statistics.pp
[00:10:32] ls: manifests/misc/statistics.pp: No such file or directory
[00:10:36] I HOPE YOU'RE HAPPY
[00:10:38] ^lurker: pong
[00:10:43] (PS1) Alexandros Kosiaris: Remove server beryllium [dns] - https://gerrit.wikimedia.org/r/187290
[00:10:46] (CR) BryanDavis: [C: -1] "bah. don't code without reading the spec while walking to the next meeting." [puppet] - https://gerrit.wikimedia.org/r/187286 (owner: BryanDavis)
[00:10:50] \o/ \o/ \o/
[00:10:52] I AM
[00:11:29] now we just need to s/analytics/statistics/, and then make it a puppet submodule
[00:11:44] !log demon Synchronized wmf-config/InitialiseSettings.php: flooders on officewiki (duration: 00m 08s)
[00:11:47] <^lurker> superm401, ebernhardson: ^^
[00:11:52] Logged the message, Master
[00:12:15] (PS3) Ori.livneh: osmium: remove appserver role [puppet] - https://gerrit.wikimedia.org/r/187257
[00:12:24] (CR) Ori.livneh: [C: 2 V: 2] osmium: remove appserver role [puppet] - https://gerrit.wikimedia.org/r/187257 (owner: Ori.livneh)
[00:13:18] bblack har har har
[00:13:30] i ain't makin no WMF specific submodules
[00:14:24] (PS1) Giuseppe Lavagetto: hiera: allow false values in the role backend [puppet] - https://gerrit.wikimedia.org/r/187292
[00:14:42] (PS2) Giuseppe Lavagetto: hiera: allow false values in the role backend [puppet] - https://gerrit.wikimedia.org/r/187292
[00:14:57] <_joe_> ^^ I <3 ruby
[00:15:04] (CR) Chad: [C: 2] Remove Media Viewer tag [mediawiki-config] - https://gerrit.wikimedia.org/r/187268 (owner: Gilles)
[00:15:06] (Merged) jenkins-bot: Make flow-bot grantable/removable on enwiki, testwiki, test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/181120 (https://phabricator.wikimedia.org/T86403) (owner: Mattflaschen)
[00:15:21] (PS8) KartikMistry: Use only cxserver/deploy in deployment [puppet] - https://gerrit.wikimedia.org/r/184217
[00:15:27] PROBLEM - Apache HTTP on osmium is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 245 bytes in 0.016 second response time
[00:15:52] (CR) Giuseppe Lavagetto: [C: 2] hiera: allow false values in the role backend [puppet] - https://gerrit.wikimedia.org/r/187292 (owner: Giuseppe Lavagetto)
[00:16:06] PROBLEM - HHVM rendering on osmium is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 245 bytes in 0.029 second response time
[00:16:42] (Merged) jenkins-bot: Remove Media Viewer tag [mediawiki-config] - https://gerrit.wikimedia.org/r/187268 (owner: Gilles)
[00:17:27] !log demon Synchronized php-1.25wmf14/includes/api/ApiPageSet.php: (no message) (duration: 00m 06s)
[00:17:32] Logged the message, Master
[00:17:39] !log demon Synchronized php-1.25wmf15/includes/api/ApiPageSet.php: (no message) (duration: 00m 05s)
[00:17:44] Logged the message, Master
[00:17:53] !log demon Synchronized php-1.25wmf15/includes/Title.php: (no message) (duration: 00m 06s)
[00:17:56] Logged the message, Master
[00:18:32] !log demon Synchronized php-1.25wmf14/includes/Title.php: (no message) (duration: 00m 06s)
[00:18:36] Logged the message, Master
[00:19:06] !log demon Synchronized wmf-config/: (no message) (duration: 00m 06s)
[00:19:11] Logged the message, Master
[00:19:20] <^lurker> ebernhardson: You're live
[00:19:28] <^lurker> gi11es: You too
[00:20:42] ^lurker: testing...
[00:21:15] ^lurker: looks successful, thanks
[00:21:17] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: puppet fail
[00:21:17] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:21:27] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:21:27] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail
[00:21:31] <^lurker> gi11es: you're welcome
[00:22:06] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Puppet has 6 failures
[00:22:07] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 40 failures
[00:22:07] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 3 failures
[00:22:16] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: puppet fail
[00:22:26] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 7 failures
[00:22:27] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:22:27] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: Puppet has 8 failures
[00:22:27] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 11 failures
[00:22:36] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 47 failures
[00:22:37] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:22:40] <^lurker> um, puppet?
[00:23:21] <_joe_> that's me sorry
[00:23:30] (PS1) Yuvipanda: Test if hiera is flailing on boolean falses [puppet] - https://gerrit.wikimedia.org/r/187295
[00:23:37] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:23:42] _joe_: ^
[00:23:44] flailing ?
[00:23:45] <^lurker> No worries, just making sure someone had it
[00:23:47] ^lurker, don't see the officewiki change in effect.
[00:23:54] <^lurker> Boo
[00:23:58] <^lurker> Lemme try harder, with feeling
[00:24:21] akosiaris: see https://gerrit.wikimedia.org/r/#/c/187292/
[00:25:05] (CR) Yuvipanda: [C: 2] Test if hiera is flailing on boolean falses [puppet] - https://gerrit.wikimedia.org/r/187295 (owner: Yuvipanda)
[00:25:17] PROBLEM - check_disk on db1025 is CRITICAL: DISK CRITICAL - free space: / 3217 MB (45% inode=55%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 577645 MB (47% inode=99%): /a/tmp 2735 MB (2% inode=99%):
[00:26:08] ^lurker, something is weird. It says "Bot users" but links to the flood group. I might have messed it up.
[00:26:23] <^lurker> Hmmm
[00:26:37] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[00:27:19] Messages?
[00:27:46] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[00:28:19] Yup
[00:28:21] superm401: https://office.wikimedia.org/wiki/MediaWiki:Group-flood
[00:28:31] I guess it's named in WikimediaMessages
[00:28:51] (PS1) Yuvipanda: Revert "Test if hiera is flailing on boolean falses" [puppet] - https://gerrit.wikimedia.org/r/187296
[00:28:56] Sigh, Wikidata (and presumably any sane wikis) have an override for it.
[00:28:58] (CR) Yuvipanda: [C: 2] Revert "Test if hiera is flailing on boolean falses" [puppet] - https://gerrit.wikimedia.org/r/187296 (owner: Yuvipanda)
[00:29:08] Just create a local message
[00:29:11] job done
[00:29:20] (CR) Yuvipanda: [V: 2] Revert "Test if hiera is flailing on boolean falses" [puppet] - https://gerrit.wikimedia.org/r/187296 (owner: Yuvipanda)
[00:29:26] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[00:29:47] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[00:30:20] Reedy, I don't have rights to do that on officewiki.
[00:30:26] RECOVERY - check_disk on db1025 is OK: DISK OK - free space: / 3217 MB (45% inode=55%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 577645 MB (47% inode=99%): /a/tmp 35441 MB (34% inode=99%):
[00:30:27] I also wonder if we should fix it in WikimediaMessages.
[00:32:15] (PS2) Nuria: Increasing warning threshold to 500 [puppet] - https://gerrit.wikimedia.org/r/187011
[00:32:24] <^lurker> guillom: Can I get +bcrat on officewiki by the way?
[00:32:36] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: puppet fail
[00:32:57] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: puppet fail
[00:32:57] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: puppet fail
[00:32:58] <^lurker> superm401: Message updated
[00:33:35] <^lurker> Probably cached, heh. It's still showing as "bot users" on special:userrights
[00:33:47] (CR) Alexandros Kosiaris: [C: 2] Remove server beryllium [dns] - https://gerrit.wikimedia.org/r/187290 (owner: Alexandros Kosiaris)
[00:34:07] <^lurker> Looks ok on https://office.wikimedia.org/wiki/Special:ListGroupRights now though
[00:35:20] Thanks, ^lurker. I'm going to propose a change to WikimediaMessages as well.
[00:35:29] ^lurker: I've got admin... I think I did it via createAndPromote
[00:35:40] <^lurker> I've got +sysop too
[00:35:43] <^lurker> I want +bcrat though
[00:36:01] lol
[00:36:13] (PS3) Nuria: Increasing warning threshold to 500 [puppet] - https://gerrit.wikimedia.org/r/187011
[00:38:56] ^ https://gerrit.wikimedia.org/r/#/c/187300/
[00:38:57] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[00:39:16] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[00:39:16] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[00:39:16] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[00:39:17] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:39:27] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:39:27] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:39:27] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:39:56] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:40:06] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[00:40:07] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:40:16] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:40:17] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[00:40:17] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:40:28] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[00:41:39] (CR) Ottomata: [C: 2] Increasing warning threshold to 500 [puppet] - https://gerrit.wikimedia.org/r/187011 (owner: Nuria)
[00:44:36] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:46:13] (PS1) Andrew Bogott: Change a parted commandline slightly. [puppet] - https://gerrit.wikimedia.org/r/187301
[00:47:09] (CR) Andrew Bogott: [C: 2] Change a parted commandline slightly. [puppet] - https://gerrit.wikimedia.org/r/187301 (owner: Andrew Bogott)
[00:50:47] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[00:50:57] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[00:52:21] <_joe_> wtf puppet
[00:54:37] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:56:21] fwiw, the npm cache is corrupt again: https://integration.wikimedia.org/ci/job/mwext-DonationInterface-npm/907/console
[00:56:43] (CR) Alexandros Kosiaris: [C: 2] Use only cxserver/deploy in deployment [puppet] - https://gerrit.wikimedia.org/r/184217 (owner: KartikMistry)
[00:57:10] (CR) Alexandros Kosiaris: "Cherry-picked on beta, it worked, shepherding in production" [puppet] - https://gerrit.wikimedia.org/r/184217 (owner: KartikMistry)
[01:04:40] PROBLEM - cxserver on sca1001 is CRITICAL: Connection refused
[01:04:59] PROBLEM - LVS HTTP IPv4 on cxserver.svc.eqiad.wmnet is CRITICAL: Connection refused
[01:05:19] PROBLEM - cxserver on sca1002 is CRITICAL: Connection refused
[01:10:58] (PS1) Ori.livneh: Set $wgResourceLoaderStorageEnabled to false on osmium [mediawiki-config] - https://gerrit.wikimedia.org/r/187311
[01:11:59] (CR) Ori.livneh: [C: 2] Set $wgResourceLoaderStorageEnabled to false on osmium [mediawiki-config] - https://gerrit.wikimedia.org/r/187311 (owner: Ori.livneh)
[01:12:05] (Merged) jenkins-bot: Set $wgResourceLoaderStorageEnabled to false on osmium [mediawiki-config] - https://gerrit.wikimedia.org/r/187311 (owner: Ori.livneh)
[01:12:48] !log ori Synchronized wmf-config/CommonSettings.php: Id5186348f: Set $wgResourceLoaderStorageEnabled to false on osmium (duration: 00m 07s)
[01:12:56] Logged the message, Master
[01:13:29] (CR) Dzahn: [C: -2] "it uses 108 for both name spaces, should be 108 and 109" [mediawiki-config] - https://gerrit.wikimedia.org/r/187278 (https://phabricator.wikimedia.org/T369) (owner: Spage)
[01:16:45] (Abandoned) BryanDavis: Set umask when exec'ing git commands [puppet] - https://gerrit.wikimedia.org/r/187286 (owner: BryanDavis)
[01:20:42] (PS1) Giuseppe Lavagetto: hiera: actively look up the role hierarchy instead of the standard one [puppet] - https://gerrit.wikimedia.org/r/187312
[01:23:02] operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1000251 (Gilles) I confirm that this currently happens for brand new VMs.
[01:26:17] akosiaris: so, when I run parted on my new jessie VM, it says "Warning: Not all of the space available to /dev/sda appears to be used, you can fix the GPT to use all of the space (an extra 1953103872 blocks) or continue with the current setting?" and then prompts me to fix.
[01:26:20] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:26:36] My question is: How can I 'fix' non-interactively? parted refuses to do it.
[01:32:32] (PS1) Ori.livneh: vbench: use numpy to compute stats; add warmup option [puppet] - https://gerrit.wikimedia.org/r/187314
[01:32:52] (CR) Ori.livneh: [C: 2 V: 2] vbench: use numpy to compute stats; add warmup option [puppet] - https://gerrit.wikimedia.org/r/187314 (owner: Ori.livneh)
[01:35:57] operations: Remove misc/statistics.pp - https://phabricator.wikimedia.org/T87450#1000270 (Ottomata) Open>Resolved
[01:37:40] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:39:05] (PS2) Yuvipanda: Get rid of superfluous diamond includes [puppet] - https://gerrit.wikimedia.org/r/187261
[01:40:30] (CR) Yuvipanda: [C: 2 V: 2] Get rid of superfluous diamond includes [puppet] - https://gerrit.wikimedia.org/r/187261 (owner: Yuvipanda)
[01:40:58] (PS1) Ottomata: Ensure monitoring::ganglia varnishkafka use is absent [puppet] - https://gerrit.wikimedia.org/r/187316
[01:41:10] (PS2) Ottomata: Ensure monitoring::ganglia varnishkafka use is absent [puppet] - https://gerrit.wikimedia.org/r/187316
[01:42:53] ottomata: DEATH TO GANGLIA!!!1
[01:43:33] (PS3) Yuvipanda: Move deployment-prep hiera config into ops/puppet [puppet] - https://gerrit.wikimedia.org/r/186852 (https://phabricator.wikimedia.org/T87223)
[01:44:05] operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1000286 (Spage) FWIW something like this happened to me this morning on flow-tests.wmflabs.org running (labs) /vagrant from O...
[01:44:10] (CR) Yuvipanda: [C: 2 V: 2] Move deployment-prep hiera config into ops/puppet [puppet] - https://gerrit.wikimedia.org/r/186852 (https://phabricator.wikimedia.org/T87223) (owner: Yuvipanda)
[01:45:06] (CR) Dzahn: "this moves all files/icinga into modules/icinga/files , reduce global ./files/" [puppet] - https://gerrit.wikimedia.org/r/187087 (owner: Dzahn)
[01:46:32] YuviPanda: I am going to wait til tomorrow to merge that one. i was about to, but then was like: nahhHHHHHHH
[01:46:42] ottomata: awwww maaaan
[01:47:04] haha
[01:47:36] (CR) Yuvipanda: [C: -1] "I *think* I didn't move them when I cleaned up the module because they were being referenced by code in manifests/* rather than in a modul" [puppet] - https://gerrit.wikimedia.org/r/187087 (owner: Dzahn)
[01:49:44] ^lurker: {{done}}
[01:52:19] RECOVERY - LVS HTTP IPv4 on cxserver.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.009 second response time
[01:52:24] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[01:52:30] RECOVERY - cxserver on sca1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.026 second response time
[01:52:49] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[01:53:10] RECOVERY - cxserver on sca1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.010 second response time
[01:54:30] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[01:54:38] (CR) Dzahn: [C: 1] Add mforns to eventlogging-admins group [puppet] - https://gerrit.wikimedia.org/r/187271 (https://phabricator.wikimedia.org/T87816) (owner: Mforns)
[01:57:40] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[01:59:15] operations, Beta-Cluster: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220#1000325 (yuvipanda)
[02:04:52] (PS1) Yuvipanda: beta: Disable standard mail sender on deployment-mx [puppet] - https://gerrit.wikimedia.org/r/187318 (https://phabricator.wikimedia.org/T86575)
[02:05:15] (CR) Yuvipanda: [C: 2 V: 2] beta: Disable standard mail sender on deployment-mx [puppet] - https://gerrit.wikimedia.org/r/187318 (https://phabricator.wikimedia.org/T86575) (owner: Yuvipanda)
[02:19:58] !log l10nupdate Synchronized php-1.25wmf14/cache/l10n: (no message) (duration: 00m 02s)
[02:20:07] Logged the message, Master
[02:21:06] !log LocalisationUpdate completed (1.25wmf14) at 2015-01-29 02:20:02+00:00
[02:21:10] Logged the message, Master
[02:24:46] operations: Monitor Netapps - https://phabricator.wikimedia.org/T87836#1000337 (Gage) NEW
[02:25:30] operations: Graph Netapp SNMP stats with LIbreNMS - https://phabricator.wikimedia.org/T87837#1000344 (Gage) NEW
[02:27:01] jgage: ?
[02:27:06] why are you working on the netapps?
[02:27:23] operations: Create Icinga alerts for Netapp health - https://phabricator.wikimedia.org/T87839#1000358 (Gage) NEW
[02:27:39] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1000364 (Gage) NEW
[02:27:56] operations: Monitor Netapps - https://phabricator.wikimedia.org/T87836#1000374 (faidon) Open>declined a:faidon We're phasing out the NetApps completely. I see no reason to add monitoring for them at this point.
[02:28:16] operations: Create Icinga alerts for Netapp health - https://phabricator.wikimedia.org/T87839#1000358 (Gage)
[02:28:18] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1000377 (Gage)
[02:28:19] operations: Graph Netapp SNMP stats with LIbreNMS - https://phabricator.wikimedia.org/T87837#1000379 (Gage)
[02:28:40] operations: Create Icinga alerts for Netapp health - https://phabricator.wikimedia.org/T87839#1000358 (Gage)
[02:29:07] i'm not, i'm making a task to nuke torrus, with its dependent tasks
[02:29:27] why?
[02:29:37] because i just fixed torrus today. none of us had noticed that it was down and it mostly doesn't work.
[02:29:43] and why would we put it in librenms anyway
[02:30:01] and opening up a bunch of tasks for such simple tasks doesn't really help
[02:30:19] a task that said 'add monitoring for netapps' with a checklist for icinga/graphs would be enough to track this
[02:30:49] we can do it however you like, i just want to track the goal so that it gets done
[02:31:17] feel free to modify
[02:31:18] I already declined the parent task
[02:32:11] this tiny task proliferation now means that I'll have to decline the rest of them as well, for the same reason
[02:33:59] !log l10nupdate Synchronized php-1.25wmf15/cache/l10n: (no message) (duration: 00m 02s)
[02:34:07] Logged the message, Master
[02:35:07] !log LocalisationUpdate completed (1.25wmf15) at 2015-01-29 02:34:03+00:00
[02:35:11] Logged the message, Master
[02:40:49] i proposed librenms because it already duplicates most of the other monitoring in torrus, and it speaks snmp. i haven't looked at how to collect snmp sources into graphite but i guess we could do that instead.
[03:38:02] (PS3) BBlack: Raise and recalculate varnish frontend mallocs [puppet] - https://gerrit.wikimedia.org/r/186816
[03:50:40] gotta love it when puppet says: Syntax error at '}'; expected '}'
[03:57:31] oh I had a nice one today too:
[03:57:40] "Munging failed for value 174 in class content: can't convert Fixnum into String at line 1"
[03:57:57] 174 being the Fixnum value that it did, in fact, convert to a String in the process of creating that stupid error message, obviously
[03:58:18] haha
[04:08:10] (PS4) BBlack: Raise and recalculate varnish frontend mallocs [puppet] - https://gerrit.wikimedia.org/r/186816
[04:11:28] (CR) BBlack: [C: 2] Raise and recalculate varnish frontend mallocs [puppet] - https://gerrit.wikimedia.org/r/186816 (owner: BBlack)
[04:11:50] ^ there is a small but real chance this will break puppet on all the caches due to some random unpredictable infelicities of puppet type conversion and maths. if so, sorry ahead of time for the spam!
[04:13:21] * bblack wishes we had the tuits to keep puppet-compiler working all the time
[04:20:11] hey neat, it mostly worked as intended (except that the math on US bits boxes ended up at 47G instead of 48G, but that's not important really)
[04:25:28] (CR) devunt: [C: 1] mediawikiwiki: Allow sysop to add and remove themself from translationadmin [mediawiki-config] - https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: Florianschmidtwelzow)
[04:52:59] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jan 29 04:51:55 UTC 2015 (duration 51m 54s)
[04:53:08] Logged the message, Master
[04:59:05] (PS1) BBlack: SPDY for Jessie [puppet] - https://gerrit.wikimedia.org/r/187328
[04:59:43] (CR) jenkins-bot: [V: -1] SPDY for Jessie [puppet] - https://gerrit.wikimedia.org/r/187328 (owner: BBlack)
[05:00:53] (PS2) BBlack: SPDY for Jessie [puppet] - https://gerrit.wikimedia.org/r/187328
[05:02:26] (CR) BBlack: [C: 2] SPDY for Jessie [puppet] - https://gerrit.wikimedia.org/r/187328 (owner: BBlack)
[05:05:40] (PS1) BBlack: bugfix for 89836a36 [puppet] - https://gerrit.wikimedia.org/r/187329
[05:06:16] (CR) BBlack: [C: 2 V: 2] bugfix for 89836a36 [puppet] - https://gerrit.wikimedia.org/r/187329 (owner: BBlack)
[05:36:15] Beta-Cluster: Remove beta specific mediawiki roles - https://phabricator.wikimedia.org/T87210#1000435 (greg) p:Triage>Normal
[05:36:30] Beta-Cluster: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#1000438 (greg) p:Triage>Normal
[05:36:51] operations, Beta-Cluster: Set 'cluster' salt grain appropriately for all instances in beta cluster - https://phabricator.wikimedia.org/T87199#1000441 (greg) p:Triage>Normal
[05:40:28] oh, those first two showed up because of the "puppet" tag
[06:08:26] Continuous-Integration: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1000457 (Krinkle) NEW
[06:08:39] Continuous-Integration: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1000464 (Krinkle) p:Triage>Unbreak!
[06:28:14] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:04] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:24] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:24] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: puppet fail
[06:29:54] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 5 failures
[06:30:03] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:14] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:23] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:03] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:03] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:48] operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1000499 (dan-nl) @bd808, @Tgr, i agree, that ideally this would be fixed in the hhvm package, but it seems that this issue h...
[06:31:54] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:44] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:53] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:47:23] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:47:33] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:47:53] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:47:54] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:55:40] (PS2) Mattflaschen: Simplify Echo and Thanks settings [mediawiki-config] - https://gerrit.wikimedia.org/r/186539
[06:58:58] (PS3) Mattflaschen: Simplify Echo and Thanks settings [mediawiki-config] - https://gerrit.wikimedia.org/r/186539
[06:59:42] (CR) Mattflaschen: "Rebased, and removed two additional references (symlinks, and facilitation code for the now-unneeded tag)." [mediawiki-config] - https://gerrit.wikimedia.org/r/186539 (owner: Mattflaschen)
[07:06:14] (PS4) Mattflaschen: Simplify Echo and Thanks settings [mediawiki-config] - https://gerrit.wikimedia.org/r/186539
[07:07:31] (PS1) BBlack: fix git::clone umask issues T87843 [puppet] - https://gerrit.wikimedia.org/r/187331
[07:08:23] Continuous-Integration: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1000533 (BBlack) I'm not entirely confident in the above patch given how much reuse git::clone sees all over puppet, but...
[07:11:00] (PS2) BBlack: fix git::clone umask issues T87843 [puppet] - https://gerrit.wikimedia.org/r/187331
[07:12:23] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:16:54] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail
[07:29:14] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[07:36:55] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[07:48:43] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:49:34] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59696 bytes in 0.817 second response time
[08:47:56] Beta-Cluster: Puppet failures on deployment-mx - https://phabricator.wikimedia.org/T87848#1000610 (yuvipanda) NEW
[08:48:36] Beta-Cluster: deployment-mx does not have salt master set to deployment-salt - https://phabricator.wikimedia.org/T87849#1000617 (yuvipanda) NEW
[08:50:42] (PS1) Yuvipanda: Add missing .yaml extension to deployment-mx hiera file [puppet] - https://gerrit.wikimedia.org/r/187340
[08:50:55] (CR) Yuvipanda: [C: 2 V: 2] Add missing .yaml extension to deployment-mx hiera file [puppet] - https://gerrit.wikimedia.org/r/187340 (owner: Yuvipanda)
[08:58:05] operations, Beta-Cluster: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220#1000629 (yuvipanda)
[09:19:15] operations, Beta-Cluster: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#1000652 (yuvipanda) As an update, I think @faidon and @joe are working on moving our apache user to just use www-data instead (in prod).
[10:23:13] (PS1) Mark Bergsma: Add no-gravity configuration option [debs/pybal] - https://gerrit.wikimedia.org/r/187346 (https://phabricator.wikimedia.org/T86650)
[10:24:25] (CR) jenkins-bot: [V: -1] Add no-gravity configuration option [debs/pybal] - https://gerrit.wikimedia.org/r/187346 (https://phabricator.wikimedia.org/T86650) (owner: Mark Bergsma)
[10:25:21] ops-core: Add support for setting weight=0 when depooling - https://phabricator.wikimedia.org/T86650#1000740 (mark) This way of working with server state variables in PyBal is becoming quite unpleasant. It would probably make sense to rewrite this with a decent state machine at some point. This changes a lot...
[10:27:23] (CR) Mark Bergsma: [C: -2] "And yeah, I'll probably rename that config option. ;-)" [debs/pybal] - https://gerrit.wikimedia.org/r/187346 (https://phabricator.wikimedia.org/T86650) (owner: Mark Bergsma)
[11:03:03] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:11:12] (CR) Thiemo Mättig (WMDE): [C: 1] "I think it's obvious that we should look into this some day and try to show the captchas in a meaningful way. But I think this will be har" [mediawiki-config] - https://gerrit.wikimedia.org/r/186828 (https://phabricator.wikimedia.org/T86453) (owner: Hoo man)
[11:20:53] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[11:43:19] (CR) Lydia Pintscher: "I have created T87854 to track enabling captchas properly." [mediawiki-config] - https://gerrit.wikimedia.org/r/186828 (https://phabricator.wikimedia.org/T86453) (owner: Hoo man)
[12:51:17] (CR) Tim Landscheidt: "In the past, I had thought about writing a lighttpd module/patch that binds lighttpd to a free port (= 0) and then signals the assigned po" [puppet] - https://gerrit.wikimedia.org/r/187078 (owner: Yuvipanda)
[13:07:34] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail
[13:27:34] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[13:34:04] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:45:50] (CR) MZMcBride: "Why not have 'sysop' inherit the rights of 'translationadmin'?" [mediawiki-config] - https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: Florianschmidtwelzow)
[13:50:54] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[14:09:15] (CR) JanZerebecki: [C: 1] "Looks good."
[puppet] - 10https://gerrit.wikimedia.org/r/187121 (https://phabricator.wikimedia.org/T68996) (owner: 10Dzahn) [14:09:27] (03PS2) 10JanZerebecki: add IPv6 interface to dataset1001 (eth2) [puppet] - 10https://gerrit.wikimedia.org/r/187121 (https://phabricator.wikimedia.org/T68996) (owner: 10Dzahn) [14:09:39] (03CR) 10JanZerebecki: [C: 031] add IPv6 interface to dataset1001 (eth2) [puppet] - 10https://gerrit.wikimedia.org/r/187121 (https://phabricator.wikimedia.org/T68996) (owner: 10Dzahn) [14:45:59] 3operations, ops-requests: set up DMARC aggregate report collection into a database for research and reporting - https://phabricator.wikimedia.org/T86209#1000879 (10Jgreen) ah yes, here's the new version https://gerrit.wikimedia.org/r/#/c/185472/ [15:35:48] 3operations, ops-core, Analytics: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#1000896 (10QChris) >>! In T86656#999773, @Ottomata wrote: > The data is still being backfilled in hadoop. Done. [16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150129T1600). Please do the needful. [17:09:16] 3operations, ops-core, Analytics: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#1000993 (10Ottomata) Looking GOOD! Brandon, you may turn of nginx udp2log! 
:) [17:17:29] (03PS1) 10Glaisher: Set $wmgUseFloatedToc to false at dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187408 (https://phabricator.wikimedia.org/T87534) [17:17:45] (03PS2) 10Glaisher: Set $wmgUseFloatedToc to false at dewikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187408 (https://phabricator.wikimedia.org/T87534) [17:18:45] (03PS5) 10BBlack: disable nginx udplog for SSL clusters [puppet] - 10https://gerrit.wikimedia.org/r/186257 [17:23:40] (03PS6) 10BBlack: disable nginx udplog for SSL clusters [puppet] - 10https://gerrit.wikimedia.org/r/186257 [17:24:51] (03CR) 10BBlack: [C: 032] "Per T86656 this is good to go!" [puppet] - 10https://gerrit.wikimedia.org/r/186257 (owner: 10BBlack) [17:30:01] (03CR) 10Ori.livneh: "congrats, folks." [puppet] - 10https://gerrit.wikimedia.org/r/186257 (owner: 10BBlack) [17:32:24] 3Beta-Cluster: Puppet failures on deployment-mx - https://phabricator.wikimedia.org/T87848#1001029 (10greg) p:5Triage>3Normal [17:32:26] 3Beta-Cluster: deployment-mx does not have salt master set to deployment-salt - https://phabricator.wikimedia.org/T87849#1001032 (10greg) p:5Triage>3Normal [17:37:20] 3Beta-Cluster: Unify labs and prod roles for role::deployment::deployment_servers - https://phabricator.wikimedia.org/T86885#1001050 (10greg) p:5Triage>3Normal [17:43:14] 3operations, ops-core, Analytics: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#1001069 (10BBlack) nginx udp2log is off and nginx configs have been reloaded: https://gerrit.wikimedia.org/r/#/c/186257/ [17:48:10] Reedy: about? 
can you undeploy the Graph extension https://gerrit.wikimedia.org/r/#/c/187142/, see https://phabricator.wikimedia.org/T87770#1000408 [17:48:41] (03PS1) 10Reedy: Revert "Enable Extension:Graph on cawiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187421 [17:48:45] (03PS2) 10Reedy: Revert "Enable Extension:Graph on cawiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187421 [17:48:59] :) [17:49:09] yurikR: we're undeploying the graph extension now [17:49:21] (03CR) 10Reedy: [C: 032] Revert "Enable Extension:Graph on cawiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187421 (owner: 10Reedy) [17:49:35] greg-g, thx ) [17:50:35] * Reedy waits for jenkins [17:59:31] (03Merged) 10jenkins-bot: Revert "Enable Extension:Graph on cawiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187421 (owner: 10Reedy) [17:59:52] 3operations, ops-core, Analytics: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#1001093 (10faidon) 5Open>3Resolved \o/ [18:01:31] 3operations, Wikimedia-General-or-Unknown: Add a wiki on wikitech is out of date, incomplete - https://phabricator.wikimedia.org/T87588#1001101 (10Glaisher) needs to be checked by both #operations and #shell side, I suppose. [18:01:47] (03PS1) 10Faidon Liambotis: Kill upstart-nfs-noidmapd.conf, unused [puppet] - 10https://gerrit.wikimedia.org/r/187428 [18:01:49] (03PS1) 10Faidon Liambotis: sysctl: switch Exec to Service [puppet] - 10https://gerrit.wikimedia.org/r/187429 [18:01:51] (03PS1) 10Faidon Liambotis: Add sysfs module, to handle /sys settings [puppet] - 10https://gerrit.wikimedia.org/r/187430 [18:04:56] (03PS3) 10Ottomata: Ensure monitoring::ganglia varnishkafka use is absent [puppet] - 10https://gerrit.wikimedia.org/r/187316 [18:08:31] (03CR) 10Ottomata: [C: 032 V: 032] Ensure monitoring::ganglia varnishkafka use is absent [puppet] - 10https://gerrit.wikimedia.org/r/187316 (owner: 10Ottomata) [18:08:45] ori: https://gerrit.wikimedia.org/r/187429 ? 
[18:09:04] (03PS2) 10Faidon Liambotis: Kill upstart-nfs-noidmapd.conf, unused [puppet] - 10https://gerrit.wikimedia.org/r/187428 [18:09:11] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Kill upstart-nfs-noidmapd.conf, unused [puppet] - 10https://gerrit.wikimedia.org/r/187428 (owner: 10Faidon Liambotis) [18:09:21] paravoid: looks sane; have you tested? [18:09:30] no :) [18:09:40] but I'm wondering why it was like that in the first place [18:09:45] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [18:12:03] paravoid: https://gerrit.wikimedia.org/r/#/c/136948/ [18:12:42] ok, but why? :) [18:12:45] paravoid: one reason, I think, is that because it's a task job its status changes to 'stopped' right after it runs, so Puppet starts it on every run [18:12:52] ah hm [18:13:04] well of course that Exec is broken under systemd [18:13:57] paravoid: is there no command that would refresh the service regardless of it being managed by systemd or upstart? [18:14:11] is there a jessie system i can ssh to and take a look for a sec? 
[18:14:20] service foo restart [18:14:43] changing the exec to service foo restart would work, then [18:14:54] there's also a test -f /etc/init/procps ;) [18:15:13] the onlyif was for hardy [18:15:17] which doens't have the service [18:15:21] in any form [18:15:36] if the hardy boxes are all gone that onlyif can be banished [18:16:34] see the diff https://gerrit.wikimedia.org/r/#/c/136948/3/modules/sysctl/manifests/init.pp [18:17:19] i removed the comment because i was not anticipating jessie at the time so i thought that onlyif could live there forever without anyone having to mind it [18:17:55] ubuntu's switching to systemd too :) [18:18:05] (technicalities) [18:18:16] I'll test and fix, no worries [18:18:18] yeah i'm not complaining just explaining my thoughts at the time [18:18:19] thanks [18:18:19] thanks for the background [18:18:32] the commit message should have been better. but then again, so should yours :P [18:19:41] 3Analytics-Engineering, operations: Decommission webstatscollector - https://phabricator.wikimedia.org/T87868#1001126 (10Ottomata) 3NEW a:3Ottomata [18:20:22] (03PS1) 10Ottomata: Decom webstatscollector step 1 [puppet] - 10https://gerrit.wikimedia.org/r/187432 (https://phabricator.wikimedia.org/T87868) [18:21:28] (03PS2) 10Ottomata: Decom webstatscollector step 1 [puppet] - 10https://gerrit.wikimedia.org/r/187432 (https://phabricator.wikimedia.org/T87868) [18:23:48] technicalities! I see gangs assembling around the matter as if it were the equivalent of the council of Nicaea [18:24:18] is jenkins-bot not verifying things right now? 
[18:25:15] grrr [18:25:28] (03CR) 10Ottomata: [C: 032 V: 032] Decom webstatscollector step 1 [puppet] - 10https://gerrit.wikimedia.org/r/187432 (https://phabricator.wikimedia.org/T87868) (owner: 10Ottomata) [18:27:36] 3operations, Beta-Cluster: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#1001166 (10faidon) apache right now has no uid (so all kinds of uid across the fleet) and a gid 48, which is < 100 and thus, wrong (that space is reserved for packages). Rather than renumbering both uid/gi... [18:27:57] (03PS1) 10Ottomata: Remove dependency on webstatscollector service [puppet] - 10https://gerrit.wikimedia.org/r/187433 (https://phabricator.wikimedia.org/T87868) [18:29:10] (03CR) 10Ottomata: [C: 032 V: 032] Remove dependency on webstatscollector service [puppet] - 10https://gerrit.wikimedia.org/r/187433 (https://phabricator.wikimedia.org/T87868) (owner: 10Ottomata) [18:29:27] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:29:29] !log reedy Synchronized wmf-config/InitialiseSettings.php: Disable Extension:Graph on cawiki (duration: 00m 06s) [18:29:30] ottomata: \o/ [18:29:36] Logged the message, Master [18:29:36] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: puppet fail [18:29:58] ottomata: don't forget the whole gadolinium/protactinium deal! [18:30:01] paravoid, I've been making those arms of yours rise up in the air all over the place, eh? [18:30:03] yup! [18:30:09] ottomata: yes you have :) [18:30:19] heheh [18:30:58] paravoid, are you still in the office? [18:31:01] yes [18:31:05] (03PS1) 10Ori.livneh: chromium: specify additional command-line args [puppet] - 10https://gerrit.wikimedia.org/r/187434 [18:31:06] but about to jump into a meeting [18:31:31] you going to services review? [18:31:40] at 11:30 i think? [18:31:42] no [18:32:06] I'll be on my way to the airport at that time... 
[18:32:45] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [18:33:41] (03CR) 10Ori.livneh: [C: 032 V: 032] "Catalog compiler: http://puppet-compiler.wmflabs.org/578/change/187434/html/osmium.eqiad.wmnet.html" [puppet] - 10https://gerrit.wikimedia.org/r/187434 (owner: 10Ori.livneh) [18:36:35] (03PS1) 10Ottomata: Remove webstats puppet references [puppet] - 10https://gerrit.wikimedia.org/r/187436 (https://phabricator.wikimedia.org/T87868) [18:37:03] (03PS2) 10Ottomata: Remove webstats puppet references [puppet] - 10https://gerrit.wikimedia.org/r/187436 (https://phabricator.wikimedia.org/T87868) [18:37:35] (03CR) 10Ottomata: [C: 032 V: 032] Remove webstats puppet references [puppet] - 10https://gerrit.wikimedia.org/r/187436 (https://phabricator.wikimedia.org/T87868) (owner: 10Ottomata) [18:47:43] (03PS1) 10Ottomata: Remove final referneces to webstatscollector [puppet] - 10https://gerrit.wikimedia.org/r/187444 (https://phabricator.wikimedia.org/T87868) [18:47:49] (03CR) 10jenkins-bot: [V: 04-1] Remove final referneces to webstatscollector [puppet] - 10https://gerrit.wikimedia.org/r/187444 (https://phabricator.wikimedia.org/T87868) (owner: 10Ottomata) [18:47:51] (03PS2) 10Ottomata: Remove final referneces to webstatscollector [puppet] - 10https://gerrit.wikimedia.org/r/187444 (https://phabricator.wikimedia.org/T87868) [18:48:33] (03CR) 10Ottomata: [C: 032 V: 032] Remove final referneces to webstatscollector [puppet] - 10https://gerrit.wikimedia.org/r/187444 (https://phabricator.wikimedia.org/T87868) (owner: 10Ottomata) [18:48:56] oh jenkins is back! 
[18:51:36] 3Analytics-Engineering, operations: Decommission webstatscollector - https://phabricator.wikimedia.org/T87868#1001260 (10Ottomata) 5Open>3Resolved [18:53:51] ottomata: it's extremely backlogged right now [18:56:36] (03CR) 10jenkins-bot: [V: 04-1] Add sysfs module, to handle /sys settings [puppet] - 10https://gerrit.wikimedia.org/r/187430 (owner: 10Faidon Liambotis) [19:42:28] (03PS3) 10Amire80: cxserver: enable no->nn language pair [puppet] - 10https://gerrit.wikimedia.org/r/186522 (https://phabricator.wikimedia.org/T76674) (owner: 10KartikMistry) [19:48:47] !log Updated Wikimania Scholarships (2f4a99f) "Add Sakha to list of languages" [19:49:17] morebots: hello? [19:49:20] Logged the message, Master [19:49:20] I am a logbot running on tools-exec-02. [19:49:20] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [19:49:20] To log a message, type !log . [20:01:08] 3operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1002174 (10Tgr) Vagrant fetches the current version of packages via apt; what version of MW-Vagrant you use does not affect thi... [20:07:41] (03PS2) 10Dzahn: ircd: create upstart file and ensure running [puppet] - 10https://gerrit.wikimedia.org/r/187059 (https://phabricator.wikimedia.org/T87679) [20:08:57] (03PS3) 10Rush: ircd: create upstart file and ensure running [puppet] - 10https://gerrit.wikimedia.org/r/187059 (https://phabricator.wikimedia.org/T87679) (owner: 10Dzahn) [20:10:34] Hey, we have a serious emergency involving gerrit, can someone PM me? 
[20:10:50] * awight taps fingers :) [20:10:59] ^d: ^^ [20:11:20] (03CR) 10Rush: [C: 032 V: 032] ircd: create upstart file and ensure running [puppet] - 10https://gerrit.wikimedia.org/r/187059 (https://phabricator.wikimedia.org/T87679) (owner: 10Dzahn) [20:12:49] hi [20:13:32] pizzzacat: I haven't named names :p btw, I just wanted you to be around [20:14:14] awight: pizzzacat: can you relay through katie (who's in the security private channel)? [20:14:38] greg-g: what's the full chennel name? [20:14:40] PROBLEM - puppet last run on argon is CRITICAL: CRITICAL: puppet fail [20:16:10] (03PS1) 10Rush: ircd fix dependencies for /etc/init/ircd.conf [puppet] - 10https://gerrit.wikimedia.org/r/187463 [20:18:16] (03CR) 10Dzahn: [C: 031] ircd fix dependencies for /etc/init/ircd.conf [puppet] - 10https://gerrit.wikimedia.org/r/187463 (owner: 10Rush) [20:19:57] (03CR) 10Rush: [C: 032] ircd fix dependencies for /etc/init/ircd.conf [puppet] - 10https://gerrit.wikimedia.org/r/187463 (owner: 10Rush) [20:20:31] for the public record: issue fixed [20:21:00] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [20:32:31] 3operations: ircd doesnt come back after server reboot - https://phabricator.wikimedia.org/T87679#1002270 (10chasemp) 5Open>3Resolved Should be gtg [20:33:33] 3operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1002272 (10dan-nl) @Tgr, i see. thanks, but how is MW-Vagrant determining where to get the apt-get package for hhvm? # i crea... 
[20:47:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [21:00:03] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [21:26:45] 3operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1002405 (10bd808) 5Open>3Resolved I merged @dan-nl's patch for our puppet config and created {T88014} to track the packagin... [21:40:04] 3operations, Beta-Cluster: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#1002444 (10bd808) >>! In T78076#1001166, @faidon wrote: > apache right now has no uid (so all kinds of uid across the fleet) and a gid 48, which is < 100 and thus, wrong (that space is reserved for packages... [21:48:03] (03CR) 10Krinkle: [C: 031] "Cherry-picked to integration-puppetmaster. Seems to work. There's quite a few deployments queued up which will put this to the test." 
[puppet] - 10https://gerrit.wikimedia.org/r/187331 (owner: 10BBlack) [22:00:23] 3RESTBase, operations, Scrum-of-Scrums, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#1002500 (10GWicke) [22:00:44] 3RESTBase, operations, Scrum-of-Scrums, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#21142 (10GWicke) [22:00:48] 3operations, MediaWiki-Vagrant: Provisioning MediaWiki-Vagrant fails with "Could not set 'file' on ensure: Is a directory - /etc/hhvm/php.ini" - https://phabricator.wikimedia.org/T87478#1002508 (10bd808) [22:01:04] 3RESTBase, operations, Scrum-of-Scrums, Services: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1002510 (10ori) [22:05:24] 3Continuous-Integration: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1002513 (10Krinkle) 5Open>3Resolved a:3Krinkle https://gerrit.wikimedia.org/r/187331 has been cherry-picked to integ... [22:09:26] who (with puppet +2) should I add to CI related puppet changes? 
[22:09:53] (this one: https://gerrit.wikimedia.org/r/#/c/187331/ ) [22:10:19] bah, I'm tired (and have a headache) [22:10:27] !log git-deploy: Deploying integration/slave-scripts I5aa76b0, I4d94af46735c, I66fbce3fa [22:10:29] brandon can do it, it's his patch [22:10:43] Logged the message, Master [22:11:09] greg-g: you can add me as well, although right now I'm busy with tools users freaking out [22:14:55] andrewbogott: :) no worries [22:15:01] andrewbogott: and thanks [22:47:07] (03PS1) 10QChris: Add raw_webrequest to refinery's webrequest status emails [puppet] - 10https://gerrit.wikimedia.org/r/187603 [22:47:17] ottomata: ^ [22:51:21] (03PS2) 10QChris: Add refined webrequests to refinery's webrequest status emails [puppet] - 10https://gerrit.wikimedia.org/r/187603 [22:54:38] 3operations, Beta-Cluster: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#1002618 (10Dzahn) per https://wikitech.wikimedia.org/wiki/UID using www-data means we want uid/gid 33/33 (not 48/48 or random:48) [22:55:18] (03CR) 10Ottomata: [C: 032] Add refined webrequests to refinery's webrequest status emails [puppet] - 10https://gerrit.wikimedia.org/r/187603 (owner: 10QChris) [22:55:21] 3operations, Beta-Cluster: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#1002631 (10Dzahn) I think apache user might have been in use since before we even used Ubuntu. Like on Fedora... [22:55:24] qchris: actually,i haven't gotten any emails yet, should I have? [22:55:51] 0 data loss since we first merged this? [22:56:07] ottomata: I had thought that you should have received an email. [22:56:19] maybe i just didnt' notice, checking harder... [22:56:23] maybe i'm filtering it! [22:56:30] Since there is no bits, the bits partition should always be marked as faulty. [22:56:53] I'll double-check the cron. [22:57:52] Running the cron manually as hdfs user produces output, and MAILTO is set. 
[23:00:46] yeah hm, i got no email [23:28:06] 3Analytics, operations: Fix Varnishkafka delivery error icinga warning - https://phabricator.wikimedia.org/T76342#1002702 (10Ottomata) 5Open>3Resolved [23:33:48] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: host 91.198.174.247, interfaces up: 36, down: 1, dormant: 0, excluded: 1, unused: 0; ge-0/0/0: down - Core: msw-oe12-esams [23:42:05] (03PS1) 10Dzahn: decom cp1037,cp1038,cp1039,cp1040 [puppet] - 10https://gerrit.wikimedia.org/r/187615 (https://phabricator.wikimedia.org/T87800) [23:43:02] (03PS2) 10Dzahn: decom cp1037,cp1038,cp1039,cp1040 [puppet] - 10https://gerrit.wikimedia.org/r/187615 (https://phabricator.wikimedia.org/T87800) [23:43:54] 3operations: decom cp1037,cp1038,cp1039,cp1040 - https://phabricator.wikimedia.org/T87800#1002742 (10Dzahn) > "no longer have a puppet cache role" this looks like they still do: https://gerrit.wikimedia.org/r/#/c/187615/1/manifests/site.pp