[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141209T0000). Please do the needful.
[00:00:08] on it
[00:00:24] (03CR) 10BryanDavis: [C: 04-1] "Tested via cherry pick in beta. This doesn't seem to fix the problem. Apache is still not processing the 404 page using mod_php." [puppet] - 10https://gerrit.wikimedia.org/r/177700 (owner: 10BryanDavis)
[00:00:49] (03CR) 10MaxSem: [C: 032] Revert "Disable gadgets caching" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178240 (owner: 10MaxSem)
[00:01:07] (03Merged) 10jenkins-bot: Revert "Disable gadgets caching" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178240 (owner: 10MaxSem)
[00:01:58] we have something for swat
[00:02:15] !log maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/178240 (duration: 00m 05s)
[00:02:16] https://gerrit.wikimedia.org/r/#/c/178374/
[00:02:19] Logged the message, Master
[00:02:21] will put on the wiki now
[00:05:47] and https://gerrit.wikimedia.org/r/#/c/178375/ (wmf11)
[00:06:20] aude, second link grbled on wikitech
[00:08:04] aaah, omg
[00:08:24] !log maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#q,177942,n,z (duration: 00m 08s)
[00:08:28] Logged the message, Master
[00:08:32] kaldari|2, ^^
[00:08:56] thanks, checking....
[00:09:05] !log maxsem Synchronized php-1.25wmf11/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178372,n,z (duration: 00m 07s)
[00:09:07] Logged the message, Master
[00:09:41] James_F, ^^^
[00:09:47] !log maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178375,n,z (duration: 00m 13s)
[00:09:49] aude, ^^^
[00:09:51] Logged the message, Master
[00:10:01] checking
[00:10:34] looks ok
[00:12:21] !log maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178371,n,z (duration: 00m 07s)
[00:12:24] Logged the message, Master
[00:12:28] James_F, ^^^
[00:12:47] the_nobodies: Yeah yeah, waiting for debug=true to work.
[00:13:12] the_nobodies: Works in wmf11.
[00:13:13] !log maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178374,n,z (duration: 00m 13s)
[00:13:15] aude, ^^^
[00:13:16] Logged the message, Master
[00:13:16] the_nobodies: Testing wmf10 now.
[00:13:21] checking
[00:13:29] looks good
[00:13:30] thanks :)
[00:14:27] :)
[00:29:53] the_nobodies: And (finally) confirmed in wmf10; sorry, RoanKattouw was trying to work out a RL/deployment bug.
[00:30:10] thanks
[00:34:45] (03PS3) 10Andrew Bogott: Use ext4 everywhere on the new hp virt nodes. [puppet] - 10https://gerrit.wikimedia.org/r/177701
[00:36:09] andrewbogott: not sure if you saw backlog
[00:36:16] it'd be nice for the virt* partman profiles to at least be similar
[00:36:26] and I think the virt10-raid-cisco is the better of all three
[00:36:42] paravoid: the disk setup is totally different in the two servers...
[00:36:53] the hp servers have hardware raid, and two different kinds of drives
[00:37:27] It doesn't make sense to have one big raid10 with differently-sized drives, does it?
[00:38:14] Or did you just mean w/respect to the ciscos?
[01:05:36] Hey opsen, who do I talk to if I have a twemproxy question?
[01:06:01] Specifically, I want to delete all memcached/twemproxy entries with a certain key prefix (not now but like 2-3 weeks from now)
[01:14:16] (03PS1) 10EBernhardson: Allow enwiki bots to create flow boards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178388
[01:14:25] (03CR) 10jenkins-bot: [V: 04-1] Allow enwiki bots to create flow boards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178388 (owner: 10EBernhardson)
[01:17:34] (03PS2) 10EBernhardson: Allow enwiki bots to create flow boards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178388
[01:22:32] (03CR) 10Andrew Bogott: [C: 032] Use ext4 everywhere on the new hp virt nodes. [puppet] - 10https://gerrit.wikimedia.org/r/177701 (owner: 10Andrew Bogott)
[01:26:15] (03PS1) 10Andrew Bogott: Remove virt-raid10.cfg. It hasn't been used since tampa. [puppet] - 10https://gerrit.wikimedia.org/r/178390
[01:26:45] (03Abandoned) 10Andrew Bogott: Stab in the dark: Try booting virt1011 from port2 [puppet] - 10https://gerrit.wikimedia.org/r/178172 (owner: 10Andrew Bogott)
[01:30:11] andrewbogott: you also need to remove it from netboot.cfg
[01:31:11] virt-raid10.cfg? It's not in used in netboot.cfg is it?
[01:31:59] it is, there is an entry for tampa hosts that needs to be removed
[01:33:42] Oh -- I agree that there's an obsolete line but it only refers to virt-raid10-cisco.cfg
[01:35:24] paravoid: if you are awake enough to consider… I'm wondering about virt-hp.cfg. I'm trying to configure two different raids with two different partition schemes… is doing them in sequence like this legit?
[01:35:31] (It doesn't work, but that's not necessarily the reason)
[01:36:05] (03PS1) 10Andrew Bogott: Remove ref to some no-longer-existing tampa servers. [puppet] - 10https://gerrit.wikimedia.org/r/178391
[01:36:06] didn't you say these come with hardware raid?
[01:36:40] but to answer your question... I'm not sure
[01:36:55] but IIRC alex was fighting with something similar for the Cassandra hosts last week
[01:37:01] YuviPanda: here?
[01:37:19] (03PS1) 10Faidon Liambotis: Revert "monitoring: add config class" [puppet] - 10https://gerrit.wikimedia.org/r/178392
[01:37:23] paravoid: yes
[01:37:27] see ^
[01:37:36] do you rely on monitoring::configuration::group in labs?
[01:37:44] paravoid: nope
[01:37:54] ok, good
[01:38:03] monitoring::* isn’t used there at all
[01:38:13] config comes from somewhere else (shinkengen)
[01:38:16] paravoid: yes, hardware raid. But I still need to partition don't I?
[01:38:57] yes
[01:39:57] I set up two hardware raids, so presumably partman sees two drives, /dev/sda and /dev/sdb
[01:40:06] (03PS2) 10Faidon Liambotis: partman: remove virt-raid10.cfg, unused [puppet] - 10https://gerrit.wikimedia.org/r/178390 (owner: 10Andrew Bogott)
[01:40:16] (03CR) 10Faidon Liambotis: [C: 032 V: 032] partman: remove virt-raid10.cfg, unused [puppet] - 10https://gerrit.wikimedia.org/r/178390 (owner: 10Andrew Bogott)
[01:40:44] (03PS2) 10Faidon Liambotis: partman: remove reference to Tampa virt servers [puppet] - 10https://gerrit.wikimedia.org/r/178391 (owner: 10Andrew Bogott)
[01:40:52] (03PS3) 10Faidon Liambotis: partman: remove reference to Tampa virt servers [puppet] - 10https://gerrit.wikimedia.org/r/178391 (owner: 10Andrew Bogott)
[01:41:00] (03CR) 10Faidon Liambotis: [C: 032 V: 032] partman: remove reference to Tampa virt servers [puppet] - 10https://gerrit.wikimedia.org/r/178391 (owner: 10Andrew Bogott)
[01:41:11] (03PS2) 10Faidon Liambotis: Revert "monitoring: add config class" [puppet] - 10https://gerrit.wikimedia.org/r/178392
[01:41:27] (03CR) 10Faidon Liambotis: [C: 032] Revert "monitoring: add config class" [puppet] - 10https://gerrit.wikimedia.org/r/178392 (owner: 10Faidon Liambotis)
[01:42:01] (03PS1) 10Ori.livneh: wmflib: add os_version() [puppet] - 10https://gerrit.wikimedia.org/r/178394
[01:42:07] paravoid: ^
[01:42:41] (03CR) 10Ori.livneh: "If this is merged, I will convert ubuntu_version() calls to os_version() in a separate patch." [puppet] - 10https://gerrit.wikimedia.org/r/178394 (owner: 10Ori.livneh)
[01:45:20] hamm, haha
[01:46:36] (03CR) 10Aaron Schulz: [C: 031] Set $wgAjaxEditStash to false while API cluster is on Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178350 (owner: 10Ori.livneh)
[01:47:02] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[01:47:04] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[01:47:37] (03PS2) 10Ori.livneh: Set $wgAjaxEditStash to false while API cluster is on Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178350
[01:47:43] ori: this is awesome
[01:47:45] thanks man
[01:47:52] np!
[01:48:06] (03CR) 10Ori.livneh: [C: 032] Set $wgAjaxEditStash to false while API cluster is on Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178350 (owner: 10Ori.livneh)
[01:48:14] (03Merged) 10jenkins-bot: Set $wgAjaxEditStash to false while API cluster is on Zend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178350 (owner: 10Ori.livneh)
[01:48:16] we'll need requires_os too btw
[01:48:19] but I can amend :)
[01:48:57] ah, cool
[01:49:00] !log ori Synchronized wmf-config/InitialiseSettings.php: Set $wgAjaxEditStash to false while API cluster is on Zend (duration: 00m 06s)
[01:49:07] Logged the message, Master
[01:49:52] thanks a bunch
[01:50:56] (03PS1) 10Andrew Bogott: Simplify the virt-hp.cfg partman recipe. [puppet] - 10https://gerrit.wikimedia.org/r/178398
[01:52:27] so far the only things that break under Debian are puppet manifest bugs
[01:52:34] real bugs
[01:52:36] so this is nice :)
[01:53:21] oh and the fact that newer facter ships with a "gid" fact that clashes with our $gid assignment in realm.pp :(
[01:53:26] chasemp: ^ fwiw
[01:54:15] paravoid: that one sounds familiar :) https://gerrit.wikimedia.org/r/#/c/176671/1/manifests/realm.pp
[01:54:59] (03CR) 10Andrew Bogott: [C: 032] Simplify the virt-hp.cfg partman recipe. [puppet] - 10https://gerrit.wikimedia.org/r/178398 (owner: 10Andrew Bogott)
[01:54:59] how about these
[01:55:05] (03PS1) 10Faidon Liambotis: sudo: ensure Package['sudo'] is installed [puppet] - 10https://gerrit.wikimedia.org/r/178399
[01:55:07] (03PS1) 10Faidon Liambotis: nrpe: do not install nagios-plugins-extra [puppet] - 10https://gerrit.wikimedia.org/r/178400
[01:55:09] (03PS1) 10Faidon Liambotis: certs: add missing dep & make apparmor optional [puppet] - 10https://gerrit.wikimedia.org/r/178401
[01:55:11] (03PS1) 10Faidon Liambotis: ssh: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178402
[01:55:13] (03PS1) 10Faidon Liambotis: salt: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178403
[01:55:15] (03PS1) 10Faidon Liambotis: ganglia: remove legacy gmond compatibility stanzas [puppet] - 10https://gerrit.wikimedia.org/r/178404
[01:55:17] (03PS1) 10Faidon Liambotis: ganglia: drop the "gmond" Service index [puppet] - 10https://gerrit.wikimedia.org/r/178405
[01:56:22] yeah, I've just been overlooking the field of red on my labs instance
[01:56:28] Happy to have you fix them properly!
[01:56:43] https://gerrit.wikimedia.org/r/178392 above was a jessie fix too
[01:58:57] (03PS2) 10Faidon Liambotis: ganglia: remove legacy gmond compatibility stanzas [puppet] - 10https://gerrit.wikimedia.org/r/178404
[01:58:59] (03PS2) 10Faidon Liambotis: ganglia: drop the "gmond" Service index [puppet] - 10https://gerrit.wikimedia.org/r/178405
[01:59:01] (03PS2) 10Faidon Liambotis: ssh: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178402
[01:59:02] let's see what I'll break at 4am at night
[01:59:03] (03PS2) 10Faidon Liambotis: salt: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178403
[01:59:05] (03PS2) 10Faidon Liambotis: nrpe: do not install nagios-plugins-extra [puppet] - 10https://gerrit.wikimedia.org/r/178400
[01:59:07] (03PS2) 10Faidon Liambotis: certs: add missing dep & make apparmor optional [puppet] - 10https://gerrit.wikimedia.org/r/178401
[01:59:09] (03PS2) 10Faidon Liambotis: sudo: ensure Package['sudo'] is installed [puppet] - 10https://gerrit.wikimedia.org/r/178399
[01:59:13] anyone up for reviewing?
[01:59:31] if not, I think I'll call it a night :)
[01:59:48] I'm happy to review, but you should probably all it a night anyway
[01:59:57] heh I guess that's ture
[02:00:52] (03CR) 10Andrew Bogott: [C: 031] sudo: ensure Package['sudo'] is installed [puppet] - 10https://gerrit.wikimedia.org/r/178399 (owner: 10Faidon Liambotis)
[02:01:54] (03CR) 10Andrew Bogott: [C: 031] "Win!" [puppet] - 10https://gerrit.wikimedia.org/r/178400 (owner: 10Faidon Liambotis)
[02:04:53] (03CR) 10Andrew Bogott: [C: 031] "Yep, I ran into this same debian vs. apparmor issue in labs." [puppet] - 10https://gerrit.wikimedia.org/r/178401 (owner: 10Faidon Liambotis)
[02:06:46] paravoid: I'll go back and merge some of these +1s after I get further in the series. It's morning here so I'm around to babysit the patches.
[02:07:05] (03CR) 10Andrew Bogott: [C: 031] ssh: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178402 (owner: 10Faidon Liambotis)
[02:07:52] (03CR) 10Andrew Bogott: [C: 031] salt: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178403 (owner: 10Faidon Liambotis)
[02:10:28] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail
[02:10:58] (03CR) 10Andrew Bogott: [C: 032] sudo: ensure Package['sudo'] is installed [puppet] - 10https://gerrit.wikimedia.org/r/178399 (owner: 10Faidon Liambotis)
[02:12:42] dammit! Install of virt1011 is back to failing on dhcp. wtf
[02:13:17] (03CR) 10Andrew Bogott: [C: 032] nrpe: do not install nagios-plugins-extra [puppet] - 10https://gerrit.wikimedia.org/r/178400 (owner: 10Faidon Liambotis)
[02:14:41] (03PS2) 10Faidon Liambotis: wmflib: add os_version() & requires_os() [puppet] - 10https://gerrit.wikimedia.org/r/178394 (owner: 10Ori.livneh)
[02:15:01] ori: ^
[02:15:13] ori: I'm unsure whether keeping the variadic option makes sense
[02:16:21] andrewbogott: oh just saw -- thanks!
[02:16:44] andrewbogott: btw, I split the ssh & salt patches with the intention of distancing them a bit
[02:17:03] if they're broken and they end up killing the service
[02:17:07] !log l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
[02:17:10] let's not kill ssh & salt at the same time :P
[02:17:12] !log LocalisationUpdate completed (1.25wmf10) at 2014-12-09 02:17:11+00:00
[02:17:13] Logged the message, Master
[02:17:15] Logged the message, Master
[02:18:14] paravoid: ok, I'll wait a good long while between those merges
[02:18:22] 5-10' should be enough
[02:18:31] icinga will complain if we lose ssh
[02:21:43] (03CR) 10Faidon Liambotis: [C: 04-2] "Do not merge yet. I want to explore the possibility of fixing our Ubuntu repository components first." [puppet] - 10https://gerrit.wikimedia.org/r/178167 (owner: 10Faidon Liambotis)
[02:21:49] (03CR) 10Faidon Liambotis: [C: 04-2] "Do not merge yet. I want to explore the possibility of fixing our Ubuntu repository components first." [puppet] - 10https://gerrit.wikimedia.org/r/178166 (owner: 10Faidon Liambotis)
[02:21:54] (03CR) 10Andrew Bogott: [C: 032] certs: add missing dep & make apparmor optional [puppet] - 10https://gerrit.wikimedia.org/r/178401 (owner: 10Faidon Liambotis)
[02:22:04] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:22:18] ok, off for the night
[02:22:20] thanks a lot andrewbogott
[02:22:25] and ori
[02:23:31] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
[02:23:35] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-09 02:23:35+00:00
[02:23:35] Logged the message, Master
[02:23:38] Logged the message, Master
[02:25:27] (03CR) 10Faidon Liambotis: [C: 031] "Wow -- thanks a lot for following up Bryan!" [puppet] - 10https://gerrit.wikimedia.org/r/177432 (owner: 10BryanDavis)
[02:26:13] (03Abandoned) 10Faidon Liambotis: Add Twitter account to Varnish's error page [puppet] - 10https://gerrit.wikimedia.org/r/97190 (https://bugzilla.wikimedia.org/20079) (owner: 10Faidon Liambotis)
[02:28:20] (03CR) 10Andrew Bogott: [C: 032] ssh: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178402 (owner: 10Faidon Liambotis)
[02:31:19] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: puppet fail
[02:36:46] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:45:37] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Puppet has 1 failures
[02:50:05] (03PS1) 10Andrew Bogott: Define package['python-statsd'] in ganglia_reporter. [puppet] - 10https://gerrit.wikimedia.org/r/178413
[02:51:37] PROBLEM - Host d-i-test is DOWN: CRITICAL - Plugin timed out after 15 seconds
[02:51:41] (03CR) 10Andrew Bogott: [C: 032] Define package['python-statsd'] in ganglia_reporter. [puppet] - 10https://gerrit.wikimedia.org/r/178413 (owner: 10Andrew Bogott)
[02:53:49] (03PS1) 10Andrew Bogott: Revert "Define package['python-statsd'] in ganglia_reporter." [puppet] - 10https://gerrit.wikimedia.org/r/178417
[02:54:50] (03CR) 10Andrew Bogott: [C: 032] Revert "Define package['python-statsd'] in ganglia_reporter." [puppet] - 10https://gerrit.wikimedia.org/r/178417 (owner: 10Andrew Bogott)
[02:55:00] ugh, dumb mistake
[02:56:34] (03PS1) 10Andrew Bogott: Define package['python-statsd'] in ganglia_reporter. [puppet] - 10https://gerrit.wikimedia.org/r/178418
[02:58:19] (03CR) 10Andrew Bogott: [C: 032] Define package['python-statsd'] in ganglia_reporter. [puppet] - 10https://gerrit.wikimedia.org/r/178418 (owner: 10Andrew Bogott)
[03:09:06] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:12:12] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:16:29] (03CR) 10Springle: [C: 04-1] "We should pass a wfGetDB(DB_SLAVE, 'vslow') database object to SiteStatsInit before enabling this job." [puppet] - 10https://gerrit.wikimedia.org/r/178170 (owner: 10Nemo bis)
[03:20:07] (03Abandoned) 10Springle: ocg log fills up faster than daily cycle [puppet] - 10https://gerrit.wikimedia.org/r/168536 (owner: 10Springle)
[03:26:09] (03PS1) 10Catrope: Expose citoid through misc-web [puppet] - 10https://gerrit.wikimedia.org/r/178419
[03:26:26] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:26:57] Hey opsen, who is familiar with misc-web? Specifically, who should I talk to about https://gerrit.wikimedia.org/r/178419 ? I have no confidence that what I did will even work
[03:27:03] (03PS1) 10Springle: depool db1010 for upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178421
[03:28:36] (03CR) 10Catrope: "I have no idea if I did this correctly. Someone who knows things about misc-web should look at this." [puppet] - 10https://gerrit.wikimedia.org/r/178419 (owner: 10Catrope)
[03:32:21] (03CR) 10Springle: [C: 032] depool db1010 for upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178421 (owner: 10Springle)
[03:32:31] (03Merged) 10jenkins-bot: depool db1010 for upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178421 (owner: 10Springle)
[03:33:41] !log springle Synchronized wmf-config/db-eqiad.php: depool db1010 (duration: 00m 08s)
[03:33:46] Logged the message, Master
[04:01:27] (03PS1) 10Springle: upgrade db1010 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178426
[04:04:05] (03CR) 10Springle: [C: 032] upgrade db1010 to trusty and mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/178426 (owner: 10Springle)
[04:04:11] (03PS1) 10Andrew Bogott: Don't install sudo on labs. We use sudo-ldap instead. [puppet] - 10https://gerrit.wikimedia.org/r/178427
[04:05:57] (03CR) 10Andrew Bogott: [C: 032] "i welcome a more elegant solution to this. But for now, unbreaking labs!" [puppet] - 10https://gerrit.wikimedia.org/r/178427 (owner: 10Andrew Bogott)
[04:14:21] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Puppet has 1 failures
[04:18:47] !log upgrade db1010 trusty
[04:18:51] Logged the message, Master
[04:27:31] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 9 04:27:31 UTC 2014 (duration 27m 30s)
[04:27:36] Logged the message, Master
[04:29:49] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:39:06] ori: are you up? I have a puzzle with require_package()
[04:39:17] Error: Failed to apply catalog: Could not find dependency Package[python-statsd] for File[/usr/local/bin/swift-dispersion-stats] at /etc/puppet/manifests/role/swift.pp:79
[04:39:21] of course i'm up
[04:39:37] But as you can see in that file, I have a require_package('python-statsd') in that class
[04:39:43] am I misunderstanding how that is supposed to work?
[04:40:00] give me a sec, finishing up something else
[04:40:02] oh, only 8 PM in san francisco… my timezone awareness is not so good right now :)
[04:40:05] sure, np.
[04:44:18] andrewbogott: well, you don't need to require => Package['python-statsd'] anywhere in that class, because require_package makes the package a prerequisite for all resources in the class
[04:44:35] andrewbogott: so the easy fix is to just remove them. OTOH, it shouldn't be an error, so there's a bug there
[04:44:47] ori: it makes it a prereq /and/ it defines it, right?
[04:44:48] but removing the require will solve the problem on your end, and i can worry about the bug
[04:44:51] yes
[04:45:00] ok. Would you like me to log something about this?
[04:45:10] nah, i'm going to figure it out now
[04:45:17] or fail to, and then log :)
[04:45:20] ok. Thank you!
[04:45:26] np
[04:47:20] andrewbogott: actually, would you mind leaving in the 'requires' for a few minutes so we can test a fix I'm cooking up?
[04:47:26] sure
[04:47:37] the failure is on ms-fe1001
[05:08:33] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:11:54] ori: I'm about to go to lunch -- back in an hour or so. It's ok if ms-fe1001 puppet stays broken in the meantime.
[05:13:52] thanks
[05:13:58] (03PS1) 10Ori.livneh: Fix require_package() [puppet] - 10https://gerrit.wikimedia.org/r/178431
[05:16:08] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]BR
[05:16:32] oops, I'm still here :) Want me to merge that and test?
[05:26:02] andrewbogott: sure, go for it
[05:28:32] andrewbogott: you're probably gone, so i'll give it a shot
[05:28:43] (03CR) 10Ori.livneh: [C: 032] Fix require_package() [puppet] - 10https://gerrit.wikimedia.org/r/178431 (owner: 10Ori.livneh)
[05:29:12] hi _joe_
[05:31:43] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail
[05:42:57] (03PS1) 10Ori.livneh: require_package(): try harder to evaluate resource [puppet] - 10https://gerrit.wikimedia.org/r/178434
[05:43:28] (03CR) 10Ori.livneh: [C: 032 V: 032] require_package(): try harder to evaluate resource [puppet] - 10https://gerrit.wikimedia.org/r/178434 (owner: 10Ori.livneh)
[05:46:19] andrewbogott: it's fine now
[05:46:31] puppet ran successfully on both ms-fe2001 and ms-fe1001
[05:46:54] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[05:47:03] it's your call whether to remote those 'requires' now -- they're superfluous but they won't hurt any more.
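[Editor's note: the require_package() pattern debugged above can be sketched as follows. This is a hedged illustration, not the actual role/swift.pp code; the class name and file attributes are made up.]

```puppet
# Sketch of the pattern discussed above. Per ori's explanation,
# require_package() both declares the package and makes it a
# prerequisite of every resource in the calling class, so an explicit
# "require => Package['python-statsd']" on individual resources is
# redundant -- and, before the fix in r/178431, it was what triggered
# the "Could not find dependency Package[python-statsd]" error.
class swift_stats_example {
    require_package('python-statsd')

    file { '/usr/local/bin/swift-dispersion-stats':
        ensure => present,
        mode   => '0555',
        # No require => Package['python-statsd'] needed here:
        # require_package() already orders the package before this file.
    }
}
```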
[05:49:32] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:51:51] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:54:14] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Puppet has 1 failures
[05:58:30] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:58:34] andrewbogott: ms1004 is still failing because it's running hardy, which doesn't have the package
[06:01:21] !log ori Synchronized php-1.25wmf10/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
[06:01:27] Logged the message, Master
[06:01:32] !log ori Synchronized php-1.25wmf11/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
[06:01:35] Logged the message, Master
[06:04:53] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:25:44] <_joe_> hi ori
[06:26:04] <_joe_> (you cheered my remote shell earlier :P)
[06:26:23] _joe_: morning. app servers are all on wm5 with tidy enabled and hhvm.stat_cache=true. everything looks wonderful.
[06:26:41] <_joe_> tidy enabled? since when? nice
[06:26:56] <_joe_> yesterday/this morning for you I guess
[06:27:04] i re-did it as a simple HNI extension for the one function we needed and enabled it on beta on friday
[06:27:20] there weren't any crashes, so i rolled it out to mw1081, and there were still no crashes, so i did the rest.
[06:27:20] <_joe_> yeah I've seen the code, simple indeed
[06:27:31] <_joe_> good
[06:27:57] it's very pleasant to look at the app servers in ganglia :)
[06:28:14] <_joe_> stat_cache = true looks good, if we see any bugs, we may rollback when we meet them
[06:28:53] yeah, there were a number of scaps / syncs today and no issues so far
[06:29:11] <_joe_> good!
[06:29:26] <_joe_> so, did you by any chance took tidy to the api cluster as well?
[06:29:31] <_joe_> and mw5?
[06:29:35] <_joe_> *wm
[06:29:36] yeah
[06:30:04] the only issue that is still unresolved is what to do about the bytecode repo. it will just keep growing, since there is no code in hhvm to prune old entries. restarting hhvm won't help, either, since it'll continue using the same repo. we have to actually stop the service, delete the repo, and start it back up.
[06:30:17] fortunately we probably won't have to do it more often than, say, once a week
[06:30:33] <_joe_> which is still annoying
[06:30:42] <_joe_> once a week seems quite often tbh
[06:31:05] yes, it's annoying
[06:31:22] <_joe_> I thought "once a month"
[06:31:36] <_joe_> we do have disk and memory after all :P
[06:31:49] <_joe_> but I'll monitor that
[06:31:54] yeah, we should watch it carefully
[06:32:05] because we could run out of space in /run all at once on all app servers
[06:32:06] <_joe_> did you test the performance on API?
[06:32:07] which won't be pretty
[06:32:14] <_joe_> no :P
[06:32:25] <_joe_> I'll add a check
[06:32:39] i did: yes, i updated the
[06:32:45] woops, bad paste
[06:32:46] i did: https://phabricator.wikimedia.org/T758#829551
[06:32:50] <_joe_> yeah sorry gust got up
[06:33:01] yeah no worries, i'm going to bed
[06:33:07] <_joe_> ok great
[06:33:16] <_joe_> should we do API then?
[06:33:16] good morning / night
[06:33:18] <_joe_> I guess so
[06:33:23] yeah, i think so
[06:33:29] <_joe_> you may wake up in 100% hhvm-land then
[06:33:32] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:35] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:43] puppetmaster o'clock
[06:33:44] <_joe_> good job, and good night
[06:33:47] thanks, good night
[06:33:48] <_joe_> yeah
[06:33:51] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:59] ori: Thanks, I'll have a look at ms1004
[06:34:09] np
[06:34:44] <_joe_> ms1004 is an old semi-abandoned box
[06:34:51] <_joe_> what's this about andrewbogott ?
[06:34:53] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:16] _joe_: oh, I actually don't know if there's a new problem, just, ori mentioned that puppet was failing there.
[06:35:19] python-statsd was added as a dependency for swift but it's not packaged for trusty
[06:35:19] If I can ignore it, I will!
[06:35:23] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:24] err, not packaged for hardy
[06:35:27] which is what ms1004 is on
[06:35:28] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:02] hardy has python 2.6.2, not sure if python-statsd would work
[06:37:21] <_joe_> doesn'tmatter
[06:38:05] splendid, I will ignore.
[06:38:11] <_joe_> I'll talk to filippo, but I think that box is just old
[06:38:24] <_joe_> andrewbogott: yeah ignore for now :)
[06:40:35] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:42:39] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:42:42] <_joe_> bbl,
[06:43:43] (03CR) 10Andrew Bogott: [C: 032] salt: remove provider => upstart from the Service [puppet] - 10https://gerrit.wikimedia.org/r/178403 (owner: 10Faidon Liambotis)
[06:47:29] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:31] morning
[06:47:38] ms1004 runs lucid, not hardy, iirc
[06:47:42] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:47:56] Ubuntu 10.04.4 LTS
[06:47:58] yup
[06:48:23] we have three lucid boxes in production right now
[06:48:33] ms1004, nescio (esams ntp) & sodium (mailman)
[06:48:42] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:42] and no hardy anymore, fortunately
[06:48:47] I think mchenry was the last one
[06:48:57] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:52:19] (03CR) 10Andrew Bogott: [C: 032] ganglia: remove legacy gmond compatibility stanzas [puppet] - 10https://gerrit.wikimedia.org/r/178404 (owner: 10Faidon Liambotis)
[06:53:51] (03PS3) 10Ori.livneh: wmflib: add os_version() & requires_os() [puppet] - 10https://gerrit.wikimedia.org/r/178394
[06:53:53] (03PS1) 10Ori.livneh: replace usage of requires_ubuntu / ubuntu_version with requires_os / os_version [puppet] - 10https://gerrit.wikimedia.org/r/178440
[06:53:57] (03PS1) 10Ori.livneh: Remove requires_ubuntu() / ubuntu_version() [puppet] - 10https://gerrit.wikimedia.org/r/178441
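[Editor's note: a hedged sketch of the wmflib helpers being added in r/178394 above. The exact predicate syntax is an assumption pieced together from this discussion; consult the patch itself for the real interface.]

```puppet
# os_version() evaluates a distribution/release predicate against the
# node's facts, replacing the Ubuntu-only ubuntu_version():
if os_version('ubuntu >= trusty || debian >= jessie') {
    # resources specific to newer distros go here
}

# requires_os() fails catalog compilation on non-matching distros.
# Note the full "<distribution> <operator> <release>" form: as noted
# in the exchange below, a bare requires_os('trusty') does not work.
requires_os('ubuntu >= trusty')
```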
[06:54:06] I was about to do this... :) [06:54:16] (03CR) 10Ori.livneh: [C: 031] wmflib: add os_version() & requires_os() [puppet] - 10https://gerrit.wikimedia.org/r/178394 (owner: 10Ori.livneh) [06:54:35] requires_os('trusty') [06:54:37] that won't work [06:54:39] afaik [06:54:42] oh, d'oh [06:55:11] i updated the code correctly, but not the docs [06:55:55] PROBLEM - puppet last run on gold is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:15] (03CR) 10Andrew Bogott: [C: 031] ganglia: drop the "gmond" Service index [puppet] - 10https://gerrit.wikimedia.org/r/178405 (owner: 10Faidon Liambotis) [06:56:18] wait [06:56:26] (03PS3) 10Faidon Liambotis: ganglia: drop the "gmond" Service alias [puppet] - 10https://gerrit.wikimedia.org/r/178405 [06:56:39] (03PS4) 10Faidon Liambotis: ganglia: drop the "gmond" Service alias [puppet] - 10https://gerrit.wikimedia.org/r/178405 [06:56:49] I have no clue why I wrote "index" instead of "alias" [06:57:01] (03CR) 10Faidon Liambotis: [C: 032] ganglia: drop the "gmond" Service alias [puppet] - 10https://gerrit.wikimedia.org/r/178405 (owner: 10Faidon Liambotis) [06:57:10] (03PS2) 10Ori.livneh: Remove requires_ubuntu() / ubuntu_version() [puppet] - 10https://gerrit.wikimedia.org/r/178441 [06:57:12] (03PS2) 10Ori.livneh: replace usage of requires_ubuntu / ubuntu_version with requires_os / os_version [puppet] - 10https://gerrit.wikimedia.org/r/178440 [06:57:14] (03PS4) 10Ori.livneh: wmflib: add os_version() & requires_os() [puppet] - 10https://gerrit.wikimedia.org/r/178394 [06:57:26] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:14] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:25] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:15] I wonder if we should allow failures of apt-get within puppet [06:59:28] returns => [0, 1] or what the syntax is [06:59:35] this is a bit annoying [06:59:52] the other idea is running 
apt-get from cron before we run puppet [07:00:30] (03PS5) 10Faidon Liambotis: wmflib: add os_version() & requires_os() [puppet] - 10https://gerrit.wikimedia.org/r/178394 (owner: 10Ori.livneh) [07:00:39] (03CR) 10Faidon Liambotis: [C: 032] "Thanks Ori!" [puppet] - 10https://gerrit.wikimedia.org/r/178394 (owner: 10Ori.livneh) [07:01:32] paravoid: on labs I increased the apt timeout and it reduced transient failures by quite a bit. [07:01:39] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures [07:01:55] (03PS3) 10Faidon Liambotis: replace usage of requires_ubuntu / ubuntu_version with requires_os / os_version [puppet] - 10https://gerrit.wikimedia.org/r/178440 (owner: 10Ori.livneh) [07:01:59] https://gerrit.wikimedia.org/r/#/c/178181/ [07:02:37] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures [07:02:55] (03CR) 10Faidon Liambotis: [C: 032] replace usage of requires_ubuntu / ubuntu_version with requires_os / os_version [puppet] - 10https://gerrit.wikimedia.org/r/178440 (owner: 10Ori.livneh) [07:02:57] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [07:02:58] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:22] (03PS3) 10Faidon Liambotis: Remove requires_ubuntu() / ubuntu_version() [puppet] - 10https://gerrit.wikimedia.org/r/178441 (owner: 10Ori.livneh) [07:03:45] modules/apache/manifests/mod.pp:class apache::mod::version { if ubuntu_version( '< 13.10') { apache::mod_conf { 've [07:03:48] modules/contint/manifests/packages.pp: if ubuntu_version( '< trusty') { [07:03:50] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:51] missed one? 
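The two mitigations floated above (tolerating apt-get's exit status inside puppet, and the labs timeout increase from r178181) could be sketched like this in a Puppet exec resource; this is illustrative only, the resource name and timeout value are made up, not taken from the actual manifests:

```puppet
# Hypothetical sketch of the idea discussed above: tolerate apt-get's
# transient exit status 1 instead of failing the whole puppet run,
# and give it a longer timeout (which reduced flapping on labs).
exec { 'apt-get_update':
  command => '/usr/bin/apt-get update',
  returns => [0, 1],  # accept success and a transient failure
  timeout => 600,     # generous timeout for slow mirrors
}
```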
[07:03:51] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 1 failures [07:03:56] well, two :) [07:04:01] PROBLEM - puppet last run on mc1013 is CRITICAL: CRITICAL: Puppet has 1 failures [07:04:52] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:05:05] RECOVERY - puppet last run on gold is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:05:33] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail [07:05:45] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [07:06:00] contint::packages, what a mess... [07:06:13] paravoid: labs is unhappy with that last… https://dpaste.de/CwY2 [07:06:15] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail [07:06:37] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: puppet fail [07:06:57] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail [07:07:25] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:07:27] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: puppet fail [07:07:27] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: puppet fail [07:07:35] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail [07:07:45] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:07:54] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: puppet fail [07:08:04] those prod failures are something to do with gmond, I'm looking now [07:08:04] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [07:08:04] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail [07:08:05] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [07:08:05] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail [07:08:05]
(03CR) 10Nemo bis: "I disagree that T76560 has anything to do with this changeset." [puppet] - 10https://gerrit.wikimedia.org/r/97190 (https://bugzilla.wikimedia.org/20079) (owner: 10Faidon Liambotis) [07:08:16] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [07:08:31] argh [07:08:36] that needs ruby 1.9 [07:08:39] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [07:08:56] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: puppet fail [07:08:56] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail [07:08:57] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail [07:08:57] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: puppet fail [07:08:59] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [07:09:25] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [07:09:31] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail [07:09:40] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [07:09:40] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: puppet fail [07:10:05] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: puppet fail [07:10:11] ah, submodules ungrepped [07:10:19] ? [07:10:25] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: puppet fail [07:10:25] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail [07:10:36] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: puppet fail [07:10:47] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail [07:11:01] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: puppet fail [07:11:06] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [07:11:06] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail [07:11:07] /etc/puppet/modules/varnishkafka/manifests/monitor.pp [07:11:14] a submodule, still contains a reference to gmond. 
[07:11:15] oh ffs [07:11:49] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:12:06] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:12:16] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail [07:12:26] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: puppet fail [07:12:39] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [07:12:39] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [07:13:06] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [07:13:08] ruby 1.8 has no optional groups and no named groups [07:13:09] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:13:17] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:13:30] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:13:47] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [07:13:47] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: puppet fail [07:13:48] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [07:13:59] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [07:13:59] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: puppet fail [07:14:09] <_joe_> hi :) [07:14:11] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [07:14:13] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail [07:14:18] hi [07:14:22] ignore these for now [07:14:31] <_joe_> yeah I figured that [07:14:37] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [07:14:37] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: puppet fail [07:14:37] PROBLEM - 
puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [07:14:37] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: puppet fail [07:14:56] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: puppet fail [07:14:56] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail [07:15:17] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: puppet fail [07:15:18] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: puppet fail [07:15:27] (03PS1) 10Andrew Bogott: Replace gmond with ganglia-monitor in this submodule. [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/178443 [07:15:47] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: puppet fail [07:15:49] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail [07:16:10] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: puppet fail [07:16:17] (03CR) 10Andrew Bogott: [C: 032] Replace gmond with ganglia-monitor in this submodule. [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/178443 (owner: 10Andrew Bogott) [07:16:29] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: puppet fail [07:16:37] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: puppet fail [07:16:39] <_joe_> meh submodules [07:16:46] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail [07:16:47] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: puppet fail [07:16:48] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail [07:17:22] paravoid: pending patches on palladium? [07:17:28] (03PS1) 10Andrew Bogott: Submodule commit [puppet] - 10https://gerrit.wikimedia.org/r/178444 [07:17:29] yes [07:17:34] sec [07:17:36] don't merge just yet :) [07:17:36] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: puppet fail [07:17:46] (03CR) 10Andrew Bogott: [V: 032] Replace gmond with ganglia-monitor in this submodule. 
[puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/178443 (owner: 10Andrew Bogott) [07:17:52] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [07:17:55] ok [07:17:59] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: puppet fail [07:18:15] (03CR) 10jenkins-bot: [V: 04-1] Submodule commit [puppet] - 10https://gerrit.wikimedia.org/r/178444 (owner: 10Andrew Bogott) [07:18:18] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [07:18:27] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: puppet fail [07:18:36] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 0 below the confidence bounds [07:18:47] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: puppet fail [07:18:54] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail [07:19:11] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0 [07:19:34] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 0 below the confidence bounds [07:19:47] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail [07:19:47] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: puppet fail [07:19:48] (03PS1) 10Faidon Liambotis: Replace ubuntu_version callsites that were missed [puppet] - 10https://gerrit.wikimedia.org/r/178445 [07:19:50] (03PS1) 10Faidon Liambotis: Fix os_version() to work with Ruby 1.8 [puppet] - 10https://gerrit.wikimedia.org/r/178446 [07:19:57] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: puppet fail [07:20:09] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: puppet fail [07:20:09] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: puppet fail [07:20:09] _joe_: https://gerrit.wikimedia.org/r/#/c/178446/ look okay to you? 
[07:20:46] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail [07:20:58] there's probably a more elegant way to do this [07:21:04] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: puppet fail [07:21:06] .match()[N] I think [07:21:08] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: puppet fail [07:21:22] (03CR) 10Faidon Liambotis: [C: 032] Replace ubuntu_version callsites that were missed [puppet] - 10https://gerrit.wikimedia.org/r/178445 (owner: 10Faidon Liambotis) [07:21:45] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: puppet fail [07:21:45] <_joe_> paravoid: probably yes, but it's good I think [07:21:53] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail [07:21:54] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail [07:21:54] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail [07:21:58] (03CR) 10Faidon Liambotis: [C: 032] Fix os_version() to work with Ruby 1.8 [puppet] - 10https://gerrit.wikimedia.org/r/178446 (owner: 10Faidon Liambotis) [07:22:05] andrewbogott: is labs happy with that change? [07:22:20] <_joe_> paravoid: I thought I would do all the requires_ubuntu and ubuntu_version [07:22:23] (03PS2) 10Faidon Liambotis: Submodule commit [puppet] - 10https://gerrit.wikimedia.org/r/178444 (owner: 10Andrew Bogott) [07:22:26] checking... 
[07:22:39] (03PS3) 10Faidon Liambotis: Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/178444 (owner: 10Andrew Bogott) [07:22:56] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail [07:22:59] ah, thanks, beat me to it [07:23:09] <_joe_> andrewbogott: now that we're in semi-compatible timezones (not clear to me what time it is there, though) I would take a look at the labs self-hosted puppetmaster on trusty [07:23:11] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [07:23:11] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [07:23:11] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [07:23:19] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail [07:23:28] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: puppet fail [07:23:38] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail [07:23:38] <_joe_> andrewbogott: can you name to me one puppetmaster on trusty there?
[07:23:39] paravoid: at least one labs box is happier :) [07:23:48] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [07:23:52] (03CR) 10Faidon Liambotis: [C: 032] Update varnishkafka submodule [puppet] - 10https://gerrit.wikimedia.org/r/178444 (owner: 10Andrew Bogott) [07:23:59] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: puppet fail [07:24:07] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: puppet fail [07:24:27] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail [07:24:27] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail [07:24:38] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [07:24:41] _joe_: looking… if not I can make a new one [07:24:49] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [07:24:53] <_joe_> ok, if not, I can make one :) [07:24:58] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [07:25:08] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [07:25:39] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: puppet fail [07:26:09] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: puppet fail [07:26:20] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [07:27:21] _joe_: I reverted a patch of your [07:27:21] s [07:27:31] (03PS4) 10Faidon Liambotis: Remove requires_ubuntu() / ubuntu_version() [puppet] - 10https://gerrit.wikimedia.org/r/178441 (owner: 10Ori.livneh) [07:27:42] <_joe_> paravoid: uh which one? 
[07:27:58] see 5bbe4ba0038e94a12f3d0d3cdafdc99612be400f [07:28:16] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [07:28:27] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [07:28:37] (03CR) 10Faidon Liambotis: [C: 032] Remove requires_ubuntu() / ubuntu_version() [puppet] - 10https://gerrit.wikimedia.org/r/178441 (owner: 10Ori.livneh) [07:29:15] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:29:16] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [07:29:32] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [07:29:47] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:29:48] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:29:48] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:30:03] <_joe_> paravoid: yeah I noticed that on friday btw, I was thinking of a solution [07:30:26] _joe_: testlabs-hieratests has a very minimalist demonstration of the issue [07:30:32] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [07:30:36] <_joe_> andrewbogott: thanks [07:30:48] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [07:31:12] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:31:42] <_joe_> paravoid: thanks for pointing that out [07:31:55] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: puppet fail [07:31:55] 
RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:32:05] (03PS1) 10Faidon Liambotis: realm: remove $gid = '500' [puppet] - 10https://gerrit.wikimedia.org/r/178448 [07:32:14] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:32:19] also Yuvi broke something with graphite I think [07:32:22] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [07:32:22] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:32:24] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:32:30] all the HHVM hosts are complaining about their check_graphite checks :/ [07:32:36] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [07:32:42] and there's a corresponding SAL entry for graphite at about that time [07:33:06] <_joe_> mh [07:33:14] Unhandled Problems [07:33:14] 368 Active [07:33:16] :( [07:33:19] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [07:33:20] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:33:20] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:33:20] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: puppet fail [07:33:20] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:33:22] <_joe_> ffs it's 13 hours [07:33:34] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: puppet fail [07:33:37] I know [07:33:38] sorry :) [07:33:39] <_joe_> somebody removed the metrics? 
[07:33:53] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: puppet fail [07:33:53] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:33:57] (03CR) 10Andrew Bogott: [C: 031] realm: remove $gid = '500' [puppet] - 10https://gerrit.wikimedia.org/r/178448 (owner: 10Faidon Liambotis) [07:34:12] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: puppet fail [07:34:22] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [07:34:27] Dec 9 07:25:57 mw1251 puppet-agent[41161]: Could not retrieve catalog from remote server: Error 400 on SERVER: custom functions must be called with a single array that contains the arguments. For example, function_example([1]) instead of function_example(1) at /etc/puppet/modules/mediawiki/manifests/hhvm.pp:6 on node mw1251.eqiad.wmnet [07:34:32] wth [07:34:38] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:34:48] <_joe_> paravoid: lemme take a look [07:34:49] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: puppet fail [07:35:03] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: puppet fail [07:35:03] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: puppet fail [07:35:13] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: puppet fail [07:35:22] Heh, it is getting smoky in here! 
[07:35:23] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: puppet fail [07:35:39] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:35:50] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [07:35:58] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [07:36:22] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [07:36:22] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: puppet fail [07:36:23] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: puppet fail [07:36:23] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [07:36:29] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: puppet fail [07:36:30] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: puppet fail [07:36:30] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: puppet fail [07:36:30] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: puppet fail [07:36:30] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:36:30] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:36:43] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:36:45] <_joe_> paravoid: that was the case for a long time (functions must be called with array args) [07:36:53] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [07:36:54] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: puppet fail [07:36:54] <_joe_> but it wasn't enforced AFAIR [07:36:54] PROBLEM - puppet last run on 
mw1049 is CRITICAL: CRITICAL: puppet fail [07:36:54] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: puppet fail [07:36:54] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [07:37:04] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: puppet fail [07:37:04] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: puppet fail [07:37:15] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:37:27] (03Abandoned) 10Andrew Bogott: -- DRAFT -- [puppet] - 10https://gerrit.wikimedia.org/r/176671 (owner: 10Andrew Bogott) [07:37:37] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:37:50] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: puppet fail [07:37:50] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:37:50] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:37:50] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: puppet fail [07:37:50] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: puppet fail [07:37:50] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:37:59] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [07:38:00] time to kill the bot [07:38:19] <_joe_> paravoid: "os_version" has :arity => -2 [07:38:21] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: puppet fail [07:38:21] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: puppet fail [07:38:21] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: puppet fail [07:38:21] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: puppet fail [07:38:21] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: 
puppet fail [07:38:21] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: puppet fail [07:38:22] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: puppet fail [07:38:22] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: puppet fail [07:38:22] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:38:23] gah, wrong bot [07:38:23] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: puppet fail [07:38:28] <_joe_> but you call it with one parameter [07:38:29] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: puppet fail [07:38:29] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: puppet fail [07:38:29] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: puppet fail [07:38:38] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: puppet fail [07:38:39] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [07:38:39] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:38:39] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: puppet fail [07:38:48] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: puppet fail [07:38:48] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: puppet fail [07:38:48] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: puppet fail [07:38:48] PROBLEM - puppet last run on mw1043 is CRITICAL: CRITICAL: puppet fail [07:38:48] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:38:48] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:38:48] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [07:38:57] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: puppet 
fail [07:39:14] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:40:40] <_joe_> paravoid: I can't reproduce the problem locally [07:41:15] <_joe_> ok no, I did it [07:42:21] <_joe_> it's because the function is not an rvalue I guess? [07:43:19] no [07:43:23] i should have caught that [07:43:37] <_joe_> I did [07:43:44] <_joe_> function_os_version([clauses]) [07:43:50] yes [07:43:57] <_joe_> also, arities are completely bogus in those functions [07:43:59] bleh [07:43:59] my bad [07:44:06] <_joe_> why -2? [07:44:10] they're not; -2 means "one or more" [07:44:12] (03PS1) 10Faidon Liambotis: os_version/requires_os: drop variadic arguments [puppet] - 10https://gerrit.wikimedia.org/r/178449 [07:44:14] what do you think of [07:44:17] that [07:44:20] because how would you represent "zero or more"? [07:44:22] <_joe_> requires_os should be arity 1 [07:44:28] that's -1 [07:44:53] ori: what do you think of that? [07:45:17] looking [07:45:27] <_joe_> ori: yeah we have no reason to have more than one argument in requires_os, given how it's done now [07:45:42] (03CR) 10Ori.livneh: [C: 031] os_version/requires_os: drop variadic arguments [puppet] - 10https://gerrit.wikimedia.org/r/178449 (owner: 10Faidon Liambotis) [07:45:42] multiple arguments for requires_os() was my doing [07:45:54] I just thought that it should be consistent with os_version() [07:46:08] I can imagine someone removing an if and replacing os_version with requires_os and getting bitten by it [07:46:23] <_joe_> paravoid: requires_os could use multiple arguments if we choose how to treat them [07:46:38] what do you mean?
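The `:arity` convention paravoid describes matches Puppet's parser-function API: a non-negative value demands exactly that many arguments, a negative value n means "at least -n - 1", so -2 is "one or more" and -1 is "zero or more". A tiny Ruby illustration of that rule; this is a standalone sketch of the convention, not Puppet's actual code:

```ruby
# Puppet parser-function :arity convention:
#   arity >= 0  -> exactly that many arguments
#   arity <  0  -> at least (-arity - 1) arguments
# So :arity => -2 means "one or more", :arity => -1 "zero or more".
def arity_ok?(arity, nargs)
  arity >= 0 ? nargs == arity : nargs >= -arity - 1
end
```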
[07:46:49] having those functions take exactly one argument is the better idea [07:46:49] the intent was to pass them to os_version(), which was joining them with || [07:46:56] <_joe_> paravoid: I can imagine requires_os 'ubuntu >= trusty', 'debian >= jessie' [07:47:16] that should be requires_os('ubuntu >= trusty || debian >= jessie') [07:47:22] or the former but not the latter [07:47:24] but not both [07:47:29] yes, agreed [07:47:30] that's too DWIM [07:47:30] <_joe_> yeah [07:47:37] <_joe_> I prefer requires_os('ubuntu >= trusty || debian >= jessie') [07:47:45] me too, hence the above commit :) [07:48:02] i +1'd [07:48:04] (03CR) 10Giuseppe Lavagetto: [C: 031] os_version/requires_os: drop variadic arguments [puppet] - 10https://gerrit.wikimedia.org/r/178449 (owner: 10Faidon Liambotis) [07:48:22] neither of you caught that I didn't update the docs :) [07:48:45] <_joe_> but they're correct I think? [07:48:53] <_joe_> the ones for requires_os [07:50:07] (03PS2) 10Faidon Liambotis: os_version/requires_os: drop support for variadic args [puppet] - 10https://gerrit.wikimedia.org/r/178449 [07:50:27] (03CR) 10Faidon Liambotis: [C: 032] os_version/requires_os: drop support for variadic args [puppet] - 10https://gerrit.wikimedia.org/r/178449 (owner: 10Faidon Liambotis) [07:50:53] it's funny, I was merging stuff until what, 3am? [07:50:58] and nothing happened [07:51:14] then half of the merges I did since I woke up were broken in some way [07:51:20] <_joe_> it's my presence clearly [07:51:21] clearly not a morning person I guess [07:51:26] <_joe_> or too few coffees [07:51:31] <_joe_> well [07:51:39] <_joe_> if you do commit until 3 am [07:51:49] I've authored/merged what, two dozen commits the past 24h [07:51:53] <_joe_> you can expect yourself to be pretty tired in the morning [07:52:24] oh for the love of god [07:52:30] now gerrit is playing tricks [07:52:33] Submitted, Merge Pending [07:52:52] doesn't that mean it's blocked on a dependency?
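The single-string form that won out above, `requires_os('ubuntu >= trusty || debian >= jessie')`, can be sketched roughly as follows. Everything in this block is illustrative and NOT the actual wmflib os_version() implementation; in particular, a real implementation needs proper release/codename ordering rather than plain string comparison (which only happens to work for the codenames used here):

```ruby
# Hypothetical sketch: one string argument, '||'-separated clauses,
# true if any clause matches the current OS and release.
def os_version?(spec, current_os, current_release)
  spec.split('||').any? do |clause|
    os, cmp, release = clause.strip.split(/\s+/, 3)
    next false unless os == current_os
    case cmp
    when '>=' then current_release >= release
    when '<'  then current_release <  release
    when '==' then current_release == release
    else false  # unsupported operator in this sketch
    end
  end
end
```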
[07:53:15] (03CR) 10Faidon Liambotis: [C: 032] realm: remove $gid = '500' [puppet] - 10https://gerrit.wikimedia.org/r/178448 (owner: 10Faidon Liambotis) [07:53:26] yeah... I guess coffee time it is [07:53:30] <_joe_> I don't get if graphite is down for good [07:53:33] <_joe_> I think so [07:53:35] iq halved [07:54:09] _joe_: gdash dashboards work? [07:54:16] as far as I can see they [07:54:18] <_joe_> yes [07:54:25] *work [07:54:25] <_joe_> so it's diamond maybe? [07:54:41] <_joe_> also, we've had a few more 503s [07:54:46] <_joe_> since some time ago [07:55:02] <_joe_> could be an hhvm server with issues, but how can I tell that? [07:59:27] hrm [07:59:36] requires_os is still broken with the same error [07:59:52] <_joe_> it's txstatsd [08:00:05] <_joe_> paravoid: [args] [08:00:08] no [08:00:13] args is already an array [08:00:24] <_joe_> mh [08:00:27] <_joe_> lemme check [08:00:34] may need to restart puppetmaster on palladium [08:00:38] <_joe_> 2014-12-09 07:59:43+0000 [-] Bad line: 'varnishkafka.cp1056.bits.varnishkafka.cp1056.bits_kafka.topics.webrequest_bits.partitions.5.unknown:False|g' [08:00:41] there's a weird thing with functions getting cached [08:00:42] <_joe_> ori: ^^ [08:00:48] <_joe_> txstatsd [08:00:55] wtf is that key [08:01:13] <_joe_> something very wrong I'd say [08:01:26] 'False' snuck past masquerading as a zero [08:01:39] _joe_: there are a few php fatals due to errors in our code [08:01:43] one in math, i fixed that earlier [08:01:45] a couple of other ones [08:01:58] <_joe_> which code sorry? [08:02:03] ori: good tip, thanks [08:02:05] mediawiki + extensions [08:02:08] that seemed to work [08:02:14] oh puppet... [08:02:20] <_joe_> hhvm-only fatals? [08:02:23] <_joe_> paravoid: :/ [08:02:32] ori: and btw, it's strontium too [08:02:34] _joe_: no, no connection to hhvm [08:02:42] <_joe_> oh ok [08:02:43] (if you ever need to do it too) [08:02:49] <_joe_> that's the reason of the 503s [08:02:58] paravoid: good tip!
:) [08:03:01] _joe_: yeah [08:03:17] <_joe_> so, it seems txstatsd works, diamond works, data do not get into graphite [08:03:28] <_joe_> and I have no idea what was done there exactly [08:03:40] I think it correlated it with a SAL entry [08:03:41] <_joe_> but it's only data coming from diamond it seems [08:03:52] <_joe_> yeah I've seen that [08:03:57] maybe Yuvi rm -rfed and permissions are all wonky? [08:05:02] <_joe_> the logs don't seem to agree [08:05:46] ok, i'm actually off. bye [08:05:48] <_joe_> ok, it was diamond on the servers that needed a restart [08:05:53] good night [08:05:55] <_joe_> ori: good night! [08:06:00] thanks again for your help ori [08:06:15] _joe_: godog also updated diamond at some point yesterday [08:06:26] <_joe_> !log restarting diamond on all appservers [08:06:32] Logged the message, Master [08:06:37] s/on all appservers// maybe? [08:07:02] <_joe_> paravoid: for now, it's the appservers, lemme check if it's needed somewhere else [08:07:33] <_joe_> paravoid: actually, it's not I guess [08:07:53] <_joe_> ms-be* are fine, for instance [08:09:21] <_joe_> the funny part is, the last error in the diamond log was [2014-12-06 12:54:01,096] [Thread-1] Failed to collect metrics [08:09:34] <_joe_> so at first I assumed diamond wasn't the problem here [08:12:59] <_joe_> !log depooling mw1115-1119 from the api pool, reimaging [08:13:03] Logged the message, Master [08:42:14] where should access requests be filed nowadays? [08:57:19] <_joe_> mlitn: RT [08:59:18] I'm thinking of renaming our "universe" suite to "backports", any objections? [08:59:32] <_joe_> +1 [08:59:42] and on that note, I'm thinking of creating a new component for all the packages we bring via reprepro updates [08:59:53] should I call it "autoupdates" maybe? [08:59:56] or "autoupdated"? 
[09:00:40] <_joe_> "autoupdates" sounds better [09:04:52] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [09:06:07] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [09:07:02] (03PS1) 10Giuseppe Lavagetto: hiera: use TLS 1 instead of TLS v 1.2, it fails on trusty hosts [puppet] - 10https://gerrit.wikimedia.org/r/178459 [09:07:42] godog: the python-statsd dependency breaks on our 3 lucid hosts [09:13:01] godog: also ms-be2014 puppet has been broken for 12 days now :) [09:13:02] paravoid: rm -rf’d what? [09:13:12] the restbase metrics? [09:13:18] yeah, but it wasn't it after all [09:13:42] ah, good :) [09:13:50] springle: around? [09:17:58] paravoid: ah, anything horribly broken? it might make sense to restrict the dependency to >= trusty [09:18:30] paravoid: ms-be2014 yeah it is the bcache machine, I didn't bother letting puppet pass if we don't know if we like it or not [09:19:46] ACKNOWLEDGEMENT - puppet last run on ms-be2014 is CRITICAL: CRITICAL: Puppet has 12 failures Filippo Giunchedi bcache on swift testing machine, pending benchmarks [09:27:15] PROBLEM - check if salt-minion is running on mw1115 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:27:15] PROBLEM - check if dhclient is running on mw1116 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:27:15] PROBLEM - check configured eth on mw1117 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:27:45] godog: mmhm, I don't know, it's kinda dangerous to not have a working puppet [09:27:52] I guess it does work, partially [09:27:59] PROBLEM - check configured eth on mw1118 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:28:00] PROBLEM - check if dhclient is running on mw1117 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[09:28:02] PROBLEM - check if salt-minion is running on mw1116 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:28:16] but I'm wondering e.g. if I switch apt to use "backports" instead of "universe" if it'll apply there [09:28:19] <_joe_> and I just re-scheduled downtime on all those hosts :/ [09:28:47] <_joe_> paravoid: if it's not failing compilation, it would [09:28:49] <_joe_> I guess [09:29:11] <_joe_> that is, if we do the right thing and make the first apt-get update depend on the apt config [09:29:33] we don't but that's not wrong [09:29:44] we run apt-get update in the first stage [09:30:07] <_joe_> apt config should'be run in the first stage as well? [09:30:16] which one? [09:30:21] it's not very easy, depending on what you want to do [09:30:25] RECOVERY - check if salt-minion is running on mw1115 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:30:26] RECOVERY - check if dhclient is running on mw1116 is OK: PROCS OK: 0 processes with command name dhclient [09:30:26] RECOVERY - check configured eth on mw1117 is OK: NRPE: Unable to read output [09:30:26] we have apt "configs" all over the tree [09:30:27] <_joe_> any apt-configuration [09:30:48] <_joe_> yeah I know, let's say at least the ones in base [09:31:07] RECOVERY - check if dhclient is running on mw1117 is OK: PROCS OK: 0 processes with command name dhclient [09:31:09] RECOVERY - check configured eth on mw1118 is OK: NRPE: Unable to read output [09:31:09] RECOVERY - check if salt-minion is running on mw1116 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:31:13] people have tried, it's not very trivial [09:31:19] probably not worthy of anyone's time :) [09:31:56] <_joe_> mh [09:32:16] (03PS2) 10Giuseppe Lavagetto: hiera: use TLS 1 instead of TLS v 1.2, it fails on trusty hosts [puppet] - 10https://gerrit.wikimedia.org/r/178459 [09:34:29] paravoid: yeah I agree in general we shouldn't have puppet failing, 
I'll take a quick look tho [09:35:13] (03CR) 10Giuseppe Lavagetto: [C: 032] "Tested to fix this for labs puppetmasters." [puppet] - 10https://gerrit.wikimedia.org/r/178459 (owner: 10Giuseppe Lavagetto) [09:36:15] <_joe_> andrewbogott: ^^ fixed [09:36:22] <_joe_> well, "fixed" [09:39:29] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: Puppet has 1 failures [09:41:36] _joe_: I want to file a new ticket on RT, but it tells me I don’t have access to create tickets in that queue [09:41:57] <_joe_> mlitn: create it in ops-request, we can move it later [09:42:14] alright thanks [09:43:45] <_joe_> !log repooling mw1115-mw1119 [09:45:25] <_joe_> !log depooled mw1120-1125 for reimaging [09:54:39] RECOVERY - puppet last run on db1038 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:21:33] (03CR) 10Dzahn: [C: 032] Attempt to fix double-encoding in old-bugzilla HTTPS redirects [puppet] - 10https://gerrit.wikimedia.org/r/178186 (owner: 10Gergő Tisza) [10:26:42] PROBLEM - nutcracker process on mw1123 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:26:43] PROBLEM - puppet last run on mw1122 is CRITICAL: CRITICAL: Puppet has 7 failures [10:26:46] (03CR) 10Dzahn: "thanks Gergő, attempt successful" [puppet] - 10https://gerrit.wikimedia.org/r/178186 (owner: 10Gergő Tisza) [10:27:24] PROBLEM - puppet last run on mw1123 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:06] PROBLEM - DPKG on mw1123 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[10:28:06] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 7 failures [10:28:44] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 7 failures [10:29:44] RECOVERY - nutcracker process on mw1123 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:31:25] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 7 failures [10:31:34] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: Puppet has 7 failures [10:36:29] YuviPanda: hey, you might know... [10:37:03] RECOVERY - DPKG on mw1123 is OK: All packages OK [10:37:10] YuviPanda: what do we usually do when we want to push puppet changes that we absolutely need on all labs VMs, even if they run a local puppetmaster? [10:37:12] so WMF once did use CACert https://wikitech.wikimedia.org/wiki/CAcert :) [10:37:23] " There are various team members who have access to the organizational account (brion & tomasz) and you should ask on IRC if you need a certificate signed. " :) [10:37:28] do we have a process for cherry-picking a fix and pushing it to puppetmaster-selfs? [10:37:42] paravoid: aaah, we… don’t actually. [10:37:48] meh [10:38:22] paravoid: we could hack something up with salt, I guess [10:38:35] PROBLEM - HHVM rendering on mw1123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:39:06] PROBLEM - HHVM rendering on mw1124 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:39:29] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [10:40:03] 130 instances with a local puppetmaster [10:40:05] *sigh* [10:41:37] RECOVERY - HHVM rendering on mw1123 is OK: HTTP OK: HTTP/1.1 200 OK - 66105 bytes in 0.293 second response time [10:41:53] RECOVERY - HHVM rendering on mw1124 is OK: HTTP OK: HTTP/1.1 200 OK - 66105 bytes in 0.272 second response time [10:42:12] <_joe_> paravoid: 130? 
[10:42:14] <_joe_> lol [10:42:24] PROBLEM - HHVM rendering on mw1125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:43:04] yes, although 42 of them use the deployment-prep puppetmaster [10:43:08] and 12 the integration-puppetmaster [10:43:25] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:43:55] <_joe_> !log repooling mw1120-25, depooling mw1126-32 [10:44:00] sigh [10:45:15] RECOVERY - HHVM rendering on mw1125 is OK: HTTP OK: HTTP/1.1 200 OK - 66105 bytes in 1.025 second response time [10:45:17] preilly: both deployment and integration-puppetmaster will autoupdate themselves. [10:45:36] err [10:45:37] paravoid: [10:45:48] aha [10:45:50] thanks. [10:46:23] brb [10:46:41] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:48:34] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:49:41] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:53:52] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:55:34] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:02:09] (03PS1) 10Springle: repool db1010 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178469 [11:04:17] (03CR) 10Springle: [C: 032] repool db1010 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178469 (owner: 10Springle) [11:05:42] (03PS1) 10Giuseppe Lavagetto: mediawiki: allow use of mpm_worker instead of mpm_prefork [puppet] - 10https://gerrit.wikimedia.org/r/178470 [11:06:36] _joe_: ^ the depooled mw* will need another sync when done, but this is a db slave repool, so no real hurry [11:07:03] <_joe_> springle: they will run sync-common before they are repooled [11:07:09] excellent 
[11:07:12] <_joe_> they're down at the moment [11:07:29] !log repool db1010, warm up [11:07:32] <_joe_> so if you don't repool that server while scap is running on said machines, all should be good [11:07:54] oh, no bot [11:07:55] <_joe_> springle: btw tell me if you happen to notice some server not catching up on the config [11:08:36] <_joe_> we are running with the hhvm stat cache (it invalidates file caches based on inotify) [11:08:46] _joe_: will do [11:09:05] <_joe_> it should work decently, but it's always better to keep an eye on that [11:15:01] !log kicked morebots [11:15:08] Logged the message, Master [11:15:13] !log repool db1010, warm up [11:15:14] Logged the message, Master [11:15:42] _joe_: ^ it missed some of yours, btw [11:18:28] <_joe_> springle: meh, whatever [11:19:36] !log [10:43:54] <_joe_> !log repooling mw1120-25, depooling mw1126-32 [11:19:39] Logged the message, Master [11:27:42] sigh I'll need to restart diamond on other machines besides appservers, will do in batch of 5 [11:30:00] !log restarting diamond on trusty hosts via salt [11:30:03] Logged the message, Master [11:30:13] <_joe_> godog: on all trusty hosts? 
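_joe_'s note above — HHVM running with the stat cache, invalidating file caches based on inotify — can be illustrated with a portable sketch. The class below invalidates on mtime change instead of inotify events; that substitution is an assumption made purely to keep the example self-contained and event-loop-free, not how HHVM actually does it.

```python
import os

# Portable sketch of a file stat cache. HHVM's version is invalidated
# by inotify events; this one re-checks mtime on every read instead
# (an assumption for illustration -- the real mechanism is event-driven).
class StatCache:
    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, contents)

    def read(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._cache.get(path)
        if entry and entry[0] == mtime:
            return entry[1]           # cache hit: file unchanged
        with open(path) as f:         # miss or invalidated: reload
            contents = f.read()
        self._cache[path] = (mtime, contents)
        return contents
```

The failure mode springle is asked to watch for — "some server not catching up on the config" — corresponds to a stale cache entry that never gets invalidated.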
[11:30:30] yep [11:30:40] <_joe_> I thought I checked another, I must have got the wrong one [11:31:38] (03PS1) 10KartikMistry: Fixed Undefined wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178473 [11:32:04] I wasn't expecting the varnishkafka metrics to be there already though, I'll poke ottomata [11:32:55] <_joe_> godog: also, they have weird keys, see the txstatsd log [11:34:18] heheh that doesn't look good alright [11:36:22] (03PS2) 10KartikMistry: Fixed Undefined wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178473 [11:38:42] (03CR) 10Nikerabbit: [C: 032] Fixed Undefined wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178473 (owner: 10KartikMistry) [11:38:51] (03Merged) 10jenkins-bot: Fixed Undefined wmgUseContentTranslationCluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178473 (owner: 10KartikMistry) [11:40:33] (03CR) 10Hashar: Fixed Undefined wmgUseContentTranslationCluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178473 (owner: 10KartikMistry) [11:46:35] (03PS1) 10KartikMistry: Beta: Remove deprecated wgContentTranslationServerURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178474 [11:47:11] PROBLEM - DPKG on mw1131 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:47:43] PROBLEM - DPKG on mw1132 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:51:15] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 6 failures [11:52:00] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 7 failures [11:52:05] PROBLEM - HHVM rendering on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:52:57] RECOVERY - DPKG on mw1131 is OK: All packages OK [11:53:05] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 6 failures [11:53:08] PROBLEM - HHVM rendering on mw1126 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:53:35] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:39] RECOVERY - DPKG on mw1132 is OK: All packages OK [11:53:51] PROBLEM - HHVM rendering on mw1127 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:53:56] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:22] (03PS1) 10Filippo Giunchedi: diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178475 [11:54:37] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:46] RECOVERY - HHVM rendering on mw1128 is OK: HTTP OK: HTTP/1.1 200 OK - 66111 bytes in 0.287 second response time [11:55:08] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:36] godog: ubuntu_version doesn't exist anymore, you should make this os_version('debian >= jessie || ubuntu >= precise') [11:55:39] PROBLEM - HHVM rendering on mw1132 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:56:12] RECOVERY - HHVM rendering on mw1126 is OK: HTTP OK: HTTP/1.1 200 OK - 66111 bytes in 0.289 second response time [11:56:45] RECOVERY - HHVM rendering on mw1127 is OK: HTTP OK: HTTP/1.1 200 OK - 66111 bytes in 0.278 second response time [11:57:00] paravoid: oh, thanks I didn't get the memo [11:57:06] :)) [11:58:09] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:58:34] RECOVERY - HHVM rendering on mw1132 is OK: HTTP OK: HTTP/1.1 200 OK - 66111 bytes in 0.283 second response time [11:58:57] (03PS2) 10Filippo Giunchedi: diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178475 [11:59:59] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:00:22] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:01:24] <_joe_> !log 
repooling mw1125-1132, depooling mw1133-39 [12:01:27] Logged the message, Master [12:02:14] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:04:02] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:08:08] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [12:09:07] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:09:42] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:18:10] (03PS2) 10Giuseppe Lavagetto: admins: use concat() instead of an inline template [puppet] - 10https://gerrit.wikimedia.org/r/177757 [12:22:15] (03CR) 10Filippo Giunchedi: [C: 031] mediawiki: allow use of mpm_worker instead of mpm_prefork [puppet] - 10https://gerrit.wikimedia.org/r/178470 (owner: 10Giuseppe Lavagetto) [12:25:36] (03PS1) 10Giuseppe Lavagetto: Add some experimental settings to one server in each pool [puppet] - 10https://gerrit.wikimedia.org/r/178478 [12:35:02] (03PS2) 10Giuseppe Lavagetto: mediawiki: allow use of mpm_worker instead of mpm_prefork [puppet] - 10https://gerrit.wikimedia.org/r/178470 [12:41:20] godog: re: https://wiki.debian.org/AccountHandlingInMaintainerScripts, it says account removal is contentious, and mentions the problem of what to do with files owned by account [12:41:28] godog: so I didn’t do any removal what so ever [12:41:32] I had read that page before as well [12:44:05] (03CR) 10Nikerabbit: [C: 04-1] Beta: Remove deprecated wgContentTranslationServerURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178474 (owner: 10KartikMistry) [12:50:47] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 7 failures [12:50:57] PROBLEM - puppet last 
run on mw1134 is CRITICAL: CRITICAL: Puppet has 7 failures [12:51:15] (03PS2) 10KartikMistry: Beta: Remove deprecated wgContentTranslationServerURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178474 [12:51:28] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 7 failures [12:51:44] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 7 failures [12:52:08] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 7 failures [12:52:34] (03PS4) 10Dzahn: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 [12:53:16] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 1 failures [12:53:24] PROBLEM - HHVM rendering on mw1133 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:54:55] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 6 failures [12:56:15] RECOVERY - HHVM rendering on mw1133 is OK: HTTP OK: HTTP/1.1 200 OK - 66101 bytes in 0.323 second response time [12:57:18] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:45] <_joe_> !log repool mw1133-39, depooling mw1140-46 [13:00:50] Logged the message, Master [13:01:06] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:01:26] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:02:17] is B security rating from ssllabs a known issue? https://www.ssllabs.com/ssltest/analyze.html?d=en.wikipedia.org [13:02:46] or is that a non-issue? [13:03:20] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [13:03:30] <_joe_> yurikR: this is new actually, we had A+ [13:03:47] <_joe_> "This server accepts the RC4 cipher, which is weak. 
Grade capped to B" [13:04:12] <_joe_> mh, we should remove that [13:04:17] yep, and a few other orange items [13:04:36] Someone complained about it a couple weeks ago, I told them to file a bug but I'm not sure they did [13:04:37] <_joe_> all related [13:05:53] that damn security, can never rest on laurels ( [13:06:34] <_joe_> well RC4 was there for IE6, we could just remove that [13:07:00] _joe_: let me know if you make changes to ssl in nginx, will have to repro for labs [13:07:35] _joe_: It fails for IE6 anyway with "Protocol or cipher suite mismatch" [13:07:43] <_joe_> YuviPanda: ? [13:07:49] _joe_: removing RC4 [13:08:03] <_joe_> YuviPanda: aren't you using ssl_chiphersuite? [13:08:03] labs proxy + tools proxy would need that removed too, I guess. [13:08:09] _joe_: nope, sadly. not unified. [13:08:14] maybe I should just switch [13:08:25] <_joe_> yeah, probably yes [13:08:56] * YuviPanda files a task, will do later today [13:11:55] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:07] godog: fixes based on your comments https://bitbucket.org/magnusmanske/wikidataquery/pull-request/4/fixes-to-debian-package/diff [13:14:04] (03PS1) 10Faidon Liambotis: base: drop support for old Ubuntu version/Solaris [puppet] - 10https://gerrit.wikimedia.org/r/178483 [13:14:06] (03PS1) 10Faidon Liambotis: interface: drop support for Ubuntu < 10.04 [puppet] - 10https://gerrit.wikimedia.org/r/178484 [13:14:08] (03PS1) 10Faidon Liambotis: apt::puppet: ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/178485 [13:14:10] (03PS1) 10Faidon Liambotis: apt: make per-host proxy conditional on $lsbdistid [puppet] - 10https://gerrit.wikimedia.org/r/178486 [13:14:20] anyone up for reviewing these? 
[13:14:35] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:15:57] (03PS1) 10Dzahn: icinga: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178487 [13:16:56] _joe_ / godog / mutante? :) [13:17:49] (03PS1) 10Giuseppe Lavagetto: ssl_chiphersuite: remove RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178488 [13:17:52] <_joe_> paravoid: not now, lunch [13:17:57] <_joe_> but after that, yeah [13:18:51] (03PS1) 10Faidon Liambotis: Kill a few more hardy references [puppet] - 10https://gerrit.wikimedia.org/r/178489 [13:18:53] (03PS1) 10Faidon Liambotis: mysql: remove Ubuntu 11.10 conditional for /run [puppet] - 10https://gerrit.wikimedia.org/r/178490 [13:19:09] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:19:27] paravoid: sure [13:29:31] godog: so? [13:29:45] sorry, impatient enough :) [13:30:00] haha okay, sec [13:30:32] godog: why does diamond install python-statsd btw? [13:31:10] paravoid: because I've switched from our custom statsd collector to upstream's, which depends on python-statsd [13:31:14] (03CR) 10Dzahn: [C: 031] "yes, the downvoting for "This server accepts the RC4 cipher, which is weak. Grade capped to B." 
is new, we definitely used to have A ratin" [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [13:32:30] (03PS2) 10Dzahn: icinga: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178487 [13:35:57] (03CR) 10Giuseppe Lavagetto: [C: 031] apt::puppet: ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/178485 (owner: 10Faidon Liambotis) [13:36:12] (03PS2) 10Nemo bis: ssl_ciphersuite: remove RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [13:36:35] (03CR) 10Nemo bis: "FWIW https://wikitech.wikimedia.org/wiki/Httpsless_domains" [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [13:36:50] godog: any other diamond changes with the upgrade? IIRC they re-architected some things... [13:38:07] (03PS1) 10Dzahn: domainproxy: SSL, disable RC4 ciphers [puppet] - 10https://gerrit.wikimedia.org/r/178493 [13:38:35] Ah, yes, he did file the bug https://phabricator.wikimedia.org/T75644 [13:38:53] (03CR) 10Filippo Giunchedi: [C: 031] mysql: remove Ubuntu 11.10 conditional for /run [puppet] - 10https://gerrit.wikimedia.org/r/178490 (owner: 10Faidon Liambotis) [13:38:59] wasnt RC4 against BEAST? [13:39:14] (03CR) 10Yuvipanda: "w00, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [13:39:27] (03CR) 10Filippo Giunchedi: [C: 031] Kill a few more hardy references [puppet] - 10https://gerrit.wikimedia.org/r/178489 (owner: 10Faidon Liambotis) [13:40:38] YuviPanda: threading perhaps? I think chasemp mentioned some diamond changes related to that [13:40:59] godog: yeah, they did something there. 
I’ll check with them to see if our custom collectors need anything [13:41:49] (03CR) 10Filippo Giunchedi: [C: 031] apt: make per-host proxy conditional on $lsbdistid [puppet] - 10https://gerrit.wikimedia.org/r/178486 (owner: 10Faidon Liambotis) [13:43:10] (03CR) 10Filippo Giunchedi: [C: 031] interface: drop support for Ubuntu < 10.04 [puppet] - 10https://gerrit.wikimedia.org/r/178484 (owner: 10Faidon Liambotis) [13:43:52] (03CR) 10Faidon Liambotis: [C: 032] base: drop support for old Ubuntu version/Solaris [puppet] - 10https://gerrit.wikimedia.org/r/178483 (owner: 10Faidon Liambotis) [13:44:04] (03CR) 10Faidon Liambotis: [C: 032] interface: drop support for Ubuntu < 10.04 [puppet] - 10https://gerrit.wikimedia.org/r/178484 (owner: 10Faidon Liambotis) [13:44:14] (03CR) 10Faidon Liambotis: [C: 032] apt::puppet: ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/178485 (owner: 10Faidon Liambotis) [13:44:26] (03CR) 10Faidon Liambotis: [C: 032] apt: make per-host proxy conditional on $lsbdistid [puppet] - 10https://gerrit.wikimedia.org/r/178486 (owner: 10Faidon Liambotis) [13:44:31] mark: yes, it was, but now it's like " we now consider this attack sufficiently mitigated client-side" [13:44:41] (03CR) 10Faidon Liambotis: [C: 032] Kill a few more hardy references [puppet] - 10https://gerrit.wikimedia.org/r/178489 (owner: 10Faidon Liambotis) [13:44:52] (03CR) 10Faidon Liambotis: [C: 032] mysql: remove Ubuntu 11.10 conditional for /run [puppet] - 10https://gerrit.wikimedia.org/r/178490 (owner: 10Faidon Liambotis) [13:46:16] https://community.qualys.com/blogs/securitylabs/2013/09/10/is-beast-still-a-threat [13:46:26] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: Puppet has 7 failures [13:46:40] shit :) [13:46:47] YuviPanda: yep that'd be nice perhaps we need adjustments [13:46:48] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 7 failures [13:46:56] or is that _joe_? 
[13:47:09] godog: yeah, although we have only one custom collector afaik [13:48:05] (03PS1) 10Faidon Liambotis: reprepro: remove hardy-wikimedia/lucid-wikimedia-mysql [puppet] - 10https://gerrit.wikimedia.org/r/178494 [13:48:07] (03PS1) 10Faidon Liambotis: reprepro: drop architecture i386 [puppet] - 10https://gerrit.wikimedia.org/r/178495 [13:48:23] PROBLEM - DPKG on mw1146 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:48:53] (03CR) 10Faidon Liambotis: [C: 032] reprepro: remove hardy-wikimedia/lucid-wikimedia-mysql [puppet] - 10https://gerrit.wikimedia.org/r/178494 (owner: 10Faidon Liambotis) [13:49:05] (03CR) 10Faidon Liambotis: [C: 032] reprepro: drop architecture i386 [puppet] - 10https://gerrit.wikimedia.org/r/178495 (owner: 10Faidon Liambotis) [13:49:18] (03CR) 10Filippo Giunchedi: base: drop support for old Ubuntu version/Solaris (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/178483 (owner: 10Faidon Liambotis) [13:49:55] paravoid: FWIW feel free to include me explicitly in code reviews so they show up in my inbox [13:50:08] inbox? 
[13:50:14] Gerrit 34872(3523) [13:50:19] I gave up on that a long time ago [13:50:26] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 6 failures [13:50:34] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: Puppet has 7 failures [13:50:39] I wonder how you manage that [13:51:04] PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: Puppet has 6 failures [13:51:27] RECOVERY - DPKG on mw1146 is OK: All packages OK [13:51:54] I have a separate gerrit folder, however reviews for which I'm a reviewer show up in inbox [13:51:57] PROBLEM - HHVM rendering on mw1144 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:00] also gerrit noise is marked as read already [13:53:11] <_joe_> me too [13:53:14] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: Puppet has 1 failures [13:54:10] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [13:54:10] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 1 failures [13:54:46] (03PS1) 10Yuvipanda: wdq-mm: Initial module + labs role [puppet] - 10https://gerrit.wikimedia.org/r/178496 [13:54:53] RECOVERY - HHVM rendering on mw1144 is OK: HTTP OK: HTTP/1.1 200 OK - 66007 bytes in 0.277 second response time [13:55:08] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [13:58:23] (03PS2) 10Dzahn: domainproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 [13:59:20] <_joe_> !log repooling mw1140-46, depooling mw114[78], mw119[0-3] [13:59:24] Logged the message, Master [14:00:02] prefers the web interface to check incoming gerrit [14:01:18] (03CR) 10Dzahn: "Yuvipanda, i think so, yep, PS2" [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [14:04:47] (03PS1) 10Dzahn: stats.wm.org - disable RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178497 [14:04:54] (03PS1) 10Yuvipanda: wdq-mm: Setup 
monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 [14:05:07] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:05:18] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:05:29] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:05:43] (03CR) 10jenkins-bot: [V: 04-1] wdq-mm: Setup monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 (owner: 10Yuvipanda) [14:07:23] (03CR) 10Matanya: [C: 031] stats.wm.org - disable RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178497 (owner: 10Dzahn) [14:07:29] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:09:17] (03PS3) 10Dzahn: labsproxy: switch to use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178493 [14:11:36] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:11:39] (03PS2) 10Dzahn: stats.wm.org - disable RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178497 [14:12:16] (03PS2) 10Yuvipanda: wdq-mm: Setup monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 [14:13:41] (03PS3) 10Yuvipanda: wdq-mm: Setup monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 [14:13:43] (03PS2) 10Yuvipanda: wdq-mm: Initial module + labs role [puppet] - 10https://gerrit.wikimedia.org/r/178496 [14:14:13] (03CR) 10Dzahn: [C: 032] stats.wm.org - disable RC4 [puppet] - 10https://gerrit.wikimedia.org/r/178497 (owner: 10Dzahn) [14:25:30] (03CR) 10PleaseStand: "Won't this break compatibility with IE 8 on Windows XP? 
According to ori: apergos: Please abort the json dump process on snapshot1003 [14:40:13] it's to slow right now, and will take days [14:40:25] We will fix that and then have someone restart it [14:40:39] YuviPanda: ^ [14:40:59] hoo: looking [14:41:15] dumpwikidata.sh or so [14:41:18] running as datasets [14:41:41] datasets 30750 0.0 0.0 12312 1436 pts/7 S<+ Dec08 0:00 /bin/bash /usr/local/bin/dumpwikidatajson.sh [14:41:43] hoo: ^ that one? [14:42:08] yes, please [14:42:43] !log killed wikidata dump process (/usr/local/bin/dumpwikidatajson.sh) per hoo [14:42:46] Logged the message, Master [14:43:24] (03PS1) 10Faidon Liambotis: ganglia_new: switch to stock upstart script [puppet] - 10https://gerrit.wikimedia.org/r/178512 [14:43:44] hoo: ^^ done [14:44:28] YuviPanda: Thanks :) [14:44:36] hoo: :) [14:46:04] ocg is alerting for the past 7 days [14:47:43] PROBLEM - RAID on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:47:45] paravoid: the ganglia check? [14:48:21] there’s some discussion about that on https://phabricator.wikimedia.org/T76115 [14:48:23] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 7 failures [14:48:45] PROBLEM - check configured eth on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:52] PROBLEM - check if dhclient is running on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:49:02] PROBLEM - check if salt-minion is running on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:49:36] PROBLEM - nutcracker port on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:49:41] PROBLEM - nutcracker process on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:50:03] PROBLEM - puppet last run on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:50:14] PROBLEM - DPKG on mw1190 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[14:50:14] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 7 failures [14:51:05] RECOVERY - RAID on mw1190 is OK: OK: no RAID installed [14:51:12] PROBLEM - DPKG on mw1192 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:34] RECOVERY - check configured eth on mw1190 is OK: NRPE: Unable to read output [14:51:45] RECOVERY - check if dhclient is running on mw1190 is OK: PROCS OK: 0 processes with command name dhclient [14:51:54] RECOVERY - check if salt-minion is running on mw1190 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:52:19] (03PS3) 10Filippo Giunchedi: diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178475 [14:52:24] RECOVERY - nutcracker port on mw1190 is OK: TCP OK - 0.000 second response time on port 11212 [14:52:26] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] diamond: install python-statsd on >= precise [puppet] - 10https://gerrit.wikimedia.org/r/178475 (owner: 10Filippo Giunchedi) [14:52:34] RECOVERY - nutcracker process on mw1190 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [14:53:34] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 7 failures [14:53:54] RECOVERY - DPKG on mw1192 is OK: All packages OK [14:55:36] (03CR) 10JanZerebecki: "Yes please." [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [14:55:58] hey hoo, ummm, q! what are the wikidata json dumps?! [14:56:24] ottomata: yo, re: metrics I don't think : is allowed in statsd metric names [14:56:45] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 1 failures [14:56:53] ah! the port separator. [14:56:55] cool [14:57:01] thanks, I will fix that. 
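[Editor's note: godog's point above is about statsd's wire format — each datagram line is `name:value|type`, so a colon inside the metric name (e.g. a trailing `host:port`) corrupts parsing. A minimal illustrative sketch of sanitizing such names before emitting; `sanitize_metric` and `statsd_line` are hypothetical helpers, not WMF code.]

```python
# Sketch: statsd's line protocol is "name:value|type", so a colon in the
# metric name itself (such as an embedded host:port pair) breaks the line.
# These helpers are illustrative, not actual production code.

def sanitize_metric(name: str) -> str:
    """Replace characters that are unsafe in statsd metric names."""
    for bad, good in ((":", "_"), ("/", "-"), (" ", "_")):
        name = name.replace(bad, good)
    return name

def statsd_line(name: str, value: int, metric_type: str = "c") -> str:
    """Format one statsd counter/gauge/timer line."""
    return f"{sanitize_metric(name)}:{value}|{metric_type}"

# A metric named after a host:port endpoint, made safe:
print(statsd_line("kafka.analytics1018:9092.messages", 1))
# -> kafka.analytics1018_9092.messages:1|c
```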
[14:57:04] ottomata: https://dumps.wikimedia.org/other/wikidata/ [14:57:16] Just all entities within Wikidata in a big list [14:57:21] (03PS2) 10Nemo bis: Update cached article count monthly to avoid social unrest [puppet] - 10https://gerrit.wikimedia.org/r/178170 [14:57:37] Problem we hit now is https://gerrit.wikimedia.org/r/178504 [14:57:38] ah ok, for a second maybe I thought it was xml revision dumps in json format that I didn't know about [14:57:43] probably MariaDB 10 regression [14:58:48] ottomata: also what we did for swift statsd is to run a local (tx)statsd on the machine since the metrics don't need to be aggregated across machines at the statsd level anyway [14:58:50] godog: hmm [14:58:51] Error: Failed to apply catalog: Could not find dependency Class[Packages::Python_statsd] for Package[python-configobj] at /etc/puppet/modules/diamond/manifests/init.pp:68 [14:58:57] godog: ^ in labs [14:59:00] possibly all of labs? [14:59:08] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 6 failures [14:59:13] godog: I see errors trickling in one by one [14:59:19] RECOVERY - DPKG on mw1190 is OK: All packages OK [14:59:39] <_joe_> !log repooled mw1147-48 [14:59:45] Logged the message, Master [14:59:50] PROBLEM - puppet last run on rbf1001 is CRITICAL: CRITICAL: puppet fail [14:59:51] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: puppet fail [14:59:58] godog, and each txstatsd then sends to graphite? [15:00:01] (03CR) 10JanZerebecki: [C: 04-1] "Forgot the -1." 
[puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [15:00:07] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: puppet fail [15:00:07] ottomata: precisely [15:00:11] ah ok [15:00:13] ja could do that [15:00:19] YuviPanda: sigh, I hate going against puppet's grain [15:00:38] godog: heh :) I’ve a lot of spam now [15:00:46] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: puppet fail [15:00:47] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail [15:00:47] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: puppet fail [15:00:56] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: puppet fail [15:00:57] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: puppet fail [15:00:58] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: puppet fail [15:01:07] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: puppet fail [15:01:08] yeah that's me, reverting [15:01:19] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail [15:01:20] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: puppet fail [15:01:20] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: puppet fail [15:01:39] PROBLEM - puppet last run on search1011 is CRITICAL: CRITICAL: puppet fail [15:01:39] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: puppet fail [15:01:39] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [15:01:39] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: puppet fail [15:01:47] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [15:01:49] PROBLEM - puppet last run on search1012 is CRITICAL: CRITICAL: puppet fail [15:01:50] PROBLEM - puppet last run on search1022 is CRITICAL: CRITICAL: puppet fail [15:01:51] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: puppet fail [15:01:51] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: puppet fail [15:01:51] PROBLEM - 
puppet last run on analytics1018 is CRITICAL: CRITICAL: puppet fail [15:01:51] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [15:01:51] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on search1006 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: puppet fail [15:02:07] PROBLEM - puppet last run on search1004 is CRITICAL: CRITICAL: puppet fail [15:02:08] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: puppet fail [15:02:09] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: puppet fail [15:02:09] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail [15:02:09] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail [15:02:30] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [15:02:40] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: puppet fail [15:02:47] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: puppet fail [15:02:47] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: puppet fail [15:02:47] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: puppet fail [15:02:47] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: puppet fail [15:02:47] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: puppet fail [15:02:47] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:02:57] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: puppet fail [15:02:57] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: puppet fail [15:02:57] PROBLEM - puppet last run on stat1001 is 
CRITICAL: CRITICAL: puppet fail [15:02:59] PROBLEM - puppet last run on analytics1012 is CRITICAL: CRITICAL: puppet fail [15:02:59] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: puppet fail [15:02:59] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: puppet fail [15:03:06] <_joe_> godog: ugh that may have broken reimaging [15:03:07] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail [15:03:11] (03Abandoned) 10Nemo bis: Redirect bugzilla alias URLs to old-bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/176898 (owner: 10Nemo bis) [15:03:16] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail [15:03:21] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: puppet fail [15:03:22] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: puppet fail [15:03:22] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [15:03:23] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: puppet fail [15:03:23] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail [15:03:29] PROBLEM - puppet last run on osm-cp1001 is CRITICAL: CRITICAL: puppet fail [15:03:38] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [15:03:38] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: puppet fail [15:03:38] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail [15:03:46] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: puppet fail [15:03:46] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: puppet fail [15:03:47] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: puppet fail [15:03:47] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: puppet fail [15:03:47] PROBLEM - puppet last run on search1003 is CRITICAL: CRITICAL: puppet fail [15:03:58] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: puppet fail [15:03:58] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: puppet fail 
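[Editor's note: the per-host relay pattern godog describes above — each machine runs its own local statsd that aggregates counters and ships totals straight to Graphite, so nothing needs cross-host aggregation at the statsd layer — can be sketched roughly as follows. Ports, metric names, and the counters-only scope are illustrative assumptions.]

```python
# Sketch of a host-local statsd relay: parse incoming "name:value|type"
# lines, aggregate counters in memory, and periodically flush totals to
# Graphite's plaintext protocol ("metric value timestamp\n").
import socket
import time
from collections import defaultdict

counters = defaultdict(int)

def handle_packet(data: bytes) -> None:
    """Accumulate counter metrics from one statsd UDP datagram."""
    for line in data.decode().splitlines():
        name, rest = line.split(":", 1)
        value, mtype = rest.split("|", 1)
        if mtype == "c":  # only counters in this sketch
            counters[name] += int(value)

def flush(sock: socket.socket, graphite_addr: tuple) -> None:
    """Send aggregated totals to Graphite and reset the counters."""
    now = int(time.time())
    for name, value in counters.items():
        sock.sendto(f"{name} {value} {now}\n".encode(), graphite_addr)
    counters.clear()
```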
[15:03:58] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: puppet fail [15:03:58] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [15:03:58] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: puppet fail [15:04:10] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: puppet fail [15:04:10] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [15:04:10] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail [15:04:10] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: puppet fail [15:04:11] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: puppet fail [15:04:11] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: puppet fail [15:04:11] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: puppet fail [15:04:17] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: puppet fail [15:04:17] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [15:04:30] sigh, gerrit ultra slow [15:04:36] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: puppet fail [15:04:50] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [15:04:52] ok these puppet checks really need to go [15:04:55] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: puppet fail [15:04:56] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: puppet fail [15:04:56] PROBLEM - puppet last run on search1014 is CRITICAL: CRITICAL: puppet fail [15:04:56] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: puppet fail [15:04:56] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: puppet fail [15:04:56] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: puppet fail [15:04:59] and have something in aggregate [15:05:07] PROBLEM - puppet last run on zinc is CRITICAL: CRITICAL: puppet fail [15:05:07] (03PS1) 10Filippo Giunchedi: Revert "diamond: install python-statsd on >= precise" [puppet] - 
10https://gerrit.wikimedia.org/r/178520 [15:05:08] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: puppet fail [15:05:08] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: puppet fail [15:05:15] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail [15:05:15] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: puppet fail [15:05:31] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: puppet fail [15:05:33] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Revert "diamond: install python-statsd on >= precise" [puppet] - 10https://gerrit.wikimedia.org/r/178520 (owner: 10Filippo Giunchedi) [15:05:36] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail [15:05:36] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: puppet fail [15:05:36] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: puppet fail [15:05:50] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: puppet fail [15:05:50] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: puppet fail [15:05:51] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [15:05:51] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: puppet fail [15:05:51] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: puppet fail [15:06:00] should be recovering "shortly" [15:06:12] (03PS6) 10Hoo man: Allow "hoo" to sudo into datasets [puppet] - 10https://gerrit.wikimedia.org/r/152724 [15:06:13] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: puppet fail [15:06:13] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail [15:06:15] paravoid: yeah the alert shower isn't fun [15:06:27] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [15:06:27] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: puppet fail [15:06:27] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: puppet fail [15:06:44] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: 
puppet fail [15:06:44] (03CR) 10Hoo man: "Manually rebased" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [15:06:45] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: puppet fail [15:06:45] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: puppet fail [15:06:55] (03CR) 10jenkins-bot: [V: 04-1] Allow "hoo" to sudo into datasets [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [15:07:00] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail [15:07:02] well :P [15:07:05] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: puppet fail [15:07:06] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: puppet fail [15:07:06] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: puppet fail [15:07:26] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: puppet fail [15:07:26] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail [15:07:27] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: puppet fail [15:07:27] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: puppet fail [15:07:27] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: puppet fail [15:07:46] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: puppet fail [15:07:46] 15:06:26 Duplicate group GIDs: [733] [15:07:47] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: puppet fail [15:07:58] I'm not even mad, that's amazing [15:07:59] PROBLEM - puppet last run on search1020 is CRITICAL: CRITICAL: puppet fail [15:08:06] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: puppet fail [15:08:08] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [15:08:10] PROBLEM - puppet last run on es1006 is CRITICAL: CRITICAL: puppet fail [15:08:16] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: puppet fail [15:08:16] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [15:08:17] RECOVERY - puppet last 
run on mw1190 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:08:27] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail [15:08:27] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: puppet fail [15:08:27] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: puppet fail [15:08:28] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: puppet fail [15:08:28] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: puppet fail [15:08:33] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: puppet fail [15:08:33] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [15:08:38] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: puppet fail [15:08:38] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [15:08:38] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: puppet fail [15:08:50] PROBLEM - puppet last run on magnesium is CRITICAL: CRITICAL: puppet fail [15:08:52] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: puppet fail [15:08:52] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: puppet fail [15:08:59] PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: puppet fail [15:09:11] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [15:09:13] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: puppet fail [15:09:26] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [15:09:26] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail [15:09:26] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: puppet fail [15:09:38] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: puppet fail [15:09:39] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: puppet fail [15:09:39] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: puppet fail [15:09:39] PROBLEM - puppet last run on logstash1001 is CRITICAL: 
CRITICAL: puppet fail [15:09:47] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: puppet fail [15:09:47] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: puppet fail [15:09:47] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail [15:09:50] vim loves yaml... [15:09:58] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: puppet fail [15:10:08] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail [15:10:08] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: puppet fail [15:10:08] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: puppet fail [15:10:09] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: puppet fail [15:10:19] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: puppet fail [15:10:19] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [15:10:19] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [15:10:29] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail [15:10:29] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: puppet fail [15:10:29] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: puppet fail [15:10:30] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: puppet fail [15:10:30] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: puppet fail [15:10:41] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [15:10:41] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail [15:10:45] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: puppet fail [15:10:46] (03PS7) 10Hoo man: Allow "hoo" to sudo into datasets [puppet] - 10https://gerrit.wikimedia.org/r/152724 [15:11:01] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: puppet fail [15:11:09] PROBLEM - puppet last run on search1016 is CRITICAL: CRITICAL: puppet fail [15:11:09] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: puppet fail 
[15:11:09] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: puppet fail [15:11:10] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [15:11:19] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: puppet fail [15:11:19] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail [15:11:19] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail [15:11:19] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: puppet fail [15:11:20] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: puppet fail [15:11:20] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: puppet fail [15:11:33] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: puppet fail [15:11:33] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [15:11:39] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail [15:11:39] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: puppet fail [15:11:45] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail [15:11:46] (03CR) 10Hoo man: "Another group took the gid 733 which I wanted to use for the new group. Changed gid to 740 now." 
[puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [15:11:47] PROBLEM - puppet last run on es1001 is CRITICAL: CRITICAL: puppet fail [15:11:47] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: puppet fail [15:11:50] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [15:11:50] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [15:11:51] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: puppet fail [15:11:51] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: puppet fail [15:11:51] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [15:11:52] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: puppet fail [15:12:00] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [15:12:00] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail [15:12:13] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: puppet fail [15:12:15] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: puppet fail [15:12:17] PROBLEM - puppet last run on rbf1002 is CRITICAL: CRITICAL: puppet fail [15:12:33] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: puppet fail [15:12:34] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [15:12:34] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: puppet fail [15:12:34] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail [15:12:35] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail [15:12:43] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: puppet fail [15:12:44] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: puppet fail [15:12:51] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [15:13:00] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: puppet fail [15:13:13] PROBLEM - puppet last run on db1021 is CRITICAL: 
CRITICAL: puppet fail [15:13:13] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: puppet fail [15:13:13] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [15:13:14] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: puppet fail [15:13:14] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: puppet fail [15:13:14] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: puppet fail [15:13:24] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: puppet fail [15:13:24] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: puppet fail [15:13:35] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: puppet fail [15:13:37] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: puppet fail [15:13:37] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: puppet fail [15:13:54] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: puppet fail [15:13:57] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [15:13:58] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: puppet fail [15:14:05] RECOVERY - puppet last run on search1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:14:17] RECOVERY - puppet last run on search1022 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [15:14:17] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: puppet fail [15:14:17] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [15:14:25] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [15:14:27] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail [15:14:27] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: puppet fail [15:14:35] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:14:39] RECOVERY - puppet last run on search1006 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures [15:14:44] RECOVERY - puppet last run on search1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:14:44] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: puppet fail [15:14:45] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: puppet fail [15:14:45] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [15:15:10] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:15:16] RECOVERY - puppet last run on rbf1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:15:24] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:15:26] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:15:27] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [15:15:27] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:15:34] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:15:36] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [15:15:44] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [15:15:44] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:15:57] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:15:58] RECOVERY - puppet last run on osm-cp1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:16:22] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: puppet 
fail [15:16:23] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:16:23] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:16:23] RECOVERY - puppet last run on search1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:16:24] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:26] RECOVERY - puppet last run on es1004 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:16:27] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:16:38] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:38] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:16:38] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:16:38] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:39] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:47] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:50] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:17:00] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:00] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:17:00] RECOVERY - puppet last run on chromium is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [15:17:07] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:09] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:16] RECOVERY - puppet last run on mc1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:17] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:18] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:17:18] RECOVERY - puppet last run on search1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:30] RECOVERY - puppet last run on search1019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:17:39] RECOVERY - puppet last run on zinc is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:17:39] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:39] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:40] RECOVERY - puppet last run on analytics1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:40] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:40] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:17:40] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:41] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:50] 
RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:55] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:55] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:17:55] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:10] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:18:11] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [15:18:21] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [15:18:22] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:18:22] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:18:31] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:18:31] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:18:31] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:18:41] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:41] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:42] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:18:49] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, 
last run 2 minutes ago with 0 failures [15:19:03] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:04] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:05] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:05] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:15] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:19:16] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:19:20] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:19:21] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:19:21] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:21] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:19:21] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:19:22] anyone else have phab 503? 
[15:19:35] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:36] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:00] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:02] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [15:20:02] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:20:03] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:04] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:04] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:04] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:04] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:11] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:13] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:31] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:32] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:20:39] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:39] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [15:20:40] RECOVERY - puppet last run on search1014 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:20:40] RECOVERY - puppet last run on search1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:20:40] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:56] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [15:20:56] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:20:56] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:20:56] RECOVERY - puppet last run on es1006 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:20:56] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:20:57] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:00] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:00] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:21:14] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:27] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:27] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:27] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:21:27] RECOVERY - 
puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:21:27] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:28] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:28] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:21:28] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:21:42] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:45] in particular 503 is on subscribe to a maniphest [15:21:50] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:21:57] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:22:04] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:05] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:15] <_joe_> !log repooling mw1190-93, depooling mw1194-1200 [15:22:19] Logged the message, Master [15:22:22] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:22:23] RECOVERY - puppet last run on pc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:23] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:22:23] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:22:30] RECOVERY - puppet last run on mc1010 is OK: OK: 
Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:22:44] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:22:54] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:22:54] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:22:54] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:22:55] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:23:00] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:23:13] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:13] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:23:13] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:14] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:20] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:23:25] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:23:31] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:23:50] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:23:56] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:01] 
RECOVERY - puppet last run on search1016 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:24:01] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:01] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:05] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [15:24:06] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:24:06] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:06] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:08] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:08] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:12] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [15:24:16] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:30] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:24:35] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:36] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:36] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:37] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last 
run 2 minutes ago with 0 failures [15:24:37] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:24:37] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:37] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:37] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:37] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [15:24:37] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:38] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:41] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:42] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:52] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:52] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:24:52] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:52] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:52] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:24:52] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:02] RECOVERY - puppet last run on 
cp1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:25:11] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:23] RECOVERY - puppet last run on rbf1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:23] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:23] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:23] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:25:23] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:25:23] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:25:31] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:40] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:49] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:25:51] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:25:51] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:25:52] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:25:52] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:25:53] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 2 minutes ago 
with 0 failures [15:25:53] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:04] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:26:04] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:26:04] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:04] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:04] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:05] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:26:05] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:05] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:26:18] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:26:22] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:26:35] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:40] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:26:40] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:26:41] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:55] RECOVERY - puppet last run on amslvs1 is OK: OK: 
Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:26:55] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:26:55] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:27:17] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:27:17] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:27:17] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:27:30] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:27:31] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:27:33] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:27:42] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:27:42] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:27:54] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:27:54] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:28:00] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:28:00] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:28:01] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:28:01] 
RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:28:20] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:28:23] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:28:35] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:28:36] (03CR) 10Nikerabbit: [C: 032] Beta: Remove deprecated wgContentTranslationServerURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178474 (owner: 10KartikMistry) [15:28:45] (03Merged) 10jenkins-bot: Beta: Remove deprecated wgContentTranslationServerURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178474 (owner: 10KartikMistry) [15:28:56] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:29:02] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:29:02] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:29:03] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:29:15] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:29:15] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:29:34] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:29:49] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:29:55] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
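The long run of near-identical puppet RECOVERY notices above is easier to audit as a per-host summary. A minimal sketch, written for this log (the regex mirrors the notice text shown here and is not part of any Wikimedia tooling):

```python
import re

# Matches the icinga notice format seen throughout this log:
# "RECOVERY - puppet last run on <host> is OK: ..."
RECOVERY_RE = re.compile(r"RECOVERY - puppet last run on (\S+) is OK")

def recovered_hosts(log_text):
    """Return the distinct hosts that reported a puppet recovery, sorted."""
    return sorted(set(RECOVERY_RE.findall(log_text)))
```

Feeding the whole flood through `recovered_hosts` collapses hundreds of lines into one deduplicated host list.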
[15:30:06] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:30:25] PROBLEM - MySQL Processlist on db1015 is CRITICAL: CRIT 4 unauthenticated, 0 locked, 86 copy to table, 97 statistics [15:30:37] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:30:38] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:30:38] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:33:10] RECOVERY - MySQL Processlist on db1015 is OK: OK 1 unauthenticated, 0 locked, 0 copy to table, 3 statistics [15:36:57] !log restarted EventLogging's m2 writer on vanadium. Events did not get written into the database. [15:37:01] Logged the message, Master [15:43:35] PROBLEM - puppet last run on mw1105 is CRITICAL: CRITICAL: Puppet has 1 failures [15:44:40] (03PS1) 10Cmjohnson: updating dhcpd entry for virt1010-1012 [puppet] - 10https://gerrit.wikimedia.org/r/178525 [15:46:31] (03CR) 10Cmjohnson: [C: 032] updating dhcpd entry for virt1010-1012 [puppet] - 10https://gerrit.wikimedia.org/r/178525 (owner: 10Cmjohnson) [15:50:12] * anomie sees nothing for SWAT this morning [15:50:51] anomie: can I be a pain and ninja-add a last minute thing? [15:50:56] gi11es: Sure [15:51:00] I'm scrambling to prepare the commits right now [15:52:36] <_joe_> anomie: can you ping me just before swatting? 
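The MySQL Processlist alert on db1015 above summarizes thread states as counts ("4 unauthenticated, 0 locked, 86 copy to table, 97 statistics"). A small parser for that exact alert text, written for this log (the real check's implementation is not shown here):

```python
import re

# Parses the alert text seen above, e.g.
# "CRIT 4 unauthenticated, 0 locked, 86 copy to table, 97 statistics"
ALERT_RE = re.compile(
    r"(?P<status>OK|CRIT)\s+"
    r"(?P<unauth>\d+) unauthenticated, "
    r"(?P<locked>\d+) locked, "
    r"(?P<copy>\d+) copy to table, "
    r"(?P<stats>\d+) statistics"
)

def parse_processlist_alert(text):
    """Return the alert status and per-state counts, or None if no match."""
    m = ALERT_RE.search(text)
    if not m:
        return None
    out = {"status": m.group("status")}
    for key in ("unauth", "locked", "copy", "stats"):
        out[key] = int(m.group(key))
    return out
```

The CRIT/OK pair on db1015 then reads as a three-minute spike of copy-to-table and statistics threads that cleared on its own.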
[15:52:40] _joe_: Sure [15:53:12] <_joe_> (still reimaging things) [15:53:24] anomie: aude has something there about a patch coming I think [15:53:51] manybubbles: Ah, another last-minute addition [15:53:57] check again :) [15:54:08] should have submodule update soon [15:55:36] RECOVERY - puppet last run on mw1105 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:56:27] added my stuff [15:57:52] lots of stuff in the empty SWAT! [15:57:58] PROBLEM - nutcracker port on mw1231 is CRITICAL: Cannot assign requested address [15:58:37] <^d> Oh man, this is gonna take forever! [15:59:08] <_joe_> mmmh [16:00:04] manybubbles, anomie, ^d, marktraceur, aude: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141209T1600). [16:00:09] _joe_: You ready? [16:00:36] gi11es: You're first once _joe_ gives the go-ahead. [16:00:47] ok [16:01:04] RECOVERY - nutcracker port on mw1231 is OK: TCP OK - 0.000 second response time on port 11212 [16:01:58] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:16] * aude waiting for jenkins now [16:02:47] PROBLEM - HHVM busy threads on mw1225 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [115.2] [16:03:29] PROBLEM - HHVM busy threads on mw1229 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [115.2] [16:04:41] and wait [16:05:58] PROBLEM - HHVM busy threads on mw1231 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [115.2] [16:06:25] RECOVERY - HHVM busy threads on mw1229 is OK: OK: Less than 30.00% above the threshold [76.8] [16:07:11] PROBLEM - puppet last run on mw1195 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:08:12] merged, patch ready [16:08:17] RECOVERY - HHVM busy threads on mw1225 is OK: OK: Less than 30.00% above the threshold [76.8] [16:08:24] https://gerrit.wikimedia.org/r/#/c/178534/ [16:08:45] PROBLEM - DPKG on mw1196 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:09:05] RECOVERY - HHVM busy threads on mw1231 is OK: OK: Less than 30.00% above the threshold [76.8] [16:10:03] _joe_: Ping again [16:10:56] <_joe_> anomie: sorry, yeah something tells me I'll need to run sync-common again on a few hosts [16:11:05] <_joe_> namely mw1194-1200 [16:11:13] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 7 failures [16:11:13] <_joe_> and you will see failures there [16:11:20] <_joe_> but if you need to, go on [16:11:33] * anomie begins SWAT [16:11:46] RECOVERY - DPKG on mw1196 is OK: All packages OK [16:11:49] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Puppet has 7 failures [16:11:55] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 7 failures [16:13:18] (03CR) 10Aude: [C: 031] "our json dumps broke once again yesterday. it would be much easier for hoo to investigate this sort of thing if he could sudo." 
[puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [16:14:31] (03PS5) 10Dzahn: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 [16:16:17] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 6 failures [16:18:00] !log anomie Synchronized php-1.25wmf11/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket [[gerrit:178531]] (duration: 00m 08s) [16:18:02] gi11es: ^ Test please [16:18:05] Logged the message, Master [16:18:35] anomie: testing [16:21:17] (03CR) 10BryanDavis: "Cherry-picked to deployment-salt for testing" [puppet] - 10https://gerrit.wikimedia.org/r/177432 (owner: 10BryanDavis) [16:21:55] anomie: works fine [16:22:00] gi11es: Ok, doing wmf10 now [16:22:55] PROBLEM - HHVM rendering on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 687 bytes in 0.326 second response time [16:23:35] PROBLEM - HHVM rendering on mw1196 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 687 bytes in 0.320 second response time [16:24:28] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:27:09] (03CR) 10BryanDavis: "Causing a puppet error in beta --" [puppet] - 10https://gerrit.wikimedia.org/r/178440 (owner: 10Ori.livneh) [16:28:29] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:28:56] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Puppet has 7 failures [16:29:19] <_joe_> bd808: restart the puppetmaster [16:29:31] _joe_: okey doke [16:29:37] !log anomie Synchronized php-1.25wmf10/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket [[gerrit:178529]] (duration: 01m 04s) [16:29:39] gi11es: ^ Test please [16:29:43] aude: You're next [16:29:43] Logged the message, Master [16:29:44] anomie: testing... 
[16:30:16] ok [16:30:37] PROBLEM - HHVM rendering on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 687 bytes in 3.144 second response time [16:30:41] anomie: works fine [16:30:44] thanks! [16:31:34] (03CR) 10BryanDavis: "Beta error fixed by restarting puppetmaster to pick up new function definition. h/t _joe_" [puppet] - 10https://gerrit.wikimedia.org/r/178440 (owner: 10Ori.livneh) [16:31:36] !log anomie Synchronized php-1.25wmf11/extensions/Wikidata/: SWAT: Fix issue with json dump and sites caching in Wikidata [[gerrit:178533]] (duration: 00m 15s) [16:31:38] Logged the message, Master [16:31:39] aude: ^ test please [16:31:40] checking [16:32:55] looks ok [16:33:04] thanks [16:33:11] * anomie is done with SWAT! [16:33:17] :) [16:34:15] idk if there's something broken but still have 503 for subscribe to maniphest task [16:34:30] !log anomie Synchronized wmf-config/CommonSettings-labs.php: Deploy some Labs-only changes so they're not showing as undeployed (duration: 00m 05s) [16:34:35] Logged the message, Master [16:34:43] _joe_: All done, FYI [16:34:53] i.e. known broken. saw plenty of icinga flood [16:35:03] <_joe_> anomie: thanks [16:35:47] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 66003 bytes in 5.968 second response time [16:36:03] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
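The `!log ... Synchronized ... (duration: MMm SSs)` lines above share a fixed shape, which makes it easy to pull out who synced what and how long it took. A sketch matching the format as it appears in this log (the pattern is inferred from these lines, not taken from scap itself):

```python
import re

# "!log <user> Synchronized <path>: <message> (duration: MMm SSs)"
SYNC_RE = re.compile(
    r"!log (?P<user>\S+) Synchronized (?P<path>\S+?):? "
    r".*\(duration: (?P<m>\d+)m (?P<s>\d+)s\)"
)

def parse_sync(line):
    """Extract user, synced path, and duration in seconds from a sync log line."""
    m = SYNC_RE.search(line)
    if not m:
        return None
    return {
        "user": m.group("user"),
        "path": m.group("path"),
        "seconds": int(m.group("m")) * 60 + int(m.group("s")),
    }
```

Applied across a SWAT window, this gives a quick table of syncs and their durations for the deploy log.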
[16:38:03] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 66002 bytes in 0.212 second response time [16:41:52] <_joe_> !log repooling mw1194-1200 [16:41:55] Logged the message, Master [16:42:43] <_joe_> !log depooling mw1201-04 [16:42:46] Logged the message, Master [16:45:11] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [16:45:11] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 66002 bytes in 0.219 second response time [16:45:40] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:54:15] PROBLEM - very high load average likely xfs on ms-be1012 is CRITICAL: CRITICAL - load average: 304.61, 252.62, 141.34 [16:56:34] PROBLEM - Host mw1204 is DOWN: PING CRITICAL - Packet loss = 100% [17:00:34] RECOVERY - Host mw1204 is UP: PING OK - Packet loss = 0%, RTA = 12.93 ms [17:03:03] (03PS1) 10Ottomata: Experiment with different request ack timeout for varnishkafka on a bits host [puppet] - 10https://gerrit.wikimedia.org/r/178538 [17:03:10] gage ^ [17:03:16] PROBLEM - Host ms-be1012 is DOWN: PING CRITICAL - Packet loss = 100% [17:03:46] (03PS2) 10Ottomata: Experiment with different request ack timeout for varnishkafka on a bits host [puppet] - 10https://gerrit.wikimedia.org/r/178538 [17:04:03] (03CR) 10Gage: [C: 032] Experiment with different request ack timeout for varnishkafka on a bits host [puppet] - 10https://gerrit.wikimedia.org/r/178538 (owner: 10Ottomata) [17:04:16] PROBLEM - Host mw1205 is DOWN: PING CRITICAL - Packet loss = 100% [17:04:35] ack don't want vk module in that change! 
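The change being discussed tunes how long varnishkafka waits for Kafka broker acknowledgements. varnishkafka forwards `kafka.`-prefixed settings to librdkafka; the property names below are real librdkafka producer options, but the values are purely illustrative, since the actual settings live in the Gerrit change and are not shown in this log:

```python
# Illustrative values only; the experiment's real numbers are in the Gerrit change.
VK_KAFKA_TUNING = {
    "kafka.request.required.acks": 1,     # broker acks to wait for per request
    "kafka.request.timeout.ms": 30000,    # how long to wait for those acks
    "kafka.message.send.max.retries": 3,  # retries after a timed-out request
}

def render_vk_config(opts):
    """Render options in varnishkafka's 'key = value' config-file style."""
    return "\n".join(f"{k} = {v}" for k, v in sorted(opts.items()))
```

Trying such a change on a single bits host first, as done here, limits the blast radius if the new timeout makes the producer back up.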
[17:05:33] looks like my +2 didn't matter anyway because it landed before jenkins verified [17:05:59] looking into ms-be1012 [17:06:02] (03PS3) 10Ottomata: Experiment with different request ack timeout for varnishkafka on a bits host [puppet] - 10https://gerrit.wikimedia.org/r/178538 [17:06:45] RECOVERY - Host mw1205 is UP: PING OK - Packet loss = 0%, RTA = 4.99 ms [17:07:07] PROBLEM - Host mw1206 is DOWN: PING CRITICAL - Packet loss = 100% [17:07:08] (03CR) 10Gage: [C: 032] Experiment with different request ack timeout for varnishkafka on a bits host [puppet] - 10https://gerrit.wikimedia.org/r/178538 (owner: 10Ottomata) [17:07:21] !log powercycle ms-be1012, no console [17:07:24] Logged the message, Master [17:07:31] ottomata, i'll puppet-merge [17:07:59] done [17:08:09] danke [17:09:35] RECOVERY - Host mw1206 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [17:10:36] RECOVERY - Host ms-be1012 is UP: PING OK - Packet loss = 0%, RTA = 2.16 ms [17:10:47] RECOVERY - very high load average likely xfs on ms-be1012 is OK: OK - load average: 30.12, 7.33, 2.44 [17:16:31] (03PS7) 10Andrew Bogott: Add an 'apache' user to eqiad labstores. [puppet] - 10https://gerrit.wikimedia.org/r/177584 [17:17:54] (03CR) 10Andrew Bogott: [C: 032] Add an 'apache' user to eqiad labstores. 
[puppet] - 10https://gerrit.wikimedia.org/r/177584 (owner: 10Andrew Bogott) [17:18:00] PROBLEM - Host mw1206 is DOWN: PING CRITICAL - Packet loss = 100% [17:18:50] RECOVERY - Host mw1206 is UP: PING OK - Packet loss = 0%, RTA = 6.14 ms [17:21:41] PROBLEM - check configured eth on mw1206 is CRITICAL: Timeout while attempting connection [17:24:11] RECOVERY - check configured eth on mw1206 is OK: NRPE: Unable to read output [17:26:11] PROBLEM - Host mw1204 is DOWN: PING CRITICAL - Packet loss = 100% [17:27:03] !log csteipp Synchronized php-1.25wmf11/extensions/Listings/Listings.body.php: (no message) (duration: 00m 07s) [17:27:06] Logged the message, Master [17:27:10] RECOVERY - Host mw1204 is UP: PING WARNING - Packet loss = 61%, RTA = 75.97 ms [17:28:59] (03PS1) 10RobH: fixing dhcp files to not have dupes for virt1010-1012 [puppet] - 10https://gerrit.wikimedia.org/r/178543 [17:30:57] (03CR) 10RobH: [C: 032] fixing dhcp files to not have dupes for virt1010-1012 [puppet] - 10https://gerrit.wikimedia.org/r/178543 (owner: 10RobH) [17:31:28] !log patched for T77624 [17:31:32] Logged the message, Master [17:31:57] <_joe_> running puppet on carbon [17:34:24] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:34:31] PROBLEM - Host mw1204 is DOWN: PING CRITICAL - Packet loss = 100% [17:37:13] RECOVERY - Host mw1204 is UP: PING OK - Packet loss = 0%, RTA = 7.99 ms [17:41:34] (03PS1) 10RobH: setting graphite2001 production dns entry [dns] - 10https://gerrit.wikimedia.org/r/178545 [17:44:18] (03PS1) 10RobH: setting install params for graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/178546 [17:45:09] (03PS2) 10RobH: setting graphite2001 production dns entry [dns] - 10https://gerrit.wikimedia.org/r/178545 [17:46:16] (03CR) 10RobH: [C: 032] setting graphite2001 production dns entry [dns] - 10https://gerrit.wikimedia.org/r/178545 (owner: 10RobH) [17:47:04] (03CR) 10RobH: [C: 032] setting install params for 
graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/178546 (owner: 10RobH) [17:49:36] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 1 failures [18:00:04] maxsem, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141209T1800). [18:06:13] (03PS3) 10Rush: admins: use concat() instead of an inline template [puppet] - 10https://gerrit.wikimedia.org/r/177757 (owner: 10Giuseppe Lavagetto) [18:06:43] (03CR) 10Rush: [C: 032] "per puppet docs as I understand them this should work and is way more clear if so. thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/177757 (owner: 10Giuseppe Lavagetto) [18:10:12] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 7 failures [18:10:32] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Puppet has 7 failures [18:10:42] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 7 failures [18:12:19] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:17:36] (03PS3) 10Ottomata: Rename all webrequest varnishkafka instances as 'webrequest' [puppet] - 10https://gerrit.wikimedia.org/r/177546 [18:21:26] <_joe_> !log depooling mw1207-8, repooling mw1203-06 [18:21:28] Logged the message, Master [18:21:35] <_joe_> !log api 100% on HHVM now [18:21:38] Logged the message, Master [18:22:07] <_joe_> ori: ^^ [18:22:20] _joe_: :D:D:D:D [18:22:30] all app servers over hhvm [18:23:24] <_joe_> well, apart from the imagescalers and the jobrunners [18:23:34] <_joe_> but I think we may be done by christmas [18:23:34] ignoring those [18:25:48] :) [18:26:07] _joe_: congrats [18:26:30] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:26:33] PROBLEM - Host mw1202 is DOWN: PING CRITICAL - Packet loss = 100% [18:27:12] <_joe_>
http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=mw1192.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=cpu_report&c=API+application+servers+eqiad I should sell this graph to facebook [18:28:53] :) [18:29:21] (03PS8) 10Hoo man: Allow "hoo" to sudo into datasets [puppet] - 10https://gerrit.wikimedia.org/r/152724 [18:30:50] RECOVERY - Host mw1202 is UP: PING OK - Packet loss = 0%, RTA = 2.93 ms [18:31:37] (03CR) 10Hoo man: "I don't need to be able to ssh into the datasets hosts (being able to sudo into the user on snapshot would be enough)... so removed that (" [puppet] - 10https://gerrit.wikimedia.org/r/152724 (owner: 10Hoo man) [18:35:22] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:35:42] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:42:37] _joe_: \o/ [18:42:38] (03CR) 10Legoktm: "And the current behavior of redirecting to the phabricator home page is less confusing?? +1" [puppet] - 10https://gerrit.wikimedia.org/r/176898 (owner: 10Nemo bis) [18:43:18] <_joe_> ori: :) [18:43:40] _joe_: that's awesome!! [18:44:35] <_joe_> ori: and I have plenty of open questions for imagescalers already :P [18:44:45] Hey opsen, could anyone point me to someone who's knowledgeable about the misc-web proxy and can review https://gerrit.wikimedia.org/r/#/c/178419/ ? [18:45:53] RoanKattouw: I'm about to go out, but maybe bblack? [18:46:07] oh just saw the change [18:46:10] why misc-web? [18:46:37] I figured that might be better than giving it its own public IP [18:47:01] Citoid responses aren't very cacheable, I figured we didn't need completely separate infra like for Parsoid [18:47:13] OTOH maybe we want a generic proxy for services stuff? 
[18:48:23] PROBLEM - Host mw1202 is DOWN: PING CRITICAL - Packet loss = 100% [18:48:24] Anyway I'm cool with however you guys wanna do this, as long as we end up with a citoid.wikimedia.org that we can send requests to [18:48:55] _joe_: what happened to mw1207 and mw1208? they're not showing any load. and mw1201 and mw1202 are down in ganglia [18:49:08] <_joe_> ori: https://gerrit.wikimedia.org/r/#/c/178454/ [18:49:16] <_joe_> ori: still reimaging, but depooled [18:49:17] (03PS1) 10coren: We are/will be toolserver.org now. Add template. [dns] - 10https://gerrit.wikimedia.org/r/178552 [18:49:37] _joe_: ahhh [18:49:37] <_joe_> ori: I hit a couple of roadblocks in the afternoon, or I'd be done reimaging by now [18:49:48] _joe_: that's pretty impressive [18:50:04] RECOVERY - Host mw1202 is UP: PING OK - Packet loss = 0%, RTA = 3.02 ms [18:50:33] <_joe_> ori: it's actually pretty boring [18:50:38] <_joe_> :P [18:51:20] (03PS1) 10Ori.livneh: Set $wgAjaxEditStash = true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178553 [18:51:46] (03CR) 10CSteipp: "Nice! I didn't realize signals made it into trusty." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/177876 (owner: 10Cscott) [18:51:58] PROBLEM - Host mw1203 is DOWN: PING CRITICAL - Packet loss = 100% [18:52:15] csteipp: apparently so, at least according to /var/log/syslog! [18:52:16] (03CR) 10Ori.livneh: [C: 032] Set $wgAjaxEditStash = true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178553 (owner: 10Ori.livneh) [18:52:23] (03Merged) 10jenkins-bot: Set $wgAjaxEditStash = true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178553 (owner: 10Ori.livneh) [18:52:42] syslog don't lie, except in some cases... ;) [18:52:43] robh: Quick doublecheck on https://gerrit.wikimedia.org/r/#/c/178552/ for me? Simple zone file with little in it. [18:53:10] reviewing [18:53:13] <_joe_> cscott: can you take a look at https://gerrit.wikimedia.org/r/#/c/178454/ ? 
[18:53:26] <_joe_> it's a bug that's causing 503s with HHVM [18:53:46] (03CR) 10Cscott: Allow OCG binaries to send/receive signals (AppArmor fixes). (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/177876 (owner: 10Cscott) [18:54:18] (03CR) 10RobH: [C: 031] "Looks right to me when comparing to other existing zone files. +1 and leaving unmerged for Marc to merge when he wants." [dns] - 10https://gerrit.wikimedia.org/r/178552 (owner: 10coren) [18:55:53] (03CR) 10coren: [C: 032] "Not going to do much until the root servers actually point here, but prerequisite to doing so." [dns] - 10https://gerrit.wikimedia.org/r/178552 (owner: 10coren) [18:56:37] PROBLEM - nutcracker port on mw1202 is CRITICAL: Connection refused by host [18:56:46] PROBLEM - puppet last run on mw1202 is CRITICAL: Connection refused by host [18:57:05] PROBLEM - Apache HTTP on mw1202 is CRITICAL: Connection refused [18:57:15] PROBLEM - RAID on mw1202 is CRITICAL: Connection refused by host [18:57:15] PROBLEM - nutcracker process on mw1202 is CRITICAL: Connection refused by host [18:57:37] PROBLEM - Disk space on mw1202 is CRITICAL: Connection refused by host [18:58:37] PROBLEM - DPKG on mw1202 is CRITICAL: Connection refused by host [18:58:56] PROBLEM - check configured eth on mw1202 is CRITICAL: Connection refused by host [18:59:15] PROBLEM - check if dhclient is running on mw1202 is CRITICAL: Connection refused by host [18:59:36] PROBLEM - check if salt-minion is running on mw1202 is CRITICAL: Connection refused by host [18:59:41] Are misc. hosts like terbium, tin, snapshot*, ... going to be switched over to HHVM as well? 
[18:59:44] (03PS1) 10BBlack: SSL: Remove RC4, enable 3DES, add aDH key exchange [puppet] - 10https://gerrit.wikimedia.org/r/178555 [18:59:48] Or at least to a newer PHP version [18:59:55] <_joe_> hoo: I have no plan for that tbh [19:00:04] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141209T1900). Please do the needful. [19:00:22] <_joe_> and I guess it will take a non-trivial amount of time for anything running something different than mediawiki [19:00:39] _joe_: Yeah :S [19:00:56] Would be nice to be able to drop 5.3 support at some point [19:00:58] <_joe_> so, for mediawiki-only hosts, that's a possibility [19:01:01] especially the evil hacks [19:01:29] _joe_: If you say mediawiki-only, you mean appservers/ api servers or also those that run the maintenance scripts and stuff [19:02:28] <_joe_> hoo: also those should be ok to move [19:02:52] <_joe_> but say you have any other application using php - I won't be involved :P [19:03:01] _joe_: Do you know about hhvm's cli performance? [19:03:20] (03CR) 10BBlack: "At least temporarily, pinkunicorn.wikimedia.org has puppet disabled with this change applied locally if you want to hit it with various to" [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [19:03:33] <_joe_> hoo: yes, for short-running scripts it's a wee bit worse than zend, for long-running ones it's way better [19:03:40] Awesome [19:03:56] <_joe_> but, we converted the jobrunner to post jobs to a fastcgi server for this reason specifically [19:03:59] Cause we have a few that run for 12h or even 36h [19:04:12] <_joe_> why do they take so long? [19:04:25] <_joe_> are those cpu bound? io bound? 
[19:04:36] They need to load and process all Wikidata entities [19:04:39] mostly CPU bound [19:04:46] <_joe_> ok that could be helped [19:04:49] <_joe_> I guess :P [19:05:05] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 1 failures [19:05:31] <_joe_> but ori could be more specific than me on cli performance [19:05:33] <_joe_> or AaronS [19:05:35] <_joe_> :) [19:06:07] yeah, hhvm should do much better [19:06:08] <_joe_> they rewrote the jobrunner to use http in order to communicate with HHVM for the jobrunners [19:06:28] <_joe_> but I think that's for short-running scripts mostly [19:06:45] (03PS2) 10BBlack: SSL: Remove RC4, enable 3DES, add non-anon DH key exchange [puppet] - 10https://gerrit.wikimedia.org/r/178555 [19:06:59] We don't use jobs for that... our stuff is mostly kicked off by crons [19:07:46] <_joe_> hoo: for short scripts, converting your cron from php somefile to a curl could be a good idea, actually :) [19:08:15] <_joe_> because running them time after time will benefit from JIT optimizations [19:08:29] if it's not super time sensitive, having the script run by cron just insert a job could also be viable [19:08:53] We have a dispatcher running on terbium that gets restarted every 15 min. or so [19:09:08] Would be nice to convert that to a job (we have a bug for that for ages...)
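_joe_'s cron-to-curl suggestion above (stop forking a fresh interpreter from cron, post to a long-lived HHVM worker instead, as the jobrunner rewrite did) can be sketched as a crontab fragment. The script path, endpoint, and port below are hypothetical, not the real terbium setup:

```shell
# Before: each run pays HHVM's JIT warm-up cost from scratch
#   */15 * * * *  www-data  php /srv/mediawiki/maintenance/dispatch.php
# After: hit a long-lived HHVM FastCGI/HTTP worker, so repeated runs
# reuse already-JITted code (endpoint and port are made up)
#   */15 * * * *  www-data  curl -s -o /dev/null http://localhost:9005/rpc/dispatch
```

This only helps short, frequent scripts; the 12-36h CPU-bound dumps mentioned above already run long enough for the JIT to pay off in a plain CLI invocation.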
[19:13:45] (03CR) 10JanZerebecki: [C: 031] icinga: use ssl_ciphersuite [puppet] - 10https://gerrit.wikimedia.org/r/178487 (owner: 10Dzahn) [19:16:52] (03PS1) 10Reedy: Non wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178562 [19:17:19] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:18:35] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178562 (owner: 10Reedy) [19:18:49] (03Merged) 10jenkins-bot: Non wikipedias to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178562 (owner: 10Reedy) [19:22:47] PROBLEM - DPKG on mw1193 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:23:20] Reedy: did twentyafterfour talk with you today about doing the deploy next week? [19:23:30] (03CR) 10JanZerebecki: [C: 04-1] "RC4 will also be removed in compat and strong does have TLS1 disabled which results in quite many browsers not being supported. 
I think co" [puppet] - 10https://gerrit.wikimedia.org/r/178493 (owner: 10Dzahn) [19:23:50] greg-g: sorry no I didn't talk to Reedy yet, helping chase with some phab stuff [19:24:10] Reedy: I need to sync up with you when you have a chance ;) [19:24:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11 [19:24:18] Logged the message, Master [19:24:59] there's the deploy going on now, which is why I asked :) [19:25:09] !log that is a lie, 266 hosts failed [19:25:11] Logged the message, Master [19:25:19] eek [19:25:23] well "going on" in some way [19:27:16] * Reedy files the false positive as a task [19:28:39] Awesome [19:28:50] Looks like the memcached traffic dropped slightly again [19:28:59] and we're not even on wikipedias with wmf11 [19:30:04] hoo: note, that deploy didn't take [19:30:29] Meh :P [19:30:53] Now imagine the impact if this actually hits some wiki :D [19:31:43] twentyafterfour: good news, you might be able to join/do this deploy for real today if you and reedy can coordinate now, the hosts are all failing so we'll have a second chance [19:32:02] Reedy: please do that so you can take off next tuesday safely :) [19:32:55] currently deploying phabricator update, I can join in but need 5 minutes to finish this [19:33:37] I think it's gonna take a bit longer to get it fixed ;) [19:33:45] twentyafterfour: yep :) [19:42:04] So yeah, mwdeploy has bad entries in its known_hosts [19:42:28] Hm. Doesn't keep its own. [19:42:35] nope, i am in it now was about to say that [19:43:02] Then I'm guessing it uses the invoking users' instead [19:43:31] greg-g: What's your username again? [19:45:54] Coren: I'm gjg, but reedy is deploying [19:48:34] Ah, it may be the /etc/ssh/sshd_known_hosts that's the problem. [19:49:35] Nope. For some reason it's 600 so only root'd be using it.
[19:50:31] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Puppet has 1 failures [19:52:18] (03PS2) 10Cscott: Allow OCG binaries to send/receive signals (AppArmor fixes). [puppet] - 10https://gerrit.wikimedia.org/r/177876 [19:52:22] Reedy: I don't know enough about the deploy system to tell you where exactly, but the problem is almost certainly that you have obsolete host keys in a known_hosts somewhere. [19:52:25] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 1 below the confidence bounds [19:52:29] (03PS3) 10Cscott: Allow OCG binaries to send/receive signals (AppArmor fixes). [puppet] - 10https://gerrit.wikimedia.org/r/177876 [19:54:24] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 1 below the confidence bounds [19:55:05] Coren: due to reimaging? [19:55:19] greg-g: Almost certainly. [19:55:23] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [500.0] [19:55:33] I didn't have a problem with the new servers done before [19:55:37] Well, not this problem [19:55:41] Coren: so, in that case, we'd need to repopulate the known_hosts with salt or some such, no? [19:55:50] reedy@tin:/srv/mediawiki-staging$ wc -l ~/.ssh/known_hosts [19:55:50] 9 /home/reedy/.ssh/known_hosts [19:56:23] And the new servers done before were working fine [19:56:26] But yet all fail now [19:56:31] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [19:56:37] greg-g: Likely, but for the life of me I can't seem to locate where the host keys are checked against.
It's not mwdeploy's .ssh/known_hosts, nor the global /etc/ssh/ssh_known_hosts [19:56:48] hmm [19:56:48] reedy@tin:/srv/mediawiki-staging$ dsh -g mediawiki-installation -M -F 40 -- 'sudo -u mwdeploy -- rm -rf /usr/local/apache/common-local/php-1.24wmf21' [19:56:49] The authenticity of host 'tmh1002 (10.64.16.146)' can't be established. [19:56:49] RSA key fingerprint is 3d:8c:4d:2d:2d:80:b3:35:91:2f:86:52:cf:2e:c1:7d. [19:56:49] Are you sure you want to continue connecting (yes/no)? The authenticity of host 'mw1031 (10.64.0.61)' can't be established. [19:56:51] RSA key fingerprint is cc:35:a2:42:ed:f4:9a:3e:da:df:e3:54:cb:16:c1:bb. [19:57:10] that happens to me whenever a system is reinstalled [19:57:14] greg-g: The one really odd thing is that the global known_hosts is restricted to root. [19:57:14] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [500.0] [19:57:16] sometimes in past folks copy old host keys and replace [19:57:16] that's going to take a while... [19:57:20] but its fallen out of practice [19:57:51] like I say, I didn't have to do this for the servers that were reinstalled before [19:57:51] (03CR) 10JanZerebecki: "https://gerrit.wikimedia.org/r/#/c/178555/ also disables RC4 but enables 3DES instead to still support for the above mentioned browsers." [puppet] - 10https://gerrit.wikimedia.org/r/178488 (owner: 10Giuseppe Lavagetto) [19:58:51] (03CR) 10JanZerebecki: "https://gerrit.wikimedia.org/r/#/c/178488 also disables RC4 but doesn't do the other 2 parts of this patch." [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [19:59:29] -rw------- 1 root root 1696962 Dec 9 19:26 ssh_known_hosts [19:59:32] Is that right? [20:00:01] Wondered about that as well [20:00:06] no idea where it is puppetized even [20:00:06] there's the new key 'arming' thing [20:00:11] * aude reads scrollback [20:00:17] in the past that was readable [20:00:20] It's... 
surprising to me, but it's the same as iron so I presume it's on purpose. [20:00:33] * Coren wonders if someone changed that in puppet. [20:00:36] * Coren checks. [20:00:49] Presumably, my known_hosts would've been bloated if it had everything in mediawiki-installation [20:01:07] (03CR) 10CSteipp: [C: 031] Allow OCG binaries to send/receive signals (AppArmor fixes). [puppet] - 10https://gerrit.wikimedia.org/r/177876 (owner: 10Cscott) [20:03:46] ssh_known_hosts has been untouched and with the same permissions since at least Nov 15 so whatever the current issue is couldn't be that if any deployment took place since. [20:04:24] sweet [20:04:31] Can someone else test and see if it's just me(tm)? [20:04:42] yeah [20:04:56] thanks [20:05:18] same for me [20:05:33] Coren: eh? "Dec 9 19:26" [20:05:37] hoo: What is the exact thing you have tried. [20:05:45] Coren: sync-file [20:05:46] greg-g: I checked on iron [20:05:55] not related to HHVM [20:06:06] I also get taht error for snapshot1003 which is still on Zend [20:06:07] oh [20:06:10] hoo: please to give me the exact command line? [20:06:18] _joe_ mentioned that all the scap proxies are now on 14.04 [20:06:20] which might be the clue [20:06:22] hoo@tin:/srv/mediawiki-staging$ sync-file wmf-config/Wikibase.php 'No no no op...' [20:07:32] * Coren investigates further. [20:07:55] Like I say, I tried a plain dsh, and it was asking me to confirm every key [20:08:18] <_joe_> Reedy: dsh from tin to? [20:08:33] EVERYWHERE! [20:08:41] dsh -g mediawiki-installation -M -F 40 -- 'sudo -u mwdeploy -- rm -rf /usr/local/apache/common-local/php-1.24wmf21' [20:08:46] should be a noop [20:09:13] <_joe_> Reedy: I think puppet had something to do with it [20:09:21] hoo@bast1001:~$ ls -l /etc/ssh/ssh_known_hosts [20:09:21] -rw-r--r-- 1 root root 1697600 Dec 9 19:34 /etc/ssh/ssh_known_hosts [20:09:22] Coren: ^ [20:09:43] lol [20:10:05] _joe_: I've checked puppet and it seems to have no reference to known_hosts for general hosts. 
[20:10:18] I also checked that [20:10:21] yes, weird [20:10:24] <_joe_> Coren: we do collect the hostkeys [20:10:33] _joe_: Where? [20:10:34] <_joe_> and put them in the known_hosts file [20:10:49] Reedy: Still fighting scap? We could pick the patch that makes it ignore host keys in beta if we need to. [20:11:01] <_joe_> /Stage[main]/Ssh::Hostkeys-collect/Ssh::Hostkey[mw1207.eqiad.wmnet]/Sshkey[mw1207.eqiad.wmn [20:11:05] <_joe_> et]/key: key changed [20:11:08] bd808: It seems it's probably not scap at fault [20:11:23] Reedy: Try that sync-file again from tin? [20:11:32] Reedy: Yeah it would be puppet and reimaging [20:11:36] <_joe_> so at some point, that process changed the permissions of that file? [20:11:48] It takes a while for the new keys to make it to tin [20:11:56] _joe_: I'm looking at that now. [20:12:06] _joe_: The one on bast1001 looks more up to date [20:12:08] Reedy: Success? [20:12:09] bd808: yeah, but they're ALL failing now. Many were working post reinstall [20:12:14] Coren: nope, same [20:12:28] 20:11:50 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/CommonSettings.php', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1120 returned [255]: Host key verification failed. [20:12:36] <_joe_> hoo: yeah [20:12:41] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [20:12:47] !log reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 03s) [20:12:52] Logged the message, Master [20:12:54] <_joe_> Reedy: that's because /etc/ssh/known_hosts is not reachable by common users [20:13:19] Reedy: Once more? [20:13:23] with feeling?
[20:13:36] should work now [20:13:41] yay [20:13:42] 20:13:29 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/CommonSettings.php', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1214 returned [255]: Error reading response length from authentication socket. [20:13:42] Permission denied (publickey). [20:13:45] Different error, obvs [20:13:50] but just that host [20:13:50] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:14:04] <_joe_> which one? [20:14:08] _joe_: Yeah, that's definitely a perm problem with ssh_known_hosts [20:14:15] mw1214 [20:14:21] 1 host is taking an age too [20:14:28] Swamped the ssh-agent on tin? [20:14:33] !log reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 07s) [20:14:36] Logged the message, Master [20:14:47] 20:14:32 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/CommonSettings.php', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1203 returned [255]: ssh: connect to host mw1203 port 22: Connection timed out [20:14:59] mw1203 isn't responding to ping [20:15:02] !log mw1203 seems to be down [20:15:03] <_joe_> Reedy: that may be because it's reimaging? [20:15:05] Logged the message, Master [20:15:12] <_joe_> oh no it's not [20:15:20] 24 packets transmitted, 0 received, 100% packet loss, time 23184ms [20:15:35] * Coren tracks down what changed the permission. 
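The `dsh` output above shows both symptoms of stale host keys after a reimage: interactive "authenticity can't be established" prompts and hard `Host key verification failed` errors. The standard per-file fix is `ssh-keygen -R <host> -f <known_hosts>`; the sketch below does the equivalent with `grep` on a scratch file so it has no OpenSSH dependency. Hostnames are taken from the log; the key material is made up:

```shell
# Build a scratch known_hosts with two stale entries (reimaged hosts)
# and one good one, then drop the stale ones.
kh=$(mktemp)
printf '%s\n' \
  'mw1031.eqiad.wmnet ssh-rsa AAAA...STALE' \
  'tmh1002.eqiad.wmnet ssh-rsa AAAA...STALE' \
  'mw1070.eqiad.wmnet ssh-rsa AAAA...GOOD' > "$kh"
# Remove the entries for the reimaged hosts; their fresh keys get
# re-learned (or re-pushed by puppet's sshkey collection) on next connect.
grep -v -e '^mw1031\.eqiad\.wmnet ' -e '^tmh1002\.eqiad\.wmnet ' "$kh" > "$kh.new"
mv "$kh.new" "$kh"
cat "$kh"   # only the mw1070 entry remains
```

For an unattended sync that must not prompt, the usual (trust-on-first-use) escape hatch is `ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`, which is roughly what the scap hack bd808 links to does for beta.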
[20:15:45] <_joe_> Reedy: taking a look now [20:15:52] _joe_: Coren thanks both :) [20:16:05] Reedy: in case you need this hack someday to work around a similar problem with host keys -- https://gerrit.wikimedia.org/r/#/c/148112/ [20:16:36] (03PS1) 10Aaron Schulz: Simplified profiler config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178591 [20:17:36] _joe_: I don't know whether the permission change was caused by puppet, but a puppet run /now/ doesn't change it back to 0600 [20:18:03] (03CR) 10Hoo man: "Will this work on the remaining Zend hosts?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178591 (owner: 10Aaron Schulz) [20:18:19] twentyafterfour: soooooooooooooooooooooooooooo [20:18:30] !log gave a+r to /etc/ssh/ssh_known_hosts on tin and iron [20:18:31] Reedy: yo [20:18:32] Logged the message, Master [20:18:49] I can think of no reason why that file wouldn't be world-readable. [20:19:17] <_joe_> Coren: mmmh does look like something changed those permissions everywhere? [20:19:28] Reedy: hoo: greg-g: the immediate issue is fixed and, as far as I can tell, puppet won't break it again. I still have no idea why it was changed at all though. [20:19:48] _joe_: Not through puppet that I can see. Maybe a badly aimed salt run? [20:20:21] <_joe_> !log repooled the last api servers [20:20:25] Logged the message, Master [20:20:27] <_joe_> Coren: no idea [20:21:05] twentyafterfour: So, how do you want to do this? My internet connection is beyond shocking atm, so no chance of screen sharing [20:21:34] tuesday deploys are nice and simple (barring stuff like this ;)) [20:21:39] greg-g: Are you all set? [20:21:41] RECOVERY - Host mw1203 is UP: PING OK - Packet loss = 0%, RTA = 2.24 ms [20:21:54] Coren: i think so [20:22:13] Reedy: ok can you just save me an ssh session log from your terminal so [20:22:24] Reedy: so I can use it as a reference card next week?
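The root cause Coren found can be reproduced on a scratch file: a 0600, root-owned `/etc/ssh/ssh_known_hosts` is unreadable to the ssh client of any ordinary user (mwdeploy, reedy), so the global key database is silently skipped and every host looks "unknown" again. The `chmod a+r` at the end is the same fix that was logged for tin and iron; the path here is a temp file, not the real one:

```shell
# Simulate the global known_hosts going root-only, then apply the fix.
f=$(mktemp)
echo 'mw1203.eqiad.wmnet ssh-rsa AAAA...' > "$f"
chmod 600 "$f"
stat -c '%a' "$f"   # 600: unreadable to anyone but the owner (root, in prod)
chmod a+r "$f"      # the fix: world-readable again
stat -c '%a' "$f"   # 644
```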
[20:22:31] or something like that [20:23:19] Reedy: it's already documented on wiki but it's a lot easier to follow a log than the way it's presented in the wiki, imo [20:23:44] ./multiversion/updateWikiversions all.dblist php-1.25wmf11 [20:23:44] ./multiversion/updateWikiversions wikipedia.dblist php-1.25wmf10 [20:23:44] git commit -a -m "non wikipedias to 1.25wmf11" [20:23:44] git push origin HEAD:refs/for/master [20:23:44] [20:23:46] git pull [20:23:49] sync-wikiversions non wikipedias to 1.25wmf11 [20:24:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11 [20:24:19] <_joe_> Reedy: mw1203 is up, but I'd run sync-common there [20:24:35] (03CR) 10JanZerebecki: [C: 04-1] "Yes, please." [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [20:25:47] 20:25:30 Finished rsync common (duration: 00m 49s) [20:25:52] !log ran sync-common on mw1203 [20:25:55] Logged the message, Master [20:26:10] why does this go through gerrit if you are reviewing it yourself [20:26:24] cause we don't have direct push on the repo [20:26:26] might as well just push to master ;) [20:26:32] heh [20:26:43] twentyafterfour: then there is a record [20:26:52] that can be seen in gerrit [20:26:54] aude: git commit is not a record? 
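For the reference card twentyafterfour asked for, here are the commands Reedy pasted above, gathered in order. The tools are Wikimedia-specific (run from /srv/mediawiki-staging on the deploy host); treat this as a sketch of that session, not canonical documentation:

```shell
./multiversion/updateWikiversions all.dblist php-1.25wmf11        # everything to wmf11...
./multiversion/updateWikiversions wikipedia.dblist php-1.25wmf10  # ...then wikipedias back to wmf10
git commit -a -m "non wikipedias to 1.25wmf11"
git push origin HEAD:refs/for/master    # goes through gerrit, so there's a reviewable record
# self-review and +2 the change in gerrit, then:
git pull
sync-wikiversions 'non wikipedias to 1.25wmf11'
```

The quotes around the `sync-wikiversions` log message are added here for shell safety; the pasted session omitted them.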
[20:26:57] aude: there is still in git log ;) [20:27:01] twentyafterfour: because it's more transparent for other users :) [20:27:09] (03PS2) 10Reedy: multiversion: cdb/cdb was renamed to wikimedia/cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178340 (owner: 10Legoktm) [20:27:09] FlorianSW: ok [20:27:13] - if you don't have a local copy ;) [20:27:14] true [20:27:15] (03CR) 10Reedy: [C: 032] multiversion: cdb/cdb was renamed to wikimedia/cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178340 (owner: 10Legoktm) [20:27:17] FlorianSW: I'll buy that answer I guess ;) [20:27:21] makes sense [20:27:24] (03Merged) 10jenkins-bot: multiversion: cdb/cdb was renamed to wikimedia/cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178340 (owner: 10Legoktm) [20:27:46] though the repository commit logs are totally published [20:27:57] just like gerrit changes are [20:27:58] * aude normally looks at gerrit and not gitblit or github [20:28:01] twentyafterfour: this answer is out of stock, sorry :D [20:28:16] this might get better with phabricator+differential [20:28:16] Fast-forwarded master to e1f8140d1946079ec03b230c7199e463999c5216. [20:28:18] e1f! [20:28:20] it's christmas! 
[20:28:37] easier the way it presents all the changes in a nice list across projects [20:28:45] despite its faults [20:28:45] !log reedy Synchronized multiversion/: cdb bump (duration: 00m 05s) [20:28:47] Logged the message, Master [20:29:17] aude: true, that is nice and concise, unlike the rest of gerrit [20:29:21] (03PS2) 10Reedy: wgMemoryLimit to 330MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173507 [20:30:02] (03CR) 10Reedy: [C: 032] wgMemoryLimit to 330MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173507 (owner: 10Reedy) [20:30:10] (03Merged) 10jenkins-bot: wgMemoryLimit to 330MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173507 (owner: 10Reedy) [20:30:16] mediawiki-config is a bit quiet this week [20:30:58] greg-g, Reedy: i guess, that there will be no deployment between 22 and 31 december? [20:31:10] * hoo eyes YuviPanda ... I need your super eh sudo-powers :D [20:31:15] w/c 22 and 29 [20:31:17] yha [20:31:23] hi hoo [20:31:24] ‘sup [20:31:37] YuviPanda: Remember that script I asked you to kill earlier on [20:31:49] now that Wikidata is on wmf11 it should work in a reasonable time [20:31:50] yeah? [20:31:53] (03PS3) 10Reedy: Remove Anexo namespace on pt.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://bugzilla.wikimedia.org/73164) (owner: 10Dereckson) [20:31:55] hmm [20:31:59] Reedy: wirth's law? ;) [20:32:01] is it not running on a cron or something? [20:32:03] (03CR) 10Reedy: "Can this go now?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://bugzilla.wikimedia.org/73164) (owner: 10Dereckson) [20:32:08] YuviPanda: Weekly, it is [20:32:17] aaah, weekly [20:32:19] FlorianSW: right, next week is the last deploy until 2015 [20:32:20] but we would like to have a dump this week now that it was aborted [20:32:41] Reedy, ok, thanks :) greg-g: and next year the "normal" deployment plan or is there a gap, too? :) [20:33:20] hoo: right.
[20:33:28] ooh right [20:33:30] hoo: I’m trying to figure out the best way to start this now. [20:33:57] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [20:34:00] Logged the message, Master [20:36:19] hoo: I think it should be running now [20:36:26] YuviPanda: Checking [20:36:56] Looks fine :) [20:37:00] Thanks again! [20:37:40] hoo: yw! [20:37:54] !log started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, to re-start dump script aborted earlier [20:37:59] Logged the message, Master [20:38:27] YuviPanda: Something's fishy [20:38:43] hoo: did it die? [20:39:42] hoo: I think it died. looks like my method of persisting it after I disconnect didn’t really work [20:40:10] YuviPanda: There are a lot of processes running now [20:40:27] I think your kill earlier on didn't kill the php processes [20:40:31] (03PS4) 10Ottomata: Rename all webrequest varnishkafka instances as 'webrequest' [puppet] - 10https://gerrit.wikimedia.org/r/177546 [20:41:31] hoo: there… is a php process? [20:41:37] dammit, I just killed the shellscript. [20:41:42] :S [20:41:44] should’ve known. [20:41:46] Not really an issue [20:41:57] It's only reading data [20:42:05] Reedy: are you still deploying stuff? [20:42:19] hoo: ok, I’m thinking I’ll kill *all* the php processes, and restart it again? [20:42:43] wtf [20:42:43] https://commons.wikimedia.org/wiki/Special:ListFiles?limit=50&user=Spinster&ilshowall=1 [20:42:55] If you select that first file, does it apparently not exist for anyone else? 
[20:43:16] yes [20:43:32] I suspect it's the ' character :P [20:43:33] doesn't exist for me [20:43:35] what the hell has mediawiki done [20:44:00] (03CR) 10Ottomata: [C: 032] Rename all webrequest varnishkafka instances as 'webrequest' [puppet] - 10https://gerrit.wikimedia.org/r/177546 (owner: 10Ottomata) [20:44:10] YuviPanda: all jsonDump ones, yes [20:44:14] !log renaming all webrequest varnishkafka instances [20:44:17] Logged the message, Master [20:44:22] but be careful, there are other crons on it [20:44:29] so might be that other stuff is running [20:45:24] hoo: I only killed the dumpJson ones. [20:45:36] hoo: and restarted. now there are 4 running, rather than 8 [20:46:01] https://commons.wikimedia.org/wiki/File:Nova_Guinea_-_r%C3%A9sultats_de_l%27exp%C3%A9dition_scientifique_n%C3%A9erlandaise_%C3%A0_la_Nouvelle-Guin%C3%A9e_en_1903_-_Vol_2-1.djvu is there fine... [20:46:10] (03PS1) 10Cmjohnson: Flipping virt1010-1012 dhcp again, can't get anything on serial now [puppet] - 10https://gerrit.wikimedia.org/r/178602 [20:46:14] YuviPanda: Ah, thanks [20:46:32] given the way the scripts were killed this time it might be that there are temp
files you need to clean up [20:46:36] give me a second [20:46:52] !log started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, after killing php processes from earlier start as well as from the earlier botched kill [20:46:54] hoo: ok [20:46:55] Logged the message, Master [20:47:03] * YuviPanda really wishes he knew more about how this entire thing is set up [20:47:52] (03PS2) 10Cmjohnson: Flipping virt1010-1012 dhcp again, can't get anything on serial now [puppet] - 10https://gerrit.wikimedia.org/r/178602 [20:48:02] !log ran update revision set rev_page="8555529" where rev_page="1469156"; on frwiki (for T76979) [20:48:07] Logged the message, Master [20:48:07] YuviPanda: Actually not [20:48:15] The new run will just overwrite them [20:48:31] /mnt/data/xmldatadumps/temp/wikidataJson.$i.gz [20:48:51] where 4 > $i > 0 [20:48:57] legoktm: No, not deploying stuff. Just looking at https://phabricator.wikimedia.org/tag/shell/ atm [20:49:17] (03CR) 10Cmjohnson: [C: 032] Flipping virt1010-1012 dhcp again, can't get anything on serial now [puppet] - 10https://gerrit.wikimedia.org/r/178602 (owner: 10Cmjohnson) [20:49:22] ok, I'm rescuing some revisions on frwiki [20:49:29] YuviPanda: Thank you for your help... and sorry for these headaches :D [20:49:47] hoo: :D that’s what I’ve root for! :) and sorry for not cleaning it up properly earlier. 
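The botched kill above is a classic process-group problem: killing the wrapper shell script left its php children running. A minimal sketch of one way to avoid that, assuming Python is available on the snapshot host; only the script path is taken from the !log entries, everything else is illustrative, not the actual deployment tooling:

```python
import os
import signal
import subprocess

DUMP_SCRIPT = "/usr/local/bin/dumpwikidatajson.sh"  # path from the !log above

def start_dump(cmd=(DUMP_SCRIPT,)):
    # start_new_session=True is setsid(): the child becomes the leader of a
    # fresh process group, and every php worker it forks inherits that group.
    return subprocess.Popen(list(cmd), start_new_session=True)

def stop_dump(proc):
    # Signal the whole *group* via killpg, taking out the wrapper shell
    # script and all of its children in one go - no orphaned php processes.
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
```

The same idea in plain shell would be `setsid script.sh` followed by `kill -- -$PGID`.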
[20:51:54] (03PS1) 10Ottomata: Fix name of varnishkafka instance in require dependency [puppet] - 10https://gerrit.wikimedia.org/r/178605 [20:52:33] (03PS2) 10Ottomata: Fix name of varnishkafka instance in require dependency [puppet] - 10https://gerrit.wikimedia.org/r/178605 [20:52:55] (03CR) 10Ottomata: [C: 032 V: 032] Fix name of varnishkafka instance in require dependency [puppet] - 10https://gerrit.wikimedia.org/r/178605 (owner: 10Ottomata) [20:53:21] !log ran update revision set rev_page="8555535" where rev_page="6628330"; on frwiki [20:53:24] Logged the message, Master [20:54:59] * legoktm is done for now [20:55:17] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [20:55:33] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail [20:55:35] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: puppet fail [20:55:47] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: puppet fail [20:55:48] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: puppet fail [20:56:03] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail [20:56:15] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: puppet fail [20:56:38] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: puppet fail [20:57:02] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: puppet fail [20:57:09] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: puppet fail [20:57:09] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: puppet fail [20:57:16] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [20:57:23] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail [20:57:23] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail [20:57:23] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: puppet fail [20:57:36] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: puppet fail [20:57:36] PROBLEM - puppet last run on cp3016 is CRITICAL: 
CRITICAL: puppet fail [20:58:03] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [20:58:04] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail [20:58:14] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [20:58:15] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [20:58:15] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: puppet fail [20:58:15] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [20:58:24] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: puppet fail [20:58:24] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: puppet fail [20:58:48] gj [20:59:14] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: puppet fail [20:59:14] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail [20:59:14] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [20:59:45] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [20:59:58] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:00:17] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:00:29] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:30] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:32] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:32] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:32] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:35] that's me [21:00:36] sorry
[21:00:38] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:00:52] should be coming back [21:01:09] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:01:18] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:01:18] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:01:18] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:01:28] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:01:28] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:01:30] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:01:37] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:01:46] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:01:59] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:01:59] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:02:35] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:02:35] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:02:35] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:02:35] RECOVERY - puppet last run on amssq38 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [21:02:35] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:02:37] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:03:39] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:03:51] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:07:24] (03CR) 10JanZerebecki: SSL: Remove RC4, enable 3DES, add non-anon DH key exchange (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [21:13:38] Platonides: https://phabricator.wikimedia.org/T78060 [21:13:38] ffs [21:15:10] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [21:16:11] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [21:21:11] (03CR) 10JanZerebecki: "To get DHE to 2k nginx needs dhparam setting with a file from openssl dhparam, apache needs v2.4.7 ( https://httpd.apache.org/docs/2.4/mod" [puppet] - 10https://gerrit.wikimedia.org/r/178555 (owner: 10BBlack) [21:25:08] Reedy, found your image at https://commons.wikimedia.org/wiki/?curid=37220364 ;) [21:25:45] yeah, I'd got that [21:25:52] I'm hacking a delete script to use page id :P [21:26:41] what I don't get is why is it "not-existing"? [21:26:56] mmh... maybe the memcached entry is lying [21:27:04] I'd purge it first [21:27:13] I did try that [21:27:24] reedy@tin:/srv/mediawiki-staging/php-1.25wmf11/maintenance$ mwscript deleteOne.php --wiki=commonswiki --u="Reedy (WMF)" --r="Bad filename" /tmp/uploads/delete.txt [21:27:24] File:Nova Guinea - résultats de l'expédition scientifique néerlandaise à la Nouvelle-Guinée en 1903 - Vol 4.djvu Deleted! 
[21:27:24] reedy@tin:/srv/mediawiki-staging/php-1.25wmf11/maintenance$ [21:32:25] Platonides: ok, wtf [21:32:27] It's not the ' [21:32:28] https://commons.wikimedia.org/wiki/File:Nova_Guinea_-_re%CC%81sultats_de_l_expe%CC%81dition_scientifique_ne%CC%81erlandaise_a%CC%80_la_Nouvelle-Guine%CC%81e_en_1903_-_Vol_4.djvu [21:32:29] (03CR) 10Alexandros Kosiaris: "I should have seen it too :-(" [puppet] - 10https://gerrit.wikimedia.org/r/177864 (owner: 10Alexandros Kosiaris) [21:32:38] https://commons.wikimedia.org/wiki/Special:ListFiles?limit=50&user=Spinster&ilshowall=1 [21:33:17] Reedy, see #wikimedia-commons [21:33:22] it's the diacritic [21:33:42] ah [21:34:24] What should I call it? :P [21:35:15] what do you mean? [21:36:17] How should I get it uploaded where I want it to be? [21:36:25] /it was requested [21:36:29] you need to normalise the filenames [21:36:44] (03CR) 10Dereckson: "From pt. point of view, yes, it can go." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://bugzilla.wikimedia.org/73164) (owner: 10Dereckson) [21:39:06] https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters [21:39:16] we have a class for that [21:40:15] https://git.wikimedia.org/blob/mediawiki%2Fcore/9015ffdcf0c0e1a3b2ba924eb8dd4f1f3751e882/includes%2Fnormal%2FUtfNormal.php [21:43:17] beware that others in the batch are probably in the wrong normal form, too [21:43:51] I thought Title called UtfNormal, but now I don't see that [21:47:22] (03PS1) 10Ottomata: Make default key_filter callback replace . and : found in key names to _ [debs/logster] - 10https://gerrit.wikimedia.org/r/178655 [21:52:08] ori: https://gerrit.wikimedia.org/r/#/c/178656/1 [21:58:20] Coren: ping [21:58:28] hoo: Pong? [21:58:43] Coren: the time has come... remember when you protected "Germany" on Wikidata? [21:58:51] Indeed.
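The diacritic problem being chased here is NFC versus NFD: the broken title uses a decomposed "e" plus combining acute accent (the %CC%81 sequences in the URL above), while titles are expected in the precomposed form that MediaWiki's UtfNormal class produces. A short illustration using Python's stdlib `unicodedata`, the rough counterpart of UtfNormal:

```python
import unicodedata

# The failing upload used decomposed characters (NFD): "e" + U+0301,
# which is what %CC%81 percent-encodes. The working titles use the
# precomposed "é" (NFC). Same visible text, different code points.
decomposed = "re\u0301sultats"   # e + combining acute
precomposed = "r\u00e9sultats"   # precomposed é

assert decomposed != precomposed                                  # distinct strings
assert unicodedata.normalize("NFC", decomposed) == precomposed    # normalising fixes it
```

Batch-normalising filenames to NFC before upload avoids the "file that exists but doesn't" symptom seen on Special:ListFiles.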
[21:59:07] Now that we're all on hhvm it should be ok to unprotect it again [21:59:11] https://www.wikidata.org/w/index.php?title=Q183&action=unprotect [21:59:38] And remove superprotect while at it [21:59:41] We will ask people to be gentle and not add tons of data to it, but only do necessary changes [22:00:00] AaronSchulz: i'll sync [22:00:05] who has unstaged changes? [22:00:16] hoo: Back to unprotected or back to normal protection? [22:00:23] AaronSchulz: where? [22:00:32] deleteBatch.php [22:00:40] Me [22:00:42] Coren: Make it edit=sysop and I'll bring back the right revision then [22:00:42] let me revert [22:00:43] (BZ 715189) [22:00:44] and unportect [22:00:48] lol, typo [22:01:03] AaronSchulz: fixed [22:01:34] !log ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: I5c296325: Various edit stash fixes (duration: 00m 06s) [22:01:35] ^ AaronSchulz [22:01:37] Logged the message, Master [22:02:14] (03PS1) 10EBernhardson: Flow enable three minor talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178662 [22:02:14] hoo: {{done}} [22:02:16] (03PS1) 10EBernhardson: Flow enable the category talk namespace on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178663 [22:02:18] (03PS1) 10EBernhardson: Flow enable the rest of the talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178664 [22:02:22] Coren: Awesome, thanks :) [22:02:51] It's a bit slowwww [22:03:03] Reedy: still deploying? [22:03:16] ebernhardson: Nope. Other people are though it seems :) [22:03:44] btw, api open search giving db errors [22:04:16] :) Should be safe for me to go ahead? be updating flow in 1.25wmf11 then interspersing some maint scripts with mediawiki-config updates after verifying things [22:04:31] ori: Anything else to deploy? [22:04:38] AaronSchulz: example?
:P [22:09:05] AaronSchulz: filed and pinged Brad [22:09:16] ebernhardson: I presume so [22:10:02] (03CR) 10EBernhardson: [C: 032] Enable flow on catalan wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178278 (owner: 10EBernhardson) [22:10:12] (03Merged) 10jenkins-bot: Enable flow on catalan wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178278 (owner: 10EBernhardson) [22:10:40] (03CR) 10Ottomata: [C: 032 V: 032] Make default key_filter callback replace . and : found in key names to _ [debs/logster] - 10https://gerrit.wikimedia.org/r/178655 (owner: 10Ottomata) [22:11:04] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on cawiki (duration: 00m 06s) [22:11:08] Logged the message, Master [22:11:53] (03PS1) 10Ottomata: Fix bug where metric_name would be prefixed and suffixed multiple times when using multiple outputs [debs/logster] - 10https://gerrit.wikimedia.org/r/178666 [22:12:35] (03CR) 10Ottomata: [C: 032] Fix bug where metric_name would be prefixed and suffixed multiple times when using multiple outputs [debs/logster] - 10https://gerrit.wikimedia.org/r/178666 (owner: 10Ottomata) [22:12:41] (03CR) 10Ottomata: [V: 032] Fix bug where metric_name would be prefixed and suffixed multiple times when using multiple outputs [debs/logster] - 10https://gerrit.wikimedia.org/r/178666 (owner: 10Ottomata) [22:15:33] AaronSchulz: it still doesn't work, somehow, even if i deliberately pause for a while on the edit summary field [22:16:00] ori: it's like checkCache() isn't even being hit [22:16:08] it leaves a log entry for all paths [22:17:02] AaronSchulz: so it must be the checkCache parameter which defaults to false [22:17:08] or useCache, rather [22:17:16] what was the rationale again for not making that always true? 
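The checkCache()/useCache question being debugged here reduces to a default-parameter trap. A toy sketch of the failure mode (names are illustrative, not the actual WikiPage/ApiStashEdit code): when the cache flag defaults to false, any caller that doesn't opt in silently bypasses the cache check, which is why flipping the default, and opting out only in the stash writer, resolves it:

```python
_stash = {}

def prepare_edit(text, use_cache=True):
    # Had use_cache defaulted to False, callers that never pass the flag
    # (an AbuseFilter-style hook, say) would skip this branch entirely and
    # the stash would never be read. Defaulting to True fixes that; the
    # stash *writer* passes use_cache=False so it never reads its own entry.
    if use_cache and text in _stash:
        return _stash[text]
    result = ("parsed", text)  # stand-in for the expensive parse step
    _stash[text] = result
    return result
```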
[22:17:43] doEditContent() sets it to true...maybe some abuse filter thingy is calling it first [22:17:43] !log ebernhardson Synchronized php-1.25wmf11/extensions/Flow/: Push flow updates for officewiki deploy (duration: 00m 08s) [22:17:43] Logged the message, Master [22:18:49] !log ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: (hack) $useCache = true (duration: 00m 06s) [22:18:51] Logged the message, Master [22:19:08] AaronSchulz: there we go [22:19:29] ori: I'll change the default to true and make it false in the stash api [22:19:40] AaronSchulz: cool [22:19:43] i'll revert that sync [22:20:11] !log ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: undo: (hack) $useCache = true (duration: 00m 07s) [22:20:13] Logged the message, Master [22:21:06] (03CR) 10EBernhardson: [C: 032] Flow enable three minor talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178662 (owner: 10EBernhardson) [22:23:27] (03Merged) 10jenkins-bot: Flow enable three minor talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178662 (owner: 10EBernhardson) [22:24:39] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 05s) [22:24:43] Logged the message, Master [22:26:22] ori: https://gerrit.wikimedia.org/r/#/c/178673/ [22:27:30] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 06s) [22:27:33] Logged the message, Master [22:27:52] (03PS1) 10JanZerebecki: Change ru.wikinews.org to HTTPS only. 
[puppet] - 10https://gerrit.wikimedia.org/r/178676 [22:33:41] !log aaron Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s) [22:33:45] Logged the message, Master [22:34:04] !log aaron Synchronized php-1.25wmf11/includes/page/WikiPage.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s) [22:34:06] Logged the message, Master [22:38:32] (03PS2) 10JanZerebecki: Change ru.wikinews.org to HTTPS only. [puppet] - 10https://gerrit.wikimedia.org/r/178676 [22:44:15] (03CR) 10JanZerebecki: [C: 04-1] "Oops forgot to restrict it to ru." [puppet] - 10https://gerrit.wikimedia.org/r/178676 (owner: 10JanZerebecki) [22:46:47] (03PS1) 10EBernhardson: Properly flow enable 4 pages on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178689 [22:47:00] (03PS1) 10BryanDavis: Ensure that apache's uid=48 [puppet] - 10https://gerrit.wikimedia.org/r/178690 [22:47:48] (03CR) 10BryanDavis: [C: 04-1] "Will renumber in beta manually first, then cherry-pick this and if all goes well ask for a merge." 
[puppet] - 10https://gerrit.wikimedia.org/r/178690 (owner: 10BryanDavis) [22:47:59] (03CR) 10EBernhardson: [C: 032] Properly flow enable 4 pages on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178689 (owner: 10EBernhardson) [22:48:08] (03Merged) 10jenkins-bot: Properly flow enable 4 pages on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178689 (owner: 10EBernhardson) [22:49:44] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable 4 pages on cawiki (duration: 00m 05s) [22:49:49] Logged the message, Master [22:49:59] (03PS1) 10RobH: setting mgmt entries for hostname [dns] - 10https://gerrit.wikimedia.org/r/178691 [22:50:11] (03CR) 10jenkins-bot: [V: 04-1] setting mgmt entries for hostname [dns] - 10https://gerrit.wikimedia.org/r/178691 (owner: 10RobH) [22:51:19] (03PS2) 10RobH: setting mgmt entries for hostname [dns] - 10https://gerrit.wikimedia.org/r/178691 [22:51:54] (03CR) 10RobH: [C: 032] setting mgmt entries for hostname [dns] - 10https://gerrit.wikimedia.org/r/178691 (owner: 10RobH) [22:53:54] !log ebernhardson Synchronized php-1.25wmf11/extensions/Flow: Bump flow in 1.25wmf11 for officewiki import fixes (duration: 00m 07s) [22:54:00] Logged the message, Master [22:54:15] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: puppet fail [22:54:40] (03CR) 10EBernhardson: [C: 032] Flow enable the category talk namespace on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178663 (owner: 10EBernhardson) [22:54:49] (03Merged) 10jenkins-bot: Flow enable the category talk namespace on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178663 (owner: 10EBernhardson) [22:55:30] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable category talk namespace on officewiki (duration: 00m 08s) [22:55:32] Logged the message, Master [22:57:12] !log ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: (no message) (duration: 00m 05s) [22:57:15] Logged the 
message, Master [22:59:13] AaronSchulz: seeing cache hits now [23:03:02] (03CR) 10EBernhardson: [C: 032] Flow enable the rest of the talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178664 (owner: 10EBernhardson) [23:03:16] (03Merged) 10jenkins-bot: Flow enable the rest of the talk namespaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178664 (owner: 10EBernhardson) [23:04:15] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the rest of officewiki talk namespaces (duration: 00m 09s) [23:04:20] Logged the message, Master [23:07:02] (03PS1) 10Spage: Comment with rationale for Flow on catalan pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178701 [23:12:58] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:15:23] (03PS1) 10Ori.livneh: Initial commit of statsv [puppet] - 10https://gerrit.wikimedia.org/r/178703 [23:15:54] (03PS2) 10Ori.livneh: Initial commit of statsv [puppet] - 10https://gerrit.wikimedia.org/r/178703 [23:16:14] (03CR) 10Ori.livneh: [C: 032 V: 032] Initial commit of statsv [puppet] - 10https://gerrit.wikimedia.org/r/178703 (owner: 10Ori.livneh) [23:17:11] (03PS3) 10Yuvipanda: wdq-mm: Initial module + labs role [puppet] - 10https://gerrit.wikimedia.org/r/178496 [23:18:22] (03CR) 10Yuvipanda: [C: 032] wdq-mm: Initial module + labs role [puppet] - 10https://gerrit.wikimedia.org/r/178496 (owner: 10Yuvipanda) [23:18:35] (03PS4) 10Yuvipanda: wdq-mm: Setup monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 [23:20:20] (03CR) 10Yuvipanda: [C: 032] wdq-mm: Setup monit based monitoring to restart service [puppet] - 10https://gerrit.wikimedia.org/r/178498 (owner: 10Yuvipanda) [23:20:33] (03PS1) 10Ori.livneh: statsv: correct module path [puppet] - 10https://gerrit.wikimedia.org/r/178704 [23:20:43] (03PS2) 10Ori.livneh: statsv: correct module path [puppet] - 
10https://gerrit.wikimedia.org/r/178704 [23:21:15] (03CR) 10Ori.livneh: [C: 032 V: 032] statsv: correct module path [puppet] - 10https://gerrit.wikimedia.org/r/178704 (owner: 10Ori.livneh) [23:22:25] RECOVERY - DPKG on mw1193 is OK: All packages OK [23:22:51] cajoel: is the copy through? [23:23:54] or should i ask cajoel_ ? [23:24:20] we're both here [23:24:29] go into the dir and du -h [23:24:35] and you can tell if it's growing [23:24:37] most likely it is [23:24:49] did you try that? [23:25:16] yep -- still transferring [23:25:29] 4.05MB/s [23:26:10] was interested in the rate more than the fact it is still running, thanks for the info [23:26:48] (03CR) 10BryanDavis: "It seems to be working as expected in beta. It should be watched in prod for a bit to make sure there isn't something weird about the chan" [puppet] - 10https://gerrit.wikimedia.org/r/177432 (owner: 10BryanDavis) [23:29:21] ori: have you thought about emitting statsd counters directly from (a) varnish? [23:30:39] (03PS1) 10Springle: depool db1039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178706 [23:32:10] (03CR) 10Springle: [C: 032] depool db1039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178706 (owner: 10Springle) [23:33:21] !log springle Synchronized wmf-config/db-eqiad.php: depool db1039 (duration: 00m 06s) [23:33:24] Logged the message, Master [23:33:35] ori: you could just set up a varnish on the webperf box and we could set it as the backend for ^(statsv|event.gif) requests [23:34:16] ori: then either have VCL that does all kinds of shit (push to statsd directly, for instance) or do VSM/varnishlog/varnishkafka processing [23:34:28] or you could even set up a proper application there :) [23:34:32] wsgi/python [23:34:45] I'm not sure I understand why you're using Kafka to transport query arguments [23:35:25] I'm probably missing some context [23:35:32] (03CR) 10EBernhardson: [C: 032] Disable LQT on office wiki [mediawiki-config] -
10https://gerrit.wikimedia.org/r/175563 (owner: 10EBernhardson) [23:35:48] (03Merged) 10jenkins-bot: Disable LQT on office wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175563 (owner: 10EBernhardson) [23:37:39] (03CR) 10EBernhardson: [C: 032] Comment with rationale for Flow on catalan pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178701 (owner: 10Spage) [23:37:49] (03Merged) 10jenkins-bot: Comment with rationale for Flow on catalan pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178701 (owner: 10Spage) [23:38:41] !log ebernhardson Synchronized wmf-config/: Disable LQT on officewiki (duration: 00m 05s) [23:38:45] Logged the message, Master [23:38:58] ACKNOWLEDGEMENT - RAID on db1010 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle RT 9026 [23:46:23] paravoid: well, you don't want the client to wait on the frontend varnish doing the backend request [23:46:37] you want to return a blank HTTP 204 and move on as fast as possible [23:46:37] (03PS1) 10EBernhardson: Enable the non NS_* talk namepaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178712 [23:51:07] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [23:51:11] if it's /just/ for statsd, you could just emit them from VCL too [23:52:03] (03CR) 10Spage: [C: 04-1] "I think NS_MODULE_TALK is fragile." 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178712 (owner: 10EBernhardson) [23:53:08] (03PS2) 10EBernhardson: Enable the non NS_* talk namepaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178712 [23:54:56] it just sounds like a lot of moving parts just for getting key/value pairs and writing them to a udp socket [23:55:02] but anyway [23:55:04] off to sleep [23:55:58] (03CR) 10EBernhardson: [C: 032] Enable the non NS_* talk namepaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178712 (owner: 10EBernhardson) [23:56:07] (03Merged) 10jenkins-bot: Enable the non NS_* talk namepaces on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/178712 (owner: 10EBernhardson) [23:56:40] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the non NS_* talk namespaces (duration: 00m 07s) [23:56:45] Logged the message, Master [23:58:01] Ready for SWAT. Ping me if there are any questions.
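paravoid's "proper application ... wsgi/python" suggestion and the closing remark about key/value pairs over a UDP socket fit in a few lines. A hedged sketch, assuming a simple name=value|type query-string format (an illustration, not the real statsv protocol) and a placeholder statsd address:

```python
import socket
from urllib.parse import parse_qsl

def make_app(statsd_addr=("127.0.0.1", 8125)):
    """Build a WSGI app that answers beacon requests with a blank 204 and
    relays each query key/value pair to statsd as one UDP datagram. The
    default address is a placeholder, not the production statsd host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def application(environ, start_response):
        for name, value in parse_qsl(environ.get("QUERY_STRING", "")):
            # e.g. ?frontend.load=250|ms  ->  b"frontend.load:250|ms"
            sock.sendto(f"{name}:{value}".encode(), statsd_addr)
        # Nothing for the client to wait on: blank 204, move on.
        start_response("204 No Content", [("Content-Length", "0")])
        return [b""]

    return application
```

UDP here is fire-and-forget, which is the point of ori's "return a blank HTTP 204 and move on" remark: the client never waits on the metrics backend.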