[00:00:41] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [00:02:01] (03CR) 10BryanDavis: mha: replace pmtpa with codfw? (and logging.pp) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn) [00:07:51] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [00:14:54] (03PS2) 10Dzahn: mha: replace pmtpa with codfw? (and logging.pp) [puppet] - 10https://gerrit.wikimedia.org/r/173464 [00:14:57] (03CR) 10Dzahn: mha: replace pmtpa with codfw? (and logging.pp) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn) [00:15:44] (03CR) 10Dzahn: "Bryan, thanks. should i split it into 2 changes because logging and mha aren't related?" [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn) [00:19:11] (03CR) 10BryanDavis: "> should i split it into 2 changes because logging and mha aren't related?" [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn) [00:22:42] jesus, the shinken package... [00:23:43] * YuviPanda gives up on trying to get 2.0 working [00:50:20] (03PS3) 10Dzahn: mha: replace pmtpa with codfw? [puppet] - 10https://gerrit.wikimedia.org/r/173464 [00:51:43] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 327 seconds [00:51:59] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 345 seconds [00:52:52] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds [00:53:12] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:53:22] (03PS1) 10Dzahn: logstash beta: remove pmtpa, do TODO [puppet] - 10https://gerrit.wikimedia.org/r/173469 [00:59:10] (03PS4) 10Dzahn: mha: replace pmtpa with codfw? 
[puppet] - 10https://gerrit.wikimedia.org/r/173464 [01:02:14] (03CR) 10BryanDavis: "We have never setup lvs and pybal in for beta in eqiad so it's probably fine to just kill it all for now. I think there has been talk abou" [puppet] - 10https://gerrit.wikimedia.org/r/173456 (owner: 10Dzahn) [01:09:14] (03PS1) 10Dzahn: LDAP: rm pmtpa, +codfw, gluster/NFS server undef [puppet] - 10https://gerrit.wikimedia.org/r/173470 [01:09:54] (03CR) 10jenkins-bot: [V: 04-1] LDAP: rm pmtpa, +codfw, gluster/NFS server undef [puppet] - 10https://gerrit.wikimedia.org/r/173470 (owner: 10Dzahn) [01:11:34] (03PS1) 10Aaron Schulz: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 [01:19:50] (03PS2) 10Dzahn: LDAP: rm pmtpa, +codfw, gluster/NFS server undef [puppet] - 10https://gerrit.wikimedia.org/r/173470 [01:20:31] (03CR) 10jenkins-bot: [V: 04-1] LDAP: rm pmtpa, +codfw, gluster/NFS server undef [puppet] - 10https://gerrit.wikimedia.org/r/173470 (owner: 10Dzahn) [01:21:36] (03CR) 10Chad: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [01:22:06] (03PS3) 10Dzahn: LDAP: rm pmtpa, +codfw, gluster/NFS server undef [puppet] - 10https://gerrit.wikimedia.org/r/173470 [01:25:42] (03CR) 10Dzahn: "Yuvipanda: before i move this into a module called facilities: do you think the PDU monitoring part should be included in one of the other" [puppet] - 10https://gerrit.wikimedia.org/r/171493 (owner: 10Dzahn) [01:27:01] (03PS2) 10Dzahn: (WIP) facilities: move to module [puppet] - 10https://gerrit.wikimedia.org/r/171493 [01:27:08] (03CR) 10jenkins-bot: [V: 04-1] (WIP) facilities: move to module [puppet] - 10https://gerrit.wikimedia.org/r/171493 (owner: 10Dzahn) [01:40:49] (03CR) 10Dzahn: [C: 04-2] "like this getting "Invalid resource type install_certificate" whereever install_certificate is used. 
but the intention was to move it. can" [puppet] - 10https://gerrit.wikimedia.org/r/171496 (owner: 10Dzahn) [01:42:06] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/494/change/171496/compiled/puppet_catalogs_3_171496/ how would you fix it?" [puppet] - 10https://gerrit.wikimedia.org/r/171496 (owner: 10Dzahn) [01:47:09] (03PS1) 10Dzahn: gerrit templates: fix jenkins/lint warnings [puppet] - 10https://gerrit.wikimedia.org/r/173475 [01:51:10] (03PS3) 10Dereckson: Gerrit also listens on port 22 [puppet] - 10https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) [01:52:21] (03CR) 10Dereckson: "PS3: rebased" [puppet] - 10https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: 10Dereckson) [01:52:39] (03PS1) 10Dzahn: realm.pp - remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173476 [01:54:18] (03CR) 10Dzahn: "changed have been merged that let us use ssh::server with a different $listen_address, but that doesn't mean yet we have done something th" [puppet] - 10https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: 10Dereckson) [01:57:27] (03CR) 10Dzahn: [C: 031] "$main_ipaddress is what $site is determined from" [puppet] - 10https://gerrit.wikimedia.org/r/173476 (owner: 10Dzahn) [01:58:38] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [02:01:26] (03PS2) 10Dzahn: realm.pp - remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173476 [02:02:52] (03PS3) 10Dzahn: realm.pp - remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173476 [02:10:32] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-15 02:10:32+00:00 [02:10:41] Logged the message, Master [02:15:41] (03PS4) 10Dzahn: add monitoring for search.wm (Apple dict bridge) [puppet] - 10https://gerrit.wikimedia.org/r/171193 [02:16:31] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-15 02:16:31+00:00 [02:16:35] 
Logged the message, Master [02:17:04] (03PS5) 10Dzahn: add monitoring for search.wm (Apple dict bridge) [puppet] - 10https://gerrit.wikimedia.org/r/171193 [02:18:31] (03CR) 10Dzahn: "matanya, Giuseppe: how about this way now to make sure we just get a single check. put it on terbium and in a role" [puppet] - 10https://gerrit.wikimedia.org/r/171193 (owner: 10Dzahn) [02:22:58] (03PS1) 10Dzahn: ganglia/gerrit: move install_cert out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/173477 [02:50:30] (03CR) 10Andrew Bogott: "If removing this we should kill the folsom templates in the same patch. And essex!" [puppet] - 10https://gerrit.wikimedia.org/r/173459 (owner: 10Dzahn) [02:51:17] (03CR) 10Andrew Bogott: "This is fine but I think it's rendered moot by https://gerrit.wikimedia.org/r/#/c/173294/ (presuming that gets merged)" [puppet] - 10https://gerrit.wikimedia.org/r/173460 (owner: 10Dzahn) [02:56:03] (03PS1) 10Dzahn: fix misc jenkins warnings in multiple modules [puppet] - 10https://gerrit.wikimedia.org/r/173478 [02:59:29] (03PS2) 10Dzahn: openstack: rm all folsom/essex files, templates [puppet] - 10https://gerrit.wikimedia.org/r/173459 [03:00:05] (03CR) 10Andrew Bogott: "I bet a lot of this code is moot... patch looks fine though." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173470 (owner: 10Dzahn) [03:00:38] (03CR) 10Andrew Bogott: [C: 031] openstack: rm all folsom/essex files, templates [puppet] - 10https://gerrit.wikimedia.org/r/173459 (owner: 10Dzahn) [03:02:55] thx, cya [03:28:47] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 15 03:28:47 UTC 2014 (duration 28m 46s) [03:28:51] Logged the message, Master [03:48:22] (03PS1) 10Dereckson: Set $wgUploadNavigationUrl on it.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173480 (https://bugzilla.wikimedia.org/73439) [03:49:05] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 8, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 111, initializing_shards: 4, number_of_data_nodes: 3 [03:49:06] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 8, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 111, initializing_shards: 4, number_of_data_nodes: 3 [03:49:26] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 8, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 111, initializing_shards: 4, number_of_data_nodes: 3 [03:53:04] (03CR) 10Dereckson: "pt. isn't going to clear the namespace, so it pages could be moved and renamed through a maintenance script." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://bugzilla.wikimedia.org/73164) (owner: 10Dereckson) [03:53:34] (03CR) 10Dereckson: "(its)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://bugzilla.wikimedia.org/73164) (owner: 10Dereckson) [04:36:34] My favourite Opsens are mentioned here: https://blog.wikimedia.org/2014/11/14/apertium-and-wikimedia-a-collaboration-that-powers-the-content-translation-tool/ :) [04:36:56] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [04:44:16] (03CR) 10Mattflaschen: "> Temporarily... for two months?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [04:50:10] kart_: :) [06:29:16] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:56] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:05] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:15] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures [06:46:06] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:46:17] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:46:45] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:47:16] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:48:26] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures [07:06:05] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [07:13:21] (03PS1) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. 
[puppet] - 10https://gerrit.wikimedia.org/r/173483 [07:14:22] (03CR) 1020after4: [C: 04-1] "still needs a couple of extra regex patterns but this is mostly" [puppet] - 10https://gerrit.wikimedia.org/r/173483 (owner: 1020after4) [07:17:21] (03PS2) 1020after4: set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/173483 [07:36:42] (03CR) 10Nemo bis: "As noted by Atlasowa on the talk page, the team asked/promised this to be enabled for two weeks. It stayed enabled for two months." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [07:46:05] (03PS1) 10Yuvipanda: base: Allow overriding core dump path via hiera [puppet] - 10https://gerrit.wikimedia.org/r/173484 [07:46:12] (03CR) 10Steinsplitter: [C: 031] Set $wgUploadNavigationUrl on it.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173480 (https://bugzilla.wikimedia.org/73439) (owner: 10Dereckson) [07:48:12] ori: _joe_ would appreciate +1 if you have the time on https://gerrit.wikimedia.org/r/#/c/173484/ already had to clean up core dumps once more [07:53:59] (03CR) 10Yuvipanda: "https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ATools&diff=134584&oldid=132589 is associated hiera change." 
[puppet] - 10https://gerrit.wikimedia.org/r/173484 (owner: 10Yuvipanda) [07:54:11] (03Abandoned) 10Glaisher: Redirect mul.wikisource.org to wikisource.org [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [08:33:41] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "See my inline comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173484 (owner: 10Yuvipanda) [08:34:16] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [08:41:24] Hmm locked out of the house again [08:53:46] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [09:03:36] (03PS1) 10Reedy: Remove -hhvm suffix from beta multiversion config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173486 [09:14:19] (03PS1) 10Reedy: Update CDB code from upstream (mediawiki core) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173487 (https://bugzilla.wikimedia.org/73454) [09:28:38] (03PS2) 10Reedy: Update CDB code from upstream (mediawiki core) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173487 (https://bugzilla.wikimedia.org/73454) [09:28:40] (03PS1) 10Reedy: Swap missing.php to use CDB reader wrappers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) [09:32:21] (03PS2) 10TTO: Swap missing.php to use CDB reader wrappers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: 10Reedy) [09:34:22] (03CR) 10TTO: Swap missing.php to use CDB reader wrappers (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: 10Reedy) [09:35:13] (03CR) 10Reedy: [C: 04-1] Swap missing.php to use CDB reader wrappers (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: 10Reedy) [09:35:59] (03PS3) 10Reedy: Swap missing.php to use CDB 
reader wrappers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) [09:40:17] (03CR) 10TTO: [C: 031] "LGTM, although I haven't tested it locally" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: 10Reedy) [10:02:02] (03PS1) 10Reedy: Fix php short tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173489 [10:37:44] (03PS4) 10Reedy: mediawiki: simplify apache config [puppet] - 10https://gerrit.wikimedia.org/r/170300 (owner: 10Giuseppe Lavagetto) [10:59:43] (03PS1) 10Reedy: Rebuild beta apache config ontop of production config [puppet] - 10https://gerrit.wikimedia.org/r/173492 [11:07:22] (03CR) 10Reedy: "[11:02:24] https://github.com/wikimedia/operations-puppet/blob/production/modules/mediawiki/files/apache/sites/remnant.conf#L293-L" [puppet] - 10https://gerrit.wikimedia.org/r/170300 (owner: 10Giuseppe Lavagetto) [11:17:15] (03PS1) 10Reedy: Make qualitywiki HTTPS only [puppet] - 10https://gerrit.wikimedia.org/r/173493 [11:17:36] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [11:31:00] !log reedy Synchronized php-1.25wmf7/extensions/cldr/: Fix warnings (duration: 00m 15s) [11:31:02] Logged the message, Master [11:31:26] !log reedy Synchronized php-1.25wmf7/extensions/CommonsMetadata/: Fix warnings (duration: 00m 18s) [11:31:28] Logged the message, Master [11:31:51] (03CR) 10Reedy: [C: 032] Fix php short tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173489 (owner: 10Reedy) [11:32:02] (03Merged) 10jenkins-bot: Fix php short tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173489 (owner: 10Reedy) [11:32:30] !log reedy Synchronized wmf-config/missing.php: Fix php short tags (duration: 00m 16s) [11:32:33] Logged the message, Master [11:33:45] (03PS3) 10Reedy: Update CDB code from upstream (mediawiki core) [mediawiki-config] - 
https://gerrit.wikimedia.org/r/173487 (https://bugzilla.wikimedia.org/73454)
[11:33:53] (CR) Reedy: [C: 2] Update CDB code from upstream (mediawiki core) [mediawiki-config] - https://gerrit.wikimedia.org/r/173487 (https://bugzilla.wikimedia.org/73454) (owner: Reedy)
[11:34:05] (Merged) jenkins-bot: Update CDB code from upstream (mediawiki core) [mediawiki-config] - https://gerrit.wikimedia.org/r/173487 (https://bugzilla.wikimedia.org/73454) (owner: Reedy)
[11:41:41] _joe_: I was gonna ask why has_ganglia is a bare hiera value and core_dump_pattern isn't, then realized has_ganglia will be used outside of any single role...
[11:42:42] <_joe_> YuviPanda|zzz: role::cache makes use of a lot of inheritance
[11:42:51] <_joe_> which makes hiera usage uncomfortable
[11:42:58] <_joe_> but yeah, it may be better for sure
[11:42:59] well, memcached also has a dependency on ganglia
[11:43:05] so I can just use the same variable there
[11:43:08] <_joe_> YuviPanda|zzz: yes
[11:43:21] <_joe_> but I'd tend not to do that if possible
[11:43:37] <_joe_> but ok, this is a simple global feature switch
[11:43:42] <_joe_> so that's cool
[11:45:20] yeah
[11:45:22] (PS2) Yuvipanda: base: Allow overriding core dump path via hiera [puppet] - https://gerrit.wikimedia.org/r/173484
[11:45:37] which for practical purposes at this point is also just a realm branch without being a realm branch :)
[11:45:43] _joe_: ^ +1?
[11:45:46] err
[11:45:49] I should ask for CR
[11:45:51] rather than +1s
[11:45:55] a bit presumptuous
[11:48:09] brb
[11:52:28] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail
[11:55:33] <_joe_> YuviPanda: the secret is to have feature branches, not real ones
[11:56:08] Heh
[11:56:17] <_joe_> because a feature branch can be reused 100 times
[11:56:21] <_joe_> *realm ones
[11:56:28] <_joe_> so if you have 5 realms
[11:56:34] <_joe_> you don't need 5 cases
[11:57:12] <_joe_> say we want the core dump pattern to have a special value on the appservers, another on the varnishes, another in labs, another again in beta
[11:57:31] <_joe_> before hiera, that would've meant a ton of branches
[11:57:39] (CR) Giuseppe Lavagetto: [C: 1] base: Allow overriding core dump path via hiera [puppet] - https://gerrit.wikimedia.org/r/173484 (owner: Yuvipanda)
[11:58:13] Yeah
[11:58:14] <_joe_> the idea is that you should branch the code only when it's strictly necessary, and not just to set data
[11:58:15] Definitely
[11:58:42] <_joe_> so, you branch on "do I use ganglia?" and not on "am I in labs?"
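A rough illustration of the point _joe_ is making here, written as a toy Python model and not as real Puppet or Hiera code. The keys `has_ganglia` and `core_dump_pattern` come from the discussion; the hierarchy levels, every value, and the `memcached::ganglia` class name are made up for the example. The idea it sketches: data lives in a layered lookup, and the code branches only on the feature switch, never on the realm.

```python
# Toy model of a Hiera-style layered lookup. This is NOT the real Hiera API;
# the hierarchy, the values and the class names are hypothetical.
HIERARCHY = ["role/{role}", "realm/{realm}", "common"]

DATA = {
    "common": {
        "has_ganglia": True,
        "core_dump_pattern": "/var/tmp/core/core.%h.%e.%p.%t",
    },
    # Realm-specific *data*, not realm-specific *code*.
    "realm/labs": {"has_ganglia": False},
    "role/appserver": {"core_dump_pattern": "/srv/cores/core.%e.%p"},
}


def lookup(key, facts):
    """Return the first value found, most specific level first."""
    for level in HIERARCHY:
        source = DATA.get(level.format(**facts), {})
        if key in source:
            return source[key]
    raise KeyError(key)


def memcached_classes(facts):
    """Branch on the feature ("do I use ganglia?"), not on the realm."""
    classes = ["memcached"]
    if lookup("has_ganglia", facts):
        classes.append("memcached::ganglia")  # hypothetical class name
    return classes


if __name__ == "__main__":
    print(memcached_classes({"role": "memcached", "realm": "labs"}))
    print(memcached_classes({"role": "memcached", "realm": "production"}))
```

Adding a sixth realm in this model means adding one data entry; `memcached_classes` does not change, which is the "you don't need 5 cases" argument.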
[11:58:58] <_joe_> the info that ganglia is installed or not resides outside your code
[11:59:08] I'm slowly getting used to Hiera
[11:59:08] <_joe_> and that is a sane pattern
[11:59:19] <_joe_> it takes some (small) effort
[11:59:36] <_joe_> and it's definitely counterintuitive if you're used to our puppet code
[12:00:02] <_joe_> which is by many respects a series of workarounds to the absence of a config mechanism for classes
[12:00:13] Yeah
[12:00:17] <_joe_> so we used extensively glocal variables
[12:00:44] <_joe_> (global variables that can be overridden at the node scope and are then reused inside an unrelated class scope)
[12:01:57] I
[12:02:01] So
[12:02:17] <_joe_> bbl
[12:02:27] I think we should have a Hiera:labs
[12:02:29] Thing
[12:03:00] So we can kill a lot more realm branching
[12:04:08] <_joe_> YuviPanda: we have that, but not configurable via mediawiki I guess
[12:04:16] <_joe_> I'll take a look later
[12:04:19] Ok
[12:04:24] <_joe_> or well, monday :)
[12:04:50] Might be OK to do it in puppet itself
[12:05:08] Have a fun weekend
[12:13:09] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[12:20:44] !log reedy Synchronized multiversion/: CDB updates (duration: 00m 14s)
[12:20:49] Logged the message, Master
[12:21:58] (PS4) Reedy: Swap missing.php to use CDB reader wrappers [mediawiki-config] - https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454)
[12:23:11] (CR) Reedy: [C: 2] Swap missing.php to use CDB reader wrappers [mediawiki-config] - https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: Reedy)
[12:23:18] (Merged) jenkins-bot: Swap missing.php to use CDB reader wrappers [mediawiki-config] - https://gerrit.wikimedia.org/r/173488 (https://bugzilla.wikimedia.org/73454) (owner: Reedy)
[12:23:53] !log reedy Synchronized wmf-config/missing.php: hhvm support (duration: 00m 14s)
[12:23:56] Logged the 
message, Master [12:34:48] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [13:08:57] (03PS1) 10Reedy: wgMemoryLimit to 330MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173507 [13:23:51] (03PS10) 10Krinkle: [WIP] contint: Apply contint::qunit_localhost to labs slaves [puppet] - 10https://gerrit.wikimedia.org/r/168631 (https://bugzilla.wikimedia.org/72063) [13:27:12] (03PS1) 10Reedy: Fix Undefined index: fulltext [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173508 [13:35:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0] [13:49:35] Krinkle: sudo ldaplist -l passwd jenkins-deploy [13:49:53] thx [13:51:28] Reedy: Ah, I still have to restructure the require in tmpfs.pp though because in prod file { user=>foo will auto-create the user via require User[foo]. [13:51:39] so I'll have to move that require to the caller of tmpfs [13:51:44] :) [13:53:37] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [13:55:14] (03CR) 10Yuvipanda: [C: 032] base: Allow overriding core dump path via hiera [puppet] - 10https://gerrit.wikimedia.org/r/173484 (owner: 10Yuvipanda) [14:05:02] (03PS1) 10Yuvipanda: memcached: Make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/173510 [14:07:43] (03PS11) 10Krinkle: [WIP] contint: Apply contint::qunit_localhost to labs slaves [puppet] - 10https://gerrit.wikimedia.org/r/168631 (https://bugzilla.wikimedia.org/72063) [14:07:45] (03PS1) 10Krinkle: contint: Move tmpfs Require to caller to support labs' jenkins-deploy [puppet] - 10https://gerrit.wikimedia.org/r/173511 [14:07:47] (03PS1) 10Krinkle: [WIP] contint: Add tmpfs mount in jenkins-deploy homedir for labs slaves [puppet] - 10https://gerrit.wikimedia.org/r/173512 (https://bugzilla.wikimedia.org/72063) [14:12:29] ok shinken is dead and I don't 
know why
[14:12:31] hmm
[14:13:27] Hm.. var/lib/puppet/reports is filling up on integration-puppetmaster
[14:13:35] 1.6 GB of yaml files one for each date?
[14:13:49] one for every run (every 30 min) 500K a run
[14:13:51] no purging?
[14:14:31] https://github.com/wikimedia/operations-puppet/blob/fcf512316029d451fc604cf1fac6e88561752309/modules/puppetmaster/manifests/scripts.pp#L35
[14:14:41] Ah, that should be running on self-hosted puppet masters, too I guess
[14:14:43] YuviPanda: ideas?
[14:15:12] icinga-wm: PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%)
[14:16:33] Hm.. it seems to only contain logs for 2014-11
[14:16:40] I guess it's just that big?
[14:16:42] -_-
[14:26:23] Krinkle: sorry, have to go :(
[14:26:37] !log made reedy 'full' user on webmaster tools
[14:26:44] Logged the message, Master
[14:27:27] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[14:37:13] https://en.wikipedia.org/wiki/User:Tifego/monobook.css?action=raw
[14:37:14] 403?
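A back-of-the-envelope check on the puppet reports question raised at 14:13 ("I guess it's just that big?"), as a small Python sketch. The 30 minute interval, 500K per report and 1.6 GB total are the figures quoted in the channel; treating them as decimal bytes and assuming roughly 15 days of retained reports (the directory held only 2014-11 files on Nov 15) are assumptions for the sake of the arithmetic.

```python
# Figures quoted in the channel, taken as decimal bytes (an assumption).
RUN_INTERVAL_MIN = 30          # one report per puppet run
REPORT_BYTES = 500_000         # "500K a run"
DIR_BYTES = 1_600_000_000      # "1.6 GB of yaml files"
DAYS_RETAINED = 15             # assumption: only 2014-11 reports, seen on Nov 15


def bytes_per_host_per_day(interval_min, report_bytes):
    """Report volume one puppet client produces per day."""
    runs_per_day = (24 * 60) // interval_min
    return runs_per_day * report_bytes


def implied_hosts(dir_bytes, per_host_day, days):
    """How many clients would explain the directory size over `days` days."""
    return dir_bytes / (per_host_day * days)


if __name__ == "__main__":
    per_day = bytes_per_host_per_day(RUN_INTERVAL_MIN, REPORT_BYTES)
    print(per_day)
    print(implied_hosts(DIR_BYTES, per_day, DAYS_RETAINED))
```

Under these assumptions each client writes about 24 MB of reports a day, so a handful of clients reporting for half a month is enough to reach 1.6 GB without anything being wrong beyond the missing purge.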
[15:50:57] (03PS1) 10Aklapper: Reapply Bugzilla XML-RPC API workaround for a short while again [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/173516 [15:51:15] (03PS4) 10Reedy: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [software] - 10https://gerrit.wikimedia.org/r/125883 [15:51:27] (03PS2) 10Aklapper: Reapply Bugzilla XML-RPC API workaround for a short while again [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/173516 [15:51:34] (03PS5) 10Reedy: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [software] - 10https://gerrit.wikimedia.org/r/125883 [15:53:51] (03PS6) 10Reedy: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [software] - 10https://gerrit.wikimedia.org/r/125883 [16:21:23] (03PS1) 10Hoo man: Enable global AbuseFilter on wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173519 [16:31:28] PROBLEM - Host berkelium is DOWN: PING CRITICAL - Packet loss = 100% [16:31:51] ^^ me [16:31:56] (03PS1) 10Reedy: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173520 [16:31:57] curium too [16:32:38] RECOVERY - Host berkelium is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [16:33:06] PROBLEM - Host curium is DOWN: PING CRITICAL - Packet loss = 100% [16:34:37] RECOVERY - Host curium is UP: PING OK - Packet loss = 0%, RTA = 2.41 ms [16:34:59] (03PS2) 10Reedy: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173520 [16:37:19] (03CR) 10Reedy: [C: 032] Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173520 (owner: 10Reedy) [16:37:26] (03Merged) 10jenkins-bot: Upgrade to jquery 2.1.1 and jquery-ui 1.11.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173520 (owner: 10Reedy) [16:38:50] !log reedy Synchronized docroot and w: dbtree (duration: 00m 14s) [16:38:54] Logged the message, Master [16:47:31] (03PS1) 10Reedy: Update jquery.bt.min.js to 0.9.7 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/173522 [16:47:33] (03PS1) 10Reedy: Minify jquery.jOrgChart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173523 [16:47:52] (03CR) 10Reedy: [C: 032] Update jquery.bt.min.js to 0.9.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173522 (owner: 10Reedy) [16:47:59] (03CR) 10Reedy: [C: 032] Minify jquery.jOrgChart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173523 (owner: 10Reedy) [16:48:01] (03Merged) 10jenkins-bot: Update jquery.bt.min.js to 0.9.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173522 (owner: 10Reedy) [16:48:07] (03Merged) 10jenkins-bot: Minify jquery.jOrgChart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173523 (owner: 10Reedy) [16:48:27] !log reedy Synchronized docroot and w: dbtree (duration: 00m 14s) [16:49:22] !log reedy Synchronized docroot and w: fix typo (duration: 00m 15s) [16:49:24] Logged the message, Master [16:50:21] (03PS1) 10Reedy: Add missing . [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173524 [16:50:32] (03CR) 10Reedy: [C: 032] Add missing . [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173524 (owner: 10Reedy) [16:50:40] (03Merged) 10jenkins-bot: Add missing . 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/173524 (owner: 10Reedy) [17:27:18] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [17:45:38] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [18:00:17] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667 [18:20:26] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [18:25:17] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [20:05:46] PROBLEM - MySQL Replication Heartbeat on db1015 is CRITICAL: CRIT replication delay 305 seconds [20:07:55] RECOVERY - MySQL Replication Heartbeat on db1015 is OK: OK replication delay 43 seconds [21:09:06] (03PS1) 10Reedy: Update size related dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173544 [21:10:08] (03CR) 10Reedy: [C: 032] Update size related dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173544 (owner: 10Reedy) [21:10:19] (03Merged) 10jenkins-bot: Update size related dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173544 (owner: 10Reedy) [21:11:05] jgage: logstash hosts are very happy with their new giant disks [21:11:41] You should turn the hadoop firehose back on when you get around to it. [21:11:44] !log reedy Synchronized database lists: update size dblists (duration: 00m 17s) [21:11:50] Logged the message, Master [21:12:12] Reedy: How's the hackathon? [21:12:14] bd808: ahhh [21:12:20] There was someone else wanting api logging [21:12:31] I know you said to brad to wait till we got bigger disks [21:12:40] I can't remember who/why :( [21:13:34] Reedy: With the new disk we are at 12% used and back to a full copy of everything on all hosts. \o/ [21:13:54] sweeeeet :D [21:14:45] log all the things! 
[21:15:27] I'm working on config to log with monolog and redis as the transport. I should have it on beta early next week. (2 weeks late from my plan)
[21:15:45] /dev/md0 6.0T 650G 5.1T 12% /var/lib/elasticsearch
[21:15:51] haha, damn
[21:17:04] cmjohnson hooked us up with 2 3T drives in each host.
[21:20:02] certainly looks a lot more healthy
[21:29:49] bd808: That's something that annoys me with kibana
[21:29:55] Why can't I click on "api-feature-usage (6363)"
[21:30:04] Why do I have to click on the bar chart for it? :/
[21:33:33] Reedy: {{sofixit}} https://github.com/elasticsearch/kibana/tree/3.0
[21:33:39] hahaa
[21:34:07] * Reedy opens an issue to begin with
[21:35:33] They have pretty much abandoned the 3.0 branch. Master is diverging pretty rapidly.
[21:36:04] But it wants to run a ruby http server that I'm not excited about
[21:36:33] "This branch is 1124 commits ahead, 4034 commits behind master"
[21:36:36] :/
[21:36:42] https://github.com/elasticsearch/kibana/issues/1945 at least
[21:37:09] Is 5.0 going to be in Go or something?
[21:38:33] heh. Maybe. They have already done php -> ruby -> angular -> angular + ruby :)
[21:40:54] Reedy: We might have a tiny bit of pull if we asked. The PR folks have asked me before if I'd like to help with a blog post.
[21:55:47] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: puppet fail [22:15:17] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:50:10] (03PS1) 10Ori.livneh: allow multiple sudo::user grants for same user [puppet] - 10https://gerrit.wikimedia.org/r/173629 [23:25:03] (03CR) 10Ori.livneh: "compiler run: http://puppet-compiler.wmflabs.org/497/change/173629/html/" [puppet] - 10https://gerrit.wikimedia.org/r/173629 (owner: 10Ori.livneh) [23:37:34] (03PS1) 10Ori.livneh: keyholder: add icinga check [puppet] - 10https://gerrit.wikimedia.org/r/173633 [23:59:59] (03PS2) 10Yuvipanda: memcached: Make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/173510