[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, ebernhardson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141111T0000).
[00:00:22] (CR) Faidon Liambotis: [C: 2] geoip: switch data::maxmind to geoipupdate [puppet] - https://gerrit.wikimedia.org/r/172444 (owner: Faidon Liambotis)
[00:00:29] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[00:00:40] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:02:00] ebernhardson: You around for your SWAT?
[00:02:04] Also please tell me that https://gerrit.wikimedia.org/r/#/c/172439/ is a typo
[00:02:13] Should it be "logged in users should NOT save null values"
[00:02:14] whew, icinga ntp check takes a while to become OK after reboot
[00:02:14] ?
[00:02:16] RoanKattouw: yup
[00:02:27] (PS1) Dzahn: move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448
[00:02:27] also, that cisco bios took 5 minutes to POST :P
[00:02:28] sigh, yes typo. it should store either a user id *or* a user ip address
[00:02:31] but never both
[00:02:36] (CR) jenkins-bot: [V: -1] move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448 (owner: Dzahn)
[00:02:39] lol, yea, cisco boot is slow
[00:02:40] ebernhardson: Oh, SHOULD save null values to IP not ID
[00:02:57] longest moments ever: waiting for a box to start pinging again after reboot
[00:04:24] (PS1) Jalexander: Add securepoll specific dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245)
[00:04:49] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:49] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:51] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:05:22] (CR) Dzahn: [C: -2] "ehm, yea, iridium doesn't have a public IP, unlike zirconium. so what's the plan to switch this over then? misc-web varnish config change " [dns] - https://gerrit.wikimedia.org/r/172448 (owner: Dzahn)
[00:05:45] (PS1) Faidon Liambotis: geoip: remove product 121 (GeoIPISP) from config [puppet] - https://gerrit.wikimedia.org/r/172450
[00:05:59] (CR) Faidon Liambotis: [C: 2] geoip: remove product 121 (GeoIPISP) from config [puppet] - https://gerrit.wikimedia.org/r/172450 (owner: Faidon Liambotis)
[00:07:10] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:20] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:29] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:32] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:09:24] (PS2) Jalexander: Add SecurePoll specific dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245)
[00:11:14] (PS3) Jalexander: Add SecurePoll specific dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245)
[00:11:16] <^d> paravoid: Earlier we talked about Elastic monitoring and worrying about split-brain if we moved it to icinga. On that note: I've been thinking it'd be good if we had some kind of (at least rudimentary) split-brain detection running somewhere as it is.
[00:12:17] (PS1) Dzahn: add IPv6 record for iodine (OTRS) [dns] - https://gerrit.wikimedia.org/r/172452
[00:13:03] (PS2) Dzahn: add IPv6 record for iodine (OTRS) [dns] - https://gerrit.wikimedia.org/r/172452
[00:13:55] (CR) Ori.livneh: [C: 2] hhvm::debug: add apache2-utils [puppet] - https://gerrit.wikimedia.org/r/172433 (owner: Ori.livneh)
[00:14:31] (PS3) Dzahn: add IPv6 record for iodine (OTRS) [dns] - https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262)
[00:14:36] is there a swat deploy tomorrow or skipping the holiday and going to Wednesday?
[00:15:30] jamesofur: There is no SWAT
[00:15:38] jamesofur: See https://wikitech.wikimedia.org/wiki/Deployments#Week_of_November_10th
[00:15:43] https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0November.C2.A011
[00:16:10] Holy crap I just realized a neat feature that I never discovered before
[00:16:14] That calendar auto-converts to local time
[00:16:26] I just noticed this for the first time because I'm visiting New York and so my timezone != PST
[00:16:47] RoanKattouw: https://wikitech.wikimedia.org/wiki/MediaWiki:Common.js :D
[00:18:12] <^d> I still want it to auto-collapse or grey out or something after a window has passed.
[00:18:44] i want to type !swat 141232 blah blah blah
[00:18:52] in -ops and just have it all automagically work :P
[00:19:40] what could go wrong ;)
[00:20:31] <^d> ebernhardson: Too much work!
[00:20:37] jgage: the deployer is supposed to check with the requestor and make sure they are available, so in theory not much :)
[00:20:45] (PS1) Faidon Liambotis: geoip: add support for GeoLite2 [puppet] - https://gerrit.wikimedia.org/r/172455
[00:20:49] <^d> The bot should just recognize "!deploy" from a particular person
[00:20:58] <^d> And deploy a patch that's already queued up.
[00:21:07] that's more scary :P
[00:21:43] !log catrope Synchronized php-1.25wmf7/extensions/VisualEditor: SWAT (duration: 00m 04s)
[00:21:43] (PS1) Faidon Liambotis: Kill old, unused geoip script from files/misc/ [puppet] - https://gerrit.wikimedia.org/r/172456
[00:21:49] Logged the message, Master
[00:21:57] !log catrope Synchronized php-1.25wmf7/extensions/Flow: SWAT (duration: 00m 05s)
[00:21:59] Logged the message, Master
[00:22:17] (CR) Faidon Liambotis: [C: 2] geoip: add support for GeoLite2 [puppet] - https://gerrit.wikimedia.org/r/172455 (owner: Faidon Liambotis)
[00:22:17] ebernhardson: Please check ---^^
[00:22:20] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[00:22:30] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[00:22:31] (CR) Faidon Liambotis: [C: 2] Kill old, unused geoip script from files/misc/ [puppet] - https://gerrit.wikimedia.org/r/172456 (owner: Faidon Liambotis)
[00:22:42] RoanKattouw: checking, thanks
[00:23:20] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:23:31] what a mess
[00:24:08] RoanKattouw: looks to be working well thanks
[00:24:49] (PS4) Legoktm: add IPv6 record for iodine (OTRS) [dns] - https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262) (owner: Dzahn)
[00:24:50] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[00:24:58] paravoid: ?
[00:25:00] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:25:10] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:25:10] our geoip module
[00:25:12] and our puppetmaster module
[00:25:22] and our puppet module :)
[00:25:50] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[00:45:59] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 31: number_of_data_nodes: 31: active_primary_shards: 2117: active_shards: 6367: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0
[00:45:59] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 31: number_of_data_nodes: 31: active_primary_shards: 2117: active_shards: 6367: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0
[00:49:45] (PS8) Yuvipanda: [WIP] shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[00:49:47] (PS1) Yuvipanda: beta: Remove redundant shinken declaration [puppet] - https://gerrit.wikimedia.org/r/172462
[00:52:39] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 361 seconds
[00:52:45] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 361 seconds
[00:52:57] (PS13) Andrew Bogott: Add class and role for Openstack Horizon [puppet] - https://gerrit.wikimedia.org/r/170340
[00:53:49] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds
[00:54:02] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds
[01:00:01] (PS1) Jalexander: Adjustments to securepoll and usergroups for voteWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/172464 (https://bugzilla.wikimedia.org/72589)
[01:04:17] (PS1) Ori.livneh: wmflib: update require_package() [puppet] - https://gerrit.wikimedia.org/r/172466
[01:24:56] (PS1) Ori.livneh: hhvm::debug: add new package source path to gdbinit [puppet] - https://gerrit.wikimedia.org/r/172468
[01:25:20] (CR) Ori.livneh: [C: 2 V: 2] hhvm::debug: add new package source path to gdbinit [puppet] - https://gerrit.wikimedia.org/r/172468 (owner: Ori.livneh)
[01:26:16] (CR) Ori.livneh: [C: 2] wmflib: update require_package() [puppet] - https://gerrit.wikimedia.org/r/172466 (owner: Ori.livneh)
[01:30:23] (PS2) Dzahn: move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448
[01:30:26] (PS4) Ori.livneh: wmflib: make require_package() accept arrays [puppet] - https://gerrit.wikimedia.org/r/172305
[01:30:33] (CR) jenkins-bot: [V: -1] move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448 (owner: Dzahn)
[01:30:46] (CR) Ori.livneh: [C: 2 V: 2] wmflib: make require_package() accept arrays [puppet] - https://gerrit.wikimedia.org/r/172305 (owner: Ori.livneh)
[01:30:52] (PS3) Dzahn: move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448
[01:31:00] (CR) jenkins-bot: [V: -1] move Bugzilla DNS over to phab box for migration [dns] - https://gerrit.wikimedia.org/r/172448 (owner: Dzahn)
[01:31:33] PROBLEM - graphite.wikimedia.org on labmon1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:32:23] RECOVERY - graphite.wikimedia.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.008 second response time
[01:33:14] (CR) Dzahn: "does this need more code to handle all of these 3?" [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[01:36:39] (PS1) Dzahn: switch bugzilla names over to misc-web [dns] - https://gerrit.wikimedia.org/r/172469
[01:37:52] PROBLEM - graphite.wikimedia.org on labmon1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:38:22] (PS2) Dzahn: switch bugzilla names over to misc-web [dns] - https://gerrit.wikimedia.org/r/172469
[01:38:35] RECOVERY - graphite.wikimedia.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.432 second response time
[01:38:48] (CR) Dzahn: [C: -2] "needs varnish config change as well and coordination when to make the switch" [dns] - https://gerrit.wikimedia.org/r/172469 (owner: Dzahn)
[01:43:48] (PS1) Dzahn: misc-web varnish: bugzilla to phab box [puppet] - https://gerrit.wikimedia.org/r/172471
[01:44:37] (CR) Dzahn: "so this instead ? https://gerrit.wikimedia.org/r/#/c/172471/" [dns] - https://gerrit.wikimedia.org/r/172448 (owner: Dzahn)
[01:48:40] (CR) Dzahn: "needs a change to use the actual production URLs instead at some point." [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[01:50:52] (PS3) Andrew Bogott: No longer ensure => absent package python-memcache [puppet] - https://gerrit.wikimedia.org/r/172413
[01:50:54] (PS14) Andrew Bogott: Add class and role for Openstack Horizon [puppet] - https://gerrit.wikimedia.org/r/170340
[01:50:56] (PS1) Andrew Bogott: Move ganglia memcache.py to gmond_memcached.py [puppet] - https://gerrit.wikimedia.org/r/172474
[01:51:34] (CR) Andrew Bogott: "I haven't tested this yet, but it's an attempt to replicate https://gerrit.wikimedia.org/r/#/c/68035/" [puppet] - https://gerrit.wikimedia.org/r/172474 (owner: Andrew Bogott)
[01:52:03] PROBLEM - Disk space on logstash1003 is CRITICAL: DISK CRITICAL - free space: / 16525 MB (3% inode=99%):
[01:57:59] (CR) Anomie: "Actually getting it configured in SecurePoll is a little more work, but not much." [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: Jalexander)
[01:59:58] (CR) Jalexander: "Perfect, I was going to ask you about the changes required for that, I should be able to push them this evening or tomorrow morning so tha" [mediawiki-config] - https://gerrit.wikimedia.org/r/172449 (https://bugzilla.wikimedia.org/73245) (owner: Jalexander)
[02:04:42] PROBLEM - graphite.wikimedia.org on labmon1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:05:43] RECOVERY - graphite.wikimedia.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 6.469 second response time
[02:13:17] (CR) Dzahn: [C: 2] beta: Remove redundant shinken declaration [puppet] - https://gerrit.wikimedia.org/r/172462 (owner: Yuvipanda)
[02:18:33] RECOVERY - Disk space on logstash1003 is OK: DISK OK
[02:21:33] (CR) Springle: [C: 2] "Faidon proposed uniqueid (which is really hostid), but docs and some very brief tests suggest hostid is fragile on different platforms, pos" [puppet/mariadb] - https://gerrit.wikimedia.org/r/172216 (owner: Springle)
[02:28:21] !log LocalisationUpdate completed (1.25wmf6) at 2014-11-11 02:28:21+00:00
[02:28:28] Logged the message, Master
[02:34:32] (PS1) Springle: Update MariaDB submodule. [puppet] - https://gerrit.wikimedia.org/r/172475
[02:36:00] (CR) Springle: [C: 2] Update MariaDB submodule. [puppet] - https://gerrit.wikimedia.org/r/172475 (owner: Springle)
[02:40:58] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-11 02:40:58+00:00
[02:41:02] Logged the message, Master
[02:41:26] (PS1) Dzahn: gerrit: configure sshd to not listen on gerrit IP [puppet] - https://gerrit.wikimedia.org/r/172476
[02:44:22] (PS2) Dzahn: gerrit: configure sshd to not listen on gerrit IP [puppet] - https://gerrit.wikimedia.org/r/172476
[02:44:41] (CR) Dzahn: "--> https://gerrit.wikimedia.org/r/#/c/172476/ ?" [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: Dereckson)
[02:45:46] (CR) Dzahn: [C: -1] "see FIXME for the labs role. needs the right instance name" [puppet] - https://gerrit.wikimedia.org/r/172476 (owner: Dzahn)
[02:58:15] (CR) Dzahn: [C: 1] Add cron job that generates flow statistics [puppet] - https://gerrit.wikimedia.org/r/171465 (owner: Milimetric)
[03:02:08] (CR) Dzahn: [C: 1] add AAAA for uranium [dns] - https://gerrit.wikimedia.org/r/172442 (owner: John F. Lewis)
[03:02:43] (PS3) Dzahn: add AAAA for uranium [dns] - https://gerrit.wikimedia.org/r/172442 (owner: John F. Lewis)
[03:14:06] (CR) 20after4: "@dzahn yes it needs slight tweaks for production rollout, I will submit another patch for that" [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[03:58:15] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11
[04:29:32] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 11 04:29:32 UTC 2014 (duration 29m 31s)
[04:29:38] Logged the message, Master
[04:41:46] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 31: number_of_data_nodes: 31: active_primary_shards: 2117: active_shards: 6367: relocating_shards: 16: initializing_shards: 0: unassigned_shards: 0
[04:57:50] (PS9) Yuvipanda: [WIP] shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[05:14:40] (PS10) Yuvipanda: [WIP] shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[05:42:35] PROBLEM - graphite.wikimedia.org on labmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.009 second response time
[05:43:45] RECOVERY - graphite.wikimedia.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.016 second response time
[06:06:54] PROBLEM - graphite.wikimedia.org on labmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.817 second response time
[06:07:54] RECOVERY - graphite.wikimedia.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.008 second response time
[06:15:51] (PS1) Springle: m2-master switch to dbproxy1002 [dns] - https://gerrit.wikimedia.org/r/172498
[06:16:09] (CR) Springle: [C: -2] m2-master switch to dbproxy1002 [dns] - https://gerrit.wikimedia.org/r/172498 (owner: Springle)
[06:26:04] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4215 MB (3% inode=94%):
[06:28:25] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail
[06:29:04] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:26] RECOVERY - Disk space on vanadium is OK: DISK OK
[06:29:36] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:55] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:41:35] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:41:57] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:36] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:46:07] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:48:16] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:51:56] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:59:57] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[07:09:46] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[07:21:44] (PS1) Giuseppe Lavagetto: monitoring: remove monitoring::group from nagios.pp [puppet] - https://gerrit.wikimedia.org/r/172500
[07:22:03] (CR) Giuseppe Lavagetto: [C: 2 V: 2] monitoring: remove monitoring::group from nagios.pp [puppet] - https://gerrit.wikimedia.org/r/172500 (owner: Giuseppe Lavagetto)
[08:51:39] (CR) Adrian Lang: "Ok, thanks for looking into this. For the sake of me learning something, can you tell me why this won't work? Also, can I help with that o" [puppet] - https://gerrit.wikimedia.org/r/171535 (https://bugzilla.wikimedia.org/72184) (owner: Adrian Lang)
[09:30:26] (CR) Adrian Lang: "I guess https://bugzilla.wikimedia.org/show_bug.cgi?id=72184#c3 explains why this won't work." [puppet] - https://gerrit.wikimedia.org/r/171535 (https://bugzilla.wikimedia.org/72184) (owner: Adrian Lang)
[10:01:00] (PS1) Gilles: Add wgDebugLog group for FSFileBackend [mediawiki-config] - https://gerrit.wikimedia.org/r/172514 (https://bugzilla.wikimedia.org/73229)
[10:01:55] (CR) Alexandros Kosiaris: [C: -1] "This approach will not work. The file {'/etc/ssh/sshd_config': resource is already declared in ssh::server (modules/ssh/manifests/server." [puppet] - https://gerrit.wikimedia.org/r/172476 (owner: Dzahn)
[10:03:27] (CR) Alexandros Kosiaris: [C: -1] Gerrit also listens on port 22 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: Dereckson)
[10:04:50] (Abandoned) Adrian Lang: Add qunit localhost setup to role::ci::slave::labs [puppet] - https://gerrit.wikimedia.org/r/171535 (https://bugzilla.wikimedia.org/72184) (owner: Adrian Lang)
[10:12:48] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail
[10:21:09] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:32:29] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[10:35:09] PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:35:42] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[10:37:01] PROBLEM - OCG health on ocg1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:40:12] PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:45:12] PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:50:16] PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:50:30] <_joe_> mmmh why is it repeating that so often
[10:50:56] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: puppet fail
[10:55:21] RECOVERY - check_puppetrun on db1025 is OK: OK: Puppet is currently enabled, last run 83 seconds ago with 0 failures
[10:57:50] (PS2) Steinsplitter: Adding "*.nasa.gov" to wgCopyUploadsDomains. [mediawiki-config] - https://gerrit.wikimedia.org/r/172204
[11:03:53] (PS1) Alexandros Kosiaris: Fix planet sync cron command [puppet] - https://gerrit.wikimedia.org/r/172521
[11:04:42] (CR) Alexandros Kosiaris: [C: 2] Move ganglia memcache.py to gmond_memcached.py [puppet] - https://gerrit.wikimedia.org/r/172474 (owner: Andrew Bogott)
[11:05:01] (CR) Alexandros Kosiaris: [C: 2] Fix planet sync cron command [puppet] - https://gerrit.wikimedia.org/r/172521 (owner: Alexandros Kosiaris)
[11:09:47] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[11:11:48] _joe_: i need urgent help please
[11:11:57] <_joe_> matanya: tell me all
[11:12:05] there is an edit-a-thon in Israel
[11:12:17] 70 users are trying to edit
[11:12:23] and the ip is blocked
[11:12:29] due to rate limiting
[11:12:43] what can be done?
[11:13:02] <_joe_> I think there may be some setting, lemme ask someone else :)
[11:13:12] <_joe_> I never got to manage these situations
[11:13:32] they are all pre-registered users
[11:15:28] <_joe_> yeah, I simply don't know how to whitelist an IP, taking a look
[11:17:42] <_joe_> matanya: open an RT ticket in the meanwhile, even if I find out in a few minutes, I'd still need that
[11:17:53] sure
[11:19:07] <_joe_> (as a rule of thumb: open an RT ticket and ping the on duty ops if in the suitable TZ)
[11:19:32] _joe_: i found this: https://gerrit.wikimedia.org/r/#/c/136750/1/wmf-config/throttle.php
[11:19:41] which i did in the past
[11:19:54] but not sure it is still applicable
[11:20:20] <_joe_> I was looking at that specifically
[11:20:36] <_joe_> just give me an RT ticket number in query
[11:21:21] <_joe_> I need the ip, duration, and wikis involved
[11:21:42] sec thanks
[11:27:13] apergos: please see above
[11:32:27] <_joe_> (for the record, that change was related to creating users, not editing restrictions)
[11:34:45] (PS1) Filippo Giunchedi: elasticsearch: disable noisy check_elasticsearch [puppet] - https://gerrit.wikimedia.org/r/172527
[11:36:54] <_joe_> matanya: I may have found the setting, but I'll need someone else to approve afterwards
[11:37:05] ok
[11:38:35] akosiaris: thanks for taking care of the postgres thing!
[11:39:12] <_joe_> matanya: we have no real sanctioned way to raise edit limits AFAICS
[11:39:40] i can give them all steward rights :P
[11:39:47] no rate limiting on us
[11:40:05] <_joe_> there is 'wgRateLimitsExcludedIPs' but the notes on wmf-config/InitialiseSettings.php suggest not to use that
[11:40:56] we need to fix bug #1
[11:41:07] "better documentation"
[11:41:26] <_joe_> no the docs are pretty sane, see http://www.mediawiki.org/wiki/Manual:$wgRateLimits
[11:41:54] <_joe_> so maybe you can grant them temporarily the 'noratelimit' user right?
[11:42:17] <_joe_> btw I do not see any limit on editing by registered users
[11:42:38] (PS4) Filippo Giunchedi: jheapdump: gdb-based heap dump for JVM [puppet] - https://gerrit.wikimedia.org/r/170996
[11:42:44] (CR) Filippo Giunchedi: [C: 2 V: 2] jheapdump: gdb-based heap dump for JVM [puppet] - https://gerrit.wikimedia.org/r/170996 (owner: Filippo Giunchedi)
[11:42:54] <_joe_> oh there is one on 'newbie'
[11:43:13] <_joe_> matanya: it's 8 edits/minute
[11:43:54] so, 70 users, surely pass that
[11:44:26] <_joe_> not sure how that def works, I should go read the code, hold on
[11:44:45] <_joe_> so most of your users are "newbies"?
[11:44:50] all
[11:44:55] <_joe_> :/
[11:45:08] signed up in the last 24 h
[11:46:47] <_joe_> matanya: sorry but I don't think we are able to do this properly
[11:47:11] ok, it is their fault for not preparing in advance
[11:47:20] <_joe_> :/
[11:47:33] thank you very much for your time
[11:48:34] <_joe_> I'm still taking a look, to see if I find something
[11:50:35] <_joe_> I could easily add that functionality to throttle.php, but that would go through SWAT of course
[11:50:54] <_joe_> and thus, not in time for your editathon
[11:53:27] we used to be able to unthrottle for ip ranges, but it's been a long time since I looked at that
[11:53:27] <_joe_> as I told you, you could assign them temporarily the noratelimit right
[11:53:56] (sorry, I was off battling the Greek bureaucracy, trying to get paperwork that will let me travel in Jan.. not sure if I'm winning or not)
[11:53:57] <_joe_> apergos: seeing the code in wmf-config/throttle.php, that only works for account creation
[11:54:06] <_joe_> apergos: :(
[11:54:27] hm let me see how we used to do that
[11:54:32] <_joe_> apergos: we also have wgRateLimitsExcludedIPs in InitialiseSettings.php
[11:54:49] <_joe_> which may work, but there is a comment there suggesting not to use it
[11:55:23] <_joe_> so maybe the comment is plainly misleading
[12:01:13] looks bad, looks like we've only done account creation in the past
[12:01:27] let me see that other setting
[12:01:29] <_joe_> apergos: exactly
[12:02:43] hmm, shinken showing up symptoms of wtfery...
[12:02:57] * YuviPanda tries to figure out if it is shinken or graphite that's acting up
[12:05:53] meh
[12:06:01] hmm, graphite can't keep up with shinken doing checks...
[12:06:18] it's not even doing *that* many checks...
[12:06:20] * YuviPanda digs more
[12:07:08] guess it's not going to happen (8 edits a minute is rather a lot even for a new account, hopefully they are not all using the same account??)
[12:16:21] apergos: the ip got blocked, that is the problem
[12:16:41] but i'm not sure it is going to be relevant anymore soon anyway
[12:19:45] right
[12:28:16] (PS1) Giuseppe Lavagetto: monitoring: move monitor_host to monitoring::host [puppet] - https://gerrit.wikimedia.org/r/172530
[12:28:18] (PS1) Giuseppe Lavagetto: puppet: get rid of the nagios_group global variable [puppet] - https://gerrit.wikimedia.org/r/172531
[12:31:28] (PS3) Filippo Giunchedi: swift: report statsd data to localhost [puppet] - https://gerrit.wikimedia.org/r/171547
[12:31:40] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: report statsd data to localhost [puppet] - https://gerrit.wikimedia.org/r/171547 (owner: Filippo Giunchedi)
[12:50:00] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: puppet fail
[12:53:31] aaah, damn
[12:53:37] that explains my troubles.
[12:53:39] * YuviPanda grumbles to self
[12:53:56] (PS1) Nikerabbit: Add read only configuration for ElasticSearchTTMServer [mediawiki-config] - https://gerrit.wikimedia.org/r/172534
[12:57:44] (CR) Nikerabbit: Add read only configuration for ElasticSearchTTMServer (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/172534 (owner: Nikerabbit)
[12:58:29] (PS1) Yuvipanda: graphite: Kill labs archiver [puppet] - https://gerrit.wikimedia.org/r/172536
[12:59:42] (PS2) Yuvipanda: graphite: Kill labs archiver [puppet] - https://gerrit.wikimedia.org/r/172536
[13:00:05] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141111T1300). Please do the needful.
[13:01:23] (CR) Yuvipanda: [C: 2] graphite: Kill labs archiver [puppet] - https://gerrit.wikimedia.org/r/172536 (owner: Yuvipanda)
[13:04:27] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[13:06:46] well, that was stupid.
[13:06:59] * YuviPanda starts deleting close to 2T of junk data...
[13:08:08] !log deleting tons of junk data generated by interaction between txstatsd and the labs graphite archiver on labmon1001
[13:08:17] Logged the message, Master
[13:10:36] (PS1) Reedy: Non wikipedias to 1.25wmf7 [mediawiki-config] - https://gerrit.wikimedia.org/r/172537
[13:10:47] RECOVERY - Disk space on labmon1001 is OK: DISK OK
[13:11:35] !log reedy Purged l10n cache for 1.25wmf5
[13:11:36] (CR) Reedy: [C: 2] Non wikipedias to 1.25wmf7 [mediawiki-config] - https://gerrit.wikimedia.org/r/172537 (owner: Reedy)
[13:11:38] Logged the message, Master
[13:11:43] (Merged) jenkins-bot: Non wikipedias to 1.25wmf7 [mediawiki-config] - https://gerrit.wikimedia.org/r/172537 (owner: Reedy)
[13:12:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf7
[13:12:28] Logged the message, Master
[13:20:14] (PS1) Reedy: Remove php-1.24wmf22 and php-1.25wmf1 [mediawiki-config] - https://gerrit.wikimedia.org/r/172538
[13:20:44] (CR) Reedy: [C: 2] Remove php-1.24wmf22 and php-1.25wmf1 [mediawiki-config] - https://gerrit.wikimedia.org/r/172538 (owner: Reedy)
[13:20:51] (Merged) jenkins-bot: Remove php-1.24wmf22 and php-1.25wmf1 [mediawiki-config] - https://gerrit.wikimedia.org/r/172538 (owner: Reedy)
[13:23:03] (PS3) Reedy: Adding "*.nasa.gov" to wgCopyUploadsDomains. [mediawiki-config] - https://gerrit.wikimedia.org/r/172204 (owner: Steinsplitter)
[13:23:08] (CR) Reedy: [C: 2] Adding "*.nasa.gov" to wgCopyUploadsDomains. [mediawiki-config] - https://gerrit.wikimedia.org/r/172204 (owner: Steinsplitter)
[13:23:16] (Merged) jenkins-bot: Adding "*.nasa.gov" to wgCopyUploadsDomains. [mediawiki-config] - https://gerrit.wikimedia.org/r/172204 (owner: Steinsplitter)
[13:23:59] (PS4) Reedy: Remove old AdminSettings.php (symlink) [mediawiki-config] - https://gerrit.wikimedia.org/r/145408 (https://bugzilla.wikimedia.org/67820)
[13:24:03] (CR) Reedy: [C: 2] Remove old AdminSettings.php (symlink) [mediawiki-config] - https://gerrit.wikimedia.org/r/145408 (https://bugzilla.wikimedia.org/67820) (owner: Reedy)
[13:24:11] (Merged) jenkins-bot: Remove old AdminSettings.php (symlink) [mediawiki-config] - https://gerrit.wikimedia.org/r/145408 (https://bugzilla.wikimedia.org/67820) (owner: Reedy)
[13:24:17] (CR) Manybubbles: [C: 1] "+1 for this. The real problem with all of our elasticsearch checks so far as I'm concerned is that when there is a cluster health degreda" [puppet] - https://gerrit.wikimedia.org/r/172527 (owner: Filippo Giunchedi)
[13:25:31] (PS2) Reedy: Adjustments to securepoll and usergroups for voteWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/172464 (https://bugzilla.wikimedia.org/72589) (owner: Jalexander)
[13:25:42] (CR) Reedy: [C: 2] Adjustments to securepoll and usergroups for voteWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/172464 (https://bugzilla.wikimedia.org/72589) (owner: Jalexander)
[13:25:50] (Merged) jenkins-bot: Adjustments to securepoll and usergroups for voteWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/172464 (https://bugzilla.wikimedia.org/72589) (owner: Jalexander)
[13:27:32] (PS2) Reedy: Task recommendations experiment is over [mediawiki-config] - https://gerrit.wikimedia.org/r/172110 (owner: Nemo bis)
[13:27:37] (CR) Reedy: [C: 2] Task recommendations experiment is over [mediawiki-config] - https://gerrit.wikimedia.org/r/172110 (owner: Nemo bis)
[13:27:46] (Merged) jenkins-bot: Task recommendations experiment is over [mediawiki-config] - https://gerrit.wikimedia.org/r/172110 (owner: Nemo bis)
[13:28:22] (PS2) Reedy: Add wgDebugLog group for FSFileBackend [mediawiki-config] - https://gerrit.wikimedia.org/r/172514 (https://bugzilla.wikimedia.org/73229) (owner: Gilles)
[13:28:27] (CR) Reedy: [C: 2] Add wgDebugLog group for FSFileBackend [mediawiki-config] - https://gerrit.wikimedia.org/r/172514 (https://bugzilla.wikimedia.org/73229) (owner: Gilles)
[13:28:34] (Merged) jenkins-bot: Add wgDebugLog group for FSFileBackend [mediawiki-config] - https://gerrit.wikimedia.org/r/172514 (https://bugzilla.wikimedia.org/73229) (owner: Gilles)
[13:29:16] (PS3) Reedy: (bug 73197) enable Patrolled edits on Hebrew Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/172112 (owner: Matanya)
[13:29:21] (CR) Reedy: [C: 2] (bug 73197) enable Patrolled edits on Hebrew Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/172112 (owner: Matanya)
[13:29:28] (Merged) jenkins-bot: (bug 73197) enable Patrolled edits on Hebrew Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/172112 (owner: Matanya)
[13:29:34] thanks Reedy
[13:31:28] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: puppet fail
[13:32:58] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 6 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[13:33:18] (PS4) Reedy: Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[13:34:16] (PS5) Reedy: Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[13:34:58] (PS6) Reedy: Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[13:35:04] !log rolling reload on ms-be2* to pick up statsd changes
[13:35:09] Logged the message, Master
[13:35:11] (CR) Reedy: [C: 2] Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[13:35:18] (Merged) jenkins-bot: Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[13:37:58] (PS2) Reedy: Adding Ukraine photo sources to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/171484 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[13:38:11] (PS3) Reedy: Adding Ukraine photo sources to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/171484 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[13:38:15] (CR) Reedy: [C: 2] Adding Ukraine photo sources to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/171484 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[13:38:23] (Merged) jenkins-bot: Adding Ukraine photo sources to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/171484 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[13:39:49] (PS2) Reedy: Adding *.wikiportret.nl to wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/171012 (https://bugzilla.wikimedia.org/72953) (owner: Steinsplitter)
[13:40:12] (CR) Reedy: [C: 2] Adding *.wikiportret.nl to wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/171012 (https://bugzilla.wikimedia.org/72953) (owner: Steinsplitter)
[13:40:19] (Merged) jenkins-bot: Adding *.wikiportret.nl to wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/171012 (https://bugzilla.wikimedia.org/72953) (owner: Steinsplitter)
[13:41:09] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[13:41:53] !log reedy Synchronized wmf-config: (no message) (duration: 00m 14s)
[13:41:56] Logged the message, Master
[13:42:01] (CR) Filippo Giunchedi: [C: 1] memcached: tidy [puppet] - https://gerrit.wikimedia.org/r/171153 (owner: Ori.livneh)
[13:46:58] (CR) Faidon Liambotis: [C: 1] "Looks good indeed, although if you're targetting a Debian upload as well I'd highly suggest an init script :)" [debs/carbon-c-relay] - https://gerrit.wikimedia.org/r/172228 (owner: Filippo Giunchedi)
[13:48:24] (CR) Faidon Liambotis: [C: 1] "Sure, let's kill it and we can always resurrect it if we change our minds." [puppet] - https://gerrit.wikimedia.org/r/170974 (owner: Yuvipanda)
[13:48:57] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[13:49:18] (CR) Faidon Liambotis: "Ping?" [puppet] - https://gerrit.wikimedia.org/r/145997 (https://bugzilla.wikimedia.org/67957) (owner: Ori.livneh)
[13:51:47] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: puppet fail
[14:04:32] !log reedy Synchronized php-1.25wmf6/extensions/BounceHandler/: (no message) (duration: 00m 14s)
[14:04:36] Logged the message, Master
[14:04:55] !log reedy Synchronized php-1.25wmf6/vendor/: (no message) (duration: 00m 15s)
[14:04:58] Logged the message, Master
[14:08:28] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail
[14:08:28] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail
[14:08:28] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail
[14:09:08] !log reedy Synchronized php-1.25wmf7/extensions/BounceHandler/: (no message) (duration: 00m 15s)
[14:09:10] Logged the message, Master
[14:09:39] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[14:09:40] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail
[14:11:19] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail
[14:11:20] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[14:11:39] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail
[14:18:29] (CR) Reedy: [C: -1] [WIP] Deploy BounceHandler extension to production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:20:40] (CR) Reedy: [WIP] Deploy BounceHandler extension to production (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:23:40] !log reedy Synchronized private/PrivateSettings.php: Add $wmgVERPsecret for BounceHandler (duration: 00m 14s)
[14:23:42] Logged the message, Master
[14:24:12] (CR) Reedy: [WIP] Deploy BounceHandler extension to production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:24:18] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[14:25:27] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[14:26:02] (PS3) Reedy: Deploy BounceHandler extension to production [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:26:11] (PS4) Reedy: Deploy BounceHandler extension to production [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:26:37] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail
[14:28:37] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[14:30:12] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[14:30:30] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: puppet fail
[14:30:55] hmm
[14:31:07] same transient error as tungsten
[14:31:18] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail
[14:31:21] txstatsd eh?
[14:31:37] I'm reverting a change I pushed yesterday
[14:31:45] (CR) Reedy: Deploy BounceHandler extension to production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/172322 (https://bugzilla.wikimedia.org/69019) (owner: Legoktm)
[14:31:55] godog: graphite, from the looks of it
[14:31:57] Error: Failed to apply catalog: Could not find dependency Package[graphite-carbon] for File[/srv/carbon] at /etc/puppet/modules/graphite/manifests/init.pp:45
[14:32:00] (PS1) Filippo Giunchedi: txstatsd/graphite: switch back to package [puppet] - https://gerrit.wikimedia.org/r/172546
[14:32:02] saw the same thing on tungsten earlier
[14:32:09] YuviPanda: ye, see ^
[14:32:16] * YuviPanda clicks
[14:32:27] <_joe_> godog: maybe open an issue on phabricator?
[14:32:27] aaah
[14:32:28] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail
[14:32:42] * godog chuckles for the hidden rickroll in gerrit
[14:32:54] <_joe_> ?
[14:32:56] _joe_: good point, I'm doing that now
[14:33:00] there's... a rickroll?
[14:33:09] no but there should be!
[14:33:48] YuviPanda: perhaps that weird bug you hit the other day with phab links can be exploited to have a trasparent rickroll redirect
[14:33:56] hahaha :D
[14:38:49] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[14:42:11] (CR) Filippo Giunchedi: "see also https://phabricator.wikimedia.org/T1245 to track this" [puppet] - https://gerrit.wikimedia.org/r/172546 (owner: Filippo Giunchedi)
[14:42:47] YuviPanda: yeah definitely a bug, see my last comment on that last review
[14:44:58] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail
[14:45:05] yeah
[14:45:12] we can't actually use it to rickroll people, though
[14:46:17] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[14:47:53] heheh, btw YuviPanda can you take a quick look above? should be easy enough
[14:50:40] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail
[14:50:47] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[14:52:07] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[15:05:53] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[15:06:49] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail
[15:08:47] godog: yeah, had a +1 forgot to press buttons
[15:08:54] (CR) Yuvipanda: [C: 1] txstatsd/graphite: switch back to package [puppet] - https://gerrit.wikimedia.org/r/172546 (owner: Filippo Giunchedi)
[15:09:00] (CR) Faidon Liambotis: [C: -1] "This is great, the custom proxy is an especially nice touch." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/165779 (owner: Ori.livneh)
[15:10:38] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[15:11:17] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[15:11:38] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail
[15:12:50] (CR) John F. Lewis: [C: -1] "Should have a PTR record placed in the IP record :)" [dns] - https://gerrit.wikimedia.org/r/172452 (https://bugzilla.wikimedia.org/71262) (owner: Dzahn)
[15:12:57] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail
[15:13:39] (PS1) Filippo Giunchedi: swift: report statsd data to localhost [puppet] - https://gerrit.wikimedia.org/r/172549
[15:13:41] YuviPanda: thanks!
[15:14:00] (CR) Filippo Giunchedi: [C: 2 V: 2] txstatsd/graphite: switch back to package [puppet] - https://gerrit.wikimedia.org/r/172546 (owner: Filippo Giunchedi)
[15:16:59] (CR) Faidon Liambotis: [C: -1] "OK, pardon the question but... why? Isn't logging by the keyholder-proxy enough?" [puppet] - https://gerrit.wikimedia.org/r/165862 (owner: Ori.livneh)
[15:18:21] (CR) Faidon Liambotis: "As I told Ori on IRC a few weeks ago, I think it may actually be a sound idea and we should revive it. I can see why it could be controver" [puppet] - https://gerrit.wikimedia.org/r/138292 (owner: Ori.livneh)
[15:18:51] (CR) Faidon Liambotis: [C: 1] "I have no clue why that's there, but sure, remove it, why not :)" [puppet] - https://gerrit.wikimedia.org/r/172413 (owner: Andrew Bogott)
[15:25:41] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[15:28:03] (CR) Faidon Liambotis: [C: 1] "Looks good, although I maintain that it should just be called "diamond", not "python-diamond"." [debs/python-diamond] - https://gerrit.wikimedia.org/r/168599 (owner: Filippo Giunchedi)
[15:30:06] (PS1) Giuseppe Lavagetto: hiera: allow regex-based searches [puppet] - https://gerrit.wikimedia.org/r/172552
[15:30:28] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[15:31:38] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[15:35:58] (PS1) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[15:42:28] :D :D:D
[15:43:37] <_joe_> so we're finally getting rid of ganglia.pp?
[15:46:26] (PS2) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[15:46:28] (PS1) Alexandros Kosiaris: Assign role::ganglia::web to uranium [puppet] - https://gerrit.wikimedia.org/r/172555
[15:55:03] <_joe_> ottomata: isn't it veteran day? you supposed to be on holiday?
[15:55:27] heh
[15:55:33] _joe_: And how many holidays do you take off? ;)
[15:55:34] heheh
[15:56:12] <_joe_> Reedy: I am an observant non-worker
[15:57:46] Reedy: you are probably the guy to ask
[15:57:51] oh noes
[15:58:03] there was an edit-a-thon to today in a uni in Israel
[15:58:30] with 70 editors, their ip got blocked for rate limiting violation
[15:59:01] all editors were pre-registered
[15:59:33] but only 24 h before. what can be done to overcome this in the future, Reedy ?
[16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141111T1600). Please do the needful.
[16:00:17] matanya: I guess the problem is the rate limiting for editing?
[16:00:17] 'ip' => null, // for each anon and recent account
[16:00:29] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%)
[16:00:35] When they're more mature accounts...
[16:01:21] yes Reedy , that was the problem
[16:01:47] where is the source to patch that ?
[16:02:01] 'edit' => array(
[16:02:01] // 8 ed./min per each non-autoconfirmed, or group thereof from same IP
[16:02:01] 'ip' => array( 8, 60 ),
[16:02:01] 'newbie' => array( 8, 60 ),
[16:02:01] ),
[16:02:08] ^ That's in InitialiseSettings.php
[16:02:50] where it says not to use it ... :)
[16:02:51] Annoyingly, https://noc.wikimedia.org/conf/highlight.php?file=throttle.php is only really for account creations
[16:03:05] ....surely jouncebot is wrong.
[16:03:11] jouncebot: reload
[16:03:25] marktraceur: I think the entry is still on the deployments page
[16:03:27] just with a comment
[16:03:45] matanya: I guess throttle.php could be improved to be able to adjust the throttle for other things than just account creation
[16:04:03] would be nice
[16:04:35] (PS1) Dereckson: wgCopyUploadsDomains configuration for Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/172557 (https://bugzilla.wikimedia.org/73045)
[16:04:36] Shouldn't be too hard I think
[16:04:42] After whoever actually rewrote it :)
[16:04:50] (CR) Dereckson: "Follow-up: I63a2c8871aeaa8f3f2046d12b0af5337b6479941" [mediawiki-config] - https://gerrit.wikimedia.org/r/171484 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[16:05:49] Reedy: so changing it for anything other than account creation is not possible at the moment
[16:05:53] Right
[16:06:15] I can't see it taking more than 10 minutes work to improve it
[16:06:24] i'll open a bug for it
[16:06:36] and try finding those precious 10 minutes
[16:07:06] <_joe_> Reedy: I was about to create a patch to throttle.php to add that ability
[16:07:15] :)
[16:07:40] <_joe_> but then, no dev was around, and that's not the kind of thing I can merge on my own
[16:07:54] go _joe_ !
[16:08:22] (PS6) 01tonythomas: Make BounceHandler extension work on meta-wiki [puppet] - https://gerrit.wikimedia.org/r/168622
[16:08:41] <_joe_> matanya: Reedy already owes me a more important CR at the moment ;)
[16:08:49] _joe_: I left a comment on it last night :P
[16:08:53] MORE LINES CAN BE REMOVED
[16:09:04] <_joe_> eheh
[16:09:04] <_joe_> ok
[16:09:18] I think it's pretty awesome otherwise :D
[16:09:22] here goes your excuse ...
[16:09:45] <_joe_> matanya: my next excuse is "now I need to amend that patch according to reedy's comments"
[16:10:00] nicely done
[16:10:02] hahah
[16:10:14] <_joe_> tsk, I've been manager of operations for 3 years before joining the WMF, I know all the tricks
[16:10:45] noted
[16:11:29] <_joe_> I can dodge assignments better than Neo could dodge bullets
[16:11:36] managers. managers. managers. managers.
[16:12:48] <_joe_> Reedy: not anymore :)
[16:13:28] https://bugzilla.wikimedia.org/show_bug.cgi?id=73269 at your service
[16:15:10] (CR) Nemo bis: Make BounceHandler extension work on meta-wiki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/168622 (owner: 01tonythomas)
[16:15:59] PROBLEM - puppet last run on ms-fe2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:20:20] RECOVERY - puppet last run on ms-fe2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:20:56] (PS1) Reedy: Reduce global function count. [mediawiki-config] - https://gerrit.wikimedia.org/r/172559
[16:22:37] Is something broken with math rendering(?) right now? Having enabled client-side MathJax in https://de.wikipedia.org/wiki/Spezial:Einstellungen#mw-prefsection-rendering, I don't see any formula on https://de.wikipedia.org/wiki/Ger%C3%A4nderte_Hesse-Matrix, it just keeps loading forever. (Tested with Firefox and Chrome)
[16:22:55] Known issue?
[16:23:38] there was a deploy earlier, perhaps someone broken core MathJax (again..)
[16:24:45] thedj: Wasn't a deploy to wikipedias
[16:24:47] other than...
[16:24:55] yup, seems broken
[16:25:01] https://github.com/wikimedia/operations-mediawiki-config/commit/975fb76d43ef37346609f1e45123d31a7ddd6d69
[16:26:29] * thedj guesses, something still broken ...
[16:28:59] pajz: pls file a bugreport.
[16:29:14] k
[16:29:17] (Abandoned) Giuseppe Lavagetto: HHVM: get 25% of anonymous traffic [mediawiki-config] - https://gerrit.wikimedia.org/r/170041 (owner: Giuseppe Lavagetto)
[16:29:29] (CR) Dereckson: Gerrit also listens on port 22 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: Dereckson)
[16:30:28] (PS2) Dereckson: Gerrit also listens on port 22 [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611)
[16:31:59] (CR) Dereckson: "PS2: @host → @hostname" [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: Dereckson)
[16:32:33] (CR) Dereckson: [C: -1] "Depends of Id27657ca29e41960886b517fe0be63ad992d382a" [puppet] - https://gerrit.wikimedia.org/r/172313 (https://bugzilla.wikimedia.org/35611) (owner: Dereckson)
[16:33:48] (PS3) Giuseppe Lavagetto: mediawiki: simplify apache config [puppet] - https://gerrit.wikimedia.org/r/170300
[16:34:34] (CR) Giuseppe Lavagetto: mediawiki: simplify apache config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/170300 (owner: Giuseppe Lavagetto)
[16:35:05] thedj, done, https://bugzilla.wikimedia.org/show_bug.cgi?id=73273
[16:35:48] (CR) Reedy: [C: 1] mediawiki: simplify apache config [puppet] - https://gerrit.wikimedia.org/r/170300 (owner: Giuseppe Lavagetto)
[16:36:49] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[16:37:23] most recent change to math, that could have influenced this seems to be oct 22 to me.. that's quite a while ago already.
[16:37:51] thedj: ahaha
[16:37:54] It may be a hhvm bug
[16:37:55] Nov 11 16:24:18 mw1031: message repeated 384 times: [ #012Notice: Missing <code>texvccheck</code> executable. Please see math/README to configure. in /srv/mediawiki/php-1.25wmf6/extensions/Math/MathInputCheckTexvc.php on line 65]
[16:38:06] ah
[16:38:22] missing package on trusty :)
[16:38:27] I'll just revert that change
[16:39:07] $wgMathTexvcCheckExecutable = file_exists( '/usr/bin/texvccheck' ) ?
[16:39:07] '/usr/bin/texvccheck' : '/usr/local/apache/uncommon/bin/texvccheck';
[16:39:20] I guess it was never built into uncommon either?
[16:40:02] godog: You rebuilt the math texvc stuff, right?
[16:40:33] Reedy: yep
[16:41:05] mh yeah I think we noted the fact that texvccheck isn't in trusty, and the fallback isn't there either?
[16:41:52] yeah, you did say it wasn't in the trusty deb
[16:41:58] so i guess the fallback isn't there either
[16:42:06] thedj: Should we just disable the filte everywhere?
[16:42:21] Seems a bit fishy to have it only enabled on some servers
[16:42:32] it's a sanitizer
[16:43:17] <_joe_> not really an hhvm bug then
[16:43:32] nope, just on hhvm servers
[16:43:59] <_joe_> godog: are you handling this? or should I? I kinda miss some context though
[16:44:19] we can do without, but in theory users could enter any Tex they want.
[16:44:36] _joe_: yeah it is fine, I'll followup
[16:44:45] <_joe_> thanks
[16:44:49] and tex is included by mathjax scripts so.....
[16:44:55] 'you do the math' :)
[16:46:07] (PS1) Reedy: Disable TexFilter if no executable available [mediawiki-config] - https://gerrit.wikimedia.org/r/172565
[16:46:32] * Reedy will deploy ^
[16:48:03] Reedy: cool, I can rebuild/upload the package for trusty too, not today tho
[16:48:10] please :)
[16:48:20] (CR) Reedy: [C: 2] Disable TexFilter if no executable available [mediawiki-config] - https://gerrit.wikimedia.org/r/172565 (owner: Reedy)
[16:48:25] (Merged) jenkins-bot: Disable TexFilter if no executable available [mediawiki-config] - https://gerrit.wikimedia.org/r/172565 (owner: Reedy)
[16:48:58] !log reedy Synchronized wmf-config: Use Texvc filter if available (duration: 00m 15s)
[16:49:01] Logged the message, Master
[16:52:10] (CR) Ottomata: "Faidon, yeah, I thought about as I went through it. I could go both ways, but this is how some of the things I've done in the past work, " [puppet/cassandra] - https://gerrit.wikimedia.org/r/166888 (owner: Ottomata)
[16:52:20] (CR) Ottomata: "thought about that*" [puppet/cassandra] - https://gerrit.wikimedia.org/r/166888 (owner: Ottomata)
[16:54:22] Did someone fix the issue? Can't reproduce it right now after purging the page.
[16:55:09] pajz: for some value of fix, yes, I did ;)
[16:55:54] Thanks.
[17:12:45] !log removed old ocg cronjobs on ocg100x; see https://bugzilla.wikimedia.org/show_bug.cgi?id=73166
[17:12:49] Logged the message, Master
[17:15:16] (CR) 01tonythomas: [C: 1] "@Nemo:" [puppet] - https://gerrit.wikimedia.org/r/168622 (owner: 01tonythomas)
[17:23:41] (CR) Andrew Bogott: [C: 2] No longer ensure => absent package python-memcache [puppet] - https://gerrit.wikimedia.org/r/172413 (owner: Andrew Bogott)
[17:25:03] _joe_: MediaWiki/Sites is for the "Site" interface in MediaWiki core software. I think you meant something else.
[17:25:17] (e.g. the concept of interwiki destinations etc.)
[17:25:18] <_joe_> Krinkle: yes, sorry
[17:25:26] <_joe_> Krinkle: "apache config of mediawiki
[17:25:59] <_joe_> Krinkle: I will assign the bug to me btw
[17:26:17] _joe_: Oh, your comment was fine. I'm referring to the issue moving to the bug component Sites"
[17:26:26] Product: MediaWiki; Component: → Sites
[17:26:30] <_joe_> yes
[17:26:35] <_joe_> got it :)
[17:26:40] right, I see!
[17:26:43] thx :)
[17:27:19] <_joe_> Krinkle: something has changed in the handling of mod_mime Addencoding between apache 2.2 and 2.4, apparently
[17:27:22] <_joe_> reading the docs now
[17:31:05] <_joe_> shit, this is kinda serious
[17:40:44] We have a long history of fighting with MIME behaviours
[17:47:41] <_joe_> Nemo_bis: he!
[17:47:42] (PS1) Giuseppe Lavagetto: HAT: remove addencoding directives that are harmful on apache 2.4 [puppet] - https://gerrit.wikimedia.org/r/172578
[17:50:53] (PS11) Yuvipanda: shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[17:56:48] (CR) Ori.livneh: [C: 1] HAT: remove addencoding directives that are harmful on apache 2.4 [puppet] - https://gerrit.wikimedia.org/r/172578 (owner: Giuseppe Lavagetto)
[17:57:33] bblack: i'd love a review of if you have the chance
[17:59:52] andrewbogott: emailed the ops@ list, but +1s/CR on https://gerrit.wikimedia.org/r/#/c/172420/ welcome
[18:00:04] maxsem, kaldari: Respected human, time to deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141111T1800). Please do the needful.
[18:07:54] (CR) Steinsplitter: [C: 1] "ok" [mediawiki-config] - https://gerrit.wikimedia.org/r/172557 (https://bugzilla.wikimedia.org/73045) (owner: Dereckson)
[18:14:07] (PS1) Hoo man: Add "featured portal" badge (Q17580674) [mediawiki-config] - https://gerrit.wikimedia.org/r/172580 (https://bugzilla.wikimedia.org/73193)
[18:16:14] (CR) BBlack: "What's the thinking on 1773 vs 1772? Is the execute bit necessary for dumping core by non-root procs in addition to the write bit?" [puppet] - https://gerrit.wikimedia.org/r/171206 (owner: Ori.livneh)
[18:18:48] (CR) BBlack: [C: 1] "Nevermind, answered my own question via experimentation :)" [puppet] - https://gerrit.wikimedia.org/r/171206 (owner: Ori.livneh)
[18:32:47] bblack: thanks!
[18:33:06] bblack: are you up for merging that, by any chance? i'm a bit nervous about pushing changes to the whole cluster
[18:53:03] ori: if I'm merging it, it wouldn't be until tomorrow. I'm technically not here today, and I don't want to end up forcing myself to be here later over some fallout :)
[18:53:32] bblack: oh right, i forgot, texas is part of the union these days
[18:53:37] bblack: no worries :)
[18:53:39] thanks for the review
[19:04:50] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: puppet fail
[19:12:21] (CR) Ori.livneh: add `keyholder` module for managing a shared ssh-agent (6 comments) [puppet] - https://gerrit.wikimedia.org/r/165779 (owner: Ori.livneh)
[19:13:03] (PS3) Ori.livneh: add `keyholder` module for managing a shared ssh-agent [puppet] - https://gerrit.wikimedia.org/r/165779
[19:16:30] (PS1) Yuvipanda: admin: Purge my older key [puppet] - https://gerrit.wikimedia.org/r/172584
[19:16:36] apergos: ^ +1?
[19:18:56] sec, I'm bout to sneak in before you
[19:20:13] (PS1) ArielGlenn: add midom to ops (was in roots in old manifest) [puppet] - https://gerrit.wikimedia.org/r/172585
[19:20:13] heh ok :)
[19:21:31] (CR) ArielGlenn: [C: 2] add midom to ops (was in roots in old manifest) [puppet] - https://gerrit.wikimedia.org/r/172585 (owner: ArielGlenn)
[19:23:05] (CR) ArielGlenn: [C: 1] admin: Purge my older key [puppet] - https://gerrit.wikimedia.org/r/172584 (owner: Yuvipanda)
[19:23:20] (CR) Ori.livneh: "@paravoid: No problem at all. If you think keyholder logging to the AUTH syslog facility is enough, feel free to abandon this patch." [puppet] - https://gerrit.wikimedia.org/r/165862 (owner: Ori.livneh)
[19:23:29] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[19:23:30] it shooould purge your key from auth keys but if for some reaso it doesn't, salt to the rescue
[19:30:25] (PS3) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[19:30:27] (PS2) Alexandros Kosiaris: Assign role::ganglia::web to uranium [puppet] - https://gerrit.wikimedia.org/r/172555
[19:34:05] (PS2) Yuvipanda: admin: Purge my older key [puppet] - https://gerrit.wikimedia.org/r/172584
[19:41:46] (CR) Yuvipanda: [C: 2] admin: Purge my older key [puppet] - https://gerrit.wikimedia.org/r/172584 (owner: Yuvipanda)
[19:43:44] (CR) Giuseppe Lavagetto: [C: 1] "Kill! Kill!" [puppet] - https://gerrit.wikimedia.org/r/170974 (owner: Yuvipanda)
[19:50:42] (PS3) Yuvipanda: Kill ceph module [puppet] - https://gerrit.wikimedia.org/r/170974
[19:51:03] paravoid: re: ceph killing, just hygiene.
[19:51:17] I also get an odd(?) satisfaction from removing code
[19:52:15] you never can be sure of the net value of adding code, but yes it's almost universally true that if you can remove code and nobody screams, it was a net win :)
[19:53:31] :D
[19:54:06] (CR) Yuvipanda: [C: 2] Kill ceph module [puppet] - https://gerrit.wikimedia.org/r/170974 (owner: Yuvipanda)
[19:54:15] yay, killed
[19:54:29] now I'll wait around for a while to see if icinga complains
[19:58:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[19:58:24] (CR) Ottomata: "Whao, cool!" [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/172418 (owner: Ori.livneh)
[19:58:36] (CR) ArielGlenn: [C: 2] Add link to pagecounts-all-site dataset [puppet] - https://gerrit.wikimedia.org/r/168104 (owner: QChris)
[20:03:49] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[20:04:18] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[20:04:29] that was me, forgot to hit 'yes'
[20:04:29] done now
[20:04:58] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[20:05:21] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[20:09:59] anyone to +1 my shinken changes?
[20:10:04] (PS4) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[20:10:06] (PS3) Alexandros Kosiaris: Assign role::ganglia::web to uranium [puppet] - https://gerrit.wikimedia.org/r/172555
[20:10:24] https://gerrit.wikimedia.org/r/#/c/172420/
[20:12:39] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[20:16:59] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[20:18:47] (PS1) Yuvipanda: diamond: Don't choke on puppet syntax error failures [puppet] - https://gerrit.wikimedia.org/r/172592
[20:29:38] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[20:32:59] (PS5) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[20:33:01] (PS4) Alexandros Kosiaris: Assign role::ganglia::web to uranium [puppet] - https://gerrit.wikimedia.org/r/172555
[20:59:59] (PS6) Alexandros Kosiaris: Introduce role::ganglia classes [puppet] - https://gerrit.wikimedia.org/r/172553
[21:00:01] (PS5) Alexandros Kosiaris: Assign role::ganglia::web to uranium [puppet] - https://gerrit.wikimedia.org/r/172555
[21:04:09] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail
[21:15:48] (PS1) Yuvipanda: shinken: Don't miss points just on the threshold for check_graphite [puppet] - https://gerrit.wikimedia.org/r/172639
[21:18:18] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[21:21:09] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 347 MB (3% inode=46%):
[21:22:00] well, that's an interesting error :p
[21:29:35] (PS2) Yuvipanda: shinken: Don't miss points just on the threshold for check_graphite [puppet] - https://gerrit.wikimedia.org/r/172639
[21:39:03] (Abandoned) Yuvipanda: shinken: Don't miss points just on the threshold for check_graphite [puppet] - https://gerrit.wikimedia.org/r/172639 (owner: Yuvipanda)
[21:39:33] (PS12) Yuvipanda: shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[21:40:37] (PS13) Yuvipanda: shinken: Add basic service checks for all of labs [puppet] - https://gerrit.wikimedia.org/r/172420
[21:57:12] (PS1) Alexandros Kosiaris: osm: avoid cronspam from osmosis [puppet] - https://gerrit.wikimedia.org/r/172653
[22:00:04] spagewmf, ebernhardson: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141111T2200). Please do the needful.
[22:11:30] (PS1) Hashar: Lint setup.py [debs/pybal] - https://gerrit.wikimedia.org/r/172654
[22:13:59] (CR) Alexandros Kosiaris: [C: 2] osm: avoid cronspam from osmosis [puppet] - https://gerrit.wikimedia.org/r/172653 (owner: Alexandros Kosiaris)
[22:30:49] PROBLEM - Disk space on uranium is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=46%):
[23:43:03] (CR) Ori.livneh: [C: 2] Lint setup.py [debs/pybal] - https://gerrit.wikimedia.org/r/172654 (owner: Hashar)
[23:43:23] (Merged) jenkins-bot: Lint setup.py [debs/pybal] - https://gerrit.wikimedia.org/r/172654 (owner: Hashar)
[23:51:47] (PS5) Ori.livneh: base: standardize the path and file name of core dumps [puppet] - https://gerrit.wikimedia.org/r/171206
[23:52:53] (CR) Ori.livneh: [C: 2] base: standardize the path and file name of core dumps [puppet] - https://gerrit.wikimedia.org/r/171206 (owner: Ori.livneh)