[00:04:35] !log upgrading bugzilla to 4.2.7
[00:04:47] Logged the message, Master
[00:06:46] can someone explain to me like i'm five
[00:06:58] why wikiversions.dat says: test2wiki php-1.22wmf22 *
[00:07:23] blargh. never mind. I refreshed and now test2 magically says 22 as well.
[00:14:13] AaronSchulz: did you have a chance to look at https://gerrit.wikimedia.org/r/#/c/90280/ ?
[00:15:02] seems fine, can't be merged yet of course
[00:15:26] (CR) Aaron Schulz: [C: 1] "Can be merged around deployement" [operations/puppet] - https://gerrit.wikimedia.org/r/90280 (owner: Legoktm)
[00:16:10] do you mean like the normal weekly deployment or full deployment of the extension?
[00:17:32] (PS1) Reedy: Kill postrewrites.conf, already handled in main.conf under wikipedia [operations/apache-config] - https://gerrit.wikimedia.org/r/90460
[00:18:46] I just realised I didn't properly follow up from that math rendering bug
[00:20:54] did we have a bug for that?
[00:20:58] yes
[00:21:08] I replied to it explaining what you found and how you fixed it
[00:21:09] never mind, found it
[00:21:44] hope you don't mind
[00:22:17] actually the MW cgroup was missing on almost all apaches
[00:22:26] sigh
[00:22:30] on those ones you listed, the cgroup filesystem wasn't even mounted
[00:22:34] legoktm: the extension
[00:22:38] so cgconfig had failed
[00:22:50] not just mw-cgroup
[00:23:03] I think I copied those from SAL
[00:23:40] AaronSchulz: it's already on test + test2, or does it need to be everywhere?
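The diagnosis above hinges on whether the cgroup filesystem was mounted at all on the affected apaches. A minimal sketch of that check, assuming a cgroup v1 setup where cgconfig mounts each hierarchy with filesystem type `cgroup`; the function name is hypothetical and this is not the check anyone in the channel actually ran:

```python
def cgroup_mount_points(mounts_text):
    """Return the mount points of cgroup (v1) filesystems in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "cgroup":
            points.append(fields[1])
    return points
```

On a host you would pass it the contents of /proc/mounts; an empty result would match the "cgroup filesystem wasn't even mounted" situation, meaning cgconfig never got as far as mounting its hierarchies.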
[00:24:07] yeah, but there was also
[00:24:09] 00:41 Tim: manually created MW cgroups on all apaches since apparently the init script is totally broken
[00:24:30] legoktm: ahh, didn't know that
[00:24:35] I'm at a conf this week and pretty busy, sorry if the "this should be fixed" reply on the bug sounds a bit "can someone fix it for me" :)
[00:24:36] I guess it can be done anytime then
[00:24:42] !log awight synchronized php-1.22wmf21/extensions/CentralNotice
[00:24:44] I just didn't want empty queues checked
[00:24:54] Logged the message, Master
[00:25:06] (CR) Aaron Schulz: "Actually anytime :)" [operations/puppet] - https://gerrit.wikimedia.org/r/90280 (owner: Legoktm)
[00:25:18] thanks :)
[00:25:28] somehow it already needs a rebase...
[00:25:47] greg-g: AaronSchulz and I want to switch mediawiki writing files in swift to both pmtpa/eqiad (multiwrite) sometime next week
[00:26:35] greg-g: last time around we agreed to involve you and schedule it as to not conflict with deployments -- that was a bit more risky (ceph) though
[00:27:44] but we can schedule it this time too, is monday okay e.g. before or during lightning deploy time?
[00:31:39] !log fixing MW cgroup on mw1109
[00:31:45] :(
[00:31:51] Logged the message, Master
[00:32:00] csteipp: hey
[00:32:10] Hey paravoid
[00:32:13] csteipp: remind me, is Special:CentralAutoLogin/start?type=script supposed to be cached?
[00:32:30] let me check for sure..
[00:32:52] (it's not, ~30% of apache requests right now)
[00:33:07] paravoid: Yes, it is
[00:33:45] ok, I'll reopen #54195
[00:33:51] let me collect headers first
[00:33:58] That would be really helpful
[00:34:13] PROBLEM - Apache HTTP on mw1109 is CRITICAL: Connection refused
[00:34:32] !log awight synchronized php-1.22wmf21/extensions/CentralNotice
[00:35:33] (PS1) Andrew Bogott: Include php5-cli in mediawiki_singlenode. [operations/puppet] - https://gerrit.wikimedia.org/r/90462
[00:36:25] (CR) Andrew Bogott: [C: 2] Include php5-cli in mediawiki_singlenode. [operations/puppet] - https://gerrit.wikimedia.org/r/90462 (owner: Andrew Bogott)
[00:36:39] (CR) Andrew Bogott: [C: 2] Switch labs instances to use the mysql module. [operations/puppet] - https://gerrit.wikimedia.org/r/89764 (owner: Andrew Bogott)
[00:38:23] root@mw1109:/var/log# initctl stop cgconfig
[00:38:23] cgconfig stop/waiting
[00:38:23] root@mw1109:/var/log# initctl status mw-cgroup
[00:38:23] mw-cgroup start/running
[00:38:41] so if you then start cgconfig, it doesn't start mw-cgroup because it's already started
[00:39:10] that's what I pointed out before and what https://gerrit.wikimedia.org/r/#/c/83067/ was supposed to fix
[00:39:36] I confess I didn't validate the fix worked as intended after merging
[00:40:35] not sure why it would help
[00:41:23] my interpretation of the fix was that when you'd do "stop cgconfig" it would automatically stop mw-cgroup too
[00:42:26] well, a few minutes after I stopped it, the status of cgconfig is still "stop/waiting"
[00:42:34] not stop/stopped
[00:42:45] so I guess the event wasn't emitted
[00:42:57] I'm going to do some more testing along these lines
[00:43:50] hm, maybe we should make it 'stop on stopping' then
[00:44:10] i don't remember ever seeing stop/stopped, but I don't have much experience with upstart
[00:45:01] thanks, I can handle it if you'd prefer that, although not now for sure
[00:47:54] the event was emitted
[00:48:19] it goes stopping->killed->post-stop->waiting
[00:48:27] and post-stop->waiting triggers the stopped event
[00:48:52] cgred uses "stop on stopped cgconfig" and it was correctly stopped
[00:51:02] (PS1) Ori.livneh: Send VisualEditor metrics to Ganglia via StatsD [operations/puppet] - https://gerrit.wikimedia.org/r/90464
[00:51:39] could someone merge this ^ ?
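The 00:47 to 00:48 exchange describes upstart's stop sequence: the job goes stopping -> killed -> post-stop -> waiting, the `stopped` event fires on the post-stop -> waiting transition, and `initctl` then reports `stop/waiting`, since there is no `stop/stopped` state. Below is a toy Python model of that sequence. It shows which `stop on ...` stanzas would match. It is an illustrative assumption, not upstart's implementation, and it ignores pre-stop, respawn and event blocking:

```python
# Toy model of the upstart stop sequence described in the log (not upstart's code).
STOP_SEQUENCE = ["stopping", "killed", "post-stop", "waiting"]

# Events emitted on entering a state: 'stopping' when the job enters stopping,
# 'stopped' on the post-stop -> waiting transition.
EVENT_ON_ENTER = {"stopping": "stopping", "waiting": "stopped"}


def stop_job(job):
    """Walk `job` through the stop states; return its final status and emitted events."""
    events = []
    for state in STOP_SEQUENCE:
        if state in EVENT_ON_ENTER:
            events.append((EVENT_ON_ENTER[state], job))
    # initctl reports goal/state, so a fully stopped job shows "<job> stop/waiting".
    return f"{job} stop/{STOP_SEQUENCE[-1]}", events


def stanza_fires(stanza, events):
    """True if a 'stop on <event> <job>' stanza matches one of the emitted events."""
    _stop, _on, event, job = stanza.split()
    return (event, job) in events
```

In this model both `stop on stopped cgconfig` and `stop on stopping cgconfig` match. That agrees with cgred being stopped correctly. It cannot explain the mw1109 case, where the event was emitted but mw-cgroup did not react.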
[00:56:09] paravoid: Could you grab a few more headers and confirm if all of the redirects for /start are redirecting to the special page's alias on the same wiki?
[00:57:29] The cache'd version should be a redirect to login.wikimedia.org/..../checkLoggedIn
[01:01:13] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.080 second response time
[01:01:30] !log awight synchronized php-1.22wmf21/extensions/CentralNotice
[01:02:16] !log awight synchronized php-1.22wmf21/extensions/CentralNotice
[01:03:25] (PS1) Dzahn: install the bug-attachment.wm.org cert on kaulen [operations/puppet] - https://gerrit.wikimedia.org/r/90468
[01:03:58] mutante: can my patch (90464) ride along?
[01:05:13] PROBLEM - Apache HTTP on mw1109 is CRITICAL: Connection refused
[01:05:44] ori-l: sorry, i don't know what that is doing, it's getting late and last time something with statsd caused reverts
[01:06:09] ok, np.
[01:07:24] without the one above any apache restart would have killed Bugzilla:P
[01:07:27] bbiaw
[01:08:30] (CR) Dzahn: [C: 2] install the bug-attachment.wm.org cert on kaulen [operations/puppet] - https://gerrit.wikimedia.org/r/90468 (owner: Dzahn)
[01:17:04] csteipp: I even see http->https redirects with cache-control: private -- is this because of XFF & geo?
[01:17:45] arg, no, bugzilla issue with the Apache config
[01:18:15] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.316 second response time
[01:19:15] PROBLEM - HTTP on kaulen is CRITICAL: Connection refused
[01:19:42] !log awight synchronized php-1.22wmf21/extensions/CentralNotice
[01:20:55] brought it back up.. this sucks
[01:21:08] we had changes merged to apache config a while back but apparently never tested
[01:21:15] RECOVERY - HTTP on kaulen is OK: HTTP OK: HTTP/1.1 302 Found - 489 bytes in 0.056 second response time
[01:21:25] those new SSL certs aren't right
[01:22:56] so it seems that mw-cgroup was somehow missing the relevant triggers in init's soft state
[01:23:17] and that when I edited mw-cgroup.conf to test another theory, the configuration was reloaded and the problem was resolved
[01:23:43] as soon as I edited it, mw-cgroup started getting cgconfig's events
[01:37:46] (CR) Dzahn: "this broke things on kaulen. see RT #5011. momentarily live-fixed and puppet deactivated" [operations/puppet] - https://gerrit.wikimedia.org/r/82879 (owner: RobH)
[02:15:02] !log LocalisationUpdate completed (1.22wmf21) at Fri Oct 18 02:15:02 UTC 2013
[02:15:16] Logged the message, Master
[02:27:12] (PS1) Springle: s3 master rotation for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90476
[02:28:34] (CR) Springle: [C: 2] s3 master rotation for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90476 (owner: Springle)
[02:33:23] !log springle synchronized wmf-config/db-eqiad.php 's3 master rotation for upgrade'
[02:33:34] Logged the message, Master
[02:34:35] springle, you know how the other day I needed you to figure out why mysql wouldn't come up on an instance?
[02:34:45] yep
[02:34:52] hi, just got the following error when trying to save an edit:
[02:34:53] Error: 1290 The MariaDB server is running with the --read-only option so it cannot execute this statement (10.64.16.27)
[02:34:57] Today I have exactly the same question… pretty sure it's not the same dumb mistake, although probably a similar one
[02:34:57] is this known?
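paravoid and csteipp are trying to establish whether Special:CentralAutoLogin/start responses are cacheable. That depends on the Cache-Control header on each response, and paravoid notes that even the http->https redirects carry `cache-control: private`. Below is a minimal sketch of how a shared HTTP cache in front of the apaches would read that header under the general HTTP rules: `private`, `no-store` and `no-cache` stop it from answering from storage without revalidation, and `s-maxage` overrides `max-age` for shared caches. The function name is hypothetical. The sketch ignores `Expires`, `Vary`, cookies and any site-specific cache configuration:

```python
def shared_cache_ttl(cache_control):
    """Seconds a shared cache may serve the response without revalidating (0 = must not)."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    # Any of these means a shared cache must not answer from storage without revalidation.
    if any(d in directives for d in ("private", "no-store", "no-cache")):
        return 0
    # For shared caches, s-maxage takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return int(directives[key])
    return 0
```

For example, `private, must-revalidate, max-age=0` yields 0, while `s-maxage=600, must-revalidate, max-age=0` lets a shared cache keep the response for 600 seconds even though browsers must revalidate.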
[02:35:04] drdee: yes
[02:35:13] ok
[02:35:16] something about read-only mode isn't so readonly
[02:35:27] andrewbogott: 5 mins
[02:37:05] (PS1) Springle: S3 master rotation done [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90480
[02:37:33] (CR) Springle: [C: 2] S3 master rotation done [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90480 (owner: Springle)
[02:37:50] !log springle synchronized wmf-config/db-pmtpa.php 's3 master rotation for upgrade'
[02:38:02] Logged the message, Master
[02:38:23] springle, any time you have a moment.
[02:38:28] gee tin rsync is slow today
[02:38:31] Damn, it /looks/ like the same problem as before...
[02:39:40] PROBLEM - MySQL Replication Heartbeat on db1038 is CRITICAL: CRIT replication delay 315 seconds
[02:39:40] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 323 seconds
[02:39:50] PROBLEM - MySQL Replication Heartbeat on db1010 is CRITICAL: CRIT replication delay 324 seconds
[02:39:50] PROBLEM - MySQL Replication Heartbeat on db34 is CRITICAL: CRIT replication delay 325 seconds
[02:40:10] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 349 seconds
[02:40:10] PROBLEM - MySQL Replication Heartbeat on db39 is CRITICAL: CRIT replication delay 351 seconds
[02:40:11] PROBLEM - MySQL Replication Heartbeat on db1003 is CRITICAL: CRIT replication delay 352 seconds
[02:43:04] (PS1) Springle: db1038 is s3 master [operations/puppet] - https://gerrit.wikimedia.org/r/90481
[02:43:36] !log springle synchronized wmf-config/db-eqiad.php 's3 master rotation done'
[02:43:45] !log LocalisationUpdate completed (1.22wmf22) at Fri Oct 18 02:43:45 UTC 2013
[02:43:50] Logged the message, Master
[02:44:04] Logged the message, Master
[02:44:18] !log springle synchronized wmf-config/db-pmtpa.php 's3 master rotation done'
[02:44:31] Logged the message, Master
[02:45:03] andrewbogott: ok, where-abouts
[02:45:11] puppet-testing-9.pmtpa.wmflabs
[02:46:12] publickey denied
[02:46:29] from bastion-restricted.wmflabs.org ?
[02:48:02] mutante: i think you were truncated mid-sentence
[02:48:09] on 2675
[02:48:14] springle, sorry, one minute...
[02:48:48] springle, better now?
[02:49:40] yep
[02:50:42] springle, this is my first attempt at applying this puppet class to a fresh machine. I know it works when applied to one that already has the db set up.
[02:51:56] /mnt/mysql hasn't been initialized. needs mysql_install_db run once to setup stuff
[02:52:32] Hm, ok. Yet another failing of the puppetlabs module I guess :(
[02:52:50] :)
[02:53:45] thanks
[02:53:46] i won't do it. presumably you want to tweak and test
[02:53:49] np
[02:55:41] yep, thanks.
[02:55:45] I'll but you again when I have a patch
[02:56:02] :)
[02:56:33] Well, mysql_install_db doesn't appear anywhere in puppet code. So how was this working before?
[02:56:40] Clearly we modify datadir in several places
[02:57:46] normally dpkg runs mysql_install_db via hook. maybe datadir is being modified afterwards
[02:58:32] or previously datadirs were cloned directly. certainly the coredbs usually get a datadir copy to avoid slow reload
[02:58:50] * springle guessing
[03:00:50] andrewbogott: /var/lib/mysql does have stuff setup and dated today. mysql_install_db ran sometime, but in the wrong place
[03:01:13] well, sure it's run by dpkg, but...
[03:01:37] Surely the package is installed and then the datadir is set, right? It can't happen in the other order because my.cnf wouldn't exist before the package install.
[03:01:48] So, we'll always get a /var/lib/mysql because that's what the dpkg installs.
[03:02:28] Ah! It does dpkg-reconfigure
[03:02:48] if mysqld is not running it's safe to just cp -r to the new datadir and set perms
[03:04:52] well… that presumes that the datadir is only ever set once… if set a second time we'd have to know the old data dir to cp
[03:05:05] I think that reconfigure is the right solution, just have to get the order of ops right
[03:05:13] sounds good
[03:10:02] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 18 03:10:02 UTC 2013
[03:10:14] Logged the message, Master
[03:48:43] andrewbogott: would you like me to look over the module?
[03:49:00] ori-l, the mysql module you mean?
[03:49:08] Sure, although I'm somewhat down the path at this point.
[03:51:10] I didn't mean a thumbs up / thumbs down, just linting. Up to you.
[03:51:32] Yeah, linting would be great.
[03:52:02] It has a bunch of stuff for rpm in there… not sure if we should just commit to forking and rip that stuff out.
[03:53:33] I... well, you know. :)
[03:55:50] yeah :)
[03:55:57] that's the change, right? https://gerrit.wikimedia.org/r/#/c/88666/
[03:56:08] * ori-l promises to not be a -1 terrorist
[03:57:11] That change is of interest, although not the one I'm working on now...
[03:57:59] I'm not sure I'm up to the task of reconciling mysql, mysql_wmf, and coredb_mysql.
[03:58:15] So they'll coexist for a while. Right now I'm working on 'mysql'
[03:58:35] 'mysql' is a puppetlabs module, mysql_wmf is just copypasta of some of our old mysql_wmf classes.
[03:58:44] (which, oddly, do not install mysql)
[04:05:41] (PS1) Andrew Bogott: dpkg-reconfigure after my.cnf changes [operations/puppet] - https://gerrit.wikimedia.org/r/90483
[04:06:22] springle: ^
[04:09:32] nice
[04:09:44] Ugh, I'm having terrible latency connecting to labs right now. That's just me, right, not happening on your end?
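springle found that /mnt/mysql had never been initialized. mysql_install_db, normally run by the package's dpkg hook, had populated /var/lib/mysql instead. Below is a minimal sketch of a pre-flight check that a provisioning step could run before starting mysqld against a relocated datadir. Two things are assumptions and not part of the puppet module under discussion: the heuristic of looking for the `mysql` system-schema directory that mysql_install_db creates, and the function name:

```python
import os
import tempfile


def datadir_initialized(datadir):
    """Heuristic: an initialized MySQL/MariaDB datadir contains the 'mysql' system schema."""
    # mysql_install_db creates <datadir>/mysql, which holds the grant tables.
    return os.path.isdir(os.path.join(datadir, "mysql"))


if __name__ == "__main__":
    # Demonstrate on a scratch directory rather than a real datadir such as /mnt/mysql.
    with tempfile.TemporaryDirectory() as scratch:
        print(datadir_initialized(scratch))  # False: nothing has been installed yet
        os.mkdir(os.path.join(scratch, "mysql"))
        print(datadir_initialized(scratch))  # True: the system schema directory exists
```

A check like this only detects the problem. The fix discussed here is ordering: the `dpkg-reconfigure` in change 90483 runs after my.cnf changes so that the new datadir gets initialized.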
[04:10:51] seems ok here
[04:11:12] ok, good
[04:12:13] gerrit won't talk to me either :(
[04:12:23] Seems increasingly unlikely that I'll get this fixed before bed
[04:13:34] i think coredb_mysql will stay separate from this for some time. i'd be a little nervous to link the two too tightly
[04:14:47] fine with me, I'm scared to mess with coredb anyway
[04:15:37] although, that paranoia is mainly in the my.cnf config area, and core monitoring needs to be hard to break
[04:15:41] (CR) Andrew Bogott: [C: 2] dpkg-reconfigure after my.cnf changes [operations/puppet] - https://gerrit.wikimedia.org/r/90483 (owner: Andrew Bogott)
[04:16:36] d'you mind logging into sockpuppet and merging that? I can't reach the cluster right now and don't want to leave a mess :(
[04:16:49] ok
[04:16:59] because you speak of mysql and monitoring, i'm gonna dump this link .. but then also disappear for bed
[04:17:02] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader
[04:17:08] andrewbogott: done
[04:17:09] (they are UNKNOWN, not crit)
[04:17:21] but shouldn't they be crit .. or removed
[04:17:22] thanks!
[04:17:36] db1023,db44,db64 ..
[04:17:56] mutante: they should
[04:18:06] can't be that urgent because they are 25d old
[04:18:14] but would be nice if we can get rid of them
[04:18:27] they aren't urgent. will do
[04:18:47] cool, thanks :)
[04:22:20] well… clearly I've used all of my internets for the day. Thanks for your help, springle.
[04:22:32] I'll worry about merging the mysql_wmf module next week probably.
[04:22:39] np :)
[04:22:44] good effort
[04:22:49] (PS1) Ori.livneh: coredb_mysql: lint [operations/puppet] - https://gerrit.wikimedia.org/r/90486
[04:22:54] :P
[04:23:19] * ori-l blows out his smoking pistol.
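The Icinga link mutante pasted filters on bitmasks in its query string. Below is a small sketch that decodes its two state filters. It assumes the bit values used by the classic Nagios/Icinga status CGI. For services these are 1 pending, 2 OK, 4 warning, 8 unknown and 16 critical. For hosts they are 1 pending, 2 up, 4 down and 8 unreachable. `serviceprops` is deliberately left undecoded. The function names are hypothetical:

```python
from urllib.parse import parse_qs, urlparse

# Assumed bit values of the classic Nagios/Icinga status.cgi state filters.
SERVICE_STATES = {1: "PENDING", 2: "OK", 4: "WARNING", 8: "UNKNOWN", 16: "CRITICAL"}
HOST_STATES = {1: "PENDING", 2: "UP", 4: "DOWN", 8: "UNREACHABLE"}


def decode_mask(mask, names):
    """Return the state names whose bits are set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if mask & bit]


def describe_status_url(url):
    """Summarize the service and host state filters in a status.cgi URL."""
    query = parse_qs(urlparse(url).query)
    result = {}
    if "servicestatustypes" in query:
        result["services"] = decode_mask(int(query["servicestatustypes"][0]), SERVICE_STATES)
    if "hoststatustypes" in query:
        result["hosts"] = decode_mask(int(query["hoststatustypes"][0]), HOST_STATES)
    return result
```

Applied to the pasted URL, `servicestatustypes=8` selects only UNKNOWN services, which matches mutante's "(they are UNKNOWN, not crit)". `hoststatustypes=3` limits them to pending or up hosts.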
[04:23:23] (CR) jenkins-bot: [V: -1] coredb_mysql: lint [operations/puppet] - https://gerrit.wikimedia.org/r/90486 (owner: Ori.livneh)
[04:23:41] god damn it, jenkins.
[04:23:43] heh
[04:24:28] (PS2) Ori.livneh: coredb_mysql: lint [operations/puppet] - https://gerrit.wikimedia.org/r/90486
[04:31:42] (CR) Springle: [C: 2] coredb_mysql: lint [operations/puppet] - https://gerrit.wikimedia.org/r/90486 (owner: Ori.livneh)
[04:32:09] thanks
[04:58:40] (PS1) Springle: insert HAproxy for S2 master rotation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90487
[04:59:59] (CR) Springle: [C: 2] insert HAproxy for S2 master rotation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90487 (owner: Springle)
[05:00:50] !log springle synchronized wmf-config/db-eqiad.php 'insert HAproxy for S2 master rotation'
[05:01:06] Logged the message, Master
[05:07:38] (PS1) Springle: depool new master db1036 from slaves ready for HAproxy write-traffic switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90488
[05:08:07] (CR) Springle: [C: 2] depool new master db1036 from slaves ready for HAproxy write-traffic switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90488 (owner: Springle)
[05:09:06] !log springle synchronized wmf-config/db-eqiad.php 'depool new master db1036 from slaves ready for HAproxy write-traffic switch'
[05:09:19] Logged the message, Master
[05:16:45] PROBLEM - MySQL Slave Delay on db1034 is CRITICAL: CRIT replication delay 45700 seconds
[05:18:45] RECOVERY - MySQL Slave Delay on db1034 is OK: OK replication delay 0 seconds
[05:19:05] PROBLEM - MySQL Replication Heartbeat on db1002 is CRITICAL: CRIT replication delay 302 seconds
[05:19:25] PROBLEM - MySQL Replication Heartbeat on db57 is CRITICAL: CRIT replication delay 322 seconds
[05:19:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 331 seconds
[05:19:45] PROBLEM - MySQL Replication Heartbeat on db1009 is CRITICAL: CRIT replication delay 334 seconds
[05:19:45] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 339 seconds
[05:19:55] PROBLEM - MySQL Replication Heartbeat on db54 is CRITICAL: CRIT replication delay 346 seconds
[05:19:55] PROBLEM - MySQL Replication Heartbeat on db52 is CRITICAL: CRIT replication delay 348 seconds
[05:20:05] PROBLEM - MySQL Replication Heartbeat on db1036 is CRITICAL: CRIT replication delay 358 seconds
[05:20:14] * springle needs to silence heartbeat during rotation.. bogus..
[05:24:38] (PS1) Springle: remove HAproxy after S2 master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90489
[05:25:29] (PS1) Ori.livneh: Set one a one-year Cache-control: max-age header for fonts. [operations/apache-config] - https://gerrit.wikimedia.org/r/90490
[05:25:41] TimStarling: could you review that? ^
[05:26:00] (CR) Springle: [C: 2] remove HAproxy after S2 master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90489 (owner: Springle)
[05:27:23] !log springle synchronized wmf-config/db-eqiad.php 'remove HAproxy after S2 master switch'
[05:27:37] Logged the message, Master
[05:28:19] !log springle synchronized wmf-config/db-pmtpa.php 'sync new S6 master setting to pmtpa'
[05:28:32] Logged the message, Master
[06:02:35] (PS1) Springle: insert HAproxy into S7 for master rotation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90491
[06:07:19] springle: when did we first start with haproxy? maybe we could get a wikitech page on it?
[06:07:36] (or if you just tell me about it I can maybe write something)
[06:08:13] jeremyb: today really. i'm still experimenting how to fit it in. this stuff is only for master rotations
[06:08:40] springle: hah, i just found a museum piece. RT 1351
[06:08:40] whether it gets used for proper load balancing slaves or whatever, still to decide
[06:10:18] springle: ok, well it's mediawiki -> haproxy -> current master?
[06:10:26] heh.. yeah, won't be going back to that script
[06:10:36] and it was mediawiki -> current master before ?
[06:10:46] during the switch-over yes
[06:10:54] for a few mins
[06:10:59] then back to normal
[06:11:36] ok, so it's only for during a master switch
[06:11:46] so far, yes
[06:11:49] k
[06:12:02] other usage won't happen until next year (likely) as it's down the list a bit
[06:12:15] if it gets used more than for during switches it would be cool to get that represented on dbtree
[06:12:26] definitely
[06:13:03] didn't see a ticket or anything on https://wikitech.wikimedia.org/wiki/Projects
[06:13:09] i guess maybe it's just your own list
[06:13:32] only my own stuff-to-think-about list so far
[06:13:33] anyway, interesting choice. vs. e.g. lvs because we already have people that know that
[06:13:37] right
[06:13:55] * jeremyb will keep an eye out for more haproxy :)
[06:14:31] springle: do you want to reject the ticket? :)
[06:15:28] lvs is a possibility. depends on whether it's done for load balancing or HA reasons, imo
[06:17:17] plus how MHA will be used with puppet, etc
[06:17:22] many choices
[06:18:23] yeah
[06:18:35] * jeremyb goes to sleep
[06:18:55] thanks for the info
[06:20:12] (CR) Springle: [C: 2] insert HAproxy into S7 for master rotation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90491 (owner: Springle)
[06:21:18] !log springle synchronized wmf-config/db-eqiad.php 'insert temporary HAproxy for S7 master rotation'
[06:21:34] Logged the message, Master
[06:24:31] (PS1) Springle: depool db1039 for master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90494
[06:24:53] (CR) Springle: [C: 2] depool db1039 for master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90494 (owner: Springle)
[06:25:45] !log springle synchronized wmf-config/db-eqiad.php 'depool db1039 for master switch'
[06:25:58] Logged the message, Master
[06:28:05] (PS1) Springle: remove Haproxy after S7 master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90495
[06:28:45] PROBLEM - MySQL Replication Heartbeat on db1039 is CRITICAL: CRIT replication delay 313 seconds
[06:28:45] PROBLEM - MySQL Replication Heartbeat on db68 is CRITICAL: CRIT replication delay 316 seconds
[06:28:45] PROBLEM - MySQL Replication Heartbeat on db58 is CRITICAL: CRIT replication delay 317 seconds
[06:28:46] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 320 seconds
[06:28:55] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 325 seconds
[06:28:55] PROBLEM - MySQL Replication Heartbeat on db1024 is CRITICAL: CRIT replication delay 325 seconds
[06:28:55] PROBLEM - MySQL Replication Heartbeat on db37 is CRITICAL: CRIT replication delay 326 seconds
[06:29:05] PROBLEM - MySQL Replication Heartbeat on db1028 is CRITICAL: CRIT replication delay 334 seconds
[06:29:13] (CR) Springle: [C: 2] remove Haproxy after S7 master switch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90495 (owner: Springle)
[06:30:02] !log springle synchronized wmf-config/db-eqiad.php 'remove Haproxy after S7 master switch'
[06:30:17] Logged the message, Master
[06:41:12] (PS1) Springle: update topology after rotations and upgrades [operations/puppet] - https://gerrit.wikimedia.org/r/90496
[06:42:29] (CR) Springle: [C: 2] update topology after rotations and upgrades [operations/puppet] - https://gerrit.wikimedia.org/r/90496 (owner: Springle)
[06:49:45] RECOVERY - MySQL Replication Heartbeat on db1009 is OK: OK replication delay 0 seconds
[06:49:45] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[06:49:55] RECOVERY - MySQL Replication Heartbeat on db54 is OK: OK replication delay 0 seconds
[06:49:55] RECOVERY - MySQL Replication Heartbeat on db52 is OK: OK replication delay 0 seconds
[06:49:55] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[06:50:05] RECOVERY - MySQL Replication Heartbeat on db1036 is OK: OK replication delay 0 seconds
[06:50:05] RECOVERY - MySQL Replication Heartbeat on db1002 is OK: OK replication delay 0 seconds
[06:50:25] RECOVERY - MySQL Replication Heartbeat on db57 is OK: OK replication delay 0 seconds
[06:51:05] RECOVERY - MySQL Replication Heartbeat on db39 is OK: OK replication delay 0 seconds
[06:51:15] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[06:51:15] RECOVERY - MySQL Replication Heartbeat on db1003 is OK: OK replication delay 0 seconds
[06:51:35] RECOVERY - MySQL Replication Heartbeat on db1038 is OK: OK replication delay 0 seconds
[06:51:45] RECOVERY - MySQL Replication Heartbeat on db1039 is OK: OK replication delay 0 seconds
[06:51:45] RECOVERY - MySQL Replication Heartbeat on db68 is OK: OK replication delay 0 seconds
[06:51:45] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds
[06:51:45] RECOVERY - MySQL Replication Heartbeat on db1010 is OK: OK replication delay 0 seconds
[06:51:46] RECOVERY - MySQL Replication Heartbeat on db58 is OK: OK replication delay 0 seconds
[06:51:46] RECOVERY - MySQL Replication Heartbeat on db34 is OK: OK replication delay 0 seconds
[06:51:46] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 0 seconds
[06:51:55] RECOVERY - MySQL Replication Heartbeat on db1024 is OK: OK replication delay 0 seconds
[06:51:55] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[06:51:55] RECOVERY - MySQL Replication Heartbeat on db37 is OK: OK replication delay 0 seconds
[06:52:05] RECOVERY - MySQL Replication Heartbeat on db1028 is OK: OK replication delay 0 seconds
[06:59:57] (PS1) Springle: stop generating icinga messages for db hosts out of action [operations/puppet] - https://gerrit.wikimedia.org/r/90498
[07:00:58] (CR) Springle: [C: 2] stop generating icinga messages for db hosts out of action [operations/puppet] - https://gerrit.wikimedia.org/r/90498 (owner: Springle)
[08:23:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:06] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:06] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
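ori-l's apache-config change (90490) sets a one-year `Cache-control: max-age` for fonts. `max-age` is expressed in seconds, so one year works out as shown below. This is a sketch that assumes a 365-day year. The constant and function names are illustrative and not taken from the change:

```python
from datetime import timedelta

# One 365-day year in seconds, the unit Cache-Control max-age uses.
ONE_YEAR_SECONDS = int(timedelta(days=365).total_seconds())


def font_cache_control(max_age=ONE_YEAR_SECONDS):
    """Return a Cache-Control value letting clients and caches keep a font for max_age seconds."""
    return f"max-age={max_age}"
```

In Apache itself a header like this is usually set with mod_headers or mod_expires. The log does not show which directive change 90490 actually uses.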
[08:23:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[08:23:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[08:24:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[08:24:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[08:25:49] ugh, well sorry about that, those are lies but it's the usual: they are in decom.pp and not all the way gone
[08:25:55] PROBLEM - Disk space on cp1022 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=54%):
[08:26:20] (PS1) Ori.livneh: coredb_mysql: convert a few additional leftover tabs to spaces [operations/puppet] - https://gerrit.wikimedia.org/r/90506
[08:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 08:28:49 UTC 2013
[08:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[08:31:50] apergos: morning :-] since you have done a few sql changes recently, do you have any idea how we maintained the DB user grants/passwords ?
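The "Puppet freshness" alerts fire when a host has had no successful puppet run in the last 10 hours. apergos attributes this batch to cp10xx hosts that are in decom.pp but not fully removed. Below is a minimal sketch of the freshness rule. It assumes the check simply compares the time of the last successful run against a 10-hour window. The function name is hypothetical and this is not the actual Icinga check:

```python
from datetime import datetime, timedelta, timezone

# Window taken from the alert text: "No successful Puppet run in the last 10 hours".
FRESHNESS_WINDOW = timedelta(hours=10)


def puppet_freshness(last_success, now=None):
    """Return an Icinga-style status line for a host's last successful puppet run."""
    now = now or datetime.now(timezone.utc)
    if last_success is None or now - last_success > FRESHNESS_WINDOW:
        return "CRITICAL: No successful Puppet run in the last 10 hours"
    return f"OK: puppet ran at {last_success:%a %b %d %H:%M:%S %Z %Y}"
```

Because the rule depends only on the last-run timestamp, a single successful run flips a host to OK. A flip back to CRITICAL seconds later, as in the log, therefore points to something outside this simple rule, such as the decommissioning state apergos describes.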
[08:31:58] apergos: that is not obvious in our puppet manifests :(
[08:32:46] tbh I don't know, when we had a problem with one of the slaves having not gotten the right info we dumped the table from a good host and shovelled it into the slave
[08:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 08:32:45 UTC 2013
[08:33:04] passwords are in the private repo so they won't get lost
[08:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[08:33:40] here is where I wish icinga would quit telling me 'not authorized', I would turn off notifications for cp1021-1036
[08:33:41] oh well
[08:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 08:33:50 UTC 2013
[08:34:31] I should stare at the decom logic in the manifests again cause clearly that's broken for this case
[08:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[08:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 08:35:40 UTC 2013
[08:36:11] when I've found that a host didn't have the right permissions for a user (the subnet was too narrow) I've just changed them on the master so they look like the rest in the cluster... that seems to be how it's gone for now
[08:36:22] apergos: so one would fetch the password from the puppet repo and manually create the user?
[08:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[08:36:26] I know springle is working on some method to sync those all up
[08:36:48] the puppet labs mysql module would let us create the user / ensure appropriate grants
[08:36:55] not sure we want to rely on it though
[08:37:19] to rely on it in labs you mean?
[08:38:33] why not?
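apergos mentions grants whose host part covered too narrow a subnet. In a MySQL account name, the host part can use `%` (any run of characters) and `_` (exactly one character) as wildcards. Below is a minimal sketch of that matching rule. It assumes hosts are literal IP addresses, so there is no hostname resolution and no `ip/netmask` form. The function name is hypothetical:

```python
import re


def host_matches(pattern, client_ip):
    """Check a client IP against a MySQL account host pattern using % and _ wildcards."""
    # Translate % to '.*' and _ to '.', escaping everything else literally.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, client_ip) is not None
```

With this rule an account restricted to a hypothetical `'10.64.16.%'` rejects a client at 10.64.32.5, while `'10.64.%'` accepts it. Widening the host part on the master, as apergos describes, is what brings a host's grants in line with the rest of the cluster.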
[08:41:20] na I mean in production [08:41:27] I looked at the mysql puppet labs module [08:41:35] it has some stuff like database_grant() and database_user() [08:41:44] so potentially we could use those wrappers to set up grants/user [08:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 08:41:46 UTC 2013 [08:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [08:42:12] i was merely asking because I have to create a Jenkins user on the continuous integration MySQL server, was wondering how to handle it [08:42:17] I guess I will submit the password in private [08:42:20] and set it up manually. [08:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 08:42:42 UTC 2013 [08:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [08:43:01] would recommend that for now [08:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 08:43:47 UTC 2013 [08:43:56] I guess, lets be pragmatic :-] [08:44:01] now i have to figure out the grants I need hehe [08:44:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [08:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 08:44:42 UTC 2013 [08:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [08:51:00] hashar: can you please teach me how to test puppet code in labs or point me to docs? 
I have an instance, but don't know how to access it [08:54:48] matanya, https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster [08:55:01] thanks MaxSem [08:55:03] matanya: in a few minutes yes [08:56:05] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 08:55:59 UTC 2013 [08:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [08:56:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 08:56:39 UTC 2013 [08:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:00:45] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 09:00:41 UTC 2013 [09:00:55] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 09:00:51 UTC 2013 [09:00:57] akosiaris: if you are around, the PHP segfault is still happening but I got my mediawiki coverage report :-] [09:01:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [09:01:06] akosiaris: https://integration.wikimedia.org/cover/mediawiki-core/master/php/ \O/ THANK YOU !!!! [09:01:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [09:01:57] hashar: yes I am around. 
Happy you got results :-D [09:02:33] matanya: got to help maxsem first then will switch to you :-] [09:04:01] (03PS1) 10ArielGlenn: depool db1035 (s3) to be clone source [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90510 [09:05:27] (03CR) 10ArielGlenn: [C: 032] depool db1035 (s3) to be clone source [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90510 (owner: 10ArielGlenn) [09:05:34] matanya, I can help you meanwhile:) [09:05:55] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 09:05:47 UTC 2013 [09:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [09:06:41] !log ariel synchronized wmf-config/db-eqiad.php 'db1035 (s3) depooled to use as clone source' [09:06:53] Logged the message, Master [09:08:03] matanya, so what do you want? [09:08:54] if I was smart enough [09:09:08] I would do a video demo of puppetmaster self :D [09:10:35] dunno, that page was sufficient for me:) [09:16:05] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 09:16:04 UTC 2013 [09:16:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [09:17:57] MaxSem: I have an instance [09:18:20] and a changeset here: https://gerrit.wikimedia.org/r/#/c/90098/ [09:18:33] I want to test that change set within the instance. [09:18:45] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 18 09:18:40 UTC 2013 [09:19:23] yup, just make the instance self-hosted [09:19:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [09:19:46] then check out that patchset to /var/lib/git/operations/puppet [09:19:48] in the configstuff add the role puppetmaster_self? [09:19:56] yup [09:20:50] then force a run with sudo puppetd -tv [09:20:58] and how do i log in to that instance, MaxSem ? 
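MaxSem's steps are: make the instance self-hosted, check the patchset out into /var/lib/git/operations/puppet, then force a run with `sudo puppetd -tv`. The sketch below prints the matching commands rather than running them. It assumes Gerrit's usual refs/changes/<last two digits of change>/<change>/<patchset> layout and an anonymous fetch URL of https://gerrit.wikimedia.org/r/operations/puppet. The patchset number is a hypothetical input.

```python
from typing import List

# Assumed anonymous fetch URL for the repository; adjust if your remote differs.
GERRIT_URL = "https://gerrit.wikimedia.org/r/operations/puppet"


def change_ref(change: int, patchset: int) -> str:
    """Gerrit ref for a patch set: refs/changes/<NN>/<change>/<patchset>,
    where NN is the last two digits of the change number, zero-padded."""
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def checkout_commands(change: int, patchset: int,
                      repo_dir: str = "/var/lib/git/operations/puppet") -> List[str]:
    """Commands to test a Gerrit change on a self-hosted puppetmaster instance."""
    return [
        f"cd {repo_dir}",
        f"git fetch {GERRIT_URL} {change_ref(change, patchset)}",
        "git checkout FETCH_HEAD",
        "sudo puppetd -tv",  # force a Puppet run against the local checkout
    ]


if __name__ == "__main__":
    # 90098 is the change from the log; patchset 1 is a made-up example value.
    print("\n".join(checkout_commands(90098, 1)))
```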
[09:21:55] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 09:21:46 UTC 2013 [09:21:55] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 09:21:51 UTC 2013 [09:22:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [09:22:06] matanya, have you seen https://wikitech.wikimedia.org/wiki/Help:Access ? [09:22:15] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 09:22:06 UTC 2013 [09:22:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [09:22:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [09:22:26] yes, but i get public keys issues the whole time [09:25:29] (03CR) 10ArielGlenn: [C: 032] removing spence mgmt entry (rt 5440) [operations/dns] - 10https://gerrit.wikimedia.org/r/90292 (owner: 10ArielGlenn) [09:27:49] * matanya is stupid [09:28:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 09:28:42 UTC 2013 [09:29:23] (03CR) 10ArielGlenn: [C: 032] coredb_mysql: convert a few additional leftover tabs to spaces [operations/puppet] - 10https://gerrit.wikimedia.org/r/90506 (owner: 10Ori.livneh) [09:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:33:15] matanya, what keys issue? 
[09:33:15] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 09:33:09 UTC 2013 [09:33:35] i was on the wrong machine, nvm MaxSem [09:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 09:33:49 UTC 2013 [09:34:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [09:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [09:35:46] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 09:35:44 UTC 2013 [09:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [09:39:11] matanya: available [09:39:32] (03CR) 10ArielGlenn: [C: 032] "sorry for the long delay, I'll start checking these more often." [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86822 (owner: 10Adamw) [09:39:43] hashar: thanks.I have applied my patch and did some tests, how do I know all is ok? [09:39:56] matanya: where/how have you applied the patch [09:40:12] should be applied as root to /var/lib/git/operations/puppet [09:40:17] then git diff to confirm [09:40:24] if the instance has role::puppet::self role [09:40:24] in my instance at /var/lib/git/operations/puppet [09:40:24] (03CR) 10ArielGlenn: [C: 032] gitignore things [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86823 (owner: 10Adamw) [09:40:29] puppet is pointing to that [09:40:38] so you just have to run puppet using: puppetd -tv [09:40:40] (as root) [09:40:45] i have [09:40:49] and it was fine [09:40:52] then hack in /var/lib/git/operations/puppet until you are happy [09:41:00] i'm happy [09:41:05] then grab those changes and send the commit back in gerrit [09:41:09] it was applied as expected [09:41:19] + add a comment about how you tested the change on XXXX labs instance [09:41:19] already commited [09:41:24] ahh [09:41:25] no fun :-] [09:41:28] congratulations [09:41:30] :P [09:41:41] ok, i'll add that note 
[09:41:44] also, thank you for your puppet help/cleanup :-] [09:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 09:41:46 UTC 2013 [09:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [09:42:07] hashar: i try to use fishing robs other than being fed :) [09:42:16] and thanks a lot for guidence [09:42:30] * rather than [09:42:43] (03CR) 10ArielGlenn: [C: 032] cheap hack to qualify table with prefix [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86824 (owner: 10Adamw) [09:42:55] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 09:42:46 UTC 2013 [09:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 09:43:46 UTC 2013 [09:44:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [09:44:12] (03CR) 10Matanya: "was tested on labs at instance ssh-module. (https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000921) all looks good. please merge whe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90098 (owner: 10Matanya) [09:44:55] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 09:44:46 UTC 2013 [09:45:33] walla, I think i'm done with this module. time for the next one [09:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [09:48:56] (03CR) 10ArielGlenn: [C: 031] remove chapter domains from DNS that are not owned by the WMF [operations/dns] - 10https://gerrit.wikimedia.org/r/86659 (owner: 10Dzahn) [09:50:39] (03CR) 10ArielGlenn: [C: 032] current dhcpd.conf requires linux-host-entries.ttyS1-9600, add one [operations/puppet] - 10https://gerrit.wikimedia.org/r/88956 (owner: 10ArielGlenn) [09:52:08] hashar: who can merge my patch? [09:52:22] matanya: which patch ? 
[09:52:37] matanya: for operations/puppet that would be WMF operations team members [09:52:37] https://gerrit.wikimedia.org/r/90098 [09:53:07] !log Jenkins finally producing code coverage reports: https://integration.wikimedia.org/cover/mediawiki-core/master/php/ [09:53:21] Logged the message, Master [09:53:34] matanya: I guess andrew boggot will review it [09:53:52] needs to be handled carefully since that could potentially kill our ssh access :-] [09:54:05] hashar: is there any tree of who is responsible for what execpt git blame? [09:54:29] yeah access are directly in the git repositories [09:54:32] Yeah, i know. it is a sensitive change [09:55:41] git fetch gerrit refs/meta/config && git checkout FETCH_HEAD [09:55:45] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 09:55:44 UTC 2013 [09:55:47] the file project.config hold the conf [09:55:53] should be browse able on git.wm.o [09:56:02] or you can use the Gerrit ui [09:56:31] In Gerrit: Projects > List , click a project i.e. mediawiki/core https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core [09:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [09:56:45] there under Projects > Access you will get the list of groups and the actions allowed [09:56:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 09:56:54 UTC 2013 [09:57:04] so for example folks of the platform-engineering are allowed force push on mediawiki/core [09:57:06] yes, found that [09:57:15] there is lot of inheritance as well [09:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:16] https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet,access shows up that only ops folks can submit :] [09:58:30] on what team are you hashar ? 
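hashar points to refs/meta/config, where project.config holds a project's access rules in a git-config-style format. The sketch below lists which groups hold a given permission in one section. The sample config is hypothetical, written only to show the shape. The parser assumes `[section "subsection"]` headers and `key = value` lines, and it keeps repeated keys. It does not follow inheritance from parent projects, which hashar notes is common.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical example in the style of a Gerrit project.config file.
SAMPLE = """
[access "refs/*"]
\tread = group Anonymous Users
[access "refs/heads/*"]
\tsubmit = group ops
\tlabel-Code-Review = -2..+2 group ops
\tlabel-Code-Review = -1..+1 group Registered Users
"""

SECTION = re.compile(r'^\[(\w+)(?:\s+"([^"]*)")?\]$')


def parse_project_config(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map 'access "refs/..."' style section names to their (key, value) pairs."""
    sections: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = SECTION.match(line)
        if header:
            name, sub = header.groups()
            current = f'{name} "{sub}"' if sub is not None else name
            continue
        if current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            sections[current].append((key, value))  # duplicates are kept in order
    return dict(sections)


def groups_with(config: Dict[str, List[Tuple[str, str]]],
                section: str, permission: str) -> List[str]:
    """All values granted for `permission` in `section`."""
    return [value for key, value in config.get(section, []) if key == permission]


if __name__ == "__main__":
    cfg = parse_project_config(SAMPLE)
    print(groups_with(cfg, 'access "refs/heads/*"', "submit"))
```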
[09:58:38] not ops :-] [09:58:46] engineering > platform > mediawiki/core [09:58:57] though the mw core team is more a backend team [09:59:02] and a bit of devops [09:59:15] we take care of Search / Lua various extensions such as OAuth [09:59:23] and Gerrit / Jenkins [09:59:53] and of course deploying mediawiki on wikimedia cluster [09:59:55] + configuration [10:00:04] and jobqueue :-] [10:00:21] https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team [10:00:31] I tried to understand the org stracture from https://wikimediafoundation.org/wiki/Staff_and_contractors [10:00:47] but it isn't the clearest page, and many people are missing [10:00:58] matanya: https://www.mediawiki.org/wiki/Wikimedia_Engineering would give a good overview of engineering [10:01:05] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 10:00:55 UTC 2013 [10:01:05] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 10:01:00 UTC 2013 [10:01:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [10:01:06] which is a fairly big department [10:01:22] well, that is the only one that interst me :) [10:01:24] I think we had a org chart somewhere [10:01:41] the last one I build was in 2007 https://meta.wikimedia.org/wiki/File:Wmf-orgchart.png [10:01:46] it is fairly outdated [10:01:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [10:01:55] (trivia: eng was 4 staffs) [10:03:43] so the VE team for example, isn't related to your team? 
[10:03:44] ah, i see they are features [10:05:04] yeah features are software/ frontend ddevelopers [10:05:12] though they do deploy their change themselves [10:05:38] on top of the mediawiki backend platform that mw/core team is providing them [10:05:57] VisualEditor also maintain some backend such as Parsoid ( a javascript wikitext parser) [10:06:05] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 10:06:01 UTC 2013 [10:06:07] I think two of them even have root access [10:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [10:06:28] mobile is doing frontend development but also works with ops directly for the varnish caches [10:06:39] So the team that i'm mostly intersted in would be ops, correct? [10:06:53] and they handle relations with telco carriers for the Wikipedia Zero project (let user browse wikimedia projects without hitting their data plan) [10:07:12] for the ssh module change, yeah that would be ops [10:07:21] that is part of the low level infrastructure [10:07:27] (low not being pejorative :-D ) [10:07:40] hashar: you are too nice to ops ;) [10:07:43] looking at my patchs, they all would be ops-related [10:07:49] :D [10:08:04] yurik_: you better have to be nice to them :-] They are having a hard time and the only able to +2 your changes! [10:08:07] (03PS1) 10Mark Bergsma: Add ulsfo cache Icinga groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/90515 [10:08:14] sigh [10:08:20] i'm well aware of it :( [10:08:32] matanya: and as you can see ops also maintain some tools such as Icinga a monitoring utility [10:08:36] or ganglia which provides metrics [10:08:39] * yurik_ goal in live is to give ops hard time [10:08:57] * mark will make it his goal in life to give yurik a hard time [10:08:59] from now on :) [10:09:07] nitpicking!!!!! [10:09:09] :D [10:09:13] mark: you are on ops? [10:09:14] mark! i thought you already had that goal!!! 
[10:09:19] haha no [10:09:24] matanya: mark IS ops [10:09:34] mark is THE ops [10:09:39] if I was I wouldn't be merging your patches within a few days [10:09:40] you mean mark==ops ? [10:10:08] matanya: mark has been there for a very long time and iirc handled most of the ops infrastructure [10:10:10] matanya: more like ops & ~mark != ops [10:10:16] Tim was doing all platform stuff [10:10:17] (03PS1) 10ArielGlenn: remove srv1-234 main and mgmt entries, except for srv193 [operations/dns] - 10https://gerrit.wikimedia.org/r/90516 [10:10:20] and Brion all the features [10:10:22] (kind of) [10:10:35] that is our triumvirate (Brion,Tim,Mark) [10:10:42] scaled out to Features, Platform, Ops [10:10:46] oh, i saw the metrics meeting, you are celebrating 7 years or so mark, right? [10:10:55] so we eventually went from 3 folks to ~90 folks [10:10:57] mark, i'm still having hard time figuring out ESI bug :((( -- the beta cluster ignores my spoofing attempts :( [10:11:04] uh, that was in the metric meeting? [10:11:10] I'm not celebrating anything ;p [10:11:17] :P [10:11:37] hashar: brion is in mobile, not features [10:11:42] ah yea [10:11:45] well you were mentioned as the longest running staffer or something similar [10:11:58] no Tim is [10:12:01] * yurik_ is once again ignored by ops :( [10:12:31] opsen work is awsome [10:12:43] TimStarling is god, mark is more of a st peter role [10:12:48] nop [10:12:49] brion is god [10:12:56] tim is satan :-D [10:13:01] LOL [10:13:02] LOL [10:13:20] they are both equally powerful, but with a different style :-D [10:13:28] we just have to publish this somewhere so not to confuse newcomers [10:13:41] yurik_: figure out using varnishlog what it's doing [10:13:45] hashar: if i would want to help with RT tickets where would i go? [10:13:46] mark is Jesus, making it possible for god and satan to play with folks. 
[10:13:54] although I guess netmapper won't log what it's doing perhaps [10:14:00] matanya: you would not a non disclosure agreement signed with WMF. [10:14:09] matanya: I think we have a few volunteers enrolled in RT already. [10:14:19] yurik_: add debug log lines with std.log("bla") in the VCL [10:14:23] look for that [10:14:25] matanya: I am not sure whether there is a process. I guess you want to be elected by the cabal. [10:15:18] well, i'm already in a few cabal in the site itself, maybe helping in the backend would be nice too :) [10:15:46] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 10:15:43 UTC 2013 [10:15:46] well there are several layers of cabals involved, I don't even know all of them [10:16:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [10:16:50] hashar: i better off focus on puppet for now, i guess [10:17:21] matanya: a nice addition would be to write some unit tests for our manifests :d [10:17:30] andrew boggot can help there [10:17:49] https://integration.wikimedia.org/ci/job/operations-puppet-spec/ [10:17:56] hashar: i'm no puppet expert, just a user with some exp [10:18:19] hint: rake spec [10:18:23] at the top of operations/puppet :-] [10:18:35] yeah, saw that [10:18:45] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 18 10:18:39 UTC 2013 [10:18:53] wrote one for myself, but it is buggy and ugly [10:19:03] snd full with false alarms [10:19:08] *and [10:19:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [10:19:55] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 10:19:45 UTC 2013 [10:20:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [10:20:15] hashar: I think moving all manifests to modules has higher priority now anyways. 
So i prefer focusing on that [10:21:19] yeah [10:21:35] the idea is to avoid loading unneeded manifests [10:21:47] and migrating to modules let us do that since puppet will use autoloading [10:21:53] that will also let us write unit tests for our modules [10:21:58] and later on integration tests [10:22:05] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 10:21:55 UTC 2013 [10:22:05] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 10:21:55 UTC 2013 [10:22:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [10:22:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [10:22:25] ok, i'll stop wasting your time, and go do some work :) [10:22:42] I am watching the sept. monthly metrics video http://www.youtube.com/watch?v=S8FGNJWhEYY :D [10:23:26] hashar: you are not "at" work? [10:23:49] I am [10:23:59] err [10:24:10] I mean I am technically working right now, albeit from Nantes, France [10:24:15] in a coworking place :-] [10:24:17] I am a remotee [10:24:46] WMF can't afford a european HQ :] [10:25:06] and even if it did, we are scattered all around europe [10:25:21] spain/russia/uk/france/netherlands .. [10:25:32] aha, that is why your awake at such hours [10:25:48] *you're [10:26:39] classic. ops, I get wikimedia error [10:26:52] varnish issues [10:27:13] If you report this error to the Wikimedia System Administrators, please include the details below. 
[10:27:14] Request: GET http://git.wikimedia.org/raw/operations%2fpuppet.git/HEAD/manifests%2fadmins.pp, from 127.0.0.1 via cp1043 cp1043 ([127.0.0.1]:80), Varnish XID 333909843 [10:27:14] Forwarded for: ::ffff:199.203.78.152, 127.0.0.1 [10:27:14] Error: 500, Internal Server Error at Fri, 18 Oct 2013 10:26:08 GMT [10:28:33] akosiaris: ^ [10:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 10:28:47 UTC 2013 [10:29:14] that is the git backend being dead I guess [10:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [10:30:19] gerrit doesn't load for me [10:31:05] antinomy is up https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=antimony [10:31:42] but still lacking the gitblit monitoring :( which is bug https://bugzilla.wikimedia.org/show_bug.cgi?id=51983 [10:32:24] sooo for this to be a 500 [10:32:33] gitblit is probably to blame [10:32:43] but it otherwise works fine... [10:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 10:32:47 UTC 2013 [10:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [10:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 10:33:48 UTC 2013 [10:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [10:35:04] (03CR) 10Hashar: "Pinged Chad / Leslie by email to move this forward." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/75777 (owner: 10Chad) [10:35:36] aaaah crap java [10:35:43] yeah :( [10:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 10:35:38 UTC 2013 [10:35:57] https://wikitech.wikimedia.org/wiki/Gitblit [10:36:08] apparently /etc/init.d/gitblit [10:36:21] on antimony.wikimedia.org [10:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [10:36:46] it is running [10:36:56] i wanna see if it logged anything first [10:37:03] and what state it is at... [10:37:10] java..... :-( ... lucky me [10:38:46] meh... i find .exes in the directory... [10:38:48] :-( [10:38:52] ahah [10:38:58] but no log [10:41:25] Oct 18 08:46:47 208.80.154.7 puppet-agent[6607]: (/Stage[main]/Gitblit::Instance/Service[gitblit]/ensure) ensure changed 'stopped' to 'running' [10:41:25] bah [10:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 10:41:45 UTC 2013 [10:42:03] the service {} definition or init script is wrong I gues [10:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [10:42:10] so is restarted it... 
[10:42:13] let's see [10:42:32] works for me [10:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 10:42:40 UTC 2013 [10:42:46] at least the main page hehe [10:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [10:43:26] bah [10:43:26] http://git.wikimedia.org/raw/operations%2Fpuppet.git/master/manifests%2Fadmins.pp [10:43:30] does not work [10:43:33] but http://git.wikimedia.org/raw/operations%2Fpuppet.git/master/manifests%2Fadmins.pp [10:43:39] but giving the sha1 works http://git.wikimedia.org/raw/operations%2Fpuppet.git/c608a72d885e704856b38054b6bf49186183022c/manifests%2Fadmins.pp [10:43:43] that is a bug in giblet I guess [10:43:48] akosiaris: nothing you can do I guess [10:43:49] http://git.wikimedia.org/raw/operations%2Fpuppet.git/c608a72d885e704856b38054b6bf49186183022c/manifests%2Fadmins.pp [10:43:54] this works [10:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 10:43:46 UTC 2013 [10:44:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [10:44:08] at least for me [10:44:14] matanya: can you fill a bug against Wikimedia > git/gerrit ? [10:45:05] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 10:45:01 UTC 2013 [10:45:05] ahh [10:45:10] master does not exist *facepalm [10:45:19] production does http://git.wikimedia.org/raw/operations%2Fpuppet.git/production/manifests%2Fadmins.pp [10:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [10:46:14] matanya: maybe gitblit considers HEAD to be master whereas on operations/puppet.git that is 'production' [10:47:42] matanya: how did you end up with that url ? 
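The gitblit thread comes down to which ref goes into the raw URL. operations/puppet.git has no master branch and HEAD was not resolved, but both production and a full sha1 worked. The helper below makes the ref explicit. It assumes the URL layout seen in those links, http://git.wikimedia.org/raw/<repo>/<ref>/<path>, with "/" inside the repo name and file path encoded as %2F.

```python
from urllib.parse import quote

# Base taken from the URLs in the log; an assumption about gitblit's layout.
GITBLIT_BASE = "http://git.wikimedia.org/raw"


def gitblit_raw_url(repo: str, ref: str, path: str) -> str:
    """Build a gitblit raw-file URL, encoding '/' in each component as %2F."""
    parts = [quote(repo, safe=""), quote(ref, safe=""), quote(path, safe="")]
    return "/".join([GITBLIT_BASE] + parts)


if __name__ == "__main__":
    # Use the 'production' branch (or a full sha1) rather than master/HEAD.
    print(gitblit_raw_url("operations/puppet.git", "production", "manifests/admins.pp"))
```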
[10:48:18] akosiaris: browse the tree http://git.wikimedia.org/tree/operations%2Fpuppet.git [10:48:21] click on a file [10:48:28] argh no sorry [10:48:35] I can never get HEAD in the url [10:48:55] Failed to find commit "HEAD" in operations/puppet.git! hehe [10:49:05] PROBLEM - Puppet freshness on hafnium is CRITICAL: No successful Puppet run in the last 10 hours [10:49:20] does not seem like a gitblit problem... [10:50:19] (03PS1) 10ArielGlenn: rmeove owa1-3, boxes long since wiped and reclaimed (rt #5143) [operations/puppet] - 10https://gerrit.wikimedia.org/r/90520 [10:50:44] from here : https://meta.wikimedia.org/wiki/System_administrators < akosiaris and hashar [10:50:59] (03PS2) 10ArielGlenn: remove owa1-3, boxes long since wiped and reclaimed (rt #5143) [operations/puppet] - 10https://gerrit.wikimedia.org/r/90520 [10:51:24] matanya: where on that page ? should be fixed I guess [10:51:59] the git link i found at the end of the page is ok [10:52:29] it is actually wrong since it points to a specific sha1 instead of production :C) amending [10:53:11] lol [10:53:23] i must admit I did not even consider that [10:53:28] thanx for fixing it [10:53:46] (03CR) 10ArielGlenn: [C: 032] remove owa1-3, boxes long since wiped and reclaimed (rt #5143) [operations/puppet] - 10https://gerrit.wikimedia.org/r/90520 (owner: 10ArielGlenn) [10:54:56] and fixed the other occurence [10:56:05] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 10:56:03 UTC 2013 [10:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [10:56:46] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 10:56:43 UTC 2013 [10:57:01] (03PS1) 10ArielGlenn: toss dataset1 from dsh groups, long since gone [operations/puppet] - 10https://gerrit.wikimedia.org/r/90522 [10:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [10:58:06] (03CR) 10ArielGlenn: [C: 032] toss dataset1 from 
dsh groups, long since gone [operations/puppet] - 10https://gerrit.wikimedia.org/r/90522 (owner: 10ArielGlenn) [10:59:19] poor gallium is filling its disk :/ http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Miscellaneous+eqiad&h=gallium.wikimedia.org&jr=&js=&event=show&ts=0&v=66.6&m=part_max_used&vl=%25&ti=Maximum+Disk+Space+Used [10:59:22] 65% usage :-] [10:59:29] got to compress some stuff i guess [10:59:30] hashar: So I run code coverage on https://github.com/akosiaris/servermon and tests give it a 77% coverage report [11:00:00] ?!! [11:00:04] is that something you are writing ? [11:00:15] it is installed in wmf [11:00:28] on socketpuppet so you don't have access [11:00:32] we have so many monitoring systems :-( [11:00:40] it is not a monitoring system [11:00:46] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 11:00:44 UTC 2013 [11:00:46] thank god :-) [11:00:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [11:00:55] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 11:00:49 UTC 2013 [11:00:58] albeit the name was chosen poorly [11:01:02] * hashar fills a RT to get sock puppet access to play with servermon [11:01:05] it is a reporting system [11:01:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [11:01:26] hmmm is suppose you got access to the bastion host right ? [11:01:31] doesn't puppet has such a dashboard already? [11:01:50] yes [11:02:02] and yes I have bastion access [11:02:02] That being said... 
it does not do much [11:02:35] for example you do not know with puppet-dashboard which machines need updates [11:05:55] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 11:05:45 UTC 2013 [11:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [11:16:05] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 11:15:58 UTC 2013 [11:16:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [11:18:45] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 18 11:18:39 UTC 2013 [11:19:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [11:21:45] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 11:21:35 UTC 2013 [11:21:55] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 11:21:45 UTC 2013 [11:21:55] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 11:21:45 UTC 2013 [11:22:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [11:22:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [11:22:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [11:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 11:28:46 UTC 2013 [11:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [11:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 11:32:48 UTC 2013 [11:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [11:33:45] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 11:33:43 UTC 2013 [11:34:27] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo cache Icinga groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/90515 (owner: 
10Mark Bergsma) [11:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [11:35:55] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 11:35:48 UTC 2013 [11:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [11:40:09] (03PS1) 10ArielGlenn: db1019 (s3) -> file_per_table, mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/90526 [11:41:07] (03CR) 10ArielGlenn: [C: 032] db1019 (s3) -> file_per_table, mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/90526 (owner: 10ArielGlenn) [11:41:27] mark: unrelated to ESI: are you planning to split IP pool into MEDIA, DESKTOP, and MOBILE ranges? Or just media & non-media? [11:41:46] media & non-media I think [11:41:47] why desktop? [11:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 11:41:45 UTC 2013 [11:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [11:42:06] because this way carriers might object to giving desktop version [11:42:11] which uses much more bandwidth [11:43:05] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 11:42:56 UTC 2013 [11:43:24] we can't make a separate range for everything a carrier might object to :) [11:43:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [11:44:05] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 11:44:01 UTC 2013 [11:44:55] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 11:44:46 UTC 2013 [11:45:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [11:45:32] mark: and nowdays lots of people are using various dangles and tethering options to get their laptops online. Well, this would be a big change for them I think - from whitelisting mobile site to whitelisting desktop. 
Besides, if we split media from non-media IP ranges, and carrier only whitelists non-media, there is no point to zero-rate desktop [11:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [11:45:54] also considering that we won't be showing banners on the desktop site [11:46:10] well [11:46:22] 2 ranges might be feasible, 3 ranges is very hard if you need to be able to add and remove ips in them [11:46:26] so at least we should start thinking about it [11:46:29] so you can work out what you want in which range [11:47:40] (03PS1) 10Mark Bergsma: Add ulsfo upload caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/90528 [11:48:14] mark, at the end of the day, current situation is that some carriers include just no-media, some - media & no media. If we switch them to IPs, they will get ALL languages (not a very big increase), plus ALL sister sites (also not a very big increase) - both are fairly easy sell for us [11:48:35] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo upload caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/90528 (owner: 10Mark Bergsma) [11:48:36] but if we piggyback desktop site, they might object [11:49:12] make two ranges then [11:49:20] low bandwidth, potentially high bandwidth [11:49:57] desktop + media in one, mobile text only in another? [11:50:05] yeah [11:50:27] hmm... 
will see what biz dev thinks [11:53:48] (03PS1) 10Mark Bergsma: Add ulsfo upload cache node entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/90529 [11:55:11] (03PS2) 10Mark Bergsma: Add ulsfo upload cache node entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/90529 [11:55:45] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 11:55:43 UTC 2013 [11:55:59] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo upload cache node entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/90529 (owner: 10Mark Bergsma) [11:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [11:56:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 11:56:44 UTC 2013 [11:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:59:30] (03PS1) 10Mark Bergsma: Make Varnish the default for upload now [operations/puppet] - 10https://gerrit.wikimedia.org/r/90530 [12:00:36] (03CR) 10Mark Bergsma: [C: 032] Make Varnish the default for upload now [operations/puppet] - 10https://gerrit.wikimedia.org/r/90530 (owner: 10Mark Bergsma) [12:00:55] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 12:00:45 UTC 2013 [12:00:55] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 12:00:45 UTC 2013 [12:00:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [12:01:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [12:01:33] icinga-wm is still smoking crack [12:06:05] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 12:06:01 UTC 2013 [12:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:15] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 12:16:05 UTC 2013 [12:16:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No 
successful Puppet run in the last 10 hours [12:18:55] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 18 12:18:45 UTC 2013 [12:19:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [12:20:05] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 12:19:56 UTC 2013 [12:20:41] (03PS1) 10Mark Bergsma: Fix ulsfo upload LVS services [operations/puppet] - 10https://gerrit.wikimedia.org/r/90533 [12:21:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [12:21:48] (03CR) 10Mark Bergsma: [C: 032] Fix ulsfo upload LVS services [operations/puppet] - 10https://gerrit.wikimedia.org/r/90533 (owner: 10Mark Bergsma) [12:22:05] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 12:21:57 UTC 2013 [12:22:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [12:24:35] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 12:24:33 UTC 2013 [12:25:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [12:28:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 12:28:44 UTC 2013 [12:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 12:32:45 UTC 2013 [12:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 12:33:45 UTC 2013 [12:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [12:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 12:35:40 UTC 2013 [12:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [12:41:55] RECOVERY - 
Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 12:41:47 UTC 2013 [12:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [12:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 12:42:43 UTC 2013 [12:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [12:43:15] PROBLEM - Host cp4005 is DOWN: PING CRITICAL - Packet loss = 100% [12:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 12:43:53 UTC 2013 [12:44:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [12:44:45] RECOVERY - Host cp4005 is UP: PING OK - Packet loss = 0%, RTA = 73.84 ms [12:45:05] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 12:45:03 UTC 2013 [12:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [12:45:35] PROBLEM - Host cp4006 is DOWN: PING CRITICAL - Packet loss = 100% [12:46:25] RECOVERY - Host cp4006 is UP: PING OK - Packet loss = 0%, RTA = 73.26 ms [12:47:55] PROBLEM - Host cp4007 is DOWN: PING CRITICAL - Packet loss = 100% [12:49:15] RECOVERY - Host cp4007 is UP: PING OK - Packet loss = 0%, RTA = 73.73 ms [12:50:20] (03PS1) 10Vogone: Changed Tamil Wikiquote logo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90538 [12:55:25] PROBLEM - Host cp4013 is DOWN: PING CRITICAL - Packet loss = 100% [12:55:55] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 12:55:45 UTC 2013 [12:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [12:56:35] RECOVERY - Host cp4013 is UP: PING OK - Packet loss = 0%, RTA = 74.99 ms [12:56:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 12:56:45 UTC 2013 [12:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [12:57:35] 
PROBLEM - Host cp4014 is DOWN: PING CRITICAL - Packet loss = 100% [12:58:21] (03PS1) 10Mark Bergsma: Add ulsfo upload caches aggregators to gmetad [operations/puppet] - 10https://gerrit.wikimedia.org/r/90539 [12:58:25] RECOVERY - Host cp4014 is UP: PING OK - Packet loss = 0%, RTA = 75.02 ms [12:59:15] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo upload caches aggregators to gmetad [operations/puppet] - 10https://gerrit.wikimedia.org/r/90539 (owner: 10Mark Bergsma) [13:00:35] PROBLEM - Host cp4015 is DOWN: PING CRITICAL - Packet loss = 100% [13:00:45] PROBLEM - NTP on cp4006 is CRITICAL: NTP CRITICAL: Offset unknown [13:01:05] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 13:01:01 UTC 2013 [13:01:05] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 13:01:01 UTC 2013 [13:01:15] RECOVERY - Host cp4015 is UP: PING OK - Packet loss = 0%, RTA = 76.52 ms [13:01:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [13:02:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [13:03:19] (03PS1) 10Vogone: Enabled the abusefilter block option for English Wikinews [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90540 [13:03:55] PROBLEM - NTP on cp4007 is CRITICAL: NTP CRITICAL: Offset unknown [13:05:45] RECOVERY - NTP on cp4006 is OK: NTP OK: Offset -0.003575325012 secs [13:05:55] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 13:05:47 UTC 2013 [13:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [13:07:55] RECOVERY - NTP on cp4007 is OK: NTP OK: Offset 0.003347873688 secs [13:16:15] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 13:16:05 UTC 2013 [13:16:25] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [13:18:45] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 
18 13:18:40 UTC 2013 [13:19:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [13:19:55] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 13:19:46 UTC 2013 [13:20:05] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [13:22:15] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 13:22:06 UTC 2013 [13:22:15] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [13:22:35] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 13:22:26 UTC 2013 [13:23:15] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [13:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 13:28:48 UTC 2013 [13:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 13:32:49 UTC 2013 [13:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [13:33:45] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 13:33:44 UTC 2013 [13:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [13:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 13:35:40 UTC 2013 [13:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [13:39:26] (03PS1) 10Mark Bergsma: Update cache list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90542 [13:39:30] (03CR) 10jenkins-bot: [V: 04-1] Update cache list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90542 (owner: 10Mark Bergsma) [13:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 13:41:46 UTC 2013 [13:42:04] (03PS2) 10Mark Bergsma: Update cache 
list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90542 [13:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [13:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 13:42:41 UTC 2013 [13:42:52] (03PS3) 10Mark Bergsma: Update cache list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90542 [13:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [13:43:48] (03CR) 10Mark Bergsma: [C: 032] Update cache list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90542 (owner: 10Mark Bergsma) [13:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 13:43:46 UTC 2013 [13:44:05] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [13:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 13:44:42 UTC 2013 [13:45:35] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [13:45:38] !log mark synchronized wmf-config/squid.php 'Update cache list' [13:45:39] (03PS4) 10Andrew Bogott: Remove generic::mysql::packages in favor of mysql module [operations/puppet] - 10https://gerrit.wikimedia.org/r/90194 [13:45:51] Logged the message, Master [13:48:53] akosiaris: I got a huge pull request for servermon :-D [13:49:04] already ? [13:49:24] servermon migrations do not work with sqlite :( [13:49:28] duplicate index [13:49:29] yes ... fixed [13:49:33] just pushed it [13:49:36] oh man [13:49:39] I will have to rebase :] [13:50:08] I also wrote a big commit message sharing my pain... 
[13:50:58] https://github.com/akosiaris/servermon/pull/10 [13:51:00] that is ugly [13:51:11] you probably want to fetch that pull request and have a look at it / test it [13:51:27] I moved most stuff under the servermon directory [13:51:38] and wrote a bunch of setup.py setup.cfg tox.ini files [13:51:54] that is more or less mimicking what openstack is doing [13:52:04] tox.ini ? [13:52:12] heh... I have some reading to do [13:52:14] that is a wrapper around virtualenv [13:52:29] there are a bunch of failing imports such as: import puppet.models [13:52:42] apparently that should now become import servermon.puppet.models [13:52:50] akosiaris: https://pypi.python.org/pypi/tox [13:53:27] it lets you define isolated environments, for example python 2.7 and python 3.3, tox will run virtualenv and pip to install whatever you want [13:53:30] so it's like this http://virtualenvwrapper.readthedocs.org/en/latest/ [13:53:31] then run a command (i.e. a test) [13:53:39] yeah similar :D [13:53:52] as usual in the python world, there are similar tools [13:53:55] ok then, because i've used that before [13:54:01] tell me about it ... [13:54:09] aaaah the pain of time, datetime [13:54:15] the reason I copied what openstack is doing is that I think they might be interested in contributing to it [13:54:18] date and their friends [13:54:28] (03PS1) 10Mark Bergsma: Add ulsfo upload LVS service monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/90544 [13:54:47] ok... thanks..
I will review it during the weekend and merge [13:54:56] note I used pbr https://pypi.python.org/pypi/pbr [13:54:59] that is from openstack [13:55:13] as I understand it, that is on top of distutils, provides sane defaults for you [13:55:25] and all configuration is now done in setup.cfg [13:55:41] an interesting thing is that pbr will use 'git describe' to figure out the version of the software [13:55:43] that is veryyy handy [13:55:55] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 13:55:45 UTC 2013 [13:56:35] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [13:56:48] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo upload LVS service monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/90544 (owner: 10Mark Bergsma) [13:56:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 13:56:45 UTC 2013 [13:57:14] and now it's 4pm :/ [13:57:15] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [13:57:18] (03PS1) 10Cmjohnson: Removing mc1-16 to mc1001-1016 and mc1001-1016 to mc1-16 redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/90545 [13:57:23] 5 :P [13:57:46] this is EEST !!! not for long but still ... [13:58:01] one more reason to relocate to Athens, you guys start the weekend earlier [13:58:10] ahahahahahaha [13:58:12] akosiaris can you review that change ^^ plz (when you get a chance) [14:00:55] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Fri Oct 18 14:00:46 UTC 2013 [14:00:55] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Oct 18 14:00:46 UTC 2013 [14:00:55] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [14:01:05] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours [14:02:00] cmjohnson1: why are you removing role::memcached and redis from mc10*.eqiad ?
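pbr's reliance on `git describe` is easier to picture with the raw output in hand. The following is a minimal sketch in a throwaway repository; the tag name `0.1` and the committer identity are made-up placeholders, and the version string pbr itself produces (it derives the package version from this tag information under its own rules) is not reproduced here.

```shell
#!/bin/sh
# Sketch: what `git describe` reports in a throwaway repository.
# Tag "0.1" and the identity below are hypothetical placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
export GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q .
git commit -q --allow-empty -m "initial"
git tag -a 0.1 -m "release 0.1"      # annotated tag, so plain `git describe` finds it
desc_on_tag=$(git describe)          # exactly on the tag: just the tag name

git commit -q --allow-empty -m "one more change"
desc_after=$(git describe)           # <tag>-<commits since tag>-g<abbreviated hash>

echo "$desc_on_tag"
echo "$desc_after"
cd / && rm -rf "$repo"
```

The tag-distance-hash form is what makes it handy: every build between releases gets a distinct, ordered identifier without anyone editing a version number by hand.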
[14:02:19] aren't they still running redis and memcached and are in production ? [14:03:19] it is replicating with pmtpa which is no longer there [14:05:10] yes. But not only that. Those classes also install and configure redis and memcached [14:05:24] you can just remove the $redis_replication block [14:05:40] and the redist_replication => $redis_replication parameter [14:05:55] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Oct 18 14:05:47 UTC 2013 [14:06:15] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [14:06:30] and the case $::mw_primary at the beginning. All the rest needs to stay for mc10*.eqiad [14:06:38] gimme a sec [14:12:44] (03PS2) 10Akosiaris: Removing mc1-16 to mc1001-1016 and mc1001-1016 to mc1-16 redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/90545 (owner: 10Cmjohnson) [14:13:25] cmjohnson1: ^ this looks fine to me [14:16:15] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Fri Oct 18 14:16:04 UTC 2013 [14:16:26] (03CR) 10Akosiaris: [C: 032] Move mysql_wmf into a module. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [14:18:46] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Oct 18 14:18:40 UTC 2013 [14:18:46] PROBLEM - Varnish HTTP upload-backend on cp4006 is CRITICAL: Connection refused [14:19:46] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [14:20:06] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Oct 18 14:20:05 UTC 2013 [14:20:46] RECOVERY - Varnish HTTP upload-backend on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.147 second response time [14:20:56] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [14:21:56] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Oct 18 14:21:46 UTC 2013 [14:22:16] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Fri Oct 18 14:22:06 UTC 2013 [14:22:16] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [14:22:16] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [14:23:46] (03PS1) 10Mark Bergsma: Enable upload in ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/90548 [14:27:36] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 325 seconds [14:28:46] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Oct 18 14:28:42 UTC 2013 [14:29:26] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:31:36] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [14:32:43] (03PS13) 10Hashar: Jenkins #1 (please ignore) [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/84932 [14:32:44] (03PS1) 10Hashar: Jenkins #2 (DO NOT SUBMIT) [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/90549 [14:32:45] (03PS1) 10Hashar: Jenkins #3 do not submit [operations/debs/pybal] - 
10https://gerrit.wikimedia.org/r/90550 [14:32:56] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 14:32:49 UTC 2013 [14:33:06] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [14:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 14:33:44 UTC 2013 [14:34:46] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [14:35:46] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 14:35:40 UTC 2013 [14:36:26] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [14:41:56] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 14:41:47 UTC 2013 [14:42:06] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [14:42:46] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 14:42:42 UTC 2013 [14:42:56] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [14:43:56] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 14:43:47 UTC 2013 [14:43:56] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [14:44:46] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 14:44:42 UTC 2013 [14:45:36] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [14:52:49] (03CR) 10Cmjohnson: [C: 032] "Akosiaris made some changes and approved." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/90545 (owner: 10Cmjohnson) [14:55:56] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Fri Oct 18 14:55:46 UTC 2013 [14:56:36] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:07] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Oct 18 14:57:01 UTC 2013 [14:57:16] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:06:19] (03CR) 10Mark Bergsma: [C: 032] Enable upload in ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/90548 (owner: 10Mark Bergsma) [15:24:33] (03PS7) 10Andrew Bogott: ssh: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/90098 (owner: 10Matanya) [15:26:23] (03CR) 10Andrew Bogott: "I applied this patch on a labs instance and watched the changes... I saw this:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90098 (owner: 10Matanya) [15:28:46] (03PS14) 10Hashar: Jenkins #1 (please ignore) [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/84932 [15:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 15:32:47 UTC 2013 [15:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [15:34:05] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 15:34:02 UTC 2013 [15:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [15:37:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 15:37:38 UTC 2013 [15:38:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [15:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 15:41:45 UTC 2013 [15:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [15:42:55] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at 
Fri Oct 18 15:42:45 UTC 2013 [15:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [15:42:59] (03PS1) 10ArielGlenn: db1019 (s3) warming up in pool after upgrade/conversion to mariadb [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90557 [15:44:05] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 15:44:00 UTC 2013 [15:44:15] (03PS8) 10Andrew Bogott: ssh: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/90098 (owner: 10Matanya) [15:44:21] (03CR) 10ArielGlenn: [C: 032] db1019 (s3) warming up in pool after upgrade/conversion to mariadb [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90557 (owner: 10ArielGlenn) [15:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 15:44:40 UTC 2013 [15:44:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [15:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [15:45:35] (03CR) 10Andrew Bogott: [C: 032] ssh: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/90098 (owner: 10Matanya) [15:46:04] !log ariel synchronized wmf-config/db-eqiad.php 'db1019 (s3) in pool warming up after upgrade/conversion to mariadb' [15:46:15] Logged the message, Master [15:56:05] (03PS1) 10Andrew Bogott: Don't allow Password Authentication on production [operations/puppet] - 10https://gerrit.wikimedia.org/r/90561 [15:56:57] (03CR) 10Andrew Bogott: "Matanya, I'm going to merge this immediately. 
Can you please email me with explanation about how/why these lines were added to your other" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90561 (owner: 10Andrew Bogott) [15:57:23] (03CR) 10Andrew Bogott: [C: 032] Don't allow Password Authentication on production [operations/puppet] - 10https://gerrit.wikimedia.org/r/90561 (owner: 10Andrew Bogott) [16:26:45] PROBLEM - Disk space on db1043 is CRITICAL: DISK CRITICAL - free space: / 277 MB (3% inode=77%): [16:31:44] hmmmmm [16:31:51] this seems strange to me [16:32:00] so, i'm writing a cron in puppet [16:32:10] the script that gets run is put in place by git-deploy [16:32:17] so puppet is not managing the script [16:32:27] but I want the cron to require that the script exists [16:32:52] I don't think I can use file { }, because all of the ensure options will create the file if it doesn't exist [16:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 16:32:45 UTC 2013 [16:32:57] I could use an exec [16:32:59] ottomata: copy what anacron does [16:33:00] test -f $script [16:33:02] ?
[16:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [16:33:23] $ fgrep anacron /etc/crontab | tail -n 1 [16:33:24] 52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly ) [16:33:43] except you want the reverse [16:33:45] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 16:33:40 UTC 2013 [16:33:45] naw, i want puppet to fail [16:33:48] if the script doesn't exist [16:33:49] so && instead of || [16:33:50] oh [16:34:07] so just exec the test [16:34:12] yeah [16:34:14] and require that exec [16:34:15] i could even exec a noop [16:34:17] and add [16:34:21] creates => $script [16:34:23] heh [16:34:40] idk, that might not work [16:34:42] not sure [16:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [16:34:55] anyway, i'm off [16:34:56] yeah, maybe, but test -f is more clear [16:34:57] i'll use that [16:34:58] thanks! [16:35:12] test -x maybe is better? [16:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 16:35:41 UTC 2013 [16:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [16:39:19] (03PS1) 10Ottomata: Running pagecount-importer and hive-partitioner hourly [operations/puppet] - 10https://gerrit.wikimedia.org/r/90567 [16:39:31] hmm, yeah could be [16:41:03] (03PS2) 10Ottomata: Running pagecount-importer and hive-partitioner hourly [operations/puppet] - 10https://gerrit.wikimedia.org/r/90567 [16:41:21] akosiaris: got a sec to look that over ^?
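The `&&` versus `||` point in the exchange above is easy to get backwards, as is the difference between `test -f` and `test -x`. A minimal shell sketch, assuming a throwaway file from `mktemp` stands in for the git-deploy-managed script (the real script path is not given in the log):

```shell
#!/bin/sh
# Sketch only: "$script" is a hypothetical stand-in created with mktemp,
# not the real pagecount-importer / hive-partitioner path from the log.
script=$(mktemp)   # mktemp creates the file with mode 0600 (no execute bit)

# anacron style (||): the right-hand command runs only when the test FAILS.
test -x "$script" || echo "fallback: not executable yet"

# Reversed (&&): the right-hand command runs only when the test SUCCEEDS.
test -x "$script" && echo "would run the job"   # prints nothing yet
test -x "$script"; before_chmod=$?

chmod u+x "$script"
test -x "$script" && echo "would run the job"
test -x "$script"; after_chmod=$?

# test -f only asks "is this a regular file?"; test -x also needs an execute bit.
test -f "$script" && echo "test -f: regular file present"

# Used as a bare precondition (the command of an exec), the check just exits
# non-zero once the file is missing; that failure is what blocks dependents.
rm -f "$script"
test -x "$script"; after_rm=$?
echo "exit status: before chmod=$before_chmod, after chmod=$after_chmod, after rm=$after_rm"
```

In the Puppet version discussed here, the bare `test -x` becomes the command of an `exec` that the `cron` resource requires; when that exec exits non-zero, Puppet typically skips the cron resource and reports it as skipped because of failed dependencies, which matches the "fail instead of installing the cron" behaviour asked for.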
[16:41:32] I'm asking you because this is the second time I've tried to create a kraken role :p [16:41:44] and you reviewed the first one :) [16:41:47] (03CR) 10Chad: [C: 032] Add cirrus.dblist for controlling who has new search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90430 (owner: 10Chad) [16:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 16:41:47 UTC 2013 [16:41:59] (03Merged) 10jenkins-bot: Add cirrus.dblist for controlling who has new search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90430 (owner: 10Chad) [16:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [16:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 16:42:42 UTC 2013 [16:42:49] require role::analytics::kraken [16:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [16:43:00] ottomata: why require ? ^ [16:43:37] manifests/role/analytics/kraken.pp:34 [16:43:45] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 16:43:42 UTC 2013 [16:43:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [16:44:02] you want include? [16:44:06] guess it doesn't matter, hm [16:44:08] in this case [16:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 16:44:43 UTC 2013 [16:44:51] welll whenever i see require [16:44:58] alarm bells ring [16:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [16:45:17] really?
why, I know people like the Class -> Class a little better here, but sometimes you just want to include and do that at the same time [16:45:27] but in this case I don't think I have a good reason [16:45:45] PROBLEM - Disk space on db1043 is CRITICAL: DISK CRITICAL - free space: / 246 MB (3% inode=77%): [16:45:46] mainly because ::kraken doesn't actually do deployment [16:46:05] otherwise require would be good, to make sure that the scripts were deployed before the cron jobs were put in place [16:46:05] PROBLEM - MySQL disk space on db1043 is CRITICAL: DISK CRITICAL - free space: / 246 MB (3% inode=77%): [16:46:30] i'll change [16:47:05] (03PS3) 10Ottomata: Running pagecount-importer and hive-partitioner hourly [operations/puppet] - 10https://gerrit.wikimedia.org/r/90567 [16:47:21] akosiaris: ^ [16:57:47] !log demon synchronized cirrus.dblist 'New cirrus.dblist' [16:58:02] Logged the message, Master [16:58:20] !log demon synchronized wmf-config/CommonSettings.php 'Use new cirrus.dblist' [16:58:31] Logged the message, Master [16:58:42] nobody look [16:59:05] ottomata: just something I wanna be clear on... those execs test for scripts that are going to be deployed via git-deploy ? [16:59:18] !log demon synchronized wmf-config/InitialiseSettings.php 'All wikis in cirrus group get cirrus' [16:59:27] yeah, i wasn't sure about that either, I just pinged in here about that and that's the best jeremyb and I came up with [16:59:31] Logged the message, Master [16:59:39] i want the cron to depend on those scripts being in place [16:59:41] right?
[16:59:43] but I can't use file [16:59:52] because ensure => file or ensure => present will all create the file [17:00:08] (well, it would probably fail if the parent dirs don't exist, but ja) [17:00:15] (03PS1) 10Yurik: Enable Zero extension for all labs in mobile mode [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90568 [17:00:20] i want to test for the file's existence [17:00:24] before installing the cron job [17:00:31] if you know of a better way lemme know [17:01:23] (03CR) 10MaxSem: [C: 032] Enable Zero extension for all labs in mobile mode [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90568 (owner: 10Yurik) [17:02:00] (03Merged) 10jenkins-bot: Enable Zero extension for all labs in mobile mode [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90568 (owner: 10Yurik) [17:02:13] so you rely on test -x returning 1 and the exec failing [17:02:15] hmmm [17:02:41] I think it is quite a good approach [17:02:43] yeah, or creates => $script being true [17:02:48] i could even just no-op the command [17:02:53] with creates => true [17:02:54] i think [17:02:56] but ja [17:03:28] well [17:03:39] there is an alternative approach that is cleaner [17:03:44] or maybe not [17:03:49] create a wrapper script [17:03:51] (^ that's just a comment fix) [17:03:51] oh? [17:03:57] and populate that [17:03:59] ? [17:04:05] for the cron? [17:04:15] and have the wrapper script running from cron and check for the 1st script [17:04:20] nawwww [17:04:25] (03PS4) 10Ottomata: Running pagecount-importer and hive-partitioner hourly [operations/puppet] - 10https://gerrit.wikimedia.org/r/90567 [17:04:27] (03PS3) 10Chad: Use new LVS setup for search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86743 [17:04:29] i don't want the cron to be installed if kraken hasn't been deployed [17:04:46] ok then... forget it [17:04:47] i want puppet to fail [17:05:01] y?
[17:05:02] k
[17:05:12] well, not all of puppet, but the installation of the cron
[17:05:24] i want to see "Cannot install Cron due to dependency blablabla"
[17:06:19] well ok. So LGTM... merge it :-)
[17:06:39] k danke
[17:06:48] +1?
[17:07:07] +2
[17:10:07] put it on the review!~
[17:10:11] akosiaris: :)
[17:11:57] (PS1) Yurik: Enable Zero extension in labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90570
[17:12:49] (CR) Ottomata: [C: 2 V: 2] "Reviewed by Akosiaris." [operations/puppet] - https://gerrit.wikimedia.org/r/90567 (owner: Ottomata)
[17:21:38] (CR) coren: [C: 2] "Simple enough." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90570 (owner: Yurik)
[17:24:30] (PS1) Ottomata: Installing doc opt on Kraken nodes [operations/puppet] - https://gerrit.wikimedia.org/r/90572
[17:24:38] (CR) Ottomata: [C: 2 V: 2] Installing doc opt on Kraken nodes [operations/puppet] - https://gerrit.wikimedia.org/r/90572 (owner: Ottomata)
[17:29:13] bblack: ping
[17:32:45] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 17:32:44 UTC 2013
[17:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[17:34:05] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 17:33:59 UTC 2013
[17:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[17:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 17:35:39 UTC 2013
[17:35:56] (CR) John F. Lewis: [C: 1] Enabled the abusefilter block option for English Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90540 (owner: Vogone)
[17:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[17:37:03] (CR) John F. Lewis: [C: 1] Changed Tamil Wikiquote logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90538 (owner: Vogone)
[17:40:33] (PS1) Ottomata: Piping stderr into logfile for kraken jobs [operations/puppet] - https://gerrit.wikimedia.org/r/90574
[17:40:43] (CR) Ottomata: [C: 2 V: 2] Piping stderr into logfile for kraken jobs [operations/puppet] - https://gerrit.wikimedia.org/r/90574 (owner: Ottomata)
[17:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 17:41:46 UTC 2013
[17:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[17:42:55] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 17:42:46 UTC 2013
[17:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[17:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 17:43:46 UTC 2013
[17:43:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[17:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 17:44:41 UTC 2013
[17:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[17:53:18] (PS2) Ori.livneh: Rename / promote 'olivneh' => 'ori' [operations/puppet] - https://gerrit.wikimedia.org/r/90387
[17:53:31] (PS3) Ori.livneh: Rename / promote 'olivneh' => 'ori' [operations/puppet] - https://gerrit.wikimedia.org/r/90387
[17:56:48] !log disconnecting ps1-b6-eqiad from network
[17:57:00] Logged the message, Master
[17:57:55] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[18:02:55] RECOVERY - Host ps1-b5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.99 ms
[18:03:16] (CR) Reedy: [C: 1] Set one a one-year Cache-control: max-age header for fonts. [operations/apache-config] - https://gerrit.wikimedia.org/r/90490 (owner: Ori.livneh)
[18:03:26] (Abandoned) Ori.livneh: Add Icinga check for l10nupdate & drop !log-based alerts [operations/puppet] - https://gerrit.wikimedia.org/r/88009 (owner: Ori.livneh)
[18:03:51] Reedy: cool, thanks
[18:07:23] (CR) Ryan Lane: [C: 2] Rename / promote 'olivneh' => 'ori' [operations/puppet] - https://gerrit.wikimedia.org/r/90387 (owner: Ori.livneh)
[18:07:25] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[18:17:25] RECOVERY - Host ps1-b5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.75 ms
[18:32:29] RobH: ping
[18:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 18:32:45 UTC 2013
[18:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[18:33:27] gwicke: sup?
[18:33:45] Follwing up on servers with ssds?
[18:33:50] RobH: any news on the Cassandra machines?
[18:33:53] yes ;)
[18:33:53] heh
[18:33:59] so we had to order 2.5 to 3.5 brackets
[18:34:03] which arrived yesterday
[18:34:05] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 18:33:55 UTC 2013
[18:34:14] cmjohnson1 may be able to provide some insight on when he can get the SSDs in place
[18:34:44] will do that next
[18:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[18:34:47] ok, so not before next week
[18:34:48] but he also is running down a different ticket..
[18:34:52] was gonna say he may be afk
[18:34:52] heh
[18:35:18] gwicke: well, we will try to get them installed today, but i would say lts plan for handoff on monday?
[18:35:23] pending ssd install
[18:35:35] RobH: yes, sounds good
[18:35:44] I have enough other things I can still do in preparation
[18:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 18:35:41 UTC 2013
[18:36:00] excellent (that we arent blocking ;)
[18:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[18:36:35] RobH, cmjohnson1: thanks for the update!
[18:42:05] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 18:41:57 UTC 2013
[18:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[18:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 18:42:42 UTC 2013
[18:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[18:44:15] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 18:44:12 UTC 2013
[18:44:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[18:45:05] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 18:45:03 UTC 2013
[18:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[18:45:16] (PS1) ArielGlenn: db1019 (s3) to full weight in the pool [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90578
[18:46:15] RECOVERY - Puppet freshness on hafnium is OK: puppet ran at Fri Oct 18 18:46:08 UTC 2013
[18:46:17] (CR) ArielGlenn: [C: 2] db1019 (s3) to full weight in the pool [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90578 (owner: ArielGlenn)
[18:46:21] robh, gwicke ticket is not clear...how many ssds do you need...replacing slots a and b?
[18:47:24] https://rt.wikimedia.org/Ticket/Display.html?id=5949
[18:47:56] cmjohnson1: additional ssds to sata
[18:47:59] i'll add
[18:48:40] robh: there are 4 disks in cerium...assuming the same for the rest
[18:48:49] oh, now there are 4?
[18:49:00] meh, pull two then
[18:49:07] have it be hda, hdb, sdc, sdd
[18:49:12] i added that to ticket
[18:50:11] yeah..cool. i assumed but want to make certain
[18:50:18] Coren: or yurik_, I'm about to merge your wgEnableZeroRatedMobileAccessTesting change on tin
[18:50:28] since I have a db-eqiad change I need to go through
[18:50:42] apergos: it shouldn't affect production
[18:50:47] and its already working on labs
[18:51:29] I'll let someone else sync-file it though who can say what it does
[18:52:48] !log ariel synchronized wmf-config/db-eqiad.php 'db1019 (s3) to normal weight in pool'
[18:53:00] Logged the message, Master
[18:54:04] apergos: It's a noop in prod.
[18:54:22] Coren: right now it's a noop everywhere since I didn't sync it
[18:55:06] Nope, it works on beta; it syncs on its own. :-)
[18:55:37] It would be... unwise to require a sync in prod to do experiments on beta. :-)
[18:56:13] having unsynced stuff in /a/common is a problem
[18:58:32] I didn't even realized InitializeSettings-labs.php was synced in prod.
[18:59:18] It certainly doesn't need to be; and I'm concerned that it very probably /shouldn't/ either because we'd sure as hell want to know if something tried to include that from prod.
[19:00:19] all kinds of things live in /a/common
[19:02:39] IMO, that's a bug and not a feature. :-)
[19:03:13] Feel free to sync it though; we're not going to change how /a/common is organized over a 2 minute IRC conversation. :-)
[19:03:52] well someone who knows what it does should (so they can add the message logged when it syncs)
[19:09:10] Allright, I'll do it.
[19:11:33] !log marc synchronized wmf-config/InitialiseSettings-labs.php 'Enable zero extension in labs (beta)'
[19:11:46] Logged the message, Master
[19:18:40] ah.
[19:18:42] ok thanks
[19:20:36] (PS1) ArielGlenn: db1035 (s3) back into the pool, done being clone source [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90607
[19:21:41] (CR) ArielGlenn: [C: 2] db1035 (s3) back into the pool, done being clone source [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90607 (owner: ArielGlenn)
[19:22:45] (CR) Dzahn: [C: 1] Set one a one-year Cache-control: max-age header for fonts. [operations/apache-config] - https://gerrit.wikimedia.org/r/90490 (owner: Ori.livneh)
[19:23:22] !log ariel synchronized wmf-config/db-eqiad.php 'db1035 (s3) back into the pool'
[19:23:34] Logged the message, Master
[19:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 19:32:45 UTC 2013
[19:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[19:33:07] I'd appreciate a review of https://gerrit.wikimedia.org/r/#/c/90194/
[19:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 19:33:46 UTC 2013
[19:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[19:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 19:35:41 UTC 2013
[19:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[19:39:33] robh: for the servers for gwicke...how do you want them raided. the r320 have a raid controller?
[19:39:53] eh, i'd say software raid.
[19:40:02] yeah, +1 for soft raid
[19:40:03] if only cuz its not hw dependent.
[19:40:13] so for os software raid1 on sata
[19:40:22] and you can leave the SSDs unmounted and unmodified
[19:40:27] (in software)
[19:40:27] I'd like to compare jbod with cassandra balancing vs. raid0
[19:41:09] gwicke will more than likely have sudo on these (unless someone blocks it and i dont see that happening)
[19:41:16] so he'll be tinkering with all that stuff though
[19:41:37] with sudo he can break the raid apart and redo as needed.
[19:41:45] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 19:41:43 UTC 2013
[19:41:47] hw raid less so.
[19:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[19:42:10] (still possible of course)
[19:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 19:42:44 UTC 2013
[19:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[19:43:39] (PS1) Reedy: Remove www -> en or www -> portal redirects. Already done in wwwportals.conf [operations/apache-config] - https://gerrit.wikimedia.org/r/90635
[19:43:45] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 19:43:44 UTC 2013
[19:43:50] Another 24 lines deleted
[19:43:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[19:44:46] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 19:44:44 UTC 2013
[19:45:13] (CR) Reedy: [C: 1] "(5 comments)" [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[19:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[19:54:34] (PS1) Ori.livneh: Change for user 'olivneh' to dedupe. [operations/puppet] - https://gerrit.wikimedia.org/r/90638
[19:56:08] (CR) Dzahn: [C: 2] "yep, $realname is the resource name and otherwise there was a duplicate definiton making puppet fail on fenari (and where homedirs are cre" [operations/puppet] - https://gerrit.wikimedia.org/r/90638 (owner: Ori.livneh)
[19:56:13] (CR) JanZerebecki: "(1 comment)" [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[19:56:47] Bah
[19:57:28] (CR) BryanDavis: "(1 comment)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90265 (owner: Aaron Schulz)
[19:58:31] (CR) Reedy: "(1 comment)" [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[19:58:37] (PS2) Reedy: Remove www -> en or www -> portal redirects. Already done in wwwportals.conf [operations/apache-config] - https://gerrit.wikimedia.org/r/90635
[20:06:26] (CR) JanZerebecki: [C: 1] Remove www -> en or www -> portal redirects. Already done in wwwportals.conf [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[20:06:43] (PS1) Cmjohnson: giving praseodymium and cerium private ips [operations/dns] - https://gerrit.wikimedia.org/r/90642
[20:08:14] (CR) Cmjohnson: [C: 2] giving praseodymium and cerium private ips [operations/dns] - https://gerrit.wikimedia.org/r/90642 (owner: Cmjohnson)
[20:09:18] !log dns update
[20:09:31] Logged the message, Master
[20:11:38] (CR) Ori.livneh: [C: 2] Set one a one-year Cache-control: max-age header for fonts. [operations/apache-config] - https://gerrit.wikimedia.org/r/90490 (owner: Ori.livneh)
[20:17:39] gwicke: do you want raid 1 on the 2 hdd or do you want no raid and OS on a single disk with lvm?
[20:18:06] i know rob said raid 1 but i see your comment about jbod
[20:21:05] (CR) Dzahn: "fyi, on fenari i noticed this issue:" [operations/puppet] - https://gerrit.wikimedia.org/r/87332 (owner: Matanya)
[20:21:50] ooo
[20:22:03] we actually have an active user with the nickname 'Gerrit', fun!
[20:22:52] xD
[20:23:27] (CR) JanZerebecki: [C: 1] Kill postrewrites.conf, already handled in main.conf under wikipedia [operations/apache-config] - https://gerrit.wikimedia.org/r/90460 (owner: Reedy)
[20:23:50] gerrit@gerrit , sweet
[20:24:15] it's probably a system user
[20:26:36] "demo.wikinews.org" :) cruft .. so much cruft :)
[20:27:35] Yeaaah....
[20:28:16] (CR) Dzahn: [C: 2] Remove www -> en or www -> portal redirects. Already done in wwwportals.conf [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[20:29:07] (PS1) JanZerebecki: Remove todo comment that was already done. [operations/apache-config] - https://gerrit.wikimedia.org/r/90647
[20:29:37] (CR) Reedy: [C: 1] Remove todo comment that was already done. [operations/apache-config] - https://gerrit.wikimedia.org/r/90647 (owner: JanZerebecki)
[20:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 20:32:46 UTC 2013
[20:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[20:34:05] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 20:33:56 UTC 2013
[20:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[20:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 20:35:42 UTC 2013
[20:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[20:40:03] (CR) Dzahn: "dzahn@fenari:~$ apache-fast-test 90635.url mw1044" [operations/apache-config] - https://gerrit.wikimedia.org/r/90635 (owner: Reedy)
[20:40:16] Reedy: http://demo.wikinews.org * 302 Found http://incubator.wikimedia.org/wiki/Wn/demo?goto=mainpage
[20:40:22] if it was in DNS ...
[20:40:40] Error: This page is unprefixed! .. shrug
[20:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 20:41:48 UTC 2013
[20:42:01] lol
[20:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[20:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 20:42:44 UTC 2013
[20:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[20:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 20:43:49 UTC 2013
[20:43:58] (PS1) JanZerebecki: Remove TLDs from host matches that are not reachable. [operations/apache-config] - https://gerrit.wikimedia.org/r/90650
[20:44:00] !log sync-apache, graceful'ing ..
[20:44:00] (PS1) Ori.livneh: Add Ganglia view for VisualEditor metrics [operations/puppet] - https://gerrit.wikimedia.org/r/90651
[20:44:16] Logged the message, Master
[20:44:55] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 20:44:45 UTC 2013
[20:44:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[20:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[20:45:45] PROBLEM - MySQL disk space on db1024 is CRITICAL: DISK CRITICAL - free space: / 265 MB (3% inode=84%):
[20:45:55] PROBLEM - Disk space on db1024 is CRITICAL: DISK CRITICAL - free space: / 265 MB (3% inode=84%):
[20:46:04] (CR) Ori.livneh: [C: 2] Add Ganglia view for VisualEditor metrics [operations/puppet] - https://gerrit.wikimedia.org/r/90651 (owner: Ori.livneh)
[20:48:08] cmjohnson1: sorry, just saw your question- if you went ahead I think I'll be able to still do whatever by dropping one disk out of the raid
[20:48:30] i have done anything yet
[20:49:06] ah, k- then maybe a smallish raid1 partition for boot and system and jbod otherwise
[20:49:29] ok
[20:49:33] can then play with raid vs. jbod using the remainder of the disks
[20:50:14] sounds like a plan
[20:50:29] cmjohnson1: thanks!
[20:57:35] (PS1) Cmjohnson: setting up netboot.cfg for cerium, praseodymium, & xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90654
[20:59:37] (CR) Cmjohnson: [C: 2] setting up netboot.cfg for cerium, praseodymium, & xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90654 (owner: Cmjohnson)
[20:59:50] (CR) Dzahn: [C: 2] Remove todo comment that was already done. [operations/apache-config] - https://gerrit.wikimedia.org/r/90647 (owner: JanZerebecki)
[21:00:55] ori-l okay to merge your change visual editor monitor change?
[21:01:14] cmjohnson1: I just did
[21:01:20] cmjohnson1: sorry about that, was having puppetmaster access woes
[21:01:28] cool..thx..you got mine as well..it works
[21:02:03] cmjohnson1: oh, woops. i should have scrolled up, missed that part of the diff.
[21:02:50] no worries...mine was an easy change that wouldn't break anything ;-)
[21:05:45] PROBLEM - MySQL disk space on db1002 is CRITICAL: DISK CRITICAL - free space: / 274 MB (3% inode=84%):
[21:05:45] PROBLEM - Disk space on db1002 is CRITICAL: DISK CRITICAL - free space: / 274 MB (3% inode=84%):
[21:07:04] (CR) Reedy: "These were also on my todo list :)" [operations/apache-config] - https://gerrit.wikimedia.org/r/90650 (owner: JanZerebecki)
[21:08:25] mutante: Someone else starting to do the same stuff :D
[21:08:31] (PS1) Ori.livneh: Fix typo in metric regex for VE DOM retrieve stats [operations/puppet] - https://gerrit.wikimedia.org/r/90657
[21:08:41] (CR) Reedy: [C: 1] Remove TLDs from host matches that are not reachable. [operations/apache-config] - https://gerrit.wikimedia.org/r/90650 (owner: JanZerebecki)
[21:09:19] Reedy: we know each other, and i mentioned the cleanup:)
[21:09:23] (PS2) Ori.livneh: Fix typo in metric regex for VE DOM retrieve stats [operations/puppet] - https://gerrit.wikimedia.org/r/90657
[21:09:26] Ahaaa
[21:10:46] Don't suppose he'd have any ideaa about unit testing the config?
[21:10:58] (CR) Ori.livneh: [C: 2] Fix typo in metric regex for VE DOM retrieve stats [operations/puppet] - https://gerrit.wikimedia.org/r/90657 (owner: Ori.livneh)
[21:11:14] Reedy: actually, yes
[21:12:08] jzerebecki: hi:)
[21:12:18] lerkers!
[21:17:19] (CR) Ryan Lane: [C: 2] replace SSLCACertificatePath with SSLCertificateChainFile in Apache templates [operations/puppet] - https://gerrit.wikimedia.org/r/84901 (owner: Dzahn)
[21:18:49] (PS1) Cmjohnson: Adding site.pp entry for cerium,praseodymium and xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90658
[21:19:24] (CR) jenkins-bot: [V: -1] Adding site.pp entry for cerium,praseodymium and xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90658 (owner: Cmjohnson)
[21:20:18] mutante: hi
[21:21:11] 4 changesets to merge ;)
[21:21:57] jzerebecki: ideas about unit testing the apache-config?
[21:21:59] Reedy: extend apache-fast-test to not only get a list of URLs to test but also a list of assertions for each URL?
[21:22:20] go Perl coding
[21:22:28] Not necesserily
[21:22:53] Being able to do it away from the cluster would be good
[21:23:15] like the http return code, match any of the headers in the response, match a PCRE in the response body...
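The per-URL assertions proposed here (HTTP return code, header matches, a pattern match against the response body) could be sketched roughly as follows. This is a hypothetical Python illustration of the idea only, not apache-fast-test itself, which the log suggests is a Perl tool and whose input format is not shown here. The names `Expectation`, `check_response` and `fetch_plain_http` are invented for the example, Python's `re` module stands in for PCRE, and the optional `backend` argument is an assumption that mimics sending a request to a single app server (as in `apache-fast-test 90635.url mw1044`) while keeping the original `Host` header.

```python
import http.client
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class Expectation:
    """Hypothetical assertions for one URL (all fields optional)."""
    status: Optional[int] = None                           # e.g. 301 or 302
    headers: Dict[str, str] = field(default_factory=dict)  # header name -> regex
    body: Optional[str] = None                             # regex searched in the body


def check_response(status: int, headers: Dict[str, str], body: str,
                   exp: Expectation) -> List[str]:
    """Return a list of failure messages; an empty list means every assertion held."""
    failures = []
    if exp.status is not None and status != exp.status:
        failures.append(f"status: expected {exp.status}, got {status}")
    # HTTP header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name, pattern in exp.headers.items():
        value = lowered.get(name.lower())
        if value is None:
            failures.append(f"header {name}: missing")
        elif not re.search(pattern, value):
            failures.append(f"header {name}: {value!r} does not match {pattern!r}")
    if exp.body is not None and not re.search(exp.body, body):
        failures.append(f"body: does not match {exp.body!r}")
    return failures


def fetch_plain_http(url: str, backend: Optional[str] = None,
                     timeout: float = 10.0) -> Tuple[int, Dict[str, str], str]:
    """GET a plain-http URL without following redirects.

    If `backend` is given (hypothetical option), connect to that host instead
    of the URL's host but keep the original Host header, so one app server can
    be tested directly. Repeated headers collapse to one value in this sketch.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(backend or parts.hostname, parts.port,
                                      timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        # http.client does not follow redirects, so a 301/302 is returned as-is.
        conn.request("GET", path, headers={"Host": parts.netloc})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, dict(resp.getheaders()), body
    finally:
        conn.close()
```

For instance, the `302 Found` redirect reported above for `http://demo.wikinews.org` could be written as `Expectation(status=302, headers={"Location": r"^http://incubator\.wikimedia\.org/wiki/Wn/demo"})` and checked with `check_response(*fetch_plain_http("http://demo.wikinews.org/", backend="mw1044"), exp)`. Keeping the comparison separate from the fetch means the same expectations could be run against any Apache instance, including one started locally.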
[21:23:21] it already runs apache configtest on them
[21:23:35] It's more knowing that our redirects go where we think
[21:24:30] yea, but that involves more than just the apache config
[21:25:39] * mutante checks if those font packages are not dummy
[21:25:45] so testing it in anything that is more isolated from production or beta probably leads to useless test results
[21:27:36] (PS2) Cmjohnson: Adding site.pp entry for cerium,praseodymium and xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90658
[21:29:10] (CR) Cmjohnson: [C: 2] Adding site.pp entry for cerium,praseodymium and xenon [operations/puppet] - https://gerrit.wikimedia.org/r/90658 (owner: Cmjohnson)
[21:31:12] Reedy: otoh one could probably try to mock those parts (like squid, varnish, etc on the one side and php on the other) and make apache-fast-test run an apache that is totally independend of the cluster
[21:32:40] then you could have jenkins run it on each commit :)
[21:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 21:32:44 UTC 2013
[21:32:59] (CR) Dzahn: [C: -1] "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/88441 (owner: Reedy)
[21:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[21:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 21:33:50 UTC 2013
[21:34:19] jzerebecki: that sounds like you mean have it run on beta.wmflabs.org
[21:34:29] mutante: swapped z and r
[21:34:30] fonts-sil-ezra
[21:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[21:35:28] (PS6) Dzahn: Update font packages to not use virtual packages [operations/puppet] - https://gerrit.wikimedia.org/r/88441 (owner: Reedy)
[21:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 21:35:41 UTC 2013
[21:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[21:36:58] ahaha
[21:37:13] I was wtf-ing as to why there wasn't the text I want
[21:37:59] (CR) Dzahn: [C: 1] "dzahn@fenari:~$ for font in $(cat fonts.list); do echo $font; apt-cache show $font | grep Replaces; done" [operations/puppet] - https://gerrit.wikimedia.org/r/88441 (owner: Reedy)
[21:39:03] Reedy: ... now .. ?:P
[21:39:16] mutante: that would be possible then, too. but it would be possible to go even further and make jenkins -> apache-fast-test start a local apache just to run the tests. that would then be portable and fully independent of both production and beta.
[21:39:33] the packages look ok, but i dunno if something could be gone that was still expected
[21:39:52] or if we'd get tickets over the weekend :)
[21:40:17] Certainly no rush for that one
[21:40:30] let's Alex also review it then
[21:40:35] heh
[21:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 21:41:47 UTC 2013
[21:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[21:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 21:42:42 UTC 2013
[21:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[21:43:48] jzerebecki: Reedy re: change 90650 , i guess we could as well remove all those Rules
[21:43:55] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 21:43:48 UTC 2013
[21:44:21] The www.XX.project.org ones?
[21:44:24] Do people use them? :/
[21:44:45] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 21:44:38 UTC 2013
[21:44:51] i dunno, but if they do they will create certificate error bugs:)
[21:44:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[21:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[21:45:28] but didnt you say they are duplicated in redirects.conf anyways
[21:45:33] looks
[21:45:36] nope those are note
[21:45:39] -e
[21:45:50] The not .org to .org rewrites
[21:46:14] ok
[21:50:50] notes how Wikipedias redirect to /w/index.php
[21:50:57] and other prijects just to /
[21:51:33] (CR) Dzahn: [C: 1] "http://www.de.wikipedia.org" [operations/apache-config] - https://gerrit.wikimedia.org/r/90650 (owner: JanZerebecki)
[21:52:02] (CR) Dzahn: [C: 2 V: 2] "http://www.de.wikipedia.org" [operations/apache-config] - https://gerrit.wikimedia.org/r/90650 (owner: JanZerebecki)
[21:55:43] mutante: the order relative to that rule is different there: RewriteRule ^/$ /w/index.php
[21:56:07] i assume for no reason.
[21:56:14] (CR) Dzahn: "really? but redirector.wikipedia.org server name would be gone" [operations/apache-config] - https://gerrit.wikimedia.org/r/90460 (owner: Reedy)
[21:56:27] where is it already handled in main.conf
[21:56:55] lol
[21:57:08] (PS1) Ori.livneh: Ganglia VE view: one graph cached / uncached / overall [operations/puppet] - https://gerrit.wikimedia.org/r/90661
[21:57:17] noting redirector.wikipedia.org doesn't resolve ;)
[21:57:28] (CR) Ori.livneh: [C: 2] Ganglia VE view: one graph cached / uncached / overall [operations/puppet] - https://gerrit.wikimedia.org/r/90661 (owner: Ori.livneh)
[21:57:42] Reedy: oh.. and it's also there.. i see
[21:57:45] actually, lol
[21:57:48] (CR) JanZerebecki: "No it was a duplicate of the same ServerName in redirects.conf ." [operations/apache-config] - https://gerrit.wikimedia.org/r/90460 (owner: Reedy)
[21:57:53] /etc/apache2/wmf# grep -r redirector.wikipedia.org *
[21:58:22] archive/broken/.svn/text-base/redirects.conf.svn-base
[21:58:31] lmfao
[21:58:33] awesome
[21:59:13] so you meant "already handled in redirects.conf"
[21:59:20] while it can also be removed from there
[21:59:25] since it's gone from DNS :p
[22:00:17] (CR) Dzahn: [C: 2 V: 2] "Host redirector.wikipedia.org not found: 3(NXDOMAIN)" [operations/apache-config] - https://gerrit.wikimedia.org/r/90460 (owner: Reedy)
[22:00:52] mutante: no, that part of the commit message is about the *.wikipedia.org and the rewriterules.
[22:01:32] i assume redirector.wikipedia.org was never in DNS and
[22:01:47] jzerebecki: yea, either way ServerName redirector.wikipedia.org is just there and in redirects.conf
[22:02:04] dunno if it was, but it must have been long ago
[22:02:11] would have to search svn
[22:02:20] was just used because you need a ServerName withouth a *
[22:02:46] and redirects.con mostly contains ServerAliases with wildcards in them
[22:02:47] that would make sense too
[22:05:14] (CR) Dzahn: "http://www.de.wikipedia.org" [operations/apache-config] - https://gerrit.wikimedia.org/r/90460 (owner: Reedy)
[22:06:02] # FIXME this is dangerous
[22:06:04] heh
[22:06:58] # Obsolete PDF redirected to current wiki page
[22:07:13] bv 2009-01-09
[22:07:14] hhehe
[22:07:58] (CR) Dzahn: [C: 2] Tidy up foundation.conf, removing old/commented/un-needed config [operations/apache-config] - https://gerrit.wikimedia.org/r/90445 (owner: Reedy)
[22:08:34] 1 to go :)
[22:08:53] ? the wikimania docroots ?:p
[22:09:01] Yeaah
[22:09:11] anytime?
[22:09:15] +93, -455
[22:09:44] All good to go
[22:10:02] That's about a 10% decrease in the total lines of the apache configs
[22:10:02] and already tested.. ok ..
[22:11:13] jzerebecki: btw, unrelated. :) /etc/init.d/apache2: 55: [: nice: unexpected operator
[22:11:36] it's nothing new.. it's just a little annoyance
[22:12:37] eew bash :P
[22:13:01] just replace it with systemd ;)
[22:14:16] (CR) Dzahn: "http://wikimediafoundation.org" [operations/apache-config] - https://gerrit.wikimedia.org/r/90445 (owner: Reedy)
[22:15:08] (CR) Dzahn: [C: 2] "ok, already tested this" [operations/apache-config] - https://gerrit.wikimedia.org/r/84707 (owner: Reedy)
[22:16:32] mutante: more seriuosly i think that is probably the init script that comes with the apache2 deb?
[22:18:08] anyway i should really sleep :)
[22:18:12] * jzerebecki vanishes
[22:18:46] jzerebecki: yea, good sleep for now, cya
[22:21:20] !log deleting deprecated postrewrites.conf from Apaches, syncing, gracefulling for cleanup changes
[22:21:33] Logged the message, Master
[22:22:58] (PS1) Yurik: Fixed incorrect domain matching for ZERO [operations/puppet] - https://gerrit.wikimedia.org/r/90665
[22:24:16] Reedy: looks all good, it's synced and ...restarted and wfm :)
[22:24:22] (PS2) Yurik: Fixed incorrect domain matching for ZERO [operations/puppet] - https://gerrit.wikimedia.org/r/90665
[22:25:45] PROBLEM - MySQL disk space on db1040 is CRITICAL: DISK CRITICAL - free space: / 271 MB (3% inode=83%):
[22:26:13] !log aaron synchronized php-1.22wmf22/includes/HashRing.php '5ac886a65e0ef3831841f83c2e1e84b1f5d9a717'
[22:26:15] PROBLEM - Disk space on db1040 is CRITICAL: DISK CRITICAL - free space: / 271 MB (3% inode=83%):
[22:26:26] Logged the message, Master
[22:27:05] Reedy: https://wikitech.wikimedia.org/wiki/User_talk:Reedy#A_beer_for_you.21
[22:27:13] brb
[22:27:51] !log aaron synchronized php-1.22wmf22/includes/job '5ac886a65e0ef3831841f83c2e1e84b1f5d9a717'
[22:28:02] Logged the message, Master
[22:30:11] (PS1) Chad: Further changes to the rc2-507 package [operations/debs/gerrit] - https://gerrit.wikimedia.org/r/90666
[22:32:45] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 22:32:43 UTC 2013
[22:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[22:33:45] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 22:33:43 UTC 2013
[22:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[22:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 22:35:39 UTC 2013
[22:36:00] 3822 -> 3215
[22:36:20] I think that's about 16% so far :D
[22:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[22:38:36] Think I can kill another 500 or so without much workd
[22:40:35] mutante: Think the next task is to look at removing most of the superfluous docroot folders before collapsing them down
[22:41:55] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 22:41:46 UTC 2013
[22:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[22:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 22:42:41 UTC 2013
[22:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[22:43:45] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 22:43:41 UTC 2013
[22:43:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[22:44:55] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 22:44:46 UTC 2013
[22:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[22:45:25] PROBLEM - MySQL disk space on db1009 is CRITICAL: DISK CRITICAL - free space: / 263 MB (3% inode=79%):
[22:46:05] PROBLEM - Disk space on db1009 is CRITICAL: DISK CRITICAL - free space: / 252 MB (3% inode=79%):
[22:46:33] (PS1) Mwalker: Add BannerRandom filter to erbium [operations/puppet] - https://gerrit.wikimedia.org/r/90667
[22:46:54] (PS1) Reedy: Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668
[22:47:06] (CR) jenkins-bot: [V: -1] Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668 (owner: Reedy)
[22:51:39] "Hi Daniel, If you’re concerned about Windows XP EOL, Bit9 can help. Here’s how:" no, i'm not in the slightest, thanks
[22:53:43] (PS1) Reedy: Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/90669
[22:53:58] Hah
[22:56:48] (PS1) CSteipp: Update beta to use loginwiki for SUL [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90670
[22:58:00] Speaking of EOL stuff... I never realized how slow IE6 was. How do people live with themselves using this stuff??
[22:58:52] I guess if you don't know any better..
[23:01:16] that company offers to continue support after MS dropped it.. for "1/10th of the price":p
[23:02:49] http://www.downforeveryoneorjustme.com/ftp://ftp.microsoft.com/
[23:03:49] it was still alive not long ago , happily serving ancient stuff
[23:04:16] afair iexplore.exe 1.0 worked in wine, for the hell of it
[23:06:36] I'm sure someone has a mirror
[23:08:02] http://webcache.googleusercontent.com/search?q=cache:r0drijOU_QAJ:ftp://ftp.microsoft.com/softlib/index.txt+&cd=1&hl=en&ct=clnk&gl=us&client=iceweasel-a
[23:12:29] (PS1) Cmjohnson: Removing old dns entries for cerium and praseodymium [operations/dns] - https://gerrit.wikimedia.org/r/90671
[23:12:52] (CR) Ryan Lane: [C: 2 V: 2] Further changes to the rc2-507 package [operations/debs/gerrit] - https://gerrit.wikimedia.org/r/90666 (owner: Chad)
[23:13:10] (CR) Cmjohnson: [C: 2] Removing old dns entries for cerium and praseodymium [operations/dns] - https://gerrit.wikimedia.org/r/90671 (owner: Cmjohnson)
[23:14:20] !log dns update
[23:14:41] mutante: You deployed the wikimania change, right? :D
[23:15:44] Reedy: yes, anything that doesn't look right?
[23:15:46] (PS2) Reedy: Kill wikimania docroot folders [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84721
[23:15:50] ah
[23:15:57] I was just confirming I could delete all the directories
[23:15:57] :D
[23:16:11] (CR) Reedy: [C: 2] Kill wikimania docroot folders [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84721 (owner: Reedy)
[23:16:20] (Merged) jenkins-bot: Kill wikimania docroot folders [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84721 (owner: Reedy)
[23:17:50] !log reedy synchronized docroot and w
[23:18:04] Logged the message, Master
[23:18:43] Reedy: not really gone
[23:19:26] when i look in /usr/local/apache/common/docroot on an appserver
[23:19:37] lol
[23:20:11] dsh "${MW_DSH_ARGS[@]}" -- "sudo -u mwdeploy rsync -a --no-perms $MW_RSYNC_HOST::common/docroot $MW_RSYNC_HOST::common/w $MW_COMMON"
[23:24:34] all the fingerprints changed again ?
[23:32:55] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Fri Oct 18 23:32:46 UTC 2013
[23:33:05] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[23:33:55] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Fri Oct 18 23:33:46 UTC 2013
[23:34:45] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[23:35:45] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Oct 18 23:35:41 UTC 2013
[23:36:25] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[23:39:41] !log reedy synchronized docroot
[23:39:52] Logged the message, Master
[23:41:45] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Fri Oct 18 23:41:43 UTC 2013
[23:42:05] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[23:42:45] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Oct 18 23:42:43 UTC 2013
[23:42:55] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[23:43:46] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Fri Oct 18 23:43:44 UTC 2013
[23:43:55] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[23:44:46] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Oct 18 23:44:44 UTC 2013
[23:44:50] (CR) Dzahn: [C: 2] replace SSLCACertificatePath with SSLCertificateChainFile in Apache templates [operations/puppet] - https://gerrit.wikimedia.org/r/84901 (owner: Dzahn)
[23:45:03] (CR) Dzahn: [V: 2] replace SSLCACertificatePath with SSLCertificateChainFile in Apache templates [operations/puppet] - https://gerrit.wikimedia.org/r/84901 (owner: Dzahn)
[23:45:15] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[23:45:59] mutante: ori just hit a bad ssl cert (pulling from gerrit to tin), related to what you just did?
[23:46:21] greg-g: yes, i think :(
[23:46:29] i fixed it
[23:46:39] and i may have to revert
[23:46:51] k
[23:49:08] yes, give me one more minute
[23:52:43] greg-g: ori-l: it's up again, i'm just reverting it for gerrit so puppet won't break it again
[23:58:53] (PS1) Dzahn: revert the SSL chain file change on gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/90676
[23:58:54] (PS1) TTO: Add import sources for frwikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90677