[00:04:49] MaxSem: so… see https://gerrit.wikimedia.org/r/#/c/52553/ i originally did this but for some reason we did the construct thing [00:04:54] not sure if there was a reason.. [00:05:30] was that reason me? [00:06:16] * jdlrobson shrugs :) [00:06:52] yep MaxSem lol: MaxSem: Instead of renaming execute() for every special page, just call clearPageMargins() from constructor. [00:07:17] I came up with a nicer solution [00:08:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:07:54 UTC 2013 [00:08:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:09:03 UTC 2013 [00:10:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:10:08 UTC 2013 [00:11:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:11:05 UTC 2013 [00:12:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:12:38 UTC 2013 [00:13:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:23] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:13:13 UTC 2013 [00:14:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:15:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:15:01 UTC 2013 [00:15:34] Reedy, no idea about location - zero doesn't use any of it, only IP-based detection [00:16:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:18:15] New patchset: Tim Starling; "Add generic::mysql::packages::client to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63663 [00:18:23] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63663 [00:28:36] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/63806/' [00:28:44] Logged the message, Master [00:30:41] !log maxsem synchronized php-1.22wmf4/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/63806/' [00:30:48] Logged the message, Master [00:31:45] Thehelpfulone, should be gone [00:36:45] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [00:38:45] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [00:40:11] MaxSem, thanks [00:42:59] New patchset: Ryan Lane; "Adding .gitreview file" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63812 [00:43:23] Change merged: Ryan Lane; [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63812 [00:44:07] New patchset: Akosiaris; "Puppetizing Hadoop for CDH4." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [00:49:14] !log graceful apache on stat1001 [00:49:22] Logged the message, Master [00:57:03] New review: Hydriz; "Hmm, some suggestions:" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/63782 [01:01:05] New patchset: Dzahn; "Weekly Bugzilla Report mail: Fix wrong SQL query on bug resolutions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62220 [01:01:16] New patchset: Akosiaris; "Fix a couple of syntax errors" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [01:03:16] New review: Dzahn; "making another test before it's mailing out stuff" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/62220 [01:03:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62220 [01:07:06] # PHP Deprecated: Comments starting with '#' are deprecated .. bla:) [01:19:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [01:52:26] PROBLEM - Disk space on ms-be1009 is CRITICAL: DISK CRITICAL - /var/lib/ceph/osd/ceph-106 is not accessible: Input/output error [01:53:26] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:36] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:36] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:16] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:16] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:36] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 4.738 second response time [01:54:36] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 8.228 second response time [01:55:06] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [01:55:06] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.008 second response time [01:55:16] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [01:55:19] hrm [01:56:56] eh, not too much on flourine [01:59:06] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [02:08:15] !log LocalisationUpdate completed (1.22wmf3) at Wed May 15 02:08:15 UTC 2013 [02:08:24] Logged the message, Master [02:14:58] !log LocalisationUpdate completed (1.22wmf4) at Wed May 15 02:14:58 UTC 2013 [02:15:06] Logged the message, Master [02:34:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 15 02:34:37 UTC 2013 [02:34:47] Logged the message, Master [02:53:27] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [03:06:37] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21532 bytes in 0.002 second response time [03:18:57] PROBLEM - Host mw1173 is DOWN: PING CRITICAL - Packet loss = 100% [03:23:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:32:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL 
- Socket timeout after 10 seconds [03:33:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:39:17] New review: Ori.livneh; "OK, makes sense. I'll work this over." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [03:57:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.164 second response time [04:08:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:07:56 UTC 2013 [04:08:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:09:06 UTC 2013 [04:09:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:10:10 UTC 2013 [04:10:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:11:07 UTC 2013 [04:11:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:11:56 UTC 2013 [04:12:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:12:41 UTC 2013 [04:13:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:14:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:14:49 UTC 2013 [04:15:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:25:13] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [04:26:13] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [04:27:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:28:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [04:35:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:37:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [04:43:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:44] New review: Yurik; "Faidon, I'm not sure I understand -- could you elaborate on the SSL servers? If SSL gateway sets XFF..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [05:04:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:06:00] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:06:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:50] RECOVERY - Disk space on snapshot2 is OK: DISK OK [05:07:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [05:09:30] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:10:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:12:30] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:00] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:40] PROBLEM - SSH on snapshot2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:50] RECOVERY - Disk space on snapshot2 is OK: DISK OK [05:16:31] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [05:18:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:18:30] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:25:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [05:31:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:38:55] New patchset: Nemo bis; "Fix path for zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:00:02] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:02] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:02] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [06:01:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.125 second response time [06:14:02] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [06:15:39] New patchset: Nemo bis; "Fix path for roa-rup and zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:16:40] New patchset: Nemo bis; "Fix path for roa-rup and zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:30:12] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:30:08 UTC 2013 [06:30:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:30:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:31:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:30:53 UTC 2013 [06:31:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:33] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:31:31 UTC 2013 [06:32:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [06:32:32] PROBLEM - Puppet freshness on ms2 is 
CRITICAL: No successful Puppet run in the last 10 hours [06:45:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:45:03 UTC 2013 [06:45:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [07:06:37] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21539 bytes in 0.014 second response time [07:27:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.326 second response time [07:31:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [07:39:22] lo [07:50:56] ol [07:51:17] you rang? [07:51:47] ringringring [07:51:51] why yes, I did [07:52:00] hi apergos [07:52:21] ah you are the mediawiki vagrant person right? [07:52:37] ori-l: [07:53:01] i guess so [07:53:06] what's up? [07:53:09] did it crash and burn? [07:53:35] no no [07:53:48] first my compliments on the awesome color banner on login :-D [07:53:56] haha [07:53:59] thanks :) [07:54:47] I'm just trying to understand how to trigger a puppet run, not manually, your readme says it vagrant will 'periodically run it' and I wonder how that works, since I didn't see cron or someting, or even a puppet-agent running on the box [07:55:38] ah the other thing immediately related is that I see the stuff in /vagrant on the guest (which is synced to the host) is a git repo, including therefore the puppet files, do I need to commit changes there in order to have an effect? [07:55:54] ringringring < banana phone? [07:56:01] sorry for the noob qs but there's an awful lot of vagrant docs and I got lost pretty fast [07:56:08] p858snake|l: great, now i'll have that in my head for the next, oh, four days [07:56:23] apergos: not noob questions at all -- in fact i think the first one was my mistake [07:56:43] p858snake|l: I prefer shoe phones myself [07:57:21] it won't run 'periodically'; it'll run on "vagrant up" when you boot the machine, though evidently not always. i should check what the exact logic is. [07:57:30] and i should correct the readme. [07:57:38] ok, and then you have to vagrant provision or whatever it is, to manually run it? [07:57:58] yeah, it's essentially a wrapper around 'puppet apply' [07:58:06] there's a puppetmaster provisioner but that seemed needlessly complex [07:58:17] gotcha [07:58:22] you don't have to commit changes -- the contents of /vagrant aren't synced; they're mounted [07:58:42] I see. because the hmm I think vagrant docs talk about being 'synced' [07:59:02] if theyd just said (I guess I could have checked mount points. anyways) and it's mounted on the guest, [07:59:04] ah well [07:59:14] yeah, vagrant calls it a 'synced_folder', but it's inaccurate [07:59:21] so the point of committing stuff there would be...? 
[07:59:24] it uses VirtualBox Shared Folders by default, NFS if you enable it [07:59:40] ok [08:00:37] i'm hoping to encourage people to contribute puppet roles and modules for extension and other mediawiki-related software that requires some customized setup [08:00:46] for example, the math extension depends on some external renderer written in ocaml [08:00:52] yeah [08:02:10] anyways I like the setup a lot, it's pretty slick [08:02:27] last time I looked at VirtualBox it was much more a PITA [08:02:27] thanks, that's gratifying to hear :) [08:02:59] anyways it seems there is work being done to have libvirt and other providers so that will open it up more [08:03:25] yeah, I hope VMware doesn't hijack it [08:03:39] I think they've thrown some money at the main developer of Vagrant recently [08:03:44] uuhh oohhh [08:04:20] guess there need to be a few more developers :-D [08:04:46] yep. [08:05:22] any other tips about setup and stuff that come to mind, before I go back to my actual work? [08:05:47] what are you hoping to do with it? just checking it out, or evaluating it for some purpose? [08:06:56] well I might try to add some stuff that would allow someone t have a vm to play with dumps scripts (have some content already in the wiki, have the scripts already set up with the right db params and directories to write in, etc) [08:07:09] but also getting a general feel for vagrant and how it works [08:07:44] NFS provides a big performance boost and requires not much more than adding "nfs: true," under the 'config.vm.synced_folder' line in Vagrantfile. i kept it off by default because VirtualBox shared folders work everywhere (including windows) [08:07:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:07:43 UTC 2013 [08:07:53] ohhh [08:08:08] docs here are pretty good: http://docs-v1.vagrantup.com/v1/docs/nfs.html [08:08:10] if I change that I need to vagrant reload or something? [08:08:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:42] yeah, though vagrant reload had a bug in 1.1, not sureif it was fixed in 1.2.. 'vagrant halt && vagrant up' should work, otherwise [08:08:56] ok [08:09:09] oh wow that is a huge performance hit, no kidding [08:11:34] it's just for the files that are shared, mind you, but yeah [08:12:00] anyways i'm off to bed :) but glad you liked it and let me know if i can help with getting a dump module going at all [08:12:08] sure, thanks again! [08:12:14] * ori-l waves [08:15:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:15:11 UTC 2013 [08:15:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:27:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:28:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [08:39:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [08:44:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:44:48 UTC 2013 [08:45:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [09:07:04] Reedy: thanks, can you also sync it please? 
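For context, a minimal sketch of the provisioning workflow ori-l describes above, assuming the Vagrant 1.1/1.2 CLI of that era; the Vagrantfile line quoted in the comments is the NFS tweak he mentions, with exact option syntax depending on the Vagrant version rather than copied from mediawiki-vagrant:

```sh
# Re-run Puppet by hand; per ori-l this is essentially a wrapper around 'puppet apply'.
vagrant provision

# For the NFS speedup, add the option he mentions under the synced-folder line in the
# Vagrantfile, e.g. something like:
#     config.vm.synced_folder ".", "/vagrant", nfs: true
# then restart the VM. 'vagrant reload' had a bug in 1.1, so halt/up is the safer path:
vagrant halt && vagrant up
```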
[09:23:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [09:40:52] mark: did you have time to look at rt #5100, about adding a new version of vips to apt.wikimedia.org [09:56:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:57:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [10:12:15] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:15] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:15] PROBLEM - Puppet freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:15] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [10:37:02] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [10:39:02] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [11:02:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [11:06:05] Nemo_bis: lol [11:06:12] Yeah, got distracted due to a low battery [11:07:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21398 bytes in 0.014 second response time [11:07:42] !log reedy synchronized wmf-config/InitialiseSettings.php [11:07:51] Logged the message, Master [11:08:37] New patchset: Reedy; "Move squid config to realm specific config files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63781 [11:10:37] failing since yesterday [11:10:38] sigh [11:12:14] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63781 [11:13:03] !log reedy synchronized wmf-config/squid-labs.php 'wmf-config/squid.php' [11:13:12] Logged the message, Master [11:13:47] !log reedy synchronized wmf-config/squid.php [11:13:54] Logged the message, Master [11:14:26] !log reedy synchronized wmf-config/squid-labs.php [11:14:34] Logged the message, Master [11:15:39] !log reedy synchronized wmf-config/CommonSettings.php [11:15:46] Logged the message, Master [11:17:01] New patchset: Faidon; "Update HTTP content check for m.wikipedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63837 [11:17:36] apergos: steven has been pinging me to replace those ms-be pmtpa disks, want to handle it instead? 
[11:17:57] I have eqiad disks to handle :> [11:18:01] did them already [11:18:04] oh, cool [11:18:05] great :) [11:18:07] there's anotehr dead one though [11:18:11] I saw a message from last night [11:18:15] from steven [11:18:17] hence my ping [11:18:18] chris will order it [11:18:19] ah [11:18:32] yeah I dunno, I just know I was on line til quite late mucking with them [11:18:43] :) [11:18:43] and got off when all three were back and happy [11:18:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63837 [11:19:10] both me and chris I should say [11:19:24] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 21399 bytes in 0.003 second response time [11:19:42] New patchset: Reedy; "Display squid files on noc conf" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63838 [11:19:56] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63838 [11:20:14] Reedy: I remember hearing they contained "private" information [11:20:19] like blocked IPs etc. [11:20:22] oh [11:20:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [11:20:25] squid.php [11:20:27] never mind me :) [11:20:34] It's literally just the "squid" config refactored out [11:28:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:29:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.174 second response time [11:37:38] New patchset: Reedy; "Refactor out session code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63839 [11:39:41] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:40:31] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [11:50:51] RECOVERY - Disk space on ms-be1009 is OK: DISK OK [11:59:21] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:07:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:07:52 UTC 2013 [12:08:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:08:54 UTC 2013 [12:09:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:09:51 UTC 2013 [12:10:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:44] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:10:40 UTC 2013 [12:11:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:11:55 UTC 2013 [12:12:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:14:53 UTC 2013 [12:15:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:17:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [12:39:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:40:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.156 second response time [12:54:11] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [13:00:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:02:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [13:22:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time [13:31:47] New patchset: coren; "Tool Labs: Add procmail to mail handling chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63846 [13:46:09] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63846 [13:52:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [14:05:36] New review: Ottomata; "Hm, cool, thank you! The syntax error caused by the comma after the last class parameter must be an..." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [14:06:39] New patchset: Mark Bergsma; "Imported Upstream version 7.32.3" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63851 [14:06:39] New patchset: Mark Bergsma; "Imported Debian patch 7.32.3-1" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63852 [14:06:40] New patchset: Mark Bergsma; "Backport to Ubuntu 12.04" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63853 [14:07:57] New patchset: Ottomata; "Puppetizing Hadoop for CDH4." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:08:39] Change abandoned: Ottomata; "Applying this change to as yet unmerged https://gerrit.wikimedia.org/r/#/c/61710/" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [14:11:39] New review: Ottomata; "Alex responded with questions about this changeset in an email. For posterity sake I will paste his..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:25:22] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [14:26:22] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [14:27:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.152 second response time [14:34:04] New review: Ottomata; "> So the CDH4 guys provide both deb packages for ubuntu precise amd64 as well as debian source packa..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:40:19] Change merged: Mark Bergsma; [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63852 [14:40:46] Change merged: Mark Bergsma; [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63853 [14:55:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [15:00:01] New patchset: Hashar; "jenkins::slave and a basic role applied gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:02:51] New review: coren; "Straightforward enough" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63666 [15:04:28] New review: coren; "LGM" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63801 [15:04:40] New patchset: Demon; "Deprecate $name param to systemuser in favor of $title" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60302 [15:06:35] New patchset: Hashar; "jenkins::slave and a basic role applied gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:06:45] New review: Hashar; "rebased" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:08:17] New review: coren; "Still good after a rebase. 
:-)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63666 [15:09:23] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:09:55] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63801 [15:24:17] PROBLEM - Host analytics1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1013 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1025 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1006 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1004 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1008 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:18] PROBLEM - Host analytics1002 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:18] PROBLEM - Host analytics1010 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:19] PROBLEM - Host vanadium is DOWN: PING CRITICAL - Packet loss = 100% [15:24:19] PROBLEM - Host analytics1012 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1015 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1017 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1014 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1009 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:38] PROBLEM - Host analytics1027 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1026 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1003 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1018 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1011 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1016 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1019 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:57] PROBLEM - RAID on ms-be1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:24:57] PROBLEM - Host analytics1024 is DOWN: PING CRITICAL - Packet loss = 100% [15:25:17] PROBLEM - Host analytics1022 is DOWN: PING CRITICAL - Packet loss = 100% [15:25:20] :) [15:25:28] that you? 
[15:25:47] PROBLEM - Host analytics1023 is DOWN: PING CRITICAL - Packet loss = 100% [15:26:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:51] yes [15:28:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:29:21] I guess I'll allow icmp replies to * [15:29:24] echo anyway [15:30:16] haha [15:31:17] RECOVERY - Host analytics1006 is UP: PING WARNING - Packet loss = 66%, RTA = 0.34 ms [15:31:17] RECOVERY - Host analytics1009 is UP: PING WARNING - Packet loss = 66%, RTA = 0.36 ms [15:31:17] RECOVERY - Host analytics1014 is UP: PING WARNING - Packet loss = 86%, RTA = 0.38 ms [15:31:17] RECOVERY - Host analytics1022 is UP: PING WARNING - Packet loss = 86%, RTA = 0.33 ms [15:31:17] RECOVERY - Host analytics1025 is UP: PING WARNING - Packet loss = 80%, RTA = 0.38 ms [15:31:17] RECOVERY - Host analytics1023 is UP: PING WARNING - Packet loss = 28%, RTA = 0.99 ms [15:31:27] RECOVERY - Host analytics1024 is UP: PING OK - Packet loss = 0%, RTA = 3.07 ms [15:31:28] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms [15:31:28] RECOVERY - Host analytics1017 is UP: PING WARNING - Packet loss = 37%, RTA = 1.18 ms [15:31:28] RECOVERY - Host analytics1020 is UP: PING WARNING - Packet loss = 37%, RTA = 2.06 ms [15:31:28] RECOVERY - Host analytics1027 is UP: PING WARNING - Packet loss = 37%, RTA = 0.51 ms [15:31:28] RECOVERY - Host vanadium is UP: PING WARNING - Packet loss = 50%, RTA = 1.79 ms [15:31:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:37] RECOVERY - Host analytics1008 is UP: PING WARNING - Packet loss = 66%, RTA = 1.40 ms [15:31:37] RECOVERY - Host analytics1002 is UP: PING WARNING - Packet loss = 66%, RTA = 0.61 ms [15:31:37] RECOVERY - Host analytics1005 is UP: PING WARNING - Packet loss = 66%, RTA = 1.02 ms [15:31:37] RECOVERY - Host analytics1003 is UP: PING WARNING - Packet loss = 44%, RTA = 1.02 ms [15:31:38] RECOVERY - Host analytics1004 is UP: PING WARNING - Packet loss = 44%, RTA = 1.39 ms [15:31:38] RECOVERY - Host analytics1019 is UP: PING WARNING - Packet loss = 44%, RTA = 0.31 ms [15:31:47] RECOVERY - Host analytics1018 is UP: PING WARNING - Packet loss = 54%, RTA = 0.88 ms [15:31:47] RECOVERY - Host analytics1026 is UP: PING WARNING - Packet loss = 54%, RTA = 1.01 ms [15:31:47] RECOVERY - Host analytics1011 is UP: PING WARNING - Packet loss = 54%, RTA = 1.10 ms [15:31:47] RECOVERY - Host analytics1016 is UP: PING WARNING - Packet loss = 54%, RTA = 0.30 ms [15:31:47] RECOVERY - Host analytics1010 is UP: PING WARNING - Packet loss = 58%, RTA = 0.31 ms [15:31:47] RECOVERY - RAID on ms-be1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [15:31:48] RECOVERY - Host analytics1021 is UP: PING WARNING - Packet loss = 66%, RTA = 0.31 ms [15:31:48] RECOVERY - Host analytics1012 is UP: PING WARNING - Packet loss = 73%, RTA = 0.35 ms [15:31:57] RECOVERY - Host analytics1013 is UP: PING WARNING - Packet loss = 54%, RTA = 1.03 ms [15:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time [15:34:29] !log rebooting boron [15:34:38] Logged the message, Master [15:45:30] PROBLEM - NTP on analytics1009 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1023 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on 
analytics1025 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1020 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1015 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1006 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1026 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1019 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1017 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:44] isn't this fun [15:46:00] PROBLEM - NTP on analytics1014 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:10] PROBLEM - NTP on analytics1022 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:10] PROBLEM - NTP on analytics1024 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1012 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1027 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1013 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1018 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1005 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on vanadium is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:31] PROBLEM - NTP on analytics1004 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:31] PROBLEM - NTP on analytics1002 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:32] PROBLEM - NTP on analytics1008 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1010 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1011 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1016 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:50] PROBLEM - NTP on analytics1021 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:50] PROBLEM - NTP on analytics1003 is CRITICAL: NTP CRITICAL: No response from NTP server [15:47:20] RECOVERY - NTP on analytics1023 is OK: NTP OK: Offset -0.0004553794861 secs [15:47:20] RECOVERY - NTP on analytics1009 is OK: NTP OK: Offset -0.0006055831909 secs [15:47:20] RECOVERY - NTP on analytics1027 is OK: NTP OK: Offset 0.0005110502243 secs [15:47:20] RECOVERY - NTP on analytics1012 is OK: NTP OK: Offset -0.0003294944763 secs [15:47:20] RECOVERY - NTP on analytics1018 is OK: NTP OK: Offset 0.0002799034119 secs [15:47:21] RECOVERY - NTP on analytics1013 is OK: NTP OK: Offset 0.0003694295883 secs [15:47:21] RECOVERY - NTP on analytics1025 is OK: NTP OK: Offset 0.0001076459885 secs [15:47:22] RECOVERY - NTP on analytics1005 is OK: NTP OK: Offset -2.503395081e-06 secs [15:47:30] RECOVERY - NTP on vanadium is OK: NTP OK: Offset -0.00106549263 secs [15:47:30] RECOVERY - NTP on analytics1004 is OK: NTP OK: Offset -0.0002516508102 secs [15:47:30] RECOVERY - NTP on analytics1020 is OK: NTP OK: Offset -0.001540660858 secs [15:47:30] RECOVERY - NTP on analytics1002 is OK: NTP OK: Offset -0.000706076622 secs [15:47:30] RECOVERY - NTP on analytics1008 is OK: NTP OK: Offset 0.0004247426987 secs [15:47:31] RECOVERY - NTP on analytics1015 is OK: NTP OK: Offset -7.307529449e-05 secs [15:47:31] RECOVERY - NTP on 
analytics1006 is OK: NTP OK: Offset -0.001481533051 secs [15:47:32] RECOVERY - NTP on analytics1011 is OK: NTP OK: Offset -0.0001208782196 secs [15:47:32] RECOVERY - NTP on analytics1010 is OK: NTP OK: Offset 0.000422000885 secs [15:47:40] RECOVERY - NTP on analytics1016 is OK: NTP OK: Offset -0.0006670951843 secs [15:47:40] RECOVERY - NTP on analytics1017 is OK: NTP OK: Offset -0.0001883506775 secs [15:47:40] RECOVERY - NTP on analytics1019 is OK: NTP OK: Offset -0.0002484321594 secs [15:47:40] RECOVERY - NTP on analytics1026 is OK: NTP OK: Offset -0.002911329269 secs [15:47:40] RECOVERY - NTP on analytics1021 is OK: NTP OK: Offset 0.0002522468567 secs [15:47:50] RECOVERY - NTP on analytics1003 is OK: NTP OK: Offset 0.0004156827927 secs [15:48:00] RECOVERY - NTP on analytics1014 is OK: NTP OK: Offset -0.0004264116287 secs [15:48:00] RECOVERY - NTP on analytics1022 is OK: NTP OK: Offset 0.0007469654083 secs [15:48:00] RECOVERY - NTP on analytics1024 is OK: NTP OK: Offset 0.0003401041031 secs [16:01:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [16:06:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:07:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [16:08:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:07:57 UTC 2013 [16:08:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:09:06 UTC 2013 [16:09:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:20] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:10:13 UTC 2013 [16:10:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:31] New patchset: Demon; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [16:11:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:11:03 UTC 2013 [16:11:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:11:53 UTC 2013 [16:12:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:12:36 UTC 2013 [16:13:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:10] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:14:52 UTC 2013 [16:15:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:21:17] New patchset: Hashar; "contint: java definitions are now in the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63866 [16:21:50] gallium has a broken puppet, should be fixed with https://gerrit.wikimedia.org/r/6386 [16:21:59] err https://gerrit.wikimedia.org/r/63866 [16:22:03] must 
leave ttyl [16:23:35] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63213 [16:26:09] New patchset: Andrew Bogott; "Install RT4 on magnesium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63867 [16:36:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:37:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [16:48:48] New patchset: coren; "Tool Labs: Have procmail router use procmail" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63870 [16:49:30] New review: coren; "Trivial fix" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63870 [16:49:30] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63870 [16:56:59] !log authdns update, cleaning out old decom entries [16:57:07] Logged the message, RobH [17:02:39] New patchset: RobH; "decom ocg1-3, removing eqiad entries that werent decoms" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63873 [17:03:13] RobH, I want to apply my RT manifest on magnesium. Can you stand by to verify that it doesn't break racktables? [17:03:47] about to start deployment (zero ext) [17:03:58] yurik: you can also use !log [17:04:11] yurik: that puts it in http://wikitech.wikimedia.org/view/Server_admin_log (aka SAL) [17:04:26] paravoid, yes, but haven't started yet - need to double check something first -- connected to tin, what path is everything in? [17:04:29] yurik: so if there's an effect that's noticed hours from now, people can correlate timestamps there with graphs and other historical data [17:04:51] Tim said /a/common, but I've never deployed mediawiki :) [17:05:04] ohh boy, should be fun [17:05:14] the deployment instructions were updated, i think [17:05:41] so from all the back and forth, tin is the one to use ... i hope [17:10:18] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63867 [17:11:26] deploys from tin, yet [17:11:28] *yes [17:15:30] AaronSchulz: morning [17:16:20] bonjour [17:16:35] could you help me decipher https://graphite.wikimedia.org/render/?title=Top%208%20FileBackend%20Methods%20by%20Max%2090th%20Percentile%20Time%20%28ms%29%20log%282%29%20-1day&from=-1day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&logBase=2&lineWidth=1&target=cactiStyle%28substr%28highestMax%28FileBackendStore.*.tp90,8%29,0,2%29%29 ? [17:16:54] what is doQuickOperationsInternal & executeOpHandlesInternal specifically [17:17:09] also, the ones that end in -ceph are obviously Ceph [17:17:20] yurik: yeah, sorry about the back and forth :( tin is it [17:17:24] are all the rest Swift? 
one of them has -swift but the rest do not [17:18:47] (a disk failed at 02:00am that resulted in a ~1min outage, plus a ~10h recovery where its data was redistributed on the rest of the cluster) [17:19:01] the ones without -swift or -ceph are both combined [17:19:13] you can look at things like highestCurrent(FileBackendStore.*-ceph.tp90,10) in graphite browser to just see ceph stuff [17:20:46] New patchset: Demon; "Puppetize gitblit configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61036 [17:21:45] aha [17:21:54] New patchset: Demon; "Puppetize gitblit configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61036 [17:22:18] executeOpHandlesInternal is used when curl_multi() is used for some operations [17:22:44] doQuickOperationsInternal always uses that and doOperationsInternal uses it when there is more than one op [17:22:58] ...hmm, the former could probably be like the later in the regard [17:23:43] anyway, executeOpHandlesInternal is only for write operations as well [17:28:23] New patchset: Odder; "(bug 48479) Change default favicon in InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63877 [17:28:37] New patchset: coren; "Tool Labs: tweaks to exim config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63878 [17:28:47] New patchset: Demon; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [17:31:58] paravoid: of course, there is tp50 and tp99, though you already know that :) [17:32:04] yeah [17:32:05] New review: coren; "Trivial fix" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63878 [17:32:05] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63878 [17:32:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:48] tp90 was the most interesting here [17:33:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.589 second response time [17:33:35] a failed disk isn't something that's going to happen every day, but it's nice to know the effects of it and possibly reduce the effect it has on performance [17:34:06] we sure had a lot of failed disks though [17:34:10] in both pmtpa & eqiad [17:34:14] zack's complaining of 403's when trying to access dumps URLs from Brussels. is there a known issue? [17:34:44] PROBLEM - RAID on es1001 is CRITICAL: CRITICAL: Degraded [17:34:59] New patchset: Andrew Bogott; "Rearranged RT apache includes:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63880 [17:35:59] oh, myabe this is it: 2013-05-15 17:35:52: (mod_evasive.c.183) 60.191.2.238 turned away. Too many connections. [17:36:10] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63880 [17:37:50] New review: Demon; "Setting -1 until we're ready for this. Ironing out a few rough edges first." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/63514 [17:38:35] paravoid: was the failed disk what made all the frontends give warnings about being down for a minute? [17:38:37] yes [17:38:44] was that the same incident I pinged you about? 
[17:38:54] I didn't see any pings [17:39:10] I mentioned it in IRC on this channel afaik [17:39:12] 04:55 < Aaron|home> hrm [17:39:12] 04:56 < Aaron|home> eh, not too much on flourine [17:39:17] that's what you mentioned at the time [17:39:42] anyway, why would that happen? Was it an artifact of the health check using a file that happened to have a primary osd fail? [17:39:59] I don't think so [17:40:04] ceph tends to be... unstable during peering [17:40:32] yeah, that seems to be a pattern [17:43:25] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [17:43:33] Logged the message, Master [17:44:37] AaronSchulz: so, a couple more questions for you [17:44:51] AaronSchulz: what was the threshold on whether a container should be sharded or not? [17:45:04] there are a few containers with quite a lot of objects [17:45:09] e.g. wikisource-ar-local-thumb has ~300k objects [17:45:32] I think it was based on #of originals (included deleted) >= 25k [17:45:48] 300k thumbs is not that big though [17:46:09] think "web scale" :p [17:46:29] it's more than other containers' shards combined [17:46:34] mw1173 could not be sshed to [17:46:43] during dir-sync [17:49:15] AaronSchulz: next question :) are the pageNNN thumbs for pdfs/tiffs generated on demand? [17:49:32] or does this work via the job queue? [17:49:32] !log deploying new project filter on virt0 [17:49:34] (like transcoded) [17:49:39] Logged the message, Master [17:50:18] this is semi-related [17:50:41] wikisource-ar-local-public has 4279 objects [17:50:49] and 302452 thumbs [17:51:13] New patchset: Ori.livneh; "Set common rsync and dsh parameters in mw-deployment-vars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [17:51:51] paravoid: page thumbs are on demand [17:52:20] but when you request e.g. page5, is it just page5 that gets generated or all of them? [17:54:31] New patchset: Hashar; "contint: java definitions are now in the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63866 [17:55:49] page 5 only [17:56:00] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [17:56:07] Logged the message, Master [17:56:12] okay [17:56:18] so these 300k thumbs were actually requested [17:56:21] amazing [17:57:12] final question for now [17:57:29] !log authdns update, killin dataset1 entries [17:57:33] AaronSchulz: you had a script to set x-content-duration retroactively, right? [17:57:36] Logged the message, RobH [17:57:56] AaronSchulz: when we started syncing files back in December, we had ceph 0.55 and that had a bug and custom headers were not set [17:58:21] AaronSchulz: everything post 0.56 should had its x-content-duration copied, but a few files copied before that wouldn't have it [17:58:52] would your script work in this case as-is or should I write something in python to sync that information? [17:59:52] the copy scripts I have don't deal with headers like that (though it handles sha1) [17:59:54] I can just re-run setFileHeaders.php [18:00:02] or refreshFileHeaders...whatever I called it [18:00:14] yeah, that's one what I was talking about [18:00:32] is it smart enough to ignore files that already have x-content-duration though? [18:00:42] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend/ [18:00:52] Logged the message, Master [18:03:09] paravoid: no, it just does the POST anyway, which does nothing if it has headers [18:03:31] not that many files have them anyway [18:03:47] but it fetches the video file to get its duration? 
[18:04:00] MW has the metadata in a field in the DB [18:04:28] ah! [18:04:38] makes sense [18:05:50] so, can you run it on terbium or tell me what to run specifically? [18:05:59] (it is terbium, right? :) [18:08:04] now it is, yeah [18:13:47] New patchset: Dzahn; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [18:14:05] ^demon: What's going on with Gerrit. It's sssslllllooooooooowwwww [18:14:18] <^demon> Yeah, I saw. [18:14:21] <^demon> Nothing in logs :\ [18:14:50] <^demon> Java's got cpu pegged at 100 tho. [18:16:27] <^demon> Tons of people trying to clone/fetch mediawiki/core anonymously. [18:16:31] Oh hah [18:16:47] <^demon> Lemme check access logs. [18:16:53] ugh, is there a hackathon somewhere [18:17:02] <^demon> AMS [18:17:14] Not yet, that's a few weeks out [18:17:39] quick, setup a amsterdam mirror for gerrit! [18:17:53] RoanKattouw: what was the git-review commit that makes me have to do fetch gerrit all the time now? [18:17:54] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: fishbowl, special, wikimedia and private to 1.22wmf4 [18:18:02] Logged the message, Master [18:18:18] ^demon: is this a single threaded process? [18:18:19] failmonitor doesn't work on tin :( [18:18:23] AaronSchulz: It's linked from my wikitech-l post. I haven't had time to chase them about it and I lost my browsing history so I don't have the link handy [18:18:28] yurik: Run it on fenari then [18:18:32] <^demon> AaronSchulz: No. [18:18:39] As the logs still seemingly go to NFS [18:18:44] how many cores are pegged then? [18:18:51] Reedy, i did, but hope that it can be ran on tin later [18:18:52] is seems like gerrit goes down eaasily [18:18:58] ^demon: I wonder if we couldn't proxy https://gerrit.wikimedia.org (anonymous clones) out to the other server that runs gitblit and have it be served by real git as opposed to jgit? [18:19:11] does failmonitor work on terbium? [18:19:17] yurik: Should be able to. When the logs are moved. [18:20:13] <^demon> RoanKattouw: We could, possibly. [18:20:32] paravoid: mwscriptwikiset maintenance/refreshFileHeaders.php all.dblist [18:21:33] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikivoyage and wiktionary to 1.22wmf4 [18:21:41] Logged the message, Master [18:24:40] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [18:24:57] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiversity and wikibooks to 1.22wmf4 [18:25:04] Logged the message, Master [18:25:36] <^demon> I'm killing some of these long-running clones. [18:26:15] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikinews and wikiquote to 1.22wmf4 [18:26:22] Logged the message, Master [18:28:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: aaaand wikisource to 1.22wmf4 [18:28:20] Logged the message, Master [18:31:52] New patchset: Reedy; "Everything non 'pedia to 1.22wmf4" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63893 [18:32:43] ^demon: im getting 500s from gerrit [18:32:59] <^demon> Yeah, something's not right. 
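As a concrete sketch of the header-refresh run discussed above, here is the invocation AaronSchulz gives, annotated with the behaviour described in the conversation (this assumes the usual maintenance-host setup on terbium; the comments paraphrase the chat, not the script's own documentation):

```sh
# Run refreshFileHeaders.php across every wiki in all.dblist. Per the discussion: it
# simply re-POSTs file metadata, which does nothing for files whose headers (e.g.
# X-Content-Duration) are already set, and the duration comes from the metadata field
# MediaWiki keeps in the DB, so the video files themselves are not re-fetched.
mwscriptwikiset maintenance/refreshFileHeaders.php all.dblist
```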
[18:33:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63893 [18:34:31] <^demon> !log restarting gerrit [18:34:39] Logged the message, Master [18:35:31] hate hate hate working with gerrit fatal: The remote end hung up unexpectedly73) fatal: protocol error: bad pack header [18:35:55] <^demon> It's not gerrit's fault. [18:36:18] what should I hate then? [18:36:28] The people DoSing it? [18:36:35] Jeff_Green: Computers [18:36:36] And also the ease with which it can be DoSed [18:36:47] sigh [18:36:55] i think i'm going to stick with my original statement then [18:36:57] New review: Dzahn; "kill svn ftw" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63864 [18:36:58] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [18:37:02] <^demon> RoanKattouw: So, I'm totally cool with proxying anon clones to gitblit. [18:40:05] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: commons back to 1.22wmf3 due to broken UploadWizard [18:40:12] Logged the message, Master [18:40:13] ^demon: your SVN cleanup changed has been merged and applied, no issues [18:41:25] <^demon> Sweet thanks. [18:41:41] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: commons back to 1.22wmf4 [18:41:49] Logged the message, Master [18:42:14] !log cherry-picked change 62100 on virt0 until permanent fix for jquery.chosen minimum width is added [18:42:22] Logged the message, Master [18:52:37] ^demon: if I have an existing git repo -- and an existing svn repo (that arent related) -- is there a way to merge the SVN repo into the git repo and preserve history? [18:53:36] <^demon> Yes. You'd convert the svn repo to a git repo (so you'd have 2 git repos). Then you'd do what's called a "subtree merge" to merge one repo into the other. [18:54:43] cool :) [19:04:02] New patchset: Ryan Lane; "(DO NOT MERGE) change pmtpa virt cluster to folsom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [19:04:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:08:18] !log reedy synchronized php-1.22wmf4/resources/ 'touch!' [19:08:25] Logged the message, Master [19:09:04] !log reedy synchronized wmf-config/ 'touch!' [19:09:11] Logged the message, Master [19:13:26] hi, trying to debug an issue in production. Is there a way to see the URL which caused an entry in apache log? [19:14:11] yurik: A PHP error/warning? [19:14:40] RoanKattouw, yes [19:15:17] You can try looking at the various log files in fluorine:/a/mw-log/ [19:17:52] !log disabling puppet on pmtpa virt cluster [19:17:59] Logged the message, Master [19:18:32] !log adding temporary ubuntu-cloud archive apt changes in a non-puppetized way on pmtpa virt nodes [19:18:39] Logged the message, Master [19:24:31] !log aaron cleared profiling data [19:24:38] Logged the message, Master [19:26:27] New patchset: Andrew Bogott; "Further twaddling of deps to get request-tracker and racktables on the same box." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63900 [19:27:17] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63900 [19:28:50] New review: Akosiaris; "I think we can use the cloudera packages directly . We don't gain something specific from rebuilding..." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [19:33:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:35:59] Reedy: terbium foundationwiki Error connecting to db1025.eqiad.wmnet: Unknown MySQL server host 'db1025.eqiad.wmnet' [19:36:03] did anyone ever look at that? [19:37:32] AaronSchulz: we discussed it in IRC but I don't know what happened after that [19:37:47] whatever it is should be temporarily switched to use db78.pmtpa.wmnet [19:38:09] mutante, robh -- short of modifying DNS, is there a way for me to browse to the RT server I just installed on magnesium? (I can change the virtual host in the apache config to… whatever.) [19:38:29] port forwarding? [19:38:41] andrewbogott: hack your /etc/hosts ? [19:38:57] Oh, I suppose it must have a public IP already... [19:38:58] oh, you already installed, lemme take a look [19:39:03] that's simple enough :) [19:39:38] AaronSchulz: it must be this: extensions/ContributionReporting/PopulateFundraisingStatistics.php [19:39:51] is it a cron, can it be turned off? [19:40:01] it's a cron, and I don't know [19:40:14] my guess is it's some foundationwiki reporting thing [19:40:15] that's the second time that broke and it's a pita to find anyone to fix it [19:40:21] Ryan_Lane: there's a cloud archive puppet stanza at swift.pp [19:40:26] Ryan_Lane: if you want to copy that [19:40:32] AaronSchulz: imo it should be burned at the stake [19:40:53] probably best to have it moved to the new fundraising public reporting server--I'll talk to fr-tech folks about redoing it [19:42:27] andrewbogott: looks like i still get racktables.. be back in a moment [19:42:28] andrewbogott: sorry, but yea, i do what mutante says [19:42:32] and local change my /etc/hosts [19:42:42] I'm trying too. Can't get anything but racktables... [19:42:47] but that might be because RT isn't currently working [19:43:17] andrewbogott: uhh, rt.wikimedia.org? [19:43:23] your apache host is for rttest [19:43:29] I just changed it [19:43:33] yeah RobH https://rt.wikimedia.org/ gives "It works! [19:43:33] " [19:43:38] but https://rt.wikimedia.org/index.html works [19:43:54] I will change it back, obviously this is making things worse [19:44:02] New patchset: Krinkle; "(re?)-add misc::deployment::common_scripts to fenari" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62032 [19:44:05] !log running refreshFileHeaders.php on terbium [19:44:13] Logged the message, Master [19:44:21] New review: Krinkle; "Fixed message to have RT reference in a way that Gerrit automatically links." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62032 [19:44:29] mutante: ^ [19:44:42] andrewbogott, that was a problem from earlier though - I noticed it a couple of days ago [19:45:57] New review: Aaron Schulz; "Why does this use the _SOURCE ones? That would only work on tin, right?" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/55059 [19:46:02] Krinkle: thanks, needs path conflict fix.. sigh..i'll look later [19:47:01] New review: Aaron Schulz; "Though I see it did that before...maybe that should be fixed?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55059 [19:51:39] AaronSchulz: works perfectly so far, thanks :) [19:51:46] saved me some time [19:52:19] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63873 [19:55:09] hi, i still can't figure out where the warning is coming from. 
I know where the warning is, but can't figure out how it can happen. I need to find the actual user request (including their IP) to track it down [19:55:27] New patchset: Andrew Bogott; "Specify db1001.eqiad.wmnet rather than just db1001." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63910 [19:56:24] May 15 19:55:15 10.64.16.70 apache2[9389]: PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /usr/local/apache/common-local/php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php on line 564 [19:57:52] i see that message at fenari @ /home/wikipedia/syslog/apache.log [19:59:50] We don't track warnings in any way other than that [19:59:56] You'd have to hack in some debugging [20:00:01] paravoid: I already copied it :) [20:00:06] ah :) [20:00:11] paravoid: but I don't want to run puppet on these systems yet [20:00:17] yurik, if ( !$config['name'] ) wfDebugLog( 'mobile', print_r( $config, true ) ); [20:00:26] so, temporary non-puppetized changes [20:00:37] Reedy, could i find the entry in the log files? [20:00:55] No [20:01:05] Reedy: https://gerrit.wikimedia.org/r/#/c/63911/ [20:01:32] srsly? [20:01:34] but i thought we log all access? or is it one in 1000 ? we could search by time somehow [20:01:40] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:01:45] Reedy: I need to fix DB about that later [20:02:14] access logs are sampled [20:03:40] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:04:30] !log stopping all openstack services. it won't be possible to log into wikitech during this process [20:04:37] Logged the message, Master [20:04:38] !log disabling OpenStackManager [20:04:45] Logged the message, Master [20:05:07] hm. if I disable that maybe login will be possible :D [20:06:15] yurik: we do log unsampled for the mobile varnishes [20:06:24] <^demon|lunch> Ryan_Lane: Does LDAP go down too? [20:06:31] nope [20:06:37] <^demon|lunch> mmk. [20:06:52] drdee, in that case it should be possible to find the entry based on the time? which log file should i check? [20:06:55] !log aaron synchronized php-1.22wmf4/includes/specials/SpecialActiveusers.php 'df79a829457752d21b04e641fda0041e09d13a9f' [20:07:02] Logged the message, Master [20:07:16] much better [20:07:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:07:49 UTC 2013 [20:08:12] yurik: need some context of what you are trying to achieve [20:08:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:41] but would love to help you! 
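On yurik's problem above -- tying the 19:55:15 PHP warning back to a client request -- the only handle is the timestamp plus whatever request logs exist (sampled for most traffic, unsampled for the mobile varnishes per drdee). A rough Python sketch of that time-window search follows; the log path and tab-separated field layout are invented for illustration and will not match the real log format.

```python
# Rough sketch of the "search the request logs by timestamp" approach. The path and
# field layout are assumptions; the real sampled/unsampled logs have their own formats.
from datetime import datetime, timedelta

WARNING_TS = datetime(2013, 5, 15, 19, 55, 15)   # timestamp of the apache.log entry quoted above
WINDOW = timedelta(seconds=2)
NEEDLE = 'zero.wikipedia.org'                    # hypothetical filter for the affected requests

def candidate_requests(path):
    with open(path) as logfile:
        for line in logfile:
            fields = line.rstrip('\n').split('\t')   # assumed: timestamp, client IP, URL, X-CS, ...
            if len(fields) < 4:
                continue
            try:
                ts = datetime.strptime(fields[0], '%Y-%m-%dT%H:%M:%S')
            except ValueError:
                continue
            if abs(ts - WARNING_TS) <= WINDOW and NEEDLE in fields[2]:
                yield fields

for ts, client_ip, url, x_cs in (f[:4] for f in candidate_requests('/a/log/mobile/requests.tsv')):
    print(client_ip, x_cs, url)                  # enough to work out which carrier (X-CS) hit the bug
```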
[20:08:51] yurik, just a quick hack for logging should be enough [20:08:59] drdee, i am seeing a warning in production, (see above), and need to track the actual request that generated it [20:09:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:09:00 UTC 2013 [20:09:08] ok [20:09:11] !log upgrading nova services on compute nodes [20:09:17] you can even log $_SERVER['QUERY_STRING'] or something [20:09:18] Logged the message, Master [20:09:26] !log upgrading glance, keystone and nova-scheduler on virt0 [20:09:32] drdee, together with the originating ip - so i can figure out the X-CS [20:09:33] Logged the message, Master [20:09:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:40] np [20:09:51] drdee, and would love to learn how to do it myself so i don't bug you next time :))) [20:10:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:10:03 UTC 2013 [20:10:07] would love to show you! [20:10:13] how about quick hangout? [20:10:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:04] drdee, sure, sec [20:11:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:11:00 UTC 2013 [20:11:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:11:52 UTC 2013 [20:11:57] !log aaron cleared profiling data [20:12:04] Logged the message, Master [20:12:06] !log aaron cleared profiling data [20:12:14] Logged the message, Master [20:12:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:36] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:12:32 UTC 2013 [20:12:52] !log backing up keystone, nova, and glance databases [20:12:56] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:56] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] Logged the message, Master [20:13:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:55] !log upgrading schema for keystone, then glance, then nova [20:13:56] PROBLEM - Puppet freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:56] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:02] Logged the message, Master [20:14:46] !log force running puppet on all virt nodes to bring them back up [20:14:53] Logged the message, Master [20:14:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:14:49 UTC 2013 [20:15:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:17:29] New patchset: Ryan Lane; "change pmtpa virt cluster to folsom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [20:17:37] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [20:25:15] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63910 [20:25:46] RECOVERY - Puppet freshness on virt3 is OK: puppet ran at Wed May 15 20:25:43 UTC 2013 [20:27:01] New patchset: Ryan Lane; "Revert "Several minor changes to the openstack manifest:"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63915 [20:27:32] Change merged: Ryan Lane; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/63915 [20:27:40] New patchset: Andrew Bogott; "Revert "Several minor changes to the openstack manifest:"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63916 [20:28:01] andrewbogott: heh. beat you to it ;) [20:28:07] yurik: rt ticket filed [20:28:29] drdee, thanks! [20:28:34] do we use 3ware RAID controllers anywhere ever? [20:28:37] Change abandoned: Andrew Bogott; "ryan just did this" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63916 [20:29:46] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Wed May 15 20:29:37 UTC 2013 [20:38:05] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:05] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [20:50:55] !log maxsem synchronized php-1.22wmf4/extensions/MobileFrontend/ [20:51:02] Logged the message, Master [20:53:56] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend/ [20:54:03] Logged the message, Master [20:55:04] New patchset: Pyoungmeister; "prelabsdb dbs: redacting tables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [20:55:53] !log openstack in pmtpa now at the folsom release [20:55:59] Logged the message, Master [20:56:41] !log OpenStackManager is reenabled on virt0. login to wikitech is working again [20:56:48] Logged the message, Master [20:58:06] wooo [20:58:09] congrats Ryan_Lane :) [20:58:16] :) [20:58:28] paravoid: check out how fast "get console output" is now :) [20:58:59] it used to take ages on really busy hosts like virt6. now it's super quick [21:00:23] New review: coren; "LGM, but not familiar enough with the sanitarium to +2" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/63924 [21:03:15] New review: Pyoungmeister; "hi coren," [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [21:07:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:57] Coren: nothing? I thought that was a really good response to your comment on my patchset :) [21:08:10] notpeter: I'm still listening. :-) [21:08:18] awesome :) [21:08:33] I actually really really like the second one [21:08:44] perhaps more than the original [21:09:55] My tastes hover more towards the more melodic, though. Like Endeverafter. [21:18:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:21:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:21:42] RECOVERY - Host barium is UP: 72057594037927935 [21:23:22] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:25:12] RECOVERY - Host barium is UP: 72057594037927935 [21:26:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:32] PROBLEM - SSH on barium is CRITICAL: No route to host [21:29:42] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:30:12] RECOVERY - Host barium is UP: 72057594037927935 [21:33:34] need to deploy one file in zero ext to hunt for a bug, hope noone minds [21:33:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:35:11] greg-g, ping ^ [21:36:09] yurik: no proib [21:36:40] ugh, lag caused by flacky WMF wifi [21:36:50] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:37:11] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:40:11] RECOVERY - Host barium is UP: 72057594037927935 [21:42:47] Reedy: Do we have an artificial intelligence module in Apache that I don't know about? I just got an email saying Apache created a wiki ;-) [21:43:00] sudo -u apache [21:43:01] They used to mention the user performing the action. [21:43:02] Which wiki? [21:43:05] iegcom [21:43:08] "just" [21:43:11] lol [21:43:21] These aren't high on my priority list [21:43:30] I just read it, rather. [21:43:31] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:43:31] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:45] wow, been a week uh. [21:44:06] It's #3 chronologically in my krinklemail@gmail inbox right now. [21:44:11] The rest is handled. [21:44:43] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:45:11] RECOVERY - Host barium is UP: 72057594037927935 [21:49:51] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:50:11] RECOVERY - Host barium is UP: 72057594037927935 [21:50:31] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:50:48] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:51:20] Reedy: are you running any revision related scripts? [21:51:34] I don't think so.. [21:52:59] ok [21:53:10] I swear there are more master Revision queries than normal [21:53:25] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [21:53:31] Logged the message, Master [21:53:48] New review: Dzahn; "this will make upcoming new RT and racktables live together on magnesium and use SSL" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63970 [21:53:49] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:57:21] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:00:01] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [22:00:11] RECOVERY - Host barium is UP: 72057594037927935 [22:00:28] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [22:00:35] Logged the message, Master [22:03:41] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:04:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:05:11] RECOVERY - Host barium is UP: 72057594037927935 [22:06:16] New patchset: coren; "Tool Labs: Add packages (user requests)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63979 [22:06:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
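Returning to ^demon's earlier answer about folding an SVN repository into an unrelated git repository while keeping history: below is a sketch of that two-step recipe (convert with git-svn, then a subtree merge), wrapped in Python only to keep these examples in one language. Paths, the remote name and the target prefix are placeholders, and this is the generic recipe, not a procedure anyone above actually ran.

```python
# Sketch of the subtree-merge recipe described earlier in the log.
import subprocess

def git(*args, cwd=None):
    print('$ git ' + ' '.join(args))
    subprocess.check_call(('git',) + args, cwd=cwd)

# Step 1: convert the SVN repository into its own git repository, history included.
git('svn', 'clone', '--stdlayout', 'https://svn.example.org/project', '/tmp/project-from-svn')

# Step 2: subtree-merge the converted repository into the existing git repository.
repo = '/path/to/existing-git-repo'
git('remote', 'add', 'svn-import', '/tmp/project-from-svn', cwd=repo)
git('fetch', 'svn-import', cwd=repo)
# Record a merge of the unrelated history without touching the working tree yet
# (newer git requires --allow-unrelated-histories for this).
git('merge', '-s', 'ours', '--no-commit', '--allow-unrelated-histories', 'svn-import/master', cwd=repo)
# Read the imported tree into a subdirectory, then commit the merge.
git('read-tree', '--prefix=imported-from-svn/', '-u', 'svn-import/master', cwd=repo)
git('commit', '-m', 'Merge SVN history as a subtree under imported-from-svn/', cwd=repo)
```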
[22:06:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:07:01] New review: coren; "Routine, trivial." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63979 [22:07:02] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63979 [22:07:22] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:07:58] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [22:08:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:08:42] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=2&checksum=17570832276017652668 [22:09:03] Reedy: hmm, queries count didn't go up, it just got slower [22:09:10] meh [22:10:02] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:10:12] RECOVERY - Host barium is UP: 72057594037927935 [22:13:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:14:23] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [22:15:42] Jeff_Green: you there? [22:16:22] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:19:40] New review: Asher; "repl_wild_ignore_tables takes "database.table" as the option, so each table will need to be in the f..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/63924 [22:20:12] RECOVERY - Host barium is UP: 72057594037927935 [22:20:47] New patchset: Faidon; "Revoke access for Munagala Ramanath (ram)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63982 [22:21:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:22:42] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:25:12] RECOVERY - Host barium is UP: 72057594037927935 [22:25:29] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63982 [22:25:47] New review: Asher; "Missed the use of stdlib prefix() to take care of the db wildcarding. This looks gtg." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/63924 [22:26:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:27:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.191 second response time [22:29:02] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:29:23] New patchset: Pyoungmeister; "prelabsdb dbs: redacting tables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [22:29:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:29:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:29:41] !log krinkle synchronized php-1.22wmf4/resources/mediawiki/mediawiki.js 'touched' [22:29:47] Logged the message, Master [22:30:12] RECOVERY - Host barium is UP: 72057594037927935 [22:30:32] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:30:37] New review: Dzahn; "wfm on magnesium. 
you can graceful apache without warnings, both sites redirect to SSL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [22:36:32] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:39:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [22:40:08] RECOVERY - Host barium is UP: 72057594037927935 [22:41:32] what does the number tell us? [22:41:55] heh, it's big [22:42:48] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:45:08] RECOVERY - Host barium is UP: 72057594037927935 [22:45:35] ah, barium has been moved to another rack [22:45:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:58] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:49:08] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:49:48] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:50:08] RECOVERY - Host barium is UP: 72057594037927935 [22:50:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:28] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [22:56:38] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:00:08] RECOVERY - Host barium is UP: 72057594037927935 [23:02:58] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:04:56] New patchset: Dzahn; "apache-sanity-check (part of apache-graceful-all) change the regex to also restart API servers" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:05:08] RECOVERY - Host barium is UP: 72057594037927935 [23:05:58] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:06:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:06:48] RECOVERY - DPKG on snapshot2 is OK: All packages OK [23:08:30] New patchset: Dzahn; "apache-sanity-check (part of apache-graceful-all) change the regex to also restart API servers" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:09:19] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:10:09] RECOVERY - Host barium is UP: 72057594037927935 [23:13:19] PROBLEM - Puppet freshness on barium is CRITICAL: No successful Puppet run in the last 10 hours [23:16:49] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:20:09] RECOVERY - Host barium is UP: 72057594037927935 [23:20:22] lol [23:21:00] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:22:09] paravoid: it moved to frack :) [23:23:03] it shouldnt be in icinga anymore [23:23:07] =P [23:23:09] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:23:28] or perhaps it already came out and thats new one, nm. 
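On Asher's review of change 63924 above: replicate-wild-ignore-table matches "database.table" patterns, so bare table names need a database part, which the puppet change supplies with stdlib's prefix(). A small Python illustration of the same prefixing idea, with hypothetical table names rather than the real redaction list:

```python
# Illustration of the point in Asher's review: the option wants "database.table"
# patterns, so each bare table name gets a '%' database wildcard prefixed.
def prefix(items, pre):
    """Rough Python equivalent of puppet-stdlib's prefix() function."""
    return [pre + item for item in items]

tables = ['user_password', 'user_email_token', 'ipblocks_private']  # hypothetical examples
for pattern in prefix(tables, '%.'):
    # e.g. replicate-wild-ignore-table = %.user_password
    print('replicate-wild-ignore-table = %s' % pattern)
```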
[23:23:58] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:24:12] RobH: not in decom, because it stays alive but it moved, so special case [23:24:53] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [23:25:09] RECOVERY - Host barium is UP: 72057594037927935 [23:26:22] ACKNOWLEDGEMENT - NTP on barium is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn RT-5113 [23:26:23] ACKNOWLEDGEMENT - Puppet freshness on barium is CRITICAL: No successful Puppet run in the last 10 hours daniel_zahn RT-5113 [23:26:23] ACKNOWLEDGEMENT - SSH on barium is CRITICAL: No route to host daniel_zahn RT-5113 [23:29:17] !log reedy synchronized database lists files: [23:29:24] Logged the message, Master [23:29:29] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:29:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [23:29:39] blah, i thought that made you shut up, icinga-wm [23:29:47] sticky [23:30:01] schedules a loong downtime instead [23:30:09] RECOVERY - Host barium is UP: 72057594037927935 [23:31:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:32:06] !log reedy synchronized database lists files: [23:32:14] Logged the message, Master [23:34:10] New patchset: Aklapper; "Weekly Bugzilla Report mail: Fix wrong SQL query on bug resolutions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63998 [23:38:10] 367 fault (11) [23:38:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:42:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:43:24] Change merged: Dzahn; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:43:34] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:44:01] !log pushing wikimedia-task-appserver 2.9-1 [23:44:09] Logged the message, Master [23:45:17] New patchset: GWicke; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:46:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:47:34] RECOVERY - DPKG on snapshot2 is OK: All packages OK [23:47:54] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:49:24] RECOVERY - Apache HTTP on srv291 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.282 second response time [23:49:38] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:49:52] New patchset: Reedy; "Remove nomcom entries" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64002 [23:53:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:56:16] New patchset: Pyoungmeister; "lsetting up db77 as clone of db78 for coredb testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64004 [23:59:54] New patchset: Reedy; "Move a few wikis from closed to deleted" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64005
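As an aside on the "Host barium is UP: 72057594037927935" lines that prompted the "what does the number tell us?" question above: the value is exactly 2^56 - 1, i.e. 56 bits all set, which looks like a sentinel or overflowed field coming out of the host check rather than a real measurement (that reading is a guess; the arithmetic is not). A quick check:

```python
# The number icinga-wm keeps printing for barium is 2**56 - 1 (56 bits all set).
n = 72057594037927935
print(n == 2**56 - 1)   # True
print(hex(n))           # 0xffffffffffffff
print(n.bit_length())   # 56
```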