[00:04:49] MaxSem: so… see https://gerrit.wikimedia.org/r/#/c/52553/ i originally did this but for some reason we did the construct thing [00:04:54] not sure if there was a reason.. [00:05:30] was that reason me? [00:06:16] * jdlrobson shrugs :) [00:06:52] yep MaxSem lol: MaxSem: Instead of renaming execute() for every special page, just call clearPageMargins() from constructor. [00:07:17] I came up with a nicer solution [00:08:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:07:54 UTC 2013 [00:08:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:09:03 UTC 2013 [00:10:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:10:08 UTC 2013 [00:11:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:11:05 UTC 2013 [00:12:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:12:38 UTC 2013 [00:13:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:23] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:13:13 UTC 2013 [00:14:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:15:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 00:15:01 UTC 2013 [00:15:34] Reedy, no idea about location - zero doesn't use any of it, only IP-based detection [00:16:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:18:15] New patchset: Tim Starling; "Add generic::mysql::packages::client to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63663 [00:18:23] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63663 [00:28:36] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/63806/' [00:28:44] Logged the message, Master [00:30:41] !log maxsem synchronized php-1.22wmf4/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/63806/' [00:30:48] Logged the message, Master [00:31:45] Thehelpfulone, should be gone [00:36:45] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [00:38:45] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [00:40:11] MaxSem, thanks [00:42:59] New patchset: Ryan Lane; "Adding .gitreview file" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63812 [00:43:23] Change merged: Ryan Lane; [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63812 [00:44:07] New patchset: Akosiaris; "Puppetizing Hadoop for CDH4." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [00:49:14] !log graceful apache on stat1001 [00:49:22] Logged the message, Master [00:57:03] New review: Hydriz; "Hmm, some suggestions:" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/63782 [01:01:05] New patchset: Dzahn; "Weekly Bugzilla Report mail: Fix wrong SQL query on bug resolutions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62220 [01:01:16] New patchset: Akosiaris; "Fix a couple of syntax errors" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [01:03:16] New review: Dzahn; "making another test before it's mailing out stuff" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/62220 [01:03:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62220 [01:07:06] # PHP Deprecated: Comments starting with '#' are deprecated .. bla:) [01:19:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [01:52:26] PROBLEM - Disk space on ms-be1009 is CRITICAL: DISK CRITICAL - /var/lib/ceph/osd/ceph-106 is not accessible: Input/output error [01:53:26] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:36] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:36] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:16] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:16] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:36] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 4.738 second response time [01:54:36] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 8.228 second response time [01:55:06] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [01:55:06] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.008 second response time [01:55:16] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [01:55:19] hrm [01:56:56] eh, not too much on flourine [01:59:06] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [02:08:15] !log LocalisationUpdate completed (1.22wmf3) at Wed May 15 02:08:15 UTC 2013 [02:08:24] Logged the message, Master [02:14:58] !log LocalisationUpdate completed (1.22wmf4) at Wed May 15 02:14:58 UTC 2013 [02:15:06] Logged the message, Master [02:34:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 15 02:34:37 UTC 2013 [02:34:47] Logged the message, Master [02:53:27] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [03:06:37] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21532 bytes in 0.002 second response time [03:18:57] PROBLEM - Host mw1173 is DOWN: PING CRITICAL - Packet loss = 100% [03:23:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:32:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL 
- Socket timeout after 10 seconds [03:33:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:39:17] New review: Ori.livneh; "OK, makes sense. I'll work this over." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [03:57:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.164 second response time [04:08:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:07:56 UTC 2013 [04:08:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:09:06 UTC 2013 [04:09:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:10:10 UTC 2013 [04:10:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:11:07 UTC 2013 [04:11:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:11:56 UTC 2013 [04:12:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:12:41 UTC 2013 [04:13:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:14:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 04:14:49 UTC 2013 [04:15:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:25:13] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [04:26:13] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [04:27:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:28:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [04:35:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:37:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [04:43:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:44] New review: Yurik; "Faidon, I'm not sure I understand -- could you elaborate on the SSL servers? If SSL gateway sets XFF..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [05:04:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:06:00] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:06:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:50] RECOVERY - Disk space on snapshot2 is OK: DISK OK [05:07:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [05:09:30] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:10:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:12:30] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:00] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:40] PROBLEM - SSH on snapshot2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:50] RECOVERY - Disk space on snapshot2 is OK: DISK OK [05:16:31] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [05:18:30] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:18:30] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:25:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [05:31:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:38:55] New patchset: Nemo bis; "Fix path for zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:00:02] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:02] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:02] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [06:01:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.125 second response time [06:14:02] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [06:15:39] New patchset: Nemo bis; "Fix path for roa-rup and zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:16:40] New patchset: Nemo bis; "Fix path for roa-rup and zh-min-nan.wiktionary logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [06:30:12] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:30:08 UTC 2013 [06:30:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:30:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:31:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:30:53 UTC 2013 [06:31:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:33] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:31:31 UTC 2013 [06:32:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [06:32:32] PROBLEM - Puppet freshness on ms2 is 
CRITICAL: No successful Puppet run in the last 10 hours [06:45:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 06:45:03 UTC 2013 [06:45:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [07:06:37] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21539 bytes in 0.014 second response time [07:27:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.326 second response time [07:31:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time [07:39:22] lo [07:50:56] ol [07:51:17] you rang? [07:51:47] ringringring [07:51:51] why yes, I did [07:52:00] hi apergos [07:52:21] ah you are the mediawiki vagrant person right? [07:52:37] ori-l: [07:53:01] i guess so [07:53:06] what's up? [07:53:09] did it crash and burn? [07:53:35] no no [07:53:48] first my compliments on the awesome color banner on login :-D [07:53:56] haha [07:53:59] thanks :) [07:54:47] I'm just trying to understand how to trigger a puppet run, not manually, your readme says it vagrant will 'periodically run it' and I wonder how that works, since I didn't see cron or someting, or even a puppet-agent running on the box [07:55:38] ah the other thing immediately related is that I see the stuff in /vagrant on the guest (which is synced to the host) is a git repo, including therefore the puppet files, do I need to commit changes there in order to have an effect? [07:55:54] ringringring < banana phone? [07:56:01] sorry for the noob qs but there's an awful lot of vagrant docs and I got lost pretty fast [07:56:08] p858snake|l: great, now i'll have that in my head for the next, oh, four days [07:56:23] apergos: not noob questions at all -- in fact i think the first one was my mistake [07:56:43] p858snake|l: I prefer shoe phones myself [07:57:21] it won't run 'periodically'; it'll run on "vagrant up" when you boot the machine, though evidently not always. i should check what the exact logic is. [07:57:30] and i should correct the readme. [07:57:38] ok, and then you have to vagrant provision or whatever it is, to manually run it? [07:57:58] yeah, it's essentially a wrapper around 'puppet apply' [07:58:06] there's a puppetmaster provisioner but that seemed needlessly complex [07:58:17] gotcha [07:58:22] you don't have to commit changes -- the contents of /vagrant aren't synced; they're mounted [07:58:42] I see. because the hmm I think vagrant docs talk about being 'synced' [07:59:02] if theyd just said (I guess I could have checked mount points. anyways) and it's mounted on the guest, [07:59:04] ah well [07:59:14] yeah, vagrant calls it a 'synced_folder', but it's inaccurate [07:59:21] so the point of committing stuff there would be...? 
[07:59:24] it uses VirtualBox Shared Folders by default, NFS if you enable it [07:59:40] ok [08:00:37] i'm hoping to encourage people to contribute puppet roles and modules for extension and other mediawiki-related software that requires some customized setup [08:00:46] for example, the math extension depends on some external renderer written in ocaml [08:00:52] yeah [08:02:10] anyways I like the setup a lot, it's pretty slick [08:02:27] last time I looked at VirtualBox it was much more a PITA [08:02:27] thanks, that's gratifying to hear :) [08:02:59] anyways it seems there is work being done to have libvirt and other providers so that will open it up more [08:03:25] yeah, I hope VMware doesn't hijack it [08:03:39] I think they've thrown some money at the main developer of Vagrant recently [08:03:44] uuhh oohhh [08:04:20] guess there need to be a few more developers :-D [08:04:46] yep. [08:05:22] any other tips about setup and stuff that come to mind, before I go back to my actual work? [08:05:47] what are you hoping to do with it? just checking it out, or evaluating it for some purpose? [08:06:56] well I might try to add some stuff that would allow someone t have a vm to play with dumps scripts (have some content already in the wiki, have the scripts already set up with the right db params and directories to write in, etc) [08:07:09] but also getting a general feel for vagrant and how it works [08:07:44] NFS provides a big performance boost and requires not much more than adding "nfs: true," under the 'config.vm.synced_folder' line in Vagrantfile. i kept it off by default because VirtualBox shared folders work everywhere (including windows) [08:07:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:07:43 UTC 2013 [08:07:53] ohhh [08:08:08] docs here are pretty good: http://docs-v1.vagrantup.com/v1/docs/nfs.html [08:08:10] if I change that I need to vagrant reload or something? [08:08:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:42] yeah, though vagrant reload had a bug in 1.1, not sureif it was fixed in 1.2.. 'vagrant halt && vagrant up' should work, otherwise [08:08:56] ok [08:09:09] oh wow that is a huge performance hit, no kidding [08:11:34] it's just for the files that are shared, mind you, but yeah [08:12:00] anyways i'm off to bed :) but glad you liked it and let me know if i can help with getting a dump module going at all [08:12:08] sure, thanks again! [08:12:14] * ori-l waves [08:15:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:15:11 UTC 2013 [08:15:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:27:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:28:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [08:39:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63826 [08:44:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 08:44:48 UTC 2013 [08:45:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [09:07:04] Reedy: thanks, can you also sync it please? 
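For context, a minimal sketch of the provisioning workflow ori-l describes above, assuming the Vagrant 1.1/1.2 CLI of that era; the Vagrantfile line quoted in the comments is the NFS tweak he mentions, with exact option syntax depending on the Vagrant version rather than copied from mediawiki-vagrant:

```sh
# Re-run Puppet by hand; per ori-l this is essentially a wrapper around 'puppet apply'.
vagrant provision

# For the NFS speedup, add the option he mentions under the synced-folder line in the
# Vagrantfile, e.g. something like:
#     config.vm.synced_folder ".", "/vagrant", nfs: true
# then restart the VM. 'vagrant reload' had a bug in 1.1, so halt/up is the safer path:
vagrant halt && vagrant up
```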
[09:23:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [09:40:52] mark: did you have time to look at rt #5100, about adding a new version of vips to apt.wikimedia.org [09:56:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:57:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [10:12:15] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:15] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:15] PROBLEM - Puppet freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:15] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [10:37:02] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [10:39:02] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [11:02:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [11:06:05] Nemo_bis: lol [11:06:12] Yeah, got distracted due to a low battery [11:07:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 21398 bytes in 0.014 second response time [11:07:42] !log reedy synchronized wmf-config/InitialiseSettings.php [11:07:51] Logged the message, Master [11:08:37] New patchset: Reedy; "Move squid config to realm specific config files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63781 [11:10:37] failing since yesterday [11:10:38] sigh [11:12:14] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63781 [11:13:03] !log reedy synchronized wmf-config/squid-labs.php 'wmf-config/squid.php' [11:13:12] Logged the message, Master [11:13:47] !log reedy synchronized wmf-config/squid.php [11:13:54] Logged the message, Master [11:14:26] !log reedy synchronized wmf-config/squid-labs.php [11:14:34] Logged the message, Master [11:15:39] !log reedy synchronized wmf-config/CommonSettings.php [11:15:46] Logged the message, Master [11:17:01] New patchset: Faidon; "Update HTTP content check for m.wikipedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63837 [11:17:36] apergos: steven has been pinging me to replace those ms-be pmtpa disks, want to handle it instead? 
[11:17:57] I have eqiad disks to handle :> [11:18:01] did them already [11:18:04] oh, cool [11:18:05] great :) [11:18:07] there's anotehr dead one though [11:18:11] I saw a message from last night [11:18:15] from steven [11:18:17] hence my ping [11:18:18] chris will order it [11:18:19] ah [11:18:32] yeah I dunno, I just know I was on line til quite late mucking with them [11:18:43] :) [11:18:43] and got off when all three were back and happy [11:18:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63837 [11:19:10] both me and chris I should say [11:19:24] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 21399 bytes in 0.003 second response time [11:19:42] New patchset: Reedy; "Display squid files on noc conf" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63838 [11:19:56] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63838 [11:20:14] Reedy: I remember hearing they contained "private" information [11:20:19] like blocked IPs etc. [11:20:22] oh [11:20:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [11:20:25] squid.php [11:20:27] never mind me :) [11:20:34] It's literally just the "squid" config refactored out [11:28:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:29:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.174 second response time [11:37:38] New patchset: Reedy; "Refactor out session code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63839 [11:39:41] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:40:31] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [11:50:51] RECOVERY - Disk space on ms-be1009 is OK: DISK OK [11:59:21] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [12:07:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:07:52 UTC 2013 [12:08:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:08:54 UTC 2013 [12:09:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:09:51 UTC 2013 [12:10:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:44] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:10:40 UTC 2013 [12:11:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:11:55 UTC 2013 [12:12:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 12:14:53 UTC 2013 [12:15:24] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:17:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [12:39:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:40:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.156 second response time [12:54:11] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [13:00:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:02:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [13:22:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time [13:31:47] New patchset: coren; "Tool Labs: Add procmail to mail handling chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63846 [13:46:09] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63846 [13:52:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [14:05:36] New review: Ottomata; "Hm, cool, thank you! The syntax error caused by the comma after the last class parameter must be an..." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [14:06:39] New patchset: Mark Bergsma; "Imported Upstream version 7.32.3" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63851 [14:06:39] New patchset: Mark Bergsma; "Imported Debian patch 7.32.3-1" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63852 [14:06:40] New patchset: Mark Bergsma; "Backport to Ubuntu 12.04" [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63853 [14:07:57] New patchset: Ottomata; "Puppetizing Hadoop for CDH4." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:08:39] Change abandoned: Ottomata; "Applying this change to as yet unmerged https://gerrit.wikimedia.org/r/#/c/61710/" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/63815 [14:11:39] New review: Ottomata; "Alex responded with questions about this changeset in an email. For posterity sake I will paste his..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:25:22] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [14:26:22] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [14:27:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.152 second response time [14:34:04] New review: Ottomata; "> So the CDH4 guys provide both deb packages for ubuntu precise amd64 as well as debian source packa..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [14:40:19] Change merged: Mark Bergsma; [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63852 [14:40:46] Change merged: Mark Bergsma; [operations/debs/vips] (master) - https://gerrit.wikimedia.org/r/63853 [14:55:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [15:00:01] New patchset: Hashar; "jenkins::slave and a basic role applied gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:02:51] New review: coren; "Straightforward enough" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63666 [15:04:28] New review: coren; "LGM" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63801 [15:04:40] New patchset: Demon; "Deprecate $name param to systemuser in favor of $title" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60302 [15:06:35] New patchset: Hashar; "jenkins::slave and a basic role applied gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:06:45] New review: Hashar; "rebased" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:08:17] New review: coren; "Still good after a rebase. 
:-)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63666 [15:09:23] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:09:55] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63801 [15:24:17] PROBLEM - Host analytics1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1013 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1025 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1006 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1004 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:17] PROBLEM - Host analytics1008 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:18] PROBLEM - Host analytics1002 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:18] PROBLEM - Host analytics1010 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:19] PROBLEM - Host vanadium is DOWN: PING CRITICAL - Packet loss = 100% [15:24:19] PROBLEM - Host analytics1012 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1015 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1017 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1014 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:37] PROBLEM - Host analytics1009 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:38] PROBLEM - Host analytics1027 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1026 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1003 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1018 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1011 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1016 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1019 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:47] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [15:24:57] PROBLEM - RAID on ms-be1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:24:57] PROBLEM - Host analytics1024 is DOWN: PING CRITICAL - Packet loss = 100% [15:25:17] PROBLEM - Host analytics1022 is DOWN: PING CRITICAL - Packet loss = 100% [15:25:20] :) [15:25:28] that you? 
[15:25:47] PROBLEM - Host analytics1023 is DOWN: PING CRITICAL - Packet loss = 100% [15:26:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:51] yes [15:28:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:29:21] I guess I'll allow icmp replies to * [15:29:24] echo anyway [15:30:16] haha [15:31:17] RECOVERY - Host analytics1006 is UP: PING WARNING - Packet loss = 66%, RTA = 0.34 ms [15:31:17] RECOVERY - Host analytics1009 is UP: PING WARNING - Packet loss = 66%, RTA = 0.36 ms [15:31:17] RECOVERY - Host analytics1014 is UP: PING WARNING - Packet loss = 86%, RTA = 0.38 ms [15:31:17] RECOVERY - Host analytics1022 is UP: PING WARNING - Packet loss = 86%, RTA = 0.33 ms [15:31:17] RECOVERY - Host analytics1025 is UP: PING WARNING - Packet loss = 80%, RTA = 0.38 ms [15:31:17] RECOVERY - Host analytics1023 is UP: PING WARNING - Packet loss = 28%, RTA = 0.99 ms [15:31:27] RECOVERY - Host analytics1024 is UP: PING OK - Packet loss = 0%, RTA = 3.07 ms [15:31:28] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms [15:31:28] RECOVERY - Host analytics1017 is UP: PING WARNING - Packet loss = 37%, RTA = 1.18 ms [15:31:28] RECOVERY - Host analytics1020 is UP: PING WARNING - Packet loss = 37%, RTA = 2.06 ms [15:31:28] RECOVERY - Host analytics1027 is UP: PING WARNING - Packet loss = 37%, RTA = 0.51 ms [15:31:28] RECOVERY - Host vanadium is UP: PING WARNING - Packet loss = 50%, RTA = 1.79 ms [15:31:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:37] RECOVERY - Host analytics1008 is UP: PING WARNING - Packet loss = 66%, RTA = 1.40 ms [15:31:37] RECOVERY - Host analytics1002 is UP: PING WARNING - Packet loss = 66%, RTA = 0.61 ms [15:31:37] RECOVERY - Host analytics1005 is UP: PING WARNING - Packet loss = 66%, RTA = 1.02 ms [15:31:37] RECOVERY - Host analytics1003 is UP: PING WARNING - Packet loss = 44%, RTA = 1.02 ms [15:31:38] RECOVERY - Host analytics1004 is UP: PING WARNING - Packet loss = 44%, RTA = 1.39 ms [15:31:38] RECOVERY - Host analytics1019 is UP: PING WARNING - Packet loss = 44%, RTA = 0.31 ms [15:31:47] RECOVERY - Host analytics1018 is UP: PING WARNING - Packet loss = 54%, RTA = 0.88 ms [15:31:47] RECOVERY - Host analytics1026 is UP: PING WARNING - Packet loss = 54%, RTA = 1.01 ms [15:31:47] RECOVERY - Host analytics1011 is UP: PING WARNING - Packet loss = 54%, RTA = 1.10 ms [15:31:47] RECOVERY - Host analytics1016 is UP: PING WARNING - Packet loss = 54%, RTA = 0.30 ms [15:31:47] RECOVERY - Host analytics1010 is UP: PING WARNING - Packet loss = 58%, RTA = 0.31 ms [15:31:47] RECOVERY - RAID on ms-be1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [15:31:48] RECOVERY - Host analytics1021 is UP: PING WARNING - Packet loss = 66%, RTA = 0.31 ms [15:31:48] RECOVERY - Host analytics1012 is UP: PING WARNING - Packet loss = 73%, RTA = 0.35 ms [15:31:57] RECOVERY - Host analytics1013 is UP: PING WARNING - Packet loss = 54%, RTA = 1.03 ms [15:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time [15:34:29] !log rebooting boron [15:34:38] Logged the message, Master [15:45:30] PROBLEM - NTP on analytics1009 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1023 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on 
analytics1025 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1020 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:30] PROBLEM - NTP on analytics1015 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1006 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1026 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1019 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:40] PROBLEM - NTP on analytics1017 is CRITICAL: NTP CRITICAL: No response from NTP server [15:45:44] isn't this fun [15:46:00] PROBLEM - NTP on analytics1014 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:10] PROBLEM - NTP on analytics1022 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:10] PROBLEM - NTP on analytics1024 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1012 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1027 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1013 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1018 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on analytics1005 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:30] PROBLEM - NTP on vanadium is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:31] PROBLEM - NTP on analytics1004 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:31] PROBLEM - NTP on analytics1002 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:32] PROBLEM - NTP on analytics1008 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1010 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1011 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:40] PROBLEM - NTP on analytics1016 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:50] PROBLEM - NTP on analytics1021 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:50] PROBLEM - NTP on analytics1003 is CRITICAL: NTP CRITICAL: No response from NTP server [15:47:20] RECOVERY - NTP on analytics1023 is OK: NTP OK: Offset -0.0004553794861 secs [15:47:20] RECOVERY - NTP on analytics1009 is OK: NTP OK: Offset -0.0006055831909 secs [15:47:20] RECOVERY - NTP on analytics1027 is OK: NTP OK: Offset 0.0005110502243 secs [15:47:20] RECOVERY - NTP on analytics1012 is OK: NTP OK: Offset -0.0003294944763 secs [15:47:20] RECOVERY - NTP on analytics1018 is OK: NTP OK: Offset 0.0002799034119 secs [15:47:21] RECOVERY - NTP on analytics1013 is OK: NTP OK: Offset 0.0003694295883 secs [15:47:21] RECOVERY - NTP on analytics1025 is OK: NTP OK: Offset 0.0001076459885 secs [15:47:22] RECOVERY - NTP on analytics1005 is OK: NTP OK: Offset -2.503395081e-06 secs [15:47:30] RECOVERY - NTP on vanadium is OK: NTP OK: Offset -0.00106549263 secs [15:47:30] RECOVERY - NTP on analytics1004 is OK: NTP OK: Offset -0.0002516508102 secs [15:47:30] RECOVERY - NTP on analytics1020 is OK: NTP OK: Offset -0.001540660858 secs [15:47:30] RECOVERY - NTP on analytics1002 is OK: NTP OK: Offset -0.000706076622 secs [15:47:30] RECOVERY - NTP on analytics1008 is OK: NTP OK: Offset 0.0004247426987 secs [15:47:31] RECOVERY - NTP on analytics1015 is OK: NTP OK: Offset -7.307529449e-05 secs [15:47:31] RECOVERY - NTP on 
analytics1006 is OK: NTP OK: Offset -0.001481533051 secs [15:47:32] RECOVERY - NTP on analytics1011 is OK: NTP OK: Offset -0.0001208782196 secs [15:47:32] RECOVERY - NTP on analytics1010 is OK: NTP OK: Offset 0.000422000885 secs [15:47:40] RECOVERY - NTP on analytics1016 is OK: NTP OK: Offset -0.0006670951843 secs [15:47:40] RECOVERY - NTP on analytics1017 is OK: NTP OK: Offset -0.0001883506775 secs [15:47:40] RECOVERY - NTP on analytics1019 is OK: NTP OK: Offset -0.0002484321594 secs [15:47:40] RECOVERY - NTP on analytics1026 is OK: NTP OK: Offset -0.002911329269 secs [15:47:40] RECOVERY - NTP on analytics1021 is OK: NTP OK: Offset 0.0002522468567 secs [15:47:50] RECOVERY - NTP on analytics1003 is OK: NTP OK: Offset 0.0004156827927 secs [15:48:00] RECOVERY - NTP on analytics1014 is OK: NTP OK: Offset -0.0004264116287 secs [15:48:00] RECOVERY - NTP on analytics1022 is OK: NTP OK: Offset 0.0007469654083 secs [15:48:00] RECOVERY - NTP on analytics1024 is OK: NTP OK: Offset 0.0003401041031 secs [16:01:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [16:06:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:07:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [16:08:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:07:57 UTC 2013 [16:08:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:09:06 UTC 2013 [16:09:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:20] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:10:13 UTC 2013 [16:10:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:31] New patchset: Demon; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [16:11:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:11:03 UTC 2013 [16:11:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:11:53 UTC 2013 [16:12:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:12:36 UTC 2013 [16:13:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:10] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 16:14:52 UTC 2013 [16:15:30] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:21:17] New patchset: Hashar; "contint: java definitions are now in the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63866 [16:21:50] gallium has a broken puppet, should be fixed with https://gerrit.wikimedia.org/r/6386 [16:21:59] err https://gerrit.wikimedia.org/r/63866 [16:22:03] must 
leave ttyl [16:23:35] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63213 [16:26:09] New patchset: Andrew Bogott; "Install RT4 on magnesium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63867 [16:36:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:37:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [16:48:48] New patchset: coren; "Tool Labs: Have procmail router use procmail" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63870 [16:49:30] New review: coren; "Trivial fix" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63870 [16:49:30] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63870 [16:56:59] !log authdns update, cleaning out old decom entries [16:57:07] Logged the message, RobH [17:02:39] New patchset: RobH; "decom ocg1-3, removing eqiad entries that werent decoms" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63873 [17:03:13] RobH, I want to apply my RT manifest on magnesium. Can you stand by to verify that it doesn't break racktables? [17:03:47] about to start deployment (zero ext) [17:03:58] yurik: you can also use !log [17:04:11] yurik: that puts it in http://wikitech.wikimedia.org/view/Server_admin_log (aka SAL) [17:04:26] paravoid, yes, but haven't started yet - need to double check something first -- connected to tin, what path is everything in? [17:04:29] yurik: so if there's an effect that's noticed hours from now, people can correlate timestamps there with graphs and other historical data [17:04:51] Tim said /a/common, but I've never deployed mediawiki :) [17:05:04] ohh boy, should be fun [17:05:14] the deployment instructions were updated, i think [17:05:41] so from all the back and forth, tin is the one to use ... i hope [17:10:18] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63867 [17:11:26] deploys from tin, yet [17:11:28] *yes [17:15:30] AaronSchulz: morning [17:16:20] bonjour [17:16:35] could you help me decipher https://graphite.wikimedia.org/render/?title=Top%208%20FileBackend%20Methods%20by%20Max%2090th%20Percentile%20Time%20%28ms%29%20log%282%29%20-1day&from=-1day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&logBase=2&lineWidth=1&target=cactiStyle%28substr%28highestMax%28FileBackendStore.*.tp90,8%29,0,2%29%29 ? [17:16:54] what is doQuickOperationsInternal & executeOpHandlesInternal specifically [17:17:09] also, the ones that end in -ceph are obviously Ceph [17:17:20] yurik: yeah, sorry about the back and forth :( tin is it [17:17:24] are all the rest Swift? 
one of them has -swift but the rest do not [17:18:47] (a disk failed at 02:00am that resulted in a ~1min outage, plus a ~10h recovery where its data was redistributed on the rest of the cluster) [17:19:01] the ones without -swift or -ceph are both combined [17:19:13] you can look at things like highestCurrent(FileBackendStore.*-ceph.tp90,10) in graphite browser to just see ceph stuff [17:20:46] New patchset: Demon; "Puppetize gitblit configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61036 [17:21:45] aha [17:21:54] New patchset: Demon; "Puppetize gitblit configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61036 [17:22:18] executeOpHandlesInternal is used when curl_multi() is used for some operations [17:22:44] doQuickOperationsInternal always uses that and doOperationsInternal uses it when there is more than one op [17:22:58] ...hmm, the former could probably be like the later in the regard [17:23:43] anyway, executeOpHandlesInternal is only for write operations as well [17:28:23] New patchset: Odder; "(bug 48479) Change default favicon in InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63877 [17:28:37] New patchset: coren; "Tool Labs: tweaks to exim config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63878 [17:28:47] New patchset: Demon; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [17:31:58] paravoid: of course, there is tp50 and tp99, though you already know that :) [17:32:04] yeah [17:32:05] New review: coren; "Trivial fix" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63878 [17:32:05] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63878 [17:32:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:48] tp90 was the most interesting here [17:33:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.589 second response time [17:33:35] a failed disk isn't something that's going to happen every day, but it's nice to know the effects of it and possibly reduce the effect it has on performance [17:34:06] we sure had a lot of failed disks though [17:34:10] in both pmtpa & eqiad [17:34:14] zack's complaining of 403's when trying to access dumps URLs from Brussels. is there a known issue? [17:34:44] PROBLEM - RAID on es1001 is CRITICAL: CRITICAL: Degraded [17:34:59] New patchset: Andrew Bogott; "Rearranged RT apache includes:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63880 [17:35:59] oh, myabe this is it: 2013-05-15 17:35:52: (mod_evasive.c.183) 60.191.2.238 turned away. Too many connections. [17:36:10] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63880 [17:37:50] New review: Demon; "Setting -1 until we're ready for this. Ironing out a few rough edges first." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/63514 [17:38:35] paravoid: was the failed disk what made all the frontends give warnings about being down for a minute? [17:38:37] yes [17:38:44] was that the same incident I pinged you about? 
[17:38:54] I didn't see any pings [17:39:10] I mentioned it in IRC on this channel afaik [17:39:12] 04:55 < Aaron|home> hrm [17:39:12] 04:56 < Aaron|home> eh, not too much on flourine [17:39:17] that's what you mentioned at the time [17:39:42] anyway, why would that happen? Was it an artifact of the health check using a file that happened to have a primary osd fail? [17:39:59] I don't think so [17:40:04] ceph tends to be... unstable during peering [17:40:32] yeah, that seems to be a pattern [17:43:25] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [17:43:33] Logged the message, Master [17:44:37] AaronSchulz: so, a couple more questions for you [17:44:51] AaronSchulz: what was the threshold on whether a container should be sharded or not? [17:45:04] there are a few containers with quite a lot of objects [17:45:09] e.g. wikisource-ar-local-thumb has ~300k objects [17:45:32] I think it was based on #of originals (included deleted) >= 25k [17:45:48] 300k thumbs is not that big though [17:46:09] think "web scale" :p [17:46:29] it's more than other containers' shards combined [17:46:34] mw1173 could not be sshed to [17:46:43] during dir-sync [17:49:15] AaronSchulz: next question :) are the pageNNN thumbs for pdfs/tiffs generated on demand? [17:49:32] or does this work via the job queue? [17:49:32] !log deploying new project filter on virt0 [17:49:34] (like transcoded) [17:49:39] Logged the message, Master [17:50:18] this is semi-related [17:50:41] wikisource-ar-local-public has 4279 objects [17:50:49] and 302452 thumbs [17:51:13] New patchset: Ori.livneh; "Set common rsync and dsh parameters in mw-deployment-vars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [17:51:51] paravoid: page thumbs are on demand [17:52:20] but when you request e.g. page5, is it just page5 that gets generated or all of them? [17:54:31] New patchset: Hashar; "contint: java definitions are now in the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63866 [17:55:49] page 5 only [17:56:00] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [17:56:07] Logged the message, Master [17:56:12] okay [17:56:18] so these 300k thumbs were actually requested [17:56:21] amazing [17:57:12] final question for now [17:57:29] !log authdns update, killin dataset1 entries [17:57:33] AaronSchulz: you had a script to set x-content-duration retroactively, right? [17:57:36] Logged the message, RobH [17:57:56] AaronSchulz: when we started syncing files back in December, we had ceph 0.55 and that had a bug and custom headers were not set [17:58:21] AaronSchulz: everything post 0.56 should had its x-content-duration copied, but a few files copied before that wouldn't have it [17:58:52] would your script work in this case as-is or should I write something in python to sync that information? [17:59:52] the copy scripts I have don't deal with headers like that (though it handles sha1) [17:59:54] I can just re-run setFileHeaders.php [18:00:02] or refreshFileHeaders...whatever I called it [18:00:14] yeah, that's one what I was talking about [18:00:32] is it smart enough to ignore files that already have x-content-duration though? [18:00:42] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend/ [18:00:52] Logged the message, Master [18:03:09] paravoid: no, it just does the POST anyway, which does nothing if it has headers [18:03:31] not that many files have them anyway [18:03:47] but it fetches the video file to get its duration? 
[18:04:00] MW has the metadata in a field in the DB [18:04:28] ah! [18:04:38] makes sense [18:05:50] so, can you run it on terbium or tell me what to run specifically? [18:05:59] (it is terbium, right? :) [18:08:04] now it is, yeah [18:13:47] New patchset: Dzahn; "Remove some long-since-gone SVN stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [18:14:05] ^demon: What's going on with Gerrit. It's sssslllllooooooooowwwww [18:14:18] <^demon> Yeah, I saw. [18:14:21] <^demon> Nothing in logs :\ [18:14:50] <^demon> Java's got cpu pegged at 100 tho. [18:16:27] <^demon> Tons of people trying to clone/fetch mediawiki/core anonymously. [18:16:31] Oh hah [18:16:47] <^demon> Lemme check access logs. [18:16:53] ugh, is there a hackathon somewhere [18:17:02] <^demon> AMS [18:17:14] Not yet, that's a few weeks out [18:17:39] quick, setup a amsterdam mirror for gerrit! [18:17:53] RoanKattouw: what was the git-review commit that makes me have to do fetch gerrit all the time now? [18:17:54] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: fishbowl, special, wikimedia and private to 1.22wmf4 [18:18:02] Logged the message, Master [18:18:18] ^demon: is this a single threaded process? [18:18:19] failmonitor doesn't work on tin :( [18:18:23] AaronSchulz: It's linked from my wikitech-l post. I haven't had time to chase them about it and I lost my browsing history so I don't have the link handy [18:18:28] yurik: Run it on fenari then [18:18:32] <^demon> AaronSchulz: No. [18:18:39] As the logs still seemingly go to NFS [18:18:44] how many cores are pegged then? [18:18:51] Reedy, i did, but hope that it can be ran on tin later [18:18:52] is seems like gerrit goes down eaasily [18:18:58] ^demon: I wonder if we couldn't proxy https://gerrit.wikimedia.org (anonymous clones) out to the other server that runs gitblit and have it be served by real git as opposed to jgit? [18:19:11] does failmonitor work on terbium? [18:19:17] yurik: Should be able to. When the logs are moved. [18:20:13] <^demon> RoanKattouw: We could, possibly. [18:20:32] paravoid: mwscriptwikiset maintenance/refreshFileHeaders.php all.dblist [18:21:33] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikivoyage and wiktionary to 1.22wmf4 [18:21:41] Logged the message, Master [18:24:40] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [18:24:57] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiversity and wikibooks to 1.22wmf4 [18:25:04] Logged the message, Master [18:25:36] <^demon> I'm killing some of these long-running clones. [18:26:15] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikinews and wikiquote to 1.22wmf4 [18:26:22] Logged the message, Master [18:28:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: aaaand wikisource to 1.22wmf4 [18:28:20] Logged the message, Master [18:31:52] New patchset: Reedy; "Everything non 'pedia to 1.22wmf4" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63893 [18:32:43] ^demon: im getting 500s from gerrit [18:32:59] <^demon> Yeah, something's not right. 
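As a concrete sketch of the header-refresh run discussed above, here is the invocation AaronSchulz gives, annotated with the behaviour described in the conversation (this assumes the usual maintenance-host setup on terbium; the comments paraphrase the chat, not the script's own documentation):

```sh
# Run refreshFileHeaders.php across every wiki in all.dblist. Per the discussion: it
# simply re-POSTs file metadata, which does nothing for files whose headers (e.g.
# X-Content-Duration) are already set, and the duration comes from the metadata field
# MediaWiki keeps in the DB, so the video files themselves are not re-fetched.
mwscriptwikiset maintenance/refreshFileHeaders.php all.dblist
```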
[18:33:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63893 [18:34:31] <^demon> !log restarting gerrit [18:34:39] Logged the message, Master [18:35:31] hate hate hate working with gerrit fatal: The remote end hung up unexpectedly73) fatal: protocol error: bad pack header [18:35:55] <^demon> It's not gerrit's fault. [18:36:18] what should I hate then? [18:36:28] The people DoSing it? [18:36:35] Jeff_Green: Computers [18:36:36] And also the ease with which it can be DoSed [18:36:47] sigh [18:36:55] i think i'm going to stick with my original statement then [18:36:57] New review: Dzahn; "kill svn ftw" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63864 [18:36:58] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63864 [18:37:02] <^demon> RoanKattouw: So, I'm totally cool with proxying anon clones to gitblit. [18:40:05] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: commons back to 1.22wmf3 due to broken UploadWizard [18:40:12] Logged the message, Master [18:40:13] ^demon: your SVN cleanup changed has been merged and applied, no issues [18:41:25] <^demon> Sweet thanks. [18:41:41] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: commons back to 1.22wmf4 [18:41:49] Logged the message, Master [18:42:14] !log cherry-picked change 62100 on virt0 until permanent fix for jquery.chosen minimum width is added [18:42:22] Logged the message, Master [18:52:37] ^demon: if I have an existing git repo -- and an existing svn repo (that arent related) -- is there a way to merge the SVN repo into the git repo and preserve history? [18:53:36] <^demon> Yes. You'd convert the svn repo to a git repo (so you'd have 2 git repos). Then you'd do what's called a "subtree merge" to merge one repo into the other. [18:54:43] cool :) [19:04:02] New patchset: Ryan Lane; "(DO NOT MERGE) change pmtpa virt cluster to folsom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [19:04:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:08:18] !log reedy synchronized php-1.22wmf4/resources/ 'touch!' [19:08:25] Logged the message, Master [19:09:04] !log reedy synchronized wmf-config/ 'touch!' [19:09:11] Logged the message, Master [19:13:26] hi, trying to debug an issue in production. Is there a way to see the URL which caused an entry in apache log? [19:14:11] yurik: A PHP error/warning? [19:14:40] RoanKattouw, yes [19:15:17] You can try looking at the various log files in fluorine:/a/mw-log/ [19:17:52] !log disabling puppet on pmtpa virt cluster [19:17:59] Logged the message, Master [19:18:32] !log adding temporary ubuntu-cloud archive apt changes in a non-puppetized way on pmtpa virt nodes [19:18:39] Logged the message, Master [19:24:31] !log aaron cleared profiling data [19:24:38] Logged the message, Master [19:26:27] New patchset: Andrew Bogott; "Further twaddling of deps to get request-tracker and racktables on the same box." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63900 [19:27:17] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63900 [19:28:50] New review: Akosiaris; "I think we can use the cloudera packages directly . We don't gain something specific from rebuilding..." 
[operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [19:33:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:35:59] Reedy: terbium foundationwiki Error connecting to db1025.eqiad.wmnet: Unknown MySQL server host 'db1025.eqiad.wmnet' [19:36:03] did anyone ever look at that? [19:37:32] AaronSchulz: we discussed it in IRC but I don't know what happened after that [19:37:47] whatever it is should be temporarily switched to use db78.pmtpa.wmnet [19:38:09] mutante, robh -- short of modifying DNS, is there a way for me to browse to the RT server I just installed on magnesium? (I can change the virtual host in the apache config to… whatever.) [19:38:29] port forwarding? [19:38:41] andrewbogott: hack your /etc/hosts ? [19:38:57] Oh, I suppose it must have a public IP already... [19:38:58] oh, you already installed, lemme take a look [19:39:03] that's simple enough :) [19:39:38] AaronSchulz: it must be this: extensions/ContributionReporting/PopulateFundraisingStatistics.php [19:39:51] is it a cron, can it be turned off? [19:40:01] it's a cron, and I don't know [19:40:14] my guess is it's some foundationwiki reporting thing [19:40:15] that's the second time that broke and it's a pita to find anyone to fix it [19:40:21] Ryan_Lane: there's a cloud archive puppet stanza at swift.pp [19:40:26] Ryan_Lane: if you want to copy that [19:40:32] AaronSchulz: imo it should be burned at the stake [19:40:53] probably best to have it moved to the new fundraising public reporting server--I'll talk to fr-tech folks about redoing it [19:42:27] andrewbogott: looks like i still get racktables.. be back in a moment [19:42:28] andrewbogott: sorry, but yea, i do what mutante says [19:42:32] and local change my /etc/hosts [19:42:42] I'm trying too. Can't get anything but racktables... [19:42:47] but that might be because RT isn't currently working [19:43:17] andrewbogott: uhh, rt.wikimedia.org? [19:43:23] your apache host is for rttest [19:43:29] I just changed it [19:43:33] yeah RobH https://rt.wikimedia.org/ gives "It works! [19:43:33] " [19:43:38] but https://rt.wikimedia.org/index.html works [19:43:54] I will change it back, obviously this is making things worse [19:44:02] New patchset: Krinkle; "(re?)-add misc::deployment::common_scripts to fenari" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62032 [19:44:05] !log running refreshFileHeaders.php on terbium [19:44:13] Logged the message, Master [19:44:21] New review: Krinkle; "Fixed message to have RT reference in a way that Gerrit automatically links." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62032 [19:44:29] mutante: ^ [19:44:42] andrewbogott, that was a problem from earlier though - I noticed it a couple of days ago [19:45:57] New review: Aaron Schulz; "Why does this use the _SOURCE ones? That would only work on tin, right?" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/55059 [19:46:02] Krinkle: thanks, needs path conflict fix.. sigh..i'll look later [19:47:01] New review: Aaron Schulz; "Though I see it did that before...maybe that should be fixed?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55059 [19:51:39] AaronSchulz: works perfectly so far, thanks :) [19:51:46] saved me some time [19:52:19] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63873 [19:55:09] hi, i still can't figure out where the warning is coming from. 
I know where the warning is, but can't figure out how it can happen. I need to find the actual user request (including their IP) to track it down [19:55:27] New patchset: Andrew Bogott; "Specify db1001.eqiad.wmnet rather than just db1001." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63910 [19:56:24] May 15 19:55:15 10.64.16.70 apache2[9389]: PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /usr/local/apache/common-local/php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php on line 564 [19:57:52] i see that message at fenari @ /home/wikipedia/syslog/apache.log [19:59:50] We don't track warnings in any way other than that [19:59:56] You'd have to hack in some debugging [20:00:01] paravoid: I already copied it :) [20:00:06] ah :) [20:00:11] paravoid: but I don't want to run puppet on these systems yet [20:00:17] yurik, if ( !$config['name'] ) wfDebugLog( 'mobile', print_r( $config, true ) ); [20:00:26] so, temporary non-puppetized changes [20:00:37] Reedy, could i find the entry in the log files? [20:00:55] No [20:01:05] Reedy: https://gerrit.wikimedia.org/r/#/c/63911/ [20:01:32] srsly? [20:01:34] but i thought we log all access? or is it one in 1000 ? we could search by time somehow [20:01:40] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:01:45] Reedy: I need to fix DB about that later [20:02:14] access logs are sampled [20:03:40] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:04:30] !log stopping all openstack services. it won't be possible to log into wikitech during this process [20:04:37] Logged the message, Master [20:04:38] !log disabling OpenStackManager [20:04:45] Logged the message, Master [20:05:07] hm. if I disable that maybe login will be possible :D [20:06:15] yurik: we do log unsampled for the mobile varnishes [20:06:24] <^demon|lunch> Ryan_Lane: Does LDAP go down too? [20:06:31] nope [20:06:37] <^demon|lunch> mmk. [20:06:52] drdee, in that case it should be possible to find the entry based on the time? which log file should i check? [20:06:55] !log aaron synchronized php-1.22wmf4/includes/specials/SpecialActiveusers.php 'df79a829457752d21b04e641fda0041e09d13a9f' [20:07:02] Logged the message, Master [20:07:16] much better [20:07:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:07:49 UTC 2013 [20:08:12] yurik: need some context of what you are trying to achieve [20:08:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:41] but would love to help you! 
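On yurik's problem above -- tying the 19:55:15 PHP warning back to a client request -- the only handle is the timestamp plus whatever request logs exist (sampled for most traffic, unsampled for the mobile varnishes per drdee). A rough Python sketch of that time-window search follows; the log path and tab-separated field layout are invented for illustration and will not match the real log format.

```python
# Rough sketch of the "search the request logs by timestamp" approach. The path and
# field layout are assumptions; the real sampled/unsampled logs have their own formats.
from datetime import datetime, timedelta

WARNING_TS = datetime(2013, 5, 15, 19, 55, 15)   # timestamp of the apache.log entry quoted above
WINDOW = timedelta(seconds=2)
NEEDLE = 'zero.wikipedia.org'                    # hypothetical filter for the affected requests

def candidate_requests(path):
    with open(path) as logfile:
        for line in logfile:
            fields = line.rstrip('\n').split('\t')   # assumed: timestamp, client IP, URL, X-CS, ...
            if len(fields) < 4:
                continue
            try:
                ts = datetime.strptime(fields[0], '%Y-%m-%dT%H:%M:%S')
            except ValueError:
                continue
            if abs(ts - WARNING_TS) <= WINDOW and NEEDLE in fields[2]:
                yield fields

for ts, client_ip, url, x_cs in (f[:4] for f in candidate_requests('/a/log/mobile/requests.tsv')):
    print(client_ip, x_cs, url)                  # enough to work out which carrier (X-CS) hit the bug
```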
[20:08:51] yurik, just a quick hack for logging should be enough [20:08:59] drdee, i am seeing a warning in production, (see above), and need to track the actual request that generated it [20:09:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:09:00 UTC 2013 [20:09:08] ok [20:09:11] !log upgrading nova services on compute nodes [20:09:17] you can even log $_SERVER['QUERY_STRING'] or something [20:09:18] Logged the message, Master [20:09:26] !log upgrading glance, keystone and nova-scheduler on virt0 [20:09:32] drdee, together with the originating ip - so i can figure out the X-CS [20:09:33] Logged the message, Master [20:09:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:40] np [20:09:51] drdee, and would love to learn how to do it myself so i don't bug you next time :))) [20:10:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:10:03 UTC 2013 [20:10:07] would love to show you! [20:10:13] how about quick hangout? [20:10:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:04] drdee, sure, sec [20:11:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:11:00 UTC 2013 [20:11:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:11:52 UTC 2013 [20:11:57] !log aaron cleared profiling data [20:12:04] Logged the message, Master [20:12:06] !log aaron cleared profiling data [20:12:14] Logged the message, Master [20:12:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:36] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:12:32 UTC 2013 [20:12:52] !log backing up keystone, nova, and glance databases [20:12:56] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:56] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:59] Logged the message, Master [20:13:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:55] !log upgrading schema for keystone, then glance, then nova [20:13:56] PROBLEM - Puppet freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:56] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:02] Logged the message, Master [20:14:46] !log force running puppet on all virt nodes to bring them back up [20:14:53] Logged the message, Master [20:14:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 15 20:14:49 UTC 2013 [20:15:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:17:29] New patchset: Ryan Lane; "change pmtpa virt cluster to folsom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [20:17:37] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63898 [20:25:15] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63910 [20:25:46] RECOVERY - Puppet freshness on virt3 is OK: puppet ran at Wed May 15 20:25:43 UTC 2013 [20:27:01] New patchset: Ryan Lane; "Revert "Several minor changes to the openstack manifest:"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63915 [20:27:32] Change merged: Ryan Lane; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/63915 [20:27:40] New patchset: Andrew Bogott; "Revert "Several minor changes to the openstack manifest:"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63916 [20:28:01] andrewbogott: heh. beat you to it ;) [20:28:07] yurik: rt ticket filed [20:28:29] drdee, thanks! [20:28:34] do we use 3ware RAID controllers anywhere ever? [20:28:37] Change abandoned: Andrew Bogott; "ryan just did this" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63916 [20:29:46] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Wed May 15 20:29:37 UTC 2013 [20:38:05] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:05] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [20:50:55] !log maxsem synchronized php-1.22wmf4/extensions/MobileFrontend/ [20:51:02] Logged the message, Master [20:53:56] !log maxsem synchronized php-1.22wmf3/extensions/MobileFrontend/ [20:54:03] Logged the message, Master [20:55:04] New patchset: Pyoungmeister; "prelabsdb dbs: redacting tables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [20:55:53] !log openstack in pmtpa now at the folsom release [20:55:59] Logged the message, Master [20:56:41] !log OpenStackManager is reenabled on virt0. login to wikitech is working again [20:56:48] Logged the message, Master [20:58:06] wooo [20:58:09] congrats Ryan_Lane :) [20:58:16] :) [20:58:28] paravoid: check out how fast "get console output" is now :) [20:58:59] it used to take ages on really busy hosts like virt6. now it's super quick [21:00:23] New review: coren; "LGM, but not familiar enough with the sanitarium to +2" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/63924 [21:03:15] New review: Pyoungmeister; "hi coren," [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [21:07:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:07:57] Coren: nothing? I thought that was a really good response to your comment on my patchset :) [21:08:10] notpeter: I'm still listening. :-) [21:08:18] awesome :) [21:08:33] I actually really really like the second one [21:08:44] perhaps more than the original [21:09:55] My tastes hover more towards the more melodic, though. Like Endeverafter. [21:18:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:21:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:21:42] RECOVERY - Host barium is UP: 72057594037927935 [21:23:22] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:25:12] RECOVERY - Host barium is UP: 72057594037927935 [21:26:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:32] PROBLEM - SSH on barium is CRITICAL: No route to host [21:29:42] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:30:12] RECOVERY - Host barium is UP: 72057594037927935 [21:33:34] need to deploy one file in zero ext to hunt for a bug, hope noone minds [21:33:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:35:11] greg-g, ping ^ [21:36:09] yurik: no proib [21:36:40] ugh, lag caused by flacky WMF wifi [21:36:50] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:37:11] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:40:11] RECOVERY - Host barium is UP: 72057594037927935 [21:42:47] Reedy: Do we have an artificial intelligence module in Apache that I don't know about? I just got an email saying Apache created a wiki ;-) [21:43:00] sudo -u apache [21:43:01] They used to mention the user performing the action. [21:43:02] Which wiki? [21:43:05] iegcom [21:43:08] "just" [21:43:11] lol [21:43:21] These aren't high on my priority list [21:43:30] I just read it, rather. [21:43:31] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:43:31] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:43:45] wow, been a week uh. [21:44:06] It's #3 chronologically in my krinklemail@gmail inbox right now. [21:44:11] The rest is handled. [21:44:43] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:45:11] RECOVERY - Host barium is UP: 72057594037927935 [21:49:51] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [21:50:11] RECOVERY - Host barium is UP: 72057594037927935 [21:50:31] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:50:48] New patchset: Dzahn; "new RT4 on Apache config - enable SSL, redirect http to https, avoid conflict with other sites, use a ports.conf with NameVirtualHost *:443, ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:51:20] Reedy: are you running any revision related scripts? [21:51:34] I don't think so.. [21:52:59] ok [21:53:10] I swear there are more master Revision queries than normal [21:53:25] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [21:53:31] Logged the message, Master [21:53:48] New review: Dzahn; "this will make upcoming new RT and racktables live together on magnesium and use SSL" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63970 [21:53:49] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [21:57:21] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:00:01] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [22:00:11] RECOVERY - Host barium is UP: 72057594037927935 [22:00:28] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [22:00:35] Logged the message, Master [22:03:41] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:04:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:05:11] RECOVERY - Host barium is UP: 72057594037927935 [22:06:16] New patchset: coren; "Tool Labs: Add packages (user requests)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63979 [22:06:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
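Returning to ^demon's earlier answer about folding an SVN repository into an unrelated git repository while keeping history: below is a sketch of that two-step recipe (convert with git-svn, then a subtree merge), wrapped in Python only to keep these examples in one language. Paths, the remote name and the target prefix are placeholders, and this is the generic recipe, not a procedure anyone above actually ran.

```python
# Sketch of the subtree-merge recipe described earlier in the log.
import subprocess

def git(*args, cwd=None):
    print('$ git ' + ' '.join(args))
    subprocess.check_call(('git',) + args, cwd=cwd)

# Step 1: convert the SVN repository into its own git repository, history included.
git('svn', 'clone', '--stdlayout', 'https://svn.example.org/project', '/tmp/project-from-svn')

# Step 2: subtree-merge the converted repository into the existing git repository.
repo = '/path/to/existing-git-repo'
git('remote', 'add', 'svn-import', '/tmp/project-from-svn', cwd=repo)
git('fetch', 'svn-import', cwd=repo)
# Record a merge of the unrelated history without touching the working tree yet
# (newer git requires --allow-unrelated-histories for this).
git('merge', '-s', 'ours', '--no-commit', '--allow-unrelated-histories', 'svn-import/master', cwd=repo)
# Read the imported tree into a subdirectory, then commit the merge.
git('read-tree', '--prefix=imported-from-svn/', '-u', 'svn-import/master', cwd=repo)
git('commit', '-m', 'Merge SVN history as a subtree under imported-from-svn/', cwd=repo)
```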
[22:06:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:07:01] New review: coren; "Routine, trivial." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63979 [22:07:02] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63979 [22:07:22] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:07:58] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [22:08:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:08:42] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=2&checksum=17570832276017652668 [22:09:03] Reedy: hmm, queries count didn't go up, it just got slower [22:09:10] meh [22:10:02] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:10:12] RECOVERY - Host barium is UP: 72057594037927935 [22:13:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:14:23] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [22:15:42] Jeff_Green: you there? [22:16:22] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:19:40] New review: Asher; "repl_wild_ignore_tables takes "database.table" as the option, so each table will need to be in the f..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/63924 [22:20:12] RECOVERY - Host barium is UP: 72057594037927935 [22:20:47] New patchset: Faidon; "Revoke access for Munagala Ramanath (ram)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63982 [22:21:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:22:42] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:25:12] RECOVERY - Host barium is UP: 72057594037927935 [22:25:29] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63982 [22:25:47] New review: Asher; "Missed the use of stdlib prefix() to take care of the db wildcarding. This looks gtg." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/63924 [22:26:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:27:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.191 second response time [22:29:02] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:29:23] New patchset: Pyoungmeister; "prelabsdb dbs: redacting tables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [22:29:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:29:32] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:29:41] !log krinkle synchronized php-1.22wmf4/resources/mediawiki/mediawiki.js 'touched' [22:29:47] Logged the message, Master [22:30:12] RECOVERY - Host barium is UP: 72057594037927935 [22:30:32] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:30:37] New review: Dzahn; "wfm on magnesium. 
you can graceful apache without warnings, both sites redirect to SSL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63970 [22:36:32] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:39:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63924 [22:40:08] RECOVERY - Host barium is UP: 72057594037927935 [22:41:32] what does the number tell us? [22:41:55] heh, it's big [22:42:48] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:45:08] RECOVERY - Host barium is UP: 72057594037927935 [22:45:35] ah, barium has been moved to another rack [22:45:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:58] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:49:08] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [22:49:48] RECOVERY - DPKG on snapshot2 is OK: All packages OK [22:50:08] RECOVERY - Host barium is UP: 72057594037927935 [22:50:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:28] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [22:56:38] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:00:08] RECOVERY - Host barium is UP: 72057594037927935 [23:02:58] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:04:56] New patchset: Dzahn; "apache-sanity-check (part of apache-graceful-all) change the regex to also restart API servers" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:05:08] RECOVERY - Host barium is UP: 72057594037927935 [23:05:58] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:06:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:06:48] RECOVERY - DPKG on snapshot2 is OK: All packages OK [23:08:30] New patchset: Dzahn; "apache-sanity-check (part of apache-graceful-all) change the regex to also restart API servers" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:09:19] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:10:09] RECOVERY - Host barium is UP: 72057594037927935 [23:13:19] PROBLEM - Puppet freshness on barium is CRITICAL: No successful Puppet run in the last 10 hours [23:16:49] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:20:09] RECOVERY - Host barium is UP: 72057594037927935 [23:20:22] lol [23:21:00] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:22:09] paravoid: it moved to frack :) [23:23:03] it shouldnt be in icinga anymore [23:23:07] =P [23:23:09] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:23:28] or perhaps it already came out and thats new one, nm. 
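On Asher's review of change 63924 above: replicate-wild-ignore-table matches "database.table" patterns, so bare table names need a database part, which the puppet change supplies with stdlib's prefix(). A small Python illustration of the same prefixing idea, with hypothetical table names rather than the real redaction list:

```python
# Illustration of the point in Asher's review: the option wants "database.table"
# patterns, so each bare table name gets a '%' database wildcard prefixed.
def prefix(items, pre):
    """Rough Python equivalent of puppet-stdlib's prefix() function."""
    return [pre + item for item in items]

tables = ['user_password', 'user_email_token', 'ipblocks_private']  # hypothetical examples
for pattern in prefix(tables, '%.'):
    # e.g. replicate-wild-ignore-table = %.user_password
    print('replicate-wild-ignore-table = %s' % pattern)
```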
[23:23:58] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:24:12] RobH: not in decom, because it stays alive but it moved, so special case [23:24:53] New patchset: Pyoungmeister; "fundraisingdb cluster based on credb module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [23:25:09] RECOVERY - Host barium is UP: 72057594037927935 [23:26:22] ACKNOWLEDGEMENT - NTP on barium is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn RT-5113 [23:26:23] ACKNOWLEDGEMENT - Puppet freshness on barium is CRITICAL: No successful Puppet run in the last 10 hours daniel_zahn RT-5113 [23:26:23] ACKNOWLEDGEMENT - SSH on barium is CRITICAL: No route to host daniel_zahn RT-5113 [23:29:17] !log reedy synchronized database lists files: [23:29:24] Logged the message, Master [23:29:29] PROBLEM - Host barium is DOWN: CRITICAL - Host Unreachable (208.80.154.12) [23:29:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63980 [23:29:39] blah, i thought that made you shut up, icinga-wm [23:29:47] sticky [23:30:01] schedules a loong downtime instead [23:30:09] RECOVERY - Host barium is UP: 72057594037927935 [23:31:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:32:06] !log reedy synchronized database lists files: [23:32:14] Logged the message, Master [23:34:10] New patchset: Aklapper; "Weekly Bugzilla Report mail: Fix wrong SQL query on bug resolutions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63998 [23:38:10] 367 fault (11) [23:38:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:42:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:43:24] Change merged: Dzahn; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/60768 [23:43:34] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:44:01] !log pushing wikimedia-task-appserver 2.9-1 [23:44:09] Logged the message, Master [23:45:17] New patchset: GWicke; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:46:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:47:34] RECOVERY - DPKG on snapshot2 is OK: All packages OK [23:47:54] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:49:24] RECOVERY - Apache HTTP on srv291 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.282 second response time [23:49:38] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [23:49:52] New patchset: Reedy; "Remove nomcom entries" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64002 [23:53:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:56:16] New patchset: Pyoungmeister; "lsetting up db77 as clone of db78 for coredb testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64004 [23:59:54] New patchset: Reedy; "Move a few wikis from closed to deleted" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64005
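As an aside on the "Host barium is UP: 72057594037927935" lines that prompted the "what does the number tell us?" question above: the value is exactly 2^56 - 1, i.e. 56 bits all set, which looks like a sentinel or overflowed field coming out of the host check rather than a real measurement (that reading is a guess; the arithmetic is not). A quick check:

```python
# The number icinga-wm keeps printing for barium is 2**56 - 1 (56 bits all set).
n = 72057594037927935
print(n == 2**56 - 1)   # True
print(hex(n))           # 0xffffffffffffff
print(n.bit_length())   # 56
```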