[00:01:16] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64005 [00:01:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:01:44] PROBLEM - DPKG on searchidx2 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [00:02:44] RECOVERY - DPKG on searchidx2 is OK: All packages OK [00:03:18] New patchset: Reedy; "Remove nomcom entries" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64002 [00:03:33] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64002 [00:03:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:04:44] !log reedy synchronized database lists files: [00:04:52] Logged the message, Master [00:05:57] !log reedy synchronized database lists files: [00:06:05] Logged the message, Master [00:06:26] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [00:07:54] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Thu May 16 00:07:46 UTC 2013 [00:07:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:07:52 UTC 2013 [00:08:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:09:01 UTC 2013 [00:09:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:10:02 UTC 2013 [00:10:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:10:59 UTC 2013 [00:11:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:54] RECOVERY - Puppet 
freshness on ms2 is OK: puppet ran at Thu May 16 00:11:48 UTC 2013 [00:12:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:34] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:12:31 UTC 2013 [00:13:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:40] !log puppetstoredconfigclean.rb ms2.pmtpa.wmnet [00:13:48] Logged the message, Master [00:15:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 00:14:54 UTC 2013 [00:15:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:16:08] New review: Ryan Lane; "Puppet config looks good. Someone else should likely check the vcl." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/63890 [00:17:19] New patchset: Catrope; "New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [00:17:59] New review: Faidon; "The comment on backend says "upload backends". I also doubt you need the If-Cached mechanism, this w..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [00:23:20] New review: GWicke; "Re If-cache: I'll drop it in a follow-up VCL changeset I am currently working on." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [00:25:54] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [00:26:01] green [00:26:04] err [00:26:19] that was supposed to be a search. today is not my day for IRC skill [00:26:54] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [00:31:44] PROBLEM - Host ocg3 is DOWN: PING CRITICAL - Packet loss = 100% [00:32:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:33:17] New patchset: GWicke; "WIP: Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [00:33:45] New patchset: GWicke; "WIP: Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [00:34:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:16] New patchset: Dzahn; "decom barium, it moved to frack, per talk with Jeff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64010 [00:36:37] !log puppetstoredconfigclean.rb ocg3.pmtpa.wmnet [00:36:44] Logged the message, Master [00:37:28] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64010 [00:40:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:29] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:33] <-- backup user bzip2'ing things [00:44:29] RECOVERY - DPKG on snapshot2 is OK: All packages OK [00:44:54] New patchset: GWicke; "WIP: Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [00:49:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:53:17] New patchset: GWicke; "WIP: Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [00:53:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:55:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:58:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:59:41] New patchset: MarkTraceur; "Add fundraising components to #wm-fundraising" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64012 [01:00:13] New review: MarkTraceur; "Don't merge this until the FR team has had a chance to discuss it, but it's here and ready AFAICT." 
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/64012 [01:00:35] New patchset: MarkTraceur; "**awaiting discussion** Add fundraising components to #wm-fundraising" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64012 [01:00:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:02:36] does mediawiki have an official minimum memory requirement, with a concomitant commitment to, say, the unit tests passing? [01:03:07] i should probably ask this on #mediawiki, not channel-appropriate [01:03:15] * ori-l retracts [01:08:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [01:10:43] PROBLEM - Disk space on ms2 is CRITICAL: NRPE: Command check_disk_space not defined [01:22:14] New patchset: Cmjohnson; "Changing cfg for stat1002" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64013 [01:23:16] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64013 [01:32:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:34:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:35:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [01:36:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:43:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:45:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:49:25] New patchset: Akosiaris; "Puppetizing Hadoop for CDH4." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [01:50:34] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:51:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[01:51:50] New review: Adamw; "(1 comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/64012 [01:53:34] RECOVERY - DPKG on snapshot2 is OK: All packages OK [01:54:12] New review: Akosiaris; "So i added a couple of unit tests for the classes in the module. Most run just fine except for:" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [01:55:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:56:34] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:57:25] RECOVERY - DPKG on snapshot2 is OK: All packages OK [01:58:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:01:04] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [02:01:04] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [02:01:04] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [02:03:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:04:35] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:06:34] !log LocalisationUpdate completed (1.22wmf4) at Thu May 16 02:06:34 UTC 2013 [02:06:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:06:42] Logged the message, Master [02:09:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:11:32] RECOVERY - DPKG on snapshot2 is OK: All packages OK [02:11:55] !log LocalisationUpdate completed (1.22wmf3) at Thu May 16 02:11:55 UTC 2013 [02:12:02] Logged the message, Master [02:13:22] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours [02:14:22] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [02:14:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:18:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:32] RECOVERY - DPKG on snapshot2 is OK: All packages OK [02:26:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:28:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:29:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:30:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 16 02:30:28 UTC 2013 [02:30:36] Logged the message, Master [02:31:32] RECOVERY - DPKG on snapshot2 is OK: All packages OK [02:33:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:35:42] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:37:32] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:39:28] RECOVERY - DPKG on snapshot2 is OK: All packages OK [02:40:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:40:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:41:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:41:38] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:42:28] RECOVERY - DPKG on mc15 is OK: All packages OK [02:42:38] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:42:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:46:28] RECOVERY - DPKG on snapshot2 is OK: All packages OK [02:48:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:56:38] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:56:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:29] RECOVERY - DPKG on snapshot2 is OK: All packages OK [03:00:38] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:04:38] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:06:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:07:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [03:08:35] RECOVERY - DPKG on snapshot2 is OK: All packages OK [03:09:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:38] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [03:24:35] RECOVERY - DPKG on snapshot2 is OK: All packages OK [03:24:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:27:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:32:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:35:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:38:26] RECOVERY - DPKG on snapshot2 is OK: All packages OK [03:38:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:42:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:44:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:47:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:49:36] RECOVERY - DPKG on snapshot2 is OK: All packages OK [03:50:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.612 second response time [03:55:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:55:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:00:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:04:27] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:06:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:07:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:07:45 UTC 2013 [04:08:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:08:53 UTC 2013 [04:09:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:10:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:09:55 UTC 2013 [04:10:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:45] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:10:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:10:51 UTC 2013 [04:11:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:11:40 UTC 2013 [04:12:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:24] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:12:22 UTC 2013 [04:13:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:16:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:43] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 04:16:42 UTC 2013 [04:17:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:21:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:22:43] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:23:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:25:43] RECOVERY - Disk space on snapshot2 is OK: DISK OK [04:26:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:28:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:29:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:30:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:30:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:31:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.238 second response time [04:33:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:33:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:34:46] New review: MZMcBride; "This seems reasonable to me!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63877 [04:36:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:41:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:43:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:47:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:48:33] RECOVERY - DPKG on snapshot2 is OK: All packages OK [04:51:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:01:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:05:33] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:05:43] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:07:34] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:08:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:11:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:18:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:23:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:24:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [05:31:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:32:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [05:35:44] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:38:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:42:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:32] TimStarling, around? [05:46:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:39] yes [05:47:17] hi, i'm having a weird issue, can't figure out what's causing it. Can i file-sync an extension file that would log an error condition? 
[05:47:35] TimStarling, problem is, i need to log IP and the full request (GET only) [05:48:05] it currently causes warnings in the fatalmonitor [05:48:18] yes you can [05:48:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:49:22] TimStarling, thx, i will ping you in a bit with the link to my patch, just in case. [05:50:27] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:50:37] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:51:26] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [05:51:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:53:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:54:36] RECOVERY - DPKG on snapshot2 is OK: All packages OK [05:57:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:59:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:02:36] RECOVERY - DPKG on snapshot2 is OK: All packages OK [06:03:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:46] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:06:46] RECOVERY - Disk space on snapshot2 is OK: DISK OK [06:06:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:08:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:10:06] PROBLEM - Puppet freshness on ocg1 is CRITICAL: No successful Puppet run in the last 10 hours [06:10:06] PROBLEM - Puppet freshness on ocg2 is CRITICAL: No successful Puppet run in the last 10 hours [06:12:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:13:06] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [06:13:06] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [06:14:36] RECOVERY - DPKG on snapshot2 is OK: All packages OK [06:15:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:17] TimStarling, https://gerrit.wikimedia.org/r/#/c/64020/1/includes/PageRenderingHooks.php [06:16:36] please +2 [06:16:48] will push it out now, and will revert right thereafter [06:17:08] should get enough hits to figure out who is triggering it [06:17:44] what bug number is it? [06:17:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:18:14] TimStarling, there is no bug - it was deployed yesterday, and immediately we saw it in fatalmonitor [06:18:28] php warning [06:18:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:18:59] probably something silly in logic, just that i don't see it in my tests [06:19:01] I think you should file a bug and reference it from the commit message or the patch comment or both [06:19:22] it lets other people know what you are doing without them having to ask you [06:19:34] ok [06:19:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:21:16] $dbg .= "\nURL: " . $_SERVER["SERVER_NAME"] . $_SERVER["REQUEST_URI"]; [06:21:28] this is not actually the URL, in a way that's particularly relevant for mobile clients [06:21:38] but it probably doesn't matter for you [06:22:01] TimStarling, ideally i would want to know the full HTTP request + headers [06:22:15] TimStarling, i will also need the source IP [06:22:21] you don't use $_SERVER['REQUEST_URI'] anywhere else, do you? [06:22:30] no, of course not [06:22:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:22:53] i need to figure out which X-CS based on IP [06:23:37] you can log wfGetIP() if that's all you need [06:24:04] not all - i also need the query, and which server they requested [06:24:57] if I were you, I'd either use a new log channel named after the bug number, or temp-debug, I wouldn't use a vague name like "mobile" that's already used for something else [06:25:23] sure, but that's there already [06:25:38] well, kind of [06:25:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:26:04] TimStarling, wouldn't i need to set up a new config setting, and do lots of other things? i don't want to accidentally break too many things with this :) [06:26:35] just add it to wgDebugLogGroups in InitialiseSettings.php [06:26:36] RECOVERY - DPKG on snapshot2 is OK: All packages OK [06:26:39] (can't wait for the hadoop with sql interface :)) [06:26:48] that's what I usually do, but like I say, there's temp-debug if you think that's too hard [06:27:07] I guess someone else didn't like changing InitialiseSettings.php for every production debugging job [06:27:19] TimStarling, you mean i can use "temp-debug" instead of mobile? [06:27:23] yes [06:27:46] 'temp-debug' => "udp://$wmfUdp2logDest/temp-debug", // generic admin debug log [06:28:09] and the directory on fluorine is writable by udp2log now, so the file will be created automatically [06:28:40] it'll appear at /a/mw-log/temp-debug.log [06:28:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:31] TimStarling, thanks!!! i just git reviewed the change [06:29:35] pls +2 [06:29:36] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:30:16] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 06:30:10 UTC 2013 [06:30:19] what's the command to revert "git rm"? 
I'm sure I've done this once before but I can't remember it now [06:30:26] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:30:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:30:52] no idea - i usually use tortoise git ;) [06:30:59] or gerrit's [06:31:06] "revert" is very nice there [06:31:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 06:30:56 UTC 2013 [06:31:06] git reset file ; git checkout -- file [06:31:23] ori-l, are you always lurking??!? amazing :) [06:31:26] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:46] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 06:31:36 UTC 2013 [06:32:26] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:32:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:33:10] thanks ori-l [06:33:49] yurik: got to head to the airport for a flight in four hours or so, not much point in sleeping [06:34:03] europe? [06:34:28] going to israel first to spend a bit of time with my family [06:34:35] so: pseudo-europe [06:34:46] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:34:46] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:14] in the same sense that australia is pseudo-europe? ;) [06:36:34] exactly, with the disturbing colonial implications to boot [06:37:58] I wonder how israel will end up in the long term [06:38:06] like south africa or like liberia? [06:38:07] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [06:38:37] i.e. to what extent will it become like its surroundings? 
[06:38:37] RECOVERY - Disk space on snapshot2 is OK: DISK OK [06:38:43] about to sync file in ext [06:39:07] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [06:40:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:41:21] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [06:41:28] Logged the message, Master [06:42:02] I think Liberia is a really interesting analogue for Israel, in terms of their compassionate rationales for foundation [06:42:41] south africa is an interesting point of comparison; i don't know much about liberian politics but reading about it now [06:43:22] both Israel and Liberia were founded by western states as homelands for oppressed people in those sponsoring states [06:43:44] but Liberia is almost twice as old so maybe it gives you insight into a later stage of the process [06:44:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:45:28] RECOVERY - DPKG on snapshot2 is OK: All packages OK [06:46:07] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 06:45:59 UTC 2013 [06:46:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:46:32] New patchset: Tim Starling; "Remove three more scap scripts which were moved to puppet" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/64023 [06:47:13] sorry, more than twice as old, my memory was failing [06:48:07] 3 times as old, in fact: 191 years versus 65 years [06:48:37] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:49:37] RECOVERY - DPKG on snapshot2 is OK: All packages OK [06:50:35] palestinians are not capable of instigating civil war at the moment, i don't think. 
the degree to which israel's "security" apparatuses control every aspect of palestinian life is staggering [06:50:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:51:22] a unified nonviolent movement with major international backing is a possibility, but i don't know what it would accomplish. it's an incredibly depressing situation. [06:51:47] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:37] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:48] RECOVERY - Disk space on snapshot2 is OK: DISK OK [06:53:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:49] no, the palestinians are not capable of creating a civil war [06:54:14] you could compare the current period for israel with this period for liberia: https://en.wikipedia.org/wiki/History_of_Liberia#Americo-Liberian_domination_and_suppression [06:56:16] the parallels are a bit uncanny, right down to the reproduction of patterns of oppression [06:56:46] 'mobile' is used for bug logging only, so there's no problem in reusing it [06:57:47] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:58:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:59:47] but americo-liberians constituted 5% of the population, whereas jews currently slightly outnumber arabs in israel and palestine taken as one unit. 
they are a minority in modern palestine proper (settlers, that is), but that minority is increasingly geographically contiguous with the centers of jewish population in israel, so maybe the comparison with liberia breaks down there [07:01:47] RECOVERY - Disk space on snapshot2 is OK: DISK OK [07:01:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:02:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.264 second response time [07:02:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:03:21] aha yurik - you have logged hits now [07:05:47] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:02] MaxSem, yep, btw, tim suggested an excellent way to log :) [07:08:07] separate file, much easier [07:08:44] MaxSem, and the hits are weird :( it seems that our opera recognition is not working :( [07:09:06] that's a side benefit of this log [07:09:14] Anus_m.jpg - why I'm not surprised? [07:09:22] yeah, i was amused too [07:10:06] opera as in opera ips [07:10:19] yes [07:10:25] their forwarding cluster [07:10:58] there are also android hits [07:11:01] although the anus request is not [07:12:10] debating if i should revert now, or collect a few more hits [07:13:24] ooh, the last one is interesting - probably coming from cache [07:13:44] as we no longer have zeropartner=NNN [07:14:41] ok, i think this is good enough, time to stop this [07:17:35] syncing [07:25:16] MaxSem, i ran into an unexpected problem - what happens if my computer dies during the sync operation? 
[07:26:10] servers go out of sync because sync requires your auth agent [07:27:27] MaxSem, could you do me a favour and sync-file php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [07:27:36] everything is already in place [07:28:01] i think my desktop just died :( [07:28:22] or is rebooting due to urgent microsoft updates [07:28:32] probably the latter [07:28:39] Tuesday was Patch Tuesday [07:28:50] for me there are ~65.8 MB worth of updates [07:29:04] wow. well, it's been down for the past 10 min... [07:29:20] i can just see it trying to reboot and asking me to confirm something... [07:29:21] !log maxsem synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/includes/PageRenderingHooks.php [07:29:27] MaxSem, thanks! [07:29:28] Logged the message, Master [07:29:58] yei, it just came back up!!!! [07:30:11] ~12 min down!!! [07:37:14] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:04] RECOVERY - DPKG on snapshot2 is OK: All packages OK [07:38:14] RECOVERY - Disk space on snapshot2 is OK: DISK OK [07:39:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:04] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 16 07:45:45 UTC 2013 [07:46:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:38] !log maxsem synchronized php-1.22wmf4/extensions/GeoData/ 'https://gerrit.wikimedia.org/r/#/c/63972/' [07:46:46] Logged the message, Master [07:48:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:53:18] !log maxsem synchronized php-1.22wmf3/extensions/GeoData/ 'https://gerrit.wikimedia.org/r/#/c/63972/' [07:53:26] Logged the message, Master [07:53:34] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:55:39] Tim-away, around? I notice in all the log entries each request had 3 XFF header values. Is that normal? [07:56:14] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:56:44] PROBLEM - SSH on snapshot2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:56:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:57:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.198 second response time [07:58:44] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:00:54] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:04]