[00:00:43] paravoid: there is a list of bogus files in aaron/commons-boguslistings on terbium [00:01:18] every one I look at seems to have been deleted by an admin [00:01:31] maybe the containers just failed to update [00:08:08] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:07:59 UTC 2013 [00:08:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:18] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:09:09 UTC 2013 [00:09:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:18] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:10:11 UTC 2013 [00:11:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:11:56 UTC 2013 [00:12:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:48] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:12:39 UTC 2013 [00:13:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:18] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:13:13 UTC 2013 [00:14:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:14:48] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 00:14:46 UTC 2013 [00:15:18] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:16:21] !log installing package-upgrades on zirconium [00:16:29] Logged the message, Master [00:20:05] !log added myself to root email alias at mchenry [00:20:13] :) there you go Alex [00:20:13] Logged the message, Master [00:31:25] akosiaris: Next you can get it setup so you get a LOADS of messages to your phone from icinga! ;) [00:32:13] Reedy: getting there.... 
but i think I will create some filters first ;-) [00:38:38] Reedy (you vampire!) any idea about the 1.22wmf4 messages? Is there some update localization cache step for 1.22wmf4 that has to run? [00:41:52] New patchset: Tim Starling; "Fix rsyncd.conf filename" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63600 [00:43:24] spagewmf: There shouldn't be. At this point I think it's worth waiting for localisation update to run tonight (next couple of hours?) and see if that fixes it [00:44:08] Reedy thanks for the update. [00:44:23] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63600 [00:59:23] New patchset: Bsitu; "Add new eventlogging schema:EchoMail" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63602 [00:59:48] !log tstarling synchronized README [00:59:56] Logged the message, Master [01:03:07] New patchset: Tim Starling; "Copy the new rsyncd "hosts allow" line from nfs1 to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63603 [01:04:33] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63603 [01:12:36] !log tstarling synchronized README [01:12:44] Logged the message, Master [01:25:37] argh [01:26:14] running dsh pegs the CPU at 100% on my ssh-agent instance [01:27:10] for a long time [02:11:07] !log LocalisationUpdate failed: mwversionsinuse returned empty list [02:11:15] Logged the message, Master [02:12:11] !log LocalisationUpdate completed (1.22wmf3) at Tue May 14 02:12:10 UTC 2013 [02:12:18] Logged the message, Master [02:31:15] !log LocalisationUpdate completed (1.22wmf4) at Tue May 14 02:31:15 UTC 2013 [02:31:23] Logged the message, Master [02:51:07] New patchset: Jforrester; "Enable VisualEditor on all content namespaces for MW.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63621 [03:09:49] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue May 14 
03:09:49 UTC 2013 [03:09:56] Logged the message, Master [03:13:32] New patchset: coren; "Tool Labs: Email! (@toollabs.org)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63624 [03:15:39] New review: coren; "I can haz domain!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63624 [03:15:39] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63624 [03:19:07] New patchset: Tim Starling; "Switch back to fenari agent" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63625 [03:20:23] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63625 [03:36:14] New patchset: Tim Starling; "Remove some node lists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [03:39:01] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [03:51:38] New patchset: Tim Starling; "Add puppetized dsh on fenari and bast1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63626 [03:53:36] New patchset: Tim Starling; "Add puppetized dsh on fenari and bast1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63626 [03:54:24] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63626 [03:57:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.086 second response time [04:01:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:26] !log tstarling synchronized cgi-bin [04:02:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [04:02:34] Logged the message, Master [04:08:04] RECOVERY - Puppet freshness on ms2 is OK: puppet 
ran at Tue May 14 04:07:58 UTC 2013 [04:08:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:09:07 UTC 2013 [04:09:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:10:11 UTC 2013 [04:11:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:11:57 UTC 2013 [04:12:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:44] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:12:41 UTC 2013 [04:13:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:24] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:13:15 UTC 2013 [04:14:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:15:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 04:15:04 UTC 2013 [04:15:14] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:34:06] !log moved scap to tin. Copied source from NFS to tin. Fixed up broken git submodule config using sed. 
[04:34:14] Logged the message, Master [04:34:56] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [04:37:56] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [05:06:41] New patchset: Tim Starling; "Add tin to mediawiki-installation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63627 [05:09:35] New patchset: Tim Starling; "Migrate scap-1, scap-2, & sync-common from wikimedia-task-appserver" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57854 [05:18:23] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [05:18:55] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57854 [05:19:54] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63627 [05:24:41] New patchset: Tim Starling; "Update rsyncd location in scap client scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63629 [05:25:39] Change merged: Tim Starling; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/58671 [05:26:13] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63629 [05:26:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [05:33:43] New patchset: Tim Starling; "Don't put scap scripts in the root directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63630 [05:34:02] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63630 [05:36:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: 
HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [05:57:16] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [06:01:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [06:22:04] New patchset: Tim Starling; "Move in the remaining scap scripts from wikimedia-task-appserver" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63632 [06:25:03] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63632 [06:30:29] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 06:30:21 UTC 2013 [06:30:29] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:09] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 06:31:06 UTC 2013 [06:31:29] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:49] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 06:31:46 UTC 2013 [06:31:49] New patchset: Tim Starling; "Remove duplicate of mwversionsinuse" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63633 [06:32:29] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:40:15] !log tstarling synchronized README [06:40:24] Logged the message, Master [06:51:43] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [07:02:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [07:03:41] New patchset: Tim Starling; "Updates for migration to tin" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63635 [07:06:44] New review: Tim Starling; "Sorry, can't wait for review. Will deploy carefully." [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/63635 [07:07:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63635 [07:11:26] !log tstarling synchronized docroot/noc/db.php [07:11:28] Logged the message, Master [07:13:48] !log tstarling synchronized multiversion [07:13:56] Logged the message, Master [07:14:51] !log tstarling synchronized refresh-dblist [07:14:59] Logged the message, Master [07:15:29] !log tstarling synchronized w/MWVersion.php [07:15:37] Logged the message, Master [07:28:58] New patchset: Tim Starling; "Running commands as apache is required for deployment" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63637 [07:29:21] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63633 [07:29:38] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63637 [07:33:27] !log tstarling Started syncing Wikimedia installation... : [07:33:35] Logged the message, Master [07:36:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:37:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:43:58] !log tstarling Started syncing Wikimedia installation... 
: [07:44:06] Logged the message, Master [07:47:26] scap is not quite working yet [07:47:34] back shortly [07:57:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:57:54] apergos: and finally here I am [07:58:05] hey [07:58:12] well our slot isn't for a while yet [07:58:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [07:59:22] we have 10 am utc, which is in a couple hours, the i18n thing should be happening soon [08:00:09] apergos: I guess we can start as soon as i18n has finished [08:00:25] yep [08:02:16] Nikerabbit: hi, are you using your i18n deployment slot today? :) [08:07:15] you know that scap isn't working, right? [08:07:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 08:07:47 UTC 2013 [08:08:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:28] TimStarling: I guess they are not deploying any i18n changes today [08:08:31] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 08:08:27 UTC 2013 [08:09:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:01] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 08:08:59 UTC 2013 [08:09:04] when do you want it working by? [08:09:22] we won't be using scap for our window [08:09:34] (puppet, a bit of dsh) [08:10:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:10] !log tstarling Started syncing Wikimedia installation... 
: [08:11:18] Logged the message, Master [08:15:11] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 08:15:03 UTC 2013 [08:16:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:27] New patchset: Tim Starling; "Update location of find-nearest-rsync in sudoers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63639 [08:18:43] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63639 [08:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:23:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [08:24:01] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [08:25:01] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [08:25:59] !log tstarling Started syncing Wikimedia installation... : [08:26:03] 4th time lucky? [08:26:07] Logged the message, Master [08:33:42] !log tstarling Finished syncing Wikimedia installation... : [08:33:49] Logged the message, Master [08:34:28] woot. [08:36:23] !log tstarling synchronized php-1.22wmf3/LocalSettings.php 'remove testwiki special case' [08:37:37] !log tstarling synchronized php-1.22wmf4/LocalSettings.php 'remove testwiki special case' [08:45:40] ok, I'm done for the day [08:45:59] oh, except I should chase that apt issue I guess [08:46:20] congrats tim ! 
[08:46:26] thank you [08:49:36] yay [08:50:28] Logged the message, Master [08:50:37] Logged the message, Master [08:55:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:57:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [09:03:09] AaronSchulz: I'm looking at that list of yours [09:03:26] I've tried to find a bunch of those on ms7 and have found 0 so far [09:03:42] I'm about to script it, but I don't have much hope [09:06:44] have you tried ms1001? [09:10:32] found 44 out of 5079 [09:16:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:17:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [09:36:35] !log depooled ssl1 in pmtpa for testing [09:36:43] Logged the message, Master [09:51:14] testing? [09:51:24] hashar's work? 
[09:52:08] paravoid: yeah Ariel is deploying the new manifests for the proto proxies :-] [09:57:43] !log puppet temp disabled on all ssl terminators except ssl1 in pmtpa [09:57:50] Logged the message, Master [09:58:51] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:51] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:51] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [10:00:08] ppooor puppet [10:05:23] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62966 [10:06:10] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62973 [10:08:03] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62976 [10:18:20] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62977 [10:20:55] New patchset: Hashar; "protoproxy: mobile + beta support" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63431 [10:22:54] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63431 [10:42:10] !log repooled ssl1 [10:42:19] Logged the message, Master [10:51:07] !log re-enabling puppet on ssl1001 for a bit of real traffic [10:51:15] Logged the message, Master [10:56:04] http://ganglia.wikimedia.org/latest/?c=SSL%20cluster%20eqiad&h=ssl1001.wikimedia.org&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [10:56:07] still working :D [11:07:56] New patchset: Hashar; "beta: enable HTTPS on all projects" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63644 [11:13:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:14:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:15:35] 
PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:16:35] RECOVERY - DPKG on mc15 is OK: All packages OK [11:20:16] !reenabling puppet and reloading nginx on all ssl terminators [11:23:25] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:23:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:23:40] hm [11:24:01] dos on stafford? :D [11:24:12] no, stafford flaps a lot sadly [11:24:16] it is likely overworked [11:24:28] I'm just eyeing the snapshot2 message, will have to check on that [11:24:34] yeah as I said, the catalogs are not cached :-D [11:24:52] we should cache them by git revision instead of an ever changing timestamp [11:25:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [11:26:47] apergos: final step for beta is https://gerrit.wikimedia.org/r/63644 :-D [11:26:59] that will enable the nginx config for domains besides bits.beta.wmflabs.org [11:27:03] snap looks ok [11:27:17] greedyguts :-P [11:27:26] let's see it [11:28:46] fine fine :-P [11:29:47] hashar: nope [11:29:54] Nikerabbit: :-] [11:30:03] didn't it read in the calendar? [11:30:03] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63644 [11:30:07] Nikerabbit: asked because we had a window just after yours [11:31:09] no, the calendar listed you guys with the regular deployment slot [11:31:16] weird [11:31:21] I did email greg [11:31:31] no worries, looks like we are done now [11:32:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:33:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:33:35] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:34:22] andre__: does https://bugzilla.wikimedia.org/show_bug.cgi?id=48072#c2 mean we should be removing patch-in-gerrit after marking bugs RESOLVED FIXED? [11:34:25] RECOVERY - DPKG on snapshot2 is OK: All packages OK [11:35:19] odder: I wouldn't - I don't see any win, and I didn't do that mass removing myself [11:36:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:37:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:41:05] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 758 bytes in 0.003 second response time [11:42:22] New patchset: Hashar; "labs: hardcode nginx server_names_hash_bucket_size to 64" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63646 [11:45:26] New patchset: Hashar; "labs: hardcode nginx server_names_hash_bucket_size to 64" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63646 [11:46:13] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63646 [11:46:41] \O/ [11:46:42] andre__: thanks [11:50:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:52:22] hm, interesting to see we now have 300 more reports open than a month ago, andre__ [11:53:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:53:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:53:40] odder, normal growth I'd say [11:53:48] odder, https://bugzilla.wikimedia.org/reports.cgi?product=-All-&datasets=UNCONFIRMED&datasets=NEW&datasets=ASSIGNED&datasets=REOPENED&banner=1 [11:54:03] oh, that's neat [11:54:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:57:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:59:17] PROBLEM - SSH on snapshot2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:59:35] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:59:35] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:03:11] crap [12:03:25] RECOVERY - Disk space on snapshot2 is OK: DISK OK [12:03:26] RECOVERY - DPKG on snapshot2 is OK: All packages OK [12:03:26] memory. hopefully it will oom one of those [12:04:05] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [12:04:35] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:25:39] New patchset: Faidon; "Ceph: move osd min down reporters to [mon]" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63652 [12:25:49] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63651 [12:26:01] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63652 [12:26:21] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [12:28:07] PROBLEM - SSH on snapshot2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:33:07] RECOVERY - SSH on snapshot2 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [12:35:27] RECOVERY - Disk space on snapshot2 is OK: DISK OK [12:35:28] RECOVERY - DPKG on snapshot2 is OK: All packages OK [12:36:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:38:02] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61636 [12:38:31] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61558 [12:39:39] PROBLEM - Disk space on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:40:39] PROBLEM - DPKG on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:40:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:40:46] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62434 [12:44:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:45:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [12:47:33] New patchset: Hashar; "contint: install colordiff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63130 [12:48:19] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63130 [12:48:39] PROBLEM - RAID on snapshot2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:51:30] RECOVERY - Disk space on snapshot2 is OK: DISK OK [12:51:30] RECOVERY - DPKG on snapshot2 is OK: All packages OK [13:02:43] New review: Faidon; "allow_xff is currently all of the Wikimedia networks. I don't think we want to do such magic on all ..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/62103 [13:10:31] New review: Faidon; "Thanks for doing this! FTR, I'm not actually reviewing all that, I'll just trust your script :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63500 [13:10:32] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63500 [13:14:13] New review: Mark Bergsma; "Looks very good, very thorough and extensive!" [operations/software/varnish/vhtcpd] (master) C: 1; - https://gerrit.wikimedia.org/r/60390 [13:49:05] apergos: if you have some time can you have a look at https://gerrit.wikimedia.org/r/#/c/63220/ [13:52:04] can I look at this after my lunch actually? [13:52:19] need brain food [13:52:44] !log Zuul: applying a patch to prevent it from fetching a change multiple times. [13:52:53] Logged the message, Master [13:53:05] !log running dns update [13:53:12] Logged the message, Master [13:54:14] apergos: of course! 
[13:57:42] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63220 [14:07:18] New patchset: Reedy; "Set Thai wikis to uca-default collation" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63661 [14:08:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63661 [14:12:25] !log reedy synchronized wmf-config/InitialiseSettings.php 'thwiki collation' [14:12:33] Logged the message, Master [14:16:13] someone didn't want to wait :-D [14:16:39] but your input is still very much appreciated :) [14:17:39] sreedy@tin:~$ sql wikidatawiki [14:17:39] /usr/local/bin/sql: line 19: mysql: command not found [14:18:16] Can someone put the mysql client stuff on tin please? It's not on bast1001 either, though I'm not sure it should be [14:20:45] Ah, I can make the changeset [14:21:56] New patchset: Reedy; "Add generic::mysql::packages::client to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63663 [14:22:53] Reedy: [14:22:57] mysql on terbium [14:22:59] not on tin [14:23:15] eh? [14:23:17] tin is apparently just for deployment [14:23:27] not for random mysql queries [14:23:53] <^demon> What about maintenance scripts? [14:23:58] terbium [14:24:01] it's the new hume [14:24:04] all that stuff will be there [14:24:10] <^demon> Mmk. [14:24:30] Ugh. So I have to pop another shell and/or change host just to run an sql query? :/ [14:24:35] Long live fenari! ;) [14:24:43] just leave a window open over there [14:24:45] no biggie [14:26:07] Hmm. sql.php? :D [14:26:37] * apergos glowers [14:27:02] so I guess I shouldn't talk about the dark launch in here then... [14:27:08] or everyone will want it [14:27:10] :-P [14:27:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:11] Maybe i should just setup one of the putty window management tools. 
AFAIK they can automatically open numerous windows [14:29:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [14:29:39] sorta like your channel autojoin yeah [14:30:35] also that stringutils entry is hilarious [14:31:06] And typically there's at least 5 options.. [14:35:04] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [14:37:14] <^demon> apergos: We should do that for mediawiki...don't have a static skin, but have it figure out things randomly based on page title :) [14:37:35] :-D :-D [14:37:40] now there is an apr 1 idea [14:38:02] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:44] New patchset: Hashar; "jenkins::slave and a basic role applied gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63666 [15:02:10] 666 [15:02:17] yeah that sounds evil enough [15:13:37] !log Created EducationProgram tables on dewikiversity [15:13:44] Logged the message, Master [15:19:18] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [15:22:52] New patchset: Reedy; "Cache loaded dblists when tagged. Reuse for SiteMatrix, CentralAuth and Incubator" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57173 [15:26:42] New patchset: Diederik; "Two fixes to rolematcher.py" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63668 [15:28:24] hashar: Regarding the RT bug about puppet and apt… does your last comment mean that the issue is now fixed? [15:28:34] and the RT is ? :D [15:28:50] for php5 ? 
[15:29:00] Sorry, my client crashed [15:29:02] https://rt.wikimedia.org/Ticket/Display.html?id=5141 [15:30:56] ah yeah that one [15:31:03] we were talking about it with Ariel [15:31:13] I just copy pasted my investigation results [15:31:33] not sure what happened, but I suspect our pinning to prefer Wikimedia release is/was not working [15:31:38] New patchset: Diederik; "Two fixes to rolematcher.py" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63668 [15:31:51] ok. I just want to make sure someone is on the case :) [15:32:02] someone needs to rebuild our package on top of latest ubuntu version I guess [15:32:11] and I am not working on it. [15:32:28] I think Tim logged that RT to make sure the issue will not get forgotten. [15:32:32] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63668 [15:32:41] and I have no idea who manages the php package :( [15:32:56] ok… paravoid, when I hear 'rebuild the package' I think of you :) [15:39:23] andrewbogott: yeah sorry, I am not very helpful on this topic [15:41:56] * andrewbogott -> dentist [15:42:41] New patchset: Ottomata; "Fixing one more PacketLossLogtailer error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63669 [15:42:56] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63669 [15:46:31] New patchset: Reedy; "Enable Collection on lbwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63670 [15:46:45] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63670 [15:47:09] noc is going to be out of date unless we run git pull on fenari /home.. [15:47:57] noc needs to get moved to some good location (and have an appropriate syncing mechanism) [15:48:02] I dunno where that is though [15:48:03] mhmm [15:48:22] having cronjob git pull is probably enough.. 
depending on where it's hosted [15:48:43] Else have it in mediawiki-installation and symlink if necessary [15:49:02] there's already some symlinks to stuff not in the mw tree [15:49:17] well er not in git anyways [15:58:15] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:33] New patchset: Reedy; "Move RightsUrl variables to InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63673 [16:01:35] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [16:02:22] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63673 [16:03:35] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [16:06:44] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [16:07:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 16:07:56 UTC 2013 [16:08:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:08:39] New review: Greg Grossmeier; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63673 [16:09:03] Reedy: ^^ [16:09:07] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 14 16:09:06 UTC 2013 [16:09:21] Yup, I'd just noticed that myself :D [16:09:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:47] heh, I didn't notice until I went to wikidata to check something else :) [16:10:17]