[00:23:30] wikiversions.cdb version entry does not start with `php-` (got `1.18`). [00:23:35] at http://en.wikinews.org/wiki/Scottish_prosecutors_keeping_quiet_about_Lanarkshire_surgical_deaths [00:42:05] wtf [00:42:11] Then someone built it incorrectly? [00:42:12] Reedy: --^^ [00:43:20] Tim did when he reverted it [01:33:20] LeslieCarr, RobH: Did one of you just break my access to cadmium? [01:33:33] ssh cadmium refuses my public key, from both fenari and bast1001 [01:33:52] RoanKattouw: I haven't touched cadmium other than to blame you for breaking it - the error given is that the group doesn't exist that you are supposed to be in [01:33:55] * RoanKattouw will ssh in as root for now, but would like this to be fixed [01:33:56] so that is probably the issue [01:44:36] !log Installing ffmpeg on cadmium [01:44:38] Logged the message, Mr. Obvious [01:45:34] New patchset: Bhartshorne; "passing through text of back end error messages in addition to their HTTP response code" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2753 [01:45:57] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2753 [01:45:57] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2753 [02:11:28] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [02:11:38] !log Installing nfs-kernel-server on cadmium [02:11:41] Logged the message, Mr. 
Obvious [02:17:28] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [02:17:28] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [02:56:30] New patchset: Tim Starling; "Re-enable wmerrors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2756 [02:56:59] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2756 [02:57:00] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2756 [03:07:21] PROBLEM - Apache HTTP on srv234 is CRITICAL: Connection refused [03:07:39] PROBLEM - Apache HTTP on srv247 is CRITICAL: Connection refused [03:10:52] New review: Hashar; "I have removed the warning from the git-review labsconsole article :-)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2682 [03:23:33] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [03:33:48] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [03:34:51] PROBLEM - Disk space on mw35 is CRITICAL: DISK CRITICAL - free space: /tmp 27 MB (1% inode=88%): [03:36:48] RECOVERY - Disk space on mw35 is OK: DISK OK [03:56:18] PROBLEM - Disk space on mw9 is CRITICAL: DISK CRITICAL - free space: /tmp 10 MB (0% inode=89%): [04:02:18] RECOVERY - Disk space on mw9 is OK: DISK OK [06:22:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:24:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.909 seconds [06:59:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.730 seconds [07:39:09] PROBLEM - Puppetmaster HTTPS on stafford is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:43:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.353 seconds [07:49:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error [08:47:06] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [08:48:27] !log Restarted apache on stafford to fix puppetmaster [08:48:30] Logged the message, Master [08:50:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.056 seconds [09:06:35] New patchset: Mark Bergsma; "Set method { biosgrub } to install a BIOS grub partition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2757 [09:07:14] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2757 [09:07:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2757 [09:24:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:26:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.158 seconds [09:49:41] New patchset: ArielGlenn; "show item status (= items in archive.org 'job queue')" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2758 [09:49:44] New review: gerrit2; "Lint check passed." 
[operations/dumps] (ariel); V: 1 - https://gerrit.wikimedia.org/r/2758 [10:01:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:03:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.351 seconds [10:15:32] !log Doing first puppet run on ms-be5 [10:15:34] Logged the message, Master [10:30:26] New patchset: Mark Bergsma; "Break up swift::storage into subclasses for dependencies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2759 [10:31:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2759 [10:32:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2759 [10:32:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2759 [10:37:40] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:40:21] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [10:53:01] New patchset: Mark Bergsma; "Modify partman recipe to not make a huge swap partition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2760 [10:53:24] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2760 [10:53:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2760 [10:53:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2760 [10:53:48] New review: Hashar; "(no comment)" [operations/dumps] (ariel) C: 1; - https://gerrit.wikimedia.org/r/2683 [10:55:56] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2748 [11:00:16] !log Reinstalling ms-be5 to correct partitioning of sda and sdb [11:00:20] Logged the message, Master [11:20:07] mark: yesterday gpt problems here were somewhat reminding me of db21-40 installs ;-) [11:20:30] what was that? [11:21:24] I don't remember what was exactly the problem, and I didn't read much yesterday, as I was asked out of the channel, but I had all sorts of troubles, until I started forcing grub - but that was mostly in post-install FS adjustments [11:22:02] forcing grub rewrite [11:27:55] !log Manually preparing swift filesystems sda4 and sdb4 on ms-be5 [11:27:57] Logged the message, Master [11:49:17] !log Added all devices on ms-be5 into the swift rings, new zone 5, weight 100 [11:49:20] Logged the message, Master [11:50:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:51:33] !log Rebalanced swift rings account, container, object after adding ms-be5 [11:51:35] Logged the message, Master [11:52:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.634 seconds [11:58:11] New patchset: Mark Bergsma; "Puppet was making no attempt to start Swift" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2761 [11:58:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2761 [11:58:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2761 [11:58:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2761 [12:12:32] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [12:18:32] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [12:18:32] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [12:27:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:29:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.970 seconds [13:03:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:05:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.292 seconds [13:33:33] * domas . 
o O ( professor has multiple cores that are not being used, should rewrite collector to be multithreaded :) [13:35:18] !log Copied swift ring builder files from ms-fe1 to all swift hosts [13:35:20] Logged the message, Master [13:40:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:40:35] !log Fixed directory permissions of /srv/swift-storage/{sda4,sdb4} on ms-be1 [13:40:38] Logged the message, Master [13:46:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.995 seconds [14:16:12] New patchset: Mark Bergsma; "Fix the broken cron job mail bombing me" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2762 [14:16:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2762 [14:16:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2762 [14:16:41] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2762 [14:16:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2762 [14:17:26] mark: could you look at this real quick and let me know if it was what you were looking for? https://gerrit.wikimedia.org/r/#change,2685 [14:17:33] ok [14:18:04] almost [14:18:17] alright [14:18:26] ah, no, that's correct [14:18:37] ok, cool. 
then I shall merge [14:18:54] ok [14:19:45] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2685 [14:19:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2685 [14:20:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:26:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.986 seconds [14:48:20] New patchset: Pyoungmeister; "don't want generic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2763 [14:48:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2763 [14:50:36] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2763 [14:50:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2763 [14:57:31] New patchset: Pyoungmeister; "harumph dynamic typing..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2764 [14:57:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2764 [14:58:24] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2764 [14:58:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2764 [15:02:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.042 seconds [15:34:40] morning mark [15:34:55] so did ms-be5 not stall asking for an ubuntu repository during the build? [15:35:02] did you fix brewster or did it just magically work? [15:36:26] thanks for the notes on the docs. 
re: the tracebacks in the proxy logs - they're on my list to chase down but I haven't done it yet. [15:37:58] New patchset: RobH; "chris changed keys, updated admins file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2765 [15:38:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2765 [15:38:52] New review: RobH; "simple key swap" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2765 [15:38:52] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2765 [15:39:24] mark: ah well, I guess you were afk. I've got a morning appointment but will be back in 1-2hrs. [15:40:53] New patchset: Pyoungmeister; "puppetizing some indexer stuffs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2766 [15:41:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2766 [15:42:56] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2766 [15:42:57] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2766 [15:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds [16:04:34] PROBLEM - Lucene on mw1020 is CRITICAL: Connection refused [16:16:42] New patchset: Hashar; "mention changes should be ignored" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:18:07] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/30/ (1/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:18:07] New review: jenkins-bot; "Build Started 
https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/25/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:23:23] New patchset: Hashar; "jenkins: git preparaton script for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513 [16:24:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:24:55] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/31/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [16:26:24] New patchset: Hashar; "mention changes should be ignored" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:27:33] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/26/ (1/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:27:33] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/32/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:30:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.538 seconds [16:40:28] New review: Hashar; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2767 [16:40:28] Change merged: Hashar; [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:59:10] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/33/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:03:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.461 seconds [17:12:36] New patchset: 
Krinkle; "Adding some new lines during the git training" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2768 [17:12:37] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2768 [17:16:36] RECOVERY - Lucene on search1003 is OK: TCP OK - 0.026 second response time on port 8123 [17:19:15] New review: Krinkle; "Hey" [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2768 [17:24:06] PROBLEM - Lucene on mw1110 is CRITICAL: Connection refused [17:24:33] PROBLEM - Lucene on mw1010 is CRITICAL: Connection refused [17:26:54] New review: Sumanah; "Publishing inline comment." [test/mediawiki/extensions/examples] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2768 [17:36:52] New review: Sumanah; "This is a great change!" [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2768 [17:36:52] Change merged: Sumanah; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2768 [17:39:45] New patchset: Sumanah; "suggestion to use Git" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2769 [17:39:46] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2769 [17:43:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:44:38] New review: Krinkle; "Approving this message." [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2769 [17:44:46] rainman-sr: hey! thank you for your response. [17:45:05] rainman-sr: should I stop just the inc. updater on searchidx2? or the lucene proc as well? 
[17:45:08] Change merged: Krinkle; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2769 [17:45:20] notpeter, i would stop all java processes on searchidx2 [17:45:25] ok, cool [17:46:03] and then when I restart the inc updater on 1001, what date should I give it as an arg? [17:47:54] !log stopping indexing on searchidx2 to rsync over a clean copy of index to searchidx1001 [17:47:56] Logged the message, and now dispatching a T1000 to your position to terminate you. [17:48:26] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/34/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:48:55] New patchset: Andre Engels; "* Worked Andrew's comments into my code * Changed test_access_log_pipeline.py into test_traits.py * Did some changes in determining traits based on the test * Removed the trait 'domain' because it duplicated the trait 'site' * Used yield instead of creati" [analytics/reportcard] (andre/mobile) - https://gerrit.wikimedia.org/r/2770 [17:49:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.156 seconds [17:50:13] notpeter, the date that is given there is a default date if no date is available in /a/search/indexes/status [17:50:19] notpeter, so it can be the same as on searchidx2 [17:51:42] so 2010-02-16 [17:51:44] ? [17:52:34] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/35/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:53:06] ping mark I'm curious about imaging ms-be5 [17:53:47] ^demon|away: Looks like gmail detected the gerrit email notify as a mailinglist post [17:53:52] ^demon|away: list:"" [17:54:07] <^demon|away> Yeah, it has lots of machine-readable type stuff in the gerrit e-mails. [17:54:11] nice [17:54:11] <^demon|away> Makes for easy rule-writing. 
[17:54:14] yep [17:54:25] I've got so many gmail filters and all [17:54:32] <^demon|away> I've got my gerrit e-mails being labelled by-project :) [17:54:45] without it it'd be nowhere with a subscription on mediawiki-cvs and wikibugs for starters. [17:55:03] aight going now [17:55:04] cya [17:56:23] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/36/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:56:32] well that is a bit spammy, sorry :-) [17:56:52] ah hah! RobH I think https://gerrit.wikimedia.org/r/#patch,sidebyside,2757,1,files/autoinstall/raid1-2TB-1partition.cfg is what Mark did to fix the partman recipe for ms-be*. [17:57:19] yea i was reading his email [17:57:31] oh, I didn't see him mention that part in his email. [17:57:33] rainman-sr: so this will take about 3.5 hours [17:57:40] i had not gotten to this yet [17:57:41] ;] [17:58:11] once it's over, I'll use the start indexer script in your home dir to start back up searchidx2 [17:58:31] and then on searchidx1001 I'll start the lucene proc and then start the indexer [17:59:04] you said that I should run all of the things from your crontab. is there any order that would be better than any other? [17:59:14] ok, going to lunch, back shortly [17:59:48] oh hey, he did. I guess I just didn't do a good job reading his email this morning. [17:59:57] unping mark - I see you've told me everything I wanted to know in email. 
[17:59:58] ;) [18:02:04] notpeter, the things that are in crontab should be in exactly the same way in crontab of searchidx1001, I think all of them should work on searchidx1001 without modification, so you would just copy the crontab and the scripts [18:07:01] rainman-sr: ok [18:09:39] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/37/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [18:10:35] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/38/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [18:11:58] enough spam, will be back later tonight [18:25:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:31:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.126 seconds [18:37:12] New patchset: ArielGlenn; "the mhash workaround for snaps is hopefully no longer needed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2771 [18:37:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2771 [18:38:16] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2771 [18:38:17] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2771 [18:43:46] New patchset: Trevor Parscal; "Added important notes to HelloWorld INSTALL" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2772 [18:43:47] New review: gerrit2; "Lint check passed." 
[test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2772 [18:47:14] New patchset: Robmoen; "added hello world from rob" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773 [18:48:54] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [18:54:00] hrmph. ms-be4 is now in a reinstall loop rather than failing at grub. [18:55:20] New review: Trevor Parscal; "Maybe it's better if people aren't asked directly to give me money." [test/mediawiki/extensions/examples] (master) C: -1; - https://gerrit.wikimedia.org/r/2772 [18:57:00] New review: Robmoen; "No money for you." [test/mediawiki/extensions/examples] (master) C: 0; - https://gerrit.wikimedia.org/r/2772 [18:57:56] New review: Sumanah; "Great!" [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2772 [18:57:56] Change merged: Sumanah; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2772 [18:58:38] Change abandoned: Sumanah; "Sorry." [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773 [19:02:43] New patchset: Sumanah; "blowing your mind" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2774 [19:02:44] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2774 [19:03:34] New review: Trevor Parscal; "This is by far the most brilliant commit I have ever seen." [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2774 [19:03:38] New review: Robmoen; "Cool" [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2774 [19:04:05] New review: au; "Tres cool!" 
[test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2774 [19:04:10] Change merged: Trevor Parscal; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2774 [19:05:49] New review: au; "Abandon all changes, ya who commits I5e298de1." [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773 [19:06:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.611 seconds [19:13:30] any opsen want to do some swift diagnosis? I'm going to check a few things for https://bugzilla.wikimedia.org/show_bug.cgi?id=34611 [19:14:13] notpeter or RobH or LeslieCarr or Ryan_Lane or apergos? [19:14:43] ? [19:14:46] LeslieCarr: sure, i'd love to check it out [19:14:47] I'm mostly not here [19:15:05] what do you need done? [19:15:19] apergos: I'm going to look at a few things for https://bugzilla.wikimedia.org/show_bug.cgi?id=34611 and wanted to know if you wanted to follow along and see some basic swift commands [19:15:26] im mostly here but on a call with dell about pushing those swift servers ;] (ie: dont wait on me) [19:15:27] ah [19:15:29] New patchset: au; "* Testing commit" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2775 [19:15:31] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2775 [19:15:34] maplebed: via screen? [19:15:34] sure, that sounds fun [19:15:36] just show and tell. nothing terribly exciting. [19:15:39] RobH: yup. [19:15:39] cuz i wouldnt mind that. [19:15:47] what host are you on? [19:15:51] bast1001 [19:15:57] maplebed: same way as yesterday? [19:16:04] New review: au; "Oh I would prefer that I didn't submit this." 
[test/mediawiki/extensions/examples] (master) C: -1; - https://gerrit.wikimedia.org/r/2775 [19:16:17] New review: au; "...but I changed my mind..." [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2775 [19:16:21] log in to bast1001, get root, and run 'screen -x ben/swiftobjs' [19:16:51] oh, and I suggest you mute the gerrit bot, because it will totally ruin your ability to follow along with stuff said here. [19:17:04] especially with folks playing around with it and testing all sorts of commits. [19:17:45] ah root, heh [19:17:46] cool, i am following along, but it only has partial attention =] [19:18:00] ok. let me know when you're all connected. [19:18:08] My window is 136*60 [19:18:09] * RobH is connected [19:18:29] I'm on [19:18:43] I have no idea what my window size is [19:18:57] i made mine match, but does it matter? [19:19:01] one of you is 151x37 [19:19:22] RobH: if it's different and I run something like 'less', part of the file will be chopped off from your view if your window is smaller. [19:19:23] I can't get 60 lines on here I don't think [19:19:25] maplebed: what screen? [19:19:25] ahhh [19:19:32] LeslieCarr: swiftobjs [19:19:36] 38 [19:19:40] yeah that's all I got [19:19:46] apergos: I'll shrink down. [19:19:47] I'm on a laptop [19:19:51] portrait monitor ftw. [19:20:01] apergos: you're ok with 136 wide? [19:20:04] we should do 900 deep. [19:20:09] I don't care, width whatever [19:20:12] ;] [19:20:33] ok, so bug https://bugzilla.wikimedia.org/show_bug.cgi?id=34611 has two sections. [19:20:41] the first reports problems with http://commons.wikimedia.org/wiki/File:Hydravulgaris.jpg [19:20:48] the second with http://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Commons-emblem-disambig-notice.svg/1200px-Commons-emblem-disambig-notice.svg.png [19:20:58] I'm going to look at those images on both swift and ms5 and see what's what. 
[19:21:02] right [19:21:17] all swift commands I'm going to run from either of the front ends. [19:21:29] this isn't really necessary but the swift packages aren't currently installed anywhere else. [19:21:39] I'll probably install them on some worker host sometime but it hasn't happened. [19:22:04] the account credentials we'll use are in the swift proxy config file and in the private puppet repo. [19:22:17] where is the swift proxy config file? [19:22:26] the first thing we'll do is grab those, then go look at some objects. [19:22:33] for doing this on your own later, http://wikitech.wikimedia.org/view/Swift/How_To#List_containers_and_contents [19:22:47] apergos: /etc/swift/* has all the swift configs. [19:22:54] ok great [19:23:45] ok, first thing I'll do is ask swift for a list of all thumbnail sizes for the first image (the Hydravulgaris) [19:24:17] to figure out what I should use as the prefix, [19:24:23] I take an example thumbnail [19:24:40] (right clicking on the file page and choosing view image I get http://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Hydravulgaris.jpg/625px-Hydravulgaris.jpg) [19:24:53] the prefix I need to use starts after 'thumb' in that URL [19:25:01] so '2/2f/blah...' [19:25:09] Change abandoned: au; "...and I shall finish this change... with abandon!" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2775 [19:25:26] it's an anchored partial string match, so I only need to go as far as I think will be unique enough. [19:25:31] (I should make a script available for that, since I have one that does that work for something else) [19:25:51] +1 apergos - I need to make scripts surrounding a bunch of these actions. [19:25:56] they're kind of a pain [19:26:08] talk to me later about it [19:26:10] the arguments go --prefix partial-path container [19:26:23] the 'container' is figured out by rearranging other contents of the URL. [19:26:42] this URL has /wikipedia/commons/thumb. 
That translates to wikipedia-commons-local-thumb [19:26:43] depends on if it's commons too eh? [19:26:58] if it were in the enwiki, it would likely be wikipedia-en-local-thumb [19:27:26] with one more rule - if it's enwiki or commons, you include the two character shard. [19:27:30] is local always inserted ? [19:27:32] no other projects use it. [19:27:39] local is always inserted for thumbs. [19:27:50] there are other non-local repo types that Aaron will introduce later with originals. [19:28:05] in this case our shard is '2f'. [19:28:52] ok, that's the list of all images that match that prefix. [19:29:12] cool [19:29:13] 1200? really? grrrr [19:29:14] Annoyingly, there's no equivalent of 'ls -l' to show size info, so I need a for loop to iterate over them and 'stat' the object. [19:29:22] apergos: the 1200s are generated by the pediapress stuff. [19:29:24] 1200 is the default for pediapress i believe [19:29:28] meh [19:29:37] so if i request a 121px pic, will it now show up here ? [19:29:52] for stat, the order of arguments is inverted. it goes 'stat container object' [19:29:53] LeslieCarr: yup. [19:29:56] try it! [19:30:27] yay [19:30:28] :) [19:30:33] space waster! [19:30:34] * maplebed created a 123px image too. [19:30:45] i'm now part of the problem! w00t [19:30:47] it's ok apergos, we're going to purge the image at the end. [19:30:48] :D [19:30:55] :-P [19:31:10] as soon as thumbs are off of ms5 completely I will no longer care! [19:31:45] ok, a for loop over all images returned by list, show me stats on that image. [19:32:13] here we've got detailed info for all the images. [19:32:22] I'm looking for Content-Length that looks odd. [19:32:43] specifically, an image that has more pixels should always have a larger content length than one with fewer pixels. [19:33:24] ok, all are larger than the preceding image. [19:33:42] before we go on to the next test, let's do that again for the svg (the second part of the bug). 
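The container-and-prefix rules walked through above can be condensed into a small helper. This is only a sketch of the rules as stated in this session (the function name is illustrative, and the set of sharded wikis — commons and enwiki — is taken from the chat, not from the production configuration):

```shell
swift_location() {
    # usage: swift_location <thumbnail-url>
    # Per the session: /<site>/<lang>/thumb/<d1>/<d2>/<File>/<NNNpx-...>
    # maps to container '<site>-<lang>-local-thumb', with the two-character
    # shard appended for commons and enwiki only; the listing prefix is
    # everything after 'thumb/' up through the file name.
    local path site lang repo d1 d2 file container
    path=${1#*://*/}                      # strip scheme and hostname
    IFS=/ read -r site lang repo d1 d2 file _ <<EOF
$path
EOF
    container="$site-$lang-local-$repo"
    case "$site/$lang" in
        wikipedia/commons|wikipedia/en) container="$container.$d2" ;;
    esac
    printf '%s %s/%s/%s\n' "$container" "$d1" "$d2" "$file"
}

swift_location "http://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Hydravulgaris.jpg/625px-Hydravulgaris.jpg"
# prints: wikipedia-commons-local-thumb.2f 2/2f/Hydravulgaris.jpg
```

With the credentials from /etc/swift, the listing then goes along the lines of `swift ... list --prefix 2/2f/Hydravulgaris.jpg wikipedia-commons-local-thumb.2f`, and `stat <container> <object>` for per-object details, matching the argument orders described above.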
[19:33:48] can you do an up on that last command again ? i didn't have the chance to copy it :) [19:33:51] someone else want to drive? [19:34:12] i'd love to drive :) [19:34:20] and thanks, got the command [19:34:35] our goal is to look at all thumbnails for http://commons.wikimedia.org/wiki/File:Commons-emblem-disambig-notice.svg [19:34:52] ok, LeslieCarr keyboard's all yours. [19:36:02] is 558 correct for this one's shard ? [19:36:15] oh, just 58 [19:36:16] shards are always 2 characters. [19:36:30] annoyingly, the first character is repeated earlier in the URL. [19:36:54] weird, so what is up with those truncated names ? [19:37:05] I'm not sure how they were created. [19:37:12] I need to go through and wipe them all out. [19:37:19] at the moment though, I don't think they actually hurt anything, [19:37:21] just waste space. [19:37:29] mediawiki won't generate URLs like that so they should never be accessed. [19:37:43] wonder what's in em [19:38:07] (btw, I really like that $(!!) trick.) [19:39:03] looks good. [19:39:58] so we can see the truncated 1200px image right away due to the length being longer than the 1199 one, right ? [19:40:01] yup [19:40:11] we have a winner! [19:40:23] the truncated one was feb 2 [19:40:30] wonder what might have been going on back then [19:40:58] apergos: I think that was my thumbnail-filler script not doing a good job. [19:41:05] ok [19:41:11] that was also before we fixed a bug where, if the client disconnected early, a truncated image got written. [19:41:32] so it might be worth cleaning up and seeing how things go [19:41:34] but that's nice confirmation that it was created while the known bug was in effect, not more recently than our fix. [19:41:43] :-) [19:41:47] I did run a job that cleaned up most of them. [19:42:21] I didn't think it missed any, but I guess it did. [19:42:23] :( [19:42:49] so the test of 'smaller than it should be' is really only a heuristic, not a definitive test.
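The "more pixels should mean a larger Content-Length" heuristic used above is easy to mechanize. A quick sketch (hypothetical helper; the real sizes come from looping `swift stat` over the listing), keeping in mind the transcript's caveat that this flags suspects rather than proving truncation:

```python
def suspect_truncated(thumbs):
    """Given (pixel_width, content_length) pairs, return the widths whose
    size is not larger than that of a smaller thumbnail - a heuristic
    (not proof) that the object was truncated when written."""
    ordered = sorted(thumbs)
    suspects = []
    for (_, prev_len), (width, length) in zip(ordered, ordered[1:]):
        if length <= prev_len:
            suspects.append(width)
    return suspects

# The 1200px thumb being smaller than the 1199px one flags it
# (sizes here are made up for illustration):
print(suspect_truncated([(625, 30000), (1199, 52000), (1200, 9000)]))
# -> [1200]
```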
[19:42:57] to check for real, we compare the sizes against ms5.4 [19:42:59] ms5. [19:43:08] well, it got most of them :) [19:43:12] ms5.4: new and improved :-D [19:43:25] hit backtick-n to switch to the next window [19:43:27] (`n) [19:43:39] where we'll get the corresponding listings on ms5 [19:43:54] how can you live without backtick?? [19:44:14] apergos: that's why all my scripts have $(foo) instead of `foo` [19:44:28] (aside from being easier to find balancing parens) [19:44:35] i switched to ctrl+j [19:44:50] that might be good, I think I never use that combo [19:45:01] maplebed: so everything is in /export/thumbs ? [19:45:09] yup. [19:45:25] (it's also mounted everywhere via nfs, so we don't actually have to be on ms5 to run this check) [19:45:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:37] now the annoying part - comparing sizes. [19:45:42] in a directory with the name of the image [19:45:49] (to make it easier, `` will switch between windows) [19:46:11] well this is pretty easy to see the difference, right ? since 1200 was smaller than 1199 on swift and larger on ms5 [19:46:16] uh huh [19:47:16] the last step is to either [19:47:28] * delete the individual thumbnails using the swift 'delete' command [19:47:39] * use mediawiki to purge the image - deleting all thumbnails. [19:47:49] the first is better if you're doing it from a script, but it's way dangerous. [19:47:58] the second is better in this case i think [19:47:59] the same command to delete an object deletes a container (and all the objects contained within it) [19:48:19] because we want to verify that when they are recreated, well after the period of the buggy thumb creation, that they are all ok [19:48:31] the syntax also goes delete container object, so if you slip up and hit return before you start typing the object, you just screwed yourself.
[19:48:44] ouch [19:48:45] ?action=purge appended on http://commons.wikimedia.org/wiki/File:Commons-emblem-disambig-notice.svg is the proper MW purge, yeah ? [19:48:46] so though you can delete the individual object, don't. [19:48:54] LeslieCarr: correct. [19:49:26] let's load that URL then look at swift and ms5 again and verify that they're deleted. [19:50:32] ok, I loaded the purge URL. [19:50:37] looking at the files on ms5 again: [19:50:48] mostly gone. [19:50:50] really? [19:50:56] why are those left? [19:51:06] I don't know. [19:51:09] weird... [19:51:15] not liking it [19:51:19] they should all be gone. [19:51:22] broken purge is broken :-( [19:51:24] i have robla right here... [19:51:27] should i yell at him ? [19:51:29] I've seen this before, but haven't dug in. [19:51:35] LeslieCarr: I think we should just file a bug [19:51:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.691 seconds [19:51:44] not as fun as yelling :( [19:51:45] I've also got to stop in 2 minutes... [19:51:57] LeslieCarr: feel free to yell anyways. [19:52:07] ok, so they're gone on ms5, let's go check swift. [19:52:36] oh hey, I've got an idea of why they might still be there, apergos. [19:52:39] oh? [19:52:50] mediawiki has to get a listing of files in order to figure out which ones to delete. [19:52:54] that's always come from ms5, [19:52:58] but commons got 1.19, [19:53:01] which includes cloudfiles. [19:53:11] it's possible that it used swift's listing instead of ms5's, [19:53:39] well there's still some left there too [19:53:46] https://bugzilla.wikimedia.org/show_bug.cgi?id=34697 [19:53:46] ah new, ok [19:53:52] but look - those are all created within the last few minutes. [19:53:57] prolly created from viewing the page [19:54:11] exactly. [19:54:13] the other ones are old though, that's going to be a drag [19:54:23] weird so why are 1199 and the old 1200 gone now ?
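The purge step confirmed above is just a URL with `action=purge` appended, loaded in a browser. A trivial sketch of building that URL (hypothetical helper; actually fetching it is what triggers MediaWiki to drop the cached thumbnails):

```python
def purge_url(file_page):
    """Append action=purge to a File: page URL - MediaWiki's purge action,
    which deletes the cached thumbnails so they get regenerated on the
    next request. Uses & when the URL already has a query string."""
    sep = '&' if '?' in file_page else '?'
    return file_page + sep + 'action=purge'

print(purge_url(
    'http://commons.wikimedia.org/wiki/File:Commons-emblem-disambig-notice.svg'))
# -> http://commons.wikimedia.org/wiki/File:Commons-emblem-disambig-notice.svg?action=purge
```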
[19:54:34] did someone delete them by hand or did it just take longer to purge ? [19:54:37] the purge deletes from both places [19:54:40] crap [19:54:54] sorry if I borked anyone's typing [19:55:19] LeslieCarr: fascinating. maybe it just does take longer than we gave it. [19:55:31] yeah, I totally don't know. but it is worth looking into. [19:55:35] maaaaybe [19:56:15] "And this concludes your tour of looking at objects in swift. Thank you very much, and I hope you enjoy the rest of your day (or night). Take care and come again!" [19:56:19] heh [19:56:26] thanks! [19:56:37] I'm going to leave the screen session up for a bit if you want to keep playing or record some of the commands. [19:57:25] yay :) [20:00:51] ok I got me some notes. awesome! [20:01:54] feel free to update http://wikitech.wikimedia.org/view/Swift/How_To with anything that seems useful. [20:01:56] :D [20:02:25] I'll need to play some before I'd have anything to add [20:03:39] oh maplebed [20:03:45] maplebed: I'm back now...anything you still need from me? [20:03:55] :) [20:04:33] I have in git.. umm... [20:05:05] dumps/tools/thumbs/crunchinglogs/otherscripts/listFileNames.py [20:05:30] it takes filenames on stdin and writes /export/thumbs/wikipedia/commons/thumb/blah/blahblah/dirname out [20:05:54] image names should have _ for space. (it says so in the file too) [20:06:07] so there's your tool if ya want it [20:15:44] New patchset: Sumanah; "better and shorter string" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2776 [20:15:45] New review: gerrit2; "Lint check passed."
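The listFileNames.py tool mentioned above (filename in, /export/thumbs/.../dirname out) most likely relies on MediaWiki's standard md5-based directory sharding. A sketch of that mapping - an assumption about how the tool works, not its actual code - which should reproduce the 2/2f shard seen in the Hydravulgaris URL:

```python
import hashlib

def thumb_dir(filename, root='/export/thumbs/wikipedia/commons/thumb'):
    """Map an image name to its thumbnail directory, assuming MediaWiki's
    md5-based sharding: first hex digit of the hash, then the first two.
    Spaces must be underscores, as the transcript notes; we normalize
    them here just in case."""
    name = filename.replace(' ', '_')
    h = hashlib.md5(name.encode('utf-8')).hexdigest()
    return '%s/%s/%s/%s' % (root, h[0], h[:2], name)

print(thumb_dir('Hydravulgaris.jpg'))
```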
[test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2776 [20:25:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:28:10] PROBLEM - Host search1009 is DOWN: PING CRITICAL - Packet loss = 100% [20:29:40] RECOVERY - Host search1009 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms [20:30:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.270 seconds [20:49:10] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 205 seconds [21:04:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:10:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.442 seconds [21:16:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error [22:01:38] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [22:10:55] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777 [22:13:38] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [22:17:14] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777 [22:19:38] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [22:19:38] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [22:24:43] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777 [22:25:48] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - 
https://gerrit.wikimedia.org/r/2777 [22:25:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777 [22:28:46] wow. the indexes are going to take up roughly twice as much space on the new system... [22:29:12] I can only guess that it's because there are 160k files, and that xfs doesn't do as well with lots of small files as ext3? [22:29:15] but still [22:29:36] also, this means that rsync of indexes is only half-over [22:38:19] notpeter, there is lots of hardlinking involved in those indexes, that's probably why [22:38:37] rsync doesn't know they're hardlinks, and instead copies whole files [22:38:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.876 seconds [22:38:50] there may be a flag [22:39:02] -H it is [22:39:10] ah [22:39:34] well, as it's half done... it might just be worth restarting with that... [22:39:43] yep [22:39:43] *sigh* [22:40:29] there are also some symbolic links i think [22:40:42] but that might be already copied [22:41:19] well, that just saved a bunch of gigs [22:41:23] and restarted at the same place! [22:41:26] I love you, rsync [22:43:39] yeah, restarted with a -l [22:43:45] for symlinks [22:44:07] awesome [22:44:11] glad you caught that!
[22:44:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:50:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.943 seconds [23:00:57] New patchset: Lcarr; "Updating secure.wikimedia.org proxy config a la https://httpd.apache.org/docs/2.2/mod/mod_proxy_http.html" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2778 [23:02:42] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2778 [23:02:42] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2778 [23:05:04] !log Server should be SSL-aware but has no certificate configured [Hint: SSLCertificateFile] ((null):0) [23:05:06] Logged the message, Mistress of the network gear. [23:05:09] doh didn't mean to log that [23:05:10] Server should be SSL-aware but has no certificate configured [Hint: SSLCertificateFile] ((null):0) [23:08:41] I thought that got fixed [23:08:41] PROBLEM - HTTP on singer is CRITICAL: Connection refused [23:08:56] ryan had added something to the planet config, or something, last time [23:09:04] I did [23:09:13] I added the certificate [23:09:18] maybe something got overwritten [23:10:24] !lot restarting indexer and lucene on searchidx2 [23:10:27] I didn't realize it was puppetized [23:10:36] so puppet undid what I did [23:10:40] !log restarting indexer and lucene on searchidx2 [23:10:42] Logged the message, and now dispaching a T1000 to your position to terminate you.
[23:11:14] New patchset: Lcarr; "adding in ssl file info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2779 [23:11:33] ah puppet [23:11:36] how I love to hate you [23:11:37] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2779 [23:11:42] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2779 [23:11:42] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2779 [23:11:44] :) [23:14:41] RECOVERY - HTTP on singer is OK: HTTP OK - HTTP/1.1 302 Found - 0.003 second response time [23:21:23] New patchset: Lcarr; "Ensuring /etc/nagios-plugins directory exists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2780 [23:23:46] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2780 [23:23:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2780 [23:24:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:15] New patchset: Hashar; "jenkins: git preparaton script for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513 [23:27:04] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2513 [23:27:48] while I am around, I have some trivial changes pending in gerrit [23:28:06] would be great if someone could merge some of them :-D [23:29:51] rainman-sr: there are an awful lot of entries in the logs on idx1001 that look like [main] INFO org.wikimedia.lsearch.oai.IncrementalUpdater - Sending (3790, -1:, date=null) [23:30:09] any insight? 
[23:30:24] prolly need to adjust debug level [23:30:29] for log4j [23:30:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.497 seconds [23:30:44] although i think it's same on searchidx2 [23:31:24] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2513 [23:31:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513 [23:31:37] hashar: link me to some [23:31:46] other than the local lint change :) [23:31:50] which i see you requested me :) [23:32:05] cause you did the local-lint one :-D [23:32:24] here are the links. ignore .swp files https://gerrit.wikimedia.org/r/#change,2587 [23:32:35] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2629 [23:32:55] they are made by vim, it keeps bothering me when I have multiple terminals and do a git add * [23:32:57] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2587 [23:32:58] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2587 [23:32:59] oh yay [23:33:00] hehe [23:33:01] arg, and idx1001 is also refusing rmi requests... [23:33:02] sigh [23:33:09] and the same with .pyc files https://gerrit.wikimedia.org/r/#change,2514 [23:33:33] hrm, do we use any compiled python files ? [23:33:43] since we have some python scripts, whenever one runs them, you end up with .pyc files that get added with git add * when we don't want them [23:34:02] let me check [23:34:17] none found [23:35:21] notpeter: I wish I could help in some way. [23:35:25] * hashar sends donuts to notpeter [23:35:27] New review: Bhartshorne; "I don't think we want all swift access logs centralized in this way. If we tune swift logs so that ..."
[operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/2673 [23:35:57] * maplebed commented on change 2673, though probably not with the desired result. [23:36:09] LeslieCarr: lastly https://gerrit.wikimedia.org/r/#change,2677 is an update of a href. [23:36:38] maplebed: That seems quite appropriate. A -1 won't prevent someone else from approving it anyway [23:36:44] hashar: the swift logs are currently pretty noisy and we don't aggregate any other access logs (afaik) in that way. [23:36:51] cool. [23:37:06] I am fine having a change rejected :-D [23:37:13] I definitely want to centralize some of the logging, but I think it needs a bit more analysis before throwing it into nfs. [23:38:03] aren't all those logs already hitting NFS? [23:38:11] nope. [23:38:13] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2677 [23:38:13] I did that change following a discussion with Ariel IIRC [23:38:13] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2677 [23:38:18] at least, I don't think they are... [23:38:21] huh. [23:38:52] they can be read from fenari:/home/wikipedia/syslog/syslog [23:39:11] no shit. [23:39:46] Well, they're not being written over NFS [23:39:55] New review: Lcarr; "Looks good but a little cautious, want another review" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2514 [23:39:55] They're being written to a box that happens to also expose that dir as an NFS share [23:40:17] LeslieCarr: thanks :) [23:40:18] ho damn. [23:40:27] well, nevermind then. [23:40:33] I thought they weren't leaving the box. [23:40:42] (the swift boxes) [23:40:55] so I was expecting the syslog-ng.conf to split the big syslog files into smaller parts [23:41:14] you probably don't want to merge that change on friday anyway 8) [23:41:20] New review: Bhartshorne; "well, nevermind. they're already getting pushed via syslog.
+1 on splitting them into a swift-spec..." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2673 [23:42:10] knowing that they're already making it over there, +1. but yeah, not today. [23:42:40] thanks! [23:43:22] great all my patches have a +1 now :-D [23:43:55] the rest is not for a friday merge ;-) [23:55:38] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [23:55:43] New patchset: Hashar; "rm ansi sequences when validating puppet changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2781 [23:58:13] no more ansi sequences :-) [23:58:18] off [23:58:34] I wish you all a good week-end. Thanks for the reviews maplebed & LeslieCarr \o/ [23:58:39] you too! [23:58:51] <^demon|away> goodnight hashar. [23:59:54] :) [23:59:55] g'night