[00:06:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:07:21] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[00:07:51] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 23337 MB (2% inode=99%):
[00:08:01] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:07:59 UTC 2013
[00:08:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:09:21] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:09:11 UTC 2013
[00:10:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:10:21] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:10:16 UTC 2013
[00:11:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:11:21] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:11:13 UTC 2013
[00:12:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:12:11] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:12:04 UTC 2013
[00:13:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:13:31] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:13:24 UTC 2013
[00:14:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:14:31] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 00:14:29 UTC 2013
[00:15:11] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[00:26:11] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[00:49:37] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:02:37] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
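The db11 flapping above alternates between a staleness alert and a recovery within minutes; the check itself is just an age comparison against a 10-hour threshold. A minimal sketch of that logic, assuming a hypothetical helper (this is not the actual Nagios plugin):

```python
from datetime import datetime, timedelta

def puppet_freshness(last_run: datetime, now: datetime,
                     max_age: timedelta = timedelta(hours=10)) -> str:
    """Return a Nagios-style state string for a Puppet freshness check.

    Sketch only: the real check is a Nagios plugin reading the agent's
    last-run timestamp; here both times are passed in explicitly.
    """
    if now - last_run > max_age:
        return "CRITICAL: Puppet has not run in the last 10 hours"
    return "OK"
```

With timestamps from the log above, a run at 00:07:59 checked at 00:08:11 is OK, while a last run more than ten hours back trips CRITICAL.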
[01:03:31] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[01:05:11] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 22757 MB (2% inode=99%):
[01:05:41] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[01:21:01] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:47:16] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:53:06] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:03:40] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[02:05:50] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[02:06:20] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 23162 MB (2% inode=99%):
[02:06:31] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:08:50] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:12:25] !log LocalisationUpdate completed (1.21wmf12) at Mon Apr 1 02:12:24 UTC 2013
[02:12:32] Logged the message, Master
[02:14:50] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:29:00] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:32:31] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:38:40] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:44:40] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:53:00] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:04:30] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[03:06:40] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[03:06:40] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[03:07:10] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 22656 MB (2% inode=99%):
[03:07:40] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:22:54] !log reinstated max_user_connections = 80 for wikiadmin on s1
[03:31:40] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:44:43] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1051'
[03:49:30] PROBLEM - mysqld processes on db1051 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[03:50:31] RECOVERY - mysqld processes on db1051 is OK: PROCS OK: 1 process with command name mysqld
[03:50:59] morebots is missing.
[03:51:00] Sigh.
[03:51:11] Lost in the netsplit.
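The varnishncsa and mysqld alerts above are process-count checks: CRITICAL when fewer processes than expected match a command name. A sketch of that counting logic, assuming a hypothetical helper and a per-check threshold (the real checks use the Nagios `check_procs` plugin with configured ranges):

```python
def check_procs(proc_names: list[str], command: str, critical_below: int) -> str:
    """Nagios check_procs-style sketch: count processes by command name.

    proc_names is a snapshot of running command names; critical_below is
    the minimum healthy count (3 for the varnishncsa checks in this log,
    1 for mysqld).
    """
    count = sum(1 for name in proc_names if name == command)
    noun = "process" if count == 1 else "processes"
    state = "PROCS CRITICAL" if count < critical_below else "PROCS OK"
    return f"{state}: {count} {noun} with command name {command}"
```

Fed two varnishncsa processes it reproduces the CRITICAL line seen above; with three it reports OK.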
[03:58:05] !log asher synchronized wmf-config/db-eqiad.php 'returning db1051'
[04:04:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:06:14] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[04:06:44] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 22196 MB (2% inode=99%):
[04:07:54] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:07:49 UTC 2013
[04:08:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:09:04] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:08:54 UTC 2013
[04:09:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:09:54] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:09:53 UTC 2013
[04:10:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:10:54] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:10:46 UTC 2013
[04:11:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:11:34] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:11:32 UTC 2013
[04:12:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:12:15] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:12:13 UTC 2013
[04:13:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[04:14:54] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 04:14:47 UTC 2013
[04:15:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[05:04:16] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[05:14:36] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 05:14:31 UTC 2013
[05:15:06] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:05:07] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:07:17] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[06:07:47] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 21899 MB (2% inode=99%):
[06:21:17] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[06:22:17] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms
[06:24:27] PROBLEM - Apache HTTP on mw27 is CRITICAL: Connection refused
[06:29:57] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 06:29:55 UTC 2013
[06:30:07] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:30] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 06:30:24 UTC 2013
[06:31:07] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[06:32:27] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.152 second response time
[07:04:21] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[07:06:01] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 21473 MB (2% inode=99%):
[07:06:31] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[07:14:51] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 07:14:42 UTC 2013
[07:15:21] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[07:45:24] !log depooling ms-fe1 for staging
[07:50:27] New review: Faidon; "I think it's okay, but as a general comment, I'd rather change the logic of this altogether. I think..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:04:06] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[08:04:29] paravoid: want to boot morebots? :)
[08:04:38] * jeremyb_ will re!log everything if it's soon
[08:06:16] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[08:06:46] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 20996 MB (2% inode=99%):
[08:07:46] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 08:07:42 UTC 2013
[08:08:06] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[08:08:25] New review: Yurik; "Faidon, I spoke with dfoy regarding this, and according to him, the reason for this logic is that is..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:12:35] New review: Faidon; "a) this isn't just about zero, zero is just one part of this (and traffic-wise, probably small part)..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:12:53] New review: MaxSem; "Still, can redirection be done in PHP instead of Varnish?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:13:32] yurik_, where are you and why you're not asleep?:)
[08:14:02] i forget what borough yurik_ lives in
[08:14:05] * jeremyb_ is in kings
[08:14:13] i'm in a hotel in SF
[08:14:19] flying back tomorrow night
[08:14:36] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 08:14:28 UTC 2013
[08:14:36] and it takes forever for the laundry to dry :(
[08:14:55] where is "back"?
[08:15:06] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[08:16:49] paravoid: my city!
[08:16:55] i assume
[08:19:05] New patchset: Yurik; "Unified default lang redirect from m. & zero." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:23:20] New review: Yurik; "You are talking about a double-redirect - not very good for a mobile, often no-pictures device that ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:23:42] New review: Yurik; "patch 10 is a rebase" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302
[08:24:07] paravoid, NYC
[08:24:20] jeremyb_, yep :)
[08:24:24] :)
[08:27:51] speaking of varnish...
[08:28:07] paravoid, can you review https://gerrit.wikimedia.org/r/#/c/56502/ please?
[08:29:07] paravoid, maxsem, rebased and answered. Maybe one day we will have a better solution for a global Zero home page. Or start using browser's lang settings. Maybe.
[08:30:35] I don't understand what double redirect you're talking about
[08:31:06] basically, go to zero.wp.o -> get redirected by php instead of varnish
[08:31:33] slightly more load on backend apaches, but helluva more flexible
[08:32:37] if you properly do caching, it shouldn't be much load
[08:32:49] anyway, wait for mark's review too
[08:34:31] another plus is that you will need no ops review then:)
[08:38:41] MaxSem, no ops review is a big plus, totally agree there :)
[08:39:39] doing it in PHP is, of course, an option, but that would (if agreed) be another patch
[08:40:04] here it was a simple consolidation of logic so that new telcos can be launched quickly
[08:41:11] MaxSem, re-redirects - if i understand correctly, we don't have a server at zero.wikipedia.org - all requests were redirected to en.zero.wikipedia.org -- hence the double redirect zero->en.zero->xx.zero
[08:41:56] should be fixable
[08:42:00] i could be mistaken re nonexistance of zero. server of course - but at least that's the logic we had in varnish before
[08:43:19] MaxSem, are you suggesting we have a server just to do zero redirects? Or to point zero.wikipedia.org to en.zero?
[08:44:14] every apache has docroots for every WMF wiki
[08:44:23] I'm not sure what you're implying about ops review
[08:44:44] it was a compliment to ops :)
[08:45:07] general infrastructure still needs ops review, yes
[08:45:32] but just like with MediaWiki, not every php revision will ned to be dragged to you for review
[08:45:50] btw, it isn't nice that zero.wikipedia.org exposes XVO headers
[08:46:14] that's all of .m. I think
[08:46:26] not just XVO, all of our internal X- headers
[08:46:29] the reviews are very thorough, hard to sneak stuff past you :)
[08:46:52] but to get back to the subject
[08:46:58] oooooph
[08:47:16] I don't understand why when I type "m.wikipedia.org" on my phone I get redirected to english wikipedia
[08:47:35] why did you guess english and not greek?
[08:47:45] why are you even in the business of guessing?
[08:47:58] we don't do that for www. on any of our projects
[08:48:02] I don't see why we do it for mobile.
[08:48:29] also note that there's not an m.$project.org for the other projects, just wikipedia
[08:48:42] which is very inconsistent I think
[08:48:55] I think all these need to be dealt with and I think varnish (and ops) is the wrong place to deal with them
[08:50:38] huh. there's an en.m.wiktionary.org but not an m.wiktionary.org
[08:50:56] paravoid, i agree that we should reexamine it, but in the case of ZERO (note, this is not m. vs zero., but rather - zero cost arranged with carriers), we have somewhat different requirements - we don't guess the language, we show the one that we aggreed upon when signing legal paperwork
[08:51:21] I don't think you understand
[08:51:27] damn lawyers
[08:51:31] I don't want to be involved in all that :)
[08:51:41] heh, neither do i to be honest
[08:51:47] but it's your job :)
[08:52:03] nah, my job is to get you to approve my changes ;)
[08:52:44] NOT APPROVED:P
[08:52:59] * yurik_ throws snowball at MaxSem
[08:53:06] yeah, I think you're telling me that the only way to move this forward is to just -2 all of your changes :)
[08:53:19] lol
[08:53:32] :-)
[08:53:34] yurik_, is there snow in SF?
[08:53:40] or even NYC?
[08:53:49] NYC had some last monday
[08:54:08] and i'm sure i can scrape enough in the freezer for one snoball
[08:54:55] no, we had very light rain off and on today
[08:54:58] no snow
[08:55:49] paravoid, so what about https://gerrit.wikimedia.org/r/#/c/56502/ ?:)
[08:56:52] aanyway, this patch was not meant to change our philosophical approach to zero & m domains, only to consolidate current logic without substantial changes. If you have strong feelings about it, lets add it to the http://www.mediawiki.org/wiki/Requests_for_comment/Zero_Architecture#Varnish
[08:58:07] we all try to "make the world a better place" (tm), lets try to do it constructivelly..
[08:58:13] yurik_, I don't think anyone was blocking your changes to make you change the architecture
[08:58:31] notice how my comment didn't come with a -1/-2.
[08:58:48] but still you've been explained that there's a better way
[08:58:49] paravoid, i did appreciate that ;)
[09:00:33] MaxSem, i would love to reexamine this, assuming we can demonstrate a php-based solution which is just as fast for the clients and satisfy "business requirements". (I don't want to write two varnish plugin functions when i can do just 1, or better yet - zero!)
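The Zero requirement yurik_ describes (no language guessing; each carrier gets the language fixed by contract) amounts to a lookup keyed on the carrier id. A sketch under assumed carrier ids and languages — the real mapping lives in the Zero partner configuration, and these values are purely illustrative:

```python
# Hypothetical X-CS carrier-id -> contractually agreed language map.
CARRIER_LANG = {"250-99": "ru", "404-01": "hi"}

def zero_redirect(x_cs: str, default: str = "en") -> str:
    """Pick the agreed language subdomain for a Zero landing request.

    No guessing from Accept-Language or geography: the carrier id alone
    selects the target, falling back to a default for unknown carriers.
    """
    lang = CARRIER_LANG.get(x_cs, default)
    return f"https://{lang}.zero.wikipedia.org/"
```

Done this way there is at most one redirect per request, rather than the zero -> en.zero -> xx.zero double hop discussed above.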
[09:01:50] it seems the only true need ATM is to determine X-CS (carrier) id at the varnish level
[09:02:14] otherwise we don't have any data for analytics
[09:02:44] or a way to customize banners
[09:03:41] i got to catch some sleep, MaxSem & paravoid ,would love to pick your brains tomorrow re optimal way to redirect
[09:04:39] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[09:04:44] X-CS has nothing to do with this
[09:06:19] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 20494 MB (2% inode=99%):
[09:06:49] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[09:07:04] paravoid, X-CS is the carrier ID, and that determines which redirect target to use
[09:07:27] you think that after all these reviews I don't know what X-CS is for? :)
[09:07:42] hehe
[09:07:46] I never said that we shouldn't do X-CS at the Varnish layer
[09:07:57] I only talked about the m.wikipedia.org & zero.wikipedia.org redirects
[09:07:58] if we will say that carrier no longer determines which default language to show, than yes
[09:08:13] carrier might just as well determines that
[09:08:23] X-CS gets passed to mediawiki and mediawiki can Vary
[09:09:01] if php returns 302 redirect, will it be cached?
[09:09:12] (per X-CS header)
[09:09:45] it can be cached, yes.
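The caching behavior being discussed works because a `Vary: X-CS` response header folds the carrier id into the cache key, so the cache can hold one 302 per carrier while the PHP backend is hit only on a miss. A minimal sketch of that mechanism, with hypothetical data structures (not Varnish's implementation, and the carrier/language values are illustrative):

```python
# Cache keyed by (url, X-CS value), as "Vary: X-CS" implies.
cache: dict[tuple[str, str], tuple[int, str]] = {}

def backend_redirect(url: str, x_cs: str) -> tuple[int, str]:
    """Stand-in for the PHP backend issuing a per-carrier 302."""
    lang = {"250-99": "ru"}.get(x_cs, "en")
    return (302, f"https://{lang}.zero.wikipedia.org/")

def fetch(url: str, x_cs: str) -> tuple[int, str]:
    """Serve from cache when possible; ask the backend on a miss."""
    key = (url, x_cs)
    if key not in cache:
        cache[key] = backend_redirect(url, x_cs)
    return cache[key]
```

Repeated requests from the same carrier reuse the cached 302, so the extra load on the backend apaches is one request per carrier per TTL, not per client.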
[09:10:19] depends on MW headers
[09:10:26] I don't think we override them for 3xx
[09:11:05] i would love to change zero config in that case
[09:11:50] sigh
[09:11:54] it's not just about zero
[09:11:56] could you help with configuring varnish to cache redirects from apache, and to direct all m.* and zero.* to en
[09:12:03] I'm not sure how many times I have to say the same thing
[09:12:16] sorry, must have missed something
[09:12:37] this is also about "m.wikipedia.org"
[09:12:59] anyway, let's wait for mark or maybe asher too to comment on that topic
[09:13:07] m.wiki if coming from the Zero carrier gets the same rules IIRC
[09:13:20] if they agree too that this is a good way forward, I think we should coordinate with the core mobile team about making that happen
[09:13:36] and then you can discuss internally between core mobile/partners about specific zero needs
[09:13:39] how's that for a plan?
[09:13:42] MaxSem: ^
[09:13:52] sounds good to me - i will make changes to that RFC to specify redirect ideas
[09:14:25] trust me, i would much rather do the bouncing in Zero extension than varnish
[09:14:45] feel free to change that page btw
[09:14:58] varnish is flexible enough to put as little or as much logic as you want to it
[09:15:10] there are people doing memcached or redis queries from varnish and whatnot
[09:15:23] in our case I don't think we should complicate things too much
[09:15:45] totally agree
[09:17:19] off to bed, thanks for great suggestions and getting me acclimatized with zero stuff & mobile varnish stuff in general
[09:17:30] :)
[09:17:35] no worries :)
[09:17:49] api was easier ;)
[09:19:02] New review: Faidon; "If you say so :)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/56502
[09:42:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[09:42:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[09:42:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:41] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[10:06:21] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 19969 MB (2% inode=99%):
[10:06:51] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[10:26:51] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[11:03:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[11:06:03] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[11:06:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:06:33] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 21345 MB (2% inode=99%):
[11:07:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[11:18:42] New review: Nemo bis; "WONTFIX, please abandon." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56101
[11:37:19] New patchset: Faidon; "Swift: add a document root container" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56911
[11:38:56] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56911
[11:54:15] !log repooling ms-fe1; staggered depool/restart/pool of ms-fe2-4
[12:04:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:06:20] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[12:06:50] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 20650 MB (2% inode=99%):
[12:08:01] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:07:58 UTC 2013
[12:08:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:09:11] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:09:05 UTC 2013
[12:10:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:10:11] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:10:06 UTC 2013
[12:11:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:13] Change abandoned: Odder; "Abandoning per Bugzilla bug." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56101
[12:11:50] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:11:46 UTC 2013
[12:12:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:12:32] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:12:27 UTC 2013
[12:13:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:14:30] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 12:14:27 UTC 2013
[12:15:10] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[12:19:20] New patchset: Faidon; "Swift: add rewrites for legacy math URLs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56912
[13:02:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:03:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[13:05:28] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[13:07:08] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 20050 MB (2% inode=99%):
[13:07:38] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[13:55:28] !log added webstatscollector-0.1-2 to apt repo
[14:06:12] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[14:08:22] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[14:08:52] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 20526 MB (2% inode=99%):
[14:25:08] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56912
[14:39:19] !log restarted glusterfs-server on labstore1 and labstore2, because of read errors in public/datasets/public in bots project
[14:39:49] morebots died
[14:42:27] dang
[15:05:18] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[15:05:58] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[15:07:34] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[15:07:58] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 19900 MB (2% inode=99%):
[16:02:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:03:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[16:05:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:05:46] New patchset: Ottomata; "Adding define ganglia::view for abstracting ganglia web views." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56921
[16:06:21] morning! Ryan_Lane would love a review of that whenever you get a sec
[16:06:37] sure
[16:07:02] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 19289 MB (2% inode=99%):
[16:07:10] (ergh I see some trailing whitespace, gonna fix)
[16:07:32] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[16:08:02] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:07:58 UTC 2013
[16:08:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:09:12] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:09:05 UTC 2013
[16:09:15] New patchset: Ottomata; "Adding define ganglia::view for abstracting ganglia web views." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56921
[16:09:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:10:12] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:10:05 UTC 2013
[16:10:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:11:02] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:10:58 UTC 2013
[16:11:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:11:52] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:11:46 UTC 2013
[16:12:01] looking
[16:12:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:12:32] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:12:27 UTC 2013
[16:13:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:14:32] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 16:14:28 UTC 2013
[16:15:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[16:18:31] ottomata: why does labs and production have different configs?
[16:18:56] there's not a really great reason for that
[16:19:22] that's the way they are? it took me a long time to figure out where prod ganglia's conf dir was
[16:19:23] we'll eventually want that manifest to be in a module
[16:19:34] yeah totally
[16:19:44] i figured we'd change that when the time came
[16:19:45] and putting realm specific things there will make it hardr
[16:19:47] but here is a better q
[16:19:47] *hardr
[16:19:51] damn it
[16:20:14] ganglia.wikimedia.org apache vhost has
[16:20:14] Alias /latest /srv/org/wikimedia/ganglia-web-3.5.4+security
[16:20:31] and
[16:20:51] /srv/org/wikimedia/ganglia-web-3.5.4+security/conf_default.php has:
[16:20:51] $conf['ganglia_web_root']="/srv/org/ganglia_storage/3.5.1";
[16:26:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:27:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[16:36:49] New patchset: Ottomata; "Adding define ganglia::view for abstracting ganglia web views." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56921
[16:41:36] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56921
[16:42:04] ottomata: that's a nice class
[16:42:22] er, resource
[16:42:23] yeah, i was wondering how those few views that are in ganglia got there
[16:42:27] define!
[16:42:27] hehe
[16:42:39] i wanted a udp2log (and other) ones
[16:42:53] for analytics nodes I had once made a stupid php file taht just gathered the graphs from ganglia manually and put them on a page
[16:42:55] this is much better
[16:43:27] I shoudl convert the two existing views (swift frontend and backend) to use this in puppet too
[16:44:00] btw about the udp graphs.. i think there's something crappy like if you change the metric type you have to kill & restart the ganglia aggregator for it to matter, otherwise the previous value sticks
[16:44:04] something like that
[16:46:19] hmm, ay, will try that
[16:46:33] i still see the counter graph for the ones we changed
[16:46:58] no, i mean: kill the receiver end :-/
[16:47:09] or change the metric names slightly so that they go into a new metric
[16:48:22] New patchset: Ottomata; "Ensuring ruby-json is installed on puppetmasters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56927
[16:49:30] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56927
[16:54:48] COOool, thanks ori-l + Ryan_Lane!:
[16:54:49] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=udp2log
[16:55:05] http://ganglia.wikimedia.org/latest/graph_all_periods.php?hreg[]=locke%7Cemery%7Coxygen%7Cgadolinium&mreg[]=pkts_in&z=large&gtype=stack&title=pkts_in&aggregate=1&r=hour
[16:56:00] sweet
[16:56:17] i just restarted ganglia-monitor on the two aggregators for misc eqiad
[16:56:30] hopefully that will fix the few other udp stats I want to add
[17:05:01] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours
[17:06:11] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1
[17:06:41] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 18741 MB (2% inode=99%):
[17:13:55] New patchset: Cmjohnson; "Adding rdb1/2 dhcpd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56931
[17:14:31] !log moved aside williams:/opt/otrs-home/.spamassassin/auto-whitelist and restarted spamd to test whether the bloated AWL is resulting in poor filtering
[17:16:30] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56931
[17:23:30] New patchset: Cmjohnson; "Setting absent to a public id for chris johnson" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56936
[17:24:57] robh: can you merge that ^
[17:27:12] !log reedy synchronized wmf-config/ExtensionMessages-1.22wmf1.php
[17:29:53] 1.22wmf1!
[17:30:11] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56710
[17:34:34] cmjohnson1: i take it you no longer use that key then yes
[17:34:34] ?
[17:34:55] correct
[17:35:22] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56936
[17:35:25] cmjohnson1: ok, i leave sockpuppet merge to you
[17:35:42] cool thx
[17:44:56] cmjohnson1: are you at DC physically ?
[17:45:43] not at the moment...i have someone coming to the house today...will be in later though. whats up?
[17:46:03] okay
[17:46:08] need to test out the sfp-t's
[17:46:22] oh..okay..where do you want them?
[17:46:58] i am thinking a5 and moving a normal machine to the 4500
[17:47:04] lemme see if anything is available
[17:47:08] kk
[17:47:33] notpeter: are any of the search machines in eqiad (search1001-1024 at least) ok to disconnect for a minute and then reconnect ?
[17:47:34] mutante-away, are you still away? ;)
[17:48:14] jgonera: still looking for a verification scheme?
[17:48:23] jgonera: did you update gerrit/wikitech?
[17:48:28] I did
[17:48:30] I mean
[17:48:33] gerrit yes
[17:48:36] what's the other one?
[17:48:42] wikitech.wikimedia.org
[17:48:46] same creds as gerrit
[17:49:03] LeslieCarr: nope, they are serving traffic
[17:49:16] we can switch traffic to pmtpa, though
[17:49:23] although I might take down the site in doing so
[17:50:04] ^ hrmmm let's think about that for a minute
[17:50:20] notpeter: speaking of search... seen 4844?
[17:50:25] jeremyb_, ok, wikitech updated too [17:50:31] cmjohnson1: sorry, i was joking [17:50:34] I can definitely do that [17:50:34] hehehe [17:50:58] ok, if we can do that (or just take down one), whichever is easier [17:51:07] yeah...i know...sarcasm is never clear on irc [17:51:08] or just take the site down ;) nobody really uses it [17:51:31] jeremyb_, mutante-away said he needs to do a hangout or a phone call with me to verify that I'm myself, so I'm still waiting [17:51:48] jgonera: right, i remember [17:52:09] (doesn't have to be him... but he seems active...) [17:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:52:51] who else can do that? he replied to my e-mail 10 minutes ago so I guess he should be around soon [17:53:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [17:55:04] !log reedy synchronized php-1.22wmf1 'Initial sync of php-1.22wmf1' [17:55:45] !log reedy synchronized docroot [17:56:29] !log reedy synchronized w [17:56:30] jgonera: well i guess it should ideally be someone who knows you already. idk who that would be though. and probably easiest if it's a root [17:57:37] New review: Nikerabbit; "Maxing is short for making sure?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56493 [17:59:10] so, uh, https://wikitech.wikimedia.org/wiki/Server_admin_log doesn't seem to be updating.... [17:59:26] oh, there's no echo bot in here [17:59:30] ohhhh, i almost forgot it's bblack's first day [17:59:46] welcome! 
:) [17:59:47] me too, good thing I remembered :) [17:59:50] hah [17:59:53] "echo bot" being the bot that echo's every log message because we have a change of bots for loggin ;) [18:00:03] morebots died [18:00:09] greg-g: i think it's been gone almost 16 hours [18:00:10] iirc [18:00:12] that's it, I forgot the name [18:00:17] 04.49 -!- Netsplit *.net <-> *.split quits: +morebots, godog, paravoid, mark, Vito, andrewbogott_afk, AaronSchulz [18:00:23] greg-g: more generic name is ircecho [18:00:27] oh, well then [18:00:30] re split [18:02:03] i have the stuff that was missed all ready to go to re!log once the bot is back [18:06:02] jeremyb_: awesome, thanks much [18:08:40] brion, paravoid & MaxSem suggested last night that we could set DNS entry for zero.wikipedia.org to en.zero (without redirects), and also do the language-specific redirect from PHP, since 302 can be cached per X-CS header. Can you think of any issues with this approach? [18:09:10] no, we discussed something else [18:09:13] this would cause at-most one language redirect [18:09:36] MaxSem, please clarify, that was my understanding [18:09:54] hmm. Looks like there is a high number of segfaults currently in the apache syslogs... [18:10:01] ruh roh [18:10:08] what exactly is X-CS? [18:10:15] we talked about creating specialised a zero portal (that also could redirect people directy to a wiki if needed) accessible precisely at zero.wp.o [18:10:16] carrier code [18:11:45] ah, numeric X-Carrier. but what does CS mean? [18:11:59] i was thinking it was an abbreviation like XVO [18:14:02] jeremyb_: 'carrier short' is the rumor [18:14:21] aha [18:15:22] !log restarted morebots [18:15:28] Logged the message, Master [18:15:29] danke [18:15:32] brion, thanks, i had no idea myself :) [18:15:32] yw [18:15:52] Reedy: we ok with the apaches on mediawiki.org/test? 
[18:15:57] !log 03:22:54 !log reinstated max_user_connections = 80 for wikiadmin on s1 [18:16:00] !log 03:44:43 !log asher synchronized wmf-config/db-eqiad.php 'pulling db1051' [18:16:01] ok, so is there anyone who could change (or actually add since the old one is gone) my SSH key to stat1? [18:16:02] What do you mean? [18:16:03] Logged the message, Master [18:16:03] !log 03:58:05 !log asher synchronized wmf-config/db-eqiad.php 'returning db1051' [18:16:07] !log 07:45:24 !log depooling ms-fe1 for staging [18:16:08] Logged the message, Master [18:16:13] Logged the message, Master [18:16:19] Logged the message, Master [18:16:23] jgonera: You can make the change to the puppet repo yourself. You just need someone to deploy it ;) [18:16:37] Reedy: you indicated there were apache segphaults, just making sure it wasn't related to today's deploy [18:16:37] MaxSem, i over-concentrated on the 302 redirect point. Having a dedicated portal vs a page in m.zero.wikipedia.org could be discussed too [18:17:01] greg-g: Can't be, the code isn't in use anywhere yet... [18:17:10] !log restarted morebots. wikitech-static had 100% / partition. Have also fixed that and its cause. [18:17:16] Logged the message, Master [18:17:18] Reedy: oh, wasn't sure, thanks. :) [18:17:23] yurik, so the whole point of discussion was "do it in PHP" [18:17:33] Reedy, mutante-away insisted to have a hangout or a phonecall with me before that, to verify my identity... [18:17:43] IIRC segfaults occur with other things like OOMs.. [18:17:49] MaxSem, which i support 100% [18:17:51] Reedy: /me blames morebot not logging to server admin log so I couldn't easily see what's been done ;) [18:18:18] Use the saj instead ;) [18:18:30] saj? 
[18:18:33] !log 11:54:15 !log repooling ms-fe1; staggered depool/restart/pool of ms-fe2-4 [18:18:33] MaxSem, #wikimedia-mobile is quieter [18:18:36] !log 13:55:28 !log added webstatscollector-0.1-2 to apt repo [18:18:38] Logged the message, Master [18:18:39] !log 14:39:19 !log restarted glusterfs-server on labstore1 and labstore2, because of read errors in public/datasets/public in bots project [18:18:42] !log 17:14:31 !log moved aside williams:/opt/otrs-home/.spamassassin/auto-whitelist and restarted spamd to test whether the bloated AWL is resulting in poor filtering [18:18:44] Logged the message, Master [18:18:45] !log reedy Started syncing Wikimedia installation... : test2wiki to 1.22wmf1 and rebuild l10n cache [18:18:46] !log 17:27:13 !log reedy synchronized wmf-config/ExtensionMessages-1.22wmf1.php [18:18:50] Logged the message, Master [18:18:55] Ryan_Lane: so, about that better pipe-line for logging messages......... ;) [18:18:56] Logged the message, Master [18:19:01] Logged the message, Master [18:19:05] greg-g: heh [18:19:06] Logged the message, Master [18:19:07] jeremyb_: you know you can just edit the wikipage, right? :) [18:19:09] you relogging for us jeremyb_? [18:19:16] I was discussion that the other day [18:19:16] *discusing [18:19:16] ottomata: yeah [18:19:25] man, my brain to hand skills are rough lately [18:19:43] paravoid: no one's complained about this way in the past. 
maybe it was at a less active time of day though [18:19:46] pebkab [18:19:48] !log 17:55:04 !log reedy synchronized php-1.22wmf1 'Initial sync of php-1.22wmf1' [18:19:51] !log 17:55:45 !log reedy synchronized docroot [18:19:54] !log 17:56:29 !log reedy synchronized w [18:19:54] Logged the message, Master [18:19:56] !log re!logged from earlier today while morebots was gone (all times UTC) [18:20:00] Logged the message, Master [18:20:05] jeremyb_: so, every individual !log is an edit, which is a revision [18:20:05] Logged the message, Master [18:20:10] lolol [18:20:11] Logged the message, Master [18:20:14] Ryan_Lane: Compression! [18:20:18] Reedy: :) [18:20:21] !log Ran scap-recompile to build texvc [18:20:26] Ryan_Lane: right? but that's how it would be anyway [18:20:26] Logged the message, Master [18:20:43] mw10.pmtpa.wmnet: rsync: failed to set times on "/usr/local/apache/common-local/live-1.5": Operation not permitted (1) [18:20:43] mw40.pmtpa.wmnet: rsync: failed to set times on "/usr/local/apache/common-local/live-1.5": Operation not permitted (1) [18:20:45] jeremyb_: not if you edited the page and added all of the entries in one edit :) [18:20:50] :D [18:20:54] Why is live-1.5 still showing up.. [18:20:58] Ryan_Lane: no, but if the bot was alive to begin with... [18:21:06] morebots just does the wiki edit, then? [18:21:06] I am a logbot running on wikitech-static. [18:21:06] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:21:06] To log a message, type !log . [18:21:16] gj morebots [18:21:19] morebots [18:21:19] I am a logbot running on wikitech-static. [18:21:19] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:21:19] To log a message, type !log . [18:21:23] hah [18:21:30] moarbots? 
[18:21:36] no no, moarboats [18:21:49] lessbots [18:22:08] catbots [18:22:17] grepbots [18:22:27] that could be actually useful :D [18:22:28] deadjokebots [18:24:13] we really need to change how the bot works [18:24:29] and dump all the old logs to a static file, then purge all the revisions [18:25:08] oh right, wiki migration problem [18:25:15] and replication [18:25:20] yeah [18:25:52] integrate into status.wikimedia.org (or similar) (that static log file)? [18:26:16] nah, just link to it [18:26:24] the static page would be the old logs [18:26:42] ideally we'd import all the individual log messages too, but that's more difficult [18:27:02] it would be better to have the log messages in a database table [18:27:58] an idea was to use EventLogging [18:30:50] Ryan_Lane, RobH, would you have a few minutes to help me with RT-Ticket: wikimedia #4854 ? [18:32:00] (new SSH key) [18:33:09] jgonera: hint: they're in the middle of a meeting now and so they can't do the phone/hangout with you while also in the meeting [18:33:15] :-) [18:33:28] New patchset: Cmjohnson; "fixing spacing error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56945 [18:33:53] oh, ok, since I saw them talking on IRC I thought they weren't busy [18:34:00] do you know when this meeting ends? [18:34:09] no [18:34:46] if i had to guess i'd say it's a meeting that generally just ends naturally depending on how big the agenda is, etc. [18:35:05] uh... [18:36:21] jgonera, it officially ends in 1h [18:36:25] but could end before that [18:36:32] thanks [18:41:15] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56945 [18:48:54] !log testing logbot [18:49:01] Logged the message, Mistress of the network gear. [18:50:11] * jeremyb_ thought it was well tested! 
[18:50:56] Not sure if someone brought this up already, but per request dropping https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Replag as an FYI [18:51:00] binasher ^ [18:51:50] StevenW: yep, read it a couple days ago [18:51:58] k [18:52:08] https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= [18:52:12] There isn't any replag currently :p [18:52:51] Reedy, I bet it's TS replag;) [18:53:07] binasher: Just stop them editing. That'll fix it [18:54:34] binasher: FYI, there seems to be an increased number of apache segfaults atm [18:55:50] !log reedy Finished syncing Wikimedia installation... : test2wiki to 1.22wmf1 and rebuild l10n cache [18:55:56] Logged the message, Master [18:57:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:58:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [18:59:31] most of the segfaults are coming from just mw1103 [18:59:48] binasher: out of date code? [19:00:16] I see numerous apaches.. [19:00:25] Reedy: where are you looking? [19:00:28] Apr 1 18:45:03 10.64.0.72 apache2[15745]: [notice] child pid 13612 exit signal Segmentation fault (11) [19:00:28] Apr 1 18:45:04 10.64.16.171 apache2[27367]: [notice] child pid 2676 exit signal Segmentation fault (11) [19:00:28] Apr 1 18:45:04 10.64.16.68 apache2[28216]: [notice] child pid 16887 exit signal Segmentation fault (11) [19:00:28] Apr 1 18:45:04 10.64.16.74 apache2[11437]: [notice] child pid 13179 exit signal Segmentation fault (11) [19:00:28] Apr 1 18:58:52 10.64.0.61 apache2[10530]: [notice] child pid 26372 exit signal Segmentation fault (11) [19:00:40] tail -n 1000 /home/wikipedia/syslog/apache.log | grep Segmentation [19:00:46] ah. 
segfaults, not 500s [19:00:52] asher@fenari:/h/w/log/syslog$ grep "Segmentation fault" apache.log | awk '{print $4}' | sort | uniq -c | sort -rn | head [19:00:52] 21785 10.64.16.83 [19:00:54] 104 10.64.0.49 [19:00:54] 99 10.64.16.173 [19:00:55] Reedy: ^^ [19:01:06] Haha [19:01:22] * Reedy points at the woods [19:01:59] i stopped apache there and am running sync-common, then will start [19:03:12] PROBLEM - Apache HTTP on mw1103 is CRITICAL: Connection refused [19:03:40] !log reedy synchronized php-1.22wmf1/includes/ [19:03:45] Logged the message, Master [19:04:14] Could someone run dsh -F25 -cM -g mediawiki-installation -o -oSetupTimeout=10 'chown mwdeploy:mwdeploy /usr/local/apache/common-local/live-1.5' for me please? [19:04:14] seems the live-1.5 folder is owned by root:root, causing noise during scap etc. Thanks [19:06:22] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [19:08:02] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 19322 MB (2% inode=99%): [19:08:35] <^demon> Reedy: I thought live-1.5 was removed in favor of w/? 
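The grep/awk/uniq pipeline binasher ran above, to find which apache host was producing the segfaults, can be sketched in Python as well. The sample lines below are hypothetical (same syslog format as quoted above); the host IP is the 4th whitespace-separated field, matching `awk '{print $4}'`.

```python
from collections import Counter

# Hypothetical sample lines in the syslog format quoted above.
log_lines = [
    "Apr 1 18:45:03 10.64.16.83 apache2[15745]: [notice] child pid 13612 exit signal Segmentation fault (11)",
    "Apr 1 18:45:04 10.64.16.83 apache2[27367]: [notice] child pid 2676 exit signal Segmentation fault (11)",
    "Apr 1 18:45:04 10.64.0.49 apache2[28216]: [notice] child pid 16887 exit signal Segmentation fault (11)",
]

# Equivalent of: grep "Segmentation fault" | awk '{print $4}' | sort | uniq -c | sort -rn
counts = Counter(
    line.split()[3] for line in log_lines if "Segmentation fault" in line
)
for host, n in counts.most_common():
    print(n, host)
```

With real data this makes an outlier like mw1103 (21785 faults vs. ~100 elsewhere) obvious at the top of the output.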
[19:08:35] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [19:08:40] ^demon: it is/was [19:08:51] I think it's just been left around as a symlink to prevent breaking anything [19:08:58] ^demon, to remove it, you have to unprotect it:) [19:09:12] RECOVERY - Apache HTTP on mw1103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time [19:10:51] !log reedy synchronized php-1.22wmf1/extensions/ 'Wikidata extensions to same as 1.21wmf12' [19:10:57] Logged the message, Master [19:12:57] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki and testwiki to 1.22wmf1 [19:13:03] Logged the message, Master [19:14:04] New patchset: Reedy; "1.22wmf1 deploy" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56952 [19:15:10] New patchset: Reedy; "1.22wmf1 deploy" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56952 [19:15:55] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki [19:16:01] Logged the message, Master [19:16:36] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56952 [19:16:58] Heya, Jeff_Green! [19:17:02] poke on locke -> gadolinium! [19:20:42] well now we're fundraising again dammit [19:27:47] haha [19:27:55] ori-l: can you take a look rt 4844? [19:28:56] * jeremyb_ is pretty confident... [19:30:34] aren't we always fundraising!~ [19:30:36] ? [19:30:36] hehe [19:30:56] Jeff_Green: can you puppetize your stuff on gadolinium, while still leaving locke in place doing its thang? [19:30:58] OR [19:31:01] we could give locke over to FR [19:31:11] and you can use it as your dedicated FR udp2log server :p [19:31:15] imo we should deprecate locke, so I hate that idea [19:31:19] hehe [19:31:49] puppetize--there's not that much to puppetize, once I have all-clear to be disruptive it shouldn't take very long. 
it's just a script, a mount, and a cron job [19:37:35] ottomata, is this meeting over? [19:39:10] jgonera: you trust the current machine completely? [19:39:26] i don't understand the generation of a new key now [19:39:34] comments are easy to change [19:39:48] I do, it's my old desktop, now used by my mother, I'll wipe everything before flying back to SF [19:39:59] I didn't know if they were tied to the key itself or not [19:40:06] so I just quickly created a new one [19:40:13] they are not [19:40:21] it's not because I suspect that someone stole my key or something ;) [19:40:26] it's just so you know which is wish [19:40:30] which is which* [19:40:32] ok, I'll know for the next time [19:41:09] jeremyb_, are you able to add it? [19:41:27] jgonera: i'm not able to do much of anything [19:41:44] is everyone else still in this meeting? [19:41:44] i suppose i could verify that it's the same as the one in wikitech [19:42:01] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [19:42:01] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [19:42:02] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [19:42:04] it is the same one [19:42:07] and gerrit [19:42:25] but the important things are 1) verifying by phone or hangout or teleportation and 2) deploying it [19:42:32] i can't do either of those things. AFAIK [19:43:03] where are the people who can? :( [19:43:28] possibly getting lunch [19:43:33] idk [19:43:40] i'll be back in a few mins [19:44:51] I'm leaving now [19:44:59] it's kind of late where I am now... [19:45:10] well, I guess it'll have to wait till tomorrow [19:46:12] jgonera: you're CEST, no? [19:46:37] is CEST UTC+1? 
[19:46:55] those named time zones always confuse me [19:47:39] no [19:47:42] you're UTC+2 [19:47:48] cest is +2 [19:47:53] cet is +1 [19:48:10] uh, maybe it's the daylight savings that changes something [19:48:15] it's 9 hours later than SF [19:48:38] and 2 days ago it was 8 hours? [19:48:45] yes [19:48:50] right [19:49:06] RobH, are you there? [19:54:21] !log LocalisationUpdate completed (1.21wmf12) at Mon Apr 1 19:54:21 UTC 2013 [19:54:27] Logged the message, Master [19:59:44] !log LocalisationUpdate completed (1.21wmf12) at Mon Apr 1 19:59:44 UTC 2013 [19:59:50] Logged the message, Master [20:01:27] ? [20:02:10] <^demon> Yes? [20:03:13] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:03:28] * Reedy kicks icinga-wm [20:03:56] ^demon: why did LU change time? [20:04:10] Run manually [20:04:16] <^demon> That ^ [20:04:23] ah ok :) [20:04:41] <^demon> Trying to force a message to show up that's not. [20:05:13] RobH: OK, so, RT duty… That just means that I watch the RT front page and triage new tickets? Anything else? [20:05:23] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [20:05:44] <^demon> andrewbogott: And when someone says "Hey, can someone look at rt-foobar," you get to do that too :) [20:05:53] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 18510 MB (2% inode=99%): [20:06:08] Or other simple requests in IRC [20:06:46] <^demon> Such as "Please fetch sandwiches for everyone." :) [20:06:49] <^demon> See, nice and simple. [20:06:54] andrewbogott: I'm still waiting for someone to run this as root for me. Thanks! dsh -F25 -cM -g mediawiki-installation -o -oSetupTimeout=10 'chown mwdeploy:mwdeploy /usr/local/apache/common-local/live-1.5' [20:07:02] andrewbogott: ham and swiss, plz [20:07:34] andrewbogott: i think it's also going through queues and cleaning up old stuff [20:07:38] Reedy, is there a ticket for that? :) [20:07:43] andrewbogott: there's a manual... 
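The offset confusion sorted out in the exchange above (CET is UTC+1, CEST is UTC+2, and the SF gap jumped from 8 to 9 hours because the EU switched to daylight saving on 31 March 2013 while the US had switched on 10 March) can be double-checked with `zoneinfo`. Europe/Paris here is just an illustrative CEST zone, not a claim about anyone's actual location.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, needs system tz data

# 1 April 2013, evening: Europe is already on CEST (UTC+2).
local = datetime(2013, 4, 1, 21, 45, tzinfo=ZoneInfo("Europe/Paris"))
sf = local.astimezone(ZoneInfo("America/Los_Angeles"))

print(local.utcoffset())     # CEST offset from UTC: +2 hours
print(local.hour - sf.hour)  # hours ahead of San Francisco (PDT)
```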
[20:07:48] New patchset: RobH; "Fixing Juliusz's ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56957 [20:07:54] New patchset: Ori.livneh; "(RT ????) Prune accounts from vanadium's manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56958 [20:07:59] jeremyb_: really? [20:08:03] Nah. For this sort of thing, it's usually easier to just ask someone to do it ;) [20:08:13] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:08:02 UTC 2013 [20:08:13] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:08:16] andrewbogott: yes [20:08:22] link? [20:08:27] i'm looking! [20:09:01] New patchset: Ori.livneh; "(RT 4863) Prune accounts from vanadium's manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56958 [20:09:19] * andrewbogott already doesn't know what to do about the very first ticket [20:09:23] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:09:17 UTC 2013 [20:09:45] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56957 [20:09:50] andrewbogott: which? [20:10:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:10:23] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:10:21 UTC 2013 [20:10:34] vanadium i guess [20:10:54] hrm? [20:11:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:11:23] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:11:20 UTC 2013 [20:11:29] which: https://rt.wikimedia.org/Ticket/Display.html?id=4862 [20:11:43] Oh, I suppose that isn't a public link. [20:12:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:12:14] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:12:11 UTC 2013 [20:12:16] andrewbogott: so, that one is for DBA. 
i guess either notpeter or binasher probably [20:12:26] but could be almost anyway [20:12:29] anyone* [20:12:42] andrewbogott: https://wikitech.wikimedia.org/wiki/Manual_for_ops_on_duty [20:12:52] jeremyb: Thanks. [20:13:03] * andrewbogott should've waited until after lunch to think about this [20:13:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:13:33] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:13:28 UTC 2013 [20:14:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:14:20] ori-l: did you see my ping earlier? [20:14:43] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Mon Apr 1 20:14:34 UTC 2013 [20:15:03] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:15:23] 01 19:27:55 < jeremyb_> ori-l: can you take a look rt 4844? [20:16:59] jeremyb_: oh, i missed it. looking now. [20:17:17] ori-l: danke. you can just comment there i guess [20:17:21] back in a bit [20:17:54] * jeremyb_ is pretty sure he's right though [20:18:22] jeremyb_: mm? i'm not sure what i'm supposed to say -- how am i relevant? [20:18:32] oh, the hebrew stuff [20:18:36] yes :) [20:18:54] ori-l, have you already gotten feedback from all the people you're disusering on vanadium? Is that patch ready to be merged? [20:19:08] the only reason i even investigated to begin with was because a local user on uk complained. and you can cover the he part :) [20:20:09] andrewbogott: yes. drdee objected on philosophical grounds, saying the question of access revocation in general needs to be discussed and examined, but he did not have a use-case and so surrendered his objection [20:20:28] ok. Merge coming up... 
[20:20:29] !log LocalisationUpdate completed (1.22wmf1) at Mon Apr 1 20:20:29 UTC 2013 [20:20:35] Logged the message, Master [20:21:29] !log demon synchronized php-1.21wmf12/cache/l10n 'Manually syncing 1.21wmf12 l10ncache' [20:21:29] New review: Andrew Bogott; "lgtm" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/56958 [20:21:31] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56958 [20:21:34] Logged the message, Master [20:21:43] PROBLEM - Apache HTTP on mw1068 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:03] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:33] RECOVERY - Apache HTTP on mw1068 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [20:22:43] andrewbogott: thanks very much, that was fast [20:22:53] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [20:24:23] jeremyb_: yes, confirmed [20:25:17] ori-l: are you commenting or should I? [20:26:00] jeremyb_: i commented [20:26:23] i see. toda [20:26:53] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [20:30:34] andrewbogott: do you know what the policy is with respect to home directories? i'd like to at least prune the ones that don't contain anything other than the skeleton .profile & co files, but i'm not sure if it's permissible to peek in a user's directory. [20:30:59] hah [20:31:04] I don't know what the policy is (or if there is a policy :( ) [20:31:07] i'll just leave them be, i guess [20:31:26] You could certainly prune the homedirs of users you're in contact with, right? [20:31:40] i think you can look in anyone's homedir, no? 
if they are doing stuff on WMF servers [20:31:45] if you want to prune [20:31:52] i'd just tarball what's there, back it up somewhere just in case [20:31:55] and remove what you want [20:32:07] That'd be my guess as well, but it's certainly polite to ask [20:33:53] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [20:34:16] LeslieCarr: your favorite alert! [20:35:23] bblack: you may want to invest in a cloak [20:35:29] working on it :) [20:35:38] http://freenode.net/faq.shtml#nicksetup [20:36:08] and then... where is that link [20:36:17] I have the basic nickserv stuff going [20:36:36] someone changed the /topic and it's not there anymore... :( [20:36:43] i think i have the short url memorized [20:37:12] it's apparently https://bit.ly/cloakrequest [20:40:27] * jeremyb_ has fixed the #wikimedia-ops /topic to have cloak requests again :) [20:40:55] is it so frequent to nned one?:P [20:41:42] jeremyb_: doh [20:42:37] MaxSem: err, i think so [20:42:46] maybe i'm wrong [20:44:06] !log restarting lvs1001 [20:44:14] Logged the message, Mistress of the network gear. [20:44:16] !log CORRECTION restarting sshd on lvs1001 [20:44:21] Logged the message, Mistress of the network gear. [20:47:25] ok, i am doing a lvs failover from lvs1001 to lvs1002 [20:48:12] correction lvs1004 [20:48:36] !log failing over lvs from lvs1001 to lvs1004 [20:48:42] Logged the message, Mistress of the network gear. [20:51:00] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:52:10] PROBLEM - SSH on lvs1004 is CRITICAL: Server answer: [20:53:10] RECOVERY - SSH on lvs1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:55:45] !log dist-upgrading lvs1001 [20:55:51] Logged the message, Mistress of the network gear. 
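Ryan_Lane's tarball-then-remove suggestion above can be sketched like this. For safety the sketch runs on a scratch directory rather than a real `/home/<user>`, and deletes nothing until the archive has been written and verified.

```python
import os
import shutil
import tarfile
import tempfile

# Scratch directory standing in for a stale home directory.
home = tempfile.mkdtemp(prefix="olduser-")
with open(os.path.join(home, ".profile"), "w") as f:
    f.write("# skeleton file\n")

# Archive first: back the whole directory up to a compressed tarball.
backup = home + ".tar.gz"
with tarfile.open(backup, "w:gz") as tar:
    tar.add(home, arcname=os.path.basename(home))

# Sanity-check the archive contents before removing anything.
with tarfile.open(backup) as tar:
    members = tar.getnames()
assert any(name.endswith(".profile") for name in members)

shutil.rmtree(home)
```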
[21:03:44] !log demon synchronized php-1.22wmf1/cache/l10n 'Manually syncing 1.22wmf1 l10ncache' [21:03:51] Logged the message, Master [21:05:34] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [21:07:14] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 18806 MB (2% inode=99%): [21:07:44] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [21:09:13] !log rebooting lvs1001 [21:09:20] Logged the message, Mistress of the network gear. [21:09:52] on the good side, the lvs switch was seamless (as it should be but you never know...) [21:10:44] PROBLEM - Host lvs1001 is DOWN: CRITICAL - Host Unreachable (208.80.154.55) [21:12:37] cmjohnson1: in the dc ? [21:13:01] Reedy, did you manage to create that wiki? [21:13:14] not yet lesliecarr: i have a repairman at the house atm...going to be at least another hour [21:13:21] okay [21:13:25] Thehelpfulone: No [21:13:28] I didn't even try [21:13:43] mutante-away wanted to do it [21:17:31] the yet-another-private-wiki-with-overlapping-purposes-that-really-should-be-on-internal? [21:18:06] Nemo_bis, it's for the transition team, so not really [21:18:38] * Nemo_bis reads "yes" [21:20:34] RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms [21:24:13] lol [21:24:18] A wiki still seems extreme [21:26:06] <^demon> Creating wikis is serious business. [21:27:59] What's ironic is that one of its objectives is knowledge transfer and they start by fractioning information on yet another location. :) [21:28:17] I'm wondering what they'll do it on it that they can't do on officewiki - candidate discussion? [21:28:33] Nemo_bis, isn't internal being killed? [21:29:16] Nemo_bis: done. 
http://yet-another-private-wiki-with-overlapping-purposes-that-really-should-be-on-internal.wikimedia.org/ [21:29:34] Thehelpfulone: I doubt that "kill internal" is the 11th commandment [21:30:02] hashar: "malformed URL" [21:30:06] :) [21:30:16] hashar: Yay, no double subdomains that need extra SSL certs [21:39:29] heya, notpeter, have you any idea how to make gmetad (or whatever) take update after a .pyconf change? [21:39:43] this: [21:39:43] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=oxygen.wikimedia.org&v=1997694&m=RcvbufErrors&r=hour&z=default&jr=&js=&st=1364847743&vl=packets&ti=UDP%20Receive%20Buffer%20Errors&z=large [21:40:55] wait, hmm, before you answer... [21:42:10] ori-l, i think I still have the udp_stats as 'both', which is wrong [21:42:13] no wonder it hasn't changed [21:42:22] ahhh no no [21:42:24] hm [21:42:32] no its positive [21:42:47] just hadn't updated my local recently, ok so [21:42:48] yeah [21:42:54] notpeter, now if you know you can answer :p [21:44:42] wait [21:44:46] what are you trying to do? [21:44:50] notpeter: so any thoughts on rt 4844? [21:45:08] Change abandoned: Hashar; "solved differently in labs by creating a jenkins-deploy user. This change is no more needed (the oth..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53736 [21:45:09] so, ori-l and I had defined a new udp metric, files/ganglia/udp_stats.{py,pyconf} [21:45:14] the only reason i even investigated to begin with was a local complained [21:45:16] we set it up with a slope of 'both' [21:45:20] but then realized that was wrong [21:45:27] since each of the metrics there is an increasing counter [21:45:34] so, we changed the slope to 'positive' [21:45:41] but, the metrics in ganglia have not changed [21:45:56] heh, otto [21:46:28] I'd say restart the ganglia daemon on each of the nodes that's sending data [21:46:31] did you do so already?
yes [21:46:43] have also one on the aggregators for misc eqiad [21:46:45] done* [21:47:16] hhhhmmmm, not sure, tbh [21:47:41] New review: Hashar; "That is still a bit messy. I need another level of iterator, handling the dist by parsing the title ..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56382 [21:47:43] maybe I could remove rrd files? [21:48:22] perhaps? [21:48:29] but that sounds like it has potential to fuck up a lot [21:49:31] yup [21:49:33] hm [21:51:36] ok, i just restarted ganglia-monitor on all machines that I know of that would touch this, except for nickel [21:55:34] kraigparkinson: grooming session is filling up nicely: https://mingle.corp.wikimedia.org/projects/analytics/cards?favorite_id=758&view=%3EWIP+-+Feature+Analysis [21:55:47] some cards are still in analysis [21:56:19] wow, that's a lot of change since I looked at it 8 minutes ago! :p [22:04:24] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [22:06:04] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 18101 MB (2% inode=99%): [22:06:34] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [22:14:16] notpeter, is this something you could take on? https://rt.wikimedia.org/Ticket/Display.html?id=4862 [22:16:23] Reedy: ChangeNotificationJobs are OOMing [22:16:26] is that already filed? [22:16:28] orly [22:16:35] I don't think so [22:21:15] Reedy: I'm not seeing a bug for 'User::addToDatabase: hit a key conflict attempting to insert a user row, but then it doesn't exist when we select it!' [22:22:45] https://bugzilla.wikimedia.org/show_bug.cgi?id=41609 ?
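For reference on the slope problem discussed above: in a ganglia python module, the slope lives in the metric descriptors returned by `metric_init()`, not in the .pyconf. The sketch below is a minimal hypothetical descriptor, not the actual files/ganglia/udp_stats.py; per the earlier comment in this log, the aggregator keeps the old definition until it is restarted (or the metric is renamed).

```python
def metric_init(params):
    # Minimal sketch of a ganglia python-module metric descriptor.
    # For an ever-increasing kernel counter such as RcvbufErrors,
    # slope should be 'positive'; 'both' is for gauges that can
    # move up or down.
    descriptors = [{
        'name': 'RcvbufErrors',        # metric name as shown in ganglia
        'call_back': lambda name: 0,   # stub; a real module would read /proc counters
        'time_max': 60,
        'value_type': 'uint',
        'units': 'packets',
        'slope': 'positive',           # was 'both'
        'format': '%u',
        'description': 'UDP receive buffer errors',
        'groups': 'udp',
    }]
    return descriptors
```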
[22:25:48] !log root synchronized docroot [22:25:51] !log recreating transitionteam wiki docroot from skel-1.5 and syncing docroot [22:25:52] Logged the message, Master [22:25:58] Logged the message, Master [22:25:59] !log aaron synchronized php-1.22wmf1/includes/job 'deployed c7832c8956d79892be93452e5f4d6b52df1d3bf0' [22:26:04] Logged the message, Master [22:26:24] !log aaron synchronized php-1.22wmf1/maintenance 'deployed c7832c8956d79892be93452e5f4d6b52df1d3bf0' [22:26:29] Logged the message, Master [22:26:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:27:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [22:29:41] binasher, can you review https://gerrit.wikimedia.org/r/#/c/56502/ please? [22:30:29] binasher: did you get a chance to start https://gerrit.wikimedia.org/r/#/c/35139/ ? [22:31:17] AaronSchulz: not yet, will still do today [22:31:17] MaxSem: sure [22:43:31] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56502 [22:48:40] binasher, thanks [22:49:58] New patchset: Mattflaschen; "Make SSH banner more sympathetic and rhythmic." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/57004 [22:51:42] !log aaron synchronized php-1.22wmf1/includes/job 'deployed ec8f42daf19f386d9af2eece6dfc5f8fa6fc42f3' [22:51:48] Logged the message, Master [22:57:30] !log beginning image.img_media_mime schema migrations, starting with s1 [22:57:36] Logged the message, Master [23:05:18] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [23:06:13] !log creating upload dir on upload7 for transitionteamwiki [23:06:21] Logged the message, Master [23:06:58] PROBLEM - Disk space on db11 is CRITICAL: DISK CRITICAL - free space: /a 17412 MB (2% inode=99%): [23:07:28] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [23:07:32] TimStarling: so is https://gerrit.wikimedia.org/r/#/c/55777/ ready to go? [23:09:26] yes, assuming the relevant core/MWSearch patches are live now [23:09:46] mutante: Don't think you should have needed to do that.. Maybe https://wikitech.wikimedia.org/wiki/Add_a_wiki#Swift [23:09:57] I thought it would make sense to deploy it separately, which implies waiting for the MWSearch stuff to go live [23:12:27] New patchset: Pyoungmeister; "correct usage of $title var for multi-instance mysql" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57006 [23:13:18] New patchset: Dzahn; "add transitionteamwiki settings (RT-4850)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57007 [23:14:02] !log swapping disk1 (slot1) db1001 [23:14:04] Reedy: ok, it told me to at least use those 4: Update with wgUploadDirectory, wgSitename, wgMetaNamespace and wgServer, wgCanonicalServer.
[23:14:08] Logged the message, Master [23:14:21] Reedy: and it looked to me like if i don't do it, the UploadDir will _not_ be below ./private [23:14:27] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57006 [23:14:34] Maybe [23:14:35] * Reedy shrugs [23:14:39] Not a big deal [23:14:46] https://gerrit.wikimedia.org/r/#/c/57007/1/wmf-config/InitialiseSettings.php [23:14:54] looks reasonable? [23:15:00] notpeter: let's switch search from eqiad to tampa ? [23:15:18] LeslieCarr: ok [23:16:38] any particular ports for the sfps? [23:17:03] first available - which should be 0/0/16 if the config is accurate ? [23:17:31] New patchset: Dzahn; "add transitionteamwiki settings (RT-4850)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57007 [23:17:54] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57007 [23:18:20] and let's move search1024 , which should be in 0/23 on a5 [23:18:50] first available is 18...16/17 are mc1017/18 [23:19:48] good to know [23:19:56] i will update the descriptions and vlans [23:20:36] lesliecarr notpeter...okay to move search1024 now? 
[23:20:44] cmjohnson1: no [23:20:53] we need to fail over traffic first [23:20:58] !log dzahn synchronized ./wmf-config/InitialiseSettings.php [23:21:04] Logged the message, Master [23:21:07] okay...lmk [23:21:15] mutante, I think on private wikis enabling blockdisableslogin is also a good idea [23:21:57] not sure what the guidelines are for using 'wmgUseDisableAccount' => array( [23:21:57] vs 'wgBlockDisablesLogin' => array( [23:21:58] though [23:23:50] It's usually wg if it needs no transforming [23:24:12] RoanKattouw_away: ping [23:24:15] if it's $wmg, it needs to be set after the extensions have been loaded so they don't get overridden, or WMF related configuration [23:24:46] RoanKattouw_away: http://www.meetup.com/Geeklist-San-Francisco-Meetup-Series/events/110490112/ [23:25:09] wg seems to be most popular, let's go with that [23:25:26] !log running addWiki.php for transitionteamwiki - Access denied for user ... [23:25:32] Logged the message, Master [23:26:06] doesn't work for me [23:26:15] access denied for user what? [23:26:33] Error: 1044 Access denied for user 'wikiadmin'@'208.80.152.%' to database 'transitionteam' (10.64.16.153) [23:27:03] oh wait [23:27:12] wrong db name:) [23:27:39] job 76 at Mon Apr 1 23:42:00 2013 [23:28:25] ok, next step says to check if all *.dblist files now contain it.. none of them do [23:28:39] was that what you mentioned about dblist issues? [23:28:46] yeah, you should likely have seen a spam of errors [23:28:47] yeah [23:29:16] all, private, s3 should be enough [23:29:30] yes, i got those errors. ok thanks [23:29:33] and also wikiversions for the same reason [23:30:26] W12: Warning: File "s3.dblist" has changed and the buffer was changed in Vim [23:30:29] edit war?
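[editor's note] For readers following the wg-vs-wmg thread above: a plain wg* setting goes straight into the per-wiki arrays of wmf-config/InitialiseSettings.php. A minimal sketch of what the wgBlockDisablesLogin setting under discussion could look like there; the exact keys and values are illustrative assumptions, not the content of the actual change:

```php
// Illustrative fragment only, mirroring the per-wiki array pattern
// quoted in the chat ('wgBlockDisablesLogin' => array( ... )).
'wgBlockDisablesLogin' => array(
	'default' => false,
	// A dblist tag key like 'private' applies the value to every
	// wiki listed in private.dblist (assumed here for illustration).
	'private' => true,
),
```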
[23:30:55] listing it in all.dblist then using ./refresh-dblist should work [23:31:39] ok, doing that [23:31:46] New patchset: Pyoungmeister; "lucene-production.php: moving search traffic to pmtpa temporarily" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57010 [23:31:56] 870 wikis listed in all.dblist... [23:31:57] 51 special wikis to be exempted from wikipedia group... [23:32:22] it added it to wikipedia.dblist [23:32:25] it's not a wikipedia though [23:32:26] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57010 [23:33:33] !log adding transitionteamwiki to private.dblist [23:33:39] Logged the message, Master [23:33:42] sync-dblist [23:33:57] Guess it needs to be in special.dblist too then [23:34:37] New patchset: Pyoungmeister; "correction of mis-comment-out" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57011 [23:35:03] adding to special.dblist, syncing [23:35:06] mutante, this is a good exercise for doc updating too ;) [23:35:29] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57011 [23:36:25] !log py synchronized wmf-config/lucene-production.php 'moving all search traffic to pmtpa temporarily' [23:36:31] Logged the message, Master [23:36:44] !log dumpInterwiki.php to update interwiki cache [23:36:49] Logged the message, Master [23:38:07] Thehelpfulone: docs WFM ;) [23:38:19] they still reference SVN Reedy :p [23:38:20] cmjohnson1: still getting it switched.. [23:38:31] okay [23:38:39] Only on 2 lines [23:38:50] syncing interwiki.cdb [23:38:55] !log dzahn synchronized php/cache/interwiki.cdb 'Updating interwiki cache' [23:39:00] Logged the message, Master [23:40:14] !log creating swift containers for private wiki [23:40:19] Logged the message, Master [23:41:16] mutante: are you using mwscript? 
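[editor's note] The bookkeeping above boils down to: the new wiki name must appear in each relevant dblist (all, private, s3, and per Reedy also special) before the sync scripts will accept it. A local sketch of that membership check using temp files; the loop is an illustrative stand-in for refresh-dblist/sync-dblist, not the real deployment tooling:

```shell
#!/bin/sh
# Sketch only: demonstrates the dblist membership checks discussed
# above with temp files, not the real production dblists.
set -e
dir=$(mktemp -d)
wiki="transitionteamwiki"

# The lists the new wiki has to land in, per the conversation.
for list in all private s3 special; do
    echo "$wiki" >> "$dir/$list.dblist"
done

# Verify membership the same way one would grep the real lists.
for list in all private s3 special; do
    grep -qx "$wiki" "$dir/$list.dblist" && echo "$list.dblist: ok"
done

rm -rf "$dir"
```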
[23:41:22] New patchset: Pyoungmeister; "using correct metavariable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57013 [23:41:29] mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=aawiki --backend=local-multiwrite --private [23:41:34] binasher: db1028, okay to hot swap replacement drive? wanna pull from production first? [23:41:36] AaronSchulz: yes [23:41:53] why aawiki? [23:42:10] https://wikitech.wikimedia.org/wiki/Add_a_wiki#Swift [23:42:10] AaronSchulz, apparently it's needed: [23:42:11] TEMP You need to put in --wiki=aawiki after the addWiki.php and before the langcode for now. Script is wonky --RobH 19:20, 29 September 2009 (UTC) This means literally "aawiki", not the name of the wiki you create!! [23:42:31] mutante: or did you just c/p that for irc? [23:42:31] This means literally "aawiki", not the name of the wiki y [23:42:36] i did what i pasted [23:42:40] that was in 2009, has that not been fixed yet? [23:42:52] mutante: what is the wiki? [23:43:01] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57013 [23:43:08] AaronSchulz: transitionteam [23:43:43] yeah, so you need that for the --wiki param [23:44:32] Thehelpfulone: There's not really anything to fix [23:44:42] it uses the current "config" of aawiki to bootstrap it [23:44:58] I think mwscript handles it now with addWiki in extensions/WikimediaMaintenance [23:45:25] cmjohnson1: can you move search1024 over ? [23:45:55] AaronSchulz: if i do that: Fatal error: /usr/local/apache/common-local/wikiversions.cdb has no version entry for `transitionteamwiki`. [23:46:16] Did you add it to wikiversions.dat? 
[23:46:19] sync-wikiversions [23:46:50] oh, yea, hold on [23:47:22] lesliecarr: okay [23:48:01] !log dzahn rebuilt wikiversions.cdb and synchronized wikiversions files: [23:48:01] Logged the message, Master [23:48:32] Database name transitionteamwiki is not listed in dblist [23:48:37] sigh, we just confirmed that [23:48:57] it's gone again from dblists [23:49:05] New patchset: Dr0ptp4kt; "Unified default lang redirect from m. & zero. Adding three carriers for testing, too." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302 [23:49:14] lesliecarr: chk it [23:50:17] PROBLEM - Host search1024 is DOWN: PING CRITICAL - Packet loss = 100% [23:50:23] brion, would you please review https://gerrit.wikimedia.org/r/55302 and if you agree, add your +1? [23:50:58] hrm [23:51:04] sfp not registering [23:51:06] can you reseat it ? [23:52:04] ahha! [23:52:08] stupid 4500 [23:52:10] ge versus xe [23:52:11] :) [23:52:28] heh [23:52:30] oops [23:53:09] yay [23:53:11] it is alive [23:53:14] IT'S ALIIIIVE [23:53:18] :) [23:53:37] RECOVERY - Host search1024 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [23:55:11] !log search1024 is our guinea pig of using sfp-t's and is connected to asw2-a5-eqiad ge 0/0/18 [23:55:12] binasher: any issue with making StatCounter send multiple lines (per stat) in one datagram? [23:55:17] Logged the message, Mistress of the network gear. [23:55:26] thanks cmjohnson1 - RobH and I are all happy now :) [23:55:38] sync-common-all [23:55:51] yw...robh happy though? can't picture it [23:56:05] wait, what are we happy about? [23:56:07] sfp-t? [23:56:14] New patchset: Pyoungmeister; "trying the binasher namespacing model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57014 [23:56:16] \o/ [23:56:18] binasher: ^^ [23:56:22] do you think that will work? [23:57:26] New review: Dr0ptp4kt; "Patchset 11 adds three carriers for testing." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/55302 [23:58:55] Reedy: sync-common-all can take a couple minutes or more i suppose? [23:59:08] without output that is [23:59:18] mutante: It'll take ages [23:59:21] It's essentially scap [23:59:27] ok :p [23:59:33] well, it's running [23:59:37] Shouldn't need to run scap though [23:59:44] and that should have been all [23:59:48] hope it works afterwards [23:59:54] still missing wiki