[00:06:04] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [00:13:52] RECOVERY - udp2log processes on locke is OK: OK: all filters present [00:30:43] gn8 foks [00:34:48] RECOVERY - Squid on brewster is OK: TCP OK - 0.004 second response time on port 8080 [00:53:24] PROBLEM - Puppet freshness on streber is CRITICAL: Puppet has not run in the last 10 hours [01:04:55] I noticed something odd today. There were two articles on enwiki that were showing up in categories on toolserver even though they were not in the categories. I blanked the pages, and the pages *stayed* in the categories, even after the toolserver updated and the page_len had changed to 0. But I deleted one of the pages and restored it and that fixed the problem. Any idea what could cause this? I have some mysql results at en:User:CBM/Sandbox [01:07:10] carl-m: did you try purging them or null-editing them? [01:07:36] PiRSquared: I blanked the entire page - saving that edit should have removed all the categories, no? [01:08:09] but no, I didn't try explicitly purging them, I will do that now and see what happens [01:10:24] does not have any effect, the strange categories remain [01:42:18] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [01:46:12] RECOVERY - udp2log processes on locke is OK: OK: all filters present [01:52:12] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [02:04:23] RECOVERY - udp2log processes on locke is OK: OK: all filters present [02:10:23] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [02:16:14] RECOVERY - udp2log processes on locke is OK: OK: all filters present [02:18:13] !log LocalisationUpdate completed (1.19) at Tue Mar 6 02:18:13 UTC 2012 [02:18:16] Logged the message, Master [02:36:49] !log tstarling synchronizing Wikimedia installation... 
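For carl-m's stale-category problem, the suggested remedy is a forced purge with a link-table refresh. A minimal sketch of that purge call using Python's stdlib against the standard api.php endpoint (the title and the forcelinkupdate parameter are illustrative assumptions, not taken from the log):

```python
from urllib import parse, request

def build_purge_request(api_url, title):
    """Build (but do not send) a POST to MediaWiki's action=purge.

    forcelinkupdate asks MediaWiki to also refresh the links tables,
    which is what stale categorylinks rows need.
    """
    params = parse.urlencode({
        "action": "purge",
        "titles": title,
        "forcelinkupdate": "1",
        "format": "json",
    }).encode("ascii")
    return request.Request(api_url, data=params, method="POST")

# Hypothetical example; the page name mirrors the sandbox mentioned above.
req = build_purge_request("https://en.wikipedia.org/w/api.php",
                          "User:CBM/Sandbox")
```

Actually sending it would be a one-liner (`request.urlopen(req)`); it is left out here since a purge is a write action.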
: updating to r113119 [02:36:53] Logged the message, Master [02:37:06] !log LocalisationUpdate completed (1.18) at Tue Mar 6 02:37:06 UTC 2012 [02:37:09] Logged the message, Master [02:43:05] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [02:53:17] sync done. [03:05:35] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [03:11:44] RECOVERY - udp2log processes on locke is OK: OK: all filters present [03:17:35] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [03:22:31] RECOVERY - udp2log processes on locke is OK: OK: all filters present [03:28:22] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [03:32:16] RECOVERY - udp2log processes on locke is OK: OK: all filters present [03:48:10] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [03:51:09] 05 14:58:32 < yannf> http://commons.wikimedia.org/wiki/Commons:Village_pump#Broken_thumbnails [03:51:56] yannf: maybe related to maplebed's swift work. thumbs recently (in the last 24hrs) were switched to 100% swift. idk what percent they were before [03:53:31] jeremyb: Tuesday: 12.5% (am), 25% (pm) Tuesday: 12.5% (am), 25% (pm) Thursday: 100% (am) [03:53:48] arr..wrong paste.. Wednesday: 50% (am), 75% (pm) [03:53:50] mutante: where are you copying from? [03:54:02] a mail written by maplebed to ops [03:54:19] he added that he updated "deployment calendar and swift deploy wiki pages." [03:55:58] RECOVERY - udp2log processes on locke is OK: OK: all filters present [03:57:40] jeremyb: ooh.. nevermind that. i was off by one month.. 
it wasn't about March 7, it was February 7 but got postponed [03:58:14] http://wikitech.wikimedia.org/view/Swift/Deploy_Plan_-_Thumbnails [03:58:16] http://wikitech.wikimedia.org/index.php?title=Server_admin_log&diff=44031&oldid=44030 [03:58:19] http://wikitech.wikimedia.org/index.php?title=Server_admin_log&diff=44203&oldid=44202 [03:58:40] here's the scoop [03:58:56] yannf: anyway, maybe it was related to swift but it's hard to tell if nothing's currently broken [03:59:14] gotcha [03:59:32] Ben discovered problems with the swift stuff last week, so he pulled it out of production [03:59:53] he just put it back into production around 12:30p or so SF time (whatever the wikitech cal says) [04:00:00] robla: was this just the half thumbs or something else? [04:00:11] "This means it's actually only about 2.5% of the objects that are different from ms5 (aka truncated)" [04:01:40] the undeployment was in response to the corrupted (truncated) thumbs. the redeployment was done because Ben believes he's purged all of the broken images [04:01:54] mutante: which email is that from? [04:02:10] robla: "[Ops] Swift is out of rotation again [04:02:40] ah...I didn't get that email [04:02:53] want it? [04:03:36] oh, wait, I take that back [04:03:52] i don't seem to see any mail with an updated timeline (just the feb 6-9 mail) [04:03:55] yep, you are in the list [04:04:09] i do have robla's mail announcing the rollback [04:04:59] did the initial rollout make it all the way to 100%? [04:05:49] jeremyb: I believe so [04:06:01] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [04:06:29] drdee: ping? [04:06:46] is that locke alert related to diedrik?
[04:07:58] RECOVERY - udp2log processes on locke is OK: OK: all filters present [04:07:58] for reference (swift): http://lists.wikimedia.org/pipermail/wikitech-l/2012-January/057905.html http://lists.wikimedia.org/pipermail/wikitech-l/2012-February/057970.html http://lists.wikimedia.org/pipermail/wikitech-l/2012-February/058138.html [04:13:17] yannf: anyway, do let someone know if it recurs and at least stick around and lurk next time. (also, helps if you specify "go look at X" where X is a timestamp in the channel or a link to something or a short summary of the issue or *something*. not just "are you alive?" ;-P) [04:14:00] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [04:15:09] hrmm, seems udp2log (at least very recently) has a habit of breaking and then fixing itself [04:15:17] and then breaking again [04:15:25] drdee: ^ [04:15:46] huh, he's 6+ days idle [04:20:06] robla: do you know who's working on GLAM stuff with drdee? [04:20:16] * jeremyb has a mail ready to go, wondering if i should add CC [04:22:24] haha, "Virginia's Silicon...thing" [04:23:54] RECOVERY - udp2log processes on locke is OK: OK: all filters present [04:32:26] hehe, THREE chairs and a good phone [04:33:26] well, sent [04:33:48] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [04:39:21] jeremyb: probably Erik Zachte [04:40:01] robla: too late, already sent to drdee and otto [04:40:20] unless you think it's worth sending to him too [04:40:27] i can forward [04:41:37] well, forwarded [04:45:39] RECOVERY - udp2log processes on locke is OK: OK: all filters present [05:16:24] TimStarling: still thinking about a zhwiki depoyment, or planning to hold off? 
[05:17:12] I arranged with liangent to do it tonight, around 9:30pm my time [05:17:27] he'll be around to test it [05:18:02] oh good [05:18:33] I could do srwiki as well [05:18:45] since it'll be morning in europe [05:21:36] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 73, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/0/1: down - Core: cr1-eqiad:xe-5/2/1 (FPL/GBLX, CV71026) [10Gbps wave]BR [05:23:41] yeah, that'd be fine [05:24:00] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [05:28:39] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [05:28:39] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [05:33:45] RECOVERY - udp2log processes on locke is OK: OK: all filters present [05:45:36] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [05:49:30] RECOVERY - udp2log processes on locke is OK: OK: all filters present [05:55:29] anyone else running into problems accessing the API? Status says it is up and I can call queries from a browser but I can't via a script. [05:56:00] https://gist.github.com/1983888 [05:56:08] if anyone has a copy of R and wants to try [05:56:51] PROBLEM - Host cp1017 is DOWN: PING CRITICAL - Packet loss = 100% [05:57:27] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [06:05:46] RECOVERY - udp2log processes on locke is OK: OK: all filters present [06:09:13] Why do people quit so quickly? [06:11:27] Joan: protonk? i wondered too [06:13:43] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [06:15:51] Protonk: What error message are you getting? [06:16:00] Protonk: And has this script ever worked before? 
[06:16:03] I simplified the code a bit [06:16:03] https://gist.github.com/1983888 [06:16:05] yes [06:16:15] it worked on a number of occasions [06:16:25] that's now just an API call copied from the documentation [06:16:45] it's failing to load the resource at all [06:16:57] I can load it via other functions in R [06:17:01] What error message do you get? [06:17:46] RECOVERY - udp2log processes on locke is OK: OK: all filters present [06:18:14] it won't give me actual HTTP error codes :( [06:18:25] failed to load HTTP resource [06:18:25] Error: 1: failed to load HTTP resource [06:18:32] if I use xmlParse() [06:18:57] and a less informative error if I use any of the html parsing functions (probably because they just get passed values from the xml parser [06:19:13] but it is distinct from the user agent error code [06:19:17] What happens if you just try to read the page contents with no parsing? [06:19:22] Just .read() and print. [06:19:26] Or whatever the R equivalent is. [06:19:43] I can read it w/ R's read() equivalent [06:19:53] so there is an immediate workaround, of course [06:20:02] but this is odd because it worked a few days ago [06:20:27] and I can use the same XML package to read other pages, including wiki pages [06:20:32] just nothing from the API [06:22:26] Protonk: obviously the time for tcpdump is now [06:23:23] Protonk: (and it should be equally obvious that you shouldn't leave so fast next time, wait for an answer and if you must reask) [06:23:37] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [06:24:12] I didn't leave on purpose. Dog was barking at things. [06:24:15] :) [06:24:18] hah [06:24:41] Protonk: use tcpdump with '-s 0 -w filename' to make tcpdump write to a file [06:25:01] Protonk: stop leaving...? 
[06:25:12] hah [06:25:28] 06 06:24:41 < jeremyb> Protonk: use tcpdump with '-s 0 -w filename' to make tcpdump write to a file [06:25:34] RECOVERY - udp2log processes on locke is OK: OK: all filters present [06:26:05] rgr [06:30:08] ah the fun of figuring out how tcpdump likes to work on a mac [06:30:11] may take a sec [06:35:26] ok [06:35:26] http://pastebin.com/ZY339ksk [06:38:55] and now we transition from stuff I can understand to stuff I have no clue about! [06:38:56] :) [06:39:10] that trace is a bit too verbose :-D [06:40:40] tip: 8.8.8.8 is a Google DNS server. It is a good way to let Google know about all your activity on the internet -- (see: [[big brother]]) :-) [06:40:45] WOOT [06:41:19] yeah, well I've been a gmail user since 2005 so that ship has sailed, so to speak [06:42:06] abandon all hope 8-) [06:42:52] pretty much [06:43:56] Protonk: what did you run to get that? [06:43:58] it may be the XML package (though I imagine someone before me would have discovered this) [06:44:15] tcpdump -i en0 -s 0 -w [06:45:17] so, make that tcpdump -i en0 -s 0 -w tcpdump-en0-0.pcap [06:45:18] ? [06:45:19] but it isn't my computer. I've had someone else run that same gist in their copy of R and get the same result [06:45:37] either way we need the dump ;) [06:45:46] jeremy: yes. but then just piped that to a file and put it on pastebin [06:46:04] err, piped it to a text file, rather [06:46:16] Protonk: piped what? [06:46:31] also, you should try other UA strigns [06:46:33] strings* [06:46:45] like the known string that works from whatever other thing [06:47:39] also, R looks weird [06:47:42] haha [06:47:48] yes it is [06:48:06] Protonk: and passing -A to tcpdump make it decode the packets using ASCII [06:48:21] probably easier to read than some hexadecimal dwords [06:48:30] ok one sec [06:49:02] also same deal w/ no useragent. 
Also when I get rejected for having no user agent I actually get an informative response from the server and it parses that [06:53:02] Protonk: do you have an issue fetching article using R? [06:53:10] ngrep :O [06:53:25] cause we ban requests from know web fetcher user agent and empty user agents [06:53:40] instead you want to use something meaningful like freenode/protonk [06:54:01] or master_is_Foobar_at_gmail_dot_com [06:54:48] Not that I know of. The only stuff I've used the API for have been to grab accurate category lists for stuff like http://en.wikipedia.org/wiki/User:Protonk/Article_Feedback [06:56:15] http://en.wikipedia.org/wiki/File:Heatmap_of_Correlation_between_Averages_in_Different_Article_Feedback_Categories.svg [06:56:20] I just love those [06:56:55] it is good to know that someone is analyzing the feedback data [06:56:59] yeah there's another way to make those where you overlay the acutal number on top of the cell but I haven't found a way to make it look nice yet [06:57:37] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 75, down: 0, dormant: 0, excluded: 0, unused: 0 [06:59:32] I should have attended my statistics class when I was young [06:59:47] still having some trouble getting tcpdump to speak to me [06:59:48] now I am like "too long, will wait for summary" :-] [06:59:50] so I have tcpdump -i en0 -s 0 -A -w ~/Desktop/DumpFile01.pcap [06:59:53] reading it [07:00:04] tcpdump -s 0 -n -e -v -r ~/Desktop/DumpFile01.pcap > tcpwpapi.txt [07:00:07] writing it [07:00:24] which give me http://pastebin.com/ZY339ksk [07:00:31] you can also filter by port 80 if needed : tcpdump -i en0 -s 0 -A port 80 [07:00:36] Protonk: don't use -r. just post the pcap itself [07:00:37] that will filter out the dns queries though [07:00:42] ahhh, gotcha [07:01:24] Question: why isn't secure login for WP the default? 
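The user-agent advice above is easy to get wrong from script land: Wikimedia rejects empty and generic-library user agents, so a fetch that works in a browser can 403 from code. A hedged sketch in Python (stdlib only; the contact string follows the freenode/nick convention suggested in the channel and is a placeholder):

```python
from urllib import request

# Identify the tool and a contact a human can act on; the exact string
# here is a made-up example, not a required format.
USER_AGENT = "category-fetcher/0.1 (freenode/protonk)"

def build_api_request(url):
    """Return a GET request carrying a descriptive User-Agent header."""
    return request.Request(url, headers={"User-Agent": USER_AGENT})

req = build_api_request(
    "https://en.wikipedia.org/w/api.php?action=query&format=json"
    "&prop=categories&titles=Main%20Page")
```

This mirrors what Protonk's R session was failing to do: the request body is identical, only the header distinguishes it from a banned anonymous fetcher.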
[07:01:51] Pine: $wgSecureLogin is broken [07:02:22] hashar: does that mean that the secure login option doesn't work correctly? [07:02:26] and I don't think the HTTPS architecture is able to handle full load yet [07:02:27] (I'm not a programmer) [07:02:34] yeah it does not work properly [07:02:38] (or at least not much of one) [07:02:56] I need to write test for it and fix it [07:03:01] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [07:03:06] In that case shouldn't the link for the secure login be removed so that people don't assume that it's working correctly? [07:03:36] Pine: well it is mostly working, but sometime it does not do what it should [07:04:01] http://pastebin.com/ZY339ksk [07:04:20] now that's odd. [07:04:33] woot. 403 forbidden. Well...semi-woot [07:04:52] now I have to figure out why it is doing that. [07:05:04] Pine: anyway the answer is : it will eventually be enabled. but not yet :-) [07:05:16] Pine: still need to be polished up and the HTTPS architecture probably need an upgrade [07:05:16] hashar: um, ok. Shouldn't fixing the secure login be a high priority since it's a security issue? [07:05:22] but I think I can manage that [07:05:50] Pine: not really we have higher priority items :) The recommend way is to login using the HTTPS url [07:06:10] May I ask what is a higher priority than account login security? [07:06:17] Pine: what I did is that I remove all HTTP link in my browser history and updated my bookmarks. Thus I am always logging using HTTPS [07:06:18] Seems like the list would be pretty short. [07:06:27] Pine: we have muuuch nastier security issues [07:06:40] https cluster can easily handle it [07:06:46] Pine: we just have finished deploying the last version of mediawiki on the servers [07:06:54] there's a few issues that need to be worked out, though [07:07:10] 1. 
Our logging format for nginx needs to be fully working (there's still a few bugs) [07:07:13] and this latest version of mediawiki has nastier security issues? :( [07:07:14] jeremy: thanks. Looks like I have to figure out why R isn't respecting the user agent setting. But that's on my end. [07:07:19] 2. MediaWiki needs to properly support it [07:07:37] 3. We have a few Apache rules that need to be set for redirects [07:07:44] security is always a mater of balancing one's interests, as there is no such thing as "perfect security"... so, who are we protecting against and is the level of protection "good enough" compared to other tasks, the level of risk, and the amount of energy that would be required to improve the protection? [07:08:26] that's thy "the list" as you say, Pine, isn't so short [07:08:44] meh. it's not a huge security issue [07:08:49] apergos: "thy"? [07:08:51] use https everywhere [07:08:54] the [07:08:59] and you'll always hit https on our site [07:09:00] sorry, it's early here [07:09:08] need to move out, sorry [07:09:09] security problem gone [07:09:41] ok, I guess I'm glad that the HTTPS login isn't totally screwed up, but it's a concern to hear that there are worse problems. [07:09:42] though, yes, it would be nice for all logged-in users to use https, and it's in the plans [07:09:51] who said there are worse problems [07:09:52] ? [07:10:02] ah. hashar did [07:10:05] Yes [07:10:10] uh huh [07:10:33] well there are certainly "worse problems", not necessarily security-related though [07:10:59] *cough*image server*cough* [07:11:06] if there were bugs that were doing something like, say, preventing all images from loading, that would be a big issue, yes. 
[07:11:25] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [07:11:25] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [07:11:34] well the current issue with the image server is that we replicae to another box whcih will fill up real soon now [07:11:42] and that leaves the image server as a spof [07:13:13] how does replicating the image server make it a SPOF? [07:14:14] Oh I see [07:14:26] So currently you have redundancy but soon you won't [07:14:29] right? [07:14:39] hi guys:) any idea if we need to worry about those "Udplog processes" on locke? [07:14:52] like filters disappear for a while and then come back [07:15:46] "filters absent: /a/squid/urjc.awk" vs. "OK: all filters present" .. it's kind of "flapping" [07:15:59] mutante: i mailed drdee, otto, ezachte about it [07:16:09] cool, thx [07:16:13] mutante: but i've no idea if it's actually related to them [07:16:36] or whether it's a problem (besides bothering people) or what it means [07:16:57] mutante: can you see how many filters there are? any indication of who owns which? [07:17:23] yes that's right, and our redundancy isn't as good as we'd like either [07:17:58] there are 9 files ending in .awk in /a/squid/ [07:18:11] owned by root:file_mover [07:19:13] hmm..maybe its rather a problem with the mount disappearing in between [07:19:17] /dev/sda3 on /a type jfs (rw,relatime) [07:19:18] jfs [07:20:43] Pine: no. we currently have no redundancy, and soon we will [07:21:13] though, also, soon we'll have no more space on the current servers we have [07:21:28] we'll have redundancy when we change to using swift [07:21:59] * Pine doesn't like the idea of Featured Pictures becoming Featured Empty Spaces [07:23:08] Ryan_Lane: is there an offline backup? 
[07:23:14] we have the rsync to the one host in eqiad, which is "sort of" redundancy [07:23:23] there's no offline backup of the images [07:23:28] >< [07:23:52] I see what you mean about bigger problems than the secure login. [07:23:55] ah, there is an issue with one of those filters, called "5xx-filter" [07:23:57] yup [07:24:01] locke kernel: [3670201.443128] 5xx-filter[21131]: segfault at 0 ip 0000000000400852 sp 00007fffc2729d60 error 4 in 5xx-filter[400000+1000] [07:24:12] * apergos gets back to those problems [07:24:22] segfault, nice [07:24:38] grep segfault /var/log/syslog :p [07:24:50] every 15 minutes [07:26:03] Ryan_Lane: how soon is "soon" for the redundancy? [07:26:15] imminently? :) [07:26:19] Good :) [07:26:25] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [07:26:25] we're in the middle of switching to a new system [07:34:54] is there a timeline on when to start copying originals? [07:35:15] what's the current usable space with existing (new) ms boxen? [07:35:22] bah, my phone client has to be built from source, how irritating [07:35:31] apergos: IRC? [07:35:36] i'm assuming we're using copies=3 ? [07:35:48] sflphone [07:36:01] what kind of client is that though? [07:36:05] there's a timeline around, not sure how accurate it is [07:36:12] and maybe only maplebed can answer these questions [07:36:16] siip [07:36:42] apergos: ohhhh, i thought you meant a client (for some unstated protocol) to run on your phone [07:37:34] btw phone client, Ekiga = free replacement for Skype (yes, incl. dial out to landlines, when combined with diamondcard.us) [07:37:55] yeah I haven't had good luck with ekiga in the past [07:38:05] mutante: but not interoperability to call skype ppl or chat with them? [07:38:24] jeremyb: no, afraid not :( [07:38:35] skype's fault of course :) [07:39:44] apergos: worked to dial to landline..with diamondcard.. skype just broken on Debian/*buntu .. 
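The flapping "filters absent" alert amounts to a presence check over the .awk filters mutante counted in /a/squid. A sketch of that style of check (the directory handling is generic and the filter names are assumptions modeled on /a/squid/urjc.awk from the alerts; the real Nagios plugin inspects the udp2log configuration, not just a directory listing):

```python
import os
import tempfile

def missing_filters(directory, expected):
    """Return the expected filter files not currently present in directory."""
    present = set(os.listdir(directory))
    return sorted(name for name in expected if name not in present)

# Scratch directory standing in for /a/squid; "edits.awk" is a made-up
# name used only to demonstrate a missing filter.
with tempfile.TemporaryDirectory() as d:
    for name in ("urjc.awk", "5xx-filter.awk"):
        open(os.path.join(d, name), "w").close()
    absent = missing_filters(d, ["urjc.awk", "5xx-filter.awk", "edits.awk"])
    # absent is ["edits.awk"]: a non-empty result maps to CRITICAL.
```

The 5xx-filter segfaulting every 15 minutes, as found later in the log, is one plausible source of this kind of present/absent toggle if the filter is repeatedly dying and being restarted.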
issue with microphone input and pulseaudio ..shrug [07:39:57] oh [07:40:09] for fedora it seems to work ok [07:40:14] well, but the other side still said they could hardly understand me :p [07:40:21] there was a long dry period when it was crap [07:40:25] but that was a few years back [07:40:36] apergos: maybe 2.0 works.. but it breaks with the current 2.2.. and skype does not let you download older versions..arg [07:40:46] mutante: http://blogs.digium.com/index.php?s=skype&blog-search=Blog pay particular attention to the 2 middle quarters of 2011 [07:40:49] yeah, i used it successfully before, but not anymore [07:40:50] that's bad [07:41:01] I admit I haven't tested the latest version [07:41:57] jeremyb: heh, you mean since it's owned by MS now. they broke it :? [07:42:22] no, i mean it was bad and they regressed it past bad [07:43:34] well i should not be up in apergos's morning. night [07:43:37] arr, yeah, well i have the problem that pops up if you start to google "skype microphone linux" etc [07:44:44] I have the old rpm around in case this one is a problem [07:44:48] guess we'll see [07:44:53] good night [07:45:00] anyways, after spending half a night to get Ekiga plus the diamondcard up, putting funds on it etc.. and then it still didn't work well, i was fed up with it, i'd rather pay for landline [07:45:07] thanks, good night apergos [07:45:39] not night for me, night for jeremyb [07:46:19] ah:p timezone confusion.
g'night jeremy [08:20:05] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Tue Mar 6 08:20:00 UTC 2012 [08:22:38] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 73, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/0/1: down - Core: cr1-eqiad:xe-5/2/1 (FPL/GBLX, CV71026) [10Gbps wave]BR [08:43:38] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [08:52:38] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [08:54:11] wth,, where did curl go? Package curl is not available, but is referred to by another package. [09:02:02] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [09:02:02] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [09:33:49] New patchset: Dzahn; "add nagios to the Debian-exim group to allow it to check_disk the tmpfs mount - should fix CRIT on sodium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2927 [09:36:05] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 75, down: 0, dormant: 0, excluded: 0, unused: 0 [09:43:39] New patchset: Dzahn; "include class {'webserver::php5': ssl => 'true'; } in misc::racktables to fix broken puppet dependency on hooper" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2928 [09:44:49] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2928 [09:44:52] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2928 [09:50:19] New patchset: Dzahn; "ugh, need to remove duplicate service definition for apache2 as well then (hooper/racktables)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2929 [09:50:54] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2929 
[09:50:57] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2929 [09:53:41] New patchset: Dzahn; "..and another duplicate..Apache_module[ssl] already defined in the generic class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2930 [09:54:21] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2930 [09:54:23] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2930 [09:55:53] RECOVERY - Puppet freshness on hooper is OK: puppet ran at Tue Mar 6 09:55:41 UTC 2012 [09:56:11] New review: Dzahn; "alright, now puppet runs again on hooper" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2930 [10:00:54] New review: Dzahn; "it's still true that puppet cant manage existing users, right? so using an Exec .." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/2927 [10:22:35] mutante: so are you established in australia now ? : ) [10:22:43] mutante: or is that just for holidays ? :-b [10:41:27] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: switching zh* from 1.18 to 1.19 [10:41:30] Logged the message, Master [10:42:28] liangent: done [10:44:14] on recentchanges there's no variant dropdown now [10:44:18] was it like this? [10:45:04] I think it was like that before [10:45:19] TimStarling: no. from http://web.archive.org/web/20110605054351/http://zh.wikipedia.org/wiki/Special:%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9 there was [10:45:47] hmm [10:49:18] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/100527 [10:49:39] no, before that I think... [10:50:15] ah, once robin's name starts showing up in the annotation I know I've got the right revision [10:50:44] it was r97849 [10:50:58] TimStarling: when was branching time? [10:51:12] r110996 [10:51:55] what is variant selection on a special page expected to do? just change the interface language? 
but on pages like RC it's full of content language [10:52:36] did it change the page titles also? [10:52:44] I'd better set up a 1.18 wiki so I can see [10:52:55] go to srwiki [10:54:46] the selector is there and it's actually converting [10:54:51] it just seems to change the interface language, the page titles, usernames and edit summaries stay the same [10:56:12] so you can change your user preference language to get the same effect [10:56:38] what do you think? should I roll back to 1.18 again for this? [10:56:54] TimStarling: another issue first, enhanced rc fails? [10:57:10] it isn't collapsed anymore on zhwiki [11:00:07] (06:56:33 PM) TimStarling: what do you think? should I roll back to 1.18 again for this? << I don't think it's urgent [11:01:07] (06:56:06 PM) TimStarling: so you can change your user preference language to get the same effect << my user language on srwiki is en and user variant is sr and no &uselang and &variant=sr-el, special page title is "Recent changes" [11:01:43] there's a JS error on enhanced RC, looks like it's from a gadget [11:02:43] "Error: wgUserVariant is not defined" [11:02:52] yeah [11:02:57] it was a js global [11:03:19] was it removed? [11:03:22] https://zh.wikipedia.org/w/index.php?title=MediaWiki:Gadget-site-lib.js&action=edit [11:04:06] it's another robin thing [11:04:13] it's removed on pages that don't have variants [11:05:16] what do you suggest to fix? [11:05:29] wgUserVariant is also used in mediawiki:common.js [11:05:42] r97849 again [11:06:46] I'll just revert that part of it [11:08:04] this is starting to annoy me [11:08:10] maybe I should just revert his entire work product [11:11:25] in bug 34832 fix, which language are you converting it to? [11:11:37] user language or user variant?
[11:12:46] should be user variant, same as before [11:12:55] from $wgContLang->getPreferredVariant [11:13:33] then special pages can have &variant= [11:13:43] and it's not the same as &uselang= [11:13:50] !log tstarling synchronized php-1.19/includes/OutputPage.php 'r113128' [11:13:53] Logged the message, Master [11:14:39] ok, enhanced RC works again [11:15:55] !log hashar synchronized php-1.19/languages/messages/MessagesSa.php 'r1113039 for bug 34938 : title is sometime empty on Sanskrit wikis' [11:15:58] Logged the message, Master [11:16:27] http://zh.wikipedia.org/wiki/MediaWiki:Blockedtext/zh-hk?variant=zh-hk the content is still in simplified chinese [11:16:58] http://zh.wikipedia.org/wiki/MediaWiki:Blockedtext/zh-tw?variant=zh-tw but this is in traditional [11:30:05] I think that's just a quirk with MediaWiki namespace page views [11:30:41] I blocked myself and looked at the block message with various user variants set, and it seems ok [11:39:10] I'm going to switch the sr wikis [11:40:25] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: switching sr* to 1.19 [11:40:28] Logged the message, Master [11:50:24] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:51:58] liangent: file a bug for anything you think is a bug [11:52:12] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [11:52:41] I'm going to bed soon, so if it's more or less working then I'll leave it how it is for now [11:55:58] TimStarling: got it [11:56:12] thanks for your help [12:44:36] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [14:53:45] Hi [14:59:46] Is anyone here able to have a look at the following bug plz? https://bugzilla.wikimedia.org/show_bug.cgi?id=17618 [15:00:01] I can't use my account on enwiki because of this... 
[15:01:28] interesting
[15:02:06] seems like your enwiki account is not SUL
[15:02:40] saper: It was like something failed halfway through when my enwiki account was created by SUL
[15:03:34] saper: The strange thing is that at my first login here, everything looked like I was logged in. But I lost my logged-in state when I tried to "preview" an edit.
[15:04:36] it's possible you weren't
[15:04:46] but no idea really
[15:04:57] saper: Apparently, the steward "Vito" already encountered such a case with an itwiki account and solved it by renaming the itwiki account to something different
[15:05:01] after I log out I still have my skin (not the default Vector)
[15:05:11] well yes
[15:05:15] they can do it
[15:05:16] * Vito nods
[15:06:00] But he prefers not to do that in my case, to leave the possibility of debugging it. I'm not against that, but I hope someone could look at this bug fast :p
[15:07:01] what happens
[15:07:05] if you log out
[15:07:09] delete all cookies
[15:07:15] go to enwiki and try to log in there?
[15:13:42] saper: I will try, but I already tried on 2 different computers in different places!
[15:14:05] I'm just wondering - do you get "wrong password" or what?
[15:16:10] saper: yes, wrong password
[15:16:46] and trying to get a forgotten password reminder, I get a message that I haven't set any email to recover
[15:17:47] 23:46, 5 March 2012 Account Fviard (talk | contribs) was created automatically
[15:17:50] yeah
[15:18:23] saper: yes, looks like the account was created but not linked to SUL :(
[15:23:10] Greatgib_2: I have reassigned the bug
[15:25:56] saper: thank you :)
[15:30:05] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[15:59:17] hi! could you fix Polish Planet Wikimedia? => https://bugzilla.wikimedia.org/show_bug.cgi?id=34268
[16:25:13] !log catrope synchronized docroot/foundation/FrameResize.html 'Put Jobvite frame resize file in foundationwiki docroot per Erik'
[16:25:17] Logged the message, Master
[17:04:39] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[17:12:36] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[17:12:36] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[17:27:36] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours
[17:51:45] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0466279646 (gt 8.0)
[17:56:08] dpkg -l
[17:56:09] dpkg -l
[17:56:13] oops
[17:56:13] :)
[17:56:51] already forgot how to use two screens
[17:57:45] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.584431071429
[18:04:08] New patchset: Lcarr; "Adding in neon as a ntp monitoring server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2934
[22:08:58] RECOVERY - SSH on search1006 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:09:03] RECOVERY - SSH on search1007 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:09:03] PROBLEM - Lucene on search1010 is CRITICAL: Connection timed out
[22:09:36] RECOVERY - SSH on search1009 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:10:03] RECOVERY - RAID on search1001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[22:10:21] RECOVERY - SSH on search1010 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:11:15] PROBLEM - Lucene on search1012 is CRITICAL: Connection timed out
[22:11:24] RECOVERY - Disk space on search1001 is OK: DISK OK
[22:11:33] PROBLEM - DPKG on search1016 is CRITICAL: Connection refused by host
[22:11:33] PROBLEM - RAID on search1015 is CRITICAL: Connection refused by host
[22:11:33] PROBLEM - DPKG on search1015 is CRITICAL: Connection refused by host
[22:11:33] PROBLEM - SSH on search1016 is CRITICAL: Connection refused
[22:11:42] PROBLEM - Disk space on search1015 is CRITICAL: Connection refused by host
[22:12:00] PROBLEM - SSH on search1015 is CRITICAL: Connection refused
[22:12:09] PROBLEM - RAID on search1020 is CRITICAL: Connection refused by host
[22:12:09] PROBLEM - SSH on search1020 is CRITICAL: Connection refused
[22:12:27] RECOVERY - SSH on search1012 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:12:36] RECOVERY - SSH on search1011 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:12:45] PROBLEM - Lucene on search1013 is CRITICAL: Connection refused
[22:12:54] PROBLEM - RAID on search1016 is CRITICAL: Connection refused by host
[22:12:54] PROBLEM - Disk space on search1016 is CRITICAL: Connection refused by host
[22:12:54] PROBLEM - Lucene on search1019 is CRITICAL: Connection refused
[22:13:03] PROBLEM - Disk space on search1020 is CRITICAL: Connection refused by host
[22:13:12] PROBLEM - DPKG on search1020 is CRITICAL: Connection refused by host
[22:13:30] RECOVERY - SSH on search1013 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:16:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:16:30] RECOVERY - NTP on search1001 is OK: NTP OK: Offset -0.00751376152 secs
[22:17:06] PROBLEM - Lucene on search1015 is CRITICAL: Connection refused
[22:18:00] RECOVERY - SSH on search1015 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:18:18] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 285 seconds
[22:18:45] PROBLEM - Lucene on search1020 is CRITICAL: Connection timed out
[22:18:54] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused
[22:19:48] RECOVERY - SSH on search1016 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:20:15] RECOVERY - SSH on search1019 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:20:24] RECOVERY - SSH on search1020 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:20:24] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 399 seconds
[22:21:18] PROBLEM - NTP on search1003 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:21:36] PROBLEM - NTP on search1002 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:22:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.937 seconds
[22:22:48] PROBLEM - NTP on search1011 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:23:06] PROBLEM - NTP on search1004 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:25:03] PROBLEM - NTP on search1005 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:25:21] PROBLEM - NTP on search1006 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:25:21] PROBLEM - NTP on search1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:26:17] New patchset: Pyoungmeister; "well that's a weird dependency loop..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2947
[22:27:18] PROBLEM - NTP on search1009 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:27:37] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2947
[22:28:39] PROBLEM - NTP on search1010 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:30:45] PROBLEM - NTP on search1012 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:30:54] PROBLEM - NTP on search1019 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:30:54] PROBLEM - NTP on search1013 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:31:03] New patchset: Pyoungmeister; "well that's a weird dependency loop..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2947
[22:32:51] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2947
[22:33:13] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2947
[22:33:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2947
[22:35:24] PROBLEM - NTP on search1015 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:37:21] PROBLEM - NTP on search1016 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:37:57] PROBLEM - NTP on search1020 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:38:11] New patchset: Pyoungmeister; "easy way." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2948
[22:40:59] New patchset: Pyoungmeister; "easy way." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2948
[22:41:28] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2948
[22:41:31] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2948
[22:45:27] New patchset: Asher; "testing a varnish instance in front of gdash" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2949
[22:45:54] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours
[22:54:59] New review: Asher; "but without a lint check.. here goes nothing" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2949
[22:55:02] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2949
[22:56:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:02:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.426 seconds
[23:20:07] !log reedy synchronized php-1.19/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php 'r113198'
[23:20:11] Logged the message, Master
[23:21:36] RECOVERY - Host ms-be4 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[23:25:42] !log patched 5xx-filter.c live on locke and reloaded udp2log to stop the segfaults
[23:25:45] Logged the message, Master
[23:25:48] RECOVERY - DPKG on search1001 is OK: All packages OK
[23:27:09] RECOVERY - DPKG on search1002 is OK: All packages OK
[23:27:36] RECOVERY - RAID on search1002 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[23:28:21] RECOVERY - Disk space on search1002 is OK: DISK OK
[23:31:12] RECOVERY - NTP on search1002 is OK: NTP OK: Offset 0.04822313786 secs
[23:33:42] New patchset: Asher; "looks like xff_sources is required for varnish defs now to avoid an unused acl fatal error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2950
[23:34:39] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.026 second response time on port 8123
[23:36:29] !log reedy synchronized php-1.19/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php 'r113200 reverting r113198'
[23:36:33] Logged the message, Master
[23:37:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:39:09] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2950
[23:39:11] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2950
[23:40:03] RECOVERY - SSH on ms-be4 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[23:45:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.032 seconds
[23:50:23] New patchset: Ryan Lane; "Test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2951
[23:51:54] good night, folks
[23:51:59] TimStarling: how are sr and zh doing?
[23:52:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2951
[23:54:34] !log catrope synchronized php-1.19/extensions/CustomUserSignup/ 'Belated sync of r113056'
[23:54:34] they were more or less working last night, except for a few quirks
[23:54:37] Logged the message, Master
[23:54:43] I haven't seen any bug reports, but maybe I'm not CC'd
[23:54:47] DarTar: There goes ---^^ , should be disabled now
[23:55:20] cool, will keep an eye on the log
[23:55:43] They might not disappear immediately, give it 5-10 mins
[23:56:15] RECOVERY - RAID on search1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[23:56:15] RECOVERY - Disk space on search1003 is OK: DISK OK
[23:57:05] * AaronSchulz thought he scared tim away
[23:57:36] RECOVERY - DPKG on search1003 is OK: All packages OK
[23:59:51] RECOVERY - DPKG on search1004 is OK: All packages OK
[23:59:51] RECOVERY - RAID on search1004 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0