[00:03:53] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 00:03:48 UTC 2013 [00:04:43] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:27:53] (03PS1) 10Springle: db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 [00:29:17] (03PS2) 10Springle: db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 [00:30:30] mwalker, why does CN set mediaWiki.user.sessionId cookie? [00:31:10] (03CR) 10Springle: [C: 032] db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 (owner: 10Springle) [00:31:23] the only reason I know of is, if, when connection to meta the requesting user does not have a SUL session, MW will autocreate one [00:32:04] it probably also does it when you view a Special:CentralNotice page that uses HTMLForms [00:32:15] I observe it being created from CN JS [00:32:29] interesting; can you explain further? [00:33:05] like what file/line? [00:33:53] because a simple case insensitive grep on my resource loader modules doesn't come up with anything [00:35:35] mwalker, my bad, in addition to CN there was an ULS in that RL blob:( [00:35:37] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 00:35:32 UTC 2013 [00:35:37] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:36:20] it's ULS [00:36:37] hehe, 'an ULS' [00:36:58] they might be multiplying! [00:37:08] off to hang out with the alots [00:37:19] me looks in Varnish t o see if it results in cache bypass [00:42:25] I'll poke mark tomorrow about it [00:55:44] !log started xtrabackup clone db1002 to db1018 [00:55:58] Logged the message, Master [00:58:51] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [01:03:26] (03PS1) 10Legoktm: Enable MassMessage extension on test2.wikipedia.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 [01:03:29] (03PS11) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:10:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:13:36] PROBLEM - Varnish HTTP upload-frontend on cp1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:49] Can someone fix wikibugs? [01:13:56] It should still be connected. [01:14:02] But not joined to #mediawiki. 
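A minimal sketch of the case-insensitive grep mwalker mentions above, widened from one extension's ResourceLoader modules to a whole extensions checkout, which might have pointed at the ULS/EventLogging code path sooner; the ~/src/extensions path is only an assumed local clone used for illustration:

# Search all client-side JS for code that reads or writes mw.user.sessionId.
# ~/src/extensions is an assumed local checkout of the extension repositories.
grep -rni --include='*.js' -e 'user\.sessionId' -e 'sessionId' ~/src/extensions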
[01:14:36] RECOVERY - Varnish HTTP upload-frontend on cp1064 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.907 second response time [01:14:42] (03PS12) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:17:32] (03PS13) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:17:36] PROBLEM - Varnish HTTP upload-frontend on cp1051 is CRITICAL: HTTP CRITICAL - No data received from host [01:19:36] RECOVERY - Varnish HTTP upload-frontend on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 3.131 second response time [01:20:10] (03CR) 10Ryan Lane: [C: 032] Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 (owner: 10Yuvipanda) [01:20:24] wooho, thanks Ryan_Lane [01:20:36] yw [01:20:40] hm [01:20:45] it didn't go all the way through yet [01:21:19] must be multiple api calls ;) [01:23:36] PROBLEM - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.016 second response time [01:24:36] RECOVERY - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 519 bytes in 2.222 second response time [01:25:29] (03PS1) 10Yuvipanda: Add labsvagrant role [operations/puppet] - 10https://gerrit.wikimedia.org/r/85946 [01:25:30] bah [01:25:33] hop online says ok. [01:25:40] bah i say! [01:26:06] PROBLEM - Varnish HTTP upload-frontend on cp1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:26:56] RECOVERY - Varnish HTTP upload-frontend on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 0.589 second response time [01:30:20] (03CR) 10Ryan Lane: [C: 032] Add labsvagrant role [operations/puppet] - 10https://gerrit.wikimedia.org/r/85946 (owner: 10Yuvipanda) [01:30:36] PROBLEM - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.986 second response time [01:31:36] No ottoman? [01:31:40] Ottomata. [01:32:36] RECOVERY - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 517 bytes in 5.233 second response time [01:34:56] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 01:34:50 UTC 2013 [01:35:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:37:35] RobH / Ryan_Lane: either of you have access to the box wikibugs sits on (machenry iirc) [01:37:37] ? [01:37:50] (03PS3) 10Yuvipanda: Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 [01:40:13] !log restarted wikibugs [01:40:26] Logged the message, Master [01:40:30] Thanks. [01:43:33] p858snake|l: everyone on the ops team does [01:44:06] Ryan_Lane: I know, that was more of a "is anyone around to do it" question [01:45:44] to do what? [01:45:55] I'm on a plane, making changes isn't likely a great idea [01:46:05] Ryan_Lane: to restart wikibugs (but tim did it already) [01:46:23] ah, ok [01:46:26] (03PS4) 10Yuvipanda: Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 [01:47:13] if we have learn't anything from domas, that making changes on a plane trip is the perfect place! [01:47:43] I miss Domas. [01:47:52] And River. [01:47:56] (03CR) 10Ryan Lane: [C: 032] Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 (owner: 10Yuvipanda) [01:48:04] And Jens. 
[01:48:04] woohoo, ty Ryan_Lane [01:48:26] yw [01:48:39] p858snake|l: well, I'm reviewing and merging code ;) [02:00:46] PROBLEM - Varnish HTTP upload-frontend on cp1062 is CRITICAL: HTTP CRITICAL - No data received from host [02:01:46] RECOVERY - Varnish HTTP upload-frontend on cp1062 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 4.828 second response time [02:05:16] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 02:05:14 UTC 2013 [02:06:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:07:36] PROBLEM - Varnish HTTP upload-frontend on cp1051 is CRITICAL: HTTP CRITICAL - No data received from host [02:08:36] RECOVERY - Varnish HTTP upload-frontend on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 3.186 second response time [02:10:56] PROBLEM - LVS HTTPS IPv6 on upload-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:11] upload.wikimedia.org is occasionally throwing a 502 Bad Gateway error. [02:11:46] RECOVERY - LVS HTTPS IPv6 on upload-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 515 bytes in 0.014 second response time [02:11:50] yeah, meta too [02:11:58] clearly something going on [02:12:04] HTTPS? [02:12:08] https://bugzilla.wikimedia.org/show_bug.cgi?id=50891 [02:12:52] it is on https… though to be fair I'm always on https now so not sure if it's the same thing or different [02:13:00] !log LocalisationUpdate completed (1.22wmf18) at Wed Sep 25 02:13:00 UTC 2013 [02:13:05] hmmm [02:13:08] it could have been that too [02:13:17] Logged the message, Master [02:14:51] (03CR) 10MZMcBride: "I believe you also need to update wmf-config/extension-list." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 (owner: 10Legoktm) [02:17:20] loading pages on meta is giving me a ton of time outs from upload too [02:17:34] (I'm assuming that would happen everywhere) [02:17:42] * Jamesofur tries on http [02:18:36] errr [02:18:42] is meta on https only for anon now too? [02:18:55] * Jamesofur clearly missed that memo [02:19:05] Elsie: hm, yeah i do [02:19:43] Jamesofur: You're logged in. [02:19:47] So presumably you're being redirected. [02:19:53] Because otherwise it would be annoying. [02:20:03] I logged out …. and still being redirected [02:20:15] (I even manually went to Special:logout just in case) [02:20:42] (03PS2) 10Legoktm: Enable MassMessage extension on test2.wikipedia.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 [02:20:46] PROBLEM - Varnish HTTP upload-frontend on cp1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:46] Dunno. That bug should probably be re-opened. [02:21:17] I think it's that varnish error ^ [02:21:25] there were a couple more before I joined it seems looking at the logs [02:21:36] I have the bot on ignroe. [02:21:37] ignore [02:21:47] fair enough [02:22:36] RECOVERY - Varnish HTTP upload-frontend on cp1050 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.387 second response time [02:23:16] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [02:25:31] !log LocalisationUpdate completed (1.22wmf17) at Wed Sep 25 02:25:30 UTC 2013 [02:25:43] Logged the message, Master [02:26:31] one of my extensions must be causing the redirect, don't have it on incognito (though I don't have SSLEverywhere on this one on purpose oh well). still seem to be having some of the issues on http though [02:28:21] You're getting errors on HTTP? 
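The intermittent bad gateways being discussed above are easy to sample from outside; a rough sketch (not a command anyone in the channel actually ran), using an arbitrary well-known file URL, to check whether the 502s really are HTTPS-only:

# Sample upload.wikimedia.org over HTTPS and HTTP and tally the response codes.
# The file path is illustrative; any cached upload URL would do. A meta.wikimedia.org
# page URL could be substituted to cover the "meta too" report above.
for i in $(seq 1 50); do
  for proto in https http; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      "$proto://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png")
    echo "$proto $code"
  done
  sleep 1
done | sort | uniq -c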
[02:32:37] Hi [02:32:39] I'm here [02:35:16] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 02:35:14 UTC 2013 [02:35:55] Elsie: I'm seeing some upload time outs at least [02:36:03] no meta time out yet [02:36:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:36:18] s/time out/bad gateway [02:45:35] I take that back, the stuff I'm seeing on http is different [02:45:40] the bad gateways are only on https so far [02:45:44] * Jamesofur reopens bug for now [02:47:11] ahh you beat me to it [02:48:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 25 02:48:19 UTC 2013 [02:48:32] Logged the message, Master [03:04:06] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 03:03:59 UTC 2013 [03:04:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:33:51] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 03:33:46 UTC 2013 [03:34:11] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:45:56] (03PS1) 10Springle: repool db1018 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85951 [03:46:54] (03CR) 10Springle: [C: 032] repool db1018 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85951 (owner: 10Springle) [03:47:52] !log springle synchronized wmf-config/db-eqiad.php 'repool db1018' [03:48:04] Logged the message, Master [04:08:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:19:43] something funky going on with site performance: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.rendering.%28desktop%7Cmobile%29_median%24>ype=line&title=Rendering%3A+responseEnd+to+loadEventEnd&aggregate=1 [04:19:56] weekly view also revealing: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.rendering.%28desktop%7Cmobile%29_median%24>ype=line&title=Rendering%3A+responseEnd+to+loadEventEnd&aggregate=1 [04:20:17] ditto http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.loading.%28desktop%7Cmobile%29_median%24>ype=line&title=Loading%3A+navStart+to+loadEventStart&aggregate=1 [04:20:48] slowness reports on en.wp: https://en.wikipedia.org/wiki/Wikipedia:VPT#Is_preview_slower_than_usual.3F [04:21:26] TimStarling, ping [04:34:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 04:34:01 UTC 2013 [04:34:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:02:59] PROBLEM - Puppet freshness on db1033 is CRITICAL: No successful Puppet run in the last 10 hours [05:04:55] looking [05:05:36] !log tstarling cleared profiling data [05:05:50] Logged the message, Master [05:06:35] may have resolved itself meanwhile, there was a follow-up on VPT saying the slowness went away, and the graphs bear that out [05:07:14] graph at http://status.wikimedia.org/8777/131241/Images-&-media suspicious too [05:09:40] (03PS1) 10Springle: depool db1035 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85954 [05:11:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:11:52] (03CR) 10Springle: [C: 032] depool db1035 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85954 (owner: 10Springle) 
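Regarding MZMcBride's review note above about wmf-config/extension-list: a hedged sketch of the places a change like Gerrit 85944 usually has to touch; the wmgUseMassMessage switch name simply follows the usual wmgUse<Extension> convention and is an assumption here, not taken from the change itself:

# Check the usual touchpoints when enabling an extension in operations/mediawiki-config.
cd operations/mediawiki-config
grep -n 'MassMessage' wmf-config/extension-list
grep -n 'wmgUseMassMessage' wmf-config/InitialiseSettings.php wmf-config/CommonSettings.php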
[05:12:08] so do you know what was slow exactly? [05:12:59] !log springle synchronized wmf-config/db-eqiad.php 'depool db1035 for upgrade' [05:13:11] Logged the message, Master [05:13:39] TimStarling, I do see users complaining about image loads specifically, but also slow previews/saves. And the usual (frequent) complaints about bits.wm.o being slow. [05:14:01] maybe just an external network issue then [05:15:51] actually, this is a bit suspicious: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Upload+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [05:16:06] that looks like a plateau doesn't it? [05:16:37] or a boa constrictor digesting an elephant [05:16:59] on the weekly: http://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=network_report&s=by+name&c=Upload+caches+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [05:17:06] you can see that w've exceeded this traffic before [05:17:21] so it may be that we are throttled by some upstream network event [05:18:05] ah, i see the reasoning, makes sense [05:18:31] well, you know that a plateau is indicative of an overload [05:18:42] so if it is a plateau, that is a bad thing [05:19:07] you get a plateau when the normal traffic curve is truncated by capacity constraints [05:21:14] the observium data does not look normal [05:21:22] and if the limit was regularly exceeded in the past than the constraint is external to the system [05:21:30] what's observium? [05:22:13] https://observium.wikimedia.org/ [05:23:14] oh, i don't have access to that [05:23:53] 25/Sep/13 05:06:46 Device status changed to Down [05:24:00] on csw1-sdtpa [05:27:07] also there was an event at 20:00 [05:27:54] can you send a text to leslie? it's not too late there is it? [05:28:30] 10:30pm, i'll grab my phone [05:28:34] this hasn't already been discussed with leslie has it? [05:31:00] not that I've seen (just trying to catch up while doing other things right now) [05:32:01] I'll send a message, if you haven't already [05:33:07] i just did [05:33:16] sorry, had to get my phone from the other room [05:34:00] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 05:33:55 UTC 2013 [05:34:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:36:25] observium is a terrible piece of software [05:36:31] it takes forever to find anything in it [05:39:12] like showing 6000 separate traffic graphs [05:40:19] or if you go to cr1-eqiad, it shows 727 separate traffic graphs, in little thumbnails, all with a different scale so that you can't tell where the traffic is going at a glance [05:40:22] what about torrus? google cache suggests it recently had content about that router but it errors out for me [05:40:37] i didn't know about that subdomain either [05:41:15] there was never much data in torrus [05:41:23] at least observium does actually have the graphs [05:43:30] anyway, if/when Leslie comes online, I just want her to know that there were reports of slowness, ganglia data is suspicious, and observium shows a rerouting of some kind at 18:00 that may be related [05:44:08] ^^ LeslieCarr [05:44:29] I'll flag it if she shows up and I'm still around [05:44:36] but an e-mail might be warranted [05:45:18] I can just copy/paste the last 30 minutes or so into an email if you're running off [05:47:49] I'm not going anywhere [05:49:59] TimStarling: 18 UTC? [05:50:05] when you say 18:00 [05:50:14] that's a long time ago. 
(~12 hrs) [05:50:32] actually it was closer to 20:00 [05:50:36] and yes, it was a long time ago [05:50:48] that's why I asked if it has already been discussed [05:50:51] k. just wanted to be sure i understood :) [05:51:26] TimStarling: 24 19:18:42 < LeslieCarr> yay all uplinks are 40g now [05:51:34] no idea if that's relevant [05:51:40] (that's UTC) [05:54:49] yes, it would have been around that time [05:59:13] (03CR) 10Physikerwelt: "Mh..." [operations/puppet] - 10https://gerrit.wikimedia.org/r/61767 (owner: 10Physikerwelt) [06:02:01] yeah, look, here is the saturation: https://observium.wikimedia.org/graphs/to=1380088901/id=134/type=port_bits/from=1380002501/ [06:02:21] plain as day [06:08:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:09:57] what does it look like? [06:10:17] I'll attach it to my mail which I'm writing [06:10:31] this is non-urgent, we have until ~18:00 tomorrow to fix this [06:11:06] 2am here, g'night, thanks for poking [06:11:29] good night [06:34:12] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 06:34:04 UTC 2013 [06:34:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:11:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:18:47] (03PS1) 10TTO: Temporary celebration logo for tawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 [07:34:40] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 07:34:30 UTC 2013 [07:34:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:21] (03CR) 10Jeremyb: [C: 04-1] "the file doesn't appear to be protected. (although I can't read the language so who knows)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 (owner: 10TTO) [08:12:25] hi, any ops around to discuss ESI deployment? [08:13:16] yurik: i think it's reasonable to expect that all besides tim and maybe mark are asleep [08:14:00] jeremyb, true that, but mark should be the best person for that i would think :) [08:14:11] for some reason i don't see him online :( [08:14:11] but he's not here [08:14:21] 6JTAABPF7: ^^ [08:14:39] is that his secret random nick? :) [08:15:00] yes! or maybe just the result of a split? [08:15:35] * yurik now knows mark's secret identity! Superman bewarned! [08:25:01] yurik: I would recommend using engineering list [08:25:10] yurik: Gabriel Wicke (in SF) would be interested probably. [08:25:26] yurik: Gabriel wrote some ESI support back in the old day for squid 3, though it never get used. [08:25:53] hashar, i just sent my proposal to ops & wikitech-l [08:26:02] yurik: and from a discussion I had with him during the all-staff, he is still interested in it. An evil plan would be to use yet another layer of cache and have the frontend cache include text from that "new" backend cache. [08:26:07] thx for the heads up abt gabriel [08:26:20] yurik: aren't you SF based? [08:26:24] NYC [08:26:33] sleep you should! [08:26:35] 4:30am here :) [08:26:36] :-D [08:26:42] still on SF time [08:26:43] but yeah definitely Gabriel + Mark [08:26:51] yurik: we have an RFC about it, and a bug IIRC [08:27:02] some other folks would be interested as well such as the mobile team, parsoid team as a whole and ori [08:27:05] Matt Walker did benchmarking [08:27:18] gwicke: Guten Abend :-] [08:27:22] hmm... look i shall. 
Although something tells me that ESI can be implemented extremelly well in varnish (need to check the code of course) [08:27:24] good morning ;) [08:27:29] are you in Germany? [08:27:30] lol [08:27:46] hashar: no, in SF- still playing with Cassandra [08:28:03] yurik: yes, we did our experiments with Varnish so far [08:28:12] you know what I love, we are all talking across three timezones, each of us being of a different nationality. Who could have imagined that 20 years ago? :-] [08:28:21] Matt verified that more includes slow things down a lot [08:28:42] at 5 includes request rates drop by 50% IIRC, so that is kind of the upper limit [08:29:06] the challenge now is to properly divide up the skin & content to get by with 2-3 fragments.. [08:29:07] hashar, hehe, i had such a drive when i used cell phone in the middle of a lake on a canoe a few years ago... [08:29:29] yurik: indeed. I feel like a kid right now [08:29:52] gwicke: you should talk to domas about cassandra. I am not sure whether it is still actively maintained [08:29:55] gwicke, i don't understand why the perf degrades so badly - basically ESI should be no different from serving several resources though a keep-alive http connection [08:30:24] yurik: the Varnish implementation uses regexps to find the syntax [08:30:33] then there are the subrequests and splicing [08:30:41] has nothing to do with pipelining really [08:31:15] hmm, kind of weird -- it doesn't need to search it on every request [08:31:30] Squid 3 ESI used libxml, which was both brittle (wikis did not produce anything like XML) and slow [08:32:51] you should really follow up on the list so that other people knows about all of that :D [08:32:54] in my mind it should be like this: get main from apache, search of esi tags, split it into several chunks and store those chunks in cache together with metadata about ESI urls. On every subsequent request just pull the metadata, determine other chunks to be included in the main one, and stream those chunks one after another [08:32:59] hashar: errr, doesn't seem so far-fetched. US (CONUS at least) has 4 timezones and 100+ nationalities [08:33:02] :) [08:33:03] we talked about that on the list and in the bug [08:33:16] mainly wanted to get Yuri up to speed ;) [08:33:39] thx gwicke, is my thinking about ESI is how they implemented it? [08:33:49] jeremyb: get to sleep :-D [08:33:52] see http://open.blogs.nytimes.com/tag/varnish/ ; they only run ESI if the backend says they should [08:34:10] jeremyb: err "please kindly head to your bed and enjoy a nice night of sleep" [08:34:15] (as of 3 years ago) [08:34:22] hashar: ok!!! [08:34:23] yurik: afaik fragments are re-validated and reassembled on each request [08:34:32] why??? [08:34:57] otherwise Varnish would have to track which complete page uses a given fragment [08:35:00] its already parsed, all that is needed is dynamically push chunks out one after another [08:35:17] gwicke, not sure i understand why it needs that? 
[08:35:18] so that it can refresh the page on fragment purge [08:36:00] again, not sure why - when the request comes in for URL1 that includes URL2, it goes to URL1, sees in metadata that it needs URL2, checks the cashing status of URL2, fetches it if needed, and serves [08:36:32] that is what it does, and it is not fast beyond a few fragments [08:36:33] if URL2 needs to be purged, it can simply be deleted [08:36:55] I thought you wanted to also cache the fully assembled page to speed things up [08:37:07] i don't think its a good idea [08:37:25] there is no major benefit as it is all in ram [08:37:39] you can simply give a linked list of pointers to the sending que [08:38:21] you also need to split up the parent page [08:38:37] we did not plan to rearchitect Varnish for now ;) [08:38:39] gwicke, the parent page only gets split up once on the first request [08:38:56] then the parent page is stored as several chunks [08:39:01] performance will be fine with a few fragments [08:39:46] yes, but this approach allows unlimited fragments really [08:40:19] without any substantial degradation of performance (unless of course each of your chunks is a few bytes long and there are thousands of them :) [08:40:28] I believe it when I see it ;) [08:41:03] hehe, i guess i should go dig into varnish now :) [08:42:23] were you thinking about the main site or about Zero? [08:42:41] if the latter, then you probably don't have to worry at all [08:42:47] zero for now, the world later :) [08:44:23] so the plan Matt and me were considering was to have a per-page loader page, then a user-specific head section including the tag, then the content fragment, and finally a footer fragment and the user-specific navi fragment [08:45:09] were you thinking of possibly doing it through ajax? [08:45:27] instead of returning content on every call :) [08:45:42] * yurik still hopes for a more on-the-fly site [08:45:45] that content is fairly small compared to all the JS and CSS we ship these days [08:45:51] exactly [08:46:01] so its better to load everything else once :) [08:46:09] single page foreva [08:47:00] the main functionality should work without JS [08:47:25] and many visits only open a single page after following a link from a search engine [08:47:26] do you think its wise for modern browsers? We could gracefully fallback [08:47:37] the second point is true [08:49:03] there are use cases for ajax, but I still see them more in the optional feature area [08:49:17] at least for content views [08:49:37] well, i think that if the user is logged in, they should be ajaxy - because they are more likely to view multiple pages [08:49:42] on high latency links doing several small requests sucks [08:49:53] better to get 10k of compressed HTML in one go [08:49:54] whereas if they are anonymous, they should be served fastest html chunk [08:50:37] true true [08:50:54] * yurik is digging into varnish source... 
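For context on the figures quoted above (request rates dropping roughly 50% at five includes, per Matt Walker's benchmarking): a minimal sketch of how such a comparison could be rerun against a test Varnish with ESI enabled; the host name and the /flat.html and /esi-5.html test pages are made-up names for illustration, not WMF endpoints or Matt's actual methodology:

# Compare throughput of a flat page vs. a page assembled from five ESI fragments,
# with all fragments already in cache, using ApacheBench.
for page in flat.html esi-5.html; do
  echo "== $page =="
  ab -n 10000 -c 20 "http://varnish-test.local/$page" | grep 'Requests per second'
done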
[08:54:26] yurik: https://bugzilla.wikimedia.org/show_bug.cgi?id=32618 [08:55:31] gwicke, not sure what you proposed there [08:59:26] gwicke, ok, it seems there are no re-assembling or re-parsing (not as sure about later) [08:59:43] reading cache_esi_deliver.c [09:00:21] apparently they mostly have to deal with added complexity of zipped content including unzipped child and vice versa [09:00:45] yes, that is part of the task [09:01:12] and vary etc needs to be considered [09:01:25] for fragments as well [09:01:44] not exactly - vary is dealt with in the request handling for all requests [09:02:00] right, including subrequests [09:02:08] every time it processes a URL, it goes through all those steps again [09:02:37] only when it needs to reassemble it needs to figure out how to work with zipped content, fix CRC, etc [09:02:59] the problem is knowing when it needs to do so [09:03:40] for that it needs to check the cache status for each fragment [09:03:41] so i suspect the real bottleneck is in fixing compression - if you have main and child both zipped, it might be a bit harder to deal with them in a fast manner [09:04:07] Matt went through a lot of permutations re compression [09:04:30] true, but it is still a local search - shouldn't be that long compared with hitting backend [09:05:06] i mean - how long is processing of one request vs 2 requests if there is no network overhead [09:05:40] haha- once you hit the backend you are in a different magnitude [09:05:57] for zero none of this matters really [09:06:18] i meant - 2 requests vs 1 request only on the varnish server :) [09:06:30] of course the backend kills everything completelly (PHP be damned!!!) [09:07:11] five subrequests can be had at a 50% slowdown [09:07:37] which means that it is faster than full requests, but slower than a non-ESI page [09:08:04] this is assuming that all of those are in cache [09:08:23] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:08:56] hashar: re Cassandra, there are quite a few users outside of FB these days [09:09:17] afaik Cassandra 2 and CQL happened outside of FB [09:10:02] * gwicke waves goodbye [09:10:21] gwicke, oki, so it means we won't use varnish as wiki templates :) [09:10:33] gwicke_away, ^ :) [09:11:12] (03CR) 10Dzahn: "i see puppet is disabled here. bug or purpose?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [09:34:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 09:33:54 UTC 2013 [09:34:23] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:35:31] <6JTAABPF7> !log Put all *.wikimedia.org, wikidata, wikivoyage eqiad traffic on Varnish [09:35:48] Logged the message, Master [09:35:58] bah puppet doc generator is broken :( https://integration.wikimedia.org/ci/job/operations-puppet-doc/ [09:36:23] * hashar looks at https://gerrit.wikimedia.org/r/#/c/85840/ [09:36:35] by my wth [09:40:48] Reedy: Do you know if the software (as it was in late 2003/early 2004) also recorded timestamps in UTC? I have no reason to believe otherwise, just curious if anyone else does. [09:41:01] e.g. deleted revisions in bnwiki.archive [09:41:03] hashar: the .git stuff ? [09:41:26] mark: gwicke_away: AaronSchulz: TimStarling: ^ [09:41:53] akosiaris: yeah that is crazy. 
The git plugin is rewriting the submodules URLs [09:42:05] akosiaris: I have reopened an old bug https://bugzilla.wikimedia.org/show_bug.cgi?id=42953#c5 [09:42:05] i'm pretty sure it did by 2004, not sure about before that [09:42:18] I'm trying to unfold the exact time of the first edit to that wiki for their 10 year party. [09:42:41] OK [09:42:49] Krinkle: brion / tim might know. [09:43:05] else you ill have to dig in the old code :D [09:43:48] don't bother :) [09:43:53] it won't be the real first edit anyway [09:44:21] did we drop some history at one point? [09:44:35] I am pretty sure we are missing the history of use mod era since it did not have history [09:44:44] and some history got lost when migrating from phase2 to phase3 [09:44:46] bnwiki wasn't in UseMod afaik. [09:44:55] but bnwiki is probably old enough to have started directly on phase3 [09:45:14] phase3 or phase3. It started amonth after the first MediaWiki release [09:45:29] since we were probably using it internally for a little bit by then, I guess so. [09:46:02] I wonder why the first few edits have hostnames as the username though. I thought that was a UseMod thing. [09:46:03] e.g. flets-a-west-15-144.dsn.jp [09:46:13] Did MediaWiki ever do that? [09:46:54] Krinkle: oldest entry 27 January 2004 // https://bn.wikipedia.org/w/index.php?oldid=6&uselang=en [09:47:11] hashar: I'm way ahead of you [09:47:12] https://meta.wikimedia.org/w/index.php?title=User:Krinkle/Queries/The_Start_of_Bengali_Wikipedia [09:47:21] 2003 actually [09:47:46] HomePage was deleted [09:47:57] oh [09:48:31] hi mark, when you have a chance, take a look at the ESI email i sent earlier wrt zero -- do you think we can start testing the first and second steps this week? (I will be travelling next week and we can already start getting some data on it) [09:49:37] hashar: These are interesting as well: https://bn.wikipedia.org/wiki/User:!Popular_articles [09:49:44] There's about a dozen pages like that [09:50:00] but not created by "MediaWiki default". All by different IPs or user names [09:50:24] https://bn.wikipedia.org/wiki/Special:PrefixIndex/User:! [09:51:46] 3 of them created by Microsoft-owned IPs [09:51:54] I don't see those on other wikis. Weird. [09:52:34] Krinkle: the bengali wikipedia idea has been posted on dec 2003 : http://lists.wikimedia.org/pipermail/wikitech-l/2003-December/007320.html [09:52:50] Krinkle: I can't remember where we logged wiki creations, maybe on meta [09:53:09] "Previous message: [Wikitech-l] MediaWiki 1.1.0 release" [09:53:10] :D [09:53:13] :-D [09:53:18] That explains that [09:53:50] I'd rather not use that link since pipermail urls tend to get screwed over time [09:54:03] http://www.gossamer-threads.com/lists/wiki/wikitech/11087 [09:54:09] thx google [09:55:24] (cur | prev) 21:24, 16 July 2004‎ Angela (talk | contribs)‎ . . (40 bytes) (+40)‎ . . 
(Requests for new languages moved to Requests for new projects) (thank) [09:55:25] bah [09:55:27] copy pasted [09:55:30] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&offset=20041211234301&action=history [09:56:03] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_projects&action=history [09:56:04] :D [09:56:16] I wish we had a way to merge history of articles moved by copy pasting [09:57:10] There is [09:57:14] Delete, Move, Undelete [09:57:37] "The first subdomain created for a non-English Wikipedia was deutsche.wikipedia.com (created on 16 March 2001, 01:38 UTC),[37] followed after a few hours by Catalan.wikipedia.com" when did we even have language codes?:p [09:58:21] when I first edited in october 2002, we already had the short prefixes [09:58:28] though I did a bunch of edit in wikipedia.com [09:58:43] I quickly used wikipedia.org because it looked nicer [09:59:18] and we didn't keep redirects heh [09:59:32] .com did redirect to .org [09:59:37] at some point [10:00:02] Krinkle: sorry can't find any archive about the creation of the bengali wiki beside the mail above [10:00:22] hashar: Looks like they were on wikipedia-l in stead of wikitech-l [10:00:27] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&dir=prev&action=history [10:00:32] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&oldid=76039 [10:00:46] "request on mailing list: [ .. ]" they all point to wikipedia-l [10:00:48] Krinkle: yeah I look on wikipedia-l, could not find anything relevant. http://lists.wikimedia.org/pipermail/wikipedia-l/ [10:01:53] http://web.archive.org/web/20031225175553/http://bn.wikipedia.org/ [10:02:14] http://web.archive.org/web/20020501000000*/http://bn.wikipedia.com/ :D [10:02:39] hehe the .com did redirect to .org http://web.archive.org/web/20040124201602/http://bn.wikipedia.com/ :-D [10:02:39] Krinkle: http://web.archive.org/web/20040201230339/http://bn.wikipedia.org/wiki/Special:Recentchanges [10:03:39] ah, here [10:03:42] http://web.archive.org/web/20030816194900/http://bn.wikipedia.com/wiki.cgi?action=rc&from=1022969958 [10:03:45] http://markmail.org/message/qnv7j6cxcrtlspr4 [10:03:51] No updates since June 1, 2002 3:19 pm [10:05:28] that date is passed as query parameter though [10:05:35] so that domain probably didn't exist in 2002 [10:05:45] "?action=rc&from=1022969958" [10:05:49] Page generated August 16, 2003 12:49 pm [10:05:55] that is interesting though [10:05:57] at that point the home page said "This subdomain is reserverd for the creation of a Wikipedia in the [[Bengali]] language. " [10:06:04] yeah [10:06:28] that placeholder message was first edited in 2003-12 [10:06:29] I think [10:06:33] (the archived edit() [10:06:52] HomePage [10:07:08] :D [10:07:39] but if bn.wikipedia.com existed in 2003-08, then why did someone ask for it on wikitech-l through Jimbo in 2003-12? [10:07:47] (the mailing list http://www.gossamer-threads.com/lists/wiki/wikitech/11087 ) [10:07:49] Your User ID number: 1005 [10:08:03] in preferences in archive [10:09:33] hmm.. 
maybe subdomains with placeholders had been autocreated [10:10:10] Or the cache was copied from an empty en.wiki db [10:10:21] so the wiki probably wasn't really visible in August 2003 [10:12:28] unattended luggage found at Dusseldorf airport, evacuation, bomb squad, hours of delay..until they found out it wasn't a bomb, it was "just" full of cocaine :p [10:12:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:10] Krinkle: digging in mediawiki/core history is not that helpful :/ [10:13:24] need svn logs ? [10:13:27] Krinkle: the first revision from April 2003 already had a bn.wikipedia.com entry [10:13:47] ah [10:13:56] we probably have put some placeholders there [10:14:12] s/we/the good old timers/ [10:14:50] hashar: dnshistory.org but it's too new ..hrmm [10:17:51] 14 years ago, I should have accepted that work at Sun [10:18:03] I would probably now be able to hack Java :D [10:18:31] hashar: what revision had a bn.wikipedia.com entry? [10:18:50] d82c14fb4fbac288b42ca5918b0a72f33ecb1e69 [10:18:53] initial revision [10:18:56] by Lee Daniel Crocker [10:19:04] that is http://mediawiki.org/wiki/Special:Code/MediaWiki/1284 [10:19:05] or http://mediawiki.org/wiki/Special:Code/MediaWiki/1286 [10:19:12] which really comes from CVS [10:19:21] I am pretty sure the source forge project got deleted [10:19:27] brion might have a copy of the cvs repo [10:19:42] anyway, that is from April 2003, before bn.wiki got created anyway [10:19:42] + "bn" => "http://bn.wikipedia.com/wiki.cgi?\$1", [10:19:43] aha [10:19:50] or maybe we created them in bulks [10:19:51] Interesting [10:20:04] then opened the wiki for editing if some people got interested in the project [10:20:08] i have no clue [10:20:18] just take the date of the oldid=6 maybe? [10:20:24] on 20030813 it is still "reserved for Bengali", on 20031022 it was "Got an HTTP 302 response at crawl time" [10:21:17] I'm not choosing anything, I;ll let them decide however it will probably be 20040127164554 (revision 6), or 20031226054910 (oldest revision from archive table) [10:21:39] 20031226054910 is a revision from HomePage, the page still contained the "reserved text" but it is not the revision that introduces it. [10:21:39] ahh [10:21:43] it is a user changign that page [10:21:55] probably got created in december so [10:24:38] heh, w gotta add creation_date as claims on Wikidata for all projects [10:24:49] and source for claims :p [10:25:33] :D [10:25:41] Sheldon Cooper (Q629583) ‎ (‎Created claim: instance of (P31): Doctor of Philosophy (Q752297)) [10:26:12] and support to add references for such claims [10:27:05] +=== Sister Projects === [10:27:05] +[http://www.nupedia.com Nupedia] - [http://meta.wikipedia.org Meta-Wikipedia] - [http://sep11.wikipedia.org/ September 11 Memorial Wiki] - [http://wiktionary.org Wiktionary] - [http://wikibooks.org Wikibooks] - [http://wikiquote.org Wikiquote] - [http://sources.wikipedia.org Wikisource] [10:28:13] heh, we turned the sep11 into a redirect to archive.org at least instead of just removing it [10:28:27] nupedia is timing out [10:29:00] !log nupedia is down!!! [10:29:30] maybe log bot runs on nupedia as well. it is timing out :P [10:30:45] hehe, there's a space [10:31:17] irc must've trimmed it. 
[10:31:28] client [10:34:10] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 10:34:00 UTC 2013 [10:34:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:35:44] https://www.wikidata.org/w/index.php?title=Q52&diff=72521511&oldid=71127219 [10:43:46] (03CR) 10Mark Bergsma: [C: 032] Initial version of PROXY support for Varnish [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/81244 (owner: 10Mark Bergsma) [10:50:38] (03CR) 10Mark Bergsma: [V: 032] Initial version of PROXY support for Varnish [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/81244 (owner: 10Mark Bergsma) [10:50:52] (03CR) 10Mark Bergsma: [C: 032 V: 032] Fix PROXY bug [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/82426 (owner: 10Mark Bergsma) [10:51:02] :) [10:55:23] If the remote URL ends with /.git, a non-bare repository is assumed. [10:55:24] If the remote URL does NOT end with /.git, a bare repository is assumed. [10:55:29] I like trying to understand logic puzzles [10:59:28] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [11:04:56] !log running checksetup.pl on bugzilla, installed InlineHistory extension [11:05:09] Logged the message, Master [11:08:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:12:06] mark, I think I discovered a problem with text varnish: some paths in JS create a cookie called mediaWiki.user.sessionId for anons. it's not a problem for squid due to XVO [11:12:44] however, for varnish this should result in uncacheable requests [11:13:17] hmmm [11:13:29] currently, the offender is ULS, but the cookie is set in core so everything might call this [11:14:49] and this is currently not caught by the session/token regex [11:22:01] (03PS1) 10Mark Bergsma: Pass on requests with a mediaWiki.user.sessionId cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 [11:22:35] MaxSem: ^ [11:24:56] mmm, I wonder when that cookie gets unset when you log in [11:25:17] * MaxSem looks [11:26:50] so presumably whenever an anon changes any uls setting that cookie gets created? [11:28:02] no, for every page view [11:28:20] ugh, and it's still there after a login [11:29:01] so ^^ will not work [11:29:09] ok [11:32:27] what is the XVO header sent here? [11:32:54] Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwikiSession;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut;string-contains=mf_useformat;string-contains=stopMobileRedirect;string-contains=forceHTTPS [11:34:25] I think I'm missing something [11:34:29] why is it not a problem for Squid? [11:34:36] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 11:34:27 UTC 2013 [11:35:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:44:19] (03PS1) 10Dzahn: stats.wikimedia.org - replace webserver::apache::site / inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [11:55:19] (03CR) 10Krinkle: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [12:00:08] (03CR) 10Krinkle: "This is set my mediawiki core's mediawiki.user.js. 
Only (supposed to be) used by client side scripts. If server-side code uses it, that co" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:01:53] Krinkle: MaxSem says (as far as I understand it) that the presence of that cookie should result in uncacheable request/response. However, I don't currently see how Squid would be doing that at the moment, and I'm also not sure why it's needed... [12:02:28] (03CR) 10Mark Bergsma: "MaxSem says (as far as I understand it) that the presence of that cookie should result in uncacheable request/response. However, I don't c" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:02:45] mark: It shouldn't be needed. It is only set and read by javascript. [12:02:47] I said I'm suspicious about varnish, but that squid is ok [12:03:05] Assuming that we just preserve any such cookies [12:03:44] what do you mean exactly by "preserve" here? [12:03:49] it's client side only, right? [12:03:56] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 12:03:46 UTC 2013 [12:03:56] so as long as mediawiki doesn't remove it it should be ok? [12:04:00] Yes [12:04:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:04:33] I'm not sure what that patch set does, but if it results in regular article page views (e.g. nl.wikipedia.org//wiki/Foo) being sent to apache's instead of static cache, that seems wasteful. [12:04:34] in our varnish setup, IF a request ends up at mediawiki because it's not in cache, mediawiki receives the original cookies as sent by the client [12:05:24] so indeed, I guess this isn't needed [12:05:47] I'd also worry about scalability as that cookie is part of a generic mediawiki.js api that any extension or site script can use. [12:06:36] Krinkle, ULS sets it for every pageview [12:07:04] afaik there should be no reason for it to be unconditionally triggered on a plain page view where the user doesn't interact with non-core UI components, only when the user does something with a specific component. [12:07:04] but MaxSem: why do you think it should break caching? [12:07:09] But in theory it can be set for every user. [12:07:51] paranoia, cookie name contains "session" [12:07:55] If ULS calls it unconditionally, that seems like yet another js problem in ULS. But harmless in general though, just code quality. [12:08:42] mw.user.sessionId is used to generate a random id, and preserve it whithin the current browser session (e.g. expires when the browser closes) [12:08:53] mostly for testing and statistics. [12:09:02] a/b testing, that is. [12:09:10] ok [12:09:15] only for anonymous users [12:09:20] and ULS uses it for EventLogging (WTF?) [12:09:24] i'm going to abandon that patch now until we have strong evidence and understanding that it is needed [12:09:37] Krinkle: ahh hhhh [12:09:43] Krinkle: I have upgraded ruby-jsduck by mistake :( [12:09:52] There is most certainly some quesionable things going on in ULS around this, but I'm pretty sure it doesnt' concern server-side cachability however. 
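mark's and Krinkle's conclusion above, that mediaWiki.user.sessionId is a purely client-side cookie and should not affect cacheability, can be sanity-checked from outside; a rough sketch comparing frontend cache headers with and without the cookie (the article URL and cookie value are arbitrary):

URL='https://en.wikipedia.org/wiki/Main_Page'
# Anonymous request without the cookie:
curl -sI "$URL" | grep -iE '^(x-cache|age):'
# Same request with a fake session-id cookie; a comparable cache hit here
# supports the conclusion that the cookie does not need to break caching.
curl -sI -H 'Cookie: mediaWiki.user.sessionId=deadbeefcafe' "$URL" | grep -iE '^(x-cache|age):'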
[12:09:59] (03Abandoned) 10Mark Bergsma: Pass on requests with a mediaWiki.user.sessionId cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:10:06] (03PS2) 10Dzahn: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [12:10:15] hashar: OK, I'll keep an eye on the next jsduck-publish job and see if it breaks anything. [12:10:24] It's fine, I was going to do that tomorrow, but I'll check it out now. [12:10:29] dzahn: how about putting it behind the misc varnish cluster? [12:10:30] sorry :( [12:10:33] mutante: ^ [12:11:04] (03PS3) 10Dzahn: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [12:12:25] mark: hmm, i said i'd make the setup the same as metrics.wm because that is on the same host and conflicted and they already got their SSL certs for this. if we'd move one then probably all and also metrics.wm [12:15:14] which SSL certs are they using? [12:16:06] stats.wikimedia.org.pem and metrics.wikimedia.org.pem, so they don't have wildcards [12:16:16] good [12:16:20] we have already bought them [12:16:31] that makes it less urgent [12:16:45] but because metrics.wm and stats.wm used diffrent puppet ways to setup Apache ... [12:16:52] i offered them to just fix that first [12:16:54] right [12:23:59] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 12:33:52 UTC 2013 [12:33:59] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:36:21] (03PS1) 10Dzahn: stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 [12:55:31] (03PS1) 10Springle: upgrade db1035 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85976 [12:56:40] (03CR) 10Springle: [C: 032] upgrade db1035 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85976 (owner: 10Springle) [13:22:38] !log re-enabling disabled puppet on (fixed) mw1046 [13:22:54] Logged the message, Master [13:29:12] cmjohnson: hi [13:29:21] hi [13:30:07] on mw1046, puppet was disabled, was that the bug or did you actually disable it [13:30:22] after reinstall [13:30:29] i didn't disable it [13:30:42] that's what i guessed, so it's a bug [13:30:59] and then there is a bug in a bug, because the log message tells you to do something that doesn't work :p [13:31:16] so if the log is just full of this: " notice: Skipping run of Puppet configuration client; administratively disabled; use 'puppet Puppet configuration client --enable' to re-enable." 
[13:31:27] then it wants to say: puppet agent client --enable [13:32:14] i did that, and the bad news is now it has issues installing packages [13:32:16] okay...what about the package install issues [13:32:36] it's the PHP version again somehow [13:33:09] i wonder why it would happen on reinstall (now) [13:34:02] err: /Stage[main]/Applicationserver::Packages/Package[php5-cli]/ensure: change from 5.3.10-1ubuntu3.8 to 5.3.10-1ubuntu3.6+wmf1 failed: [13:35:14] it wants to downgrade [13:35:51] which is odd...i know i've had to manually install pkgs during initial setup [13:36:25] 5.3.10-1ubuntu3.6+wmf1 is what f.e. mw1047 has [13:37:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:39:14] PROBLEM - Apache HTTP on mw1046 is CRITICAL: Connection refused [13:39:14] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 13:39:10 UTC 2013 [13:39:24] PROBLEM - twemproxy process on mw1046 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [13:39:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:40:53] (03CR) 10Ottomata: "Ok, so! Removing a class or resource in puppet won't actually turn anything off or do anything. Puppet is declarative, and in order to '" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [13:42:14] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 13:42:11 UTC 2013 [13:42:17] hold on ..watching another run after trying things [13:42:39] (and there is also something about twemproxy service not starting yet) [13:42:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:43:24] RECOVERY - twemproxy process on mw1046 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [13:43:56] !log apt-get install php5-cli on mw1046 fixes 'ensure: change from 5.3.10-1ubuntu3.8 to 5.3.10-1ubuntu3.6+wmf1 failed' issue after reinstall [13:44:10] Logged the message, Master [13:44:14] RECOVERY - Apache HTTP on mw1046 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.079 second response time [13:44:24] cmjohnson: ii php5-cli 5.3.10-1ubuntu3.6+wmf1 ... [13:44:32] mutante did you manually start twemproxy [13:44:49] no, it worked by puppet [13:44:57] but not all things work after a single run [13:45:16] it needs 2 or 3 or so :p [13:57:57] (03PS3) 10Dzahn: adding mw1046 back to dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [13:58:32] (03CR) 10Dzahn: [C: 032] "it's back up and running" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [14:00:03] (03PS4) 10Ottomata: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:06:25] (03PS5) 10Ottomata: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:10:34] RECOVERY - Disk space on analytics1003 is OK: DISK OK [14:10:35] RECOVERY - Disk space on analytics1004 is OK: DISK OK [14:11:23] (03PS3) 10Ottomata: analytics1003 and analytics1004 now have public IPs. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/85878 [14:12:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:12:59] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 14:12:53 UTC 2013 [14:13:34] (03CR) 10Dzahn: [C: 032] stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apac [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:13:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [14:13:55] (03PS2) 10Dzahn: stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 [14:14:04] (03PS1) 10Ottomata: Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 [14:14:10] (03CR) 10Dzahn: [C: 032] stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 (owner: 10Dzahn) [14:15:47] (03PS2) 10Ottomata: Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 [14:15:54] (03CR) 10Ottomata: [C: 032 V: 032] Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 (owner: 10Ottomata) [14:18:30] PROBLEM - Disk space on analytics1003 is CRITICAL: Connection refused by host [14:18:59] PROBLEM - RAID on analytics1003 is CRITICAL: Connection refused by host [14:23:30] (03PS2) 10QChris: Turn off generating geowiki limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 [14:24:25] (03CR) 10QChris: "Thanks! Good to know." [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [14:25:27] (03CR) 10Ottomata: [C: 032 V: 032] Turn off generating geowiki limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [14:32:40] (03PS1) 10Dzahn: stats.wm - re-enable loading of Apache SSL module in metrics.wm now that we don't do that with webserver::apache::site anymore. or there is a missing dependency on Webserver::Apache::Module[ssl] [operations/puppet] - 10https://gerrit.wikimedia.org/r/85981 [14:33:43] (03CR) 10Dzahn: [C: 032] stats.wm - re-enable loading of Apache SSL module in metrics.wm now that we don't do that with webserver::apache::site anymore. or there is [operations/puppet] - 10https://gerrit.wikimedia.org/r/85981 (owner: 10Dzahn) [14:35:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 14:35:51 UTC 2013 [14:36:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:38:24] gah, why didn't someone call ? [14:40:15] LeslieCarr: what's up? [14:41:08] i guess something was broken, need to read through all scrollback [14:41:20] but someone texted, except i was asleep... [14:43:26] oh, didn't see until now. around 22:26 < TimStarling> I25/Sep/13 05:06:46 IDevice status changed to Down [14:43:27] ? 
[14:43:58] ah, tinet got saturated [14:44:04] that is the what the problem was [14:44:08] yeah, changed vrrp prio for row C for now [14:44:46] the csw-sdtpa "changing to down" is just a symptom of a broken bit in csw-sdtpa … which would require a reboot to fix, as well as some reseating/possibly swapping out power supplies [14:46:09] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 14:46:04 UTC 2013 [14:46:41] since we have two transits on cr1 maybe we want to swap one of row a/b as well ? [14:46:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [14:49:06] let's wait until I'm done with confeds ;) [14:49:12] there's no acute problem right now [14:49:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [14:50:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.894 second response time [14:55:06] (03CR) 10Steinsplitter: [C: 031] disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [14:56:53] yeah [15:02:54] (03PS1) 10Dzahn: fix Apache syntax error 'Illegal override option Nonesta' in stats.wikimedia.org/htdocs/reportcard/pediapress that prevented Apache graceful [operations/puppet] - 10https://gerrit.wikimedia.org/r/85984 [15:04:03] (03CR) 10Reedy: [C: 04-1] "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:04:20] uh [15:04:39] oh, wtf did i do [15:04:44] (03CR) 10Dzahn: [C: 032] fix Apache syntax error 'Illegal override option Nonesta' in stats.wikimedia.org/htdocs/reportcard/pediapress that prevented Apache graceful [operations/puppet] - 10https://gerrit.wikimedia.org/r/85984 (owner: 10Dzahn) [15:04:45] :) [15:06:03] LeslieCarr: I think that's a non-solution anyway [15:06:18] stuff shouldn't depend so much on whichever router happens to be the vrrp master at any time [15:06:35] yeah but it will until we split the routers into virtual routeres [15:06:41] sadly [15:06:49] stupid bgp tiebreakers [15:06:59] yup [15:07:03] we should do that [15:07:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:07:39] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 15:07:33 UTC 2013 [15:07:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:07] (03PS2) 10Aude: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 [15:08:21] (03CR) 10Aude: "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:08:51] (03PS3) 10Reedy: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:08:56] (03CR) 10Reedy: [C: 032] disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:09:08] (03Merged) 10jenkins-bot: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:10:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 27.005 second response time [15:12:21] hashar: https://stats.wikimedia.org [15:13:29] PROBLEM - 
Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:14:15] mutante: I have no idea why it is missing the favicon, but that seems to solve some RT ticket :-) kudos! [15:14:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 17.324 second response time [15:15:10] does puppet actually ever run on neon? :p [15:15:21] when I try I often get this: [15:15:21] err: Could not retrieve catalog from remote server: execution expired [15:15:24] after like 5 minutes or longer [15:21:50] !log reedy synchronized wmf-config/CommonSettings.php [15:21:58] hrm [15:22:03] that's no good [15:22:06] Logged the message, Master [15:25:29] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 15:25:27 UTC 2013 [15:25:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:49] (03CR) 10Dzahn: [C: 031] RT: allow login via LDAP [operations/puppet] - 10https://gerrit.wikimedia.org/r/80577 (owner: 10Faidon Liambotis) [15:30:25] (03PS1) 10Cmjohnson: Decommissioning sq45, removing some missed decom entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/85988 [15:34:59] (03CR) 10Lcarr: [C: 032] analytics1003 and analytics1004 now have public IPs. [operations/dns] - 10https://gerrit.wikimedia.org/r/85878 (owner: 10Ottomata) [15:35:29] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 15:35:27 UTC 2013 [15:35:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:35:55] ottomata: have you done dns changes before ? [15:36:28] LeslieCarr: we has a repo now!! \o/ [15:36:33] also, if neon is still misbehaving… you can sorta cheat to make sure it goes through ;) [15:36:41] yes! it's awesome, thanks to paravoid :) [15:37:02] and i am sure others as well [15:37:13] yes, danke! [15:37:27] yes, made dns changes before [15:37:32] how do I cheat LeslieCarr? [15:37:43] remote the icinga files for those hosts? [15:37:56] kill off other puppet processes to get the load down on stafford [15:38:03] hahahaha [15:38:13] so cheating [15:38:18] echo "notice: Finished catalog run in 565.19 seconds" >> /var/log/puppet.log [15:38:46] well you can uber cheat and put an iptables rule on, blocking all of port 8140 except for the host you want to run…. but i try to only mildly cheat [15:39:33] (03CR) 10Cmjohnson: [C: 032] Decommissioning sq45, removing some missed decom entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/85988 (owner: 10Cmjohnson) [15:39:49] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 712 MB (3% inode=90%): /var/lib/ureadahead/debugfs 712 MB (3% inode=90%): [15:45:49] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 15:45:41 UTC 2013 [15:45:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:05:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 16:05:51 UTC 2013 [16:06:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:59] PROBLEM - DPKG on cp1050 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:10:30] (03PS4) 10Ottomata: analytics1003 and analytics1004 now have public IPs. [operations/dns] - 10https://gerrit.wikimedia.org/r/85878 [16:10:38] (03CR) 10Ottomata: [C: 032 V: 032] analytics1003 and analytics1004 now have public IPs. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/85878 (owner: 10Ottomata) [16:11:25] !log authdns-update to give analytics100[34] public IPs, about to reinstall them [16:11:41] Logged the message, Master [16:11:59] RECOVERY - DPKG on cp1050 is OK: All packages OK [16:12:38] ^ the dpkg on cp1050 thing is me, fixed now [16:13:21] woot [16:13:25] upgrading to new varnish ? [16:14:49] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 16:14:46 UTC 2013 [16:15:10] LeslieCarr: can I just killall puppet on stafford? [16:15:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:55] you can… i try to only kill about 20-30 [16:17:17] killing via htop is fun, because then it's like a game…. when will the bars go down….. the world may never know [16:17:32] haha [16:17:33] haha [16:18:00] LesliaCarr: yeah, just an upgraded package to include the netmapper thing. upgrading/restarting varnish is a scary process :) [16:18:29] hahahah, this is fun LeslieCarr :) [16:18:41] :) [16:18:49] it makes me feel like an evil villain [16:18:54] i think it's all the bars on the top [16:20:15] man those bars sneak back up real fast though [16:20:18] hard to keep them at bay [16:20:21] more like tower defense or something [16:20:41] after you kill 100, you get an upgrade that lets you kill 5 per keystroke [16:20:47] ooo [16:21:39] ok if this neon puppet run doesn't go through i might just go ahead an reinstall these anyway and deal with the icinga blasts for a bit [16:21:43] cmon neon! go go go [16:22:54] * akosiaris hates etherpad.... [16:23:01] lite or otherwise... [16:23:24] ottomata: rob is done with the bastion host in ulsfo... we must start installing machines there [16:24:24] !log added a favicon.ico on stats.wikimedia.org for hashar [16:24:40] Logged the message, Master [16:29:09] coooool [16:29:31] akosiaris: is there a doc with servers that need installed, bastion info, etc? [16:29:56] :) [16:30:11] puppet tower defense.... [16:30:43] ottomata: not that i am aware off... there is RT #5828... but not much after that... LeslieCarr? [16:30:58] any idea how we get what needs to be on in ulsfo ? [16:31:11] http://en.wikipedia.org/wiki/The_Typing_of_the_Dead [16:32:10] hahaha [16:32:16] the game cabinet!haha [16:32:44] it's awesome, gaming and increase your typing speed [16:34:23] have you played it? [16:34:31] yea [16:34:44] um, ask RobH for the server stuff? i think bast4001 is all set up but not sure ? [16:34:46] on a Windows box [16:35:29] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 16:35:20 UTC 2013 [16:35:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:36:13] agg [16:36:14] err: Could not retrieve catalog from remote server: execution expired [16:36:18] looks like #5828: update dhcp files so bast4001 is the tftpdboot sever for ulsfo subnet(s) [16:36:44] damnit [16:36:48] that is the easy part..... after that ? [16:37:24] i'm going to just manually edit the icinga files....objections? [16:37:26] honestly i'm not completely sure :) install all the machines, give htem roles... [16:37:28] profit ? [16:37:38] ottomata: nope, just do a reload of icinga after you do [16:37:48] it shoud overwrite next time puppet manages to run [16:37:54] yo [16:38:07] ulsfo stuff, i know what needs to happen, i thought i put on ticket but we can chat here =] [16:38:21] which ticket ? 
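[Editor's note: the "uber cheat" LeslieCarr describes above — letting a single agent reach the overloaded puppetmaster on stafford while everything else is held off port 8140 — would look roughly like the sketch below, run on the puppetmaster itself. The allowed address is a placeholder, not taken from the log, and the rules have to be removed again afterwards.]

    # Hypothetical one-host-only window for the puppetmaster (port 8140)
    ALLOWED=203.0.113.42   # placeholder: the one agent you want to let through
    iptables -I INPUT 1 -p tcp --dport 8140 -s "$ALLOWED" -j ACCEPT
    iptables -I INPUT 2 -p tcp --dport 8140 -j REJECT
    # ...run puppet on the chosen host, then undo:
    iptables -D INPUT -p tcp --dport 8140 -s "$ALLOWED" -j ACCEPT
    iptables -D INPUT -p tcp --dport 8140 -j REJECT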
[16:38:25] akosiaris: So the bast4001 is also a tftp server [16:38:32] lemme find, i have a chain of them [16:38:56] https://rt.wikimedia.org/Ticket/Display.html?id=5828 [16:39:04] that one i got ... seems easy enough [16:39:14] parent ticket https://rt.wikimedia.org/Ticket/Display.html?id=5702 [16:39:34] there we go [16:39:35] nice [16:39:36] So yea, update the ulsfo stuff to point at bast4001 ip, then its just normal system setup (doesnt differ much else) [16:39:36] thanx [16:39:54] The lvs machines should be setup like the lvs stuff in eqiad, in terms of OS and parititions and the like [16:40:29] akosiaris: I'm not sure how familar you are with our install setup [16:40:40] So if I am not explaining enough, or too much, lemme know. [16:40:51] not at all... that was the idea... that i learn about it [16:40:56] this will help [16:40:58] https://wikitech.wikimedia.org/wiki/Server_Lifecycle [16:41:17] You'll be taking over from the 'install' step onward [16:41:21] installation even. [16:41:39] I purposefully did NOT do any preinstall work for you guys [16:41:41] normally I do [16:41:47] but I assumed you'd want to do it all to learn it [16:42:20] :-) [16:42:48] akosiaris: i have done all these steps before so we can do them together, i'm resinstalling 2 analytics machines atm [16:43:17] ottomata: cool. when do we start ? [16:43:28] yea and i'm happy to help too, just ask any questions you guys come across [16:43:42] I'd advise setting up the varnish systems first, as they will be easier [16:43:47] then lvs [16:43:57] and afaik varnish now uses internal IPs [16:44:11] (no ips are assigned, but there is a puppet repo for that now thanks to paravoid ;) [16:44:17] sorry, not puppet repo [16:44:19] git repo. [16:44:27] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 16:44:19 UTC 2013 [16:44:38] so its a lot easier than it used to be (imo) [16:44:47] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:44:55] akosiaris, i also have a standup coming up, so i might be busy for the next hourish [16:44:55] so we also get to assign IPs and create the DNS entries :-) [16:45:02] go ahead and start and RobH and I can chime in with help [16:45:02] yep [16:45:11] i didnt assign anything [16:45:18] except the bast4001 stuff [16:45:34] ja so, akosiaris, maybe first check out the operations/dns repo [16:45:38] and look at the files there, to see where to add stuff [16:45:39] so once dhcp is updated to point to bast4001 , then you can do installs. the dhcpd files in puppet repo need update [16:45:39] oh [16:45:53] you can do dns, or update dhcp, but both happen before install =] [16:46:01] (so order is immaterial) [16:46:30] oh hm [16:46:34] ok i don't know how to do that [16:46:55] bast4001 is already puppetized, just needs dhcp stuff? 
[16:47:03] nono [16:47:08] sorry, not explaining [16:47:12] So dhcpd still runs on brewster [16:47:14] across the link [16:47:29] so in puppet://files/dhcpd/dhcpd.conf needs update [16:47:30] ok i am trying to debug freaking etherpad right now, but i will read up on all the material and get start tomorrow early (it is getting almost 20:00 here) [16:47:33] for the ulsfo subnet section [16:47:42] akosiaris: Ahh, yea thats no problem man [16:47:45] ok cool, akosiaris sounds good, I probably have a buncha stuff to do today too [16:47:56] cmjohnson also offered to help [16:48:11] cmjohnson: i think akosiaris and i want to do this initial stuff to get a feel for stuff [16:48:26] so we might wait til tomorrowish to do that, but after we've at least installed a few ourselves, we're happy for more help [16:48:47] yea i didnt ask anyone else to touch it, since i knew you two wanted them [16:51:28] RobH, i'm still confused about the neede dhcpd changes, but i'll ask again when I'm ready to do some stuff [16:51:53] !log install a few package upgrades on iron [16:52:06] Logged the message, Master [16:52:08] !log taking down analytics100[34] for reinstall with public IPs [16:52:15] ottomata: i think i know that needs to be done there [16:52:20] Logged the message, Master [16:52:20] ok cool [16:52:30] cmon Ciscos! I know you don't don't like to do what you are told, but you can do it! [16:52:34] ottomata: if you open the dhcpd.conf file in files/dhcpd/ [16:52:49] ja looking [16:52:50] you change the next-server 208.80.154.10; # carbon (tftp server) [16:53:02] OH [16:53:11] notice for tampa, it points to brewster [16:53:19] so for ulsfo, point it to bast4001 [16:53:23] ok okok, setting up dchpd for ulsfo and having tftp point at bast4001 [16:53:24] ok cool [16:53:28] so TFTP is very, very bad over latency links [16:53:30] got it [16:53:31] ok cool [16:53:32] Setting up smbclient .. samba-common <-- when i see Samba upgrades i always wonder if we want to purge it in general [16:53:47] so we only do the single run for the tftpd server per site over link [16:53:53] and then each site gets own tftp server [16:54:13] RobH, do we need sections for the private internal subnets too? [16:54:18] i just see ulsfo public right now [16:54:29] ottomata: we do indeed, which I intentionally did NOT do! [16:54:38] i just forgot to mention it ;P [16:54:41] great, got it. [16:54:52] but you can get the ip info for that subnet off the dns files =] [16:54:55] those nets are listed somewhere I will find when I start looking? [16:54:58] great ja danke [16:55:13] I assumed it was better to give you all the sources, rather than just tell you [16:55:15] =] [16:58:23] ja thanks [17:00:07] ahhhhh ciscos never pxe boot when I tell them [17:00:08] yargghh [17:04:32] ciscos never do anything when folks tell them [17:04:47] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 17:04:44 UTC 2013 [17:04:49] mark, hi, any thoughts on ESI deployment? [17:05:17] yurik: nope, didn't have time yet today, I'll look at your mail(s) tomorrow [17:05:47] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:12:57] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 17:12:48 UTC 2013 [17:13:47] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [17:15:19] hmm, [17:15:20] Sep 25 17:15:05 brewster dhcpd: DHCPDISCOVER from 88:43:e1:c2:99:48 via 10.64.21.3: network 10.64.21/24: no free leases [17:15:31] LeslieCarr: ? 
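[Editor's note: the files/dhcpd/dhcpd.conf edit RobH walks through above (around 16:52) amounts to giving each ulsfo subnet stanza a next-server that points at bast4001, so PXE/TFTP stays local to the site instead of crossing the link to brewster/carbon. A sketch is below; the subnet, gateway and bast4001 addresses are placeholders — the real values come from the operations/dns repo. The private ulsfo subnets mentioned above would each get a matching stanza.]

    # Hypothetical ulsfo stanza in files/dhcpd/dhcpd.conf (ISC dhcpd syntax)
    subnet 198.51.100.0 netmask 255.255.255.0 {   # placeholder ulsfo subnet
        authoritative;
        option subnet-mask 255.255.255.0;
        option routers 198.51.100.1;              # placeholder gateway
        next-server 198.51.100.5;                 # bast4001, the local TFTP server for ulsfo
    }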
[17:15:46] that is analytics1003 trying to get dhcp [17:15:55] Reboot and Select proper Boot device [17:15:56] or Insert Boot Media in selected Boot device and press a key [17:16:19] hmm, maybe wrong network? [17:16:31] hrm [17:16:33] oh i know [17:16:38] didn't switch the vlans [17:17:04] oh hm [17:17:06] that is on the switch? [17:17:23] yep, getting that now [17:23:17] ok [17:23:19] sayonara [17:23:24] i'm tired ;) [17:23:57] au revoir [17:27:49] AaronSchulz: https://www.mediawiki.org/wiki/User:GWicke/Notes/Storage#Cassandra_compression [17:28:39] AaronSchulz: also found out that Cassandra actually orders by timestamp despite presenting a type 1 UUID externally [17:28:52] so no need to mess with that [17:30:07] RECOVERY - DPKG on cp1062 is OK: All packages OK [17:30:32] oh i fixed the vlans [17:30:37] sorry, forgot to say [17:30:56] oh danke [17:31:03] pushing a key to continue [17:31:24] gwicke: nice [17:31:47] gwicke, you're seriously considering Cassandra? [17:32:53] cool, doing [17:32:56] yeah i wanted to ask about that too [17:33:04] cassandra who wha? i want who wha? [17:33:10] gwicke, for external storage? [17:33:35] https://www.mediawiki.org/wiki/Extension:Cassandra ;) [17:33:47] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 17:33:45 UTC 2013 [17:34:01] I did that 3 years ago;) [17:34:47] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:36:56] gwicke, but now I want to do ES via DataStore and just have a generic DS class for Cassie:) [17:37:16] OR MONGO, IT'S WEBSCALE! [17:37:55] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [17:42:45] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 17:42:43 UTC 2013 [17:44:04] but is it cloudscale MaxSem ? [17:45:24] LeslieCarr, WEB CONSISTS OF CLOUDS SO YES! [17:52:09] MaxSem: we are considering to use it for Parsoid HTML/JSON etc initially, and if that works well we could use it for wikitext too [17:56:40] MaxSem: we'll create a REST storage API for it, maybe that could be interesting as a backend for your key/value storage proposal [17:57:15] gwicke, so you're going to abstract away Thrift? [17:57:26] we are using CQL [17:57:47] but yes, both Cassandra and CQL/Thrift will be hidden [17:58:11] meh, I last looked at it 3 years ago:) [17:58:40] there are some nice new features in cassandra 2 [17:59:09] http://www.datastax.com/documentation/cql/3.1/webhelp/index.html [18:00:20] manybubbles/^d: did you see the new ES official client apis? [18:00:46] paravoid: I don't believe so. let me look [18:01:20] http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/#php [18:03:50] cool! this isn't great timing because we just made an elastica mediawiki module for other folks to use. [18:04:24] I'll certainly investigate them though. [18:05:22] heh [18:05:57] <^d> Whoops. [18:05:59] I can imagine [18:06:09] <^d> Was going to say the same thing as manybubbles. We'll definitely take a look. [18:06:41] ^d and paravoid: they are pretty thin wrappers, it looks like. [18:06:50] <^d> I imagine so. [18:06:56] all I can think of when I see elastica: http://www.youtube.com/watch?v=ilKcXIFi-Rc [18:07:13] <^d> Having one in python is kinda nice though. Would make it easy to write like wmf-specific maintenance scripts. [18:07:29] <^d> Like "delete an index" or somesuch stuff.
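[Editor's note: the "delete an index" maintenance task ^d mentions just above really is a one-liner against Elasticsearch's REST API, with or without the new official clients; the host and index name below are made up for illustration.]

    # Hypothetical example: drop a search index over the REST API
    curl -XDELETE 'http://localhost:9200/some_wiki_index'

The official Python client wraps the same call (roughly Elasticsearch(...).indices.delete(index='some_wiki_index')), mostly adding connection handling and failover on top of the raw HTTP API.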
[18:08:17] ^d: curl and shell is enough for some of that, but yeah. There has always been a python client. they just support this one. [18:09:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:09:53] ^d: it looks like the new perl client is by the same guy that made the old one. I've seen lots of good things about the old perl client. [18:10:28] I think none of them are new [18:10:34] they're just blessing them now [18:10:57] paravoid: the perl one is actually a rewrite, it says [18:11:03] ah [18:11:08] okay, I should clearly read more carefully [18:11:54] paravoid: that one has a license change though. GPL/Artistic to Apache [18:15:22] manybubbles, since I haven't started poking at elastica, you may as well switch to something else:) [18:16:16] errrrg, third-party dependencies [18:16:17] MaxSem: I'll add it to my list! Let me know when you start looking at stuff. You may as well start with the official client. [18:16:47] WTF is Monolog? [18:17:00] and Pimple... [18:17:44] we don't need composer everywhere... [18:20:34] hmm, MaxSem - any idea when geodata search will be available on wikidata? [18:20:38] or if they're even working at it? [18:20:47] or will Extension:GeoData need to provide it for the foreseeable future? [18:21:13] YuviPanda|train, no timeline afaik. I'm waiting for them [18:21:22] MaxSem: but is it at least on the cards? [18:21:53] ask them!:) [18:26:46] MaxSem: booorrring :P [18:27:32] * MaxSem sends YuviPanda to the Barrels o' Fun D2 level:P [18:29:44] YuviPanda: sounds like you should be talking to aude [18:29:54] just curious, jeremyb. [18:30:04] aude: ! [18:34:04] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 18:34:01 UTC 2013 [18:34:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:37:33] (03PS1) 10Cmjohnson: adding 1 more file for decom of sq45 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86009 [18:38:10] (03CR) 10Cmjohnson: [C: 032] adding 1 more file for decom of sq45 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86009 (owner: 10Cmjohnson) [18:41:04] hmm, mgmt ips for analytics1003 and 1004 don't resolve [18:41:13] they're still up, I can ssh to the IPs [18:41:23] but the mgmt addresses don't resolve anymore [18:41:25] hm [18:45:22] * hexmode grumbles about mailman [18:46:03] who can help me replace a listadmin pwd? [18:47:47] (03PS1) 10Jdlrobson: Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86011 [18:47:47] hrm... nm?? I think it worked? [18:48:36] oh, no. [18:48:39] ugh. [18:54:33] hexmode: ops+philippe [18:54:54] hexmode: can you create a rt ticket? [18:55:13] cmjohnson: yes... [18:56:05] ottomata: check your forward dns entries [18:57:01] ? [18:57:20] check to make sure the mgmt entries are in wmnet [18:57:38] oh yeah, they are in the configs ,just not resolving [18:57:59] want me to take a look? 
[18:58:10] sure [18:58:23] dig -t any analytics1003.mgmt.eqiad.wmnet has nothing [18:58:33] ost analytics1003.mgmt.eqiad.wmnet [18:58:33] Host analytics1003.mgmt.eqiad.wmnet not found: 3(NXDOMAIN) [18:58:55] yeah i see that [19:02:15] https://rt.wikimedia.org/SelfService/Display.html?id=5837 [19:05:21] ottomata: the entries for an1003/4.mgmt are missing in wmnet [19:05:31] i can fix since I am there now or you can do ...whatever [19:05:51] bwerrrr [19:06:45] ok thanks, i must have taken them out, sorry, i saw that the mgmt entries were in the .arpa file, just assumed they were where they were supposed to be [19:06:53] i must have taken them out when I was removing the regualr .eqiad addies [19:06:56] thanks cmjohnson [19:07:18] (03PS1) 10Cmjohnson: adding forward dns entries for analytics1003/4 mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/86013 [19:07:38] yeah..the 10.65.x.x are mgmt in eqiad just leave those alone [19:08:24] ottomata: plz review https://gerrit.wikimedia.org/r/#/c/86013/ [19:09:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:10:03] cmjohnson: that file looks like it uses tabs instead of spaces [19:10:05] aside from that lgtm [19:10:17] oh also [19:10:25] the WMF… entry [19:10:38] not really sure what those are [19:11:44] those are the asset tags associated with the server. when they're first racked sometimes we don't know the names yet but we need mgmt ips so we name them after the asset tag [19:11:50] nothing critical there [19:11:59] and they are spaces...looks right in vim [19:12:11] (03CR) 10Cmjohnson: [C: 032] adding forward dns entries for analytics1003/4 mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/86013 (owner: 10Cmjohnson) [19:13:26] !log authdns update [19:13:42] Logged the message, Master [19:14:12] omg ciscos [19:14:18] why are they so annoying [19:14:30] hahaha...they're irritating [19:14:37] just rebooting is a 10 minute process [19:14:52] yeah, and the pxe boot order never seems to work consistently [19:14:54] have to catch it [19:16:32] i thought i reinstalled these already aggghhhh, ok one more try [19:19:24] yea the boot order is annoying as the one time options seem to not work [19:19:34] plus there is no manual 'skip memory test' option [19:19:51] which would eliminate 11 of the 15 min post. [19:33:27] (03PS1) 10Ryan Lane: Pull modules/returners/pillars when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [19:33:44] oh no, i am being pinged :) [19:34:50] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 19:34:42 UTC 2013 [19:35:27] YuviPanda: MaxSem jeremyb manybubbles maybe geodata / elastic / wikidata stuff is something we can poke at when i come to SF in october [19:35:44] ping!! [19:35:47] i figured out how to make wikibase and the geodata extension / solr work together [19:35:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:36:03] i don't think elastic / wikidata would be terribly diffitul [19:36:06] difficult [19:36:21] cool. I'm not planning on being in SF in October, but I'd love to work with you [19:36:52] manybubbles: awww, well ^d is around and it might be something i can focus on then [19:37:00] yeah! [19:37:10] manybubbles: where are you normally? 
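[Editor's note: the missing records fixed above in https://gerrit.wikimedia.org/r/86013 are plain forward A records in the wmnet zone for the eqiad management network (10.65.x.x). A sketch of what such entries look like follows; the final octets are placeholders, since the real addresses are not quoted in this log.]

    ; Hypothetical forward entries in the wmnet zone file (BIND syntax)
    $ORIGIN mgmt.eqiad.wmnet.
    analytics1003   1H  IN A    10.65.2.13   ; placeholder address
    analytics1004   1H  IN A    10.65.2.14   ; placeholder address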
[19:37:13] i need to get up to speed on elastic, install it play with it [19:37:18] Raleigh, North Carolina [19:37:19] <^d> aude: Yes :) [19:37:22] oh, right [19:37:34] desparate need for wikidata, generally, also to improve our search! [19:38:19] (03PS2) 10Ryan Lane: Pull modules/returners/pillars when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [19:39:10] (03PS1) 10Ottomata: Removing analytics100[34] from decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/86017 [19:39:16] manybubbles: and i am impressed that foursquare uses elastic search [19:39:21] so it must be good for geo [19:39:23] (03CR) 10Ottomata: [C: 032 V: 032] Removing analytics100[34] from decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/86017 (owner: 10Ottomata) [19:43:22] \o/ I think I just eliminated the last manual step for deployment targets in git-deploy :) [19:43:57] Ryan_Lane: what's the timeline for git-deploy? [19:44:01] I guess I should also make them do a deployment [19:44:04] when might we try it again? [19:44:18] aude: platform is targeting next quarter [19:44:21] oooh [19:44:22] ok [19:44:25] or is it now this quarter? [19:44:40] maybe on tuesday? [19:44:46] or whenever october starts [19:44:54] * Ryan_Lane nods [19:45:28] I'm not fully working on it. just pushing in things to improve the system when I have time [19:45:42] ok, that's fine [19:46:11] it's just something i should try to see what it does / how it works [19:51:48] hexmode: i enabled your mail subscription and sent you your password to MediaWiki-distributors [19:52:33] cmjohnson: that isn't the pw for /admindb/ :( [19:52:49] which is what I was looking for [19:53:25] i know but I don't see you listed as an admin...I need to check with someone else [19:53:32] I get emails almost every day for spammy senders to the list, but I cannot defer/discard them [19:53:47] cmjohnson: let me see what email is admin [19:56:18] hexmode: i found you. [19:56:30] :) [20:02:44] (03PS3) 10Ryan Lane: Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [20:03:58] (03PS4) 10Ryan Lane: Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [20:06:00] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 20:05:59 UTC 2013 [20:06:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:11] greg-g: I'm I good to deploy CentralAuth now? I'm assuming manybubbles is done with the Cirrus deploy [20:08:48] csteipp: I had a Cirrus deploy? [20:09:16] "CirrusSearch to closed wikis" 11-12p (pdt) [20:09:41] Although I think chad said you guys weren't doing more of those... so maybe it was canceled? [20:11:45] csteipp: we did that yesterday I think [20:13:14] manybubbles: Cool. In that case I'll just deploy :) [20:18:56] (03CR) 10Ryan Lane: [C: 032] Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 (owner: 10Ryan Lane) [20:28:49] akosiaris: mind if I add a new kafka .deb to apt? 
[20:28:52] !log csteipp synchronized php-1.22wmf18/extensions/CentralAuth 'update to master for SUL fix' [20:29:03] Logged the message, Master [20:29:06] i'm building it now with an udpated version num [20:31:43] !log csteipp synchronized php-1.22wmf17/extensions/CentralAuth 'update to master for SUL fix' [20:31:54] Logged the message, Master [20:33:50] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 20:33:47 UTC 2013 [20:34:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:32] ottomata: feel free [20:39:36] So… resource is defined like this: gitclone { 'integration/zuul': [20:39:43] (where 'gitclone' is a definition, not a class) [20:39:56] When referred to later, like this: subscribe => gitclone['integration/zuul'], [20:40:15] should it be capitalized like subscribe => Gitclone['integration/zuul'] ? [20:40:24] Ryan_Lane, any idea? Anyone else? [20:40:46] <^d> Does gitclone suck less than git::clone? [20:41:07] yes [20:41:11] Gitclone [20:41:23] and if it was something that was namespaced [20:41:25] Git::Clone [20:42:28] ^d: gitclone is tentatively a copy/paste of git::clone into a module [20:42:45] ottomata, so, doesn't matter that it's a define and not a class? [20:43:00] nope, you can treat a define basically like a custom resource [20:43:09] so if you were doing a file [20:43:12] ^d: If you want us to use vcsrepo instead, then… join the fray https://gerrit.wikimedia.org/r/#/c/74099/ [20:43:15] subscribe => File['blabla'] [20:43:18] same as [20:43:19] ottomata, ok, cool. Thanks. [20:43:25] subscribe => My::Define['woohoo'] [20:44:32] andrewbogott: what module? [20:44:42] um… 'gitclone' [20:44:58] i don't see it, [20:45:06] That's because I'm making it now [20:45:11] oh ha [20:45:12] ok [20:45:23] faidon will say no to using a define in a module's init.pp, btw [20:45:51] really? [20:46:04] The alternative is to have a 'git' module that has no init at all. [20:46:08] That's preferred you think? [20:46:20] http://docs.puppetlabs.com/puppet/2.7/reference/modules_fundamentals.html#example [20:46:23] yeah that's fine [20:46:32] i do that for the cdh4 module [20:46:55] Hm… I guess that makes for less renaming anyway. [20:46:59] yeah! [20:47:05] Feels weird, but, *shrug* [20:47:23] also a bit nice in case we want to have more git module features, buuuut then again, i betcha there are a billion git puppet modules already out there [20:47:23] but ja [20:47:25] whaevvva [20:47:28] * andrewbogott git reset --hards [20:47:36] dear.lord. [20:48:03] ok running home for some more workey time, back in a bit [20:48:21] twkozlowski ? [20:48:36] (03CR) 10Odder: [C: 04-1] "Not the right way to do this, and you have no way to ensure this gets reverted on October 31." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 (owner: 10TTO) [20:48:48] andrewbogott: ^^ [20:49:18] hm [21:00:04] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [21:00:36] kback [21:19:24] heya paravoid hi? [21:19:32] looking into using ferm module to firewall off public kafkas [21:19:50] i think i want to use R_SERVICE, should I add a define::ferm::r_service? [21:20:16] or maybe just an extra arg to ferm::service [21:20:28] and use r_service if $srange is defined [21:43:54] Krinkle screen re-attached? [21:44:22] I'm not krinkle|detached, so yes :). that nickname is automatically changed when I disconnect from the bouncer. 
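[Editor's note: the capitalization rule ottomata spells out above — declare a define with its lowercase name, reference it with the name capitalized, and capitalize every namespace segment — in a self-contained Puppet sketch. The parameter and service names are invented for illustration, not the real integration/zuul manifests.]

    # Declaring uses the lowercase define name:
    gitclone { 'integration/zuul':
        directory => '/srv/zuul',    # hypothetical parameter
    }

    # Referring to it elsewhere capitalizes the define name:
    service { 'zuul':
        ensure    => running,
        subscribe => Gitclone['integration/zuul'],
    }

    # If the define were namespaced as git::clone, the reference would be
    # Git::Clone['integration/zuul'] -- every segment capitalized.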
[21:45:08] ah [21:45:16] so I was thinking [21:45:22] perhaps we can semi-revive the thing [21:45:32] we could reserve ranges for projects [21:46:04] PPP -> XYZ each 2 units of X can refer to a project [21:46:31] thats 2592 projects per family which should be more than enough [21:46:56] (03PS1) 10Andrew Bogott: Moved git::clone into a new, skeletal 'git' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 [21:46:57] (03PS1) 10Andrew Bogott: Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 [21:46:58] (03PS1) 10Andrew Bogott: Move and rename the (currently unused) gitconfig module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 [21:47:47] ToAruShiroiNeko: Why bother though? [21:48:05] so 2xx and 3xx would be wikipedia etc [21:48:11] well it has the advantage of being readable [21:48:20] you can tell you are going to wikipedia [21:48:29] It's going to be base36 int, so not very readable [21:48:42] its code [21:48:50] people can tell they are going to commons [21:48:58] hence a possible penis file [21:49:00] We can't make the PPP or XYZ deterministic either, since some subdomains are not real language codes. So we'll need a map of each individual subdomain anyway. [21:49:18] subdomains? [21:49:25] you mean en in en.wikipedia [21:49:27] commons, en, nl, meta [21:49:28] right? [21:49:30] Yes [21:49:31] No [21:50:01] commons is not a family, so that wouldn't have a recognizable prefix unless you know its second and third digit. [21:50:03] see I would group commons, meta into one 1296 group [21:50:13] its not a family in the real sense [21:50:14] which you'll have to know in both PPP and FPP (family, pp) [21:50:35] yes, family=special (special.dblist is a database list of *.wikimedia.org list that aren't chapterwikis) [21:50:49] I would group chapter wikis too [21:50:51] + mediawiki.org I think [21:50:52] say Cxx [21:50:58] C for chapter [21:51:07] yes, they are already grouped as families, this is not something we'd invent [21:51:11] Bxx for background [21:51:16] what? [21:51:51] commons and meta would be in family "special", which means to know that its commons you'll need to know the 2 digits after the family id. [21:52:16] Which is the case in both FPP and PPP, so no advantage there. [21:52:32] Also, commons shadows through local wikis, so you can link to any file on commons through any domain name [21:52:48] zh.wiktionary.org/wiki/File:omg works likewise [21:53:21] What does "background" mean? [21:53:47] Krinkle well [21:53:56] background like mediawiki wiki [21:53:58] bugzilla [21:53:59] etc [21:54:32] I suppose you are right but I think grouping would make things more orderly [21:54:45] there is already a family grouping in place in wikimedia configuration, so we don't need to create a new one. [21:54:48] its a simple task while generating the map [21:54:55] but we can use that, yes. [21:55:17] I especially want to classify chapter wikis differently for example [21:55:29] I want to avoid giving en wikipedia 001 designation as well [21:55:44] 00 range perhaps shouldnt be used [21:55:52] we can start form 100 [21:56:23] we can do clever short cuts [21:56:42] like COMPPPPPP for commons [21:57:00] or NEWPPPPPP for wiki news [21:57:09] er [21:57:23] NXXPPP [21:57:27] N for news [21:57:34] WXXPPP for wikipedia [21:57:51] (03PS2) 10Andrew Bogott: Moved git::clone into a new, skeletal 'git' module. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 [21:57:52] (03PS2) 10Andrew Bogott: Move and rename the (currently unused) gitconfig module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 [21:57:53] (03PS2) 10Andrew Bogott: Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 [21:58:03] it goes back a little to your plan [21:58:18] Krinkle it being 25 characters or less is the objective for QR [21:58:29] if we have an extra char we can use it [21:59:34] (03CR) 10Andrew Bogott: "Hashar, I don't see this used anyplace -- can I move it with impunity, or are there labs instances that depend on the old name?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 (owner: 10Andrew Bogott) [22:01:10] (03CR) 10Andrew Bogott: "Leslie and Chad, added you as reviewers because this patch murders a class that y'all have worked on. If this needs preserving please spe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:02:03] ToAruShiroiNeko: so if zh.wikipedia.org is family=wikipedia, #99, it'd be w2rXXXXX (where base_convert(99, 10, 36) == '2r') [22:02:10] is that what you mean? [22:02:15] (03CR) 10Andrew Bogott: "A more Ori-friendly approach to this is in the patchset beginning here: https://gerrit.wikimedia.org/r/#/c/86033/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/74099 (owner: 10Andrew Bogott) [22:02:20] Krinkle yeah [22:02:36] I can immediately tell I am going to A wikipedia [22:02:39] murderer!!!!!! [22:03:05] When did jenkins-bot start +2ing things? [22:03:53] * andrewbogott briefly looks up at LeslieCarr, resumes murdering [22:04:17] andrewbogott: oh my god [22:04:23] tell me that's not soooooo much better [22:04:24] ? [22:04:27] ToAruShiroiNeko: For any special wiki that has a full 3-character short key, that means the first letter of that is no longer available as a family code [22:04:37] e.g. if commons gets 'COM', we can't use 'C' for chapterwikis [22:04:43] andrewbogott: i'm just happy with it, is all :P [22:04:50] ori-l, hard to argue apart from imaginary future use conditions :) [22:04:53] or shouldn't (we can program it to skip 'OM', but that would be confusing for reading the url as well) [22:05:25] (03CR) 10Lcarr: [C: 032] Moved git::clone into a new, skeletal 'git' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 (owner: 10Andrew Bogott) [22:05:36] this patch does the job in 106 lines, other patch was 2302(!) [22:05:36] Krinkle right [22:05:44] so maybe SCO [22:05:48] Special - Commons [22:06:01] point is commons shouldnt be some random string [22:06:10] SMW - for mediawiki wiki [22:06:19] or semantic mediawiki :P [22:06:22] or maybe TMW for Techinal - Mediawiki [22:06:27] (03CR) 10Chad: [C: 031] Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:06:33] (03CR) 10Lcarr: [C: 032] "despite the fact that you're murdering the (horrible) generic-definitions.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:06:44] TMZ should probably redirect to WMF wiki :p [22:07:44] (03CR) 10Lcarr: [C: 031] "I like the new name" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 (owner: 10Andrew Bogott) [22:08:41] Krinkle the community probably should decide what goes to where [22:09:08] ToAruShiroiNeko: I'd recommend not spending too much time on the implementation of the PPP (or FPP). 
The RFC is still quite elaborate, and we might go for full url shortening instead. [22:09:36] true but this is something to consider during implementation phase [22:09:52] with 46,656 possibilities we dont have to be THAT careful in wasting slots [22:10:52] otherwise all redirects will start with a 0 or a 1 [22:10:59] we dont have 1296 wikis [22:11:40] Krinkle mind that the code on wiki should only generate the PPPPPP part [22:11:58] err [22:12:00] CCCCCC part [22:12:13] ammending PPP in front which would be same for the entire wiki [22:12:33] no point in including it on the code table as it would be 3 always same characters [22:13:26] 'the code' is likely a mediawiki extension written in php, generic for all wikis at once, not per wiki. It would be reading in the wikimedia cluster configuration, getting the current wikis's prefix and the current page's shorturl pageid base36 it and lastly the short domainname itself, and generate the url. [22:13:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:05] nothing hardcoded anywhere [22:24:58] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [22:32:31] (03CR) 10Ryan Lane: [C: 032] Remove stupid package I don't need [operations/puppet] - 10https://gerrit.wikimedia.org/r/85240 (owner: 10Chad) [22:35:58] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 22:35:53 UTC 2013 [22:36:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:31] (03PS2) 10Ryan Lane: Add 'gdash' to git-deploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/84193 (owner: 10Ori.livneh) [22:44:36] (03CR) 10Ryan Lane: [C: 032] Add 'gdash' to git-deploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/84193 (owner: 10Ori.livneh) [22:46:52] (03CR) 10Ryan Lane: "Any update on this? It's sitting in my review queue." [operations/puppet] - 10https://gerrit.wikimedia.org/r/62336 (owner: 10QChris) [23:04:58] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 23:04:57 UTC 2013 [23:05:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [23:09:56] (03PS1) 10Ori.livneh: Applying misc::graphite::gdash makes host a gdash deployment target [operations/puppet] - 10https://gerrit.wikimedia.org/r/86050 [23:10:02] Ryan_Lane: ^ [23:10:19] Krinkle we can do with just 2 digits then [23:10:34] we have a little over 800 wikis after all [23:10:38] !log krinkle synchronized php-1.22wmf18/extensions/TemplateData 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:10:46] I think we should mproperly take advantage of the 3 digits :) [23:10:50] Logged the message, Master [23:11:12] I could create you a map if you like [23:11:23] ToAruShiroiNeko: It doesn't seem improbably that we'll never have more than 36*36 wikis though. We can't easily extend this [23:11:28] no need for a map now [23:11:36] mop then? 
:p [23:11:55] you seen my idea above W, for wikipedia N for news maybe D for wiktionary [23:12:11] B for books [23:12:17] yes, FPP [23:12:26] !log krinkle synchronized php-1.22wmf18/extensions/VisualEditor 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:12:36] C for chapters, what's tricky is projects like commons [23:12:39] Logged the message, Master [23:17:44] !log krinkle synchronized php-1.22wmf17/extensions/VisualEditor 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:17:58] Logged the message, Master [23:18:51] Ryan_Lane: Cool commit summary bro https://gerrit.wikimedia.org/r/85240 definitely the most descriptive commit message of all time ;) I'm confident we will never see another commit with that commit message ever again, it's so unique. I know exactly what that commit does just from reading the message [23:18:53] [23:19:10] (Sorry, our team is really anal about enforcing descriptive commit message so this one cracked us up) [23:21:26] !log krinkle Started syncing Wikimedia installation... : [23:21:37] Logged the message, Master [23:21:53] RoanKattouw: that was not cool [23:22:10] also, it wasn't even Ryan's changeset [23:22:35] Ryan approved it [23:23:14] that is not a reason at all to be mean and snarky [23:23:33] It definitely deserves a place on our virtual commit message wall though [23:24:26] Hm.. scap has gotten a lot more verbose in its output (tells me about each sync). I hope failed syncs still stand out though? [23:24:52] e.g. dead servers that we never unlist for various reasons [23:25:09] nope, it doesn't have them stand out or aggregate them at the end [23:28:23] Krinkle: No, scap is terrible. [23:28:54] Krinkle: Hence the demand to fix things. [23:29:30] it wasn't like this always. Less than 1 or 2 months ago it was just idle in output during the main sync phase, except for servers it couldn't sync to [23:32:10] !log krinkle Finished syncing Wikimedia installation... : [23:32:21] Logged the message, Master [23:34:48] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 23:34:40 UTC 2013 [23:35:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [23:41:04] LeslieCarr: Oh whoops yeah that was Chad's change. I'm sorry that was overly snarky, I meant that as poking fun but I didn't do that very well [23:53:20] scap is not terrible
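[Editor's note: a worked sketch of the family + base36 short-key idea discussed earlier in this log ("w" for a wikipedia, wiki #99 encoding to "2r", page id in base36). It assumes PHP, as the conversation does; the family letter, padding width and wiki number here are illustrative assumptions, not an agreed mapping.]

    <?php
    // Hypothetical FPP-style encoder: family letter + 2-char base36 wiki number
    // + base36 page id.
    function shortKey( $familyLetter, $wikiNumber, $pageId ) {
        return $familyLetter
            . str_pad( base_convert( $wikiNumber, 10, 36 ), 2, '0', STR_PAD_LEFT )
            . base_convert( $pageId, 10, 36 );
    }

    echo shortKey( 'w', 99, 1234567 );  // "w2rqglj" -- 99 encodes to "2r", as quoted above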