[00:02:17] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.038 second response time
[00:03:27] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[00:04:04] ^ that's me, again.
[00:04:04] (CR) Catrope: [C: 2] Make VisualEditor opt-out on Portuguese Wikibooks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115213 (owner: Jforrester)
[00:04:14] (Merged) jenkins-bot: Make VisualEditor opt-out on Portuguese Wikibooks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115213 (owner: Jforrester)
[00:04:17] (probably!)
[00:05:21] (PS2) Catrope: Enable VE in the "Recherche:" (104) namespace for frwikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115299 (owner: Jforrester)
[00:05:27] (just minor spikes from varnish restarts, I'm going very slow this time, but it is what it is)
[00:05:39] (CR) Catrope: [C: 2] Enable VE in the "Recherche:" (104) namespace for frwikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115299 (owner: Jforrester)
[00:05:46] (Merged) jenkins-bot: Enable VE in the "Recherche:" (104) namespace for frwikiversity [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115299 (owner: Jforrester)
[00:05:57] (CR) Catrope: [C: 2] Fix popup video size by ordering transcode settings properly [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115094 (owner: Brion VIBBER)
[00:06:02] ^d: Is Reedy also away?
[00:06:04] (Merged) jenkins-bot: Fix popup video size by ordering transcode settings properly [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115094 (owner: Brion VIBBER)
[00:06:10] \o/
[00:06:37] <^d> hoo: no idea. I'm in a meeting.
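The tungsten alerts compare a 5xx-per-minute rate against fixed thresholds: crit=500 here, with warn=250 printed in the later RECOVERY lines. Below is a minimal sketch of that classification. It assumes a Nagios-style rule where a reading at or above a threshold trips it. The real check's comparison, averaging window and output format are not shown in this log, and the `status_line` text is a hypothetical format.

```python
from enum import Enum


class State(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Thresholds as printed in the tungsten alerts; treating "at or above"
# as tripping a threshold is an assumption, not the check's documented rule.
WARN_5XX_PER_MIN = 250.0
CRIT_5XX_PER_MIN = 500.0


def classify_5xx_rate(per_min: float,
                      warn: float = WARN_5XX_PER_MIN,
                      crit: float = CRIT_5XX_PER_MIN) -> State:
    """Map a 5xx-requests-per-minute reading to a monitoring state."""
    if per_min >= crit:
        return State.CRITICAL
    if per_min >= warn:
        return State.WARNING
    return State.OK


def status_line(per_min: float) -> str:
    """Hypothetical status text, loosely modeled on the reqstats.5xx alerts."""
    state = classify_5xx_rate(per_min)
    return f"{state.name}: reqstats.5xx value={per_min:.1f}"


if __name__ == "__main__":
    for reading in (40.0, 310.0, 612.0):
        print(status_line(reading))
```

With separate warn and crit levels, brief spikes like the varnish-restart ones mentioned at 00:05:27 can be told apart from sustained error rates.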
[00:07:34] !log catrope synchronized visualeditor-default.dblist 'Enable VE by default on ptwikibooks'
[00:07:43] Logged the message, Master
[00:07:45] (PS4) Ori.livneh: Refactor GeoIP lookup code; add tests [operations/puppet] - https://gerrit.wikimedia.org/r/113900
[00:08:15] !log catrope synchronized wmf-config/InitialiseSettings.php 'Enable VE in the Recherce namespace on frwikiversity'
[00:08:23] Logged the message, Master
[00:08:30] k
[00:08:37] (CR) Ori.livneh: "PS4:" [operations/puppet] - https://gerrit.wikimedia.org/r/113900 (owner: Ori.livneh)
[00:08:52] !log catrope synchronized wmf-config/CommonSettings.php 'Fix popup video size by ordering transcode settings properly'
[00:09:01] brion: ---^^
[00:09:01] Logged the message, Master
[00:09:05] (PS5) Ori.livneh: Refactor GeoIP lookup code; add tests [operations/puppet] - https://gerrit.wikimedia.org/r/113900
[00:09:07] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[00:09:08] woo lemme test
[00:09:10] (PS3) Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - https://gerrit.wikimedia.org/r/113935
[00:09:41] RoanKattouw: confirmed fixed, thanks!
[00:09:47] !log restarting gitblit on antimony
[00:09:55] Logged the message, Master
[00:09:58] git.wm was down again
[00:11:17] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 177368 bytes in 9.036 second response time
[00:11:55] (PS2) Dzahn: remove db9, decom [operations/dns] - https://gerrit.wikimedia.org/r/115241
[00:12:55] (CR) GWicke: "Will requests with this cookie still get a cached response? Would be good to document here why it still works." [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[00:13:08] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200955)
[00:16:31] (PS4) Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - https://gerrit.wikimedia.org/r/113935
[00:17:38] !log deploying new hook in Bugzilla's edit.html.tmpl for bug 36064
[00:17:47] Logged the message, Master
[00:17:55] (CR) Ori.livneh: "PS4:" [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[00:18:07] (PS1) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323
[00:18:11] gwicke: I'll reply, just need a few
[00:18:39] ori on vacation looks a lot like ori at work :)
[00:18:39] (PS2) Ottomata: [WIP] Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323
[00:19:55] (CR) Ottomata: "This is debianization of archiva's upstream binary tarball release. I know this is non standard, but packaging the full source is really" [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 (owner: Ottomata)
[00:23:03] (CR) Ottomata: [WIP] Initial 2.0.0-1 debian release (1 comment) [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 (owner: Ottomata)
[00:23:04] !log Bugzilla - replacing custom template with exiting hook (gerrit 114145)
[00:23:11] Logged the message, Master
[00:23:16] ottomata, we should really have an internal repo too that lets us push security updates and things where licensing is difficult to properly handle
[00:23:42] like just a place for crappy debs that we'd never want to distirbute to the public?
[00:23:49] that we could still use to puppetize and install things from?
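The terbium check above fired with an empty wiki list followed by ", Total (200955)". That suggests the aggregate across all wikis is compared against the same 200,000 limit as individual wikis. The sketch below is a hypothetical reconstruction under that assumption only. The actual check script, its data source and its exact message format are not part of this log.

```python
from typing import Dict, List

# Limit implied by the alert text ("more than 199,999 jobs" / "below 200,000").
JOBQUEUE_LIMIT = 200_000


def check_job_queue(counts: Dict[str, int], limit: int = JOBQUEUE_LIMIT) -> str:
    """Return a JOBQUEUE-style status line for per-wiki job counts.

    The aggregate is compared against the same limit as each wiki (an
    assumption), so a CRITICAL can fire with only "Total" listed.
    """
    total = sum(counts.values())
    over: List[str] = [
        f"{wiki} ({n})" for wiki, n in sorted(counts.items()) if n >= limit
    ]
    if total >= limit:
        over.append(f"Total ({total})")
    if not over:
        return f"JOBQUEUE OK - all job queues below {limit:,}"
    return (f"JOBQUEUE CRITICAL - the following wikis have more than "
            f"{limit - 1:,} jobs: " + ", ".join(over))
```

For example, two wikis with 120,000 and 80,955 queued jobs stay individually below the limit. Their sum of 200,955 still trips the check, and only "Total (200955)" is listed.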
[00:25:02] ottomata, yup
[00:25:30] that would make things easier, for sure :/
[00:25:31] and that we can push to / upgrade from without letting out security issues prematurely
[00:25:37] aye
[00:27:35] (CR) Ori.livneh: "(need to amend this)" [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[00:27:47] !log catrope synchronized php-1.23wmf14/includes/filebackend/FileBackendStore.php 'efb1e99fd for real this time'
[00:27:56] Logged the message, Master
[00:28:54] !log catrope synchronized php-1.23wmf15/includes/filebackend/FileBackendStore.php 'd52a8af6 for real this time'
[00:29:02] Logged the message, Master
[00:30:55] !log catrope synchronized php-1.23wmf15/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWSettingsPage.js 'Fix adding redirects in VE'
[00:31:02] Logged the message, Master
[01:47:27] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[01:55:35] (CR) Dzahn: [C: 1] "per RFC 2069 or RFC 2617 or RFC 1945 or Ori:) heh, are we overthinking this? let's just make them use the same term as a starter (and for " [operations/puppet] - https://gerrit.wikimedia.org/r/114503 (owner: Greg Grossmeier)
[01:57:00] (CR) Dzahn: [C: 2] "ZumBot bad_browser , per hashar" [operations/puppet] - https://gerrit.wikimedia.org/r/114516 (owner: Hashar)
[02:02:21] (CR) Dzahn: [C: 1] Give manybubbles and demon sudo on logstash100X [operations/puppet] - https://gerrit.wikimedia.org/r/115150 (owner: Manybubbles)
[02:02:27] (PS2) Manybubbles: Give manybubbles and demon sudo on logstash100X [operations/puppet] - https://gerrit.wikimedia.org/r/115150
[02:05:10] (CR) Dzahn: [C: 1] Give manybubbles and demon sudo on logstash100X [operations/puppet] - https://gerrit.wikimedia.org/r/115150 (owner: Manybubbles)
[02:06:57] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[02:27:26] !log LocalisationUpdate completed (1.23wmf14) at 2014-02-25 02:27:26+00:00
[02:27:35] Logged the message, Master
[02:40:07] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:41:27] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[02:46:57] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time
[02:53:09] !log LocalisationUpdate completed (1.23wmf15) at 2014-02-25 02:53:09+00:00
[02:53:18] Logged the message, Master
[02:54:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[03:05:06] (PS1) Springle: s1 direct api traffic to db1043 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115336
[03:05:26] (CR) Springle: [C: 2] s1 direct api traffic to db1043 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115336 (owner: Springle)
[03:05:34] (Merged) jenkins-bot: s1 direct api traffic to db1043 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115336 (owner: Springle)
[03:06:27] !log springle synchronized wmf-config/db-eqiad.php 's1 direct api traffic to db1043'
[03:06:35] Logged the message, Master
[03:38:50] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-02-25 03:38:50+00:00
[03:38:58] Logged the message, Master
[03:40:27] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[05:07:57] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[05:10:07] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[05:55:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[05:59:03] (PS5) Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - https://gerrit.wikimedia.org/r/113935
[06:00:42] (CR) Ori.livneh: "Tested. I feel pretty good about this one." [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[06:01:40] bblack: ^ i think i got it
[06:03:17] cool stuff :)
[06:04:07] btw, I had a minor comment about the other patchset
[06:04:27] "inline" doesn't do much by itself, plus the compiler is usually better at doing this sort of thing
[06:05:21] (CR) GWicke: "Did you test that anonymous requests with the geo cookie set still get cached results?" [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[06:05:22] but in any case, it's an inlined function of two lines, used once, so you could also just actually inline it in the code
[06:05:51] twice
[06:06:30] but yeah, you're probably right
[06:06:33] I can only see one
[06:06:37] i won't actually inline it, but i'll drop the directive
[06:06:38] I'm talking about geo_init(), to be clear
[06:07:22] https://gerrit.wikimedia.org/r/#/c/113935/5/templates/varnish/geoip.inc.vcl.erb , line 131 and 92
[06:08:38] ah, right, the second patchset adds the second invocation
[06:09:02] but to gwicke's point: we only vary on specific cookies, right?
[06:09:25] X-Vary-On isn't implemented, we just have a few hard-coded cookie patterns in VCL IIRC
[06:12:25] I think MW sends Vary: Cookie anyway, for downstream caches
[06:12:57] I don't think it will matter in this case
[06:13:02] I'll let Brandon double-check :)
[06:13:42] yeah, things like AFTv5 on enwiki still set a random token for analytics for all anons
[06:13:57] if we varied on that we'd be toast
[06:13:58] we munge cookies on text though
[06:14:10] we have some special logic iirc
[06:14:13] and this is bits
[06:14:17] nope, text
[06:14:34] because, remember, the advantage of set-cookie is that it can spare a surrogate request to bits
[06:15:14] oh I thought you'd piggyback on the bits requests
[06:19:43] (PS6) Ori.livneh: Refactor GeoIP lookup code; add tests [operations/puppet] - https://gerrit.wikimedia.org/r/113900
[06:21:11] yeah I'm pretty happy with the state of both patches now, I've nitpicked ori enough :)
[06:21:47] ohhi bblack
[06:21:49] bblack: actually i have one more patch that is entirely "cute" (no code change, just formatting/comments)
[06:21:55] ok
[06:21:58] just about to submit it
[06:24:45] my neighborhood bar just closed, so I'm back home reviewing code, I'm not sure how you should feel about that :)
[06:25:07] (PS1) Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/115342
[06:25:18] I do that often :)
[06:29:39] (PS1) Ori.livneh: Format GeoIP VCL C code for consistency with Varnish [operations/puppet] - https://gerrit.wikimedia.org/r/115343
[06:30:18] (PS2) Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/115342
[06:30:21] (PS6) Ori.livneh: Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - https://gerrit.wikimedia.org/r/113935
[06:30:27] (PS2) Ori.livneh: Format GeoIP VCL C code for consistency with Varnish [operations/puppet] - https://gerrit.wikimedia.org/r/115343
[06:31:31] (PS3) Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/115342
[06:31:35] bblack: ok, that's it
[06:32:53] I like the complete omission of VRT_re_* :)
[06:33:44] (CR) BBlack: [C: 2] "Looking good :)" [operations/puppet] - https://gerrit.wikimedia.org/r/113900 (owner: Ori.livneh)
[06:34:26] yeah, i squinted at it for a while before realizing i was being silly
[06:34:32] (CR) BBlack: [C: 2] Send GeoIP lookup result as 'GeoIP' cookie [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[06:34:48] (CR) BBlack: [C: 2] Format GeoIP VCL C code for consistency with Varnish [operations/puppet] - https://gerrit.wikimedia.org/r/115343 (owner: Ori.livneh)
[06:36:15] I plan to talk to yuri tomorrow about the https thing, but I think it will be fairly trivial and serparate
[06:36:54] heh, i caught a stupid bug
[06:37:07] not allowed, it's already +2 :)
[06:38:10] in the sanitize code?
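GWicke asked whether a new GeoIP cookie would fragment the cache. The exchange above says the text varnishes do not vary on the raw Cookie header, only on a few hard-coded cookie patterns in VCL. Below is a minimal sketch of that idea. It assumes a hypothetical allowlist of session/token-style cookie names, since the real VCL patterns are not quoted here, and the cookie values are made up.

```python
import re
from typing import Optional

# Hypothetical name patterns for cookies that must still bypass or vary the
# cache (login/session state). The real patterns live in the text varnishes'
# VCL and are not quoted in this log.
CACHE_RELEVANT_COOKIE = re.compile(r"(?:[sS]ession|Token|UserID)$")


def cache_relevant_cookies(cookie_header: Optional[str]) -> str:
    """Reduce a Cookie header to the cookies that may affect caching.

    Any other cookie, such as an informational GeoIP cookie, is dropped,
    so two anonymous requests that differ only in such cookies map to the
    same cache entry.
    """
    if not cookie_header:
        return ""
    kept = []
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and CACHE_RELEVANT_COOKIE.search(name):
            kept.append(f"{name}={value}")
    return "; ".join(kept)
```

This allowlist approach is also why the random analytics token mentioned at 06:13:42 does not blow up the cache. Any cookie that does not match is ignored for caching purposes.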
[06:38:46] because the loop over strchr() did look a little funny, but at worst suboptimal
[06:40:24] no, not there
[06:40:49] (PS1) Ori.livneh: Fix address family determination in GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115344
[06:40:58] (PS2) Ori.livneh: Fix address family determination in GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115344
[06:41:19] a little embarassing
[06:42:03] hah
[06:42:28] well, you could say you're trying to help convince the world to convert to ipv6 :_
[06:42:46] (CR) BBlack: [C: 2] Fix address family determination in GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115344 (owner: Ori.livneh)
[06:42:56] s/_/)/
[06:43:27] assert(AF_INET6);
[06:44:05] (PS4) Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/115342
[06:46:19] btw, https://gerrit.wikimedia.org/r/#/c/30836/
[06:46:43] limited usefulness, because of the poor quality of the databases
[06:46:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 793.400024
[06:46:50] but feel free to adjust it or abandon it
[06:47:06] I have it on my gerrit homepage since Oct 2012, I'm sick of seeing it :)
[06:48:07] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 668.266663
[06:48:12] (CR) Ori.livneh: "GWicke: the text varnishes stash the cookie header as 'Orig-Cookie' for hash / vary purposes. Specific cookies that we do need to vary on " [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[06:49:00] i'll try to rebase it
[06:50:52] it would need some additional testing though, not sure we should pile it on a single deploy
[06:53:32] oh yeah, this one is a bit tricky
[06:54:07] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[06:55:54] (PS1) Jkrauska: Initial commit of pmacct module and role [operations/puppet] - https://gerrit.wikimedia.org/r/115345
[06:56:06] cajoel: heya
[06:56:28] hello there
[06:56:33] I was about to sign off
[06:56:42] but I'd be happy to chat-- you got some time?>
[06:56:42] fat chance!
[06:56:45] * ori seals the exits
[06:57:17] got mysql data store working nicely
[06:57:19] nah, go
[06:57:26] I feel bad :)
[06:57:30] bad?
[06:57:33] sick?
[06:57:45] for talking to you right before you were about to leave at 11pm your time
[06:57:54] nah
[06:58:07] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 407.100006
[06:58:15] let me put a bit in to the gerrit ..
[06:58:38] daniel helped a bit explain his view on what should go in a role, so I want him to review it.
[06:59:07] okay
[06:59:15] added you too
[06:59:22] anyone else?
[06:59:23] it's *very* tempting but I should not have a look at it this week
[06:59:44] I'm using file_line, which is an evil being..
[06:59:52] should /not/? what's going on?
[07:00:53] I put some hours on in tonight explictly so you coudl look at it during your normal day.. :)
[07:01:24] falling behind on some other stuff which are team's priorities
[07:01:55] because of spending my time in a million small things, basically :)
[07:02:32] (CR) GWicke: "@Ori: Makes sense. Thanks for the explanation!" [operations/puppet] - https://gerrit.wikimedia.org/r/113935 (owner: Ori.livneh)
[07:04:44] bblack: were you thinking of deploying those changes, or were you just certifying them as sane? no worries if the latter; just wondering if i should stick around.
[07:14:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:15:07] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:17:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 108.5
[07:18:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:21:02] ori: still around?
[07:21:09] barely
[07:21:20] can you run a quick query on the db for me?
[07:21:34] for https://bugzilla.wikimedia.org/show_bug.cgi?id=61809
[07:22:07] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 160.46666
[07:22:08] select * from user_newtalk where user_id=726851;
[07:23:02] that's Risker's user id
[07:24:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:25:10] Empty set (0.00 sec)
[07:25:16] ok
[07:25:21] must be a caching issue then
[07:25:23] thanks!
[07:25:29] np
[07:29:07] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 200.53334
[07:30:20] ori: my +2 meanx I think they're sane enough to deploy, but it's your change, you deploy :)
[07:32:14] uhoh
[07:32:52] uh-oh I can't spell?
[07:33:03] no, just vaguely anxious
[07:33:27] how do you do it, usually?
[07:33:45] do you disable puppet on the target hosts and then try it?
[07:33:50] try it on one
[07:34:40] "the target hosts" is a rather large set to disable puppet on
[07:36:09] you could do "git format-patch ..." and make patchfiles of your 3x changes, then disable puppet on one host and apply those patches to /etc/varnish manually with "patch -pX" and see what happens
[07:36:43] (assuming templating doesn't interfere with patch too much, seeing as the patches are to templates, but I think not in templated areas)
[07:37:16] and edit out the .erb bit and/or just manuall tell it which files to affect
[07:38:37] yeah, that makes sense
[07:38:53] since some of the failure modes are non-instant (memory leak, cache fragmentation), how long would you let it run on one host with puppet disabled?
[07:39:13] mostly I'd look at syslog and varnishlog, maybe push some test queries through
[07:39:35] unless the memory leak is severe, you probably won't be able to see it right off, and cache fragmentation likewise
[07:40:07] trust your feelings, luke
[07:40:28] i think i may wait on faidon or mark
[07:41:11] i don't think i've got enough experience to roll it out myself
[07:41:20] they have too many other things to pay close attention to. the inside of your official man-purse says Be Bold :)
[07:42:02] mind doing it bblack?
[07:42:07] sure
[07:42:07] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:44:29] blah, i should have been clearer earlier. sorry about that.
[07:44:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 128.266663
[07:44:50] it's ok with me if you want to defer it
[07:45:07] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 249.399994
[07:45:13] nah, I think it's fine, we've been over it pretty thoroughly. there's always a risk, but it's not an usually-high one.
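The log does not show what was wrong with the address family determination for the GeoIP cookie, only that a fix followed. One robust approach is sketched below, assuming the client address is available as a string. It parses the address with `inet_pton`, strictly as IPv4 first and then as IPv6, instead of guessing from the characters it contains. The `v4`/`v6` tag is a hypothetical format, not the cookie's actual layout.

```python
import socket
from typing import Optional


def address_family(ip: str) -> Optional[int]:
    """Return socket.AF_INET or socket.AF_INET6 for a textual IP, else None.

    Trying a strict IPv4 parse first avoids guessing from characters:
    "::ffff:1.2.3.4" contains dots but is an IPv6 address.
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, ip)
        except (OSError, ValueError):  # ValueError: e.g. embedded NUL byte
            continue
        return family
    return None


def family_tag(ip: str) -> str:
    """Short tag that could be appended to a cookie value (hypothetical)."""
    fam = address_family(ip)
    if fam == socket.AF_INET:
        return "v4"
    if fam == socket.AF_INET6:
        return "v6"
    return ""
```

Returning `None` for unparseable input, rather than defaulting to one family, keeps a bad address from silently being reported as IPv6 (or IPv4).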
[07:45:29] *unusually :P
[07:45:30] *unusually* :)
[07:45:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:46:27] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[07:47:07] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:49:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 319.833344
[07:51:05] the inside of my official man-purse says "Be bold", but my recent experience says "OK, maybe not *that* bold"
[07:51:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[07:54:25] I tested it on one host and nothing immediately-obviously-disastrous happened
[07:54:28] it's merged now
[07:55:11] which host?
[07:55:24] cp4015
[07:58:20] * ori tests
[07:58:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 13.566667
[07:59:07] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 13.1
[07:59:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:00:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:01:50] Undefined sub geoip_cookie, first reference:
[08:01:50] ('text-frontend.inc.vcl' Line 154 Pos 22)
[08:01:50] call geoip_cookie;
[08:01:52] ---------------------############-
[08:01:56] (CR) Faidon Liambotis: [C: 2] Add variant rewrites for zhwikivoyage [operations/apache-config] - https://gerrit.wikimedia.org/r/110155 (owner: TTO)
[08:04:13] (CR) Faidon Liambotis: [V: 2] Add variant rewrites for zhwikivoyage [operations/apache-config] - https://gerrit.wikimedia.org/r/110155 (owner: TTO)
[08:04:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 47.200001
[08:05:15] ?
[08:05:31] I didn't get that on my restart/reload
[08:05:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:05:52] because cp4015 is an upload varnish; i ran puppet on cp1065
[08:05:54] sorry, cp1052 is where I tested, it's a text node
[08:06:06] I copied the hostname from the wrong window, but still
[08:06:27] oh - yeah, I should've restart the frontend, not the back
[08:06:45] well, it's not the end of the world, it just doesn't reload
[08:07:50] so... looks like a scope issue
[08:07:53] yeah, we need to add include "geoip.inc.vcl";
[08:08:27] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: Connection refused
[08:08:57] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[08:09:07] it's gated with <% if cluster_options.fetch( "enable_geoiplookup", false ) -%>
[08:10:03] <% if cluster_options.fetch( "enable_geoiplookup", false ) -%>
[08:10:03] include "geoip.inc.vcl";
[08:10:03] <% end -%>
[08:10:15] heh, I should read before I tab over and paste :)
[08:11:56] looks like currently that option is only set for "bits"
[08:12:08] yeah, patch incoming
[08:14:01] seeing as geoip.inc.vcl is just compiled subs without a runtime impact (lacking other supporting code), I don't see the harm in enabling it elsewhere, so yeah
[08:14:35] anyways, I'll take the blame for that one, I should've realized to check the -frontend process as well
[08:15:07] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 430.933319
[08:15:09] (CR) Mattflaschen: [C: -1] "Can you explain this further? Based on http://commons.wikimedia.beta.wmflabs.org/wiki/Special:ListGroupRights , admins already have skipca" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115230 (owner: Dereckson)
[08:16:32] (PS1) Ori.livneh: text varnishes: require geoip; set enable_geoiplookup for frontend [operations/puppet] - https://gerrit.wikimedia.org/r/115346
[08:17:07] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1170.06665
[08:17:08] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 31.633333
[08:17:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 99.800003
[08:17:48] what's with the kafka.varnishkafka.kafka_drerr.per_second ?
[08:18:07] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:18:08] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:18:20] it has been intermittent for weeks
[08:18:28] well, I guess if someone knew it would've been fixed, yeah
[08:18:34] it was more or less a rhetorical question :)
[08:18:39] it's a new setup, not stabilized yet, afaik
[08:18:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:19:25] afaik too
[08:19:30] we briefly discussed it yesterday
[08:19:48] The various intermittent 5xx's are still me, btw
[08:20:22] sometimes it takes a while to restart varnish on upgrade, depends how many times in a row it fails to mmap() to the exact same address :P
[08:21:24] (Abandoned) Tim Landscheidt: WIP: Puppetize aliases and NAT for LabsDB replica servers [operations/puppet] - https://gerrit.wikimedia.org/r/107010 (owner: Tim Landscheidt)
[08:21:29] I have a new commandline for dealing with it now, though, so it goes faster usually. varnish upgrade->restart:
varnish upgrade->restart: [08:21:32] apt-get -y -o DPkg::Options::="--force-confold" install varnish libvarnishapi1 varnish-dbg; while [ $? -ne 0 ]; do echo ============== retry startup ... =========; sleep 1; apt-get -f install; done; service varnish-frontend restart [08:21:51] lol that's so sad [08:22:07] (... x 83 hosts one by one, waiting a while and doing something else between each to avoid rushing the process and getting too many 5xx) [08:22:07] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:22:12] the fact that we need to do that :( [08:22:22] but a VCL change doesn't need a restart [08:22:35] or are you talking about your 3.0.5 upgrades? [08:23:04] what else? [08:23:15] I thought ori's change [08:23:15] but yes, that cmdline is from the 3.0.5 upgrade process [08:23:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 744.43335 [08:23:58] ori's change is just going through puppet, the error in the change just made reload fail, but that doesn't really hurt anything in the immediate sense [08:24:12] I think the zero team has left VCL failing to reload for days in the past :P [08:24:34] well, zero plus someone of us that deployed it and didn't check [08:24:38] I've done that :) [08:25:09] so, spewing 500s for upgrading varnish is bad in the long term [08:25:24] I think it all boils down to the issue we've discussed before with chash [08:25:44] frontends, you can depool, upgrade, repool (automating this is an issue though) [08:25:50] well, even without the mmap problem, and even if chash were better, the restart isn't perfectly seamless [08:25:57] backends, it shouldn't matter, since frontends can and will retry [08:26:03] well, sure, unless you depool, but we could do that even with the mmap issue [08:26:17] you could force-fail the backends [08:26:46] if it weren't for the mmap-restart delays, though, mostly we wouldn't notice the 
very small hiccups [08:27:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:28:07] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 228.199997 [08:28:08] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 112.866669 [08:28:18] (03CR) 10BBlack: [C: 032 V: 032] text varnishes: require geoip; set enable_geoiplookup for frontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/115346 (owner: 10Ori.livneh) [08:29:08] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:31:07] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:31:34] these varnish upgrade cycles are really making me want to fix the mmap thing sooner [08:31:46] yes please :) [08:32:07] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 111.066666 [08:32:42] ori: [08:32:45] ./vcl.rNSuL0wH.c:988:24: fatal error: GeoIPCity.h: No such file or directory [08:32:46] Message from C-compiler: [08:32:47] ./vcl.E2AClb1j.c:988:24: fatal error: GeoIPCity.h: No such file or directory [08:32:49] compilation terminated. [08:32:52] Running C-compiler failed, exit 1 [08:33:08] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [08:33:13] * ori blames mark [08:33:19] * ori investigates [08:33:27] libgeoip-dev package? [08:34:10] I don't see it mentioned in puppet at all, maybe someone installed it without puppet on bits before? [08:34:30] because it's there on a random bits host [08:35:19] that's odd [08:35:43] besides geoip-dev, do we even have geoip databases (i.e. "include geoip") on text varnishes? 
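The apt-get one-liner at 08:21:32 keeps re-running `apt-get -f install` until the upgrade's varnish restart stops failing. It has no upper bound on attempts. Below is a sketch of the same retry pattern with a cap, assuming you would rather have a stuck host fail loudly than loop forever. The attempt count and delay are arbitrary, and the helper names are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry_until_success(attempt: Callable[[], bool],
                        max_attempts: int = 30,
                        delay_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `attempt` until it returns True; return how many calls it took.

    Unlike a bare `while [ $? -ne 0 ]` loop, this gives up after
    `max_attempts`, so a host that never recovers cannot stall the rollout.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            sleep(delay_s)  # same 1-second pause as the shell loop by default
    raise RuntimeError(f"still failing after {max_attempts} attempts")


def command_succeeds(argv: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(argv)).returncode == 0
```

For example, `retry_until_success(lambda: command_succeeds(["apt-get", "-f", "install"]))` would mirror only the inner loop. The initial `apt-get install` and the final `service varnish-frontend restart` remain separate steps, run once per host as described at 08:22:07.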
[08:35:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 203.433334
[08:35:59] paravoid: he put that in his new patch
[08:36:05] ah, sorry
[08:36:11] but "include geoip" doesn't get libgeoip-devel
[08:36:24] ew redhat!
[08:36:29] -dev :)
[08:36:29] nothing does, if "git grep libgeoip-devel" is to be believed, so the header presence on bits caches is a myster
[08:36:32] *y
[08:36:51] well, either way
[08:37:16] yeah, I broke that
[08:37:32] 39836917790b24991df662e113f750567ce6d8be
[08:37:44] - package { ['libgeoip1', 'libgeoip-dev', 'geoip-bin']:
[08:37:44] - ensure => present;
[08:37:44] - }
[08:38:39] helllo
[08:39:13] hi
[08:39:28] hashar: i am *definitely* not here
[08:39:52] you mean working at 1am in the middle of your vacation?
[08:39:57] has anyone seen ori?
[08:40:26] paravoid: that would be silly
[08:40:32] paravoid: so, geoip-bin almost certainly pulls in libgeoip1 anyways, so I guess I'll just add a geoip::dev class and use it for text+bits varnishes?
[08:40:42] yeah
[08:40:45] makes sense to me
[08:40:57] ori: I dont want to talk with you about technical / wikimedia stuff until monday :-]
[08:41:18] bblack: thanks
[08:41:33] (not just this specific issue, for the whole review/deployment help)
[08:42:14] !log deployment-prep Upgrading all varnishes.
[08:42:16] aah
[08:42:22] Logged the message, Master
[08:42:36] !log wrong channel, I am not upgrading any production varnishes but the beta cluster ones.
[08:42:43] Logged the message, Master
[08:43:16] lol
[08:45:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[08:46:05] yeah Failed to parse template varnish/vcl/wikimedia.vcl ! :(
[08:46:19] can't convert Symbol into String in varnish/manifests/instance.pp:74
[08:47:32] sounds like a much better error than:
[08:47:33] Could not mmap SILO (/srv/sdb3/varnish.main2) at target 3050061824, was mapped at 3629961216 instead
[08:48:11] looks like chinese to me, luckily you are probably fluent in those cryptic messages :D
[08:48:59] (PS1) BBlack: Add a geoip::dev class to use for VCL compilation [operations/puppet] - https://gerrit.wikimedia.org/r/115348
[08:50:02] (CR) Ori.livneh: [C: 1] Add a geoip::dev class to use for VCL compilation [operations/puppet] - https://gerrit.wikimedia.org/r/115348 (owner: BBlack)
[08:50:06] it's BSD-ese for "what? mmap doesn't give you whatever address I asked for? Well, your libc sucks, maybe you should switch operating systems"
[08:51:20] (CR) BBlack: [C: 2] Add a geoip::dev class to use for VCL compilation [operations/puppet] - https://gerrit.wikimedia.org/r/115348 (owner: BBlack)
[08:52:26] there is always Debian with a FreeBSD kernel :D
[08:53:19] those debian people are crazy
[08:53:32] *cough*
[08:53:37] :)
[08:54:21] as if things weren't complicated enough, someone just thought it would be fun to throw /kFreeBSD into the mix :P
[08:54:27] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.259 second response time
[08:54:44] what's next? GNU/hurd?
[08:54:48] oh, wait..
[08:54:58] sure ori https://www.debian.org/ports/hurd/
[08:55:00] and then a thousand source packages that assumed __linux__ implied something about glibc and other such assumptions cried out in agony
[08:55:39] there is an active debian port for hurd
[08:55:48] two, actually (hurd-i386, hurd-amd64)
[08:55:53] i know, i was joking
[08:55:57] porterboxes, build boxes and everything
[08:56:07] *sorry we can not accept your very useful software until it is fixed on hurd*
[08:56:22] no, it's not an official architecture
[08:56:39] but then even with Linux you have stuff such as https://github.com/edenhill/librdkafka/issues/87 :)
[08:56:40] I used to play with hurd years and years ago :)
[08:56:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[08:57:05] bblack: did everthing work on cp1052?
[08:57:21] yes
[08:57:23] /usr/share/varnish/reload-vcl -n frontend on cp1065 gives me: ./vcl.uXMxEexg.so: undefined symbol: GeoIPRecord_delete
[08:57:33] really?
[08:57:40] did you restart or reload-vcl?
[08:57:53] maybe reload-vcl doesn't set the compile options
[08:57:57] paravoid: I dont even understand the bug report you pointed :/
[08:58:03] I restarted
[08:58:37] because... we have a subtle bug in our puppet vcl-reload: if vcl-reload fails, and you fix the problem by doing something other than editing the VCL (say, adding a package), puppet's not going to vcl-reload again
[08:59:00] you can just /etc/init.d/varnish reload
[08:59:01] right, that's a puppet gotcha generlaly
[08:59:14] generally, even
[08:59:35] that worked
[09:00:13] yeah, it doesn't really
[09:00:20] it's just that the initscript hides VCL errors on restart
[09:00:47] we have to back out the patch now
[09:00:50] (es)
[09:00:58] because now it's segfaulting all over the place :(
[09:01:07] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 180.199997
[09:01:16] really? it seems to have worked on cp1065
[09:01:24] (Abandoned) Tim Landscheidt: WIP: Tools: Puppetize LabsDB aliases and /etc/hosts [operations/puppet] - https://gerrit.wikimedia.org/r/107103 (owner: Tim Landscheidt)
[09:01:39] not at all, look at syslog on cp1065
[09:01:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 290.333344
[09:02:02] Feb 25 09:01:50 cp1065 frontend[8127]: Child (11446) said Child starts
[09:02:05] Feb 25 09:01:50 cp1065 kernel: [5522605.872203] varnishd[11557]: segfault at 0 ip 00007f143c1f15dc sp 00007f14385c82c0 error 4 in vcl.bdtBsfWo.so[7f143c1ee000+10000]
[09:02:08] Feb 25 09:01:50 cp1065 frontend[8127]: Child (11446) died signal=11
[09:02:11] Feb 25 09:01:50 cp1065 frontend[8127]: Child cleanup complete
[09:02:13] Feb 25 09:01:50 cp1065 frontend[8127]: child (11645) Started
[09:02:16] Feb 25 09:01:50 cp1065 frontend[8127]: Child (11645) said Child starts
[09:02:19] Feb 25 09:01:51 cp1065 kernel: [5522605.980089] varnishd[11746]: segfault at 0 ip 00007f143c1f15dc sp 00007f143a1a82c0 error 4 in vcl.bdtBsfWo.so[7f143c1ee000+10000]
[09:02:22] over and over
[09:02:38] arghh. any idea why?
[09:03:32] don't really care at this point, just trying to think the best way to revert these 4-5 patches
[09:03:43] before we lose a bunch of varnishes
[09:03:59] gate the geoip stuff with a cluster_options guard
[09:04:01] in the vcl templates
[09:04:03] and set it to false for now
[09:04:10] sounds ok? if so i can submit a patch
[09:04:35] sure
[09:05:08] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 244.199997
[09:05:14] in the future, it would probably be better to squish your geoip branch to a single gerrit changeset
[09:05:33] I was thinking that earlier while we were doing these add-on patches, but, now it's really obvious :)
[09:05:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:07:01] (PS1) Ori.livneh: text varnishes: set enable_geoiplookup to false [operations/puppet] - https://gerrit.wikimedia.org/r/115350
[09:07:20] ^ bblack
[09:07:41] well, the changes may have broken bits as well, I'm checking now
[09:09:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:10:36] bits does compile, with a couple warnings about writing to const
[09:10:46] how confident are you in the refactor not breaking bits?
[09:11:13] (CR) BBlack: [C: 2] text varnishes: set enable_geoiplookup to false [operations/puppet] - https://gerrit.wikimedia.org/r/115350 (owner: Ori.livneh)
[09:11:27] cp1057 seems to be doing fine
[09:11:31] it's the one i checked
[09:11:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 653.533325
[09:11:51] ./vcl.jbqrRcuG.c: In function ‘VGC_function_geoip_cookie’:
[09:11:51] ./vcl.jbqrRcuG.c:1019:11: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
[09:11:54] ./vcl.jbqrRcuG.c:1028:11: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
[09:12:17] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 307.466675
[09:15:14] there are so many layers of wrong here, maybe of them pre-existing :)
[09:15:57] I do wonder what other magic is happening on the bits machines though
[09:16:30] I salted to speed up puppet pushing the revert, btw
[09:16:48] well, s/revert/cluster_options guard/
[09:17:27] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[09:20:59] <% if cluster_options.fetch( "enable_geoiplookup", false ) -%>
[09:20:59] # Bash sucks
[09:20:59] CC_COMMAND="exec cc -fpic -shared -Wl,-x -L/usr/local/lib/ -o %o %s -lGeoIP"
[09:21:02] <% end %>
[09:21:32] hmmm
[09:22:02] # TODO rewrite in python
[09:26:52] yeah, so, to recap, the initial compile fail was that reload-vcl doesn't change CC_COMMAND, but the segfault is something else
[09:28:03] yeah
[09:28:07] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 316.399994
[09:28:26] since the const warnings are the cookie func (that text uses and bits doesn't, right?), I wouldn't be surprised if it was related
[09:30:27] oh, the const warnings must be on the "cookie = ", which doesn't seem to be a practical problem
[09:31:07] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:31:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:31:27] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[09:31:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:32:11] ori: it could be that record->country_code needs sanitization, although differently than ->city
[09:32:48] IIRC, libGeoIP can return \0\0 as the two-char country code in country-level lookups, donno about city
[09:32:59] PROBLEM - Disk space on virt11 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 40014 MB (3% inode=99%):
[09:33:22] and sprintf wouldn't care?
[09:33:54] VRT_WrkString might
[09:34:07] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 166.633331
[09:34:47] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1701.400024
[09:34:54] I donno, just tossing out random ideas
[09:34:58] since i'm calling snprintf anyway to construct latlon i may as well do it with the other colon-separated fields
[09:35:00] it makes sense
[09:35:15] but really, we should find a way to test this better before pushing it again
[09:35:24] maybe stick it on a dev host somewhere and use log replay
[09:35:45] varnishreplay, that is
[09:37:08] back on the reload-vcl thing: that highlights the absurdity of compiling actual C code as an extension language
[09:37:09] i have it running on my dev instance
[09:37:17] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 525.299988
[09:37:26] well, i am not introducing that
[09:37:26] to fix all such errors, reload-vcl would need to become autoconf+automake+libtool :P
[09:38:31] (CR) Krinkle: "bump" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94447 (owner: Ori.livneh)
[09:41:09] i can't reproduce it locally
[09:42:07] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:43:06] hmm
[09:45:24] seems worth trying to avoid VRT_WrkString
[09:45:32] and just snprintf the cookie
[09:46:08] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:46:47] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:48:00] ori: I emailed you an xz-compressed varnishlog file
[09:48:09] can you replay that in your test instance with varnishreplay?
[09:48:25] it's from 1065 where the segfaults were happening repeatedly, hopefully it has some of the same pattern
[09:48:48] Hm.. what happened to this page?
[09:48:48] https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Love_fashion_-_2014-02-21_18-09.jpg
[09:48:57] no revisions, no log entries
[09:49:07] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[09:49:08] (PS5) Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/115342
[09:49:23] no archive either
[09:49:33] bblack: trying
[09:50:21] https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Luigi_Visca_nato_a_Caserta_il_24_dicembre_del_1994_2014-02-21_17-16.jpg
[09:50:22] https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:My_8G_Prince_Albert_Piercing_1.JPG
[09:50:24] two more
[09:50:25] wtf
[09:51:03] Krinkle: how do you found that pages?
[09:51:22] se4598: I found them with my toolserver tool that is supposed to find blanked pages
[09:51:34] e.g. someone creates an article, then edits it to blank it, and leaves it behind
[09:51:41] it found these three by accident
[09:54:53] Krinkle: can you bug fill them please? :-]
[09:54:55] (PS1) Ori.livneh: Use snprintf to synthesize GeoIP cookie, rather than VRT_WrkString [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[09:55:00] hashar: On it already
[09:55:26] Krinkle: I am pretty sure we have a bug about that. Cant remember it which one though but that is definitely a known issue among mw-core team.
[09:56:57] https://bugzilla.wikimedia.org/show_bug.cgi?id=61898
[09:57:30] security bug? okayy.
[09:57:52] (Abandoned) Andrew Bogott: Further attempt to redirect syslog to console for labs instances. [operations/puppet] - https://gerrit.wikimedia.org/r/114414 (owner: Andrew Bogott)
[09:57:56] Krinkle: how the hell is it a security bug ?
[09:58:14] bblack: my varnish instance is not a full-blown text frontend, just a simple vcl that engages the geo_cookie sub
[09:58:18] so i can't replay that
[09:58:24] (Abandoned) Andrew Bogott: Clobber resolv.conf after we're done with our apt magic. [operations/puppet] - https://gerrit.wikimedia.org/r/114501 (owner: Andrew Bogott)
[09:58:26] I find it more valuable to find this bug then to have some random person "fix" it by doing something on-wiki.
[09:58:32] I said so in the bug, read please.
[09:59:39] yeah that is why I ask
[09:59:55] ori: well I'm about out of steam for the evening. I think it's possible to make a one-off labs host with our varnish puppet config, or if that's not easy, there are existing deployment-test hosts out there with our varnish puppet config running for beta.wmflabs, I believe
[10:00:16] Krinkle: anyway I dont have our meeting archives handy, they have been moved somewhere :/
[10:01:10] bblack: ok. would you be willing to give https://gerrit.wikimedia.org/r/#/c/115354/ a shot? (not tonight) given that a revert is easy w/enable_geoiplookup?
[10:01:44] well, in theory yes, if we weren't already this far down this road
[10:02:07] too many small mistakes in a row, it's really time to step back and validate things better before we push forward with more guesses
[10:02:40] you could use beta maybe ?
[10:02:52] yeah, i'll give that a shot
[10:03:03] i wish i could reproduce it locally
[10:03:09] yeah, we need to use beta and find the segfault though, as opposed to use beta to try another random patch
[10:03:21] use the existing patch + the log from 1065
[10:03:32] yeah, that's what i meant
[10:03:36] the beta upload cache is broken right now (puppet has some weird issue)
[10:03:44] but the text / bits /mobile ones are working
[10:03:57] deployment-cache-bits03.pmtpa.wmflabs
[10:03:57] deployment-cache-mobile01.pmtpa.wmflabs
[10:03:57] deployment-cache-text1.pmtpa.wmflabs
[10:04:03] fully puppetized
[10:04:35] yeah so you can probably jump on there, puppetd --disable, manually undo the recent guard changes, etc
[10:05:11] I'll look at this more tomorrow, I need sleep
[10:05:39] * ori nods
[10:05:48] thanks again for your help, get some sleep
[10:07:37] * hashar points ori's bedroom
[10:14:27] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: Connection refused
[10:14:36] (CR) Andrew Bogott: "Marc, this is tested and working, but because it contains a reboot and is going to be deployed on every labs host I'd appreciate a second " [operations/puppet] - https://gerrit.wikimedia.org/r/115342 (owner: Andrew Bogott)
[10:16:11] (PS1) Hashar: Jenkins validation (DO NOT SUBMIT) [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115359
[10:16:27] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.003 second response time
[10:18:16] well, I can reproduce it
[10:21:03] * Nemo_bis wonders what hashar is doing near ori's bedroom
[10:24:22] (PS1) Andrew Bogott: Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360
[10:25:03] (CR) jenkins-bot: [V: -1] Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360 (owner: Andrew Bogott)
[10:26:18] (PS2) Andrew Bogott: Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360
[10:26:56] (CR) jenkins-bot: [V: -1] Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360 (owner: Andrew Bogott)
[10:32:08] :(
[10:32:38] (PS3) Andrew Bogott: Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360
[10:33:16] (CR) jenkins-bot: [V: -1] Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360 (owner: Andrew Bogott)
[10:33:43] (Abandoned) Hashar: Jenkins validation (DO NOT SUBMIT) [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115359 (owner: Hashar)
[10:35:06] (PS4) Andrew Bogott: Add some labs management scripts. [operations/puppet] - https://gerrit.wikimedia.org/r/115360
[10:43:54] !log Jenkins setting email-ext notification content type to HTML
[10:44:02] Logged the message, Master
[10:44:27] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: Connection reset by peer
[10:45:27] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.006 second response time
[10:48:11] FYI https://bugzilla.wikimedia.org/show_bug.cgi?id=58440
[10:52:28] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[11:08:18] Nikerabbit: awesome reply, kudos :)
[11:09:45] paravoid: thanks :)
[11:09:57] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[11:11:27] PROBLEM - Varnish HTTP text-frontend on cp1052 is CRITICAL: HTTP CRITICAL - No data received from host
[11:12:27] RECOVERY - Varnish HTTP text-frontend on cp1052 is OK: HTTP OK: HTTP/1.1 200 OK - 263 bytes in 0.082 second response time
[11:12:32] ori: still here?
[11:12:43] or bblack, but I suppose not
[11:12:43] yeah
[11:12:51] Feb 25 11:12:16 cp1052 kernel: [7530415.618971] varnishd[10165]: segfault at 0 ip 00007f346cbf15dc sp 00007f34883882c0 error 4 in vcl.p76JOIhH.so[7f346cbee000+10000]
[11:13:05] I'll revert
[11:13:51] just a sec
[11:13:55] paravoid: i have a fix
[11:14:13] root@cp1052:/etc/varnish# grep -c segfault /var/log/syslog
[11:14:13] 14341
[11:14:17] for the love of god
[11:15:27] (PS2) Ori.livneh: Handle NULL record->city [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[11:16:03] paravoid: ^
[11:17:06] ok, cp1052 needed a VCL reload
[11:17:21] it still spewed multiple segfaults per second
[11:17:50] yeah, i think bblack forgot to reload it with the updated vcl
[11:17:55] or maybe it has puppet disabled
[11:18:48] the change above should make it safe to enable, tho. it's easy to reproduce the issue
[11:19:56] see pm
[11:21:17] i suppose i should do that for lat/lon rather than rely on snprintf producing '(null)'
[11:24:47] that's a bug in the previously-existing code too, though the behavior of snprintf happens to be safe if unspecified
[11:25:04] s/if/but
[11:26:46] (PS3) Ori.livneh: Handle NULL record->city [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[11:31:11] (PS4) Ori.livneh: Handle NULL record->city [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[11:35:42] ori, if ( !string )?
[11:35:56] direct comparison is so PHP:P
[11:49:33] (PS1) Aude: Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366
[11:54:11] (CR) Addshore: [C: 1] Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366 (owner: Aude)
[11:57:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[12:55:58] Can someone take a look at [[rt:6853]] & [[bugzilla:56938]] to see if the change can go ahead soon? Thanks
[13:00:48] KTC: paste url and give us some background/context :D
[13:01:01] KTC: i dont think anyone is willing to copy paste those numbers and figure out what needs to be done there
[13:01:02] :D
[13:01:11] I'm already looking at it
[13:01:16] \O/
[13:01:23] https://bugzilla.wikimedia.org/show_bug.cgi?id=56938
[13:01:36] redirecting uk.wikimedia.org to wikimedia.org.uk
[13:01:41] and shutting down ukwikimedia
[13:02:26] thanks paravoid
[13:09:59] (PS3) Jeremyb: redirect ukwikimedia to wikimedia.org.uk [operations/apache-config] - https://gerrit.wikimedia.org/r/113877
[13:10:30] (CR) Faidon Liambotis: [C: 2 V: 2] redirect ukwikimedia to wikimedia.org.uk [operations/apache-config] - https://gerrit.wikimedia.org/r/113877 (owner: Jeremyb)
[13:14:07] :)
[13:15:04] Reedy: hey, want to give an opinion on https://gerrit.wikimedia.org/r/#/c/113878/ ?
[13:22:15] (CR) Steinsplitter: "i am not sure if it is allowed to redirect from a wmf domaint to a privat hosted (WM CHAPTER) website... the first time i see this." [operations/apache-config] - https://gerrit.wikimedia.org/r/113877 (owner: Jeremyb)
[13:25:41] (PS1) Steinsplitter: Revert "redirect ukwikimedia to wikimedia.org.uk" [operations/apache-config] - https://gerrit.wikimedia.org/r/115375
[13:32:49] wtf?
[13:33:15] *sigh*
[13:33:35] (Abandoned) Steinsplitter: Revert "redirect ukwikimedia to wikimedia.org.uk" [operations/apache-config] - https://gerrit.wikimedia.org/r/115375 (owner: Steinsplitter)
[13:34:01] misunderstanding :O
[13:34:55] (PS3) Jeremyb: close ukwikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113878
[13:35:05] (CR) Reedy: [C: 2] close ukwikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113878 (owner: Jeremyb)
[13:35:18] (Merged) jenkins-bot: close ukwikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113878 (owner: Jeremyb)
[13:39:36] Steinsplitter: no problem :)
[13:40:46] !log reedy synchronized database lists files:
[13:40:54] Logged the message, Master
[13:45:19] !log reedy synchronized wmf-config/InitialiseSettings.php
[13:45:19] Logged the message, Master
[14:10:57] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[14:37:07] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201713)
[14:38:08] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[14:42:08] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201234)
[14:44:08] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[14:55:08] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201054)
[14:58:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[15:41:29] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[15:50:17] !log reedy updated /a/common to {{Gerrit|Ib45270536}}: close ukwikimedia
[15:50:20] (PS1) Reedy: sort by dbname before outputting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115384
[15:50:25] Logged the message, Master
[15:51:12] (CR) Reedy: [C: 2] sort by dbname before outputting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115384 (owner: Reedy)
[15:51:21] (Merged) jenkins-bot: sort by dbname before outputting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115384 (owner: Reedy)
[15:51:40] (PS1) Reedy: Non wikipedias to 1.23wmf15 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115386
[15:55:12] (PS1) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[15:56:56] (CR) BBlack: "GeoIPCity's data (especially city-names) can have all kinds of crazy data that the cookie RFCs don't allow (last I checked, char encoding " [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[15:57:07] (CR) Manybubbles: contint: tweak browsertests URLs (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[15:59:46] (CR) Hashar: contint: tweak browsertests URLs (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[15:59:51] (PS2) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[16:29:36] (CR) Hashar: [C: 1] "Manually developed and applied on the labs instance running browsertests. Seems to fulfill our needs for now." [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[16:29:55] can someone merge in https://gerrit.wikimedia.org/r/115388 please? That is for contint browsertests in labs. Already tested/applied on the instance. Thank you!
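The "segfault at 0" lines in the syslog excerpts point to a NULL pointer being read inside the compiled VCL object. The fix went in as "Handle NULL record->city". The following is a minimal C sketch, not the code from change 115354, that builds the colon-separated cookie value with snprintf. It never passes NULL to a %s conversion: glibc happens to print "(null)" in that case, but the C standard leaves it undefined. A missing lookup result produces no cookie. The geo_fields struct, the function names, the field order and the two-decimal lat/lon format are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the few GeoIPRecord fields used here.
 * The real struct is declared in libGeoIP's GeoIPCity.h; this copy only
 * exists so the sketch compiles on its own. */
struct geo_fields {
    const char *country_code; /* may be "" (the two-NUL country code case) */
    const char *region;       /* may be NULL */
    const char *city;         /* may be NULL; reading through it unchecked faults at address 0 */
    float latitude;
    float longitude;
};

/* Never hand NULL to a %s conversion: substitute an empty string. */
static const char *or_empty(const char *s)
{
    return s != NULL ? s : "";
}

/* Write "country:region:city:lat:lon" into buf.
 * Returns the length written, or -1 if rec is NULL (no lookup result)
 * or the value would not fit; on failure buf holds an empty string. */
static int build_geo_cookie(char *buf, size_t len, const struct geo_fields *rec)
{
    int n;

    if (buf == NULL || len == 0)
        return -1;
    buf[0] = '\0';
    if (rec == NULL)
        return -1;

    n = snprintf(buf, len, "%s:%s:%s:%.2f:%.2f",
                 or_empty(rec->country_code),
                 or_empty(rec->region),
                 or_empty(rec->city),
                 rec->latitude, rec->longitude);
    if (n < 0 || (size_t)n >= len) {
        buf[0] = '\0'; /* refuse to emit a truncated cookie */
        return -1;
    }
    return n;
}
```

Returning -1 instead of a partial value means a truncated or half-built cookie is never sent. The caller can simply skip setting the cookie in that case.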
[16:34:32] (PS1) Aude: Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393
[16:35:21] (CR) Aude: [C: -1] "not to deploy until 19:00 UTC or later" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[16:35:46] (CR) Aude: "preferably deploy after wikidata is on wmf15" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[16:38:30] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[16:51:16] (PS1) RobH: assigning mgmt dns for tantalum server [operations/dns] - https://gerrit.wikimedia.org/r/115396
[16:51:28] (CR) jenkins-bot: [V: -1] assigning mgmt dns for tantalum server [operations/dns] - https://gerrit.wikimedia.org/r/115396 (owner: RobH)
[16:51:40] bleh
[16:53:03] hrmm
[16:53:10] it seems to fail for changes that arent mine
[16:53:16] ie: i think its failing things it shouldnt
[16:55:00] (PS1) RobH: disregard, testing patch submission in dns [operations/dns] - https://gerrit.wikimedia.org/r/115398
[16:55:48] (Abandoned) RobH: disregard, testing patch submission in dns [operations/dns] - https://gerrit.wikimedia.org/r/115398 (owner: RobH)
[16:57:18] (PS5) BBlack: Handle NULL record->city [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[16:57:46] (PS2) RobH: assigning mgmt dns for tantalum server [operations/dns] - https://gerrit.wikimedia.org/r/115396
[16:57:54] (CR) jenkins-bot: [V: -1] assigning mgmt dns for tantalum server [operations/dns] - https://gerrit.wikimedia.org/r/115396 (owner: RobH)
[16:58:11] (Abandoned) RobH: assigning mgmt dns for tantalum server [operations/dns] - https://gerrit.wikimedia.org/r/115396 (owner: RobH)
[16:58:20] (CR) BBlack: "^ Was done offline, you might want to validate that doesn't contain stupid errors, but that's the general idea." [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[16:58:35] ottomata: are you willing to consent to defeat?
[16:59:18] NO! one more week! also, we do still see flapping ISRs
[16:59:22] but only about twice a day
[17:00:10] (PS6) BBlack: Handle NULL record->city [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[17:01:40] (PS1) RobH: misc server tantalum dns name assignment [operations/dns] - https://gerrit.wikimedia.org/r/115400
[17:03:09] (CR) RobH: [C: 2] misc server tantalum dns name assignment [operations/dns] - https://gerrit.wikimedia.org/r/115400 (owner: RobH)
[17:03:29] fair enough :)
[17:03:46] (PS2) Alexandros Kosiaris: Renamed labstore100[34] to labsdb100[45] [operations/dns] - https://gerrit.wikimedia.org/r/110220
[17:04:58] (CR) Alexandros Kosiaris: [C: 2] Renamed labstore100[34] to labsdb100[45] [operations/dns] - https://gerrit.wikimedia.org/r/110220 (owner: Alexandros Kosiaris)
[17:07:04] one more week is especially unfair in this case!
[17:08:24] that's why i will continue lurking around for a while, ottomata is not going to get away from this bet this easily
[17:08:32] good
[17:08:37] ottomata: keep it up a few more weeks
[17:10:08] (PS3) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[17:10:18] +1
[17:10:59] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[17:13:16] (CR) Alexandros Kosiaris: [C: 2] PostgreSQL/Postgis module [operations/puppet] - https://gerrit.wikimedia.org/r/112469 (owner: Alexandros Kosiaris)
[17:15:17] (PS1) RobH: tantalum production ip assignment [operations/dns] - https://gerrit.wikimedia.org/r/115401
[17:16:08] (CR) RobH: [C: 2] tantalum production ip assignment [operations/dns] - https://gerrit.wikimedia.org/r/115401 (owner: RobH)
[17:16:56] is there documentation anywhere regarding code verification requirements of the production cluster? Question was brought up at https://gerrit.wikimedia.org/r/#/c/112699/
[17:19:25] (CR) Hashar: "Still have an issue with URL like /CirrusSearch/wiki/ causing a 503 MediaWiki error about redirecting loop detected :(" [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[17:19:33] (CR) Hashar: [C: -1] contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[17:19:36] ebernhardson: there might be something on wikitech, but what bryan said in the last comment is mostly true
[17:19:50] ebernhardson: if it's not a .deb, don't expect it to be deployed
[17:20:11] (i am also not really qualified to answer this, so take what i say with a grain of salt)
[17:21:03] yea what he wrote makes sense, but since he prefaced it with an uncertainty thought it was worth looking for doc's
[17:24:03] ebernhardson: https://wikitech.wikimedia.org/wiki/Package_management and the category? it all looks rather outdated though
[17:51:08] (PS1) Ottomata: Adding $brokers_array variable to role kafka config [operations/puppet] - https://gerrit.wikimedia.org/r/115406
[17:51:22] (PS2) Ottomata: Adding $brokers_array variable to role kafka config [operations/puppet] - https://gerrit.wikimedia.org/r/115406
[17:55:56] (PS7) Ori.livneh: Stricter validation for GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[17:57:17] (CR) Ori.livneh: [C: 1] "bblack: corrected a 'cookie' that should have been 'cookie_buf'." [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[17:57:36] (CR) Ottomata: [C: 2 V: 2] Adding $brokers_array variable to role kafka config [operations/puppet] - https://gerrit.wikimedia.org/r/115406 (owner: Ottomata)
[17:58:42] (PS1) Alexandros Kosiaris: Adding OSM role classes [operations/puppet] - https://gerrit.wikimedia.org/r/115409
[17:58:44] (PS1) Alexandros Kosiaris: Introduce labsdb100[45].eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/115410
[17:58:59] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[17:59:50] (PS1) Ottomata: Setting up kafkatee on analytics1003 to log mobile webrequest logs [operations/puppet] - https://gerrit.wikimedia.org/r/115411
[18:02:10] (CR) Alexandros Kosiaris: [C: 2] swift: remove lookupvar and replace with fact @ var [operations/puppet] - https://gerrit.wikimedia.org/r/112885 (owner: Matanya)
[18:02:32] (PS2) Ottomata: Setting up kafkatee on analytics1003 to log mobile webrequest logs [operations/puppet] - https://gerrit.wikimedia.org/r/115411
[18:03:22] (PS3) Ottomata: Setting up kafkatee on analytics1003 to log mobile webrequest logs [operations/puppet] - https://gerrit.wikimedia.org/r/115411
[18:12:08] !log patched OTRS for XSS vulnerability
[18:12:15] (PS1) Odder: Remove Flow from Meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412
[18:12:16] Logged the message, Master
[18:24:17] (CR) Chad: [C: 1] "I'm inclined to merge this. Meta clearly doesn't want Flow just yet (and at the very least, would like to have community consensus before " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:27:21] ori: re: charsets and whatnot, we could also just do our own encoding (e.g. base64) and let the JS consumer (or whatever) decode to get the original 8859-1 and do what it wants with it, but that kinda sucks for viewing the cookie manually
[18:27:39] (CR) Tychay: [C: -1] "Since when does Meta have a community? Please point to the discussion page on meta demanding we remove it. :-)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:27:47] the problem with just passing in the raw 8859-1 is apparently this causes some browsers (some Safaris anyways) to just drop the cookie
[18:28:26] (CR) Odder: "What a ridiculous comment." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:28:27] I'd really like to do the iconv thing ideally, but it seems like a heavyweight approach for something happening on every req in VCL :(
[18:28:42] well, once per session anyways
[18:30:03] (then again, maybe it's not that heavy in light of the fact that we're calling the geoip API already in the same case)
[18:31:26] (CR) Tychay: "I agree it's ridiculous to say "Meta clearly doesn't want Flow just yet (and at the very least, would like to have community consensus bef" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:33:08] ori: in any case, did you run PS7 with that test log already?
[18:33:42] (CR) Odder: "The patch also states that there was no consensus to enable it in the first place." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:36:08] manybubbles|food: meetin'
[18:38:36] (CR) Maryana: [C: -1] ""Meta clearly doesn't want Flow just yet" - well, no, Odder and MZMcBride do not want Flow to be enabled on Meta. That is not a community " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[18:41:36] (CR) Matanya: Setting up kafkatee on analytics1003 to log mobile webrequest logs (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/115411 (owner: Ottomata)
[18:41:59] bblack: yep
[18:42:25] bblack: b64 decoding in js is expensive, and the geoip data needs to be available quickly, so the decoding could not be deferred
[18:42:49] well, the ascii translit still loses data, but it loses less data than just underscores
[18:43:03] a lot of international cities will end up with underscores in the current approach
[18:43:39] bblack: using iconv, you mean?
[18:44:03] ascii translit meant using iconv, yes
[18:44:29] I'm playing with some test code now just to see exactly how GNU libiconv behaves and how ugly the code would look
[18:44:40] I've never really used the API before
[18:45:13] i think it's ok to mangle names with underscores.
my rationale is this: [18:45:27] the fundraiser (the primary user of the geoip data) doesn't even use city data [18:45:55] other users of this data are tolerated -- it's been exposed for so long that it's sort of unofficially a public api of sorts [18:46:19] but the primary use-case for city data that we are interested in supporting is allowing admins to create city-targeted notices [18:46:48] if you live in Genève you just have to figure out that it gets decoded to Gen_ve [18:46:55] we don't actually output the literal value anywhere [18:47:01] anywhere user-visible, i mean [18:48:39] RECOVERY - Host cp4009 is UP: PING OK - Packet loss = 0%, RTA = 74.26 ms [18:49:33] !log cp4009 came back from errors after power removal rt6890 [18:49:40] Logged the message, RobH [18:50:25] bblack: ^ [18:51:20] RobH: thanks! [18:51:29] PROBLEM - Varnish HTTP text-backend on cp4009 is CRITICAL: Connection refused [18:53:29] RECOVERY - Varnish HTTP text-backend on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.150 second response time [18:57:08] welcome [19:20:26] !log reedy synchronized php-1.23wmf15/extensions/Wikidata/ [19:20:34] Logged the message, Master [19:22:16] 4 Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini. in Unknown on line 0 [19:26:34] <^d> Reedy: Wow. [19:26:57] <^d> Ah, duh. 
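The GeoIP cookie thread above (18:27 to 18:47) weighs three ways to put an ISO-8859-1 city name into a cookie so that browsers such as Safari do not drop it: replace non-ASCII bytes with underscores (the current approach, so Genève becomes Gen_ve), transliterate to ASCII with iconv, or base64-encode the raw bytes. The C sketch below puts the three side by side. It is not the VCL inline C under review in change 115354. The function names are hypothetical, and the set of characters treated as cookie-unsafe is an assumption taken from the RFC 6265 cookie-octet grammar.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Option 1 (current approach): every byte that is not a legal RFC 6265
 * cookie-octet (controls, space, '"', ',', ';', '\\', and all bytes above
 * 0x7e, i.e. every non-ASCII ISO-8859-1 letter) becomes '_'.
 * outlen must be at least 1; longer input is truncated. */
void cookie_underscore(const char *in, char *out, size_t outlen) {
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < outlen; i++) {
        unsigned char c = (unsigned char)in[i];
        int unsafe = c < 0x21 || c > 0x7e || c == '"' || c == ',' ||
                     c == ';' || c == '\\';
        out[i] = unsafe ? '_' : (char)c;
    }
    out[i] = '\0';
}

/* Option 2: transliterate ISO-8859-1 to ASCII via iconv's //TRANSLIT
 * suffix (glibc and GNU libiconv). Output depends on the implementation
 * and locale: 0xE8 ("è") may come out as "e", "`e" or "?". Bytes iconv
 * rejects outright are replaced with '_'. outlen must be at least 1.
 * Returns 0 on success, -1 on error (e.g. output buffer too small). */
int cookie_translit(const char *in, char *out, size_t outlen) {
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "ISO-8859-1");
    if (cd == (iconv_t)-1) return -1;
    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outlen - 1;          /* keep room for the NUL */
    int rc = 0;
    while (inleft > 0 &&
           iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        if (errno == EILSEQ && outleft > 0) {
            *outp++ = '_'; outleft--;     /* untranslatable: substitute */
            inp++; inleft--;              /* and skip that input byte */
        } else {
            rc = -1;                      /* E2BIG or unexpected error */
            break;
        }
    }
    *outp = '\0';
    iconv_close(cd);
    return rc;
}

/* Option 3: base64 over the raw ISO-8859-1 bytes. Lossless, and every
 * base64 character (A-Z a-z 0-9 + / =) is a legal cookie-octet, but the
 * consumer must decode it and the value is unreadable by eye.
 * Output is silently truncated if outlen is too small. */
void cookie_base64(const char *in, char *out, size_t outlen) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char *s = (const unsigned char *)in;
    size_t n = strlen(in), o = 0;
    for (size_t i = 0; i < n && o + 4 < outlen; i += 3) {
        unsigned v = (unsigned)s[i] << 16;
        if (i + 1 < n) v |= (unsigned)s[i + 1] << 8;
        if (i + 2 < n) v |= s[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = i + 1 < n ? tbl[(v >> 6) & 63] : '=';
        out[o++] = i + 2 < n ? tbl[v & 63] : '=';
    }
    out[o] = '\0';
}
```

For the Latin-1 bytes of "Genève", option 1 gives Gen_ve and option 3 gives R2Vu6HZl, which illustrates the "kinda sucks for viewing the cookie manually" point. Transliterated output can still contain spaces (for example "San Francisco"), so a real implementation would probably run it through the underscore filter as well. glibc provides iconv in libc; systems using GNU libiconv separately may need -liconv at link time.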
[19:27:02] <^d> It's doing what it should :)
[19:37:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.23wmf15
[19:37:24] Logged the message, Master
[19:38:25] yay
[19:47:55] !log reedy synchronized wmf-config/InitialiseSettings.php 'I5a2b7b360be808e4780f14dda375af17930dec97'
[19:48:04] Logged the message, Master
[20:01:56] hmm we seem to have lost grrrit-wm
[20:06:29] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[20:08:36] (CR) Aude: [C: 1] Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[20:08:40] (PS2) Aude: Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393
[20:08:42] (CR) jenkins-bot: [V: -1] Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[20:08:44] (CR) Reedy: [C: 2] Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[20:08:46] (Merged) jenkins-bot: Enable data transclusion for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115393 (owner: Aude)
[20:08:56] (PS2) Aude: Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366
[20:09:07] (PS3) Aude: Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366
[20:09:10] (PS8) Ori.livneh: Stricter validation for GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115354
[20:09:15] (CR) BBlack: [C: 2] Stricter validation for GeoIP cookie [operations/puppet] - https://gerrit.wikimedia.org/r/115354 (owner: Ori.livneh)
[20:09:20] (CR) Aaron Schulz: [C: -1] "If it's just on a test page and for use with some WMF Programs Evaluations talk pages, then I don't see what the fuss is about. Is there s" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:09:35] (PS4) Aude: Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366
[20:09:39] (CR) Tobias Gritschacher: [C: 1] Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366 (owner: Aude)
[20:09:41] (CR) Addshore: [C: 1] Setup test.wikidata as repo for test2 and test.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115366 (owner: Aude)
[20:09:44] (CR) Tychay: ""EventLogging is hardly a good comparison because it is utterly un-user facing (There's also a difference between enabling things by defau" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:11:59] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[20:14:40] (CR) Tychay: ""2) Flow is only enabled on a single test page which is not visible"" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:17:40] (CR) Chad: [C: -1] "I reread everything and I agree with Aaron more now. Swapping my vote." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:20:11] (CR) PiRSquared17: "@Chad: "Since when does Meta have a community?" https://meta.wikimedia.org/wiki/Meta:About#Community" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:20:42] (CR) PiRSquared17: "I meant @Tychay, sorry" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:20:57] (CR) Chad: "I didn't say that ;-)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:23:32] * greg-g sighs
[20:28:56] (CR) John F. Lewis: [C: -1] "Looking at it; I actually agree with the -1's so far. A community consensus in my opinion is necessary for a forced change affecting the w" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:29:10] greg-g: Sighing about the above commit by any chance?
[20:30:15] JohnLewis: a lot of things, today.
[20:30:48] greg-g: Release management can be a tough thing then, or are they not related to work?
[20:31:08] JohnLewis: a little from column A, a little from column B. :)
[20:31:35] I have a trip to Michigan (I live in the bay area) for a sister-in-law wedding stay at the mother-in-laws for about a week :)
[20:31:52] leaving on Thursday morning for a cross country flight, with layover, with a 2 year old ;)
[20:31:57] Well that'll be a nice relaxing week then :D
[20:32:00] so, all is relative :)
[20:35:19] greg-g: Well enjoy the week off and enjoy the wedding :)
[20:35:57] JohnLewis: no week off, will be working from Michigan, just taking Thursday and ... uh, Wed(?) off for flight days.
[20:36:05] (the following wed, that is)
[20:36:11] PROBLEM - Puppet freshness on cp4012 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:35:12 PM UTC
[20:37:14] PROBLEM - Puppet freshness on cp4004 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:36:38 PM UTC
[20:37:14] PROBLEM - Puppet freshness on cp4003 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:35:57 PM UTC
[20:38:07] PROBLEM - Puppet freshness on cp4001 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:37:28 PM UTC
[20:38:07] greg-g: Well, enjoy remote working! :p
[20:38:31] JohnLewis: already do, 3 out of 5 days a week :P
[20:39:02] announcer's voice: And this concludes today's installment of "Details of Greg's life."
[20:40:10] greg-g: When the next installment? :D
[20:40:39] Join us next week at the same bat time, same bat channel.
[20:41:40] Fair enough :p
[20:42:32] PROBLEM - Puppet freshness on cp3014 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:37:58 PM UTC
[20:47:59] PROBLEM - Puppet freshness on cp4019 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:47:28 PM UTC
[20:48:49] (CR) Brian Wolff: "To clarify, my +1 is meant as symbolic support for social reasons. I have no technical objections to deploying flow to wikis for testing p" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:48:59] PROBLEM - Puppet freshness on cp3013 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:48:39 PM UTC
[20:56:59] PROBLEM - Puppet freshness on cp3021 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 05:55:57 PM UTC
[20:57:59] (CR) PiRSquared17: "This reverts I61cf356c832c5f8915cd20b8ceb789a2db674d03 (why not in commit msg?)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[20:59:59] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Sat 22 Feb 2014 02:36:40 PM UTC
[21:00:59] PROBLEM - Puppet freshness on cp3022 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:00:16 PM UTC
[21:02:59] PROBLEM - Puppet freshness on cp3020 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:01:49 PM UTC
[21:03:29] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[21:04:14] (CR) Eloquence: "We do not operate on the basis of obtaining per-wiki consensus prior to any software change to Wikimedia Foundation wikis. We never have, " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[21:04:59] PROBLEM - Puppet freshness on cp4020 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:03:52 PM UTC
[21:06:22] (CR) Eloquence: "To be clear, if we failed to communicate effectively and well about what was about to happen (which I understand to be a very limited test" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[21:06:47] (PS4) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[21:06:59] PROBLEM - Puppet freshness on cp4011 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:06:33 PM UTC
[21:07:23] (CR) Hashar: "Further tweak to the rewrite rule. It was missing /wiki/ and did not point to the document root. Seems to be working now." [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[21:08:36] (PS5) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[21:08:52] (CR) Hashar: "And forgot acceptpathinfo :-D" [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[21:11:59] PROBLEM - Puppet freshness on cp3012 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:11:01 PM UTC
[21:14:59] PROBLEM - Puppet freshness on cp4002 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:14:23 PM UTC
[21:15:59] PROBLEM - Puppet freshness on cp3011 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:15:14 PM UTC
[21:22:34] (CR) PiRSquared17: "I64f209ce3051854df181cd4a5f063a1885e50eb8 reverts this" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114088 (owner: EBernhardson)
[21:25:29] (PS1) Hoo man: Bump the Cache Epoch for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115514
[21:25:59] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:25:24 PM UTC
[21:27:38] (PS2) Hoo man: Bump the parser cache epoch for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115514
[21:27:42] aude: ^
[21:33:32] greg-g: is it okay if i approve/deploy that
[21:33:35] Reedy:
[21:33:59] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[21:34:00] sync-common-file is what to use?
[21:35:44] (CR) Aude: [C: 1] "looks good. fine if Reedy can deploy (or i can try, at time that is open)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115514 (owner: Hoo man)
[21:38:40] aude: sync-file does some extra syntax checks etc. but both should be fine AFAIR
[21:38:52] if greg-g is good with it, I'd say: Go for it :)
[21:39:21] i see, sync-file is good
[21:39:49] (PS6) Hashar: contint: tweak browsertests URLs [operations/puppet] - https://gerrit.wikimedia.org/r/115388
[21:40:27] hashar: that bad bot should be banned.simple enough
[21:40:48] mutante: which bot ? :D
[21:41:46] aude: I wonder whether such very minor config. deploys are usually fine...
[21:41:57] hashar: ZumBot
[21:42:05] contint: prevent user-agent ZumBot on Jenkins
[21:42:15] I mean if no one else is deploying
[21:42:33] in "proxy_jenkins"
[21:42:54] it's kind of bad to leave stale, malformatted stuff on wikidata
[21:43:51] mutante: yeah I spot it randomly.
[21:43:58] flow is scheduled to deploy at 22:00 utc( so like 15 min)
[21:44:01] from now
[21:44:03] aude: Flow deployment window will be in 15 min
[21:44:07] should be clear now, though
[21:44:08] right
[21:44:26] mutante: if you get bandwidth can you please +2 /merge https://gerrit.wikimedia.org/r/115388 ? That is for a contint website in labs :]
[21:44:36] mutante: finally made it work and I am happy with that version
[21:44:48] hashar: Can you answer? Is it ok to just quickly deploy a config. fix? :P
[21:45:07] hashar: looking
[21:45:19] hoo: ask reedy there is a wmf deploy tonight ongoing.
[21:45:40] hashar: His windows is closed and the next one is not yet up
[21:45:44] hoo: so I guess that depends on the config fix :D
[21:45:53] we didk 15 min ago
[21:46:04] hashar: It fixes heavy UI problems on wikidata
[21:46:05] and greg but no answer
[21:46:15] greg is busy with flow :D
[21:46:24] * aude nods
[21:46:41] i wait 2-3 more minutes
[21:46:50] and I dont feel like invalidating the cache for Wikibase at 11pm :]
[21:46:54] Yep
[21:47:00] hashar: We did that twice before :P
[21:47:03] (CR) Dzahn: [C: 2] "sure, hashar needs this and it's just mod_rewrite and a rewrite rule for labs contint site" [operations/puppet] - https://gerrit.wikimedia.org/r/115388 (owner: Hashar)
[21:47:20] We should be more stable at these kind of things...
[21:48:06] http://gdash.wikimedia.org/dashboards/pcache/
[21:48:20] we did on feb 3 i believe
[21:48:41] seems not a problem
[21:49:15] Date: Mon Feb 3 21:32:20 2014 +0100
[21:49:15] Bump wgCacheEpoch for Wikidata
[21:49:16] yep
[21:49:23] hashar: it ran on gallium
[21:49:56] seeing $wgCacheEpoch kind of surprise me. I thought it disappeared some how :-]
[21:49:56] mutante: danke :-]
[21:50:03] de rien
[21:50:07] mutante: and thanks for the Bugzilla migration to eqiad/puppetization/kaulen phase out
[21:50:11] hashar: nope
[21:50:48] mutante: thanks
[21:50:53] aude: go for it? Before we get into Flow's deploy window...
[21:51:14] alright
[21:51:18] (CR) Aude: [C: 2] Bump the parser cache epoch for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115514 (owner: Hoo man)
[21:51:25] (Merged) jenkins-bot: Bump the parser cache epoch for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115514 (owner: Hoo man)
[21:51:29] can't wait all night
[21:51:34] !!!!!!!!!!!!!!!
[21:51:43] hashar: ?
[21:51:48] be bold!
[21:54:25] it doesn't like my ssh key :(
[21:54:50] aude: oO
[21:54:55] got ProxCommand for the mw*
[21:54:56] ?
[21:55:05] * ProxyCommand
[21:55:24] * hoo can't jump in
[21:56:04] our window is 2 hours but we only need like 15min today, we can delay a few if you need
[21:56:29] ebernhardson: Would be great, as we're having trouble atm :)
[21:56:43] ok, just let me know when you are done
[21:56:49] ok
[21:57:07] sure
[21:57:26] if it takes more than 5 min, then i give up and maybe reedy or someone can deploy it
[21:57:37] mh
[22:00:47] (CR) Matthias Mullie: [C: -1] "Though I'm biased, I see more reasons to leave it on, than to turn it off." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[22:01:46] i'm doing proxy and still doesn't work
[22:02:00] what does it say?
[22:02:18] Permission denied (publickey).
[22:02:24] yikes
[22:02:30] sync-common-file?
[22:02:37] i need to pull from git
[22:02:42] fetch*
[22:04:37] yep
[22:07:03] if no one is around
[22:07:36] i can do ssh -p 29418 gerrit.wikimedia.org and debug
[22:07:51] but not block ebernhardson
[22:07:51] aude: ssh to gerrit is the issue?
[22:07:56] yes
[22:08:06] the ssh thing works fine elsewhere
[22:08:11] maybe i can try on labs
[22:08:13] I see
[22:09:49] (PS1) Dzahn: decom Tampa, remove "maurus" from dsh [operations/puppet] - https://gerrit.wikimedia.org/r/115520
[22:10:13] aude: only question I'd have is: I haven't seen these epoch edits much, are they scary, cache busting wise?
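The question just asked, whether epoch bumps are "scary, cache busting wise", comes down to how an epoch invalidates. Nothing is purged up front; any cached rendering stored at or before the epoch is treated as a miss, and re-parsed, the next time it is requested, so the cost arrives as extra misses spread over later page views. The C sketch below models that check. It is a hypothetical illustration, not MediaWiki's ParserCache code; the struct, the function names and the use of 14-digit YYYYMMDDHHMMSS timestamps as integers are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry. Timestamps are 14-digit YYYYMMDDHHMMSS values
 * stored as integers, so chronological order equals numeric order. */
struct cache_entry {
    uint64_t cache_time; /* when this rendering was parsed and stored */
    const char *html;    /* the cached rendering */
};

/* An entry is stale if it was stored at or before the cache epoch, or
 * before the page itself was last touched (edited). */
bool cache_entry_expired(const struct cache_entry *e,
                         uint64_t page_touched, uint64_t cache_epoch) {
    return e->cache_time <= cache_epoch || e->cache_time < page_touched;
}

/* Returns the cached HTML, or NULL to tell the caller to re-parse the page
 * and store a fresh entry. Bumping cache_epoch deletes nothing; every older
 * entry simply starts returning NULL here the next time it is read. */
const char *cache_lookup(const struct cache_entry *e,
                         uint64_t page_touched, uint64_t cache_epoch) {
    if (e == NULL || cache_entry_expired(e, page_touched, cache_epoch))
        return NULL;
    return e->html;
}
```

Under this model the risk of a bump is load rather than correctness: the first view of each affected page after the change pays for a fresh parse, which is the effect one would watch for on the pcache dashboard mentioned in the log.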
[22:10:21] * greg-g just got out of a meeting, sorry
[22:10:50] greg-g: We did that twice in the last weeks and nothing broke
[22:10:53] we did this on feb 3 and seemed fine
[22:10:56] http://gdash.wikimedia.org/dashboards/pcache/
[22:11:02] sweet
[22:11:06] (CR) Dzahn: [C: 2] "unknown host maurus" [operations/puppet] - https://gerrit.wikimedia.org/r/115520 (owner: Dzahn)
[22:11:13] if someone can deploy it for me, that would be great
[22:11:30] i can take time whenever to debug ssh to gerrit, but not block flow
[22:11:46] ebernhardson: thanks in advance if you can do that wikidata epoch deploy with yours
[22:11:49] :)
[22:12:05] whats the change? i can probably deploy it
[22:12:08] right now wikidata is using old parsed stuff, which didn't link items
[22:12:10] aude: is it just me, or are the deploy verticle lines not showing up on gdash anymore? (be sure to tick the deploys checkbox)
[22:12:22] ebernhardson: https://gerrit.wikimedia.org/r/#/c/115514/
[22:12:30] (CR) Dzahn: "this file is like a countdown for Tampa, help killing it line by line" [operations/puppet] - https://gerrit.wikimedia.org/r/115520 (owner: Dzahn)
[22:12:30] ebernhardson: https://gerrit.wikimedia.org/r/#/c/115514/
[22:12:33] already merged
[22:12:35] it's already merged, just needs to be deployed
[22:12:39] heh, jinx
[22:12:47] gerrit just doesn't like my ssh key
[22:13:10] idk, do i have to put my production ssh public key into gerrit?
[22:13:32] !log ebernhardson synchronized wmf-config/Wikibase.php
[22:13:33] I wouldn't without confirmation (I don't know)
[22:13:39] Logged the message, Master
[22:13:41] agree
[22:13:43] go ahead and test, its out now
[22:13:47] ebernhardson: thanks
[22:14:07] did the triick
[22:14:09] trick*
[22:14:54] greg-g: i don't see deploy lines in gdash, btw
[22:15:28] * greg-g files bug
[22:16:27] aude: So... how'd you even get git to work? If I do `git status` on terbium I only get a warningn that it's not a git repo -.-
[22:16:35] hmmm, maybe an RT ticket? I dont' see an obvious bugzilla product/component
[22:16:53] hoo: /a/common is a git repo
[22:17:01] i can do git log
[22:17:21] hoo@terbium:/a/common$ git status
[22:17:22] fatal: Not a git repository (or any of the parent directories): .git
[22:17:35] maybe I would need to go onto tin for that? :/
[22:17:37] i am on tin
[22:19:47] mh
[22:20:00] aude: You just do 'git log' in there with your own account and it works?
[22:20:49] no idea if it's really with my account for git log
[22:20:56] !log ebernhardson synchronized php-1.23wmf15/extensions/Flow
[22:20:57] but it just works
[22:21:03] Logged the message, Master
[22:27:10] aude: I'm pretty sure you need to forward both your gerrit ssh key and your cluster ssh key into tin. `ssh-add -l` from tin needs to show both. The first for git fetch and the second for dsh
[22:27:41] bd808: ok
[22:28:27] I don't know the answer to "is it safe to add my cluster key to gerrit". It seems like it would be since your public key is already in operations/puppet.git and it's well just a public key
[22:32:16] <^d> I tend to use two separate keys, but bd808 is right.
[22:32:26] that's what i thought, but can try forwarding both keys
[22:35:59] (PS1) Dzahn: decom - remove rose, pdf1 and tingxi from dsh [operations/puppet] - https://gerrit.wikimedia.org/r/115521
[22:41:59] PROBLEM - RAID on labstore1 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
[22:52:05] (CR) Dzahn: [C: 2] decom - remove rose, pdf1 and tingxi from dsh [operations/puppet] - https://gerrit.wikimedia.org/r/115521 (owner: Dzahn)
[22:54:33] (PS1) BBlack: Random stab at fixing breakage in 49f9f658 [operations/puppet] - https://gerrit.wikimedia.org/r/115524
[22:56:05] (PS2) BBlack: Random stab at fixing breakage in 49f9f658 [operations/puppet] - https://gerrit.wikimedia.org/r/115524
[22:57:05] * bblack stabs gerrit for not having the git commitid available *anywhere* on the main page of a changeset in the web view :P
[22:57:35] (CR) BBlack: [C: 2 V: 2] Random stab at fixing breakage in 49f9f658 [operations/puppet] - https://gerrit.wikimedia.org/r/115524 (owner: BBlack)
[22:58:44] (CR) Dzahn: ""Requires hash to work with" and the commented stuff..makes sense..right" [operations/puppet] - https://gerrit.wikimedia.org/r/115524 (owner: BBlack)
[22:59:05] (PS1) Ori.livneh: text caches: enable_geoiplookup => true [operations/puppet] - https://gerrit.wikimedia.org/r/115525
[22:59:29] RECOVERY - Puppet freshness on cp4012 is OK: puppet ran at Tue Feb 25 22:59:27 UTC 2014
[22:59:35] bblack: probably best to add ottomata
[22:59:40] ah, nice
[23:00:01] cp4012 is the one I ran manually just now to confirm
[23:00:09] but yeah I'll add him so it pings him or whatever
[23:00:16] yea, saw recovery, cool
[23:00:29] RECOVERY - Puppet freshness on cp3022 is OK: puppet ran at Tue Feb 25 23:00:23 UTC 2014
[23:00:38] yea, it should mail him for being added as reviewer
[23:00:42] even when it's already merged
[23:00:49] sometimes i use that as an "fyi"
[23:01:50] ori: re: enable_geoiplookup, I'll merge that in a bit, but I haven't had a chance to try it manually anywhere yet and look around at it
[23:02:19] RECOVERY - Puppet freshness on cp3020 is OK: puppet ran at Tue Feb 25 23:02:16 UTC 2014
[23:04:09] RECOVERY - Puppet freshness on cp4020 is OK: puppet ran at Tue Feb 25 23:03:58 UTC 2014
[23:05:49] RECOVERY - Puppet freshness on cp4003 is OK: puppet ran at Tue Feb 25 23:05:39 UTC 2014
[23:06:29] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[23:06:49] RECOVERY - Puppet freshness on cp4011 is OK: puppet ran at Tue Feb 25 23:06:45 UTC 2014
[23:06:49] RECOVERY - Puppet freshness on cp4004 is OK: puppet ran at Tue Feb 25 23:06:45 UTC 2014
[23:07:59] RECOVERY - Puppet freshness on cp4001 is OK: puppet ran at Tue Feb 25 23:07:56 UTC 2014
[23:08:19] RECOVERY - Puppet freshness on cp3014 is OK: puppet ran at Tue Feb 25 23:08:11 UTC 2014
[23:10:49] RECOVERY - Puppet freshness on cp3012 is OK: puppet ran at Tue Feb 25 23:10:38 UTC 2014
[23:12:59] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Fri 21 Feb 2014 04:42:42 PM UTC
[23:15:09] RECOVERY - Puppet freshness on cp4002 is OK: puppet ran at Tue Feb 25 23:15:02 UTC 2014
[23:15:10] RECOVERY - Puppet freshness on cp3011 is OK: puppet ran at Tue Feb 25 23:15:07 UTC 2014
[23:18:29] RECOVERY - Puppet freshness on cp4019 is OK: puppet ran at Tue Feb 25 23:18:18 UTC 2014
[23:18:59] RECOVERY - Puppet freshness on cp3013 is OK: puppet ran at Tue Feb 25 23:18:54 UTC 2014
[23:25:39] RECOVERY - Puppet freshness on cp3019 is OK: puppet ran at Tue Feb 25 23:25:36 UTC 2014
[23:26:19] RECOVERY - Puppet freshness on cp3021 is OK: puppet ran at Tue Feb 25 23:26:17 UTC 2014
[23:27:49] (CR) Tychay: "@PiRSquared17 It was a little bit tongue-in-cheek. I was just implying the logistics of "gathering community consensus to enable Flow" on" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[23:45:58] (PS1) Dzahn: remove sockpuppet from puppet [operations/puppet] - https://gerrit.wikimedia.org/r/115527