[00:00:16] (PS1) Chad: Disable mergehistory everywhere [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137499
[00:04:19] Ok, I'm out for a few days. Later folks.
[00:06:48] (PS4) BBlack: Switch LVS servers to include standard [operations/puppet] - https://gerrit.wikimedia.org/r/20681 (owner: Faidon Liambotis)
[00:13:44] RECOVERY - Disk space on gallium is OK: DISK OK
[00:24:44] (PS1) Yurik: Fixed labs config for ZeroBanner ext [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137508
[00:28:50] greg-g, i need to fix labs config file, should i sync-file it too? https://gerrit.wikimedia.org/r/#/c/137508/
[00:29:43] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed 04 Jun 2014 21:28:55 UTC
[00:30:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:32:01] (CR) Yurik: [C: 2] "minor fix for labs, not synced to prod just yet, but since this is a non-prod file, its ok to sync it later." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137508 (owner: Yurik)
[00:32:07] (Merged) jenkins-bot: Fixed labs config for ZeroBanner ext [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137508 (owner: Yurik)
[00:32:32] (PS1) BBlack: Remove temporary file-removal for /etc/init/enable-rps... [operations/puppet] - https://gerrit.wikimedia.org/r/137510
[00:32:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:32:52] (CR) BBlack: [C: 2 V: 2] Remove temporary file-removal for /etc/init/enable-rps... [operations/puppet] - https://gerrit.wikimedia.org/r/137510 (owner: BBlack)
[00:34:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:36:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:38:07] (PS5) BBlack: Switch LVS servers to include standard [operations/puppet] - https://gerrit.wikimedia.org/r/20681 (owner: Faidon Liambotis)
[00:38:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:40:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:41:14] (CR) BBlack: [C: 2 V: 2] "The merged version is only for lvs300[1234] testing." [operations/puppet] - https://gerrit.wikimedia.org/r/20681 (owner: Faidon Liambotis)
[00:42:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:44:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:46:40] !log lvs3002 (live uploads lb for esams) is running ntpd
[00:46:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:46:45] Logged the message, Master
[00:48:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:48:52] paravoid: Is stream.wikimedia.org supposed to be operational?
[00:50:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:51:26] Krinkle: _joe_ was working on it, IIRC
[00:51:39] I just noticed that the ngins server(s) is live
[00:51:40] nginx*
[00:51:46] websocket is rejecting sonnection though
[00:51:49] connection*
[00:51:50] (404)
[00:52:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:53:50] Krinkle: stream.wikimedia.org/rc/ is no 404 though
[00:53:54] (only with trailing slash)
[00:54:02] Krinkle: dies with something else :)
[00:54:08] both /rc and /rc/ are 404
[00:54:29] access point / is default http response (because we don't use it for port 80 http, only for web sockets)
[00:54:34] ("welcome to nginx")
[00:54:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:56:18] hmm
[00:56:32] Krinkle: was getting a 500 instead on rc/ a while ago, but oh well
[00:56:43] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 00:27:53 UTC
[00:57:22] It is pointing to stream-lb.eqiad.wikimedia.org, which is responding.
[00:57:23] cool!
[00:58:23] RECOVERY - Puppet freshness on mw1168 is OK: puppet ran at Thu Jun 5 00:58:19 UTC 2014
[01:01:15] Ah, it was set up 2 days ago
[01:01:15] https://gerrit.wikimedia.org/r/#/q/message:stream+is:merged+project:operations/puppet+branch:production,n,z
[01:01:26] (domain name, lvs and nginx)
[01:04:18] Hm.. the labs one is broken as well
[01:04:29] Looks like maybe something broke it in the puppetisation
[01:04:57] stream.wmflabs.org
[01:13:35] Krinkle: i'll copy you joe's status e-mail re: rc
[01:13:42] cool
[01:13:58] 'rcstreamctl stream' is useful!
[01:17:06] ori: Can you get it to work from the client?
[01:17:46] Krinkle: no, it's broken because "I proxied everything to the /rc location, but later realized that may be not needed as the python app already does that. "
[01:18:02] Ah, "not needed" !== "conflicts with"
[01:18:08] But I understand now
[01:18:11] it's trying to connect to: http://stream.wikimedia.org/socket.io/1/?t=1401931077784
[01:18:17] yeah
[01:18:19] but try: http://stream.wikimedia.org/rc/socket.io/1/?t=1401931077784
[01:18:24] How
[01:18:37] i mean just load it in yr browser
[01:18:54] sure
[01:20:13] is anything there supposed to work in a browser?
[01:20:31] The main purpose is consumption by browsers, yes. HTML5 Websockets.
[01:20:39] Not HTML rendering though
[01:20:44] what URL do I hit to see something useful?
[01:21:12] bblack: load socket.io on some page and connect to it
[01:21:14] example: http://codepen.io/Krinkle/full/laucI
[01:21:31] var socket = io.connect('stream.wikimedia.org/rc');
[01:21:35] (in javascript)
[01:21:41] hmm, do rc tags come through as well?
[01:21:59] ori: Is there a way (without hacking socket.io src) right now to get a socket going?
[01:22:15] if I specify /rc it gets 404, if I leave it off, it fails because it requests /1/?t= from the wrong url
[01:22:17] Krinkle: there's a simple python client source on the wikitech rcstream page
[01:22:44] bblack: alternatively, exec `rcstreamctl stream` on a rcstream node in production to feed it directly to your console
[01:23:07] (or on deployment-stream.eqiad.wmflabs in labs, that works too)
[01:23:32] https://gist.github.com/Krinkle/81208423a584a1880ab3
[01:23:46] https://gist.githubusercontent.com/Krinkle/81208423a584a1880ab3/raw
[01:23:53] Mingle just got git integration: http://getmingle.io/scaling/2014/06/02/Git-Integration.html
[01:23:59] any chance we can use this?
[01:27:23] self.emit('subscribe', 'commons.wikimedia.org') ... is this enabled for all sites if I change that to something more-active?
[01:27:52] Use '*' if you like
[01:27:55] and yes
[01:30:43] hmmm I got a 502
[01:30:44] http://paste.debian.net/hidden/113d1ef4/
[01:31:18] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Wed 04 Jun 2014 22:30:13 UTC
[01:31:18] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[01:32:22] (using this as a client: http://paste.debian.net/hidden/fe6fd0be/ )
[01:35:44] Hm.. can't use rcstreamctl in production since I can't ssh into that node
[01:35:44] redis-cli -h "rcs1001.eqiad.wmnet" -p "6379" PSUBSCRIBE "rc.*"
[01:35:47] that works though (from tin)
[01:35:48] bblack:
[01:36:05] use rc.commonswiki for example
[01:36:25] i gotta run, sorry
[01:36:28] bbl
[02:00:18] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 5 02:00:08 UTC 2014
[02:17:49] (PS1) Springle: s7 depool db1007 for crash bug hunt [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137522
[02:19:20] (CR) Springle: [C: 2] s7 depool db1007 for crash bug hunt [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137522 (owner: Springle)
[02:19:26] (Merged) jenkins-bot: s7 depool db1007 for crash bug hunt [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137522 (owner: Springle)
[02:23:27] !log springle Synchronized wmf-config/db-eqiad.php: depool db1007 (duration: 01m 26s)
[02:23:33] Logged the message, Master
[02:33:10] !log LocalisationUpdate completed (1.24wmf6) at 2014-06-05 02:32:06+00:00
[02:33:14] Logged the message, Master
[02:45:23] (PS1) Springle: Prepare to replicate additional tables to labsdb (with Legal OK): [operations/puppet] - https://gerrit.wikimedia.org/r/137526
[02:47:00] (CR) Springle: [C: 2] Prepare to replicate additional tables to labsdb (with Legal OK): [operations/puppet] - https://gerrit.wikimedia.org/r/137526 (owner: Springle)
[03:03:03] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-05 03:02:00+00:00
[03:03:08] Logged the message, Master
[03:30:18] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed 04 Jun 2014 21:28:55 UTC
[04:10:56] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 5 04:09:50 UTC 2014 (duration 9m 49s)
[04:11:01] Logged the message, Master
[04:20:35] (PS1) Mattflaschen: Enable GuidedTour on additional language Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137533
[04:32:18] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[06:31:18] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed 04 Jun 2014 21:28:55 UTC
[06:33:07] ACKNOWLEDGEMENT - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed 04 Jun 2014 21:28:55 UTC ori.livneh ori, testing
[06:40:54] (PS7) Physikerwelt: Describe Math related packages in a class [operations/puppet] - https://gerrit.wikimedia.org/r/115133 (https://bugzilla.wikimedia.org/61090) (owner: Hashar)
[06:49:58] <_joe_> ori: you here?
[06:50:10] hello
[06:50:34] <_joe_> hey :)
[06:50:44] how are you? did you manage to get some rest?
[06:50:51] <_joe_> so, I'm on a train now and today I'll work 'late'
[06:50:54] <_joe_> no :/
[06:51:41] <_joe_> just wanted to know if there is something I should take care about for rcstream while you get some rest :)
[06:52:07] <_joe_> If the puppet3 migration goes well, I'll have time in the afternoon
[06:52:32] good morning
[06:52:43] <_joe_> I've seen the urls are something like '/rc/socket.io/something'
[06:53:09] <_joe_> which sucks :) can't we rewrite the socket.io part in nginx?
[06:53:58] i suppose we could, but the original would have worked, and this is starting to get a bit complex for a theoretical future requirement
[06:55:38] <_joe_> ok
[06:55:56] <_joe_> so we'll just use '/' as location on nginx, that;s good
[06:56:20] * ori nods
[06:56:34] <_joe_> I just wanted to use a namespace for rcstream as we may want to add other streaming services in the future
[06:57:11] <_joe_> if we will do that, we'll just use a rewrite to redirect old clients :)
[06:57:12] well you could /rc-anothersystem/
[06:57:13] i think it's okay; streaming changes is a big deal
[06:57:37] big enough to hog a subdomain all to itself, imo
[06:57:38] <_joe_> still, we have to wait for SSL
[06:57:49] <_joe_> yes
[06:58:10] <_joe_> I'll bake a patch (and downvote it to -2) for SSL
[06:58:22] how come?
[06:58:27] <_joe_> I'd like to use PFS-only chiphers
[06:58:33] <_joe_> ori: we need a cert :)
[06:59:00] ah, right, of course.
[06:59:10] <_joe_> but using PFS-only could mean older libs could run into issues
[06:59:16] <_joe_> honestly, I don't care
[06:59:33] i don't know the first thing about it
[06:59:41] so happy to go with whatever you think is best
[06:59:53] <_joe_> it's the devs responsibility to use updated libs
[07:00:41] <_joe_> ori: PFS guarantees that communications are undecryptable even if someone eavesdropped on our communications with rcstream and later fetched the SSL private key
[07:00:56] <_joe_> someone as $any_spy_agency
[07:01:43] it's public data
[07:01:46] <_joe_> it's a best practice and we have changes to get to use that for bugzilla and the wikis that are being debated
[07:01:58] sure, okay
[07:06:03] _joe_: i'm off to sleep. i'm very glad to have your help with this. good luck with puppet 3.
[07:06:13] <_joe_> ori: see you later
[07:06:17] <_joe_> I'm off as well
[07:06:20] <_joe_> :)
[07:06:31] * ori loves train rides
[07:06:44] sleep well ori!
[07:06:44] <_joe_> have some rest
[07:06:53] headphones + ipod + train = best feeling
[07:06:58] hashar: :)
[07:07:31] <_joe_> yeah, atm s/ipod/itunes/
[07:07:50] <_joe_> ok, off as the big tunnels of central italy begin :)
[07:07:55] I would go for an empty beach + a storm
[07:12:31] (CR) Hashar: [C: -1] "git::clone will abort when /vendor/ exists and is not a git copy :-(" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137463 (owner: BryanDavis)
[07:14:27] (CR) Hashar: [C: 1] jenkins user add to systemusers [operations/puppet] - https://gerrit.wikimedia.org/r/137390 (owner: Rush)
[07:16:39] (CR) Hashar: "And the question now is: do we really want to expose our slow parses publicly?" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[07:22:03] (CR) Nemo bis: "Hashar, yes, without doubt. If your fear is DoS, well any user can create a slow parse item on an abandoned wiki nobody watches and hit th" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[07:22:25] Nemo_bis: thanks :)
[07:33:18] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[07:56:16] (PS4) Hashar: contint: localhost.mediawiki vhost on ci labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/135529
[07:56:51] (CR) Hashar: "This is already deployed on the contint slaves." [operations/puppet] - https://gerrit.wikimedia.org/r/135529 (owner: Hashar)
[08:22:22] _joe_: ori: Hm.. looks like puppet is not matching what is in production. For example, on Ganglia there is no "RCStream" group. They are part of Misc eqiad still.
[08:22:31] https://github.com/wikimedia/operations-puppet/blob/production/manifests/ganglia.pp#L338
[08:34:18] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 05:33:30 UTC
[09:03:08] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jun 5 09:03:06 UTC 2014
[09:16:30] anyone knows about statsd by any chance ? The gerrit.event.*.count metrics sent by Zuul are broken and stuck at 1
[09:23:07] (PS2) Filippo Giunchedi: merge bits appservers into appservers [operations/puppet] - https://gerrit.wikimedia.org/r/136317
[09:23:20] <_joe_> Krinkle: where did you put the 'cluster' variable?
[09:23:52] (CR) Filippo Giunchedi: [C: 2 V: 2] merge bits appservers into appservers [operations/puppet] - https://gerrit.wikimedia.org/r/136317 (owner: Filippo Giunchedi)
[09:24:47] !log moving bits traffic to the general appserver pool in eqiad
[09:24:52] Logged the message, Master
[09:27:56] mhh puppet didn't like that
[09:27:57] err: Could not retrieve catalog from remote server: Error 400 on SERVER: comparison of String with Hash failed at /etc/puppet/modules/varnish/manifests/instance.pp:37 on node cp1056.eqiad.wmnet
[09:28:52] looks like I should use flatten instead to convert the hash
[09:38:15] (PS1) Filippo Giunchedi: properly flatten the bits backend list [operations/puppet] - https://gerrit.wikimedia.org/r/137550
[09:38:43] somebody available to quickly review https://gerrit.wikimedia.org/r/#/c/137550/ ?
[09:39:17] _joe_: I'm not sure what you're asking. I didn't put anything anywhere :|
[09:41:00] <_joe_> Krinkle: yeah don't worry
[09:42:17] (CR) Filippo Giunchedi: [C: 2 V: 2] properly flatten the bits backend list [operations/puppet] - https://gerrit.wikimedia.org/r/137550 (owner: Filippo Giunchedi)
[09:42:39] <_joe_> godog: seems legit :)
[09:42:52] going for it, thanks!
[09:43:47] godog: I'd go for bits_appservers => $lvs::configuration::lvs_service_ips['production']['apaches'] instead
[09:44:18] remove the pmtpa/eqiad bits there, basically copy it from the "appservers" block above
[09:44:23] sorry I jumped the gun :(
[09:44:27] nah it's fine
[09:44:30] (CR) Phuedx: [C: 1] Enable GuidedTour on additional language Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137533 (owner: Mattflaschen)
[09:44:37] it's more stylistic than anything
[09:45:00] and just double check under cache.pp the text vs. bits config, the lookups should be approx. the same
[09:45:36] yup, will do
[09:55:13] (PS1) Hashar: zuul: bring in python-prettytable [operations/puppet] - https://gerrit.wikimedia.org/r/137552
[09:56:57] (CR) Hashar: [C: 1 V: 2] "Cherry picked on contint puppetmaster." [operations/puppet] - https://gerrit.wikimedia.org/r/137552 (owner: Hashar)
[09:58:25] (PS2) Hashar: zuul: bring in python babel and prettytable modules [operations/puppet] - https://gerrit.wikimedia.org/r/137552
[10:09:34] (Abandoned) Matanya: applicationserver: lint and tidy [operations/puppet] - https://gerrit.wikimedia.org/r/122269 (owner: Matanya)
[10:09:58] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[10:12:59] taking a look at tungsten, bits appservers in eqiad has died down too
[10:13:27] <_joe_> godog: what?
[10:13:39] <_joe_> oh died down as in "not serving traffic"
[10:14:00] yup
[10:14:41] Coren: three more tables replicated to labs. i reassigned the bugs to you
[10:14:51] off for cooking
[10:15:29] I get error 502 even for graphite URLs
[10:16:02] hurraaaaay springle
[10:17:58] :)
[10:22:50] bah, apache won't talk to uwsgi socket with EAGAIN, trying to understand what's wrong with uwsgi before restarting
[10:22:59] (this is tungsten)
[10:28:38] !log restarted uwsgi on tungsten
[10:28:42] Logged the message, Master
[10:29:58] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.005 second response time
[10:30:08] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 34 data above and 0 below the confidence bounds
[10:30:08] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 34 data above and 0 below the confidence bounds
[10:32:12] (PS8) Christopher Johnson (WMDE): Icinga: Check Dispatch command for Wikidata notification [operations/puppet] - https://gerrit.wikimedia.org/r/136095
[10:34:18] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[10:47:15] <_joe_> godog: I tried to figure that out serveral times
[10:47:22] <_joe_> never had those problems before
[10:47:33] <_joe_> (with uwsgi, I mean)
[10:47:40] <_joe_> but I never used it with apache
[10:49:12] _joe_: bah, I also don't understand why there are seemingly two copies of uwsgi for each app running
[11:36:57] I dont know stuff about CentralAuth but found this one in exception monitor just now
[11:36:58] https://bugzilla.wikimedia.org/show_bug.cgi?id=66185
[11:38:31] Reedy: anomie: Tim-away: hoo: Krenair: greg-g:
[11:45:41] (CR) Filippo Giunchedi: [C: 1] zuul: bring in python babel and prettytable modules [operations/puppet] - https://gerrit.wikimedia.org/r/137552 (owner: Hashar)
[11:51:35] (CR) Filippo Giunchedi: [C: 1] contint: localhost.mediawiki vhost on ci labs slave (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135529 (owner: Hashar)
[11:52:14] (CR) TTO: [C: -1] "This is wrong in many ways. However, the following appears to work:" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/124140 (https://bugzilla.wikimedia.org/62160) (owner: 01tonythomas)
[11:59:08] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100%
[11:59:52] (CR) Filippo Giunchedi: [C: 1] migrate ::imagescaler -> ::mediawiki::multimedia (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137363 (owner: Ori.livneh)
[12:03:06] taking a look
[12:05:32] !log powercycling ms-be1005, no ssh, no console
[12:05:36] Logged the message, Master
[12:05:49] grrr again
[12:06:48] sigh, is it known for doing that?
[12:09:38] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[12:11:01] yeah
[12:11:04] they lock up at random
[12:17:07] <_joe_> godog: yeah I rebooted 3 of them in the last month
[12:21:39] mmhh indeed, looking back there are page allocation failures too
[12:49:42] anyone looking at fatals? they look to spike on specific hosts
[12:51:44] https://logstash.wikimedia.org/#dashboard/temp/Ra6vM5OVRoGWuO5VTbcs3g
[12:52:40] mw1219 is just APC spam
[12:52:42] the "HOSTS" panel in the middle of the screen shows 6 hosts responsible for most of the errors
[12:53:03] Reedy: all of them then
[12:53:05] ?
[12:53:12] A quick glance seems to suggest so
[12:53:15] It started on tuesday
[12:53:23] Which is weird, I've never seen it for the intermediatary deploys before
[12:54:54] I'll ignore it then
[13:02:18] PROBLEM - RAID on es1006 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[13:03:19] springle: ^^
[13:04:18] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 10:03:29 UTC
[13:06:31] (PS1) Alexandros Kosiaris: Increase bacula pool size [operations/puppet] - https://gerrit.wikimedia.org/r/137564
[13:17:57] YuviPanda, Deskana|Away, zz_prtksxna: Please prepare the backport patches for your SWAT and update the Deployments page; see https://wikitech.wikimedia.org/wiki/SWAT_deploys#Guidelines. Feel free to ping me if you need help figuring out the process. Thanks.
[13:25:26] anomie: yeah, am going to do that in about 5 mins
[13:25:44] YuviPanda: Great!
[13:31:18] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 10:30:42 UTC
[13:32:09] anomie: I'm just going to backport it to wmf7, since wmf6 is going away two hours after that anyway.
[13:32:21] YuviPanda: fine with me
[13:33:12] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jun 5 13:33:03 UTC 2014
[13:34:12] anomie: can you +2 these three patches? https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Popups+branch:wmf/1.24wmf7,n,z I don't have +2 on wmf branches
[13:34:18] YuviPanda: Also, feel free to combine them into one extension-version-bump patch if that makes sense from a testing standpoint.
[13:34:54] anomie: yeah, it does. let me submit a submodule bump
[13:35:18] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[13:35:21] * YuviPanda waits for things to merge
[13:43:55] anomie: submodule bump is at https://gerrit.wikimedia.org/r/#/c/137575/. Let me add it to the calendar.
[13:45:39] YuviPanda: Looks good
[13:46:04] anomie: sweet. I'll make sure I'm around for the deploy in 1:15 mins, and also zz_prtksxna is around too so he can test and verify
[13:46:50] morebots, ok?
[13:46:50] I am a logbot running on tools-exec-09.
[13:46:50] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[13:46:50] To log a message, type !log .
[13:48:33] godog: Have you been looking at swift at all?
[13:48:59] (I think mark was going to talk to you about it… but I don't know if he did.)
[13:56:15] YuviPanda: I'm not seeing the calendar update yet?
[13:56:24] anomie: gah, forgot to hit save. moment.
[13:58:01] anomie: updated
[13:58:16] !log Adding unit tests Jenkins job for most mediawiki extensions {{gerrit|137578}}
[13:58:23] Logged the message, Master
[14:02:32] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137582
[14:02:34] (PS1) Reedy: testwiki to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137583
[14:02:36] (PS1) Reedy: Wikipedias to 1.24wmf7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137584
[14:02:38] (PS1) Reedy: Update group0 to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137585
[14:02:53] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137582 (owner: Reedy)
[14:03:09] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137582 (owner: Reedy)
[14:03:37] (CR) Reedy: [C: 2] testwiki to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137583 (owner: Reedy)
[14:03:55] (Merged) jenkins-bot: testwiki to 1.24wmf8 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137583 (owner: Reedy)
[14:04:52] prtksxna: heya! saw my email?
[14:05:10] YuviPanda: o/
[14:05:14] YuviPanda: looking now
[14:05:57] !log reedy Purged l10n cache for 1.24wmf5
[14:06:02] Logged the message, Master
[14:06:36] !log reedy Started scap: testwiki to 1.24wmf8 and build l10n cahce
[14:06:41] Logged the message, Master
[14:07:09] anomie: I am here now
[14:07:25] prtksxna: wait another 53min :D
[14:07:42] prtksxna: Ok. YuviPanda took care of the issue already, now we just wait for 15:00 UTC
[14:07:44] !log reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.WtQBrR6JUp" ' returned non-zero exit status 1 (duration: 01m 08s)
[14:07:47] Logged the message, Master
[14:07:52] YuviPanda: Yup, docs stated you have to be preset ~1h earlier :)
[14:12:17] (CR) Sumanah: "Hashar: yes, please. https://www.mediawiki.org/wiki/Performance_profiling_for_Wikimedia_code is one place we ask developers to look at the" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[14:12:42] prtksxna: yeah, I guess that's to get the backports merged and the submodule bump done
[14:14:12] (PS1) Reedy: Add extension-list-1.24wmf8, remove Nostalgia from extensions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137637
[14:14:14] (PS1) Reedy: Add new Skins to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137638
[14:14:27] (CR) Reedy: [C: 2] Add extension-list-1.24wmf8, remove Nostalgia from extensions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137637 (owner: Reedy)
[14:14:33] (Merged) jenkins-bot: Add extension-list-1.24wmf8, remove Nostalgia from extensions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137637 (owner: Reedy)
[14:14:38] (CR) Reedy: [C: 2] Add new Skins to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137638 (owner: Reedy)
[14:14:43] (Merged) jenkins-bot: Add new Skins to extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137638 (owner: Reedy)
[14:14:53] !log reedy Started scap: testwiki to 1.24wmf8 and build l10n cache
[14:14:58] Logged the message, Master
[14:15:10] !log reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.hiiCprts7Z" ' returned non-zero exit status 1 (duration: 00m 17s)
[14:16:08] (PS1) Reedy: Promote extension-list-1.24wmf8 to primary extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137639
[14:23:25] (PS1) Giuseppe Lavagetto: puppet: upgrade labs puppetmasters to version 3 (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/137642
[14:29:24] andrewbogott: yes he did! I did start looking at it yesterday (docs + puppet) that is
[14:30:46] godog: I'm trying to set up a mini swift cluster in labs for practice… want to help me troubleshoot?
[14:31:33] !log reedy Started scap: testwiki to 1.24wmf8 and build l10n cache
[14:32:18] godog: unfortunately the very complete docs on wikitech are also largely wrong since they presume using swauth and prod doesn't use swauth anymore
[14:32:43] andrewbogott: yup absolutely, was wondering about labs too
[14:35:00] (CR) Reedy: [C: -1] "Probably shouldn't be deployed till next Tuesday post the masses to 1.24wmf8" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137639 (owner: Reedy)
[14:41:45] (CR) BryanDavis: "> git::clone will abort when /vendor/ exists and is not a git copy" [operations/puppet] - https://gerrit.wikimedia.org/r/137463 (owner: BryanDavis)
[14:48:58] (Abandoned) Reedy: Promote extension-list-1.24wmf8 to primary extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137639 (owner: Reedy)
[14:52:41] James_F|Away, prtksxna, Deskana, YuviPanda: Ping for SWAT in about 8 minutes
[14:52:51] anomie: am around
[14:52:54] (PS1) Reedy: Rework extension-list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137656
[14:52:57] I'm here. :)
[14:53:21] \o
[14:53:44] manybubbles: I'll take the SWAT today unless you want it, I'm not in the middle of anything
[14:53:51] sure!
[14:53:53] have fun
[14:54:25] !log restarting Zuul
[14:54:27] Scap is nearly done
[14:54:30] Logged the message, Master
[14:54:49] anomie: Here now.
[14:54:53] (PS1) Reedy: Remove extension-list-1.24wmf6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137657
[14:57:57] !log reedy Finished scap: testwiki to 1.24wmf8 and build l10n cache (duration: 26m 23s)
[14:58:02] Logged the message, Master
[14:58:03] anomie: ^^ all clear
[14:58:10] Reedy: thanks
[14:58:37] (CR) Hashar: [C: 1] "So that sounds good to me :-]" [operations/puppet] - https://gerrit.wikimedia.org/r/137463 (owner: BryanDavis)
[14:58:52] bd808: good morning!
[14:58:59] hey hashar
[14:59:17] bd808: I got Zuul upgraded on the labs instance and got the cloner to run there
[14:59:27] \o/
[14:59:27] bd808: Zero doc though :-( so that is not helpful
[14:59:31] (PS2) Reedy: Replace the Nostalgia extension with the Nostalgia skin [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137200 (https://bugzilla.wikimedia.org/61256) (owner: Bartosz Dziewoński)
[14:59:45] hashar: Meh. bug #1
[15:00:05] manybubbles, anomie: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140605T1500)
[15:00:27] prtksxna, YuviPanda, Deskana: I'll do yours first.
[15:00:30] bd808: example change http://integration.wmflabs.org/gerrit/#/c/12/ it has two lame job on it that use the cloner
[15:00:58] * YuviPanda waves
[15:01:03] bd808: I am going to write some doc about the instance (i.e. how to log in, clone the repos etc). Then we can probably add a dummy mw/core and vendor repos there and test.
[15:01:14] * prtksxna is ready
[15:01:53] hashar: sounds good to me
[15:03:05] bd808: you wont believe how much of a hack it is though :(
[15:03:18] bd808: have a good breakfast. Will mail you the URL to the doc
[15:04:02] hashar: thanks. and thanks for getting to work on this so quickly
[15:04:51] !log anomie Synchronized php-1.24wmf7/extensions/Popups/resources/: SWAT: Hovercard animation fixes [[gerrit:137530]] [[gerrit:137531]] [[gerrit:137532]] (duration: 00m 14s)
[15:04:56] prtksxna, YuviPanda, Deskana: ^ please test
[15:04:56] Logged the message, Master
[15:05:05] bd808: to be honest, I have been more or less working on it for the last few months. I guess your vendor directory put the idea on the top of the pile :)
[15:05:13] YuviPanda: Deskana on FF
[15:05:26] bd808: Is mw1151 known to be down?
[15:05:53] anomie: It was broken for us yesterday. I logged in SAL
[15:05:59] James_F: You're next
[15:06:05] Getting ssh denied?
[15:06:07] Kk.
[15:06:11] bd808: Yes [15:06:18] Deskana: I see no flickering. [15:06:18] prtksxna: ^ [15:07:53] prtksxna: Deskana looks good to go? can we call the deploy done? [15:08:32] * bd808 heads into office [15:09:35] * prtksxna is still testing [15:10:56] (03PS2) 10Giuseppe Lavagetto: puppet: upgrade labs puppetmasters to version 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 [15:11:01] (03PS1) 10BBlack: pybal (1.06) precise; urgency=low [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/137662 [15:11:07] prtksxna: not fixed, still flickering [15:11:10] prtksxna: on FF [15:11:16] (03CR) 10BBlack: [C: 032 V: 032] pybal (1.06) precise; urgency=low [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/137662 (owner: 10BBlack) [15:11:26] YuviPanda: The animations aren't flipped [15:11:51] YuviPanda: Is it for you? I am not sure if it refreshed properly for me [15:11:55] Deskana: What do you see? [15:13:00] Coren: if I want to give labs_lvm a second crack at partitioning, is it possible to discard an already-created partition? [15:13:22] andrewbogott: Yes, but you have to do it by hand. [15:13:40] that's fine [15:14:04] it's something I can do from within the instance? Or do I need to delete files on the virt host? [15:14:49] * anomie is going to move on to James_F's patches while prtksxna et al figure their stuff out [15:15:00] Is it possible that we aren't seeing the changes at all? [15:15:06] prtksxna: checking that now [15:15:58] No, from within. You want to (a) unmount the filesystem, (b) remove the entry from fstab, and (c) lvremove . You might want to (@) disable puppet while you're doing it just in case. [15:16:46] 'k, trying... [15:18:16] YuviPanda: Checked in the debugger, I don't think we are seeing the changes. The animation class names still seem to be `mwe-popups-fade-in` and not `mwe-popups-fade-in-up` as changed in https://gerrit.wikimedia.org/r/#/c/137530/1/resources/ext.popups.animation.less [15:18:35] prtksxna: right. 
RL cache might be still stuck. [15:18:47] !log anomie Synchronized php-1.24wmf7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/: SWAT: Use correctly in the Media and Reference toolbars [[gerrit:136782]] (duration: 00m 12s) [15:18:51] James_F: ^ Test please (wmf7) [15:18:53] Logged the message, Master [15:19:05] prtksxna: let's wait for anomie to finish James_F's patch before bugging him again [15:19:18] YuviPanda: Right. [15:19:27] Deskana: Are you seeing the same stuff as us? [15:20:10] anomie: It's actually untestable on the wmf7 wikis right now, sadly. [15:20:23] YuviPanda: Sanity check: you are testing on a non-Wikipedia, right? [15:20:31] James_F: ? [15:20:33] anomie: ugh, no. enwiki, since it is wmf7 [15:21:00] anomie: The code path is inactive on the wikis which have wmf7 until Wikipedias get wmf7 this afternoon. [15:21:06] YuviPanda: enwiki is wmf6 still [15:21:13] anomie: All VE-enabled non-Wikipedias don't use this codepath. [15:21:17] anomie: oh, right. I got it the *other* way around. sorry. [15:21:22] prtksxna: testwiki! [15:21:28] * Deskana looks. [15:21:30] (03CR) 10coren: [C: 031] "A shame about the bug. Not an objection to this patch: is there a specific reason why we cannot use a separate auth.conf in both cases, t" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 (owner: 10Giuseppe Lavagetto) [15:21:33] * YuviPanda facepalms [15:21:49] !log anomie Synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/: SWAT: Use correctly in the Media and Reference toolbars [[gerrit:136783]] (duration: 00m 15s) [15:21:53] James_F: Ok, test wmf6 now [15:21:53] Logged the message, Master [15:22:35] prtksxna: looks ok on testwiki. [15:22:58] anomie: Yup, working. Thanks! [15:23:04] Looks fine on testwiki for me. [15:23:10] prtksxna: can you confirm? [15:23:16] testwiki is on wmf8 [15:23:30] YuviPanda: checking, my irc client crashed :\ [15:23:36] lol. testing on commons now [15:23:57] prtksxna: Deskana new information! 
test on commons, actually. [15:24:03] (03PS1) 10Aaron Schulz: Use fraction sleep in jobs loop to keep the pipeline thicker [operations/puppet] - 10https://gerrit.wikimedia.org/r/137669 [15:24:07] (03CR) 10Giuseppe Lavagetto: "We always kept the vanilla auth.conf for puppet 2.7, which would be fine if we didn't have this bug." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 (owner: 10Giuseppe Lavagetto) [15:24:09] * Deskana tests. [15:24:47] prtksxna: Deskana looks ok on commons too [15:24:53] Looks fine on commons. [15:24:57] anomie: this will roll out to enwiki later today when it goes on wmf7 right? [15:25:02] YuviPanda: Correct [15:25:41] YuviPanda Deskana anomie everything seems to be working [15:25:50] anomie: consider ourselves swatted [15:25:58] * anomie is done with SWAT [15:26:00] anomie: prtksxna Deskana apologies for the wiki confusion, I got 6 and 7 backwards [15:26:29] * prtksxna feels calm after setting wikis ablaze [15:26:37] anomie: ty! [15:26:51] Thanks a ton anomie o/ [15:27:55] prtksxna: can you send an email update? also mention that I got testwiki and enwiki backwards, and it'll be live on enwiki in a couple of hours [15:28:18] YuviPanda: Sure [15:29:34] (03Abandoned) 10Reedy: Remove extension-list-1.24wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137657 (owner: 10Reedy) [15:31:49] andrewbogott: Did the repartition work as you expected? [15:31:50] (03PS2) 10Reedy: Create wmf8 extension-list with new "skins" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137656 [15:32:04] Coren: I think so? 
I'm distracted by my puppetmaster breaking [15:32:52] (03PS3) 10Reedy: extension-list-1.24wmf8 just with new "skins" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137656 [15:34:10] anomie: Let me know when you've finished with SWAT please [15:34:29] Reedy: Finished 10 minutes ago [15:34:37] aha, sweet [15:34:48] (03CR) 10Reedy: [C: 032] extension-list-1.24wmf8 just with new "skins" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137656 (owner: 10Reedy) [15:34:54] (03Merged) 10jenkins-bot: extension-list-1.24wmf8 just with new "skins" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137656 (owner: 10Reedy) [15:35:28] !log reedy Started scap: 2nd scap for 1.24wmf8, should be effectively a nooop [15:35:34] Logged the message, Master [15:37:29] Anyone know if phps version_compare will work nicely with our 1.24wmfX style? [15:39:41] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Just typos but LGTM otherwise" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 (owner: 10Giuseppe Lavagetto) [15:39:55] nvm, don't need it [15:40:07] (03PS1) 10Giuseppe Lavagetto: puppet: pin packages for puppet::self::master as well [operations/puppet] - 10https://gerrit.wikimedia.org/r/137678 [15:43:57] (03CR) 10Giuseppe Lavagetto: puppet: upgrade labs puppetmasters to version 3 (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 (owner: 10Giuseppe Lavagetto) [15:44:35] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: pin packages for puppet::self::master as well [operations/puppet] - 10https://gerrit.wikimedia.org/r/137678 (owner: 10Giuseppe Lavagetto) [15:44:46] Reedy: version_compare does do the right thing for ordering our wmf versions. 
I use it in updateBranchPointers [15:46:24] (03Draft1) 10Alexandros Kosiaris: Just modularize webserver.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/137682 [15:46:35] (03CR) 10jenkins-bot: [V: 04-1] Just modularize webserver.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/137682 (owner: 10Alexandros Kosiaris) [15:46:50] Reedy: bd808: but note that "1.23.0" < "1.23wmf1", iirc [15:47:06] I'd only care for 1.24wmf1 vs 1.24wmf2 etc [15:47:15] Reedy: what are you breaking with Nostalgia? :( [15:47:39] (03PS2) 10Alexandros Kosiaris: Just modularize webserver.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/137682 [15:47:47] I'm not? [15:48:01] !log reedy Finished scap: 2nd scap for 1.24wmf8, should be effectively a nooop (duration: 12m 33s) [15:48:02] akosiaris: you should coordinate with Ori [15:48:06] Logged the message, Master [15:48:20] akosiaris: he's about to propose a new apache module [15:48:21] paravoid: isn't he doing the applicationserver stuff ? [15:48:25] MatmaRex: "1.23.0" > "1.23wmf1" [15:48:27] argh [15:48:44] webserver.pp is essentially two modules into one btw [15:48:52] I noticed that [15:48:54] and both crap [15:49:04] one of two has some really good concepts but needs some love [15:49:21] anomie: well, then maybe …rc < …wmf, or something. i know people ran into some issues with that [15:49:40] I am trying to solve an issue in zirconium. One too many modules trying to manage webserver [15:49:51] my take is that all three of (webserver.pp + the puppetlabs apache module) should be replaced by a new sane apache2 module [15:50:06] that makes absolute sense [15:50:22] that would support some abstractions for quick configs, misc servers and the production apache set up [15:50:27] and be compatible with both 2.2 and 2.4 [15:50:35] MatmaRex: Looks like it sorts correctly to me -- http://paste.debian.net/103576/ [15:50:40] MatmaRex: Nope, not that one either. 
The way it compares components is documented at http://us3.php.net/version_compare [15:50:51] I believe ori's ideas are similar, so we should definitely coordinate [15:51:21] (03PS3) 10Giuseppe Lavagetto: puppet: upgrade labs puppetmasters to version 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 [15:51:38] MatmaRex: I do remember that there is some edge case for it though. [15:51:39] ok, but I can probably decouple those two things. I just modularized it to make the problems even more evident [15:52:27] (03PS3) 10Giuseppe Lavagetto: puppet3: make puppet::self::master work in puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137025 [15:53:13] bd808: anomie: i think this is what i had in mind: https://bugzilla.wikimedia.org/show_bug.cgi?id=27248 [15:55:06] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet3: make puppet::self::master work in puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137025 (owner: 10Giuseppe Lavagetto) [15:55:53] MatmaRex: Yeah. I remember something similar from GWToolset -- https://github.com/wikimedia/mediawiki-extensions-GWToolset/blob/master/includes/Constants.php#L16 [15:56:17] The 'c' was needed to sort less than 1.23wmf1 [15:57:07] not that again ...
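For the narrow case raised in the exchange above (ordering names of the form 1.24wmfN only against each other), the idea can be sketched in a few lines. This is a minimal Python illustration, not PHP's version_compare and not the code used in updateBranchPointers; the helper name `wmf_sort_key` and the sample branch list are hypothetical. It deliberately rejects anything that is not a plain wmf branch name, so the "1.23.0" and rc edge cases discussed above are out of scope.

```python
import re

# Hypothetical helper (not from the log): matches names like "1.24wmf8".
_WMF_BRANCH = re.compile(r"^(\d+)\.(\d+)wmf(\d+)$")


def wmf_sort_key(version):
    """Return (major, minor, wmf_number) as integers for a wmf branch name."""
    match = _WMF_BRANCH.match(version)
    if match is None:
        # Release-style names such as "1.23.0" are intentionally not handled.
        raise ValueError("not a wmf branch name: %r" % (version,))
    return tuple(int(part) for part in match.groups())


# Sample data, assumed for illustration only.
branches = ["1.24wmf10", "1.24wmf2", "1.23wmf22", "1.24wmf8"]

# Numeric comparison keeps wmf10 after wmf8, which a plain string sort would not.
ordered = sorted(branches, key=wmf_sort_key)
print(ordered)
```

Comparing integer tuples rather than raw strings is the design choice here: it avoids the lexicographic trap where "wmf10" sorts before "wmf2".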
[15:57:28] * aude fixed it for mobile frontend [15:57:38] aude: Just a random discussion; no active bugs that I know of [15:57:45] phew [15:59:04] <_joe_> !log disabling puppet on virt1000 while we test the puppet3 upgrade on virt0 [15:59:09] Logged the message, Master [15:59:48] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 5 15:59:38 UTC 2014 [15:59:59] (03PS4) 10Giuseppe Lavagetto: puppet: upgrade labs puppetmasters to version 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 [16:00:24] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: upgrade labs puppetmasters to version 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137642 (owner: 10Giuseppe Lavagetto) [16:01:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:58:32 UTC [16:01:22] https://www.openssl.org/news/secadv_20140605.txt [16:01:35] "An attacker using a carefully crafted handshake can force the use of weak [16:01:36] keying material in OpenSSL SSL/TLS clients and servers." [16:01:54] <_joe_> Bsadowski1: I think we already upgraded openssl [16:01:56] More SSL vulns [16:01:59] Oh okay. [16:02:03] <_joe_> we're aware anyway :) [16:02:04] This was today [16:02:07] k [16:02:25] (03PS1) 10Ottomata: Add otto dotfiles [operations/puppet] - 10https://gerrit.wikimedia.org/r/137686 [16:02:50] !log Connected cp3018:eth1 to cr1-esams:xe-0/0/3 (unconfigured) [16:02:55] Logged the message, Master [16:02:59] ottomata: ^^ [16:03:08] oo, interesting! [16:03:15] so it's not entirely straightforward [16:03:20] ok... [16:03:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:58:32 UTC [16:03:27] eth0 is connected as usual, but we can send test traffic over eth1 once the box is installed [16:03:36] but we'll figure that out [16:03:42] perhaps paravoid will install it today and test some [16:03:58] install just os? 
[16:04:18] yeah [16:04:23] I'm helping bblack with something now [16:04:32] Strange though. It took OpenSSL devs a month to fix it [16:04:33] ? [16:04:39] I'll do that when I'm done, unless you want to try it first [16:04:43] so we also need to make a temp subnet for it on the router [16:04:49] i'm going to reinstall those solr boxes today anyway, is cp3018 already in dns/netboot, etc? [16:04:52] ah ok [16:04:54] no [16:05:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:58:32 UTC [16:05:28] does subnet stuff have to happen before OS install? [16:05:32] so to be clear, eth0 is connected as any other server [16:05:38] and we can put it in any existing subnet [16:05:42] for eth1 we need to create a new subnet on the router [16:05:44] install can already happe [16:05:45] happen [16:05:47] ok, cool [16:05:53] but it's probably not in dhcpd.conf yet, etc, vlan needs to be assigned [16:05:57] this server was never used yet [16:06:02] ok, i'll see if i can get on that [16:06:14] so this is in a block of 4 servers originally slated for OSM [16:06:17] should I put it in with cp300*s? [16:06:17] cp3015-3018 [16:06:24] and we can probably install it as text varnish now [16:06:28] (bblack: ^) [16:06:35] (03PS2) 10Reedy: Wikipedias to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137584 [16:06:40] (03PS2) 10Reedy: Update group0 to 1.24wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137585 [16:07:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:58:32 UTC [16:07:31] ok, just checking on what vlan I shoudl put it in, with other esams text varnishes then? 
[16:07:55] internal [16:08:04] that's what we want all future installs to be in [16:08:25] however if just for a test, we don't care [16:08:35] but for installing these 4 servers in production, it should go into the new internal subnet [16:08:40] like lvs300x [16:08:51] ok, will check those and put it there [16:09:09] we'll put the test interface on a public ip/subnet [16:09:13] so we have flexibility there [16:09:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:58:32 UTC [16:09:26] test interface? [16:09:53] i just connected eth1 of cp3018 to the router directly [16:10:01] to test if we're seeing issues with the switches everything else is on [16:10:08] but eth0 is connected as normal [16:10:19] so we can still use and install the box as normal [16:10:21] and do testing via eth1 [16:10:31] RECOVERY - Puppet freshness on cp4018 is OK: puppet ran at Thu Jun 5 16:10:22 UTC 2014 [16:10:56] ok cool, thanks mark, i'm going to get some lunch and then start on these reinstalls [16:11:03] ok [16:11:07] will do cp3018 (just OS) too [16:11:08] brandon can help as well [16:11:13] he's got experience with that private subnet at esams ;-) [16:11:25] ok, bblack, I will look to you for review :) [16:12:21] PROBLEM - Puppet freshness on cp4018 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 16:10:22 UTC [16:13:07] ottomata: ok [16:17:12] !log downprefing amslvs1, upprefing lvs3001 [16:17:15] bblack: ^ [16:17:17] Logged the message, Master [16:20:37] (03PS1) 10Alexandros Kosiaris: Repurposing osm-db100{1,2} as labsdb100{6,7} [operations/puppet] - 10https://gerrit.wikimedia.org/r/137691 [16:28:04] paravoid: One (tedious) thing you could do to further godog's and my education… Ben wrote very complete swift docs on wikitech, but they all presume swauth so they're not especially accurate anymore. Maybe you could take a few minutes and update things? 
[16:28:33] I switched back to tempauth [16:28:35] (03CR) 10BryanDavis: "Cherry-picked on deployment-salt yesterday. Verified this morning that /srv/scap-stage-dir/php-master/vendor exists and is being updated b" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137463 (owner: 10BryanDavis) [16:28:46] which is essentially... proxy-server.conf, one line per account [16:28:52] that's it :) [16:31:39] (03CR) 10Ori.livneh: migrate ::imagescaler -> ::mediawiki::multimedia (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137363 (owner: 10Ori.livneh) [16:32:24] morning [16:33:50] (03PS3) 10Ori.livneh: migrate ::imagescaler -> ::mediawiki::multimedia [operations/puppet] - 10https://gerrit.wikimedia.org/r/137363 [16:34:02] (03CR) 10Ori.livneh: [C: 032 V: 032] migrate ::imagescaler -> ::mediawiki::multimedia [operations/puppet] - 10https://gerrit.wikimedia.org/r/137363 (owner: 10Ori.livneh) [16:36:11] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [16:36:20] (03PS1) 10Ori.livneh: typo fix for I28bcbab76 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137696 [16:36:43] ^ trivial, cld someone +1? 
[16:36:45] !log downpref all of amslvs* in favor of lvs30* [16:36:50] Logged the message, Master [16:36:57] bblack: ^ [16:37:36] paravoid: woot [16:37:58] we really should add MED support to twistedbgp/pybal [16:38:27] then it could just be a matter of pybal config/puppet to decide what's primary and what's not [16:38:33] and for which service ip [16:38:41] mark: ^ [16:38:57] (03CR) 10Ottomata: [C: 031] typo fix for I28bcbab76 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137696 (owner: 10Ori.livneh) [16:39:00] (03PS1) 10Reedy: Add some documentation about extension-list-$wmfVersionNumber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137697 [16:39:12] (03CR) 10Ori.livneh: [C: 032] typo fix for I28bcbab76 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137696 (owner: 10Ori.livneh) [16:42:20] (03CR) 10Ori.livneh: [C: 031] Rsyncing slow-parse logs from fluorine to dumps.wikimedia.org. [operations/puppet] - 10https://gerrit.wikimedia.org/r/49678 (owner: 10Ottomata) [16:43:35] !log rebooting lvs3002 [16:43:40] Logged the message, Master [16:46:51] (03PS2) 10Ottomata: Add otto dotfiles [operations/puppet] - 10https://gerrit.wikimedia.org/r/137686 [16:47:01] (03CR) 10Ottomata: [C: 032 V: 032] Add otto dotfiles [operations/puppet] - 10https://gerrit.wikimedia.org/r/137686 (owner: 10Ottomata) [16:47:27] ottodot [16:52:41] paravoid: I know it's simple, but the step-by-step in the docs are still all wrong :) [16:52:53] paravoid: btw, sorry to say but another lib dep coming to hhvm :/ [16:53:00] which one? [16:53:02] fastlz [16:53:09] for ext_memcached [16:53:21] where is that? [16:53:53] (03PS1) 10Reedy: Wrap really long wgCaptchaWhitelist line [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137699 [16:54:09] paravoid: you'll appreciate https://github.com/wikimedia/operations-software-hhvm-dev/commit/5b32fd599e8ce76bfe3d13723fd9d66c6e22dccf#commitcomment-6562917 i think [16:55:03] why did you add the dep to fastlz? 
[16:55:35] paravoid: everything currently in memcached >2000 bytes is compressed with it [16:55:51] you mean that we use it already? [16:55:53] paravoid: it's the default for the pecl memcached client, which that hhvm library is emulating [16:55:57] paravoid: yes, sadly [16:56:12] paravoid: see https://bugzilla.wikimedia.org/show_bug.cgi?id=66104 [16:56:32] bblack: how do I tell which of these MACs is eth0? [16:56:41] from racadm getsysinfo [16:56:48] just the first ethernet one listed? [16:56:50] (03PS1) 10Reedy: Wrap long 'wmgRSSUrlWhitelist' entry [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137700 [16:56:52] NIC.Integrated.1-1-1 [16:56:52] ? [16:57:49] ottomata: yeah usually the first one [16:58:00] k [16:58:02] ottomata: worst case look for dhcp reqs on the install server [16:58:09] hm, aye right [16:59:06] hm, so, this already has a public IP assoicated with it in dns, we are removing that, right? making it .esams.wmnet? [16:59:21] (03CR) 10Reedy: [C: 032] Wrap long 'wmgRSSUrlWhitelist' entry [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137700 (owner: 10Reedy) [16:59:27] (03Merged) 10jenkins-bot: Wrap long 'wmgRSSUrlWhitelist' entry [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137700 (owner: 10Reedy) [16:59:31] RECOVERY - Puppet freshness on osmium is OK: puppet ran at Thu Jun 5 16:59:30 UTC 2014 [16:59:36] (03PS2) 10Reedy: Wrap really long wgCaptchaWhitelist line [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137699 [16:59:39] (03CR) 10Reedy: [C: 032] Wrap really long wgCaptchaWhitelist line [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137699 (owner: 10Reedy) [16:59:46] (03Merged) 10jenkins-bot: Wrap really long wgCaptchaWhitelist line [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137699 (owner: 10Reedy) [16:59:52] (03PS2) 10Reedy: Add some documentation about extension-list-$wmfVersionNumber [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/137697 [16:59:56] (03CR) 10Reedy: [C: 032] Add some documentation about extension-list-$wmfVersionNumber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137697 (owner: 10Reedy) [17:00:03] (03Merged) 10jenkins-bot: Add some documentation about extension-list-$wmfVersionNumber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137697 (owner: 10Reedy) [17:00:11] ori: so is fastlz a format as well? [17:00:45] paravoid: a format? [17:00:50] !log reedy Synchronized wmf-config/: Wrap some long lines, add some docs (duration: 00m 26s) [17:00:54] Logged the message, Master [17:01:21] it's a compression algorithm that is used by the pecl client to compressed values, it was added because it is purportedly faster than zlib, and it was made the default [17:05:33] ori: so it has its own format? [17:05:42] (03PS1) 10Ottomata: Add DNS entries for cp3018.esams.wmnet. [operations/dns] - 10https://gerrit.wikimedia.org/r/137706 [17:06:21] it's http://fastlz.org/ [17:06:23] (03CR) 10Filippo Giunchedi: [C: 031] Use fraction sleep in jobs loop to keep the pipeline thicker [operations/puppet] - 10https://gerrit.wikimedia.org/r/137669 (owner: 10Aaron Schulz) [17:06:54] ottomata: please do cp3015-3018 while you're at it... [17:06:58] oh ok can do [17:07:09] they're identical, other than that connected cable [17:07:16] once slated for OSM, now probably gonna be text varnish [17:07:20] ok, those already have linux-host-entires [17:07:28] should I keep them at esams.wikimedia/ [17:07:28] ? [17:07:39] no, move to internal [17:07:45] got it [17:07:46] cool [17:11:25] (03PS2) 10Ottomata: Add DNS entries for cp301[5-8].esams.wmnet. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/137706 [17:12:24] (03PS1) 10Ottomata: Move cp301[5-8] to internal esams network, add site.pp node entry for cp3018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137707 [17:13:01] bblack ^ and ^^ [17:13:01] :) [17:16:31] PROBLEM - SSH on lvs4002 is CRITICAL: Server answer: [17:17:31] RECOVERY - SSH on lvs4002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [17:18:52] PROBLEM - SSH on lvs4001 is CRITICAL: Server answer: [17:19:52] RECOVERY - SSH on lvs4001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [17:20:14] ottomata: any reason not to add the ip6 interface on 3018 now as well? [17:20:29] (like the one above) [17:20:48] (03CR) 10BBlack: [C: 031] Add DNS entries for cp301[5-8].esams.wmnet. [operations/dns] - 10https://gerrit.wikimedia.org/r/137706 (owner: 10Ottomata) [17:21:01] i didn't? [17:21:19] yeah needs interface::add_ip6_mapped { 'main': } [17:21:27] ohoh [17:21:31] ja can do [17:22:17] (03PS2) 10Ottomata: Move cp301[5-8] to internal esams network, add site.pp node entry for cp3018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137707 [17:22:44] (03CR) 10BBlack: [C: 031] Move cp301[5-8] to internal esams network, add site.pp node entry for cp3018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137707 (owner: 10Ottomata) [17:22:45] (03CR) 10Ottomata: [C: 032 V: 032] Add DNS entries for cp301[5-8].esams.wmnet. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/137706 (owner: 10Ottomata) [17:24:49] (03PS3) 10Ottomata: Move cp301[5-8] to internal esams network, add site.pp node entry for cp3018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137707 [17:24:52] that patch fixed a tab inconsistency [17:24:54] merging that one [17:26:34] (03CR) 10Ottomata: [C: 032 V: 032] Move cp301[5-8] to internal esams network, add site.pp node entry for cp3018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137707 (owner: 10Ottomata) [17:29:16] yeah I saw the tab thing, but I have a mental block about not nitpicking changes over whitespace/formatting. I figure if it matters enough to mention in a review, we should be automating jenkins tests for that. [17:30:41] PROBLEM - Host ms-fe3001 is DOWN: PING CRITICAL - Packet loss = 100% [17:34:01] (03PS1) 10Ori.livneh: rcstream: move app mount to top-level [operations/puppet] - 10https://gerrit.wikimedia.org/r/137710 [17:34:46] ^ _joe_ [17:37:28] (03PS11) 10Ori.livneh: Add rsyslog module and port existing usage [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 [17:37:35] (03CR) 10Ori.livneh: "ping" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [17:39:35] ori: Filed https://bugzilla.wikimedia.org/show_bug.cgi?id=66204 for now [17:40:06] Could be related to the /rc/ 404 error thingy (e.g. maybe when we hit the right server, it'll magically work, and right now it's just saying CORS not allowed instead of 404) [17:40:13] but if not, we need to look into it. [17:40:33] it'll magically work, i think [17:41:46] ori: So we're removing it from this role, is this removing it from nginx to let rcstream.py handle it, or removing it from rcstream so that rcstream is root and we proxy it behind /rc in nginx? [17:41:53] Krinkle: [17:41:55] the latter might be a problem if it needs to self-identify its own locatino. 
[17:41:55] $ curl -Is "http://stream.wikimedia.org/rc/socket.io/1/?t=1401990079901" | grep Access-Control-Allow-Origin [17:41:56] Access-Control-Allow-Origin: * [17:42:01] cool [17:42:21] so it's probably failing on CORS because it's really just hitting a 404 but not exposing that because of CORS [17:42:28] alrighty [17:42:31] nod [17:42:37] (03CR) 10Krinkle: [C: 031] rcstream: move app mount to top-level [operations/puppet] - 10https://gerrit.wikimedia.org/r/137710 (owner: 10Ori.livneh) [17:43:42] manybubbles: do you know the drives that the solr*s had? [17:43:54] is it approximately the same as the existing elastics? [17:44:00] ottomata: I _thought_ so [17:44:11] ok i haven't checked...not sure how to through mgmt console [17:44:43] Hi Operations folks, could someone ping me here about a ticket I just sent in? Time sensitive thing. [17:45:36] manybubbles: it's ok that they are each in 3 different racks? [17:45:36] 'elastic1017' => 'A6', [17:45:36] 'elastic1018' => 'A7', [17:45:36] 'elastic1019' => 'B6', [17:45:39] s/racks/rows/ [17:45:49] ottomata: it'd be better if they were in row D but I'm ok with whatever I can get [17:45:57] cmjohnson1: should we move them? [17:46:32] advantage of row D is that we're balanced - or closer to balanced [17:46:54] (03CR) 10Krinkle: [C: 031] "Confirmed." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137265 (owner: 10Ori.livneh) [17:47:05] which would be nice for row awareness [17:47:41] ottomata: these are the old solr ...right? [17:48:04] we should move them if they are [17:48:45] yup [17:48:47] old solrs [17:48:51] ok cmjohnson1, i'll make a ticket then [17:49:36] cool..i will fix them...manybubbles row D [17:49:38] ? [17:50:12] cmjohnson1: that'd be sweet! just like the 4 that you installed on row D for me a month or two ago [17:51:07] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=7633 [17:51:23] thanks [17:51:54] hm, manybubbles which ones are in row D now?
[17:52:08] ottomata: uh, I dunno, its in puppet [17:52:16] in elasticsearch.pp I think [17:52:18] ah i see [17:52:20] I put the rack and row in there [17:52:22] sorry I had edited that out [17:52:22] got it [17:52:36] cmjohnson1: will those go into D3? [17:53:31] RECOVERY - Host ms-fe3001 is UP: PING OK - Packet loss = 0%, RTA = 96.08 ms [17:53:33] ottomata: yes they will [17:54:01] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time [17:54:49] ottomata: don't mess with anything yet though...I have some unused misc boxes on top of them right now. I think I am going to move them all up x3 so they're all together. [17:55:50] ja cool, that's fine, just going to submit a patch but not merge it [17:55:53] since i already started [17:59:01] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time [17:59:11] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 35 data above and 0 below the confidence bounds [17:59:11] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 35 data above and 0 below the confidence bounds [18:00:04] Reedy, greg-g: The time is nigh to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140605T1800) [18:00:13] NOWAI [18:00:18] :) [18:01:05] chu chu [18:02:03] jouncebot: next [18:02:03] In 4 hour(s) and 57 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140605T2300) [18:02:05] (03PS3) 10Reedy: Wikipedias to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137584 [18:02:09] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137584 (owner: 10Reedy) [18:02:15] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf7 [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/137584 (owner: 10Reedy) [18:06:13] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf7 [18:06:18] Logged the message, Master [18:08:46] (03PS3) 10Reedy: Update group0 to 1.24wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137585 [18:09:10] (03PS1) 10Ottomata: solr100[1-3] are now elastic101[7-9] [operations/dns] - 10https://gerrit.wikimedia.org/r/137720 [18:10:09] (03CR) 10Reedy: [C: 032] Update group0 to 1.24wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137585 (owner: 10Reedy) [18:10:15] (03Merged) 10jenkins-bot: Update group0 to 1.24wmf8 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137585 (owner: 10Reedy) [18:10:24] (03PS1) 10Ottomata: Setting up solr100[1-3] as elastic101[7-9] [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 [18:11:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Update group0 to 1.24wmf8 [18:11:20] Logged the message, Master [18:12:58] bblack: any idea why i would be getting [18:13:06] DHCPDISCOVER from d4:ae:52:8c:86:31 via 91.198.174.3: network 91.198.174.0/25: no free leases [18:13:06] ? 
[18:13:07] everything seems to be in order, as far as I can tell [18:13:24] dhcp file is correct on carbon [18:13:40] 91.198.174.0/25 is the old network for that mac [18:14:32] (03PS2) 10Reedy: Disable mergehistory everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137499 (owner: 10Chad) [18:14:36] (03CR) 10Reedy: [C: 032] Disable mergehistory everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137499 (owner: 10Chad) [18:14:42] (03Merged) 10jenkins-bot: Disable mergehistory everywhere [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137499 (owner: 10Chad) [18:15:09] (03CR) 10Ottomata: [C: 04-1] "-1 until boxes are moved to Row D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [18:15:14] <_joe_> ori: taking a look now [18:15:54] (03PS2) 10Giuseppe Lavagetto: rcstream: move app mount to top-level [operations/puppet] - 10https://gerrit.wikimedia.org/r/137710 (owner: 10Ori.livneh) [18:15:56] (03PS3) 10Reedy: Set unifont-5.1.20080907.ttf for timeline on ZH projects [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133228 (https://bugzilla.wikimedia.org/20825) (owner: 10Liangent) [18:16:00] (03CR) 10Reedy: [C: 032] Set unifont-5.1.20080907.ttf for timeline on ZH projects [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133228 (https://bugzilla.wikimedia.org/20825) (owner: 10Liangent) [18:16:04] (03CR) 10Giuseppe Lavagetto: [C: 032] rcstream: move app mount to top-level [operations/puppet] - 10https://gerrit.wikimedia.org/r/137710 (owner: 10Ori.livneh) [18:16:22] (03Merged) 10jenkins-bot: Set unifont-5.1.20080907.ttf for timeline on ZH projects [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133228 (https://bugzilla.wikimedia.org/20825) (owner: 10Liangent) [18:16:26] cp3018.esams.wmnet resolves correctly from carbon [18:17:23] !log Created FlaggedRevs tables on ckbwiki [18:17:27] Logged the message, Master [18:17:32] (03PS2)
10Reedy: Enable FlaggedRevs for Central Kurdish Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136282 (https://bugzilla.wikimedia.org/65809) (owner: 10Reza) [18:17:38] (03CR) 10Reedy: [C: 032] Enable FlaggedRevs for Central Kurdish Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136282 (https://bugzilla.wikimedia.org/65809) (owner: 10Reza) [18:17:44] (03Merged) 10jenkins-bot: Enable FlaggedRevs for Central Kurdish Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136282 (https://bugzilla.wikimedia.org/65809) (owner: 10Reza) [18:18:11] (03PS5) 10Reedy: Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: 10Withoutaname) [18:18:17] (03CR) 10Reedy: [C: 032] Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: 10Withoutaname) [18:18:23] (03Merged) 10jenkins-bot: Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) (owner: 10Withoutaname) [18:18:42] wonder why we are getting so many "Exception from line 1168 of /usr/local/apache/common-local/php-1.24wmf7/includes/db/Database.php" ? [18:18:54] (03Abandoned) 10Reedy: gtitmsg [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134501 (owner: 10Steinsplitter) [18:19:02] git you forget to run update.php ... [18:19:08] (03PS4) 10Reedy: Sets otherProjectsLinksByDefault to true for fr and it wikisources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [18:19:13] (03CR) 10Reedy: [C: 032] Sets otherProjectsLinksByDefault to true for fr and it wikisources. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [18:19:19] (03Merged) 10jenkins-bot: Sets otherProjectsLinksByDefault to true for fr and it wikisources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [18:20:29] anomie: About? [18:20:39] Reedy: ? [18:20:52] DB query error from ApiQueryLogEvents on enwiki [18:20:58] * anomie looks [18:21:09] http://p.defau.lt/?MS_CM1K1QeSgP_XtWCKcDw [18:21:14] thanks aude [18:21:37] looks like they stopped [18:21:40] but was a lot [18:21:48] Krinkle, _joe_: omg omg omg it works [18:21:55] Krinkle: http://noc.wikimedia.org/~ori/stream/ , open js console [18:22:01] eh [18:22:09] hmm [18:22:14] it works in explain [18:22:52] * ori hugs _joe_ [18:22:55] weeeee [18:23:48] (03PS2) 10Reedy: Enable TemplateData GUI on Catalan Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) (owner: 10Jforrester) [18:23:53] (03CR) 10Reedy: [C: 032] Enable TemplateData GUI on Catalan Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) (owner: 10Jforrester) [18:23:58] (03Merged) 10jenkins-bot: Enable TemplateData GUI on Catalan Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) (owner: 10Jforrester) [18:24:10] (03PS2) 10Reedy: Remove outdated eswiki config disabling VE for anons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135581 (owner: 10Jforrester) [18:24:10] (03CR) 10Reedy: [C: 032] Remove outdated eswiki config disabling VE for anons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135581 (owner: 10Jforrester) [18:24:17] (03Merged) 10jenkins-bot: Remove outdated eswiki config disabling VE for anons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135581 (owner: 10Jforrester) [18:24:21] ori: _joe_: 
http://codepen.io/Krinkle/full/laucI [18:24:25] (03CR) 10Giuseppe Lavagetto: "Change is ok, though do we really need this 'simplification'?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137470 (owner: 10Ori.livneh) [18:24:29] MaxSem: Want me to do the Geodata solr stuff too? [18:24:52] (03PS3) 10Reedy: Remove flaggedrevs-specific user groups from mediawiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134969 (owner: 10Withoutaname) [18:25:18] (03CR) 10Reedy: [C: 032] Remove flaggedrevs-specific user groups from mediawiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134969 (owner: 10Withoutaname) [18:25:21] (03CR) 10Ori.livneh: "Giuseppe, makes sense. I'll think it over too." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137470 (owner: 10Ori.livneh) [18:26:04] (03Merged) 10jenkins-bot: Remove flaggedrevs-specific user groups from mediawiki.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134969 (owner: 10Withoutaname) [18:26:18] (03PS3) 10Reedy: Remove NS_USER_TALK from $wmgNamespacesToPostIn [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134472 (https://bugzilla.wikimedia.org/65524) (owner: 10Legoktm) [18:26:21] (03CR) 10Reedy: [C: 032] Remove NS_USER_TALK from $wmgNamespacesToPostIn [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134472 (https://bugzilla.wikimedia.org/65524) (owner: 10Legoktm) [18:26:38] Krinkle: \o/ [18:26:52] (03Merged) 10jenkins-bot: Remove NS_USER_TALK from $wmgNamespacesToPostIn [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134472 (https://bugzilla.wikimedia.org/65524) (owner: 10Legoktm) [18:27:07] Krenair: kudos to you as well! [18:27:20] (03PS5) 10Reedy: Remove dead ULS configs after I49e812eae32266f165591c75fd67b86ca06b13f0 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115880 (owner: 10Nemo bis) [18:27:55] ori, oh is rcstream live? 
[18:28:10] <_joe_> NO [18:28:12] (03PS3) 10Reedy: Add SVG logos for sixteen Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131255 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:28:13] <_joe_> :) [18:28:17] (03CR) 10Reedy: [C: 032] Add SVG logos for sixteen Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131255 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:28:21] Krenair: it's "dark-launched", meaning live but not yet announced. still needs SSL, perf monitoring, etc. [18:28:25] <_joe_> it's "working", not 'live' [18:28:28] Reedy: I notice that all the queries are looking for log_type = 'move' in namespace 14. So probably someone is looking for uses of the relatively new category-move feature, falling off the end of when it was enabled, and so the database is having to scan all the rows from the beginning of time until last month to determine there are no more results. [18:28:28] I see [18:28:52] Krenair: so feel free to look but don't publicize yet [18:28:58] (03Merged) 10jenkins-bot: Add SVG logos for sixteen Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131255 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:29:02] ori, how are all the namespaces etc. set up? [18:29:03] <_joe_> grumpy ops are witholding the toy from dev's hands [18:29:05] <_joe_> :P [18:29:16] _joe_: no no you are perfectly right to [18:29:22] _joe_: (Krenair did a big piece of the mw implementation) [18:29:27] (03PS2) 10Reedy: Add SVG logos for nine Wikibooks wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131204 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:29:29] <_joe_> we sould actually need to try some stress testing as well [18:29:29] Well... Yeah, I helped a bit. 
[18:29:41] (03CR) 10Reedy: [C: 032] Add SVG logos for nine Wikibooks wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131204 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:29:41] <_joe_> :) [18:29:44] (03Merged) 10jenkins-bot: Add SVG logos for nine Wikibooks wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131204 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [18:29:50] * ori will brb [18:30:27] (03PS2) 10Reedy: Enable GuidedTour on additional language Wikipedias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137533 (owner: 10Mattflaschen) [18:30:31] (03CR) 10Reedy: [C: 032] Enable GuidedTour on additional language Wikipedias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137533 (owner: 10Mattflaschen) [18:30:49] Basically I started maintaining a (then pretty much abandoned) commit by vvv, which abstracted out how MW core does some things involving outputting RC stream entries [18:31:27] Every little helps [18:32:41] (03Merged) 10jenkins-bot: Enable GuidedTour on additional language Wikipedias [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137533 (owner: 10Mattflaschen) [18:34:25] (03PS2) 10Reedy: Throttle GWToolset uploads [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 (owner: 10Gergő Tisza) [18:34:33] (03CR) 10Reedy: [C: 032] Throttle GWToolset uploads [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 (owner: 10Gergő Tisza) [18:34:38] (03Merged) 10jenkins-bot: Throttle GWToolset uploads [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 (owner: 10Gergő Tisza) [18:35:34] !log reedy Synchronized database lists: (no message) (duration: 00m 13s) [18:35:38] Logged the message, Master [18:36:05] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 14s) [18:36:09] Logged the message, Master [18:37:45] hm [18:39:09] (03PS1) 10Reedy: wgMemoryLimit from 235 to 245MB 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137730 [18:39:38] anybody avail to help me figure out why dhcp is not giving cp3018 the right addy? [18:40:22] mutante: any experience with this? [18:40:22] ottomata: check the MAC address in dhcpd logs and compare to what you see on drac console [18:40:26] yeah [18:40:28] ottomata: did you make any dns changes yet? [18:40:30] that's what i'm doing [18:40:30] Krenair: yay rescuing abandoned vvv code :) [18:40:30] yes [18:40:32] sometimes you get the mgmt interface MAC [18:40:38] or there is more than one NIC [18:40:52] hmm [18:40:53] or the DNS entry is wrong [18:40:59] dns resolves correctly [18:41:12] DHCPDISCOVER from d4:ae:52:8c:86:31 via 91.198.174.1: network 91.198.174.0/25: no free leases [18:41:16] Reedy: what do you want to do with that config change? I think it now includes some semi-unrelated changes it used to depend on (gathering related config in one place) [18:41:33] Which one? :D [18:41:36] that is the first nic in racadm getsysinfo [18:41:46] the [18:41:46] 91.198.174.0/25 [18:41:49] is the weird part [18:42:02] that is a public network, this is supposed to be changed to a private network [18:42:07] cp3018 used to be on that net [18:42:11] or actually... [18:42:16] no it didn't, the other 3 nodes did [18:42:21] cp3015-cp3017 [18:42:40] ah wait, yes it was [18:42:42] cp3018 used to be [18:42:46] 91.198.174.88 [18:42:50] ottomata: check vlan [18:43:08] ottomata: eh.. it is commented in dhcp config? [18:43:09] ? [18:43:11] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 15:42:35 UTC [18:43:17] no, you should pull [18:43:18] :) [18:43:23] or did i.. heh, ok [18:43:42] https://gerrit.wikimedia.org/r/#/c/137707/3/modules/install-server/files/dhcpd/linux-host-entries.ttyS1-115200 [18:43:59] cmjohnson1: check vlan? 
[18:44:15] mutante: also: https://gerrit.wikimedia.org/r/#/c/137706/ [18:44:23] so you're saying the others work just 3018 doesn't? [18:44:53] no, haven't tried the others, i'm just dealing with 3018 [18:45:08] did you check vlan on cp3015 to make sure it's correctly set? iirc i get that "no free lease" if it's set up wrong [18:45:12] ori, oh I figured out what was wrong with my script (re: namespaces), nvm [18:45:16] yay, it's working, etc. [18:45:39] cmjohnson1: how do I check that? in mgmt console? [18:45:49] also, i'm working with cp3018 [18:45:51] not cp3015 [18:47:51] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Thu Jun 5 18:47:44 UTC 2014 [18:48:07] um… anyone know what just happened with virt1000? [18:48:54] Ask Coren/_joe_? [18:52:06] ottomata: do you know if the ethernet port is in the right vlan on the switch? [18:52:17] if nobody's messed with it since this started, probably not [18:54:39] hm, well, that's the one that I see in the dhcp logs, [18:54:40] bblac [18:54:41] k [18:54:43] bblack [18:54:48] so that should be the right one, ja? [18:55:05] but, the strange thing is, that it is trying to assign an addy from the public vlan [18:55:12] not the newly assigned private one [18:55:16] (03CR) 10Calak: "Thank you very much Reedy." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136282 (https://bugzilla.wikimedia.org/65809) (owner: 10Reza) [18:55:40] which install server is showing that log msg? [18:55:55] the local config on carbon says it should use a private one [18:56:10] root@carbon:/etc/dhcp# grep -r cp3018 * [18:56:27] does it really talk to carbon from esams? [18:56:29] is it really carbon and not hooft? [18:56:32] heh [18:56:41] carbon [18:56:49] oh [18:57:00] i didn't check hooft...is that an install server? [18:57:06] anyway, yah i see logs on carbon [18:57:18] ah maybe we forward install stuff across the pond? 
[18:57:21] seems odd, though :) [18:59:34] hooft has a tftp but not the entire install-server role [18:59:39] I'm trying to get on the switch, but I can't remember yet the magic invocations of ssh proxy, etc [19:02:22] ottomata: yeah it's on the wrong switch port [19:06:15] ottomata: working on it, will take me a min [19:14:39] thanks bblack [19:16:31] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:21] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54249 bytes in 0.238 second response time [19:21:58] ottomata: should be good now, I moved the ports for cp301[5678] from the public vlan to the private vlan [19:22:59] (which should take like 2 minutes, but it takes more like 20 because my junos skills are horrible :) [19:28:30] bblack: thx for doing that, I did not remember the elusive magic of the esams switches [19:29:59] hiyo, would "watchlist emptying tool doesn't work" be a duplicate for "watchlist editing doesn't work (raw/clicky)" when both issues are caused by massive number of watched items? [19:31:07] cmjohnson1: the primary bits of magic that get me every time is that you have to use hooft to reach them, and while csw1-esams and csw2-esams exist in DNS, apparently csw1 is dead and csw2 is the one to use [19:31:21] so just figuring out how to get where is a pain :) [19:32:08] that's right forgot about going through hooft [19:37:11] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [19:45:18] bblack thanks! [19:45:25] ori: Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://stream.wikimedia.org/socket.io/1/?t=1401997499149. This can be fixed by moving the resource to the same domain or enabling CORS. [19:46:01] (when I go to https://noc.wikimedia.org/~ori/stream/) [19:46:08] _joe_: That one ^^^ [19:47:09] <_joe_> andrewbogott: labstore? 
no that's going on since 3 days I think [19:47:12] <_joe_> check on icinga [20:01:11] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 17:00:25 UTC [20:01:34] (03PS1) 10Mwalker: Fix type in OCG role [operations/puppet] - 10https://gerrit.wikimedia.org/r/137743 [20:03:03] ori: ping [20:03:17] bblack: hey [20:05:10] ori: so looking at the VCL request-flow graphs on https://www.varnish-software.com/static/book/VCL_Basics.html , vcl_deliver is always last [20:05:31] that makes me think something else is wrong with our header-manipulation leading to you seeing orig-cookie instead of cookie at that point [20:06:41] could be, or the control flow doesn't match that chart because we override it [20:06:50] it is definitely orig-cookie at that point [20:10:14] ori: oh, duh, I figured it out [20:11:03] the s/Cookie/Orig-Cookie/ is on req.http. The restoration in vcl_hit/miss is into bereq.http [20:11:10] the "right" fix is to check bereq.Cookie [20:11:20] err bereq.http.Cookie [20:11:55] (which, if we weren't munging cookies in the first place, would be == req.http.Cookie) [20:12:07] hmmm bblack [20:12:11] No disk drive was detected. If you know the name of the driver needed │ [20:12:11] │ by your disk drive, you can select it from the list. [20:12:12] ... [20:12:26] lol [20:12:56] ottomata: maybe storage stuff in the install-server puppet stuff [20:13:20] oh you know, i probalby didn't give it a partman [20:13:41] cp102[1-9]|cp10[3-6][0-9]|cp1070|cp[34]0[0-9][0-9]|sq6[7-9]|sq70|dysprosium) echo partman/raid1-varnish.cfg ;; \ [20:13:51] it should be included in that [20:14:39] (03CR) 10Jgreen: [C: 032 V: 031] Fix type in OCG role [operations/puppet] - 10https://gerrit.wikimedia.org/r/137743 (owner: 10Mwalker) [20:14:46] hmm, yeah [20:15:15] ottomata: maybe something in the bios settings for the storage card? like, some funky "hardware" BIOS raid that needs driver assistance is configured? 
[20:15:29] oh, or it's not configured at all! I hit this before [20:15:46] hah, uhh, so I should boot into bios? [20:15:52] Ctrl+S (or whatever the hotkey is) on bootup for the storage card, and go in and create two volumes, one for each disk with no real raid in the card [20:16:09] not dell bios, but whatever it shows when it initializes the storage card [20:16:11] hmmm ok [20:16:32] I don't remember of Ctrl+S is the one that's usually the network card or the storage [20:16:36] but it will scroll by [20:18:32] (03PS2) 10BBlack: Redirect language variant URLs to mobile [operations/puppet] - 10https://gerrit.wikimedia.org/r/137476 (https://bugzilla.wikimedia.org/51753) (owner: 10MaxSem) [20:19:00] (03CR) 10BBlack: [C: 032 V: 032] Redirect language variant URLs to mobile [operations/puppet] - 10https://gerrit.wikimedia.org/r/137476 (https://bugzilla.wikimedia.org/51753) (owner: 10MaxSem) [20:19:37] (03PS3) 10Ori.livneh: text-frontend VCL: grep Orig-Cookie for GeoIP, not (just) Cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/137265 [20:20:29] ok in! [20:20:46] hmm, this might be network [20:21:02] heh [20:21:03] yeah, nics [20:21:03] hm [20:21:08] yeah some other control key [20:21:16] can you see everything that scrolls by when bios is first booting? [20:21:23] ok, take 3! [20:21:26] yeah, but it is fast [20:21:36] maybe control-M or control-R? [20:21:49] or E [20:21:55] question! does anyone know anything about logstash? specifically -- what protocol does it speak so that I can put structured logs into it? 
[20:21:57] or one of the other 22 remaining letters not yet mentioned [20:22:11] bblack, don't forget the special chars and numbers [20:22:28] ctrl r [20:22:53] (03PS1) 10Cmjohnson: adding dns entries for elatic101[7-9] and removing solr100x entries [operations/dns] - 10https://gerrit.wikimedia.org/r/137750 [20:23:09] To configure RAID settings, Press Control-Shift-Meta-Alt-&, then hold down the letters F and U while tapping F3 twice [20:23:27] haha [20:23:29] get outta here [20:23:31] haha [20:23:38] mwalker: bd808|BUFFER but he's not here... [20:23:57] he's the only one? [20:24:08] * YuviPanda adds bblack to quips [20:24:24] (03CR) 10Cmjohnson: [C: 032] adding dns entries for elatic101[7-9] and removing solr100x entries [operations/dns] - 10https://gerrit.wikimedia.org/r/137750 (owner: 10Cmjohnson) [20:25:17] mwalker: or ori (based on the puppet repo) [20:25:19] bblack, not really sure what to do here [20:26:07] ottomata: basically figure out the ridiculous UI, press enter on something to create a 1-disk raid volume out of the first disk and name it whatever (e.g. "sda"), then again for the second disk [20:26:48] we're basically asking the raid card to not do anything and just give us the disks [20:27:51] hmmm,, i think i am being limited by os x's function key mappings... [20:28:18] do you use iterm? [20:28:28] oh, it's the mgmt console key-mapping [20:28:57] yeah iterm [20:29:01] Esc then 3 for F3 [20:29:04] etc [20:29:37] hmmm [20:30:11] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jun 5 20:30:10 UTC 2014 [20:30:45] mwalker: right now the only fully implemented input to logstash is reading udp2log udp packets. We should find a better input for you. [20:30:45] ottomata: like this thing says: http://current.workingdirectory.net/posts/2012/console-bios-key-map/ [20:31:04] mwalker: we have redis there for log input but it hasn't been used/tested. 
[20:31:07] (except it doesn't seem to mention esc-1 for F1 through esc-9 for F9) [20:32:29] I think, anyways [20:33:30] oh maybe the F1-F9 keys are done the normal way, nothing seems to mention them [20:33:45] if you want I can grab the console and figure it out [20:34:08] I don't think it allows multiple sessions unfortunately [20:34:17] yeah maybe, my problem seems to be that i have function keys doing multiple things [20:34:17] bd808, gwicke and I were talking about wanting to use graylog2 -- but I can also talk redis. my plan is to use the Winston library which has all sorts of fun transports: https://github.com/flatiron/winston/blob/master/docs/transports.md [20:34:28] ok i'm out of the console bblack [20:34:31] see if you can figure that out :p [20:35:20] bd808, according to github; logstash accepts graylog2 https://github.com/elasticsearch/logstash/blob/v1.4.1/lib/logstash/inputs/gelf.rb [20:35:42] (03PS1) 10Cmjohnson: adding elastic101[7-9] netboot.cfg and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/137753 [20:35:49] there was some ok/cancel menu boxes up when I connected, couldn't see what they were, so I cancelled and then it wanted ctrl+alt+del to reboot [20:36:28] mwalker: Yup. We could use that. Only downside is that it's a point-to-point protocol to a single logstash listener. [20:36:29] ahh foo [20:36:41] well, ctrl m is the magic key to get you back in there! 
[20:37:15] yeah so on that first menu page for the raid controller [20:37:17] (03CR) 10Cmjohnson: [C: 032] adding elastic101[7-9] netboot.cfg and dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/137753 (owner: 10Cmjohnson) [20:37:27] mwalker: disgustingly we are still running logstash 1.2.2, but it has gelf as well -- http://logstash.net/docs/1.2.2/ [20:37:37] you want to up-arrow to the line at the top of the little tree for the controller itself and press enter, and it asks to create a virtual disk, basically [20:38:13] bd808, I don't know if logstash supports it; but you can do multicast udp with the protocol [20:38:20] the line that says PERC H710 Mini or whatever [20:38:44] ottomata: bblack you can also try using the web based mgmt console if that helps [20:38:46] it can all be done with just spacebar/enter/arrowkeys [20:39:01] (and then esc to exit at the end) [20:39:02] (03PS7) 10Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [operations/apache-config] - 10https://gerrit.wikimedia.org/r/90703 [20:39:05] spacebar!!!! [20:39:09] i didn't try that [20:39:12] enter wasn't doing anything form e [20:39:18] even on that top line? [20:39:33] hm i could get a couple of things to work with enter, but nothing that seemed useful [20:39:39] i wanted to F2 for operations [20:39:42] whatever that was [20:39:44] ah [20:40:03] yeah just up-arrow to the top of the tree, press enter, fill out form to create 1-disk VD [20:40:07] mwalker: are you in the office today? [20:40:11] (spacebar to select a disk) [20:40:18] bd808, no; but I can be [20:40:26] it's a short bike ride [20:40:34] oh and tabs to move around the fields helps [20:40:39] TAB I mean [20:41:04] ottomata: do you want me to drop out and let you do it and see it works for you? or just do it? [20:41:13] mwalker: We can do irc, but maybe we should pick a channel other than ops to hash it out? [20:41:24] join me in pdfhack? 
[20:41:31] mediawiki-pdfhack that is [20:42:05] hmm, bblack, sure i'll try it [20:42:16] lemme know whne you are out [20:42:44] I checked on the F-keys thing, too. for me, "Fn-F2" or "Esc, 2" works [20:42:53] tried both of those [20:42:55] but I don't think you need F2 anyways, just press enter at the top [20:43:01] both f2 and fn-f2 do things on my mac :/ [20:43:04] gotta disable something [20:43:23] I'm out [20:44:16] try arrow keys when you first get in to see if it's still in that menu. it might be that disconnecting send a break which pops up the ok/cancel for saving changes before exit [20:44:22] or tab [20:45:19] someone should file a bug with dell for shareable remote consoles [20:47:05] agh, bblack [20:47:09] should i space bar on one of the disks? [20:47:20] doesn't seem to do anything... [20:48:05] (03PS4) 10BBlack: text-frontend VCL: grep bereq.http.Cookie for GeoIP, not req.http.Cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/137265 (owner: 10Ori.livneh) [20:48:26] yeah when the box first pops up to create a VD (from pressing enter) [20:48:40] I think you're already in the box for disks, and spacebar puts an [X] on one [20:48:45] and tab moves between fields [20:49:06] bblack: 'bereq.http.Cookie': Not available in method 'vcl_deliver'. [20:49:11] ah, ok i want that [20:49:12] a VD [20:49:13] ? [20:49:21] yeah I think so [20:49:46] bblack: (that's a compilation failure on that VCL when attempting to apply it in labs) [20:49:53] ori: hmmm [20:50:09] that kinda makes for an interesting conundrum [20:50:10] rrrgh, ok i got one of them [20:50:14] i just want to use one, right? [20:50:24] can't seem to select hte second one now [20:50:34] you want to redo that dialog twice, creating one VD from each PD (virtual from physical) [20:50:41] ah wait, think i got it [20:50:47] anything else to do? [20:50:49] to the settings? [20:50:51] or justthat [20:50:51] ? 
[20:50:55] just that [20:51:04] I think esc will try to exit and ask to save on the way [20:51:06] http://cl.ly/image/1i000J3Z1B45 [20:51:41] that doesn't look right [20:51:44] ah rats ok [20:51:48] already exited... [20:52:10] ha, can'tctrl alt delete wither [20:52:11] either [20:52:13] maybe it is right [20:52:16] rebooting via console [20:52:18] shodul i just try to boot [20:52:19] wait [20:52:20] netboot? [20:52:24] ok... [20:52:42] just stay in the console, and type (6 keys one by one, quickly): Esc R Esc r Esc R [20:52:48] notice the upper/lower case on the r's [20:52:56] and that will send ctrl+alt+del [20:53:25] hmmm [20:53:29] attempting... [20:53:34] what doesn't look right is the disks are different sizes, but maybe they really are [20:53:45] oh werid that is weird [20:53:51] i don't think I didanythign different [20:54:01] erg, that is not working [20:54:02] esc R stuff [20:54:06] should be ok to reboot via console, no? [20:54:08] let me grab the console again [20:54:10] ok [20:54:13] i'm out [20:54:15] bblack: so, do you mind going with Orig-Cookie for nw? 
[20:54:20] it works [20:54:51] legoktm: it's because you used https [20:55:07] legoktm: wss (websockets + ssl) is not up yet; we don't have a cert yet [20:55:15] ori: there has to be a better answer to this [20:55:24] ssl certs, fun [20:55:45] I don't know why bereq would be unavail in vcl_deliver, it seems odd [20:56:47] (but everything in vcl seems odd) [20:56:53] bblack: i bet mark knew this, which is why we call restore_cookie in both vcl_hit and vcl_miss rather than just in vcl_deliver [20:57:24] bblack, i have to run pretty soon...hmmm [20:57:54] i can do the same (put the check right after restore_cookie in both miss/hit) [20:58:28] ori: give me a few, I can't miss the timing on this stupid console stuff [20:58:29] so if you wanna just push the proper buttons, please do :) [20:58:39] bblack: np [20:58:48] (03PS1) 10Andrew Bogott: Add roles for testing swift in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 [20:59:11] (03CR) 10Ori.livneh: [C: 04-2] "doesn't work: 'bereq.http.Cookie': Not available in method 'vcl_deliver'." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137265 (owner: 10Ori.livneh) [21:00:36] (03PS2) 10Andrew Bogott: Add roles for testing swift in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 [21:02:35] ottomata: http://snag.gy/8wJyp.jpg [21:02:52] ok that looks better [21:02:53] I didn't do anything, just rebooted and went back in there, and now they're both the same size (and neither is the size from your SS) [21:02:55] beats me [21:03:00] haha, oook [21:03:20] (03PS1) 10Awight: Meta: automatic translation workflow state changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137804 [21:03:34] ottomata: is it already set to netboot the installer again? 
[21:03:38] if so I'm gonna hit the reboot button and disconnect [21:04:11] (I think it is actually, so that's what I did) [21:04:45] hm, could be, if the netboot once thing sticks if you don't let it reboot [21:05:09] ok, in watching... [21:05:21] ori: https://www.varnish-software.com/static/book/VCL_functions.html has a nice chart [21:05:22] bblack, hi, do you know if netmapper actually runs on beta cluster? [21:05:40] yurikR: I think it does, but I also think we're not currently pushing it any config [21:05:49] i suspect that it doesn't get updated ips. Anyway to enable that? [21:05:51] like, you'd have to manually copy json files there [21:05:52] bblack: so req is available in deliver [21:06:04] bblack: which agrees with my patch working in labs [21:06:18] yurikR: we can't give it the real data, because the real data is secret and labs isn't, remember? [21:06:28] cool ja, its booting [21:06:37] bblack, how exposed is the betalabs passwords? [21:07:47] what passwords are you referring to? [21:08:04] account to zero.wikimedia [21:08:25] in general, if someting isn't public, we can't go do that on betalabs [21:08:45] right, but does public has access to PrivateSettings.php ? [21:08:53] i thought it is hidden from the public [21:08:58] I have no idea about that [21:09:15] beta probably has it's own privatesettings? [21:09:17] Anyone with access to beta would have access to beta clusters PrivateSettings [21:09:20] it does [21:09:28] production privatesettings are obviously private though, heh [21:09:32] s/access/shell access/ [21:09:48] my point is that - the shell access to beta is still fairly well protected and is not public [21:09:54] yurikR: AFAIK, netmapper on beta is probably somewhat broken [21:10:04] Is beta down? [21:10:11] i didn't do it! 
[21:10:13] but one of the main reasons we went down this whole road with password-protected databases and all this is because of this [21:10:21] we can't turn around now and say "nevermind, beta is private enough" [21:10:35] hehe, ok, guess the proper way is to set up a zerowiki on beta [21:10:40] yes [21:10:43] with separate fake data [21:10:53] who knows how to create a beta cluster instance? [21:10:54] :) [21:10:58] and then we can do the puppet magic to give it a separate password for the zerofetcher [21:11:06] yep, sounds like a good plan [21:11:13] and we will get it properly tested too [21:11:38] * yurikR is looking for beta cluster guru... [21:11:48] http://www.downforeveryoneorjustme.com/en.wikipedia.beta.wmflabs.org it's not just me :) [21:12:03] yurikR: It's pretty much the same as the cluster [21:12:06] Apache config as necessary, [21:12:09] run addWiki [21:12:10] ori: let's just go back to your PS2 [21:12:12] Add any other settings [21:12:26] Reedy, will wait for the site to come back first ;) [21:12:30] ori: every other solution seems uglier [21:16:01] hm, bblack, i'm going to assume that the installer or the os is still being served up from carbon, not sure [21:16:02] but i gotta run [21:16:10] bblack: ok, i'll amend [21:16:12] hopefully this thing will just have on OS when I check in again tomorrow [21:16:18] thanks for all your help [21:16:19] laters! [21:17:15] (03PS5) 10Ori.livneh: text-frontend VCL: grep Orig-Cookie for GeoIP, not (just) Cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/137265 [21:17:39] bblack: ^ [21:20:00] csteipp, desktop is down, but M. works [21:20:01] http://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page [21:21:51] Hmm. The desktop IP address closes the connection right away. 
[21:22:09] i don't see much in the fatal.log
[21:22:19] (PS6) BBlack: text-frontend VCL: grep Orig-Cookie for GeoIP, not (just) Cookie [operations/puppet] - https://gerrit.wikimedia.org/r/137265 (owner: Ori.livneh)
[21:22:24] lots of Fatal error: /usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `ptwikibooks`.
[21:22:27] (CR) BBlack: [C: 2 V: 2] text-frontend VCL: grep Orig-Cookie for GeoIP, not (just) Cookie [operations/puppet] - https://gerrit.wikimedia.org/r/137265 (owner: Ori.livneh)
[21:24:32] ok, its back
[21:26:47] * twkozlowski notices the irritating enotif backlog of around 30 minutes
[21:26:47] andrewbogott: can you please merge https://gerrit.wikimedia.org/r/#/c/112423/ before i lose the work done? alex is breaking it into a module
[21:28:12] (CR) Andrew Bogott: [C: 2] webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - https://gerrit.wikimedia.org/r/112423 (owner: Matanya)
[21:28:33] (CR) Mwalker: [C: 1] Meta: automatic translation workflow state changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137804 (owner: Awight)
[21:29:07] twkozlowski: please post evidence on https://bugzilla.wikimedia.org/show_bug.cgi?id=43936
[21:29:20] thank you andrewbogott
[21:30:59] bblack: lgtm
[21:34:19] !log Renaming geo_killlist and geo_updates to *_old
[21:34:24] Logged the message, Master
[21:36:57] Reedy, does this look right to you? mwscript addWiki.php --language en --site zero --dbname zerowiki --domain zero.beta.wmflabs.org
[21:37:55] No
[21:38:41] Reedy, zero.wikimedia.beta.wmflabs.org ?
[21:39:43] mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en wikimedia zerowiki zero.wikimedia.beta.wmflabs.org
[21:40:33] Reedy, just curious, why do i need --wiki=aawiki ?
[21:41:37] legacy reasons, sort of
[21:41:47] mwscript needs a wiki config to bootstrap
[21:42:59] Reedy, thx, and could you point me where DNS/docroots are configured?
we already have it in prod
[21:43:50] Shouldn't need to do DNS
[21:45:04] bleh, got perm error midway :(
[21:45:10] of what?
[21:45:10] hope i can re-run the script
[21:45:17] running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki en wikimedia zerowiki zero.wikimedia.beta.wmflabs.org
[21:45:39] what was the permission error?
[21:45:46] [25c19f1a] [no req] Exception from line 307 of /mnt/srv/scap-stage-dir/php-master/includes/utils/UIDGenerator.php: Could not open '/tmp/mw-UIDGenerator-squidhtcppurge-48'.
[21:45:51] might need to sudo it
[21:46:07] needs to run as the mwdeploy user
[21:46:37] sudo -u mwdeploy -- your command here
[21:47:24] database has already been created, so it barfs )
[21:47:38] should i drop db somehow?
[21:47:55] * bd808 looks to Reedy for advice
[21:48:04] should be fine
[21:48:06] sql enwiki
[21:48:11] drop database zerowiki;
[21:48:29] (PS1) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[21:48:39] `sql` doesn't work in beta :(
[21:48:40] !log updated eventlogging to a8602c1d879f
[21:48:44] Logged the message, Master
[21:49:03] sob, again the UIDGenerator
[21:49:04] I thought it did? :/
[21:49:17] https://bugzilla.wikimedia.org/show_bug.cgi?id=63803
[21:49:35] (CR) jenkins-bot: [V: -1] Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811 (owner: Mwalker)
[21:49:47] sql enwiki -h deployment-db1
[21:49:55] yurikR: hack for getting sql in beta -- https://bugzilla.wikimedia.org/show_bug.cgi?id=45706
[21:50:03] mwscript sql.php --wiki=aawiki
[21:50:08] that's what i just used to get in
[21:50:40] Reedy, if i do drop zerowiki; from there, would i cause massive demage?
[21:50:44] No
[21:50:53] Reedy, that doesn't work for me...
(sql enwiki -h deployment-db1)
[21:51:02] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host: !DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN
[21:51:02] Krenair: permission denied?
[21:51:06] yes
[21:51:11] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 35 data above and 0 below the confidence bounds
[21:51:11] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 35 data above and 0 below the confidence bounds
[21:51:16] That's a different bug
[21:51:16] from deployment-bastion
[21:51:16] https://bugzilla.wikimedia.org/show_bug.cgi?id=45706
[21:51:27] (PS2) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[21:52:01] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time
[21:53:05] Reedy, correction - drop zerowiki didn't work - trying drop database zerowiki
[21:54:23] (CR) BryanDavis: "one tiny copy-n-paste leftover" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137811 (owner: Mwalker)
[21:57:06] bblack: any thoughts on the other VCL patch?
[21:57:16] success. thx bd808 & Reedy, now need to configure domain somewhere...
[21:57:28] you need to add it to wikiversions-labs.json
[21:57:35] commit, review, submit
[21:57:37] thx
[21:58:58] (PS2) Dzahn: gitblit use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137392 (owner: Rush)
[22:00:26] (PS3) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[22:00:38] (PS1) Yurik: Added zerowiki to labs & sorted list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137812
[22:01:18] greg-g, is it ok to sync-dir labs file ^
[22:01:30] yurikR: Oh, and all-labs.dblist
[22:02:22] (PS2) Yurik: Added zerowiki to labs & sorted list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137812
[22:02:39] You will want to go through and get wgServer etc as are done for other wikis in InitialiseSettings-labs.php
[22:04:13] yurikR: yeah
[22:04:49] greg-g, is it generally ok to +2 mwf-config changes that are for only for labs files, or do i need to sync it into prod right away?
[22:05:15] yurikR: ideally sync to prod as soon as reasonable
[22:05:17] stupidly
[22:05:45] it's a noop for prod though
[22:05:58] updates noc
[22:06:00] it's just noise for the next person deploying real things to prod to look over
[22:06:45] greg-g, true, but i don't want to connect to prod and do a sync-dir or sync-file when it is not my depl time
[22:07:15] and at the same time waiting for a few days to do a minor betalabs adjustment is a bit excessive when debugging things
[22:08:21] (PS1) Ori.livneh: Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137813
[22:08:54] yurikR: I just gave you permission though?
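Editor's note: the "has no version entry" fatal quoted earlier and the advice above (add the new wiki to wikiversions-labs.json and to all-labs.dblist) describe two lists that have to agree. A minimal sketch of that consistency check follows. The file formats are assumptions, not taken from the log: the dblist is treated as one database name per line, and the JSON file as an object mapping database name to a version directory. The function name is hypothetical.

```python
import json


def missing_version_entries(dblist_text, wikiversions_json):
    """Return dbnames from the dblist that have no entry in the versions map.

    Assumed formats (not confirmed by the log): one dbname per line in the
    dblist, and a JSON object of dbname -> version directory.
    """
    versions = json.loads(wikiversions_json)
    dbnames = [
        line.strip()
        for line in dblist_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return sorted(name for name in dbnames if name not in versions)


if __name__ == "__main__":
    dblist = "aawiki\nenwiki\nzerowiki\n"
    versions = '{"aawiki": "php-master", "enwiki": "php-master"}'
    # zerowiki is in the dblist but has no version entry yet.
    print(missing_version_entries(dblist, versions))
```

A check like this could run before syncing a config change, so a wiki added to only one of the two files is caught before it produces fatals.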
[22:09:00] if you were worried about "not my time"
[22:09:28] (CR) BryanDavis: [C: 1] Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137813 (owner: Ori.livneh)
[22:09:50] greg-g: ok to sync that ^ ?
[22:10:22] yeah, and sync yurikR's thing if he's ready ;)
[22:10:30] yurikR: ready?
[22:10:42] (CR) Ori.livneh: [C: 2] Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137813 (owner: Ori.livneh)
[22:10:49] (Merged) jenkins-bot: Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137813 (owner: Ori.livneh)
[22:12:03] !log ori updated /a/common to {{Gerrit|Ife5081549}}: Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check
[22:12:05] (PS1) Ori.livneh: Check $wmgUseRC2UDP for RCStream, too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137814
[22:12:09] Logged the message, Master
[22:12:42] (CR) Ori.livneh: [C: 2] Check $wmgUseRC2UDP for RCStream, too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137814 (owner: Ori.livneh)
[22:12:48] (Merged) jenkins-bot: Check $wmgUseRC2UDP for RCStream, too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137814 (owner: Ori.livneh)
[22:13:11] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 19:12:41 UTC
[22:13:47] TimStarling: hi, fyi, you can now have your own .bashrc on all servers. so you can take https://gerrit.wikimedia.org/r/#/c/76678/ and just put the .bashrc in modules/admin/files/home/tstarling like f.e.
Alex did here https://gerrit.wikimedia.org/r/#/c/137295/
[22:14:48] ok, thanks
[22:15:11] !log ori Synchronized wmf-config/CommonSettings.php: Ife5081549: Put $wgRCFeeds[rcs100x] config behind $wmfRealm check (duration: 00m 04s)
[22:15:13] (CR) BryanDavis: "+1 for what's here now. Let's see some logs from this in beta to find out if we need additional tags/filters to make them look nice." [operations/puppet] - https://gerrit.wikimedia.org/r/137811 (owner: Mwalker)
[22:15:16] Logged the message, Master
[22:15:34] bd808, I think we do need more filters: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default
[22:15:42] damn; doesn't give the search term
[22:15:46] type:gelf
[22:16:37] hmmm... yeah not too pretty out of the box
[22:16:58] greg-g: yurikR hasn't responded, so not syncing his change
[22:17:18] ori: yep
[22:17:18] ori, greg-g sorry, syncing in a sec
[22:17:24] or should i not?
[22:17:43] you should if greg-g okays it, but i'm no longer on tin so i won't
[22:17:43] (PS3) Yurik: Added zerowiki to labs & sorted list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137812
[22:17:45] ori was just already in there, so was goign to for you, but now I think easiest for you, I guess ;)
[22:17:49] just fixed per Reedy ^
[22:17:57] (PS3) Dzahn: gitblit use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137392 (owner: Rush)
[22:18:07] greg-g, Reedy told me to fix one more thing, ori go ahead
[22:18:26] sigh
[22:18:32] up to you )
[22:18:33] (CR) Ori.livneh: [C: 2] Added zerowiki to labs & sorted list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137812 (owner: Yurik)
[22:18:40] (Merged) jenkins-bot: Added zerowiki to labs & sorted list [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137812 (owner: Yurik)
[22:18:46] thx )
[22:19:14] (CR) Gage: [C: 1] "Looks great! I did not test anything, but I read all the diffs."
(2 comments) [operations/puppet/cdh4] (cdh5) - https://gerrit.wikimedia.org/r/135494 (owner: Ottomata)
[22:23:17] ori: I haven't had a chance yet to review the latest GeoIP cleanup patch in depth. I'll try to do that this evening
[22:23:43] bblack: np, that one isn't actually pressing
[22:23:57] that's just my WANT EVERYTHING NOW speaking
[22:24:43] iirc you were worried about the strtok_r but i don't think we saw any evidence that it was causing issues, probably some safety check elsewhere is saving us
[22:25:39] thanks for merging the other one
[22:25:46] hmm, http://zero.wikimedia.beta.wmflabs.org/ is still not working ... Reedy, did i forget to change some other magic value? (taking notes on my wikitech user page so as not to bug you again on this)
[22:26:05] jenkins has deployed it?
[22:26:18] how do i check?
[22:26:27] checking -deployment server
[22:27:52] Reedy, the file is up to date on deployment-bastion
[22:34:01] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[22:35:45] (CR) Dzahn: [C: 2] gitblit use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137392 (owner: Rush)
[22:38:11] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[22:38:51] Reedy, "domain is not configured", but documented everything so far at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Overview#New_wiki
[22:39:01] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.004 second response time
[22:39:11] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 36 data above and 0 below the confidence bounds
[22:39:11] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 36 data above and 0 below the confidence bounds
[22:39:40] (PS1) Rush: dns for
phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825
[22:39:49] (CR) jenkins-bot: [V: -1] dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825 (owner: Rush)
[22:43:42] (PS1) Dzahn: gid not a valid parameter for generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137827
[22:45:57] (CR) Dzahn: [C: 2] gid not a valid parameter for generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137827 (owner: Dzahn)
[22:49:49] yurikR: https://github.com/wikimedia/operations-apache-config/tree/betacluster
[22:50:11] Looks like you'll need to add an explicit vhost
[22:50:36] (PS4) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[22:52:37] (PS2) Rush: dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825
[22:52:45] (CR) jenkins-bot: [V: -1] dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825 (owner: Rush)
[22:56:41] (PS3) Rush: dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825
[22:56:49] (CR) jenkins-bot: [V: -1] dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825 (owner: Rush)
[22:56:53] Jenkins is revolting
[22:57:35] (PS4) Rush: dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825
[22:57:42] (CR) jenkins-bot: [V: -1] dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825 (owner: Rush)
[22:58:30] "general parse error" eh
[23:00:04] mwalker, ori, MaxSem: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140605T2300)
[23:00:34] YESSIR!
[23:00:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:00:39] !!1ONEONE
[23:00:44] (PS5) Rush: dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825
[23:00:44] MaxSem: Deploy 9
[23:00:45] I dare you
[23:01:05] I DOUBLEDARE YOU DEPLOYER
[23:01:07] mutante: you should file a bug to improve the parse error messages!
[23:01:17] SCAP, DO YOU SPEAK IT?
[23:01:22] heheh
[23:02:26] basically it's using a state-machine for most of the DNS syntax. if the state machine can't find a match, you get "general parse error", whereas if you have parseable DNS with some other issue (like NS records with no matching A-records) you get reasonable errors
[23:02:34] James_F, preparing to deploy your change
[23:02:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:03:11] (CR) BBlack: [C: 1] dns for phabricator.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/137825 (owner: Rush)
[23:04:10] puppet is fine on wtp1014, it must be the passive checks being overloaded again
[23:04:29] MaxSem: Cool.
[23:04:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:05:11] bblack: it was missing preference number after MX
[23:06:12] !log maxsem Synchronized php-1.24wmf7/extensions/TemplateData: https://gerrit.wikimedia.org/r/#/c/137751/ (duration: 00m 04s)
[23:06:16] Logged the message, Master
[23:06:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:06:57] bblack: should i really report that? against the debian package ?
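Editor's note: the explanation at 23:02:26 contrasts two kinds of failure: a line the grammar cannot match at all ("general parse error") and a line that parses but fails a later check (an NS record with no matching A record). The sketch below only illustrates that split. It is not the DNS server's actual parser; the regular expressions, the function name `check_zone`, and the one-record-per-line input format are all simplifying assumptions.

```python
import re

# Hypothetical, heavily simplified record grammar (assumption, not real zone syntax).
RECORD_PATTERNS = {
    "A": re.compile(r"^(\S+)\s+A\s+(\d{1,3}(?:\.\d{1,3}){3})$"),
    "NS": re.compile(r"^(\S+)\s+NS\s+(\S+)$"),
    "MX": re.compile(r"^(\S+)\s+MX\s+(\d+)\s+(\S+)$"),  # preference number is required
}


def check_zone(lines):
    """Return error strings: syntax failures first, then semantic ones."""
    errors = []
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        for rtype, pattern in RECORD_PATTERNS.items():
            match = pattern.match(line)
            if match:
                records.append((rtype, match.groups()))
                break
        else:
            # No rule matched, so there is nothing more specific to report.
            errors.append(f"line {number}: general parse error")
    # Semantic pass: only lines that parsed can get a descriptive error.
    a_names = {groups[0] for rtype, groups in records if rtype == "A"}
    for rtype, groups in records:
        if rtype == "NS" and groups[1] not in a_names:
            errors.append(f"NS target {groups[1]} has no matching A record")
    return errors


if __name__ == "__main__":
    zone = [
        "example.org MX mail.example.org",  # preference number missing
        "example.org NS ns1.example.org",
        "mail.example.org A 192.0.2.10",
    ]
    for error in check_zone(zone):
        print(error)
```

The MX line without a preference number, the cause reported at 23:05:11, never matches any pattern in this sketch, so it can only produce the generic message, while the NS line parses and gets a specific one.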
[23:08:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:08:49] (PS2) Dzahn: webperf use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137377 (owner: Rush)
[23:08:50] MaxSem: Thanks!
[23:10:03] (PS2) Dzahn: txstatsd use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137378 (owner: Rush)
[23:10:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:10:57] (PS2) Dzahn: tcpircbot use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137379 (owner: Rush)
[23:11:25] (PS5) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[23:11:52] (PS2) Dzahn: spamassassin use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137380 (owner: Rush)
[23:12:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:13:19] (PS2) Dzahn: ocg use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137384 (owner: Rush)
[23:14:12] (PS2) Dzahn: kiwix use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137394 (owner: Rush)
[23:14:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:15:56] (PS2) Dzahn: limn use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137388 (owner: Rush)
[23:16:18] !log maxsem Synchronized php-1.24wmf8/includes/ChangeTags.php: https://gerrit.wikimedia.org/r/#/c/137563/ (duration: 00m 03s)
[23:16:22] Logged the message, Master
[23:16:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:17:50] (PS2) Dzahn: mediawiki use generic::systemuser [operations/puppet] -
https://gerrit.wikimedia.org/r/137387 (owner: Rush)
[23:18:21] !log maxsem Synchronized php-1.24wmf7/includes/ChangeTags.php: https://gerrit.wikimedia.org/r/#/c/137563/ (duration: 00m 03s)
[23:18:25] Logged the message, Master
[23:18:32] Deskana, deployed ^^^
[23:18:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:19:20] (PS2) Dzahn: librenms use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137389 (owner: Rush)
[23:20:24] (PS2) Dzahn: mwprof use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137386 (owner: Rush)
[23:20:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:21:32] (PS2) Dzahn: icinga use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137397 (owner: Rush)
[23:22:08] (CR) Dzahn: [C: 2] ocg use generic::systemuser [operations/puppet] - https://gerrit.wikimedia.org/r/137384 (owner: Rush)
[23:22:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:23:23] Reedy, i looked at the betacluster branch of the apache-config, should i break the wikimedia.conf link to placeholder.conf and instead copy content of the master's branch?
[23:24:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:26:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:28:38] PROBLEM - Puppet freshness on wtp1014 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 22:57:53 UTC
[23:28:48] RECOVERY - Puppet freshness on wtp1014 is OK: puppet ran at Thu Jun 5 23:28:38 UTC 2014
[23:34:19] (PS1) Dzahn: Revert "ocg use generic::systemuser" [operations/puppet] - https://gerrit.wikimedia.org/r/137833
[23:35:24] (CR) Dzahn: [C: 2] "Duplicate parameter 'home' for on Generic::Systemuser" [operations/puppet] - https://gerrit.wikimedia.org/r/137833 (owner: Dzahn)
[23:35:27] Reedy, https://gerrit.wikimedia.org/r/#/c/137834/
[23:46:18] (CR) Dzahn: [C: 2] Update all Bugzilla custom files which have only trivial changes to use MPL 2.0 [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 (https://bugzilla.wikimedia.org/61499) (owner: 01tonythomas)
[23:58:04] (PS3) Dzahn: Update all Bugzilla custom files which have only trivial changes to use MPL 2.0 [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 (https://bugzilla.wikimedia.org/61499) (owner: 01tonythomas)