[00:01:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.366 second response time
[00:11:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 324786 bytes in 7.906 second response time
[00:15:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC
[00:15:30] (PS1) coren: Tool Labs: tweaks to the YuviProxy for tools [operations/puppet] - https://gerrit.wikimedia.org/r/123491
[00:18:42] Come on, Jenkins.
[00:19:02] (CR) coren: [C: 2] "Well, either this works or it doesn't." [operations/puppet] - https://gerrit.wikimedia.org/r/123491 (owner: coren)
[00:26:20] (PS1) Ori.livneh: hhvm on beta: use 'hhvm' package, not 'hhvm-fastcgi' [operations/puppet] - https://gerrit.wikimedia.org/r/123494
[00:26:34] (CR) Ori.livneh: [C: 2] Follow-up to Iacd6a8250: remove reference to repack-libmemcached10 [operations/puppet] - https://gerrit.wikimedia.org/r/123455 (owner: Ori.livneh)
[00:26:44] (PS2) Ori.livneh: hhvm on beta: use 'hhvm' package, not 'hhvm-fastcgi' [operations/puppet] - https://gerrit.wikimedia.org/r/123494
[00:29:56] (CR) Ori.livneh: [C: 2] hhvm on beta: use 'hhvm' package, not 'hhvm-fastcgi' [operations/puppet] - https://gerrit.wikimedia.org/r/123494 (owner: Ori.livneh)
[00:34:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[02:24:08] !log LocalisationUpdate completed (1.23wmf19) at 2014-04-03 02:24:07+00:00
[02:24:14] Logged the message, Master
[02:35:54] !log springle synchronized wmf-config/db-eqiad.php 's1 depool db1061 for upgrade'
[02:35:59] Logged the message, Master
[02:45:25] !log springle synchronized wmf-config/db-eqiad.php 's1 repool db1061, warm up'
[02:45:31] Logged the message, Master
[02:47:40] !log springle synchronized wmf-config/db-eqiad.php 's2 depool db1060 for upgrade'
[02:47:46] Logged the message, Master
[02:48:02] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-03 02:48:01+00:00
[02:48:07] Logged the message, Master
[02:56:34] !log springle synchronized wmf-config/db-eqiad.php 's2 repool db1060, warm up'
[02:56:38] Logged the message, Master
[02:57:55] !log springle synchronized wmf-config/db-eqiad.php 's3 depool db1019 for upgrade'
[02:58:00] Logged the message, Master
[03:02:23] (PS1) coren: Tool Labs: fix nginx proxy config so that it works [operations/puppet] - https://gerrit.wikimedia.org/r/123508
[03:04:11] (CR) coren: [C: 2] "It works!" [operations/puppet] - https://gerrit.wikimedia.org/r/123508 (owner: coren)
[03:12:31] !log springle synchronized wmf-config/db-eqiad.php 's3 repool db1019, warm up'
[03:12:36] Logged the message, Master
[03:14:18] !log springle synchronized wmf-config/db-eqiad.php 's4 depool db1020 for upgrade'
[03:14:23] Logged the message, Master
[03:16:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC
[03:19:46] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[03:33:59] !log db1020 raid controller dimm ecc errors
[03:34:06] Logged the message, Master
[03:35:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[03:40:43] (PS1) Springle: Depool. See RT 7191. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123512
[03:41:26] (CR) Springle: [C: 2] Depool. See RT 7191. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123512 (owner: Springle)
[03:41:33] (Merged) jenkins-bot: Depool. See RT 7191. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123512 (owner: Springle)
[03:53:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 3 03:53:18 UTC 2014 (duration 53m 16s)
[03:53:26] Logged the message, Master
[03:53:49] !log springle synchronized wmf-config/db-eqiad.php 's5 depool db1037 for upgrade'
[03:53:54] Logged the message, Master
[04:03:15] !log springle synchronized wmf-config/db-eqiad.php 's5 repool db1037, warm up'
[04:03:21] Logged the message, Master
[04:04:52] !log springle synchronized wmf-config/db-eqiad.php 's6 depool db1015 for upgrade'
[04:04:57] Logged the message, Master
[04:11:36] !log springle synchronized wmf-config/db-eqiad.php 's6 repool db1015, warm up'
[04:11:41] Logged the message, Master
[04:43:24] !log springle synchronized wmf-config/db-eqiad.php 'return upgraded DB slaves to normal load'
[04:43:28] Logged the message, Master
[05:33:33] morning
[05:34:47] <_joe|away> hi paravoid
[06:17:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC
[06:23:53] _joe_: want to ACK this ^ ? was turned ogg by otto yesterday
[06:23:59] *off
[06:24:41] (CR) Matanya: [C: 1] ganglia: address selector in a define [operations/puppet] - https://gerrit.wikimedia.org/r/123422 (owner: Hashar)
[06:25:26] <_joe_> well, I'm used to acknowledge alarms only when I will be the one fixing them... but in this case, the server is going to be decommisioned, right?
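The s1 to s6 sequence springle logged between 02:35 and 04:43 follows one pattern per shard: depool one slave, upgrade it, repool it at reduced load to warm up, and only at the very end return every upgraded slave to normal load (s4's db1020 apparently stayed out after the RAID controller errors, RT 7191). The Python sketch below models only that weight bookkeeping. It is hypothetical: the weight values, the function name `rolling_upgrade` and the dict layout are assumptions for illustration, not the structure of wmf-config/db-eqiad.php.

```python
from __future__ import annotations

# Hypothetical load weights; the real values in db-eqiad.php are not shown in the log.
NORMAL_WEIGHT = 100   # "normal load"
WARMUP_WEIGHT = 10    # reduced load while the upgraded slave's caches warm up


def rolling_upgrade(pools: dict[str, dict[str, int]], targets: dict[str, str]) -> list[str]:
    """Upgrade one slave per shard, one shard at a time.

    pools maps shard name -> {host: weight}; targets maps shard -> host to upgrade.
    Weights are mutated in place; the returned list mirrors the !log messages.
    """
    log: list[str] = []
    for shard, host in targets.items():
        pools[shard][host] = 0                # depool: the host receives no queries
        log.append(f"{shard} depool {host} for upgrade")
        # ... the package upgrade and restart would happen here ...
        pools[shard][host] = WARMUP_WEIGHT    # repool with a light share of traffic
        log.append(f"{shard} repool {host}, warm up")
    for shard, host in targets.items():       # only once every shard is done
        pools[shard][host] = NORMAL_WEIGHT
    log.append("return upgraded DB slaves to normal load")
    return log


# Example with three of the shard/host pairs from the log.
pools = {"s1": {"db1061": NORMAL_WEIGHT}, "s2": {"db1060": NORMAL_WEIGHT}, "s3": {"db1019": NORMAL_WEIGHT}}
for line in rolling_upgrade(pools, {"s1": "db1061", "s2": "db1060", "s3": "db1019"}):
    print(line)
```

A real rollout presumably involves more than this (waiting between steps, health checks); the sketch only tracks how each slave's share of traffic changes.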
[06:27:44] ACKNOWLEDGEMENT - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC Giuseppe Lavagetto Puppet is disabled on stat1 for now as its being decommissioned
[06:27:52] yes, it is
[06:28:02] thank you
[06:28:07] <_joe_> np
[06:28:15] !log powercycling ms-be1003, unresponsive, no console output
[06:28:16] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[06:28:21] Logged the message, Master
[06:36:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[07:02:56] PROBLEM - Host mobile-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[07:03:06] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[07:03:08] PROBLEM - Host text-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100%
[07:03:16] RECOVERY - Host mobile-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 89.46 ms
[07:03:26] RECOVERY - Host text-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 90.92 ms
[07:03:56] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 89.50 ms
[07:04:01] looking
[07:14:36] (CR) Alexandros Kosiaris: [C: 2] ganglia: address selector in a define [operations/puppet] - https://gerrit.wikimedia.org/r/123422 (owner: Hashar)
[07:38:48] (PS2) Matanya: applicationserver: lint and tidy [operations/puppet] - https://gerrit.wikimedia.org/r/122269
[07:40:26] (PS2) Matanya: exim: fix scoping [operations/puppet] - https://gerrit.wikimedia.org/r/119496
[07:41:01] (PS2) Matanya: openstack: qualify var [operations/puppet] - https://gerrit.wikimedia.org/r/119488
[07:43:11] (PS2) Matanya: dataset: fix module path [operations/puppet] - https://gerrit.wikimedia.org/r/119212
[07:47:28] (PS9) Matanya: Torrus: add torrus to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/108314
[08:08:10] akosiaris: can you please merge: https://gerrit.wikimedia.org/r/#/c/123235/ ?
[08:19:51] well it is cherry picked on the puppetmaster
[08:20:04] we get to find some process to have those changes merged in bulks
[08:32:03] (CR) Odder: [C: 1] Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123459 (owner: Jean-Frédéric)
[08:35:36] (PS1) Yuvipanda: toollabs: Make proxylistener log to a file [operations/puppet] - https://gerrit.wikimedia.org/r/123540
[08:35:53] anyone to do a trivial +2 for a file that's not in use yet?
[08:35:56] Coren: andrewbogott_afk ^
[08:36:19] * YuviPanda checks timezones, wonders if apergos or akosiaris are around
[08:36:38] yes i am
[08:37:34] (CR) Alexandros Kosiaris: [C: 2] contint: bring puppet-lint on lab slaves [operations/puppet] - https://gerrit.wikimedia.org/r/123235 (owner: Hashar)
[08:38:05] akosiaris: can you +2 that? Trivial patch, plus it's not running in production anywhere, and I am testing it right now (am toollabs admin)
[08:38:12] was working with Coren yesterday but damn timezones!
[08:40:44] <_joe_> YuviPanda: I can take a look if needed and no-one else is around, btw
[08:40:54] _joe_: woo! thank you! :)
[08:41:03] YuviPanda: ah... you wouldn't know btw the culprit of these requests "GET
[08:41:03] /r/changes/?q=464a5db280c935cb04b4810775c086fbd461e50b&o=CURRENT_REVISION&o=CURRENT_FILES&n=1
[08:41:03] HTTP/1.1" 200 914 T=0s "-" "python-requests/1.2.3 CPython/2.7.3
[08:41:03] Linux/3.2.0-59-virtual"
[08:41:04] ?
[08:41:15] 3 lines ? damn my client
[08:41:18] <_joe_> akosiaris: I was about to ask as well?
[08:41:37] <_joe_> ehm s/.$/!/
[08:42:25] akosiaris: no, but could be toollabs.
[08:42:27] <_joe_> YuviPanda: we've seen several tool labs ips hammering gerrit with such requests...
[08:42:39] oh it is tool labs for sure
[08:42:50] yesterday the originated from tools-exec3
[08:42:58] let me look at what's running there
[08:42:59] <_joe_> YuviPanda: in the order of ~ 200/min, which apparently is enough to tear gerrit down
[08:43:14] <_joe_> akosiaris: there were 8 different IPs yesterday
[08:43:17] oh wow, that's not very high.
[08:43:23] _joe_: yes I was about to say that
[08:43:54] YuviPanda: you wont find it running now, I iptabled the IP and the script probably threw an exception due to a timeout and died
[08:44:29] the obviously gerrit related tools I could find are gerrit-to-redis and gerrit-patch-uploader. Former is mine and doesn't make any http requests, latter is scfe_de's
[08:44:34] <_joe_> YuviPanda: not very high, but the response time for those requests is ~ 1 sec+
[08:45:13] <_joe_> so it's easy to see how making a lot of them in parallel can render gerrit unavailable
[08:45:18] right.
[08:45:24] https://gerrit.wikimedia.org/r/#/c/123540 LGTM, _joe_ wanna have a look too ? merge it if you +2 it :-)
[08:45:56] actually, it might be gerrit-reviewer-bot
[08:46:08] since the functionality it does would require a very similar query
[08:46:27] hmm, but that's on webgrid now.
[08:47:13] akosiaris: _joe_ emailing labs-l seems to be the best option atm, I think
[08:49:02] (CR) Giuseppe Lavagetto: [C: 2] toollabs: Make proxylistener log to a file [operations/puppet] - https://gerrit.wikimedia.org/r/123540 (owner: Yuvipanda)
[08:49:10] woo, thank you
[08:50:29] <_joe_> YuviPanda: I wil also look at what proxylistener does also, as soon as possible
[08:50:49] * _joe_ adds another thing to the list of things to study
[08:50:50] _joe_: :) A lot of it was fixed yesterday by Coren when I was sleeping, apparently :) Am looking through the fixes now.
[08:54:31] \o/ http://tools-proxy-test.wmflabs.org/wp-signpost/cgi-bin/api.py/feed
[08:54:32] works!
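_joe_'s point at 08:44 to 08:45 can be made concrete with Little's law: average requests in flight equals arrival rate times average response time. At the roughly 200 requests/min and ~1 s response time quoted above, that is only about 3.3 requests in flight on average, which is why the rate alone looked "not very high". The in-flight count scales with response time, though, so if the client keeps sending at the same rate regardless of how slow replies get, a gerrit that is already slowing down carries proportionally more concurrent requests from it. A minimal Python sketch; the 10 s case is a hypothetical slowdown, not a measured figure, and the arithmetic alone does not prove which tool caused the outage.

```python
def in_flight(requests_per_minute: float, response_seconds: float) -> float:
    """Little's law, L = lambda * W: average number of concurrent requests."""
    arrival_rate = requests_per_minute / 60.0   # convert to requests per second
    return arrival_rate * response_seconds


print(round(in_flight(200, 1.0), 1))    # figures from the log -> 3.3
print(round(in_flight(200, 10.0), 1))   # hypothetical 10 s responses -> 33.3
```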
[08:54:57] and cleans up properly when I stop it
[08:54:58] woo
[09:08:41] advice please: i have started modulrazing http://stats.wikimedia.org/ what should i call the module? wikistats is taken, and stats might lead to confustion with statsd
[09:08:43] idea?
[09:09:12] stats-wm
[09:09:19] stats-wm-o even better
[09:10:04] like it. more input e.g akosiaris ?
[09:10:22] or _joe_ ?
[09:10:57] conversion or completely new ?
[09:11:16] manifests/misc/statistics.pp i suppose
[09:11:37] otto asked me to pull statistics apart
[09:11:51] this is the very first phase of this
[09:11:57] goddamn
[09:12:04] inheritance in roles ?
[09:12:19] this is going to be quite a lot of work...
[09:12:19] yes, very hard and and sad
[09:12:27] boy do i know that :)
[09:12:40] that is why no one is doing it?
[09:13:10] lack of time is a more probable cause
[09:13:39] why not just statistics ?
[09:13:40] that is what i meant, very hard, not to urgent
[09:13:48] it exists
[09:15:10] i don't see a statistics module ...
[09:15:47] yeah. i have a branch with it, sorry
[09:23:31] matanya: can you explain in more detail what you're trying to achieve ?
[09:23:34] just curious
[09:24:43] i'm taking the part that builds http://stats.wikimedia.org/ inside manifests/misc/statistics.pp and building a module that takes care of it
[09:25:44] (CR) Hashar: jobrunner: reduce polling on beta cluster (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/123444 (owner: Hashar)
[09:25:59] (PS2) Hashar: jobrunner: reduce polling on beta cluster [operations/puppet] - https://gerrit.wikimedia.org/r/123444
[09:31:19] (CR) Hashar: "Cherry picked on beta cluster puppetmaster deployment-salt.eqiad.wmflabs Will see how it soften the load on deployment-jobrunner01." [operations/puppet] - https://gerrit.wikimedia.org/r/123444 (owner: Hashar)
[09:35:40] (CR) Alexandros Kosiaris: [C: -1] mail :lint (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/109514 (owner: Matanya)
[09:37:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[09:37:36] (CR) Alexandros Kosiaris: "I suppose this requires RT ticket and manager approval ?" [operations/puppet] - https://gerrit.wikimedia.org/r/123433 (owner: GWicke)
[09:38:42] simplewiki-e9a708a7: 1.0513 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1)
[09:38:42] simplewiki-e9a708a7: 1.0520 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1)
[09:38:42] simplewiki-e9a708a7: 1.0527 22.8M [memcached] result: NOT STORED
[09:38:44] yeahhhh
[09:40:32] (on beta cluster)
[09:43:01] (CR) Alexandros Kosiaris: [C: 2] Create u_kolossos,u_aude dbs and users [operations/puppet] - https://gerrit.wikimedia.org/r/123191 (owner: Alexandros Kosiaris)
[09:44:42] :)
[09:45:15] (PS3) Alexandros Kosiaris: brewster to carbon migration [operations/puppet] - https://gerrit.wikimedia.org/r/123202
[09:47:30] heads up, will turn apt/proxy/tftp and other brewster services over to carbon in the next couple of mins. Hopefully I wont need to revert :-)
[09:48:30] akosiaris: we have some DNS service entries I have setup. Ie webproxy.pmtpa.wmnet and webproxy.eqiad.wmnet , CNAME to carbon/brewster
[09:49:02] (CR) Alexandros Kosiaris: "@Tim, no this is by design an empty page." [operations/puppet] - https://gerrit.wikimedia.org/r/123202 (owner: Alexandros Kosiaris)
[09:49:23] hashar: https://gerrit.wikimedia.org/r/#/c/123206/
[09:49:36] I believe it covers you
[09:49:55] sorry
[09:49:57] I meant https://gerrit.wikimedia.org/r/#/c/123205/1
[09:50:19] I think I use something like webproxy.${::site}.wmnet
[09:50:47] sounds fine
[09:51:10] yeah. eqiad is already pointing to carbon, now turning pmtpa over to carbon as well, well ulsfo and esams I suppose you don't use :-)
[09:51:29] (CR) Hashar: "sounds good" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/123205 (owner: Alexandros Kosiaris)
[09:51:45] (CR) Alexandros Kosiaris: [C: 2] brewster to carbon migration [operations/puppet] - https://gerrit.wikimedia.org/r/123202 (owner: Alexandros Kosiaris)
[09:53:52] (CR) Alexandros Kosiaris: [C: 2] Move brewster services to carbon [operations/dns] - https://gerrit.wikimedia.org/r/123205 (owner: Alexandros Kosiaris)
[09:54:42] ok, let's see what I broke now
[10:04:35] (CR) Nuria: [C: 1] "This is been tested on both vagrant and staging and it is ready to be merged. I thought I did have merge rights on this repo?" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[10:11:17] (CR) Nuria: [C: 1] "Tested on vagrant plus staging and things are working. This is ready to be merged." [operations/puppet] - https://gerrit.wikimedia.org/r/122425 (owner: Nuria)
[10:18:15] (PS1) Hashar: hhvm: abstract out backports to a hhvm module [operations/puppet] - https://gerrit.wikimedia.org/r/123573
[10:19:16] (Abandoned) Hashar: contint: remove hhvm from slaves [operations/puppet] - https://gerrit.wikimedia.org/r/121675 (owner: Hashar)
[10:24:51] (PS6) Matanya: mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514
[10:25:45] (CR) Hashar: "Cherry picked on integration-puppetmaster . That managed to bring up hhvm 3.0.1 on the two Jenkins slaves: integration-slave1001.eqiad.wmf" [operations/puppet] - https://gerrit.wikimedia.org/r/123573 (owner: Hashar)
[10:26:14] !log Jenkins job mediawiki-core-phpunit-hhvm is back around thanks to {{gerrit|123573}}
[10:26:19] Logged the message, Master
[10:33:37] lunch time bbl
[10:51:42] !log temporarily stopped squid on brewster
[10:51:46] Logged the message, Master
[10:52:14] we got a page
[10:52:18] I assume it's you? :)
[10:52:28] I didnt
[10:52:51] a nimsoft
[10:52:52] here it is
[10:52:55] yeah it is me
[10:53:04] there is one point that is not puppet managed it seems
[10:53:12] /etc/apt/apt.conf
[10:53:17] that points to brewster
[10:54:19] * _joe_ luch time
[10:56:36] PROBLEM - LVS HTTPS IPv6 on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:01:26] RECOVERY - LVS HTTPS IPv6 on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 65638 bytes in 1.973 second response time
[11:02:11] wth is going on?
[11:02:41] with the IPv6 only error in esams ? I have no idea...
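The beta-cluster debug output hashar pasted at 09:38:42 shows a get() on simplewiki:jobqueuegroup:taskruns:v1 followed by an add() that came back NOT STORED. memcached's add command stores a value only when the key is not already present, so NOT_STORED means an entry already existed for that key when add ran. Pairing get with add is a common way to let only one of several racing processes "claim" a key. The in-memory class below is a sketch that mimics only those get/add semantics (with a TTL), for illustration. It is not a memcached client, and the log alone does not show exactly why MediaWiki's job queue code issues this pair of calls.

```python
import time


class FakeMemcached:
    """In-memory stand-in mimicking memcached get/add semantics (illustration only)."""

    def __init__(self) -> None:
        self._data = {}   # key -> (value, absolute expiry time, or 0.0 for no expiry)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at and time.time() >= expires_at:
            del self._data[key]   # expired entries behave as if absent
            return None
        return value

    def add(self, key, value, ttl: float = 0) -> bool:
        """Store only if the key is absent; False corresponds to NOT_STORED."""
        if self.get(key) is not None:
            return False
        expires_at = (time.time() + ttl) if ttl else 0.0
        self._data[key] = (value, expires_at)
        return True


# Two hypothetical job runners racing for the same key: only the first add() succeeds.
mc = FakeMemcached()
key = "simplewiki:jobqueuegroup:taskruns:v1"
print(mc.add(key, "runner-a", ttl=60))   # True  (STORED)
print(mc.add(key, "runner-b", ttl=60))   # False (NOT_STORED)
```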
[11:03:53] hmm
[11:03:55] I think I know
[11:04:29] * matanya suspects capella decom
[11:05:25] capella has nothing to do with anything
[11:05:31] okay, I see two different problems, this is weird
[11:05:59] * matanya stopps geussing
[11:06:29] (PS1) Alexandros Kosiaris: Correct typo in ubuntu mirror's address [operations/dns] - https://gerrit.wikimedia.org/r/123585
[11:06:55] (CR) Alexandros Kosiaris: [C: 2] Correct typo in ubuntu mirror's address [operations/dns] - https://gerrit.wikimedia.org/r/123585 (owner: Alexandros Kosiaris)
[11:08:40] !log deactivating cr1-esams<->HE peering, latency > 160ms, over at 200ms (congestion?); back to 84ms now;
[11:08:45] Logged the message, Master
[11:09:34] (CR) Ori.livneh: hhvm: abstract out backports to a hhvm module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/123573 (owner: Hashar)
[11:09:35] !log affects both IPv6 transit at esams (slowdowns) as well as IPv6 eqiad<->esams
[11:09:39] Logged the message, Master
[11:10:25] !log IPv4 eqiad<->esams private link also elevated by ~15ms but no packet loss observed
[11:10:30] Logged the message, Master
[11:10:33] wow, something big has happened
[11:10:37] someone is having some fun I'm sure
[11:40:03] matanya: apparently puppet-lint supports having a file named .puppet-lint.rc which let one pass additional arguments
[11:40:17] neat!
[11:42:43] matanya: top offenders http://paste.openstack.org/show/74960/ :-]
[11:43:30] hashar: for the whole repo?
[11:43:41] yeah + submodules probably
[11:44:41] the autoloader_layout is a bit of a lie, because of the roles
[11:45:19] now if you could attach each one to a file, i'll be flad to go and fix it
[11:45:24] _joe_ was talking about it this morning
[11:45:34] we might want to move roles to a role module as well
[11:46:20] <_joe_> hashar: mh, I'd wait to syphon a few things out with hiera probably
[11:46:52] that would be very nice
[11:47:06] (PS1) Hashar: puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600
[11:47:34] <_joe_> matanya: we need puppet 3 for a decent hiera integration :)
[11:47:54] (CR) Hashar: "That might help get less errors :-] Probably want to phase out the 'rake lint' target in favor of invoking puppet-lint from the root of t" [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[11:48:05] _joe_: i know, one of my open tasks. https://etherpad.wikimedia.org/p/Puppet3
[11:48:24] <_joe_> matanya: I know that too, :)
[11:48:40] good, we can join forces :)
[11:49:19] <_joe_> yep, at the moment I'm more concentrated on understanding the architecture
[11:49:34] (CR) Matanya: [C: 1] puppet-lint: ignore class_parameter_defaults [operations/puppet] - https://gerrit.wikimedia.org/r/123600 (owner: Hashar)
[11:54:21] (PS1) Matanya: [WIP] stats.wikimedia.org: strip out from misc into a module [operations/puppet] - https://gerrit.wikimedia.org/r/123601
[11:55:33] (CR) jenkins-bot: [V: -1] [WIP] stats.wikimedia.org: strip out from misc into a module [operations/puppet] - https://gerrit.wikimedia.org/r/123601 (owner: Matanya)
[11:55:37] this is so awful
[11:59:21] (PS2) Matanya: [WIP] stats.wikimedia.org: strip out from misc into a module [operations/puppet] - https://gerrit.wikimedia.org/r/123601
[12:27:26] (PS11) Hashar: sanity test for refreshWikiversionsCDB [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/105698
[12:28:00] (PS1) coren: Tool Labs: more tweaks to the urlproxy [operations/puppet] - https://gerrit.wikimedia.org/r/123610
[12:30:53] (CR) coren: [C: 2] "I get to keep both pieces." [operations/puppet] - https://gerrit.wikimedia.org/r/123610 (owner: coren)
[12:36:46] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 5169.70 ms
[12:36:56] RECOVERY - Host db1016 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[12:38:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[12:39:47] akosiaris: can push decom patch for brewster ?
[12:40:18] matanya: sure. but don't expect it to be merged today. I 'd like to make sure I have not forgotten something first
[12:40:30] btw dns is already done
[12:40:34] (PS1) coren: Tool Labs: tweaks to urlproxy [operations/puppet] - https://gerrit.wikimedia.org/r/123615
[12:40:37] well the patchset it there
[12:40:47] sure, a week or so i guess
[12:41:00] i suspect the only thing left should be manifests/site.pp
[12:42:30] akosiaris: there is the puppet proxy to palladium and the backup client
[12:42:36] (PS1) Hashar: beta: switch $wgEventLoggingFile to eqiad [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123616
[12:42:38] (PS1) Hashar: beta: drop pmtpa reference for CirrusSearch [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123617
[12:42:40] (PS1) Hashar: beta: drop pmtpa cache configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123618
[12:42:42] (PS1) Hashar: beta: drop pmtpa configuration for databases [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123619
[12:42:44] (PS1) Hashar: beta: drop pmtpa configuration for memcached [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123620
[12:42:46] (PS1) Hashar: beta: drop pmtpa configuration for Parsoid [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123621
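Following up on hashar's top-offenders paste and matanya's request at 11:45 to attach each warning to a file, a small script can group puppet-lint output by check name and list the files each check fires on. This Python sketch assumes every warning has already been printed as one `path:line:check:message` line. puppet-lint has a `--log-format` option for customizing its output, but the placeholder names differ between versions, so check `puppet-lint --help` on the installed one. The sample lines are invented; only the check names autoloader_layout and class_parameter_defaults come from the discussion.

```python
from collections import Counter, defaultdict


def tally(lines):
    """Count lint warnings per check and collect the files each check fires on.

    Assumes every relevant line has the form 'path:line:check:message'.
    """
    per_check = Counter()
    files_per_check = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split(":", 3)
        if len(parts) != 4:
            continue   # ignore anything not in the assumed format
        path, _lineno, check, _message = parts
        per_check[check] += 1
        files_per_check[check].add(path)
    return per_check, files_per_check


# Hypothetical output lines, for illustration only.
sample = [
    "manifests/role/example.pp:1:autoloader_layout:class not in autoload module layout",
    "manifests/role/other.pp:1:autoloader_layout:class not in autoload module layout",
    "modules/example/manifests/init.pp:3:class_parameter_defaults:parameter without a default",
]
counts, files = tally(sample)
for check, n in counts.most_common():
    print(check, n, sorted(files[check]))
```

Splitting on at most three colons keeps any colons inside the message text intact.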
[12:42:47] the puppet proxy is not needed
[12:42:48] (PS1) Hashar: beta: drop wmfUdp2logDest for pmtpa [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123622
[12:42:50] (PS1) Hashar: beta: drop pmtpa configuration for redis job server [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123623
[12:42:52] (PS1) Hashar: beta: point wgUDPProfilerHost to eqiad instance [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123624
[12:43:10] the backup client ?
[12:43:17] carbon already has it
[12:43:18] so ?
[12:44:06] (CR) coren: [C: 2] Tool Labs: tweaks to urlproxy [operations/puppet] - https://gerrit.wikimedia.org/r/123615 (owner: coren)
[12:45:17] ok, pushing
[12:45:28] (PS1) Matanya: decom : brewster [operations/puppet] - https://gerrit.wikimedia.org/r/123626
[12:53:14] (PS1) Alexandros Kosiaris: Purge /etc/apt/apt.conf [operations/puppet] - https://gerrit.wikimedia.org/r/123628
[12:54:33] (CR) Alexandros Kosiaris: [C: -1] "Giving it a -1 just to avoid merging it before some time (like a week) passes." [operations/puppet] - https://gerrit.wikimedia.org/r/123626 (owner: Matanya)
[12:55:04] (CR) Manybubbles: [C: 1] "Happy to sync whenever. I have a SWAT window in about two hours. The big thing is making sure Chad is around for a while after the sync " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123461 (owner: Chad)
[12:55:36] (PS1) coren: Tool Labs: moar urlproxy tweaks [operations/puppet] - https://gerrit.wikimedia.org/r/123629
[12:56:32] akosiaris: what does d-i stand for ?
[12:56:45] disk image? :)
[12:56:53] debian-installer
[12:57:13] ah, should have known
[12:58:57] (CR) coren: [C: 2] Tool Labs: moar urlproxy tweaks [operations/puppet] - https://gerrit.wikimedia.org/r/123629 (owner: coren)
[13:26:40] <_joe_> gerrit unresponsive again?
[13:39:15] yes
[13:40:26] <_joe_> it had recovered
[13:42:54] <_joe_> now it seems to be visited by our friendly googlebot
[13:45:20] (CR) Giuseppe Lavagetto: "It's not clear to me how you decided to split things between the classes:" [operations/puppet] - https://gerrit.wikimedia.org/r/123601 (owner: Matanya)
[13:49:07] (CR) Matanya: "basically, anything that is wmf specific is in the role class. since i'm not familiar with this i just proposed some start modular work, i" [operations/puppet] - https://gerrit.wikimedia.org/r/123601 (owner: Matanya)
[14:30:58] (CR) Ottomata: [C: 2 V: 2] Making sure apache can write to /var/lib/wikimetrics directory [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[14:35:05] (PS1) Ottomata: Using www-data as wikimetrics group if running in apache web mode [operations/puppet] - https://gerrit.wikimedia.org/r/123639
[14:45:59] (CR) Ottomata: [C: 2 V: 2] Using www-data as wikimetrics group if running in apache web mode [operations/puppet] - https://gerrit.wikimedia.org/r/123639 (owner: Ottomata)
[14:46:56] (PS1) Ottomata: Updating wikimetrics submodule [operations/puppet] - https://gerrit.wikimedia.org/r/123640
[14:46:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:47:07] (CR) Ottomata: [C: 2 V: 2] Updating wikimetrics submodule [operations/puppet] - https://gerrit.wikimedia.org/r/123640 (owner: Ottomata)
[14:48:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 638954 bytes in 9.932 second response time
[15:01:09] (PS2) BBlack: Add OM and HTTPS support for 635-10. [operations/puppet] - https://gerrit.wikimedia.org/r/122851 (owner: Dr0ptp4kt)
[15:01:24] (CR) BBlack: [C: 2 V: 2] Add OM and HTTPS support for 635-10. [operations/puppet] - https://gerrit.wikimedia.org/r/122851 (owner: Dr0ptp4kt)
[15:14:46] grrr gerrit is still slow as hell for me. can we influence routing to not transfer me along japan ?
[15:16:01] Clone of 'https://gerrit.wikimedia.org/r/p/mediawiki/extensions/DataValues.git' into submodule path 'extensions/DataValues' failed
[15:16:02] :(
[15:17:12] (PS1) BBlack: add lvs role to new esams lvses [operations/puppet] - https://gerrit.wikimedia.org/r/123650
[15:23:41] (PS2) BBlack: add lvs role to new esams lvses [operations/puppet] - https://gerrit.wikimedia.org/r/123650
[15:37:55] (PS1) BBlack: add puppet alias in esams.wmnet [operations/dns] - https://gerrit.wikimedia.org/r/123656
[15:38:33] (CR) BBlack: [C: 2 V: 2] add puppet alias in esams.wmnet [operations/dns] - https://gerrit.wikimedia.org/r/123656 (owner: BBlack)
[15:39:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[15:40:21] (CR) BBlack: [C: 2 V: 2] add lvs role to new esams lvses [operations/puppet] - https://gerrit.wikimedia.org/r/123650 (owner: BBlack)
[15:48:50] Cloning into 'extensions/DataValues'...
[15:48:50] fatal: remote error: Git repository not found
[15:48:50] Clone of 'https://gerrit.wikimedia.org/r/p/mediawiki/extensions/DataValues.git' into submodule path 'extensions/DataValues' failed
[15:48:50] Error updating submodules
[15:49:12] Oh
[15:49:13] Repo no exist
[15:49:15] aude: ^^
[15:49:34] think chad did something to it
[15:49:44] where is it still referenced?
[15:49:56] wmf/1.23wmf21 is trying to check it out
[15:50:03] ugh
[15:50:33] It's not in make-wmf-branch/default.conf
[15:50:34] it's not needed at all
[15:50:59] (PS1) BBlack: lvs300[1234] -> empty service ips for now [operations/puppet] - https://gerrit.wikimedia.org/r/123658
[15:51:26] (CR) BBlack: [C: 2 V: 2] lvs300[1234] -> empty service ips for now [operations/puppet] - https://gerrit.wikimedia.org/r/123658 (owner: BBlack)
[15:51:39] * aude sees "./wmf-config/extension-list-wikidata:$IP/extensions/DataValues/DataValues/DataValues.i18n.php"
[15:51:44] that should be eliminated
[15:54:49] * Reedy checks out hte repo locally
[15:54:52] (PS1) Aude: remove extension-list-wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123659
[15:55:20] all you need is Wikidata now (and maybe Wikibase if youwant to fix something)
[15:55:24] Wikidata.git
[15:57:53] (PS1) BBlack: add 10.20.0.0/24 => esams for $::site [operations/puppet] - https://gerrit.wikimedia.org/r/123661
[15:58:38] (CR) BBlack: [C: 2 V: 2] add 10.20.0.0/24 => esams for $::site [operations/puppet] - https://gerrit.wikimedia.org/r/123661 (owner: BBlack)
[15:58:47] (CR) GWicke: "These hosts are used for a) temporary Cassandra testing, and b) Parsoid round-trip testing which previously lived in labs." [operations/puppet] - https://gerrit.wikimedia.org/r/123433 (owner: GWicke)
[16:00:46] (CR) GWicke: "Created https://rt.wikimedia.org/Ticket/Display.html?id=7192 with the same content." [operations/puppet] - https://gerrit.wikimedia.org/r/123433 (owner: GWicke)
[16:18:29] https://github.com/wikimedia/mediawiki-core/blob/wmf/1.23wmf21/.gitmodules
[16:18:33] wtf is checkout mediawiki doing
[16:20:14] or did I typo
[16:21:18] (PS1) BBlack: enable apt proxy via carbon for esams [operations/puppet] - https://gerrit.wikimedia.org/r/123664
[16:23:30] (CR) BBlack: [C: 2 V: 2] enable apt proxy via carbon for esams [operations/puppet] - https://gerrit.wikimedia.org/r/123664 (owner: BBlack)
[16:25:28] greetings
[16:25:43] I'm sorry i did not post it on the bugzilla earlier
[16:25:54] we are having a wikipedia workshop
[16:26:06] and people would be creating accounts
[16:26:27] can you possibly raise the ip limit of account creation
[16:27:02] last workshop it stopped at 6, participants should not be more than 12 , and most should have already have their accounts created
[16:27:58] When does the workshop take place?
[16:28:50] odder: In 3 days according to -ops
[16:29:26] We are in -ops.
[16:29:46] odder: No.. #wikimedia-ops
[16:29:52] uwe: Please file a Bugzilla request under Wikimedia => Site requests
[16:29:58] There is a #wikimedia-ops?
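The failing checkout at 15:48 came from a .gitmodules entry on the wmf/1.23wmf21 branch that still pointed at the removed DataValues repository. One quick way to spot such leftovers is to parse .gitmodules and compare each submodule URL against the repositories that still exist. The Python sketch below is a minimal parser for the simple `key = value` layout git writes (no includes, quoting rules, or continuation lines). The `known_repos` set and the Wikidata URL in the example are assumptions for illustration.

```python
from __future__ import annotations

import re

SECTION = re.compile(r'^\s*\[submodule\s+"([^"]+)"\]\s*$')
KEYVAL = re.compile(r"^\s*([A-Za-z][\w-]*)\s*=\s*(.*?)\s*$")


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Return {submodule name: {key: value}} for simple .gitmodules content."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        m = SECTION.match(line)
        if m:
            current = modules.setdefault(m.group(1), {})
            continue
        m = KEYVAL.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return modules


def stale_submodules(modules: dict[str, dict[str, str]], known_repos: set[str]) -> list[str]:
    """Paths of submodules whose URL is not among the repositories that still exist."""
    return sorted(cfg.get("path", name) for name, cfg in modules.items()
                  if cfg.get("url") not in known_repos)


example = "\n".join([
    '[submodule "extensions/DataValues"]',
    "\tpath = extensions/DataValues",
    "\turl = https://gerrit.wikimedia.org/r/p/mediawiki/extensions/DataValues.git",
    '[submodule "extensions/Wikidata"]',
    "\tpath = extensions/Wikidata",
    "\turl = https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Wikidata.git",
])
known = {"https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Wikidata.git"}
print(stale_submodules(parse_gitmodules(example), known))   # ['extensions/DataValues']
```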
[16:30:13] odder: Yes, Wikimedia IRC ops :p
[16:30:19] *IRC channel ops
[16:35:26] PROBLEM - DPKG on lvs3001 is CRITICAL: Connection refused by host
[16:35:26] PROBLEM - Disk space on lvs3004 is CRITICAL: Connection refused by host
[16:35:36] PROBLEM - puppet disabled on lvs3002 is CRITICAL: Connection refused by host
[16:35:36] PROBLEM - Disk space on lvs3001 is CRITICAL: Connection refused by host
[16:35:36] PROBLEM - RAID on lvs3004 is CRITICAL: Connection refused by host
[16:35:46] PROBLEM - DPKG on lvs3003 is CRITICAL: Connection refused by host
[16:35:46] PROBLEM - RAID on lvs3001 is CRITICAL: Connection refused by host
[16:35:56] PROBLEM - puppet disabled on lvs3004 is CRITICAL: Connection refused by host
[16:35:56] PROBLEM - Disk space on lvs3003 is CRITICAL: Connection refused by host
[16:35:56] PROBLEM - puppet disabled on lvs3001 is CRITICAL: Connection refused by host
[16:35:56] PROBLEM - RAID on lvs3003 is CRITICAL: Connection refused by host
[16:36:06] PROBLEM - DPKG on lvs3002 is CRITICAL: Connection refused by host
[16:36:16] PROBLEM - Disk space on lvs3002 is CRITICAL: Connection refused by host
[16:36:16] PROBLEM - puppet disabled on lvs3003 is CRITICAL: Connection refused by host
[16:36:18] uh-oh
[16:36:26] PROBLEM - RAID on lvs3002 is CRITICAL: Connection refused by host
[16:36:26] PROBLEM - DPKG on lvs3004 is CRITICAL: Connection refused by host
[16:47:06] odder, no, it starts today
[16:47:15] it is for 3 days
[16:47:23] but its seems it will not workout
[16:47:36] we are facing difficulties
[16:48:13] (PS1) coren: Tool Labs: tweaks to the urlproxy and webservice [operations/puppet] - https://gerrit.wikimedia.org/r/123668
[16:48:54] * Nemo_bis read "we have fascist difficulties"
[16:48:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:49:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 577868 bytes in 9.598 second response time
[16:51:28] (CR) coren: [C: 2] "Known to work." [operations/puppet] - https://gerrit.wikimedia.org/r/123668 (owner: coren)
[16:54:51] (PS1) coren: Tool Labs: minor bugfix to portgrabber [operations/puppet] - https://gerrit.wikimedia.org/r/123669
[16:57:22] (CR) coren: [C: 2] "Mini tweak." [operations/puppet] - https://gerrit.wikimedia.org/r/123669 (owner: coren)
[16:59:23] (PS1) coren: Dymanicproxy: preserve HTTP statuses [operations/puppet] - https://gerrit.wikimedia.org/r/123670
[16:59:26] RECOVERY - DPKG on lvs3001 is OK: All packages OK
[16:59:36] RECOVERY - Disk space on lvs3001 is OK: DISK OK
[16:59:46] RECOVERY - RAID on lvs3001 is OK: OK: optimal, 2 logical, 2 physical
[16:59:56] RECOVERY - puppet disabled on lvs3001 is OK: OK
[17:00:36] RECOVERY - RAID on lvs3004 is OK: OK: optimal, 2 logical, 2 physical
[17:00:56] RECOVERY - puppet disabled on lvs3004 is OK: OK
[17:01:26] RECOVERY - DPKG on lvs3004 is OK: All packages OK
[17:01:26] RECOVERY - Disk space on lvs3004 is OK: DISK OK
[17:04:13] (CR) coren: [C: 2] Dymanicproxy: preserve HTTP statuses [operations/puppet] - https://gerrit.wikimedia.org/r/123670 (owner: coren)
[17:06:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:13:06] RECOVERY - DPKG on lvs3002 is OK: All packages OK
[17:13:16] RECOVERY - Disk space on lvs3002 is OK: DISK OK
[17:13:26] RECOVERY - RAID on lvs3002 is OK: OK: optimal, 2 logical, 2 physical
[17:13:36] RECOVERY - puppet disabled on lvs3002 is OK: OK
[17:15:18] (PS1) RobH: setting rhenium's mgmt and production ipv4 addresses [operations/dns] - https://gerrit.wikimedia.org/r/123673
[17:16:09] chrome hates gerrit so much less.
[17:16:52] (still slow, but at least its not my browser being shitty)
[17:18:22] (CR) RobH: [C: 2] setting rhenium's mgmt and production ipv4 addresses [operations/dns] - https://gerrit.wikimedia.org/r/123673 (owner: RobH)
[17:20:34] (PS1) BryanDavis: [WIP] Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674
[17:24:12] (PS1) RobH: adding rhenium dhcpd mac address to dhcpd host entries [operations/puppet] - https://gerrit.wikimedia.org/r/123675
[17:26:56] RECOVERY - Disk space on lvs3003 is OK: DISK OK
[17:26:56] RECOVERY - RAID on lvs3003 is OK: OK: optimal, 2 logical, 2 physical
[17:27:16] RECOVERY - puppet disabled on lvs3003 is OK: OK
[17:27:46] RECOVERY - DPKG on lvs3003 is OK: All packages OK
[17:28:10] (CR) Manybubbles: "cool" [operations/puppet] - https://gerrit.wikimedia.org/r/123466 (owner: Ottomata)
[17:28:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 518080 bytes in 9.612 second response time
[17:34:46] (PS2) Alexandros Kosiaris: Purge /etc/apt/apt.conf [operations/puppet] - https://gerrit.wikimedia.org/r/123628
[17:36:33] (CR) RobH: [C: 2] adding rhenium dhcpd mac address to dhcpd host entries [operations/puppet] - https://gerrit.wikimedia.org/r/123675 (owner: RobH)
[17:46:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:00:21] (PS1) coren: Labs: -pmtpa support from role::labs::instance [operations/puppet] - https://gerrit.wikimedia.org/r/123681
[18:01:49] Coren: andrewbogott: cat modules/labs_vmbuilder/files/vmbuilder.partition says /var separate partition and only 2G. Why ?
[18:02:00] I am asking cause I just got bitten by it
[18:02:34] akosiaris: A little context? Is this on a labs instance?
[18:02:47] yes
[18:02:51] akosiaris: havana prefers giving one big disk, and separating out /var is a big reliability boost. You can get as much space as you want with an lvm module (ideally at /srv)
[18:03:20] why is it a big reliability boost ?
[18:03:21] What are you trying to put into /var that's too big for 2G?
[18:03:29] /var/lib/postgres, /var/lib/mysql
[18:03:31] akosiaris: I just made a note to document partitioning a bit better. And… well, maybe we should have a big partition by default, not sure.
[18:03:32] Because you can't fill out / with runaway logs that way.
[18:03:39] akosiaris: Those should really go in /srv
[18:03:59] No, no, no! Please don't go back to the bad days of / being filled by logs.
[18:04:08] (CR) Andrew Bogott: [C: 1] Labs: -pmtpa support from role::labs::instance [operations/puppet] - https://gerrit.wikimedia.org/r/123681 (owner: coren)
[18:04:21] ok. /var/log then and not /var ?
[18:04:47] <_joe|away> I was about to suggest that too :)
[18:04:55] Coren: I didn't mean one big disk, I meant including role::labs::lvn::mnt by default (or something similar)
[18:05:02] i get the / being filled argument but /var 2G is small
[18:05:12] akosiaris: The same argument applies to /var/spool, /var/cache and so on. Large applicative data does not belong in /var. :-)
[18:05:34] hey hey
[18:05:38] I can quote fhs on this :-)
[18:05:44] hey akosiaris
[18:05:52] hey average
[18:06:02] I'd like to draw a diagram
[18:06:11] of all machines, varnishes included, and other machines
[18:06:17] that are receiving traffic
[18:06:22] akosiaris: Please do. It agrees with me. :-)
[18:06:32] and draw that diagram in order to understand how the traffic flows...
[18:06:56] I'd like to understand this better
[18:06:59] really, I install a mysql/postgres/myapp cause I wanna just test something. I do not want to have to configure it to use /srv/mysql
[18:07:05] can this be done by solely reading the puppet manifests ?
[18:07:24] average: define machines
[18:07:29] is machines routers ?
[18:07:40] the answer then is no
[18:08:02] if not, maybe.
[18:08:17] akosiaris: /srv really should be the default for all of those. If you really *have* to use /var/foo, then mount the lvm volume /there/ (the class allows you to pick any mountpoint)
[18:08:25] akosiaris: I'll take the not part
[18:08:27] hey ottomata
[18:08:59] yoyo
[18:09:11] btw /var/lib State information. Persistent data modified by programs as they run, e.g., databases, packaging system metadata, etc.
[18:09:30] and /srv Site-specific data which are served by the system.
[18:09:36] so fhs does not agree with you :-)
[18:09:39] akosiaris: I.e.: databases, etc.
[18:09:49] akosiaris: ... wait, you just quoted that it does.
[18:10:09] Ah, you're picking the examples for /var/lib :-)
[18:10:15] err: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate: Error 400 on SERVER: Not authorized to call search on /file_metadata/plugins with {:checksum_type=>"md5", :ignore=>[".svn", "CVS", ".git"], :recurse=>true, :links=>"manage"}
[18:10:15] The example is bad. :-)
[18:10:20] ^ ?
[18:11:06] forget the example, get the definition
[18:11:14] akosiaris: labs_lvm::volume { 'bad-place': mountat => '/var/lib/mysql' }
[18:11:35] akosiaris: Right. /srv Site-specific data which are served by the system => databases, web sites, ldap dbs, etc.
[18:11:58] really ? you are kind of stretching it I think
[18:12:08] especially since the Persistent data modified by programs as they run exists
[18:12:39] akosiaris: What /else/ would you put in /srv but data that is meant to be served by applications?
[18:13:11] That's what /srv is there for. :-) To get it away from /var/lib where it once complicated everything for no good reason. :-)
[18:13:22] (/srv is a relatively recent addition)
[18:13:50] It was added *specifically* to get things like databases out of /var :-)
[18:14:02] not really
[18:14:10] it was to get stuff out of /var/www
[18:14:30] Yeah, that's one of the things. :-)
[18:14:39] Also git repos.
[18:15:02] The good rule is: /var being full should not harm any valuable data.
[18:15:02] Coren: is this documented somewhere on wikitech?
[18:15:24] for labs instances, i don't always know where the "correct" place to put things is
[18:15:39] aude: I dunno that it is. We used to use /a at WMF before /srv became part of fhs
[18:15:50] * aude nods
[18:16:00] i never would have thought of using /a
[18:16:02] So we got a *lot* of historicals /a and /b lying around prod.
[18:16:13] yeah, that was/is an abomination
[18:16:16] yeah /srv is the canonical replacement for whatever everyone else used to do
[18:16:32] /db, /data,
[18:17:50] akosiaris: Seriously though, I understand that the ubuntu default is /var/lib/mysql but we really should not use that -- prod uses /a for historical reasons but there's no reason to not use /srv in labs.
[18:18:50] (And I seem to recall someone wanting to move away from /a in prod too -- Faidon?)
[18:19:19] yes but it is not the same scenario.
[18:19:27] ... why not?
[18:19:29] a user installing a mysql just for something to test
[18:19:40] is not the same things as WMF running all these projects
[18:20:14] simply put, I am the first complaining, there will be others
[18:20:24] A DB is a DB is a DB. You know that "this is just a temporary hack" tends to end up being permanent and relied on by others. Putting it always in the right place => long term win.
[18:20:25] and you will get bored of pointing to docs
[18:20:49] akosiaris: I would be more than happy to help make the mysql role use the right place by default in labs. :-)
[18:20:59] Not like it's hard to do.
[18:21:37] well please do, altough I am not using the roles for this ever
[18:21:41] if /var/lib/mysql is a symlink to /srv/mysql is everyone happiper?
[18:21:50] I agree that having to remember to do it is annoying and prone to forgetting. The correct solution isn't "don't to the right thing" but "make it automatic". :-)
[18:22:21] cajoel: That works, of course, and is probably the simplest way.
[18:22:24] btw mysql is just an example
[18:22:48] openldap uses /var/lib, postgresql uses /var/lib, basically anything uses /var/lib/
[18:22:57] even puppet uses /var/lib/
[18:22:58] akosiaris: Hysterical raisins.
[18:23:29] All those packages are older than /srv and I can understand why debian/whoever else wouldn't want breaking updates.
[18:23:32] !log reedy updated /a/common to {{Gerrit|I835c2b1d5}}: Depool. See RT 7191.
[18:23:33] my histroical reasons for moving mysql was to put it on rieser or on spindles with different performance..
[18:23:35] (PS1) Reedy: Add/update symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123684
[18:23:36] Logged the message, Master
[18:23:41] so you expect that all these programs will move into /srv ?
[18:23:47] cause I know I dont
[18:23:48] (PS1) BBlack: allow esams.wmnet puppet fileserver access [operations/puppet] - https://gerrit.wikimedia.org/r/123685
[18:23:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 517096 bytes in 9.879 second response time
[18:24:06] jenkins is in /var/lib
[18:24:07] all the java stuff
[18:24:24] java is not the best example around :-)
[18:24:42] heh :)
[18:24:44] (CR) BBlack: [C: 2 V: 2] allow esams.wmnet puppet fileserver access [operations/puppet] - https://gerrit.wikimedia.org/r/123685 (owner: BBlack)
[18:24:46] I was about to say "that java does it is demonstrative of how wrong it is" :-)
[18:25:44] Wait, don't our mysql classes already have a parameter to point at where the DB should be?
[18:25:48] * Coren checks
[18:25:58] they do IIRC
[18:26:29] well, which classes is a better question
[18:26:40] the new, the old, the mariadb ones ?
[18:26:44] akosiaris: Yeah, that needs a good linting.
[18:26:52] do we put binlogs and data on the same partitions? (that further complicates directory layout)
[18:27:01] but still, mysql is just an example
[18:27:05] cajoel: I think the answer is "it depends"
[18:27:34] akosiaris: I've been bitten often enough with databases stalling because the disk was full of logs to not want it to be possible anymore. :-)
[18:28:02] so /var/log instead of /var
[18:28:03] s/logs/$random_growing_crap_from_slash_var/
[18:28:10] ahahaha
[18:28:18] Then you also need /var/spool /var/cache and so on.
[18:28:24] well /tmp tends to have that too
[18:28:35] why not it ?
[18:29:20] Isn't /tmp tmpfs?
[18:29:26] D'oh! It isn't.
[18:29:26] and /home/ and /root and /usr (the most common one)
[18:29:51] Well, /home is, normally, a different filesystem in labs.
[18:29:53] /tmp used to be tmpfs commonly, then for a while it wasn't, now it's becoming vogue again in newer distros
[18:29:56] it's a cycle
[18:29:57] We trust /root
[18:30:09] (CR) Reedy: [C: 2] Add/update symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123684 (owner: Reedy)
[18:30:21] (Merged) jenkins-bot: Add/update symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123684 (owner: Reedy)
[18:30:24] In my experience also, it's usually run away logs in /var/log are the pace hogs that block/break.
[18:30:30] space hogs
[18:30:42] And /usr isn't supposed to have unbounded data in it at all in the first place.
[18:30:51] !log reedy Started scap: testwiki to 1.23wmf21 and build l10n cache
[18:30:56] Logged the message, Master
[18:30:57] cajoel: /var/cache bites too.
[18:31:34] But yeah, I wouldn't be *opposed* to making /var/log the smaller partition instead of just /var; but that only postpones the same mysql/etc issue.
[18:31:38] (PS2) BryanDavis: [WIP] Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674
[18:31:43] Because / is fixed sized.
[18:32:14] And "real" data should be on a different disk.
[18:33:16] ok, tell you what. Leave it as is. I am gonna make due. But if people start complaining, please reconsider it.
[18:35:00] akosiaris: People will complain either way. :-) But yeah, if it proves to be more of a hinderance than just the minor grunt now and then about having forgotten to put data elsewhere, I'll rejigger it somehow.
[18:36:10] (A possible workaround would be to have a nice collection of symlinks to bits /srv that are set before => Package['somethingbig']
[18:36:48] For, say, /var/lib/{mysql,openldap,puppet,whatever}
[18:37:51] Mind you, in pmtpa it was even worse because the extra disk was stuffed on... /mnt
[18:37:52] niah, just set the defaults parameters in the classes
[18:37:55] (of all places)
[18:38:08] yeah, not a good example either
[18:38:46] Sure, but we nevertheless now have crappiles of puppet classes that mention /mnt
[18:39:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:40:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC
[18:44:11] separate /var?
[18:44:15] what is this, the 90s? :)
[18:44:31] and 2G? that's abysmally slow
[18:59:06] Wow. scap is taking forever. I see lots of 5m rsync times in the logs
[19:01:34] The rsync seems to be pounding mw1070 -- https://ganglia.wikimedia.org/latest/?c=Application%20servers%20eqiad&h=mw1070.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2
[19:02:05] Lucky server?
[19:02:06] 19:00:54 INFO - Finished sync-common to apaches (duration: 15m 34s)
[19:03:45] (PS2) Nuria: Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425
[19:04:01] (CR) jenkins-bot: [V: -1] Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425 (owner: Nuria)
[19:04:03] <_joe|away> bd808: rsync? gee.
[19:04:29] <_joe|away> I mean, rsync taking up 100% user time, that's impressive
[19:04:33] Apparently. 111 fetches went there vs 72 for mw1010
[19:06:09] Here's what I see in the logs about distribution of the rsync calls: http://p.defau.lt/?PQjGRgSdoklqRIMSLiDJ9g
[19:06:26] Are the rack allocations that lop sided?
[19:06:38] s/rack/row/
[19:09:15] !log reedy Finished scap: testwiki to 1.23wmf21 and build l10n cache (duration: 38m 23s)
[19:09:19] Logged the message, Master
[19:10:04] I'llet me see if I can get into racktables again
[19:12:01] I guess it's at least better than the time I broke it so that all fetches went to tin :/
[19:12:42] 4 bugs to file based on the logs...
[19:13:57] Reedy: new scap problems?
[19:14:28] Nope
[19:14:36] 1 fatal, a few warnings
[19:14:48] * bd808 nods
[19:14:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 475241 bytes in 8.218 second response time
[19:15:23] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf20
[19:15:28] Logged the message, Master
[19:16:04] yay, apc spam
[19:16:31] crap
[19:16:48] (PS1) Ottomata: Adding $srange parameter to ferm::service [operations/puppet] - https://gerrit.wikimedia.org/r/123698
[19:17:08] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias actually to 1.23wmf20
[19:17:12] Logged the message, Master
[19:17:16] wow. that's a big pile of apc spam
[19:17:43] (PS1) Reedy: Wikipedias to 1.23wmf20 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123699
[19:17:45] (PS1) Reedy: group0 to 1.23wmf21 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123700
[19:18:41] (PS1) Ottomata: Adding $ensure parameter to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/123701
[19:20:30] (CR) Reedy: [C: 2] Wikipedias to 1.23wmf20 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123699 (owner: Reedy)
[19:20:38] (Merged) jenkins-bot: Wikipedias to 1.23wmf20 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123699 (owner: Reedy)
[19:20:45] (CR) Reedy: [C: 2] group0 to 1.23wmf21 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123700 (owner: Reedy)
[19:20:53] (Merged) jenkins-bot: group0 to 1.23wmf21 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123700 (owner: Reedy)
[19:21:17] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf21
[19:21:22] Logged the message, Master
[19:22:40] (PS1) Ottomata: Ferm rules for stat1003 (and rsyncd on any stat* server) [operations/puppet] - https://gerrit.wikimedia.org/r/123702
[19:23:02] !log reedy synchronized docroot and w
[19:23:06] Logged the message, Master
[19:25:38] (CR) Ottomata: [C: -1] "This commit depends on https://gerrit.wikimedia.org/r/#/c/123698 for ferm::service $srange support." [operations/puppet] - https://gerrit.wikimedia.org/r/123702 (owner: Ottomata)
[19:29:30] (PS1) Manybubbles: WIP: Deploy experimental highlighter [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/123704
[19:29:49] (PS3) Ottomata: Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425 (owner: Nuria)
[19:30:24] (PS4) Ottomata: Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425 (owner: Nuria)
[19:30:32] apc thrashing seems to have died down in the last 5 minutes
[19:30:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:31:48] bd808: You never seen that before on dropping usage of a mw version?
[19:32:22] (CR) Alexandros Kosiaris: [C: -1] Adding $ensure parameter to base::firewall (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/123701 (owner: Ottomata)
[19:32:31] it gets amusing when you try and run 3 versions of mw simultaneously ;D
[19:33:45] (CR) Ottomata: [C: 2 V: 2] Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425 (owner: Nuria)
[19:34:19] akosiaris: BAHp
[19:34:26] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[19:34:26] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[19:34:26] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[19:34:26] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[19:34:26] thanks
[19:34:42] :-)
[19:34:56] It happened on one of my thursday deploys but not as high rate and sustained. Probably related to rate of traffic at the time of the switch.
[19:36:09] (PS2) Ottomata: Adding $ensure parameter to base::firewall [operations/puppet] - https://gerrit.wikimedia.org/r/123701
[19:36:31] EP errorings
[20:04:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 461822 bytes in 8.360 second response time
[20:20:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:21:08] (PS1) Andrew Bogott: Fix a minor chown/chmod issue with a README [operations/puppet] - https://gerrit.wikimedia.org/r/123783
[20:23:25] (CR) Andrew Bogott: [C: 2] Fix a minor chown/chmod issue with a README [operations/puppet] - https://gerrit.wikimedia.org/r/123783 (owner: Andrew Bogott)
[20:27:20] (PS1) Andrew Bogott: Tiny lint fix [operations/puppet] - https://gerrit.wikimedia.org/r/123784
[20:39:16] RECOVERY - Puppet freshness on stat1003 is OK: puppet ran at Thu Apr 3 20:39:07 UTC 2014
[20:47:02] gitblit 503 :(
[20:47:52] https://git.wikimedia.org/log/operations%2Fpuppet/refs%2Fheads%2Fproduction - takes forever and dies with 503 msg
[20:52:39] (PS1) Dr0ptp4kt: Removing ZeroTLS mark for 635-10. It's new, so no need for it. [operations/puppet] - https://gerrit.wikimedia.org/r/123786
[20:53:42] ^ bblack, good point, there you go
[20:54:15] yurik, vim .git/config
[20:54:26] url = ssh://yurik@gerrit.wikimedia.org:29418/operations/puppet
[20:54:39] yurik, i'm seeing the same, too
[20:55:23] (CR) Andrew Bogott: [C: 2] Tiny lint fix [operations/puppet] - https://gerrit.wikimedia.org/r/123784 (owner: Andrew Bogott)
[20:56:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 521645 bytes in 9.476 second response time
[20:57:15] this may be the answer to yurik
[20:57:24] unless it was yurik bring giutblit down :P
[20:57:40] Nemo_bis, yep, its up, thx :)
[21:10:38] (PS2) BBlack: Removing ZeroTLS mark for 635-10. It's new, so no need for it. [operations/puppet] - https://gerrit.wikimedia.org/r/123786 (owner: Dr0ptp4kt)
[21:10:47] (CR) BBlack: [C: 2 V: 2] Removing ZeroTLS mark for 635-10. It's new, so no need for it. [operations/puppet] - https://gerrit.wikimedia.org/r/123786 (owner: Dr0ptp4kt)
[21:11:03] bblack thx
[21:13:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:32:43] (CR) Chad: [C: 2] Opt all italian wikis into interwiki search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123461 (owner: Chad)
[21:32:53] (Merged) jenkins-bot: Opt all italian wikis into interwiki search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123461 (owner: Chad)
[21:33:34] \o/
[21:33:39] !log demon synchronized wmf-config/CirrusSearch-production.php 'italian wikis getting interwiki search. they're my favorite beta testers'
[21:33:43] Logged the message, Master
[21:33:44] * Nemo_bis sending some notifications
[21:33:55] <^d> Nemo_bis: Here we go
[21:35:18] (PS2) Ori.livneh: Add Icinga checks for important sysctl params [operations/puppet] - https://gerrit.wikimedia.org/r/111163
[21:35:34] (CR) jenkins-bot: [V: -1] Add Icinga checks for important sysctl params [operations/puppet] - https://gerrit.wikimedia.org/r/111163 (owner: Ori.livneh)
[21:36:11] <^d> Nemo_bis: https://it.wikipedia.org/w/index.php?search=Rome&title=Speciale%3ARicerca&go=Vai&fulltext=1 \o/
[21:36:11] <^d> Wahoooooo
[21:36:12] <^d> Hmmmm, looks off.
[21:36:12] <^d> More UI quirks.
[21:36:12] <^d> Wonder if all the code's not live yet.
[21:38:41] (PS3) Ori.livneh: Add Icinga checks for important sysctl params [operations/puppet] - https://gerrit.wikimedia.org/r/111163
[21:38:51] !log demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Updating Cirrus to master'
[21:38:56] Logged the message, Master
[21:39:31] <^d> Hmmm.
[21:39:41] missing interwiki prefixes?
[21:40:05] <^d> I went around looking at the prefixes yesterday
[21:40:11] <^d> Before I wrote up the config change.
[21:40:15] (CR) Ori.livneh: "PS2 & PS3:" [operations/puppet] - https://gerrit.wikimedia.org/r/111163 (owner: Ori.livneh)
[21:40:43] paravoid, akosiaris ^
[21:41:00] I mean, it thinks all results are from it.wikipedia
[21:41:25] <^d> Ah, I think I know what it is.
[21:42:16] <^d> Nope, that change is live too.
[21:45:34] here it works https://it.wikisource.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=chiacchiere&fulltext=Search
[21:46:38] !log demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Rolling back to 1.23wmf20 branch point from master'
[21:46:43] Logged the message, Master
[21:46:46] code should be the same in both
[21:47:31] <^d> wikisource looks broken for me.
[21:47:35] <^d> same as wikipedia
[21:47:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 531913 bytes in 9.977 second response time
[21:48:04] (PS3) Ori.livneh: updateBitsBranchPointers: get rid of 'static-stable' branch link [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94447
[21:49:30] (CR) Ori.livneh: [C: 2] updateBitsBranchPointers: get rid of 'static-stable' branch link [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94447 (owner: Ori.livneh)
[21:49:39] (Merged) jenkins-bot: updateBitsBranchPointers: get rid of 'static-stable' branch link [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94447 (owner: Ori.livneh)
[21:50:15] !log ori updated /a/common to {{Gerrit|Ic1602c045}}: updateBitsBranchPointers: get rid of 'static-stable' branch link
[21:50:20] Logged the message, Master
[21:50:49] !log ori synchronized multiversion/updateBitsBranchPointers 'updateBitsBranchPointers: get rid of 'static-stable' branch link'
[21:50:53] Logged the message, Master
[21:51:37] <^d> Nemo_bis: I have a guess.
[21:52:40] index?
[21:52:48] tellme tellme tellme
[21:53:13] <^d> I think it's the cache :)
[21:53:40] Which cache? :O
[21:54:03] * Nemo_bis knows nothing of cache wrt search
[21:54:35] !log demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Cirrus back to master again'
[21:54:39] <^d> I cache interwiki results.
[21:54:39] Logged the message, Master
[21:54:42] <^d> For performance!
[21:55:13] !log demon updated /a/common/php-1.23wmf20 to {{Gerrit|Ic853ebff4}}: Cherry-pick I550eb4b0a8fa18344e8b0de3ec85d61c2122ffb8
[21:55:19] Logged the message, Master
[21:59:16] (PS2) Alexandros Kosiaris: Create and import shapelines from a pre-existing dump [operations/puppet] - https://gerrit.wikimedia.org/r/123767
[21:59:18] (PS1) Alexandros Kosiaris: Import coastlines into OSM db [operations/puppet] - https://gerrit.wikimedia.org/r/123792
[22:00:02] (PS1) Chad: Lower interwiki search cache times, for testing [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123793
[22:00:11] (CR) Chad: [C: 2 V: 2] Lower interwiki search cache times, for testing [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123793 (owner: Chad)
[22:00:15] (PS3) BryanDavis: [WIP] Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674
[22:00:53] !log demon synchronized wmf-config/CirrusSearch-production.php 'lowering cache time, for testing'
[22:00:57] Logged the message, Master
[22:01:52] <^d> Nemo_bis: Lowered the cache to just 60 seconds.
[22:01:56] <^d> Let's see if that helps.
[22:01:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:41] <^d> There we go!
[22:02:41] <^d> https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=Pizza&fulltext=Search
[22:02:54] <^d> Yep, it was the old bad code + too long a cache
[22:07:01] <^d> Nemo_bis: This is exciting to see live :)
[22:16:46] ^d: extremely! so many years I've longed for this moment :)
[22:17:31] weird recipes it.books has O_o
[22:19:16] ^d: see, for instance, this recipe was imported following a process we set up around 2006 to avoid useless deletions of useful but non-encyclopedic content https://it.wikibooks.org/wiki/Libro_di_cucina/Ricette/Pizza_di_mosto_cotto
[22:19:44] to let people find such moved articles, we needed a javascript with a manually-curated list of migrated pages...
[22:20:10] if the interwiki search works, we'll be able to kill both that ugly script (which has been decaying for years IIRC) and maybe the wikidata search at some point :)
[22:20:54] (PS4) BryanDavis: [WIP] Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674
[22:35:26] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[22:35:26] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[22:35:26] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[22:35:26] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[22:36:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 512061 bytes in 9.413 second response time
[22:46:56] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:47:46] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[22:52:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:53:18] looks like no swat deploy today, at least nothing on the deployments page?
[22:58:33] Uh so
[22:58:38] (PS5) BryanDavis: [WIP] Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674
[22:58:46] ebernhardson: There's a fix coming from VE that we'd love to get in today
[23:00:04] marktraceur: yea can do that, add it to the calendar?
[23:00:12] Thing is
[23:00:17] Trevor is still writing it. :3
[23:01:07] not sure how comfortable i am deploying something that was written moments prior :P
[23:01:15] marktraceur: well you should have thought about that before today then, shouldn't you have?!
[23:01:28] ebernhardson: We'll get a Friday deploy from the wonderful greg-g if necessary.
[23:01:34] But Thursday!
[23:01:38] oh yea, its thurday
[23:01:54] James_F: kiss up
[23:01:59] * bd808 sees the panic every week
[23:02:05] i dunno ... you folks know the VE code much better than me...can push it out
[23:02:06] greg-g: You mean before today, when the OOJS patch wasn't merged and everything we wrote was working perfectly?
[23:02:07] * greg-g mixes idioms
[23:02:13] marktraceur: yep, that :)
[23:02:21] but how will you fix the new problems it creates, no more deploys until monday :P
[23:02:25] Instead of today when the oojs patch got merged and pushed to MW.org, breaking our dialogs?
[23:02:57] marktraceur: i was kidding, for the avoidance of doubt, buddy :)
[23:03:08] *nod* yup
[23:03:10] raaaar
[23:03:18] mwalker: taaaaar.gz
[23:03:21] has anyone taken on the mantle of breaking the site yet
[23:03:27] greg-g, ^?
[23:03:33] mwalker: not yet, nothing to deploy yet
[23:03:37] ah well then
[23:03:39] :)
[23:03:41] * mwalker goes back to his cave
[23:03:55] trevor is workng on a fix that broke mediaviewer, as best I can tell
[23:03:57] marktraceur: Tuesday. But yes.
[23:04:09] Oh.
[23:04:13] Well.
[23:04:15] But still.
[23:04:18] No, wait, yesterday.
[23:04:19] I'm happy to deploy that whenever if he needs it
[23:04:21] I just can't count.
[23:04:23] well, the fix didn't break it, because A) that'd be a bad fix and B) it isn't written yet #grammar
[23:04:31] Was assuming that today UTC was today in SF.
[23:04:41] James_F: that's an odd thing to do
[23:04:53] greg-g: You're just full of smiles, right? :-D
[23:04:57] :)
[23:05:05] * mwalker glances at the todo list; finds the entry marked; make the deployment calendar timezone aware; stares at it for a while
[23:05:24] * James_F ponders pushing the BLOCKER bug from "Highest" to "Immediate", to look like he's doing something to help marktraceur.
[23:05:31] mwalker: what part of timezone aware does that refer to?
[23:05:40] <^d> James_F: What would we do without you?
[23:05:43] James_F: isn't that what product managers do?
[23:05:46] ^d: Sleep better?
[23:05:49] greg-g: Hush, you.
[23:05:52] :P
[23:05:53] basically have the SF Time column display in the time that your browser thinks it is
[23:06:03] mwalker: oh, a "your time"?
[23:06:07] *nods*
[23:06:11] neat-o
[23:06:34] * James_F feels quite certain that https://gerrit.wikimedia.org/r/#/c/123782/ is going to need rebasing after Trevor fixes this. :-)
[23:06:55] <^d> James_F: Already will, see final comment from ErikB
[23:07:16] holy commit message batman
[23:07:22] ^d: I've asked him to explain what file, given that I did rebase and t'aint no such file.
[23:07:29] greg-g: I wanted to be thorough.
[23:07:39] James_F: and I love you for it
[23:08:06] greg-g: I might instead split out each resource into its own directory; feedback appreciated.
[23:08:10] <^d> Commit messages aren't long enough unless they make machines go into swap.
[23:24:29] (PS6) BryanDavis: Allow mwdeploy user to ssh between hosts in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674 [23:29:56] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 506463 bytes in 9.287 second response time [23:30:16] PROBLEM - Disk space on stat1002 is CRITICAL: DISK CRITICAL - free space: /home 41075 MB (3% inode=99%): [23:41:53] Sigh, I guess there's not going to be a fix tonight [23:42:01] James_F: ^^ [23:42:41] marktraceur: Ask edsanders. [23:42:55] I mean, the SWAT is over and I suspect pushing out now isn't totally necessary [23:43:28] Eh. [23:43:55] only broken on mw.org? [23:44:08] greg-g: Yes. [23:44:15] greg-g: (And the rest of phase0.) [23:44:16] yeah, no huge rush then [23:44:27] Monday should be good enough, right? [23:44:32] :-( [23:44:38] Not great. [23:44:42] * greg-g doesn't know all of the context [23:44:45] (CR) BryanDavis: "With this cherry-picked into the repo on deployment-scap, I can now ssh from deployment-bastion to deployment-apache0[12] as the mwdeploy " (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/123674 (owner: BryanDavis) [23:45:15] greg-g: Breakage in a core part of OOjs UI that's used for almost everything by VisualEditor and also by MediaViewer. [23:46:36] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:46:51] oh, I thought it was just MediaViewer [23:47:26] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:47:56] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:52:48] greg-g: No, sadly it's pretty terrible for VE. Every dialog, most interaction widgets, … :-( [23:53:06] greg-g: Krinkle has offered to deploy if you're OK with it (once it's merged and tested). [23:54:29] it's broken on beta cluster right now, right?
[23:54:46] greg-g: Yes, and phase0 [23:54:46] ie: we'll see the fix there 5ish minutes after it's merged into master? [23:54:50] Indeed. [23:55:03] anyone tried to install ubuntu on lvm mirrors? [23:55:18] cool, yeah, I'm ok with this fix being backported to phase0 wikis tonight with Krinkle on point and someone else around for moral support :) [23:55:21] I've done it on RAID1 mirrors up and down and backwards [23:55:46] and I feel like LVM mirrors should basically be the same thing, but I can't find an examples or easy flow in the config dialogs [23:55:50] cajoel: nope, btrfs 'raid' (btrfs' built in raid system) [23:56:24] not sure I want to btrfs yet, and realistically GRUB comes before btrfs, no? [23:56:38] I need a block level device at boot? [23:56:44] maybe not.... [23:57:09] I don't recommend btrfs :) [23:57:21] unless you want to play with it/report issues [23:58:00] greg-g: OK, patch is now being merged in master and will be live in a few minutes. [23:58:45] cool