[00:13:27] (PS1) Yuvipanda: Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773 [00:18:52] (PS2) Yuvipanda: Rename labsproxy module to dynamicproxy [operations/puppet] - https://gerrit.wikimedia.org/r/83127 [00:18:53] (PS2) Yuvipanda: Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773 [00:38:47] (PS3) Yuvipanda: Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773 [00:52:20] why? [00:52:25] why [01:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [02:01:47] !log LocalisationUpdate completed (1.22wmf16) at Wed Sep 11 02:01:46 UTC 2013 [02:01:54] Logged the message, Master [02:02:33] !log LocalisationUpdate completed (1.22wmf15) at Wed Sep 11 02:02:32 UTC 2013 [02:02:36] Logged the message, Master [02:06:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.540 second response time [02:08:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 11 02:08:28 UTC 2013 [02:08:32] Logged the message, Master [02:14:16] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [02:17:29] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:46:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:47:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [02:52:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [03:01:17] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 1 unauthenticated, 0 locked, 1 copy to table, 78 statistics [03:02:18] RECOVERY - MySQL Processlist on db1043 is OK: OK 0 unauthenticated, 0 locked, 1 copy to table, 8 statistics [04:13:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.336 second response time [04:33:35] PROBLEM - MySQL Replication Heartbeat on db49 is CRITICAL: CRIT replication delay 301 seconds [04:34:36] RECOVERY - MySQL Replication Heartbeat on db49 is OK: OK replication delay 29 seconds [04:41:58] RECOVERY - MySQL Replication Heartbeat on db1046 is OK: OK replication delay 0 seconds [05:03:25] (CR) Faidon Liambotis: [C: 1] "* Sheds a lot of tears." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83761 (owner: Aaron Schulz) [05:14:09] * Aaron|home is tired [05:15:59] ganglia is being wacky again [05:15:59] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=Navigation+Timing doesn't load [05:15:59] ditto http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=client-side&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [05:15:59] it hates me for using a five-minute reporting interval [05:19:00] ori-l: it even loads main page and you still complain? 
:) [05:28:30] i don't know if the fact that some tampa graphs stopped in the last hour [05:28:31] http://ganglia.wikimedia.org/latest/?c=LVS%20loadbalancers%20pmtpa&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [05:28:33] is related [06:07:34] PROBLEM - RAID on ms-be10 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:30:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.958 second response time [06:41:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:44:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.999 second response time [07:00:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:17:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.430 second response time [07:24:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:25:59] (CR) Steinsplitter: [C: 1] commonswiki wgImportSources, add some sources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83678 (owner: Jeremyb) [07:27:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.917 second response time [07:30:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:39:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.577 second response time [07:42:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:43:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.545 second response 
time [07:52:22] (PS1) TTO: Remove wikimania2005 whitelist from InitialiseSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83776 [08:16:23] PROBLEM - MySQL Processlist on db1052 is CRITICAL: CRIT 1 unauthenticated, 0 locked, 1 copy to table, 53 statistics [08:17:31] RECOVERY - MySQL Processlist on db1052 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 1 statistics [08:22:11] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: No successful Puppet run in the last 10 hours [08:31:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:36:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.767 second response time [08:39:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:29] hey apergos, got a moment to help me debug something? i'm trying to figure out why certain ganglia views aren't displaying. it looks like ganglia-web just aborts after printing the header. [08:40:41] ok, let me hop on and see [08:40:55] e.g.: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=Navigation+Timing [08:41:04] thank you [08:41:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.329 second response time [08:41:46] PHP Fatal error: Allowed memory size of 201326592 bytes etc [08:42:01] so we're back to memory again, I did just up that some yesterday knowing that is not a solution [08:42:38] want to try to request something that doesn't load? 
I'm camped on the error log [08:43:07] just did [08:43:20] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=Media+storage [08:43:59] yep got the memory error [08:44:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:38] Allowed memory size of 201326592 bytes exhausted (tried to allocate 32 bytes) in /srv/org/wikimedia/ganglia-web-3.5.7/functions.php on line 1178, referer: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=&tab=v&vn= [08:45:05] 192 megabytes doesn't seem crazy fwiw; it's not necessarily indicative of a leak [08:45:10] but i'll check out the file/line ref.. sec [08:45:22] well it was 128 yesterday [08:45:43] I increased it to 192 out of 'do something temporarily til someone with more chops can look at it' [08:46:34] yesterday's error before the increase was [08:46:35] Allowed memory size of 134217728 bytes exhaust [08:46:36] ed (tried to allocate 6 bytes) in /srv/org/wikimedia/ganglia-web-3.5.7/ganglia.php on line 400 [08:46:46] 1178 is $index_array = unserialize(file_get_contents($conf['cachefile'])); [08:47:10] 400 is if (!xml_parse($parser, $data, feof($fp))) [08:47:28] gmetad exposes the set of metrics and hosts via an XML file [08:47:34] that it serves on a TCP port [08:47:39] and which ganglia-web queries [08:47:43] (and caches) [08:47:58] we've spun up a statsd instance recently, which increased the number of metrics by a fair bit [08:47:59] ok, just note that before the memory increase of yesterday, it ran out someplace else [08:48:09] so not convinced that the particular line has much to do with it [08:48:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [08:48:27] the other line is related to the same file [08:48:35] but if the total number of metrics is now much bigger [08:48:51] that could have an impact, I guess I was wondering that yesterday, if we'd just crossed a 
threshold somehow [08:49:04] tell me more about the statsd thing [08:50:06] so, the typical downside to ganglia / gmond is that you need to write something that runs in a separate thread of execution to watch whatever it is you're watching [08:50:53] statsd is a proxy that sits in front of whisper (graphite's database) or ganglia/rrd that listens to metrics on a UDP port and flushes them periodically [08:51:53] and it'll automatically compute certain summary figures [08:52:04] a proxy on some separate host then, doing just that work [08:52:25] yeah [08:53:16] and i've hooked it up to page load timing info that we log via javascript on a sample of our traffic [08:53:53] i'd be more suspicious of my code, but the 'media storage' view is totally unrelated and has worked reliably before [08:54:17] so i think the most plausible story is that the total number of metrics is now past some threshold [08:54:38] what's the php memory limit on the mediawiki apaches? [08:55:26] ok well what makes me wary is that we went from 128 yesterday to 192 to 'that's still not big enough' today [08:55:51] 100M for the apaches [08:56:00] yeah, i just saw that. that is worrisome. [08:56:14] well, i think you can connect via telnet and get the XML that gmetad is generating [08:56:17] and we can see if it's nuts [08:56:27] let me get the info, i forget what port it listens on [08:56:32] sure [08:57:03] 8649 for gmond [08:57:09] telnet localhost 8649 [08:57:10] yep [08:57:19] might want to pipe that into something :D [08:59:26] yep [09:01:06] "only" 151862 lines of crapola [09:01:20] 6514599 bytes [09:02:58] only 62 hosts? [09:03:00] hrm [09:04:41] is it identical to /var/lib/ganglia/conf/ganglia_metrics.cache ? [09:04:48] I guess only those nickel aggregates for? [09:05:25] dunno, my knowledge of ganglia lore is pretty limited [09:06:52] er where is that cache file supposed to be? 
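A minimal sketch of how the XML dump discussed above (the one obtained with `telnet localhost 8649`) could be sized up instead of eyeballed: it reports total bytes plus host and metric counts. It assumes the dump uses `HOST` and `METRIC` element names, which is how Ganglia's XML is commonly laid out but is not confirmed anywhere in this log; `fetch_xml`, `summarize` and `SAMPLE` are hypothetical names made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=10.0):
    """Read everything the daemon writes on connect until it closes the socket.

    Port 8649 is the gmond port quoted in the log; the host is an assumption.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return (size in bytes, number of HOST elements, number of METRIC elements)."""
    root = ET.fromstring(xml_bytes)
    hosts = sum(1 for _ in root.iter("HOST"))
    metrics = sum(1 for _ in root.iter("METRIC"))
    return len(xml_bytes), hosts, metrics


# Tiny hand-written document in the assumed layout, for trying summarize() offline.
SAMPLE = b"""<GANGLIA_XML VERSION="3.5.0" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="a"><METRIC NAME="cpu_user" VAL="1.0"/><METRIC NAME="mem_free" VAL="2"/></HOST>
<HOST NAME="b"><METRIC NAME="cpu_user" VAL="3.0"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    size, hosts, metrics = summarize(SAMPLE)
    print(f"{size} bytes, {hosts} hosts, {metrics} metrics")
```

Against a live daemon this would be `summarize(fetch_xml())`. Tracking the metric count over time would show whether the total really crossed some threshold when the statsd metrics were added.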
[09:07:25] /var/lib/ganglia/conf/ganglia_metrics.cache i think [09:08:00] not so much [09:08:29] nothing with metr in the name under that conf dir [09:10:18] $conf['cachefile'] = $conf['conf_dir'] . "/ganglia_metrics.cache"; [09:10:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:10:30] where conf_dir is $conf['conf_dir'] = $conf['gmetad_root'] . '/conf'; [09:10:39] and $conf['gmetad_root'] = "/var/lib/ganglia"; [09:11:09] maybe so but [09:11:29] might be that it purged the cache file and failed to regenerate it [09:11:50] root@nickel:~# find /var/lib/ganglia/conf -name \*met\* [09:11:50] root@nickel:~# [09:12:24] yeah i believe you :) [09:13:09] https://github.com/ganglia/ganglia-web/issues/179 [09:14:08] yeah but 512M I mean that's outrageous [09:15:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [09:15:12] well, what do you suggest? [09:15:52] i guess i could prune the metrics that get forwarded to ganglia to the subset that i need the most [09:17:06] can you do that temporarily? keep a copy of the old setup [09:17:06] most views are now OOMing [09:17:15] and see if it makes a difference [09:18:24] sure, the config's in ganglia, so could you merge a patch if i submit one in a couple of minutes? [09:18:26] we have 8 gb total on this box, at 512M a pop instead of 128 that's a serious reduction in capacity for what we can handle [09:18:29] yep [09:18:52] what else is it doing? 
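A rough sketch of the capacity arithmetic behind the "512M a pop" concern. It converts the byte counts from the PHP errors into MiB and computes a worst-case ceiling on concurrent requests, assuming every request reaches its memory limit. The `reserved_bytes` parameter and the function names are hypothetical, and treating all 8 GB as available to PHP is a simplification.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def to_mib(limit_bytes):
    """Convert a PHP 'Allowed memory size' byte count to MiB."""
    return limit_bytes / MIB


def max_concurrent_requests(total_ram_bytes, memory_limit_bytes, reserved_bytes=0):
    """Worst-case ceiling: how many requests fit if each one hits its limit.

    reserved_bytes is a hypothetical allowance for everything else on the box.
    """
    return (total_ram_bytes - reserved_bytes) // memory_limit_bytes


if __name__ == "__main__":
    # Byte counts quoted in the two error messages earlier in the log.
    for limit in (134217728, 201326592):
        print(limit, "bytes =", to_mib(limit), "MiB")
    # Limits discussed in the conversation, against the 8 GB box.
    for limit_mib in (128, 192, 512):
        print(limit_mib, "MiB limit:", max_concurrent_requests(8 * GIB, limit_mib * MIB))
```

Under these assumptions the ceiling is 64 requests at 128 MiB and 16 at 512 MiB. Real headroom would be lower once other processes on the host are counted.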
(other than ganglia-web) [09:20:01] i think you're right that 512 is probably far more memory than a judicious implementation would need, but it's still not a lot in the grand scheme of things (ganglia's overall utility) [09:20:22] gmetad [09:20:49] that's what it does and that's all it does (so collection plus ganglia web) [09:36:55] (PS1) Ori.livneh: StatsD: only report medians to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/83777 [09:37:23] apergos: ^ [09:37:28] yep, looking [09:37:45] though i think upping the memory limit is appropriate, imho [09:38:04] thanks [09:38:47] right now I want to see if ganglia returns to normal with this patc [09:38:48] h [09:40:55] (CR) ArielGlenn: [C: 2] StatsD: only report medians to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/83777 (owner: Ori.livneh) [09:41:03] thanks [09:41:09] i have root on hafnium so i can force a puppet run [09:41:17] well, sudo [09:41:23] it's all you [09:41:28] thanks, running [09:41:47] really do appreciate the help even if i'm quibbling with specific points :) [09:41:54] feel free to quibble [09:41:57] my point is [09:42:06] if ganglia behaves with this patch [09:42:21] i don't think that would be enough to establish culpability [09:42:28] maybe it brings us below the threshold again, sure [09:42:44] but? [09:43:05] but so what? 
removing any N metrics would do the trick; mine just happen to be the most recent [09:43:11] yes [09:43:14] so [09:43:33] if ganglia behaves with this patch then we do some testing to see how much memory your metrics actually need [09:43:58] kk [09:43:59] then an rt ticket goes in explaining that we prolly want more than 8gb for the ganglia box and why [09:44:36] sigh, i made a typo [09:44:47] which I did not see (I looked too) [09:45:49] (PS1) Ori.livneh: Fix typo in misc/graphite.pp ('content', not 'contents') [operations/puppet] - https://gerrit.wikimedia.org/r/83778 [09:45:56] heh [09:46:02] should have caught that [09:46:53] (CR) ArielGlenn: [C: 2] Fix typo in misc/graphite.pp ('content', not 'contents') [operations/puppet] - https://gerrit.wikimedia.org/r/83778 (owner: Ori.livneh) [09:47:07] (PS1) TTO: Allow import from enwikisource to bnwikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83779 [09:47:14] your turn again [09:49:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:50:06] yep, puppet running [09:58:14] k, statsd updated [09:58:50] ganglia still OOMing, but maybe i should wait 5 mins to let the current metric cache expire [10:01:36] MaxSem: ! [10:03:28] wherever that cache is :-/ [10:03:42] I could restart something-or-other [10:06:23] * ori-l proposes we compromise on 256 mb ram limit [10:06:30] :-D [10:07:10] p858snake|l: havem [10:07:12] err [10:07:15] ori-l: haven't slept yet? [10:07:25] okay, question too obvious [10:07:34] That would be my name ;) [10:07:34] YuviPanda: MaxSem is also around, but pretending otherwise [10:07:38] hehe [10:07:42] sorry, p858snake|l [10:08:13] I'm going to restart something-or-other. :-P [10:09:19] * YuviPanda goes to sleep [10:10:07] whatever works [10:10:59] mm still no good [10:10:59] that sucks [10:11:46] and [10:11:56] .. 
[10:11:56] after all that, [10:11:58] Allowed memory size of 268435456 bytes exhausted (tried to allocate 32 bytes) in /srv/org/wikimedia/ganglia-web-3.5.7/ganglia.php on line 184 [10:12:03] and this is after going to 256 [10:12:07] and with your fewer metrics [10:12:40] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=Navigation+Timing loads, tho [10:12:47] maybe 512 was the magic number :P [10:13:21] ugh [10:16:14] guess I'd better log some stuff [10:18:33] !log set php memory limit to 256 on nickel, ori reduced the number of metrics for statsd (see https://gerrit.wikimedia.org/r/#/c/83777/), restarted all of apache, gmond, gmetad and still see the occasional php Allowed memory size exhausted error [10:18:36] Logged the message, Master [10:18:57] apergos: thanks again [10:19:10] i think i'm going to crash too, 3:20AM here [10:19:22] yw but this prolly oughta be an rt ticket so we can figure out what's really needed [10:19:29] and so you can have all your metrics [10:19:30] good night [10:20:18] OK, I'll write one up tomorrow at some point [10:20:21] unless you beat me to it [10:20:22] (huh, I went to bed 3:30 am here last night, so everyone will be sleep deprived for today's tech meetings, yay!) 
[10:20:24] I might [10:21:03] cool, I'd prefer that tbh but happy to do it if you prefer not to [10:21:20] already got the tab open [10:21:27] :) [10:21:27] go get some sleep [10:21:29] good night [10:21:33] night [10:38:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.799 second response time [10:43:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:47:06] PROBLEM - LVS HTTP IPv6 on wikiquote-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.492 second response time [10:47:26] PROBLEM - Backend Squid HTTP on amssq37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:47:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.722 second response time [10:47:58] PROBLEM - Frontend Squid HTTP on amssq37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:48:11] RECOVERY - LVS HTTP IPv6 on wikiquote-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61753 bytes in 1.048 second response time [10:48:18] RECOVERY - Backend Squid HTTP on amssq37 is OK: HTTP OK: HTTP/1.0 200 OK - 1423 bytes in 0.188 second response time [10:48:46] PROBLEM - LVS HTTP IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.197 second response time [10:48:46] RECOVERY - Frontend Squid HTTP on amssq37 is OK: HTTP OK: HTTP/1.0 200 OK - 1417 bytes in 0.183 second response time [10:49:36] RECOVERY - LVS HTTP IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61753 bytes in 0.472 second response time [10:50:24] (PS1) Petr Onderka: Fixed memory issues [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83783 [10:50:25] (PS1) Petr Onderka: Write page after its revisions [operations/dumps/incremental] (gsoc) - 
https://gerrit.wikimedia.org/r/83784 [10:50:26] (PS1) Petr Onderka: Don't save namespace as part of title [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83785 [10:50:27] (PS1) Petr Onderka: Fix comments early, so that check for no changes works right [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83786 [10:50:28] (PS1) Petr Onderka: Text groups for better compression [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83787 [10:50:29] (PS1) Petr Onderka: Removed check for trailing zero in decompressed string [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83788 [10:50:30] (PS1) Petr Onderka: Fixed compilation on Linux by moving constants to .cpp [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83789 [10:51:39] (CR) Petr Onderka: [C: 2 V: 2] Fixed memory issues [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83783 (owner: Petr Onderka) [10:52:12] (CR) Petr Onderka: [C: 2 V: 2] Write page after its revisions [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83784 (owner: Petr Onderka) [10:53:02] (CR) Petr Onderka: [C: 2 V: 2] Don't save namespace as part of title [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83785 (owner: Petr Onderka) [10:53:25] (CR) Petr Onderka: [C: 2 V: 2] Fix comments early, so that check for no changes works right [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83786 (owner: Petr Onderka) [10:53:53] (CR) Petr Onderka: [C: 2 V: 2] Text groups for better compression [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83787 (owner: Petr Onderka) [10:54:10] (CR) Petr Onderka: [C: 2 V: 2] Removed check for trailing zero in decompressed string [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83788 (owner: 
Petr Onderka) [10:54:28] (CR) Petr Onderka: [C: 2 V: 2] Fixed compilation on Linux by moving constants to .cpp [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83789 (owner: Petr Onderka) [10:54:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:56:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.138 second response time [11:02:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.118 second response time [11:06:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:12:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.286 second response time [11:15:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:18:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.393 second response time [11:23:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:24:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.834 second response time [11:27:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:28:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.064 second response time [11:30:56] PROBLEM - LVS HTTP IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.203 second response time [11:33:17] RECOVERY - LVS HTTP IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 
200 OK - 61617 bytes in 8.222 second response time [11:33:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:33:47] PROBLEM - MySQL Processlist on db1051 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 76 statistics [11:33:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.302 second response time [11:34:46] RECOVERY - MySQL Processlist on db1051 is OK: OK 1 unauthenticated, 0 locked, 0 copy to table, 2 statistics [11:35:56] PROBLEM - LVS HTTP IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:36:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:38:47] RECOVERY - LVS HTTP IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 45451 bytes in 5.147 second response time [11:39:56] PROBLEM - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.196 second response time [11:40:46] PROBLEM - MySQL Processlist on db1051 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 41 statistics [11:41:06] PROBLEM - LVS HTTP IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.195 second response time [11:41:47] RECOVERY - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 2.630 second response time [11:42:16] PROBLEM - LVS HTTP IPv6 on wikiversity-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.199 second response time [11:43:06] RECOVERY - LVS HTTP IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 93296 bytes in 8.079 second response time [11:43:46] RECOVERY - MySQL Processlist on db1051 is OK: OK 1 unauthenticated, 0 locked, 0 copy to table, 
14 statistics [11:43:47] PROBLEM - Backend Squid HTTP on amssq36 is CRITICAL: Connection timed out [11:44:16] PROBLEM - LVS HTTPS IPv6 on wikiquote-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:44:49] RECOVERY - Backend Squid HTTP on amssq36 is OK: HTTP OK: HTTP/1.0 200 OK - 1423 bytes in 0.179 second response time [11:45:01] PROBLEM - LVS HTTPS IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:45:16] RECOVERY - LVS HTTPS IPv6 on wikiquote-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61563 bytes in 8.349 second response time [11:45:16] PROBLEM - LVS HTTPS IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:45:50] RECOVERY - LVS HTTPS IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 45451 bytes in 3.722 second response time [11:45:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.363 second response time [11:45:56] PROBLEM - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.199 second response time [11:46:07] RECOVERY - LVS HTTP IPv6 on wikiversity-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 1.468 second response time [11:46:07] RECOVERY - LVS HTTPS IPv6 on wikimedia-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 93295 bytes in 1.847 second response time [11:46:47] RECOVERY - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 0.395 second response time [11:46:56] PROBLEM - LVS HTTP IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.189 second response time [11:47:16] PROBLEM - Backend Squid HTTP on amssq32 is CRITICAL: Connection timed out [11:47:17] PROBLEM - Frontend 
Squid HTTP on amssq32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:47:52] RECOVERY - LVS HTTP IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 0.583 second response time [11:47:56] PROBLEM - LVS HTTP IPv6 on wikisource-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.207 second response time [11:48:48] RECOVERY - LVS HTTP IPv6 on wikisource-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 0.678 second response time [11:50:17] RECOVERY - Backend Squid HTTP on amssq32 is OK: HTTP OK: HTTP/1.0 200 OK - 1414 bytes in 0.469 second response time [11:50:17] RECOVERY - Frontend Squid HTTP on amssq32 is OK: HTTP OK: HTTP/1.0 200 OK - 1408 bytes in 0.475 second response time [11:50:27] PROBLEM - LVS HTTP IPv6 on wikiversity-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:50:27] PROBLEM - LVS HTTPS IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:51:16] RECOVERY - LVS HTTPS IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61752 bytes in 0.850 second response time [11:52:17] RECOVERY - LVS HTTP IPv6 on wikiversity-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61754 bytes in 0.679 second response time [12:09:37] (PS1) Aude: Add DataTypes extension for wikidata singlenode [operations/puppet] - https://gerrit.wikimedia.org/r/83793 [12:15:51] (PS1) Mark Bergsma: Correct Text squids esams gmond port [operations/puppet] - https://gerrit.wikimedia.org/r/83794 [12:16:21] (CR) Mark Bergsma: [C: 2] Correct Text squids esams gmond port [operations/puppet] - https://gerrit.wikimedia.org/r/83794 (owner: Mark Bergsma) [13:06:05] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:06:58] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [13:07:05] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.186 second response time [13:07:46] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.036 second response time [13:25:55] PROBLEM - Apache HTTP on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:45] RECOVERY - Apache HTTP on mw1055 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.086 second response time [13:42:33] PROBLEM - Apache HTTP on mw1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:42:34] RECOVERY - Apache HTTP on mw1087 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.091 second response time [13:49:15] PROBLEM - Apache HTTP on mw1092 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:50:06] RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.328 second response time [14:13:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.867 second response time [14:22:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.271 second response time [14:28:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.499 second response time [14:32:14] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:06] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.778 second response time [14:39:14] PROBLEM - Apache 
HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:09] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [14:40:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [15:05:49] PROBLEM - Apache HTTP on mw1074 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:06:49] RECOVERY - Apache HTTP on mw1074 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.770 second response time [15:52:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:53:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [15:58:52] europe is sslloowwwwww in loading of wikis [16:20:59] well, at least it loads (for me) [16:37:03] (CR) Chad: [C: 2] Use Cirrus as default on mw.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83759 (owner: Chad) [16:37:18] (Merged) jenkins-bot: Use Cirrus as default on mw.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83759 (owner: Chad) [16:39:05] !log demon synchronized wmf-config/InitialiseSettings.php 'Cirrus as default on mw.org, Lucene as secondary' [16:39:08] Logged the message, Master [16:57:54] hey yalls, anybody know what machine 10.4.1.118 is? qchris_away is asking because the X-Analytics header in request logs from that machine is weird [17:00:07] looks like some labs instance [17:01:14] (CR) Aaron Schulz: [C: 2] Removed ceph backend config for decommissioning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83761 (owner: Aaron Schulz) [17:01:17] <^d> I can't find the instance on wikitech. 
[17:01:28] (Merged) jenkins-bot: Removed ceph backend config for decommissioning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83761 (owner: Aaron Schulz)
[17:02:45] bah
[17:02:47] !log aaron synchronized wmf-config/filebackend.php 'Removed ceph backend config for decommissioning'
[17:02:50] Logged the message, Master
[17:08:53] (PS1) Aaron Schulz: Remove one last ceph reference [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83820
[17:09:21] (CR) Chad: [C: 2] Remove one last ceph reference [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83820 (owner: Aaron Schulz)
[17:09:31] (Merged) jenkins-bot: Remove one last ceph reference [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83820 (owner: Aaron Schulz)
[17:10:08] !log aaron synchronized wmf-config/filebackend.php
[17:10:11] Logged the message, Master
[17:20:23] (PS1) Jgreen: redeploy OTRS idle_agent_report [operations/puppet] - https://gerrit.wikimedia.org/r/83822
[17:20:47] PROBLEM - Host testsearch1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:21:00] !log rebooting testsearch1001 to troubleshoot
[17:21:04] Logged the message, RobH
[17:23:48] RECOVERY - Host testsearch1001 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms
[17:24:49] (CR) Jgreen: [C: 2 V: 1] redeploy OTRS idle_agent_report [operations/puppet] - https://gerrit.wikimedia.org/r/83822 (owner: Jgreen)
[17:32:27] PROBLEM - Host testsearch1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:37:06] RECOVERY - Host testsearch1001 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[17:37:08] wikidata - creating references for claim. . 
MediaWiki (Q83) (Added reference to claim: stable version (P348): 1.21.2
[17:39:24] !log testsearch1001 back in pool, 1002 rebooting for fix
[17:39:27] Logged the message, RobH
[17:40:47] (PS1) Cmjohnson: Removing decom'd sq servers from dns [operations/dns] - https://gerrit.wikimedia.org/r/83830
[17:41:18] (PS3) QChris: Split off geowiki cron job into separate class [operations/puppet] - https://gerrit.wikimedia.org/r/82410
[17:41:19] (PS3) QChris: Extract geowiki's research MySQL config into separate class [operations/puppet] - https://gerrit.wikimedia.org/r/82411
[17:41:20] (PS3) QChris: Add cronjob to generate and push geowiki's limn files [operations/puppet] - https://gerrit.wikimedia.org/r/82412
[17:41:29] PROBLEM - Host testsearch1002 is DOWN: PING CRITICAL - Packet loss = 100%
[17:43:57] (PS2) Cmjohnson: Removing decom'd sq servers from dns [operations/dns] - https://gerrit.wikimedia.org/r/83830
[17:44:47] RECOVERY - Host testsearch1002 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[17:49:03] !log testsearch1002 back up, 1003 down
[17:49:07] Logged the message, RobH
[17:50:56] PROBLEM - Host testsearch1003 is DOWN: PING CRITICAL - Packet loss = 100%
[17:53:17] (PS1) Springle: remove db1045 from unused pool [operations/puppet] - https://gerrit.wikimedia.org/r/83832
[17:54:25] (CR) Springle: [C: 2 V: 2] remove db1045 from unused pool [operations/puppet] - https://gerrit.wikimedia.org/r/83832 (owner: Springle)
[17:55:27] (Abandoned) Cmjohnson: Removing decom'd sq servers from dns [operations/dns] - https://gerrit.wikimedia.org/r/83830 (owner: Cmjohnson)
[17:55:46] RECOVERY - Host testsearch1003 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[17:57:54] !log forgot to log testsearch1003 is back
[17:57:57] Logged the message, RobH
[17:58:15] ^d: the odd part was 1003 had the settings that cause the OTHER cpu spike issue
[17:58:19] so yay....?
[17:58:24] they all match now and will be fine.
[17:58:33] <^d> Yay consistency :)
[18:10:28] hey folks! we're doing some network maintenance in tampa - i've taken precautions and am reasonably certain no network disruption should happen, but please let me know (and accept my apologies) if it does
[18:10:29] !log doing some network maintenance on cr2-pmtpa
[18:10:32] Logged the message, Mistress of the network gear.
[18:19:27] (CR) Lcarr: [C: 2] adding in some mmgt addresses in ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/83426 (owner: Lcarr)
[18:22:54] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: No successful Puppet run in the last 10 hours
[18:25:01] Not sure if it's co-incidence, but I can't browse wikimedia sites now
[18:25:33] !log moving all external traffic to come in via eqiad
[18:25:33] Reedy: oh ?
[18:25:33] wait i am pausing
[18:25:34] Reedy: are you here ?
[18:25:36] Logged the message, Mistress of the network gear.
[18:25:58] I'm in the office, yeah
[18:26:10] SSH seems ok to bast1001, but can't load wikidata or wikipedia
[18:26:11] so i actually didn't commit this yet
[18:26:48] LeslieCarr: it's not just me
[18:26:53] it's the office
[18:26:54] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Wed Sep 11 18:26:51 UTC 2013
[18:27:33] "I'm not slacking, I can't access wikipedia"
[18:27:49] Working again..
[18:27:54] wtf.
[18:31:26] we had issues up here too, though assumed it was office internet [18:38:24] (03PS4) 10Ottomata: Split off geowiki cron job into separate class [operations/puppet] - 10https://gerrit.wikimedia.org/r/82410 (owner: 10QChris) [18:38:34] (03CR) 10Ottomata: [C: 032 V: 032] Split off geowiki cron job into separate class [operations/puppet] - 10https://gerrit.wikimedia.org/r/82410 (owner: 10QChris) [18:38:44] (03PS4) 10Ottomata: Extract geowiki's research MySQL config into separate class [operations/puppet] - 10https://gerrit.wikimedia.org/r/82411 (owner: 10QChris) [18:38:54] (03CR) 10Ottomata: [C: 032 V: 032] Extract geowiki's research MySQL config into separate class [operations/puppet] - 10https://gerrit.wikimedia.org/r/82411 (owner: 10QChris) [18:39:03] (03PS4) 10Ottomata: Add cronjob to generate and push geowiki's limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/82412 (owner: 10QChris) [18:39:20] so it was actually a small percentage of traffic… which happened to include the office [18:39:21] (03CR) 10Ottomata: [C: 032 V: 032] Add cronjob to generate and push geowiki's limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/82412 (owner: 10QChris) [18:43:57] Haha [18:44:02] Nice [18:48:42] (03PS1) 10coren: Add self (mpelletier) to paging group [operations/puppet] - 10https://gerrit.wikimedia.org/r/83840 [18:49:00] (03PS2) 10Reedy: Allow import from enwikisource to bnwikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83779 (owner: 10TTO) [18:49:07] (03CR) 10Reedy: [C: 032] Allow import from enwikisource to bnwikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83779 (owner: 10TTO) [18:49:20] (03Merged) 10jenkins-bot: Allow import from enwikisource to bnwikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83779 (owner: 10TTO) [18:50:23] (03PS1) 10Ori.livneh: Ensure changes to ganglia-vhtcpd.py trigger gmond service refresh [operations/puppet] - 
10https://gerrit.wikimedia.org/r/83841 [18:50:37] (03PS1) 10Ottomata: Resolving $geowiki_path variable correctly in new class misc::statistics::geowiki::mysql::conf::research [operations/puppet] - 10https://gerrit.wikimedia.org/r/83842 [18:50:52] (03CR) 10Ottomata: [C: 032 V: 032] Resolving $geowiki_path variable correctly in new class misc::statistics::geowiki::mysql::conf::research [operations/puppet] - 10https://gerrit.wikimedia.org/r/83842 (owner: 10Ottomata) [18:51:08] ottomata1: waittttttt [18:51:16] eh wha? [18:51:18] this too plz plz https://gerrit.wikimedia.org/r/#/c/83841/1 :P [18:51:48] (03PS2) 10Reedy: Remove wikimania2005 whitelist from InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83776 (owner: 10TTO) [18:51:54] (03CR) 10Reedy: [C: 032] Remove wikimania2005 whitelist from InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83776 (owner: 10TTO) [18:51:56] (03PS2) 10Ottomata: Ensure changes to ganglia-vhtcpd.py trigger gmond service refresh [operations/puppet] - 10https://gerrit.wikimedia.org/r/83841 (owner: 10Ori.livneh) [18:52:02] (03CR) 10Ottomata: [C: 032 V: 032] Ensure changes to ganglia-vhtcpd.py trigger gmond service refresh [operations/puppet] - 10https://gerrit.wikimedia.org/r/83841 (owner: 10Ori.livneh) [18:52:04] ottomata1: thank you! [18:52:17] (03Merged) 10jenkins-bot: Remove wikimania2005 whitelist from InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83776 (owner: 10TTO) [18:52:35] yup! [18:53:15] (03PS1) 10Dzahn: install_certificate: change the behaviour for $privatekey=true/false. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83843 [18:53:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:54:05] (03PS3) 10Reedy: commonswiki wgImportSources, add some sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83678 (owner: 10Jeremyb) [18:54:12] (03CR) 10Reedy: [C: 032] commonswiki wgImportSources, add some sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83678 (owner: 10Jeremyb) [18:54:24] (03Merged) 10jenkins-bot: commonswiki wgImportSources, add some sources [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83678 (owner: 10Jeremyb) [18:54:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.666 second response time [18:55:18] (03PS3) 10Reedy: Enable Extension:NewUserMessage on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83765 (owner: 10Dereckson) [18:55:22] (03CR) 10Reedy: [C: 032] Enable Extension:NewUserMessage on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83765 (owner: 10Dereckson) [18:55:49] (03Merged) 10jenkins-bot: Enable Extension:NewUserMessage on fa.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83765 (owner: 10Dereckson) [18:56:07] (03PS2) 10Reedy: Sets wgBlockDisablesLogin true on private wikis. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83375 (owner: 10Dereckson) [18:56:13] (03CR) 10Reedy: [C: 032] Sets wgBlockDisablesLogin true on private wikis. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83375 (owner: 10Dereckson) [18:56:59] (03Merged) 10jenkins-bot: Sets wgBlockDisablesLogin true on private wikis. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83375 (owner: 10Dereckson) [18:57:21] (03PS2) 10Reedy: Add filemover group on it.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83386 (owner: 10Dereckson) [18:57:38] (03CR) 10Reedy: [C: 032] Add filemover group on it.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83386 (owner: 10Dereckson) [18:57:50] (03Merged) 10jenkins-bot: Add filemover group on it.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83386 (owner: 10Dereckson) [18:57:55] spammmmm :D [18:58:10] (03PS2) 10Dzahn: install_certificate: change the behaviour for $privatekey=true/false. [operations/puppet] - 10https://gerrit.wikimedia.org/r/83843 [18:58:22] (03PS2) 10Reedy: Revert temp workaround: $wgNewUserMessageOnAutoCreate true on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83355 (owner: 10Nemo bis) [18:58:30] (03CR) 10Reedy: [C: 032] Revert temp workaround: $wgNewUserMessageOnAutoCreate true on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83355 (owner: 10Nemo bis) [18:58:43] (03Merged) 10jenkins-bot: Revert temp workaround: $wgNewUserMessageOnAutoCreate true on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83355 (owner: 10Nemo bis) [18:58:43] !throttle Reedy 50 [18:59:06] (03CR) 10QChris: "Thanks!" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83842 (owner: 10Ottomata) [18:59:31] (03PS2) 10Reedy: Remove hardcoded accountcreator right for MSU proteins lab [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83075 (owner: 10TTO) [18:59:36] (03CR) 10Reedy: [C: 032] Remove hardcoded accountcreator right for MSU proteins lab [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83075 (owner: 10TTO) [18:59:48] (03Merged) 10jenkins-bot: Remove hardcoded accountcreator right for MSU proteins lab [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83075 (owner: 10TTO) [18:59:57] (03PS1) 10Jgreen: minor fixes to otrs idle_agent_report for 2.4->3.2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/83845 [19:00:33] etherpad.wikimedia.org is flakey under load, second "Service Temp Unavailable" [19:01:15] (03PS6) 10Reedy: skwiktionary: Set site logo to local file [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80321 (owner: 10Danny B.) [19:01:21] (03CR) 10Reedy: [C: 032] skwiktionary: Set site logo to local file [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80321 (owner: 10Danny B.) [19:01:39] (03Merged) 10jenkins-bot: skwiktionary: Set site logo to local file [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80321 (owner: 10Danny B.) 
[19:02:19] (03PS2) 10Reedy: Restricting editing of the module namespace on es.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81873 (owner: 10Dereckson) [19:02:24] (03CR) 10Reedy: [C: 032] Restricting editing of the module namespace on es.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81873 (owner: 10Dereckson) [19:03:00] (03CR) 10jenkins-bot: [V: 04-1] Restricting editing of the module namespace on es.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81873 (owner: 10Dereckson) [19:03:03] (03Merged) 10jenkins-bot: Restricting editing of the module namespace on es.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81873 (owner: 10Dereckson) [19:03:47] (03PS4) 10Reedy: Remove VisualEditor's dependency on EventLogging [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79290 (owner: 10Jforrester) [19:03:53] (03CR) 10Reedy: [C: 032] Remove VisualEditor's dependency on EventLogging [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79290 (owner: 10Jforrester) [19:04:18] (03Merged) 10jenkins-bot: Remove VisualEditor's dependency on EventLogging [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79290 (owner: 10Jforrester) [19:04:35] (03PS4) 10Reedy: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [19:04:40] (03CR) 10Reedy: [C: 032] Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [19:05:04] Jeff_Green: when's a good time to test? [19:05:17] i saw your patch go by :) [19:05:56] it generally seems to work now that I figured out they had renamed a cookie. 
I'm looking at adding these other queries atm
[19:06:36] ok, cool
[19:06:43] happy hack day :)
[19:07:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:07:06] :-$
[19:07:37] (CR) Jgreen: [C: 2 V: 1] minor fixes to otrs idle_agent_report for 2.4->3.2 [operations/puppet] - https://gerrit.wikimedia.org/r/83845 (owner: Jgreen)
[19:07:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[19:08:14] (PS1) Akosiaris: Turn backup::host to an included class [operations/puppet] - https://gerrit.wikimedia.org/r/83848
[19:09:31] (CR) jenkins-bot: [V: -1] Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 (owner: Catrope)
[19:09:57] (CR) Reedy: [V: 2] Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 (owner: Catrope)
[19:10:27] !log more network changes, network disruptions unlikely but possible for tampa
[19:10:31] Logged the message, Mistress of the network gear.
[19:10:31] (03PS4) 10Reedy: Adjust reupload-own permissions for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80546 (owner: 10TTO) [19:10:36] (03CR) 10Reedy: [C: 032] Adjust reupload-own permissions for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80546 (owner: 10TTO) [19:10:47] (03CR) 10Akosiaris: [C: 032] Turn backup::host to an included class [operations/puppet] - 10https://gerrit.wikimedia.org/r/83848 (owner: 10Akosiaris) [19:10:53] (03Merged) 10jenkins-bot: Adjust reupload-own permissions for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80546 (owner: 10TTO) [19:11:38] PROBLEM - Host ms-fe.pmtpa.wmnet is DOWN: PING CRITICAL - Packet loss = 100% [19:11:47] (03PS2) 10Reedy: Remove a bunch of obsolete rules [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83572 (owner: 10MaxSem) [19:11:53] (03CR) 10Reedy: [C: 032] Remove a bunch of obsolete rules [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83572 (owner: 10MaxSem) [19:12:08] PROBLEM - Host wikiquote-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::3 [19:12:23] why you make icinga cry? 
:) [19:12:27] PROBLEM - Host wikinews-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::6 [19:12:28] it's known [19:12:30] it's being fixed [19:12:36] worked on [19:12:41] oh, i assumed it was leslie's !Log [19:12:42] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::13 [19:12:47] (03Merged) 10jenkins-bot: Remove a bunch of obsolete rules [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83572 (owner: 10MaxSem) [19:12:51] PROBLEM - Host wikibooks-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::4 [19:12:52] i rolled back [19:12:55] (03PS3) 10Reedy: Add unstructured element translation to Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82232 (owner: 10Nemo bis) [19:12:59] PROBLEM - Host upload-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::b [19:13:05] (03CR) 10Reedy: [C: 032] Add unstructured element translation to Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82232 (owner: 10Nemo bis) [19:13:08] PROBLEM - Host wikiversity-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::7 [19:13:16] well paging works :-D [19:13:19] :) [19:13:26] (03Merged) 10jenkins-bot: Add unstructured element translation to Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82232 (owner: 10Nemo bis) [19:13:26] PROBLEM - Host foundation-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::9 [19:13:28] everyone should get recovery pages soon [19:13:53] PROBLEM - Host wikisource-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::5 [19:14:08] watchmouse didn't notice [19:14:10] PROBLEM - Host mediawiki-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::8 [19:14:15] AFAICT [19:14:27] PROBLEM - Host 
mediawiki-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::8 [19:14:38] PROBLEM - Host wikimedia-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::0 [19:14:52] PROBLEM - Host wikipedia-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::1 [19:15:10] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org is DOWN: CRITICAL - Network Unreachable (208.80.152.218) [19:15:23] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org is DOWN: CRITICAL - Network Unreachable (208.80.152.219) [19:15:35] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::12 [19:15:48] PROBLEM - Host wiktionary-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::2 [19:16:00] PROBLEM - Host search-pool5.svc.pmtpa.wmnet is DOWN: PING CRITICAL - Packet loss = 100% [19:16:13] PROBLEM - Host search-pool4.svc.pmtpa.wmnet is DOWN: PING CRITICAL - Packet loss = 100% [19:16:29] PROBLEM - Host search-prefix.svc.pmtpa.wmnet is DOWN: PING CRITICAL - Packet loss = 100% [19:17:03] RECOVERY - Host search-pool5.svc.pmtpa.wmnet is UP: PING OK - Packet loss = 0%, RTA = 27.00 ms [19:17:15] RECOVERY - Host search-prefix.svc.pmtpa.wmnet is UP: PING OK - Packet loss = 0%, RTA = 27.15 ms [19:17:22] (03PS4) 10Yuvipanda: Add SSL support to the proxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/83773 [19:17:28] RECOVERY - Host search-pool4.svc.pmtpa.wmnet is UP: PING OK - Packet loss = 0%, RTA = 28.93 ms [19:17:43] RECOVERY - Host ms-fe.pmtpa.wmnet is UP: PING OK - Packet loss = 0%, RTA = 27.95 ms [19:19:02] i think we give watchmouse a 5 minute timeout [19:19:38] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: (Host Check Timed Out) [19:19:54] PROBLEM - Host rendering.svc.pmtpa.wmnet is DOWN: (Host Check Timed Out) [19:20:11] PROBLEM - Host wikisource-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: (Host Check Timed Out) [19:20:22] 
PROBLEM - Host wiktionary-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: (Host Check Timed Out) [19:20:32] PROBLEM - Host wiktionary-lb.pmtpa.wikimedia.org is DOWN: (Host Check Timed Out) [19:20:45] PROBLEM - Host wikiversity-lb.pmtpa.wikimedia.org_https is DOWN: (Host Check Timed Out) [19:20:56] PROBLEM - Host wikiversity-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: (Host Check Timed Out) [19:21:10] PROBLEM - Host wiktionary-lb.pmtpa.wikimedia.org_https is DOWN: (Host Check Timed Out) [19:21:24] PROBLEM - Host wikisource-lb.pmtpa.wikimedia.org_https is DOWN: (Host Check Timed Out) [19:21:37] PROBLEM - Host bits-lb.pmtpa.wikimedia.org_ipv6 is DOWN: (Host Check Timed Out) [19:22:20] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org_https is DOWN: (Host Check Timed Out) [19:22:20] RECOVERY - Host wikidata-lb.pmtpa.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 26.66 ms [19:22:20] RECOVERY - Host wikivoyage-lb.pmtpa.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [19:22:30] RECOVERY - Host wikibooks-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.67 ms [19:22:39] RECOVERY - Host wikiversity-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [19:22:51] RECOVERY - Host wikisource-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.68 ms [19:23:07] RECOVERY - Host wikipedia-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.68 ms [19:23:19] RECOVERY - Host wikimedia-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.69 ms [19:23:30] RECOVERY - Host wikidata-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.69 ms [19:23:42] RECOVERY - Host mediawiki-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.72 ms [19:24:32] RECOVERY - Host bits-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms [19:24:42] RECOVERY - Host foundation-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet 
loss = 0%, RTA = 26.65 ms [19:24:58] RECOVERY - Host wikivoyage-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [19:25:13] RECOVERY - Host wikiquote-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.68 ms [19:25:24] RECOVERY - Host wiktionary-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.68 ms [19:25:35] RECOVERY - Host wikinews-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [19:25:59] RECOVERY - Host wikivoyage-lb.pmtpa.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 28.42 ms [19:26:04] (03PS2) 10Reedy: Set $wgLogSpamBlacklistHits to true wherever SpamBlacklist is enabled [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83353 (owner: 10Legoktm) [19:26:09] (03CR) 10Reedy: [C: 032] Set $wgLogSpamBlacklistHits to true wherever SpamBlacklist is enabled [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83353 (owner: 10Legoktm) [19:26:14] RECOVERY - Host rendering.svc.pmtpa.wmnet is UP: PING OK - Packet loss = 0%, RTA = 26.66 ms [19:26:24] RECOVERY - Host wikisource-lb.pmtpa.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [19:26:29] (03Merged) 10jenkins-bot: Set $wgLogSpamBlacklistHits to true wherever SpamBlacklist is enabled [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83353 (owner: 10Legoktm) [19:26:36] RECOVERY - Host wiktionary-lb.pmtpa.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms [19:26:47] RECOVERY - Host mediawiki-lb.pmtpa.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [19:27:02] RECOVERY - Host wiktionary-lb.pmtpa.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [19:27:12] RECOVERY - Host wikiversity-lb.pmtpa.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [19:27:23] RECOVERY - Host wikiversity-lb.pmtpa.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA 
= 26.59 ms [19:27:58] RECOVERY - Host wiktionary-lb.pmtpa.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 26.67 ms [19:28:04] (03PS5) 10Reedy: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [19:28:09] (03CR) 10Reedy: [C: 032] Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [19:28:15] RECOVERY - Host wikisource-lb.pmtpa.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 27.01 ms [19:28:26] (03Merged) 10jenkins-bot: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [19:28:27] RECOVERY - Host upload-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 27.72 ms [19:28:35] RECOVERY - Host wikivoyage-lb.pmtpa.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 26.62 ms [19:30:31] !log reedy synchronized wmf-config/ [19:30:34] Logged the message, Master [19:33:35] !log reedy synchronized php-1.22wmf16/extensions/CleanChanges 'Stage' [19:33:38] Logged the message, Master [19:59:20] (03PS2) 10Reedy: Enable Global AbuseFilter for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82649 (owner: 10CSteipp) [19:59:31] (03CR) 10Reedy: [C: 032] Enable Global AbuseFilter for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82649 (owner: 10CSteipp) [19:59:45] (03Merged) 10jenkins-bot: Enable Global AbuseFilter for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82649 (owner: 10CSteipp) [20:00:17] !log reedy synchronized wmf-config/InitialiseSettings.php [20:00:20] Logged the message, Master [20:06:36] (03CR) 10Andrew Bogott: [V: 031] install_certificate: change the behaviour for $privatekey=true/false. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83843 (owner: 10Dzahn) [20:07:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:08:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.700 second response time [20:08:30] yoo cmjohnson1 [20:08:40] i guess we both volunteered to help manybubbles with elastic search stuff, eh? [20:08:44] he's around now, you avail? [20:08:49] want to sit with him and get an overview? [20:13:42] (03PS5) 10Yuvipanda: Add SSL support to the proxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/83773 [20:14:10] !log - update mwlib.epub to 0.14.3 [20:14:14] Logged the message, Master [20:14:16] !log - update mwlib to 0.15.11 [20:14:20] Logged the message, Master [20:14:42] !log restarted all services [20:14:47] Logged the message, Master [20:23:02] (03PS1) 10MaxSem: Sync device detection with MobileFrontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/83919 [20:23:19] hey mark, I've divided device detection by zero!:P ^^ [20:33:52] ottomata: ping [20:37:41] (03CR) 10Dzahn: [C: 031] nickel: add self ('olivneh') w/sudo [operations/puppet] - 10https://gerrit.wikimedia.org/r/83201 (owner: 10Ori.livneh) [20:40:34] (03PS2) 10coren: Add self (mpelletier) to paging group [operations/puppet] - 10https://gerrit.wikimedia.org/r/83840 [20:40:50] (03CR) 10coren: [C: 032] "... clearly, I am insane." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83840 (owner: 10coren) [20:46:48] (03CR) 10Dzahn: [C: 031] Add SSL support to the proxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/83773 (owner: 10Yuvipanda) [20:49:13] (03PS1) 10BBlack: Add separate binary 'vnm_validate' [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/83924 [20:49:14] (03PS1) 10BBlack: add cmdline debugging in vnm_validate [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/83925 [20:49:15] (03PS1) 10BBlack: support maskless addresses (as implicit all-ones) [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/83926 [20:49:16] (03PS1) 10BBlack: bugfix on strings alloc [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/83927 [20:49:17] (03PS1) 10BBlack: release 1.2 [operations/software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/83928 [20:53:36] (03PS1) 10Jgreen: add a wikitable output format to otrs idle_agent_report [operations/puppet] - 10https://gerrit.wikimedia.org/r/83929 [20:54:39] (03PS1) 10BryanDavis: Make vhtcpd collector and pyconf agree on titles. 
[operations/puppet] - https://gerrit.wikimedia.org/r/83932
[20:54:59] (PS1) Reedy: 175 -> 192MB memory [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83933
[20:55:39] (CR) Jgreen: [C: 2 V: 1] add a wikitable output format to otrs idle_agent_report [operations/puppet] - https://gerrit.wikimedia.org/r/83929 (owner: Jgreen)
[20:57:52] (CR) Reedy: [C: 2] 175 -> 192MB memory [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83933 (owner: Reedy)
[20:58:02] (Merged) jenkins-bot: 175 -> 192MB memory [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83933 (owner: Reedy)
[20:58:43] !log reedy synchronized wmf-config/InitialiseSettings.php
[20:59:07] Reedy: as soon as you're done please let me know
[20:59:14] or else i may mess with the sync
[21:00:07] I'm done... I think yuri_k was looking to deploy something
[21:01:02] yuri_k: hey
[21:01:04] we need you to wait
[21:01:09] or else it may kaplut
[21:01:10] yep, ori-l doesn't need to depl
[21:01:15] LeslieCarr, sure
[21:01:19] i mean, it could be fun
[21:01:21] see if it works ;)
[21:01:23] :)
[21:01:31] ok, one breakage at a time, network breakage… go!
[21:01:36] !log network maintenance in tampa
[21:01:37] LeslieCarr, i only need to sync-dir zero ext
[21:01:39] Logged the message, Mistress of the network gear.
[21:01:52] LeslieCarr, when do you think i can push it out?
[21:01:59] let's see how many pages we get ;)
[21:02:09] in the next 15 minutes
[21:02:14] oh sure
[21:03:01] (PS6) Yuvipanda: Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773
[21:03:24] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org is DOWN: CRITICAL - Network Unreachable (208.80.152.218)
[21:03:39] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org_https is DOWN: CRITICAL - Network Unreachable (208.80.152.218)
[21:03:50] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org_ipv6_https is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::12
[21:04:02] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::13
[21:04:08] !log rolled back
[21:04:11] Logged the message, Mistress of the network gear.
[21:04:17] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::12
[21:04:28] PROBLEM - Host wikivoyage-lb.pmtpa.wikimedia.org is DOWN: CRITICAL - Network Unreachable (208.80.152.219)
[21:14:44] akosiaris: PHP coverage stack trace is http://bug-attachment.wikimedia.org/attachment.cgi?id=13225
[21:15:53] http://9200.elasticsearch-monitoring-playground.instance-proxy.wmflabs.org/_plugin/segmentspy/#/nikwiki_content_red/
[21:15:57] (PS1) Springle: return db1020 to the s4 pool [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83936
[21:17:37] awjr: who came up with that URL?
[21:17:42] (CR) Springle: [C: 2 V: 2] return db1020 to the s4 pool [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83936 (owner: Springle)
[21:18:29] jeremyb: we're getting a walkthrough of elastic search by nik and he shared it with us
[21:18:55] !log springle synchronized wmf-config/db-eqiad.php
[21:18:58] Logged the message, Master
[21:19:01] hehe, what's the 9200 part?
[21:19:03] awjr: i mean who chose that length hostname
[21:19:09] * awjr shrugs
[21:19:10] no idea
[21:19:21] :)
[21:19:23] i presume one of the search guys :)
[21:19:35] mutante: port # maybe?
[21:23:07] (CR) BBlack: [C: 2] Add separate binary 'vnm_validate' [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83924 (owner: BBlack)
[21:23:17] (CR) BBlack: [V: 2] Add separate binary 'vnm_validate' [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83924 (owner: BBlack)
[21:23:26] (CR) BBlack: [C: 2 V: 2] add cmdline debugging in vnm_validate [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83925 (owner: BBlack)
[21:23:39] (CR) BBlack: [C: 2 V: 2] support maskless addresses (as implicit all-ones) [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83926 (owner: BBlack)
[21:23:44] hrmmm, Sorry! We could not process your edit due to a loss of session data. Please try again. If it still does not work, try logging out and logging back in.
[21:23:48] (CR) BBlack: [C: 2 V: 2] bugfix on strings alloc [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83927 (owner: BBlack)
[21:23:55] isn't that a labs thing? just got it on meta
[21:23:57] ottomata: http://www.elasticsearch.org/guide/reference/api/admin-cluster-health/
[21:24:03] may help
[21:24:16] (Abandoned) BBlack: release 1.2 [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/83928 (owner: BBlack)
[21:24:16] jeremyb: what do you mean a labs thing? it's just what it says it is
[21:24:50] Nemo_bis: used to have some kind of credentials message a lot in labs. with keystone.
not so much any more
[21:25:13] jeremyb: I never got it in labs but it's normal
[21:25:48] huh
[21:38:09] sync-dir ext/Zero
[21:38:20] greg-g, ^
[21:39:04] !log yurik synchronized php-1.22wmf16/extensions/ZeroRatedMobileAccess/
[21:39:07] Logged the message, Master
[21:41:11] (CR) Nemo bis: "So this is just to allow URLs which forget the trailing slash after domain? :|" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83225 (owner: Umherirrender)
[21:42:09] !log yurik synchronized php-1.22wmf15/extensions/ZeroRatedMobileAccess/
[21:42:12] Logged the message, Master
[21:44:15] cmjohnson1: http://9200.elasticsearch-monitoring-playground.instance-proxy.wmflabs.org/_plugin/paramedic/
[21:44:42] yurik: please add yourself to https://bugzilla.wikimedia.org/show_bug.cgi?id=53806
[22:17:09] (PS1) Jgreen: change jenkins gid to www-data on civicrm server [operations/puppet] - https://gerrit.wikimedia.org/r/83945
[22:20:40] (CR) Jgreen: [C: 2 V: 1] change jenkins gid to www-data on civicrm server [operations/puppet] - https://gerrit.wikimedia.org/r/83945 (owner: Jgreen)
[22:23:51] (CR) Ryan Lane: [C: 2] Rename labsproxy module to dynamicproxy [operations/puppet] - https://gerrit.wikimedia.org/r/83127 (owner: Yuvipanda)
[22:24:19] PROBLEM - Host db1014 is DOWN: PING CRITICAL - Packet loss = 100%
[22:25:53] manybubbles: I see that elasticsearch port is not parameterized in puppet, will it always be 9200?
[22:26:00] RECOVERY - Host db1014 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[22:27:57] (PS7) Yuvipanda: Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773
[22:31:20] (CR) Ryan Lane: [C: 2] Add SSL support to the proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83773 (owner: Yuvipanda)
[22:31:42] (PS3) Dzahn: install_certificate: change the behaviour for $privatekey=true/false.
[operations/puppet] - https://gerrit.wikimedia.org/r/83843
[22:32:06] (PS4) Dzahn: install_certificate: change the behaviour for $privatekey=true/false. [operations/puppet] - https://gerrit.wikimedia.org/r/83843
[22:32:15] (PS1) RobH: adding ulsfo.wmnet to search domains on iron [operations/puppet] - https://gerrit.wikimedia.org/r/83950
[22:33:29] damned whitespacing.
[22:33:30] (PS2) RobH: adding ulsfo.wmnet to search domains on iron [operations/puppet] - https://gerrit.wikimedia.org/r/83950
[22:34:50] (PS1) Reedy: Remove ReplaceText, no progress made (bug or whatever) towards enabling it [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83952
[22:35:10] (CR) Reedy: [C: 2] Remove ReplaceText, no progress made (bug or whatever) towards enabling it [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83952 (owner: Reedy)
[22:35:18] !log reedy synchronized wmf-config/
[22:35:22] Logged the message, Master
[22:36:07] (CR) Reedy: [V: 2] Remove ReplaceText, no progress made (bug or whatever) towards enabling it [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83952 (owner: Reedy)
[22:40:36] (PS1) Asher: enumerate unused dbs outside of an ugly regex, testing graphite 0.910 on db1014 [operations/puppet] - https://gerrit.wikimedia.org/r/83953
[22:41:11] (CR) Dzahn: [C: 2] "removal confirmed by legal" [operations/dns] - https://gerrit.wikimedia.org/r/81424 (owner: Dzahn)
[22:41:28] (PS1) Yuvipanda: Use the star.wmflabs.org certificate for dynamic proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83954
[22:43:02] (PS2) Asher: enumerate unused dbs outside of an ugly regex, testing graphite 0.910 on db1014 [operations/puppet] - https://gerrit.wikimedia.org/r/83953
[22:43:35] !log mwalker synchronized php-1.22wmf15/extensions/Translate 'Updating Translate to master'
[22:43:38] Logged the message, Master
[22:44:05] !log mwalker synchronized
php-1.22wmf16/extensions/Translate 'Updating Translate to master'
[22:44:08] Logged the message, Master
[22:46:00] (CR) Dzahn: "install_certificate looks right, but still needs 83843" [operations/puppet] - https://gerrit.wikimedia.org/r/83954 (owner: Yuvipanda)
[22:47:06] (PS3) Asher: enumerate unused dbs outside of an ugly regex, testing graphite 0.910 on db1014 [operations/puppet] - https://gerrit.wikimedia.org/r/83953
[22:48:54] !log upgraded packages on db1020 to get ganglia-monitor mysql metrics working
[22:48:58] Logged the message, Master
[22:50:15] !log mwalker synchronized php-1.22wmf16/extensions/CentralNotice 'Updating CentralNotice to master'
[22:50:18] Logged the message, Master
[22:50:51] !log mwalker synchronized php-1.22wmf15/extensions/CentralNotice 'Updating CentralNotice to master'
[22:50:54] Logged the message, Master
[22:56:00] (PS1) Ottomata: Adding nagios/icinga plugin and check for elasticsearch. [operations/puppet] - https://gerrit.wikimedia.org/r/83956
[22:56:27] (CR) jenkins-bot: [V: -1] Adding nagios/icinga plugin and check for elasticsearch. [operations/puppet] - https://gerrit.wikimedia.org/r/83956 (owner: Ottomata)
[22:56:29] (PS2) Yuvipanda: Use the star.wmflabs.org certificate for dynamic proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83954
[22:57:24] (PS2) Ottomata: Adding nagios/icinga plugin and check for elasticsearch. [operations/puppet] - https://gerrit.wikimedia.org/r/83956
[22:58:29] manybubbles: cmjohnson1: https://gerrit.wikimedia.org/r/#/c/83956/
[22:59:03] (CR) Ottomata: "Went back and forth on the 'nagios' class names.
Let me know if you'd rather them be named 'icinga', or something more generic like 'noti" [operations/puppet] - https://gerrit.wikimedia.org/r/83956 (owner: Ottomata)
[23:03:00] (PS1) Mwalker: Run TranslateGroup job Async for CentralNotice [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83957
[23:05:20] (PS2) Mwalker: Run TranslateGroup job Async for CentralNotice [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83957
[23:08:48] (CR) Mwalker: [C: 2] Run TranslateGroup job Async for CentralNotice [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83957 (owner: Mwalker)
[23:10:31] !log mwalker synchronized wmf-config/CommonSettings.php 'Do not run Translate message group job synchronously for CentralNotice'
[23:10:34] Logged the message, Master
[23:12:33] Ryan_Lane, so now we really have two different kinds of entries for private and public hosts… think I should split OpenStackNovaHost into two different objects?
[23:12:41] Or just track a public/private flag?
[23:15:45] (PS3) Dzahn: remove vikipedio.org and vikipedio.com, RT #4673, RT #4674, RT #5681 [operations/dns] - https://gerrit.wikimedia.org/r/81414
[23:17:13] (CR) Dzahn: [C: 1] "removal approved by legal" [operations/dns] - https://gerrit.wikimedia.org/r/81414 (owner: Dzahn)
[23:17:14] (CR) Cmjohnson: "The only thing I see is the permission should be set to 644 instead of 755." [operations/puppet] - https://gerrit.wikimedia.org/r/83956 (owner: Ottomata)
[23:18:00] andrewbogott: either is good, but I think there's a reasonable amount of overlapping code
[23:18:27] (CR) Ottomata: "Hmm, I don't think so, its a script that is meant to be executable. Also, misc::icinga sets the mode on all the nagios plugins there to 7" [operations/puppet] - https://gerrit.wikimedia.org/r/83956 (owner: Ottomata)
[23:20:51] who broke icinga?
[23:22:33] paravoid: https://en.wikipedia.org/wiki/Wikipedia:BLAMEWHEEL says arbcom did it
[23:22:35] !log powercycling neon, swapdeath
[23:22:39] Logged the message, Master
[23:23:10] (PS1) Ottomata: Creating new dclass module. [operations/puppet] - https://gerrit.wikimedia.org/r/83960
[23:23:46] that's oracle java 6, right?
[23:23:48] not just java 6
[23:24:23] (CR) Ottomata: "Faidon, I'm not sure about the creation of this new module here, let alone the multiple classes for each package. I just really wanted a " [operations/puppet] - https://gerrit.wikimedia.org/r/83960 (owner: Ottomata)
[23:28:24] ottomata: how's the openjdk 7 testing going?
[23:28:55] the java 6 reference was not relevant in that commit btw ;)
[23:29:06] and paravoid i built openjdk 7 for osx last weekend
[23:29:14] it's oracle java 6 iirc
[23:29:16] (PS1) coren: Tool Labs: add package python-wikitools [operations/puppet] - https://gerrit.wikimedia.org/r/83961
[23:29:24] there is not an installer for osx :(
[23:29:39] buuut. it compiled all the kraken java code fine
[23:29:39] oh oop
[23:29:45] (CR) coren: [C: 2] "Trivial package addition" [operations/puppet] - https://gerrit.wikimedia.org/r/83961 (owner: coren)
[23:30:01] paravoid: i can do it anytime in prod, i just finished saving some benchmark data with oracle 6 today
[23:30:04] been working on other stuff
[23:30:17] (other slightly more fun stuff :p, today is a hackday afterall right? )
[23:31:40] (PS5) Dzahn: install_certificate: change the behaviour for $privatekey=true/false. [operations/puppet] - https://gerrit.wikimedia.org/r/83843
[23:34:57] (PS6) Dzahn: install_certificate: change the behaviour for $privatekey=true/false.
[operations/puppet] - https://gerrit.wikimedia.org/r/83843
[23:38:07] PROBLEM - profiler-to-carbon on db1014 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon
[23:38:17] PROBLEM - profiling collector on db1014 is CRITICAL: PROCS CRITICAL: 0 processes with command name collector
[23:42:43] (CR) Ryan Lane: [C: 2] install_certificate: change the behaviour for $privatekey=true/false. [operations/puppet] - https://gerrit.wikimedia.org/r/83843 (owner: Dzahn)
[23:44:12] (PS3) Yuvipanda: Use the star.wmflabs.org certificate for dynamic proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83954
[23:45:01] (CR) Ryan Lane: [C: 2] Use the star.wmflabs.org certificate for dynamic proxy [operations/puppet] - https://gerrit.wikimedia.org/r/83954 (owner: Yuvipanda)
[23:48:55] (PS1) Asher: increment graphite pkg version [operations/puppet] - https://gerrit.wikimedia.org/r/83963
[23:49:54] (PS2) Asher: increment graphite pkg version [operations/puppet] - https://gerrit.wikimedia.org/r/83963
[23:50:10] (CR) Asher: [C: 2 V: 2] increment graphite pkg version [operations/puppet] - https://gerrit.wikimedia.org/r/83963 (owner: Asher)
[23:51:38] (PS1) Ryan Lane: Add proper star.wmflabs.org public cert [operations/puppet] - https://gerrit.wikimedia.org/r/83964
[23:52:00] (CR) Ryan Lane: [C: 2 V: 2] Add proper star.wmflabs.org public cert [operations/puppet] - https://gerrit.wikimedia.org/r/83964 (owner: Ryan Lane)
[23:55:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:56:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.098 second response time
[23:59:46] (PS3) Ottomata: Adding nagios/icinga plugin and check for elasticsearch. [operations/puppet] - https://gerrit.wikimedia.org/r/83956