[00:14:48] (03PS1) 10TTO: Change favicon for angwiktionary to ['w] icon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97458 [00:24:04] (03PS1) 10Springle: repool db1050 at LB 0 for dumps & QueryPage::recache. depool db1037 for upgrade & schema change [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97459 [00:25:06] (03CR) 10Springle: [C: 032] repool db1050 at LB 0 for dumps & QueryPage::recache. depool db1037 for upgrade & schema change [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97459 (owner: 10Springle) [00:26:15] !log springle synchronized wmf-config/db-eqiad.php 'repool db1050. depool db1037' [00:26:30] Logged the message, Master [00:32:51] (03CR) 10Tim Starling: Normalise the path part of URLs in the text frontend (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96941 (owner: 10Tim Starling) [00:55:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [01:15:40] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1037' [01:15:56] Logged the message, Master [01:20:45] !log mariadb 5.5.34 live-fire test on db1037 [01:21:01] Logged the message, Master [01:39:48] (03PS1) 10Springle: repool db1037 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97461 [01:40:15] (03CR) 10Springle: [C: 032] repool db1037 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97461 (owner: 10Springle) [01:41:15] !log springle synchronized wmf-config/db-eqiad.php [01:41:33] Logged the message, Master [02:14:09] !log LocalisationUpdate completed (1.23wmf4) at Mon Nov 25 02:14:09 UTC 2013 [02:14:25] Logged the message, Master [02:25:55] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 0h 7m 53s [02:26:17] !log LocalisationUpdate completed (1.23wmf5) at Mon Nov 25 02:26:16 UTC 2013 [02:26:33] Logged the message, Master [02:37:27] (03CR) 10Tim Starling: Generate redirects.conf (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/96438 (owner: 10Tim Starling) [02:37:48] (03PS2) 10Tim Starling: Generate redirects.conf [operations/apache-config] - 10https://gerrit.wikimedia.org/r/96438 [02:39:00] (03CR) 10Tim Starling: "PS2:" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/96438 (owner: 10Tim Starling) [02:40:11] webhostingwikipedia.com [02:40:20] * Aaron|home chuckles [02:42:34] A few domains have clearly been donated. 
[03:05:06] (03PS2) 10Tim Starling: Normalise the path part of URLs in the text frontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/96941 [03:05:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Nov 25 03:05:19 UTC 2013 [03:05:35] Logged the message, Master [03:05:36] (03CR) 10Tim Starling: "PS2: use memcpy()" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96941 (owner: 10Tim Starling) [03:56:49] (03PS1) 10Springle: depool db1045 for uprade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97463 [03:57:09] (03CR) 10Springle: [C: 032] depool db1045 for uprade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97463 (owner: 10Springle) [03:57:59] (03PS1) 10Andrew Bogott: Move the proxy API to a port 5668 [operations/puppet] - 10https://gerrit.wikimedia.org/r/97464 [03:58:09] !log springle synchronized wmf-config/db-eqiad.php 'depool db1045 for upgrade' [03:58:25] Logged the message, Master [03:59:45] (03CR) 10Andrew Bogott: [C: 032] Move the proxy API to a port 5668 [operations/puppet] - 10https://gerrit.wikimedia.org/r/97464 (owner: 10Andrew Bogott) [04:09:08] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1045' [04:09:24] Logged the message, Master [04:10:18] !log mariadb 5.5.34 live-fire test on db1045 [04:10:34] Logged the message, Master [04:41:05] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [04:42:35] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.52 ms [04:43:45] (03PS1) 10Springle: repool db1045 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97469 [04:44:04] (03CR) 10Springle: [C: 032] repool db1045 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97469 (owner: 10Springle) [04:45:27] !log springle synchronized wmf-config/db-eqiad.php [04:45:42] Logged the message, Master [04:56:55] PROBLEM - NTP on mw31 is CRITICAL: NTP CRITICAL: Offset unknown [05:01:55] RECOVERY - NTP on mw31 is OK: NTP OK: Offset -0.001426935196 secs [05:14:20] (03CR) 10Ori.livneh: [C: 031] "Simple and elegant; I like it. For future projects, consider implementing configuration DSLs by extending Puppet with custom Ruby code ins" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/96438 (owner: 10Tim Starling) [05:20:26] (03CR) 10Faidon Liambotis: [C: 032] "Thanks Tim. Good to go from my side, shall I deploy or will you?" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/96438 (owner: 10Tim Starling) [05:23:54] hey TimStarling, can you give me a pointer or two for investigating the issue that Sean reported (a sudden crunch of parser cache writes on pc1001)? I extracted the queries from cp1001's slow query log and the specific keys referenced in those queries. At this point I'm not stuck, but I'm sort of flailing around without a real plan. [05:25:09] can I see the list of keys? [05:26:03] TimStarling: iron:/home/ori/keys [05:26:19] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 3h 8m 17s [05:26:35] note that it coincides exactly with a fundraising banner test run [05:26:46] no idea if it's related, but it's a nice coincidence [05:28:39] ori-l: it's quite broadly distributed then [05:28:57] yep [05:29:19] these are from write queries? [05:29:31] or from select queries as well? 
[05:29:52] only SqlBagOStuff::set [05:32:55] there were a lot of deletes as well [05:33:43] 50k deletes, 110k replaces [05:38:40] pity this shows nothing useful: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=MySQL+eqiad&h=pc1001.eqiad.wmnet&jr=&js=&v=1512&m=mysql_com_delete&vl=stmts&ti=mysql_com_delete [05:39:10] that metric script really should be fixed [05:40:43] this is quite nice: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=MySQL%20eqiad&h=pc1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1385357718&v=21403783&m=mysql_innodb_rows_deleted&vl=rows&ti=mysql_innodb_rows_deleted&z=large [05:41:09] although it would be nicer as a derivative, if counter overflows could be filtered out [05:41:59] well, overflows and server restarts [05:42:36] was the restart a cause or an effect? [05:43:35] an effect [05:45:31] 15:34 springle: bounced pc1001 mysqld. massive spike of writes exhausted innodb txn log slots and wouldn't be killed [05:45:43] bah, sorry for the ping [05:46:12] so did the query rate definitely increase? [05:46:34] maybe I should click these CSV links... [05:48:35] 21M rows deleted? [05:48:40] https://graphite.wikimedia.org/render?from=-1weeks&until=now&width=500&height=380&target=query.DELETE.FROM_pcN_WHERE_keyname_X.count&target=query.REPLACE.INTO_pcN_keyname_value_exptime_VALUES_X.count&uniq=0.04967071581631899 [05:49:09] per hour? [05:49:34] no, it's a counter; it indicates the number of deletes since the server started [05:50:01] ah [05:52:43] I'm not sure how to interpret the Graphite graphs -- is it a real increase in the query rate, or was the count depressed by queries being backed up [05:54:03] the last one seems to show a normal-ish trendline [05:54:16] cept that spike on the 23rd [05:55:52] I've got the ganglia CSVs, there's no spike in the 14:42 or 15:24 samples [05:56:30] in either com_replace or com_delete [05:56:41] Hi bougyman. [05:57:01] hello Elsie [05:59:11] TimStarling: mysql_bytes_received spiked: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=MySQL+eqiad&h=pc1001.eqiad.wmnet&jr=&js=&v=37695292&m=mysql_bytes_received&vl=bytes&ti=mysql_bytes_received [05:59:46] that's one heck of a spike [06:00:15] * ori-l looks to see what sort of size limitations are enforced [06:01:11] ori-l: but no spike in bytes_in [06:01:26] which makes me think it is another metric bug [06:02:43] yes, you're right [06:02:48] on both counts [06:05:38] http://tstarling.com/stuff/pc1001-query-rate-2013-11-21.png [06:06:28] once you discount the bad sample, it looks like the server slowed down under normal traffic, then went back to normal after a restart [06:12:54] why did it slow down? well, cpu_wio was high starting from the 20th [06:20:43] hrm [06:20:44] http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=MySQL+eqiad&h=pc1001.eqiad.wmnet&jr=&js=&v=0.4&m=cpu_wio&vl=%25&ti=CPU+wio [06:21:17] that's quite nice [06:21:58] https://bugs.launchpad.net/ubuntu/+source/apt-xapian-index/+bug/363695 [06:22:07] "update-apt-xapian-index uses too much CPU and memory" [06:22:11] that's in cron.weekly [06:23:06] that's max: 5.83% in the graph [06:27:27] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. 
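The "nicer as a derivative" idea above maps onto a standard Graphite function, assuming the raw InnoDB counter were shipped to Graphite alongside the per-query statsd metrics: nonNegativeDerivative() turns a monotonically increasing counter into per-sample deltas and silently drops the negative jumps caused by counter overflows and server restarts. The metric path below is purely hypothetical, chosen only to mirror the Ganglia graph being discussed.

    # hypothetical metric path; nonNegativeDerivative() discards negative deltas from resets/overflows
    target=nonNegativeDerivative(servers.pc1001.mysql.innodb_rows_deleted)
    # scaleToSeconds(), where available, additionally normalises the deltas to a per-second rate
    target=scaleToSeconds(nonNegativeDerivative(servers.pc1001.mysql.innodb_rows_deleted),1)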
[06:27:49] yes, but that's a percentage of 24 cores [06:28:37] iowait is really the most awful disk metric [06:29:24] gtg [06:29:27] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active [06:34:20] thanks for investigating [06:35:06] weekly cron runs on sundays anyways [06:37:15] (03CR) 10MZMcBride: "I don't think this is a "confirmed" bug, per se. I think the underlying idea here needs further consideration." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/97190 (owner: 10Faidon Liambotis) [06:49:10] (03CR) 10Aaron Schulz: [C: 031] Normalise the path part of URLs in the text frontend [operations/puppet] - 10https://gerrit.wikimedia.org/r/96941 (owner: 10Tim Starling) [06:50:03] (03CR) 10Peachey88: "> Dzahn: didn't you mean https://twitter.com/wikimediatech instead of https://twitter.com/wikimedia ? ...snip..." [operations/puppet] - 10https://gerrit.wikimedia.org/r/97190 (owner: 10Faidon Liambotis) [06:51:39] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [06:58:36] ori-l: http://ori.scs.stanford.edu/ [06:59:00] HAHAHAHA :D [06:59:42] nice! [07:00:07] Fits nicely with https://github.com/cobrateam/roan :-) [07:00:34] there's also https://en.wikipedia.org/wiki/Ori_(Stargate) [07:00:59] The Ori are "a group of 'ascended' beings who use their advanced technology and knowledge of the universe to attempt to trick non-ascended humans into worshipping them as gods." [07:01:21] basically they were evil gods [07:02:07] i'm still disappointed we never meet the furlings in stargate [07:02:29] i've never watched it; krinkle pointed it out to me [07:02:51] * paravoid is a huge stargate fan [07:02:54] seasons 1-5 were good and should be watched [07:03:00] up to 7 was great [07:03:13] the ori story line was pretty meh tbh [07:03:21] s/meh/awful/ [07:04:05] no offence ori-l [07:06:45] yurik: around? [07:07:04] paravoid, yep [07:07:08] hey [07:07:14] hi [07:07:19] adam was mentioning a BDD script that you run periodically [07:07:28] bdd? [07:07:51] that's how he called it [07:08:05] what does it do? :) [07:08:11] wiki returns Body dysmorphic disorder [07:08:13] zero testing against production [07:08:33] ? you mean the script that tests that zero prod is correct? [07:08:52] yes [07:09:00] i sometimes run a small script that checks that IP ranges don't overlap [07:09:14] https://github.com/wikimedia/mediawiki-extensions-ZeroRatedMobileAccess/tree/master/maintenance/phantom [07:09:16] the prod testing script is all adam - he wrote it and is running it on daily [07:09:22] oh, okay [07:09:49] yeah, that's adam [07:10:47] paravoid, so we are postponing all depl this week? Or is it just the apache? [07:11:01] all large deployments [07:11:13] is landing page a large depl? [07:11:18] tech ops obviously continues, but no deployments of ours that would end up in greg's calendar [07:11:22] dunno, you have to ask greg [07:11:31] i thought he is gone this week [07:12:05] oh, right, should have asked earlier :) [07:12:06] besides, you will need to do most of that depl - i wouldn't want to mess up apache configs :) [07:12:19] I think robla is in his stead [07:12:36] heh, well, its all ready to go - even the minor patch to remove relative redirects [07:12:54] let me know if you want to poke it later this week [07:13:32] or we can even poke at it now - the redirect can be made temporarily in the mobileportal.php script [07:13:54] or i will head to bed [07:14:05] nah, not a great time now [07:14:09] also very late for you, isn't it? 
[07:14:19] 2am,not a biggi [07:14:29] i'm an owl (as russians call it) [07:15:43] https://en.wikipedia.org/wiki/Night_owl_(person) [07:16:19] thx ori-l , i thought it was a common expression mostly in ru [07:16:40] nah, english too [07:16:45] is there an always awake word for ori? :) [07:16:59] i think you are around more than anyone i know :) [07:17:26] oh, btw, ori, I have been poking at the zero config stuff, had some basic thoughts [07:17:28] i hope not, that would be a dubious distinction [07:17:33] hehe [07:17:55] what would you say about ... wait for it... Config:SubSpace:BlahBlah structure? [07:18:09] the SubSpace would be defined by an extension [07:18:17] it would describe it with a json schema [07:18:40] this way we will avoid meta community ripping us to shreds for creating new namespaces for each new config type [07:18:42] doesn't sound bad [07:18:52] have you seen https://www.mediawiki.org/wiki/Requests_for_comment/Configuration_database_2 ? [07:19:05] no, i'm affraid of going to RFC page... [07:19:08] readidng... [07:20:00] it may or may not make sense to expand the scope of the problem to mediawiki configuration generally [07:20:15] it would certainly be nice to have a single framework for both, but the requirements may be too different [07:20:31] Yes. [07:22:07] part of the rationale cited by the RFC for making all configuration editable on-wiki is that it's a nightmare to get set up with gerrit [07:22:11] ori-l, not sure about the security/storage [07:22:21] how does he propose to store it? [07:22:23] which is true, but it's a bigger problem, and it should be fixed at the root [07:22:37] i.e., by making it not be a nightmare to get set up with gerrit [07:23:10] there's a storage section but it's not tightly specced [07:23:18] I commented on the talk page a few minutes ago about the rationale for a graphical configuration interface. [07:23:19] legoktm might be around [07:23:25] Gerrit has nothing to do with it, I don't think. [07:23:40] legoktm said good night in another channel a few mins ago [07:23:43] so he might not be [07:23:52] I *was* about to sleep.... [07:24:06] well, gerrit requires very complex tools to use it - whereas wiki requires a browser [07:24:20] Right, but the point is that setting your wiki logo shouldn't require editing PHP. [07:24:24] For anyone. [07:24:36] hi legoktm, could you explain what you meant in the storage section of that rfc? [07:24:58] what is the storage engine/editing/etc? [07:25:05] well I was planning to just use json as a storage mechanism [07:25:14] but in the rfc review, Tim suggested using MySQL (or any db I guess) as the backend [07:25:40] Elsie: $wgLogo belongs to a very small set of config vars that really do beg for a web interface [07:25:48] I wouldn't generalize from that to all configuration vars [07:26:06] ori-l: Why not? [07:26:27] Adding namespaces, configuring user groups can all use web interfaces [07:26:30] legoktm, but what about all the other goodies of the wiki, namelly: history, diffs, monitoring, email notifications, etc? [07:27:01] yurik: Those come with using a MySQL backend, I think. ;-) [07:27:31] well, all of wiki uses mysql backend, but that doesn't solve the stated security problem :) [07:27:38] yurik: you mean storing in the page text? yes there are a lot of advantages to that, but problems with that is a) security: can't have private settings stored in page text, and b) accessing another wiki in a farm's settings becomes non-trivial [07:28:15] legoktm, security - reading, or security - editing? 
[07:28:33] there are very few really private settings that we have [07:28:54] reading [07:29:10] and i don't feel its a good tradeoff to trade the regular wiki abilities to the few keys that should be hidden [07:29:14] I'm in favor of ploughing ahead with an extension that provides a generic on-wiki configuration facility for other extensions rather than starting from core [07:29:32] so am i [07:30:11] why do we use #wikimedia-operations [07:30:13] and all external access can easily be done through api - it has all functionality for that [07:30:32] because ops are asleep and mice are having a field day? [07:30:37] heh [07:30:40] mice or owls? [07:30:41] * apergos peeks in [07:30:53] hi apergos [07:30:56] awake since 7:30 (for some value of 'awake') [07:31:06] for some value of TZ [07:31:06] yurik: using the API sounds like a good idea, I hadn't really considered that. [07:31:20] legoktm, that's how we do it internally for zero [07:31:26] ori-l: If there's a generic on-wiki configuration facility, I'm not sure what sense it makes to start with extensions. [07:31:47] I have to sleep though, I'll update the page tomorrow [07:31:51] it's a smaller problem [07:32:01] night [07:32:02] http://meta.wikimedia.org/w/api.php?action=help&modules=zeroconfig [07:32:04] night lego [07:32:24] night [07:32:41] ori-l: Well, focusing on all core configuration variables isn't exactly the alternative. We could focus on any handful. [07:32:42] ori-l, what's the better channel? [07:32:48] #wikimedia-dev [07:32:51] Or #mediawiki. [07:33:02] -dev [07:33:16] -tech is basically dead [07:33:46] If you say so. [07:34:01] -staff-cabal [07:43:20] !log imported de-debianized mariadb 5.5.34 debs into rerepro on brewster [07:43:34] Logged the message, Master [08:00:55] paravoid: any further thoughts re: https://gerrit.wikimedia.org/r/#/c/96961/ ? [08:01:24] ori-l: nope, looks good from my side, but I thought of letting Ryan have a final look [08:01:34] (03CR) 10Faidon Liambotis: [C: 031] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96961 (owner: 10Ori.livneh) [08:03:11] kk [08:03:21] he okayed it in principle [08:03:42] ori-l: I don't love 'donotify' though, maybe 'managed' on the nginx class? [08:03:42] ori-l: class { 'nginx': managed => false, } [08:03:42] Ryan_Lane: that works for me [08:03:44] Ryan_Lane: I don't care too much what the variable is :) [08:04:07] but if i press you to merge it murphy's law dictates that the cluster explodes horribly [08:05:17] probably a good idea to let Ryan look first [08:26:47] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 6h 8m 45s [08:37:20] (03CR) 10Odder: [C: 031] Change favicon for angwiktionary to ['w] icon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97458 (owner: 10TTO) [09:07:09] (03PS1) 10Akosiaris: Raise splaylimit from 45 to 60 seconds [operations/puppet] - 10https://gerrit.wikimedia.org/r/97486 [09:10:57] (03CR) 10Akosiaris: [C: 032] Raise splaylimit from 45 to 60 seconds [operations/puppet] - 10https://gerrit.wikimedia.org/r/97486 (owner: 10Akosiaris) [10:07:19] (03CR) 10Ori.livneh: [C: 04-1] role and module structure for ishmael (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [10:10:17] (03CR) 10Daniel Kinzler: "somebody give a +2 then, i can't :)" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/92925 (owner: 10Dzahn) [10:14:10] so, who half-killed streber? 
[10:14:32] no SAL entries, ticket was last updated by apergos last week but for a seemingly unrelated reason [10:14:40] and now we're left with no smokeping to debug a site outage [10:14:42] great [10:18:27] yeah I've done nothing over there [10:39:26] (03CR) 10Hashar: "Daniel: that can only be deployed by ops." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/92925 (owner: 10Dzahn) [11:06:50] hey, are you guys holding some courses sometimes ? [11:06:54] or some tutorials [11:07:02] like "this is how you do X with puppet" [11:07:14] or "this is how you do this and that with the X and the Y to get Z" [11:12:56] paravoid: Hallowed are the Ori! [11:26:48] LeslieCarr: do we have something like https://monitor.archive.org/weathermap/weathermap.html ? [11:27:05] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 9h 9m 3s [11:27:36] i.e. compact representation of bandwidth capacity & usage for all links [11:29:31] no [11:30:29] paravoid: no, to Krinkle or to Nemo_bis? [11:30:35] or no to both at once? [11:31:48] no to Nemo_bis [11:32:09] oki thanks [11:32:26] average: At hackathons and community events (such as the annual Wikimedia Hackathon in Europe, and at Wikimania) there are usually several hands-on workshops and talks about the various tools we use and how to use them. [11:32:35] Some of them are also recorded and/or documented on-wiki. [11:39:31] so when's the next one in Europe ? [11:39:52] paravoid: can I knock on your door and ask you puppet questions ? [11:40:13] https://www.mediawiki.org/wiki/Berlin_Hackathon_2012 [11:40:14] https://www.mediawiki.org/wiki/Amsterdam_Hackathon_2013 [11:40:17] https://www.mediawiki.org/wiki/Z%C3%BCrich_Hackathon_2014 [12:00:53] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [12:06:50] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:34:45] https://gdash.wikimedia.org/dashboards/reqerror/ [12:39:08] heh, I read that as rageerror [12:39:30] not raging yet [12:39:35] YET!! [12:40:54] exactly! [12:47:28] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: Connection timed out [12:48:18] RECOVERY - LVS Lucene on search-pool1.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [13:00:07] (03CR) 10Edenhill: "Overall looks good, but some smaller issues here and there." (037 comments) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [13:01:31] (03CR) 10Edenhill: Writing JSON statistics to log file rather than syslog or stderr (031 comment) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [13:16:40] about half of today's 503s are Special:CentralAutoLogin either .../start or .../createSession [13:17:02] Spezial:Zentrale_automatische_Anmeldung for de which is where most of em are [13:20:50] apergos: related to https://bugzilla.wikimedia.org/show_bug.cgi?id=54195 ? 
[13:21:41] maybe [13:21:46] if those URLs are 60 % of apache traffic, maybe being 50 % of 503s doesn't mean anything [13:22:10] (unless I'm comparing apple and oranges) [13:27:49] looks like there's still an outstanding patch: https://gerrit.wikimedia.org/r/#/c/96317/ [13:34:22] there are a bunch (at least enough not to be buried in the rest of the noise) of these: GET http://en.wikipedia.org\ [13:36:46] (03Abandoned) 10Hashar: contint: firewall out ssh access (restrict to bastion) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96040 (owner: 10Hashar) [13:38:06] !log Jenkins: upgrading openjdk 6 on gallium and lanthanum [13:38:22] Logged the message, Master [13:38:50] !log jenkins : restarted slave daemon on lanthanum.eqiad.wmnet [13:39:05] Logged the message, Master [13:39:58] !log jenkins : restarted slave daemon on gallium.wikimedia.org [13:40:13] Logged the message, Master [13:52:07] so we are getting peaks where there's a lot of these centralautologins [13:52:11] root@oxygen:/a/log/webrequest# tail -2000 5xx.tsv.log | egrep '(Special:CentralAutoLogin|Spezial:Zentrale_automatische_Anmeldung)' | wc -l [13:52:11] 1431 [13:52:23] 1431 out of 2000, rather a lot [14:27:13] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 12h 9m 11s [14:36:43] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [15:02:22] (03CR) 10Cmcmahon: "It would be nice to do this soon. This is blocking some testing work in beta labs and will block release very soon." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/96161 (owner: 10EBernhardson) [15:12:43] tail -4000 5xx.tsv.log | egrep '(Special:CentralAutoLogin|Spezial:Zentrale_automatisch)' | wc -l [15:12:43] 3390 [15:12:58] and really no idea what can be done about it right now :-( [15:13:07] https://gdash.wikimedia.org/dashboards/reqerror/ [15:18:21] (03CR) 10Anomie: Normalise the path part of URLs in the text frontend (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96941 (owner: 10Tim Starling) [15:33:25] apergos: Don't suppose you've any more information about the error than that? [15:33:55] fatal and exception logs are quiet [15:33:57] typical [15:35:07] well nemo has been bugzilla watching and pointed out this with rather a long discussion: [15:35:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=54195 [15:35:35] and the vast majority of those are indeed /start [15:42:26] (03PS2) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [15:42:51] (03CR) 10Addshore: "Covered most if not all of the initial comments from PS1" (037 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [15:45:17] apergos, what DC is this? [15:46:08] (03PS3) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [15:46:36] seem nicely split across cp100x and amssqx [15:46:42] MaxSem: [15:46:45] if anyone around fancies reviewing some puppet stuff for me I would be very gratefull, [= (see above) nice and small ;p [15:49:23] addshore, is this for labs? [15:49:29] yus [15:54:40] addshore, have you tested it? [15:58:16] this is my first time touching puppet, So I'm not sure what the standard process is :P is the jenkins validation not enough? 
;p [15:59:49] addshore: it just do very basic tests [16:00:28] addshore, you're supposed to test it on labs [16:00:59] from the looks of it, exec { 'npm_install' would fail due to lack of path [16:01:02] Nemo_bis: nope [16:03:13] ah ha... can I ask you about the pop list links we want for The Matrix? ( LeslieCarr ) [16:03:52] apergos: sure [16:03:59] (03CR) 10Jeroen De Dauw: Start wikidata puppet module for builder (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [16:04:08] MaxSem: is the process documented anywhere? :/ [16:04:09] i'm about to head out for a little bit though... so you have like 5 minutes :) go! [16:04:22] so...erm... links from where? :-D [16:05:36] (03PS4) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [16:05:36] is this the lit of peers if the dc happens to be in a facility listed in peeringdb? or some other thing? [16:05:38] *list [16:05:51] (03CR) 10Addshore: Start wikidata puppet module for builder (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [16:05:53] addshore, https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster [16:07:43] i'ma little confused ? [16:08:06] let me look [16:08:07] me too [16:08:21] there's two columns I haven't touched, [16:08:34] ok, i'm not actually sure what to put in that column specifically [16:08:34] "POP List Link" [16:08:48] "POPs at Location (and type)" [16:08:57] someone mentioned there may be lists, but they are usually in bid [16:08:58] not sure what we want in there [16:09:01] so that column may be pointlesss [16:09:14] i preferred to put too much on and parse down ;] [16:09:15] the pops at location and type would beif we know that XYZ carrier has a pop at the location and if it's major or just backhauled [16:09:22] but yeah, i'd get rid of the pop list link column [16:09:38] ok [16:09:52] I was really scratching my head over it [16:09:54] thanks [16:10:10] ok [16:10:12] i do have an idea though [16:10:19] yes? [16:10:37] we don't have a column for cost of 10G waves/dark fiber [16:10:49] the info is sorta all over [16:11:07] we could replace that with cost [16:11:18] yeah, there's a little in a few replies to tickets but mostly I think we don't have that pricing info [16:11:24] (and 10G vs dark fiber is quite important, since we can put a lot of 10g's per dark fiber) [16:11:28] a few bids had it [16:11:36] and then some tickets [16:11:56] lemme change that column title then [16:12:00] coo [16:12:05] mm wrong location, I'll move it also [16:12:22] did anyone respond to the sites i wanted to throw out of this round email ? [16:12:29] because we can also start doing theoretical pricing [16:13:03] digital realty contegix etc? [16:13:40] I saw no replies, I looked at them all and agree, we can easily afford to be choosy about connectivity, lots of bids that qualify [16:14:21] apergos: no names in this channel [16:14:41] sorry [16:15:01] well, we can start doing pricing theoreticals on the others that we have info for ? [16:15:24] oh shit i gotta run 5 minutes ago! 
[16:15:26] bye [16:15:27] go go go [16:15:29] thanks [16:19:44] (03PS1) 10Hashar: contint: package curl on slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/97526 [16:20:15] !log jenkins : installed curl on lanthanum.eqiad.wmnet puppet change is {{gerrit|97526}} [16:20:28] ty MaxSem :) (in our new office and the internet here currently leaves a bit to be desired) [16:20:31] Logged the message, Master [16:20:58] (03CR) 10Hashar: "manually installed on lanthanum.eqiad.wmnet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/97526 (owner: 10Hashar) [16:31:40] !jenkins mediawiki-core-qunit [16:31:41] https://integration.wikimedia.org/ci/job/ [16:31:46] !jenkins mediawiki-core-qunit [16:31:46] https://integration.wikimedia.org/ci/job/mediawiki-core-qunit [16:50:53] (03CR) 10Ottomata: Writing JSON statistics to log file rather than syslog or stderr (031 comment) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [18:18:21] labstore[34] [18:19:17] that is not eqiad [18:19:18] that is tampa [18:19:19] that was the example i was given. [18:19:19] labstore100[3-4] [18:19:19] RobH: labstore100x [18:19:20] thats not what you pasted! [18:19:20] =p [18:19:22] RobH: Doh! Braino. [18:19:25] So labs no longer requires the labstore servers or shelves? [18:19:25] in eqiad? [18:19:44] let's start this conversation over [18:19:44] because if its not moving, it goes to me [18:20:15] indeed. [18:20:16] Coren: so... [18:20:18] whaaa? [18:20:20] Coren: no need or procurement [18:20:27] They are the same as labstore100[34], which are the ones I won't need for NFS. The shelves could be added to 100[12] for great NFS justice; the servers could be used for the OSM postgres DB [18:20:53] sigh [18:20:54] the systems are already purchased and are in eqiad [18:20:54] they are already on the mgmt lan [18:20:54] you just need to install them [18:20:55] RobH: I was asking the question wondering "if I don't use them, how do we go about making them available for other things" [18:20:55] ok [18:20:55] RobH: But that's all in eqiad. :-) [18:20:55] got it [18:20:55] so you have a couple eqiad servers you may not need anymore [18:20:55] Right. [18:21:12] So the process is decommission for reclaim on lifecycle doc [18:21:19] hold up [18:21:20] with a very specific note to not skip the step ' If system is reclaimed into spares, ticket should be assigned to the HW Allocation Tech so he can update spares lists for allocation. ' [18:21:31] Ryan_Lane: holding. [18:21:38] Coren: why would you think we don't need them? [18:21:47] we just had to scrape back hardware from analytics [18:21:54] well, regardless of what labs decides, i answered the question ;] [18:22:00] yep [18:22:15] but i thought it was odd that you would have spare hw [18:22:19] Ryan_Lane: Well, I was *hoping* they could be repurposed as new virts. :-) [18:22:31] they are not even close to spec of virts. [18:22:35] what is the hardware in question? [18:22:46] Ryan_Lane: Dells with lotsa storage. [18:22:54] the labstore100x boxes? [18:23:05] 100[34]. I only need 100[12] for NFS [18:23:14] ah [18:23:18] So I was thinking since we can't use them for virt... [18:23:38] Move the shelves to 100[12] to double space, and use the servers for the postgres DB [18:23:45] they'd be shitty for it no? 
[18:23:47] I thought we were going to use them for another set of NFS [18:23:55] for public datasets and such [18:23:57] to split the IO [18:24:05] * apergos looks greedily at them [18:24:10] public datasets, om nom nom nom [18:24:13] Ryan_Lane: That could also work. But that's a lot of TB for just the datasets? [18:24:23] how many T are we talking? [18:24:26] Whereas general labs storage we can always use more of. [18:24:29] 25, I think [18:24:44] a lot of spare at least initially, that's true [18:25:12] apergos: 30T, give or take. [18:25:58] we could also use this for postgres dbs [18:26:09] which tools is still asking for [18:26:19] [13:23:37] Move the shelves to 100[12] to double space, and use the servers for the postgres DB [18:26:19] (for OSM dbs) [18:26:20] :-) [18:26:34] this isn't giving hardware away, then [18:26:47] it's just renaming them ;) [18:26:54] so yea if you guys are just moving disk shelves and renaming within labs [18:26:59] Ryan_Lane: I know, that's an alternative I've just considered earlier rather than releasing it. :-) [18:27:00] its all local datacenter tickets for the hands on stuff [18:27:03] but what that doesn't solve (if it's an issue) is splitting th i/o [18:27:05] and core-ops and network for the otehr stuff [18:27:23] if it was going 'spare' then you would also assign me a ticket [18:27:25] but its not, so you dont [18:27:26] apergos: It's not an issue yet; right now the NFS are pretty much idle. [18:27:36] hm ok [18:28:38] (It doesn't harm that I'm in a two controller setup with raid over the different shelves so disk IO is higher than the net can give in the first place) [18:31:38] that's pretty sweet [18:31:51] http://ganglia.wikimedia.org/latest/graph.php?r=1hr&z=xlarge&h=labstore3.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Labs+NFS+cluster+pmtpa [18:39:53] Right [18:39:55] So [18:40:04] I'm gonna deploy a cherry-pick of a file rename that fixes some fatals [18:40:09] Going to wmf4 and wmf5 [18:41:00] Just FYI [18:46:32] !log mholmquist synchronized php-1.23wmf4/extensions/UploadWizard/resources/ext.uploadWizard.uploadCampaign.list.css [18:46:47] Logged the message, Master [18:46:54] (03PS1) 10Yurik: Added relative redirect workaround until its fixed ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97540 [18:47:42] paravoid, ^ [18:48:43] !log mholmquist synchronized php-1.23wmf5/extensions/UploadWizard/resources/ext.uploadWizard.uploadCampaign.list.css [18:48:58] Logged the message, Master [18:49:50] (03CR) 10jenkins-bot: [V: 04-1] Added relative redirect workaround until its fixed ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97540 (owner: 10Yurik) [18:49:53] 'kay, seems like that's working [18:50:44] I'm done, y'all can carry on [18:50:45] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:53:55] (03PS2) 10Yurik: Added relative redirect workaround until its fixed ext [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97540 [18:55:10] robla, are you the new greg? :) [18:55:35] Ryan_Lane: Opinons on how the new labs DBs should be named? Simply labsdb100[45], or do we want to name them differently since they're not "true" production replicas? [18:56:06] Or, I suppose, not project replicas? [18:56:36] (Also, as a reminder of a long-gone conversation, the postgres slave will be the new physical DB for labs user master and vice versa) [18:59:02] yurik: So says an email I got [18:59:19] He's working on the beard I think [18:59:40] Coren: labs user master? 
[18:59:59] oh, the place users will write data? [19:00:08] The tools-db replacement to stop having a DB for user stuff on a VM (evuuuul!) [19:01:00] ah. right [19:01:01] yes [19:01:02] please :) [19:01:07] that's horrible [19:02:55] yurik: yeah, I'll do my best to make sure you all wish for Greg's speedy return :-) [19:03:31] Ryan_Lane: So just labsb100[45]? [19:03:40] no clue :) [19:03:48] ummm [19:03:59] I guess so, yeah [19:07:28] (03CR) 10Edenhill: [C: 031] Writing JSON statistics to log file rather than syslog or stderr (033 comments) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [19:10:06] (03PS7) 10Ottomata: Writing JSON statistics to log file rather than syslog or stderr [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 [19:12:01] (03CR) 10Edenhill: [C: 031] Writing JSON statistics to log file rather than syslog or stderr [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [19:12:56] it's like ping pong [19:14:53] heheh [19:35:31] (03CR) 10Ottomata: [C: 032 V: 032] "Coool, let's do it." [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/95473 (owner: 10Ottomata) [19:35:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [19:37:11] (03PS1) 10Ottomata: Adding --always flag to git describe. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/97545 [19:37:20] Snaps: check that real quick too ^ [19:42:22] RobH: Want to coach me about labs hardware stuff now, or ping me after lunch? [19:44:39] (03CR) 10Edenhill: [C: 031] Adding --always flag to git describe. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/97545 (owner: 10Ottomata) [19:45:13] lemme ping ya later [19:45:17] 'k [19:45:23] cuz i wanna wrap up row d vendor replies before i forget again [19:45:29] I'll be around until about 3pm PST [19:45:40] if we don't touch base today [19:45:46] we'll do so my AM tomorrow =] [19:45:56] sounds good [19:46:13] * RobH may fall back into rfp rabbit hole later [19:47:31] (03CR) 10Ottomata: [C: 032 V: 032] Adding --always flag to git describe. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/97545 (owner: 10Ottomata) [19:47:51] marktraceur: we should sync your fix [19:48:27] oh, you already did [19:50:14] yo paravoid, do you know the datetime periods where we were seeing amsterdam packet loss? [19:50:24] I want to see if they correlate with when I have varnishkafka produce errors [19:50:37] as I was saying at the meeting, smokeping is currently broken, so... no :/ [19:50:42] rats [19:50:55] this is what i've got: [19:50:56] well [19:50:57] Nov 18 18:26:51 - Nov 18 20:30:38 [19:50:57] Nov 24 17:44:54 - Nov 24 20:58:23 [19:50:57] Nov 25 09:45:24 - Nov 25 10:19:23 [19:51:21] smokeping's web page is broken, it probably still collected [19:51:23] let's check our mailbox [19:51:44] Ryan_Lane: hey, are you about? [19:52:17] ori-l: yep. what's up? [19:52:40] I'd like to merge the nginx change but would only do it with someone around, in case it blows up [19:52:47] hm, times of smokeping emails I have don't seem to correlate [19:52:52] I really don't think it will, but I'm nervous, and nginx is kind of important [19:54:03] is there a good time for you? [19:57:36] maybe 30 mins from now [19:57:45] WFM, thanks! 
[19:57:52] I'll ping you to confirm before I do anything
[19:57:55] one easy way to do this is to disable puppet on all the ssl hosts
[19:57:57] using salt
[19:58:06] ahhhh that sounds very useful
[19:58:07] salt 'ssl*' cmd.run 'puppetd --disable'
[19:58:15] nice!
[19:58:16] then depool one of the ssl servers
[19:58:26] enable puppet on it, run puppet, see what happens
[19:58:28] how do you depool?
[19:58:41] on fenari as root, go to: /home/w/conf/pybal
[19:58:46] ori-l: are you working on spdy?
[19:58:48] cd
[19:58:56] vi https (or is it ssl?)
[19:59:08] gwicke: I was planning to, yeah
[19:59:09] pick the one you want to disable and switch True to False
[19:59:17] ori-l: awesome!
[20:00:07] YuviPanda is enabling it for labs protoproxy today, I think (or else he did yesterday), was going to see how it worked out for him
[20:00:26] analytics is already using it :D
[20:00:33] he has been running it on the labs instance proxy for a while
[20:00:46] gwicke: yeah but it's 'rolled out' for wider use now.
[20:00:57] gwicke: there's a self service interface, so no need to find me :)
[20:00:57] sweet
[20:01:24] the build we have in apt doesn't have SPDY, I haven't checked yet if the Raring or Quantal packages do
[20:01:57] ori-l: the current build on labs is from andrewbogott, I think he based it off raring
[20:02:01] but the original maintainer of the debian Nginx package is a member of the WMF language engineering team
[20:02:01] * gwicke is looking forward to fine-grained caching of API end points
[20:02:25] gwicke, how does SPDY help with that?
[20:02:43] it lowers the overhead per request
[20:03:08] right, but fine-grained caching?
[20:03:16] so you don't have to work around http issues as much on the application layer
[20:03:28] we're not planning on enabling SPDY without quite a bit of testing/investigation
[20:03:31] (in production)
[20:03:54] we wanted to get anon by default https done first
[20:03:55] Ryan_Lane: I wasn't suggesting that we do
[20:04:30] * Ryan_Lane nods
[20:05:22] ori-l: added you to the proxy project, just in case you want to check something out.
[20:05:24] ori-l: with low per-request overhead and now head-of-line blocking it becomes feasible to expose independent bits of information as separately cacheable API end points
[20:05:32] *no head-of-line blocking
[20:05:47] gwicke: right
[20:05:57] YuviPanda: thanks!
[20:06:04] ori-l: yw! :)
[20:06:30] gwicke: was dealing with the same issue with module storage
[20:07:14] gwicke: i.e., ResourceLoader concatenates modules, so whenever the version of any single module bumps, we change the URL and thus throw out a cached response that is still largely current
[20:07:15] I wonder how RL would look in a pure SPDY / HTTP 2.0 world
[20:07:26] there might not be much left there
[20:07:44] yeah
[20:10:23] !log text-lb.esams, bits-lb.esams and more (amslvs1/3) are now load shared amongst amslvs1 and amslvs3 instead of just amslvs1
[20:10:39] Logged the message, Mistress of the network gear.
[20:16:21] paravoid: do you know if there is a machine in eqiad from which we can run a kafka throughput test?
[20:16:28] sorry
[20:16:30] in esams
[20:16:41] want to test throughput between esams and eqiad
[20:17:35] why do you want to do that?
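A condensed sketch of the drain procedure Ryan describes above. The salt invocation is quoted verbatim from the log; the pybal file name and the exact shape of a pool entry ('enabled': True/False per host) are assumptions about the format, inferred from the "switch True to False" instruction.

    # 1. keep puppet from touching the ssl hosts mid-change (command quoted from the log)
    salt 'ssl*' cmd.run 'puppetd --disable'
    # 2. on fenari, as root, edit the pybal pool definition
    cd /home/w/conf/pybal
    vi https        # or 'ssl'; the file name was uncertain even in the discussion
    #    flip the host being drained, e.g. (assumed entry format):
    #    {'host': 'ssl1008.wikimedia.org', 'weight': 10, 'enabled': True }
    #                                                    ^ change True to False
    # 3. let connections drain, re-enable and run puppet on the host, restart nginx,
    #    then set 'enabled' back to True to repool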
[20:18:07] Snaps' idea, but we're trying to isolate problems, i guess [20:18:25] during those hours that I pasted above, there was about 20-30% loss of messages between esams and eqiad [20:18:48] varnishkafka times out talking to brokers, message buffers get full [20:18:52] and then it starts dropping messages [20:18:59] it's not congestion [20:19:13] well, not 1gbps congestion [20:19:19] it may be somewhere on their path, who knows [20:20:18] hm [20:20:36] yes, it sucks [20:20:38] I know [20:21:58] yeah hm, actually, doh, i think smokeping email timestamps do correlate [20:21:58] hm [20:21:58] ok [20:23:28] hm, paravoid, does varnish itself handle this? or does it drop requests? [20:23:44] does it fall back to internet routing when link is sketchy? or just if link dies? [20:25:31] tcp adapts and it slows down, eventually 503ing requests [20:25:52] but no, nothing fall backs to transits automatically [20:26:01] and is in fact difficult to do so even manually now [20:27:58] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 18h 9m 56s [20:28:39] ottomata: but we saw errors lasting for 500 seconds (possibly), does that correlate to smokeping? [20:30:40] I have smokeping loss alerts (either bigloss or someloss) in my inbox for [20:30:40] Mon Nov 25 09:48:28 2013 [20:30:40] Mon Nov 25 09:53:28 2013 [20:30:40] Mon Nov 25 11:18:28 2013 [20:30:41] Mon Nov 25 19:48:28 2013 [20:30:41] Mon Nov 25 19:18:28 2013 [20:30:42] today [20:30:47] In case people haven't see it, people on enwiki are reporting getting 503 errors from Varnish when saving pages. If it's just timeouts waiting for apaches it should improve in 1.23wmf5, but confirmation that that's what's going on would be good. https://en.wikipedia.org/wiki/Wikipedia:VPT#Error_message [20:31:20] anomie: yeah, graphs are showing the same all day... [20:31:25] ottomata, join me in #wikimedia-labs? [20:31:37] http://gdash.wikimedia.org/dashboards/reqerror/ [20:31:39] :/ [20:33:45] ugh [20:34:25] sorry, Snaps, there are more than that [20:34:26] umm [20:35:18] mostly this morning during, 9:48-11:03 [20:35:36] then intermittently for other hours today [20:35:46] (03PS1) 10coren: Labs DB: views for archive table [operations/software] - 10https://gerrit.wikimedia.org/r/97557 [20:37:10] how does smokeping do its test? ICMP? shortlived TCP connections? [20:37:19] paravoid? [20:37:28] icmp [20:38:08] Ryan_Lane: OK, so I'm going to do what you suggested [20:38:18] disable puppet on ssl*, depool one host [20:38:22] huhm, okay [20:39:17] gwicke: ori-l SPDY for our public *production* parsoid cluster :) https://parsoid-prod.wmflabs.org/ [20:39:41] TCP will start acting up at about 2% packet loss [20:40:43] YuviPanda: sweet! [20:41:17] Snaps: i'm not exactly sure how to read these smokealert emails [20:41:19] but [20:41:20] loss: 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 15%, 15%, 20% [20:41:31] that's the bigloss 09:53 alert [20:41:37] close to when our problems today started [20:41:54] !log Disabling Puppet on ssl* in preparation for merging https://gerrit.wikimedia.org/r/#/c/96961/ [20:42:07] ssl* is not enough btw [20:42:09] Logged the message, Master [20:42:18] we now have nginx with localssl to some varnish boxes [20:42:23] I think it's limited to ulsfo now [20:42:36] ottomata: at 15% packet loss our poor kafka connections wont be very productive. But I'm wondering if the 0% periods actually has a higher percentage in practice. [20:43:07] paravoid: cp4*? 
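On the esams-to-eqiad throughput question raised above: nothing Kafka-specific is needed for a first-order answer. A single-stream TCP test between one host in each site, run both inside and outside a loss window, would show how hard the observed 15-20% packet loss throttles an individual connection, which is what the varnishkafka producer ultimately rides on. A sketch using iperf; the hostname is a placeholder:

    # eqiad end (receiver)
    iperf -s -p 5001
    # esams end (sender): one TCP stream for 60 seconds, reporting every 10 seconds
    iperf -c some-host.eqiad.wmnet -p 5001 -t 60 -i 10

Comparing a run during a smokeping alert period with one in a quiet period would help separate "the link is congested" from "loss somewhere on the path is collapsing the TCP window".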
[20:43:13] ori-l: sounds good [20:43:44] !log Also disabling puppet on cp400* [20:43:47] aye [20:43:59] Logged the message, Master [20:44:02] ori-l: yup [20:44:10] ori-l: that would be role::cache::ssl iirc [20:44:40] any host in particular that would be a better candidate for depooling than others? [20:45:45] i'll go with ssl3002.esams.wikimedia.org because traffic in europe is waning [20:45:56] paravoid, so what's your prefs for working on the new landing page - so i can schedule it. Would you like it this week (i added a patch to do a frontend fix), or some time next week? [20:47:16] yurik: next week is fine too [20:47:32] ok [20:48:04] actually, maybe eqiad because there's more redundancy [20:48:36] what is zookeeper module good for? [20:48:42] sorry for being chatty, just making sure you have a chance to holler if i'm doing something moronic [20:48:51] no worries [20:49:44] !log Depooling ssl1008 to test {{Gerrit|96961}} [20:49:58] Logged the message, Master [20:50:09] (03PS4) 10Ori.livneh: rewrite nginx module [operations/puppet] - 10https://gerrit.wikimedia.org/r/96961 [20:50:47] (rebase) [20:52:03] (03CR) 10Ori.livneh: [C: 032] rewrite nginx module [operations/puppet] - 10https://gerrit.wikimedia.org/r/96961 (owner: 10Ori.livneh) [20:55:31] (03PS1) 10Ori.livneh: Fix typo in parameter name (enable -> enabled) [operations/puppet] - 10https://gerrit.wikimedia.org/r/97560 [20:55:34] shhhh [20:55:53] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix typo in parameter name (enable -> enabled) [operations/puppet] - 10https://gerrit.wikimedia.org/r/97560 (owner: 10Ori.livneh) [20:56:33] so, we are 503ing a lot [20:56:35] it's multiple issues [20:56:57] 1124 RxURL c /w/index.php?title=2013%E2%80%9314_Chelsea_F.C._season&action=submit [20:57:00] 2152 RxURL c /w/index.php?title=2013%E2%80%9314_Chelsea_F.C._season&action=submit [20:57:36] 1748 RxURL c /w/index.php?title=Template:Syrian_civil_war_detailed_map&action=submit [20:57:39] 1700 RxURL c /w/index.php?title=Template:Syrian_civil_war_detailed_map&action=submit [20:57:42] 546 RxURL c /w/index.php?title=Template:Syrian_civil_war_detailed_map&action=submit [20:57:45] 1477 RxURL c /w/index.php?title=Template:Syrian_civil_war_detailed_map&action=submit [20:58:13] does that correspond with me doing things or did it start earlier? [20:58:18] no, earlier [20:59:15] paravoid: are we still triple-parsing wikitext? [20:59:25] there was a bug about that, dunno if it was fixed [20:59:44] no idea [21:01:04] paravoid: https://bugzilla.wikimedia.org/show_bug.cgi?id=57026 [21:01:55] (all you guys really should watch bugzilla a little bit more) [21:02:13] watch what, all bugs? [21:02:30] why not? there are not that many [21:02:33] no performance tag, no ops tag [21:02:54] there are saved searches 'bugs filed today' and 'bugs filed yesterday' :> [21:03:10] i assumed aaron was handling that [21:03:12] * ^d hides shared saved searches, they clutter his sidebar [21:03:15] Ryan_Lane: is it cool if i restart nginx on the depooled host just to be totally sure? 
[21:03:19] yep [21:03:27] make sure to use restart and not reload [21:03:31] nginx has an issue with reload [21:03:44] k [21:05:01] i included those by default [21:05:13] - config file name changed from 'localhost.conf' to 'localhost' (omitting the extension) [21:05:22] which is consistent with what the package does [21:05:40] but the diff is clean [21:06:01] !log Restarting nginx on ssl1008 [21:06:17] Logged the message, Master [21:07:39] all 503s that I see so far are edits [21:07:40] ottomata: need a new motherboard on analytics1012....sorry :-( [21:07:57] I don't think these two fixes from 57026 are deployed [21:09:07] # grep -l '57026' */includes/WikiPage.php [21:09:07] php-1.23wmf5/includes/WikiPage.php [21:09:45] and enwiki is on wmf4 [21:10:04] okay, so that's one 503 cause, but it probably doesn't explain the spike in reqerror -- too high of a number to be editors [21:11:22] ori-l: do you think these two patches from 57026 can/should be backported? [21:12:12] sorry, the nginx dance is new to me so i'm focussed on that, will look in a moment [21:12:27] (03PS1) 10Springle: expose archive to labs, suitably redacted [operations/puppet] - 10https://gerrit.wikimedia.org/r/97563 [21:12:47] verified with openssl client and curl -k -H 'Host: en.wikipedia.org' https://208.80.154.224:443/wiki/Main_Page [21:14:29] paravoid: (if my opinion counts, please backportthem; i though they already were backported. this is also user-visible (see WP:VPT on en.wp), and wmf5 is coming later than usual) [21:17:26] (03CR) 10Springle: [C: 032] expose archive to labs, suitably redacted [operations/puppet] - 10https://gerrit.wikimedia.org/r/97563 (owner: 10Springle) [21:21:09] (03PS1) 10Yuvipanda: dynamicproxy: Increase filesize limit for uploads. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97625 [21:21:22] Coren: andrewbogott anyone else, quick small +2? ^ [21:21:27] paravoid: still haven't looked, but MatmaRex is usually on point, so if he's in favor I tentatively am too [21:22:15] YuviPanda: Not even rebasable. Also, tl;dr. :-) [21:22:22] gah rebase [21:22:27] !log restarting sanitarium mysqld processes, db1053, db1054, db1057 [21:22:40] Coren: 'tis one line patch + 22 line comments [21:22:43] Logged the message, Master [21:24:37] Ryan_Lane: ssl1004 uhoh: [21:24:39] notice: /Stage[main]/Protoproxy::Ganglia/Nginx::Site[localhost]/File[/etc/nginx/sites-available/localhost]/ensure: defined content as '{md5}0b0dd9f0fea31789659717608065486e' [21:24:39] notice: /File[/etc/nginx/sites-enabled/gerrit]/ensure: removed [21:24:50] (03PS2) 10Yuvipanda: dynamicproxy: Increase filesize limit for uploads. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97625 [21:24:55] :D [21:25:25] no clue why that would occur [21:26:15] maybe puppet hadn't run in forever? [21:26:20] (03CR) 10coren: [C: 032] "tl;dr" [operations/puppet] - 10https://gerrit.wikimedia.org/r/97625 (owner: 10Yuvipanda) [21:26:30] heh, ty Coren [21:26:40] I guess? shouldn't gerrit be there, though? [21:26:44] site.pp assigns role::protoproxy::ssl to /ssl100[1-9]\.wikimedia\.org/ [21:27:15] springle: Want to double check me? https://gerrit.wikimedia.org/r/#/c/97557/ [21:27:17] (03CR) 10Ryan Lane: [C: 032] dynamicproxy: Increase filesize limit for uploads. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/97625 (owner: 10Yuvipanda) [21:27:41] Ryan_Lane: it wasn't on ssl1008 and ssl1007 [21:27:52] and i don't see the puppet config for gerrit being served by that box [21:28:16] (03CR) 10Springle: [C: 031] Labs DB: views for archive table [operations/software] - 10https://gerrit.wikimedia.org/r/97557 (owner: 10coren) [21:28:22] Coren: looks sane [21:28:24] no getti defined in the role [21:29:30] ori-l: I'd imagine it's fine [21:29:38] yeah, the dir wasn't managed recursively before [21:29:42] so it's just leftover, i think [21:29:45] gerrit is obviously still up :) [21:29:53] well, it doesn't restart nginx, remember [21:29:59] Coren: archive tables and triggers now in place on sanitarium [21:30:01] ah, right [21:30:02] but it might be good to restart ssl1004 just in case [21:30:07] do i need to depool it first? [21:30:24] misc might be running localssl [21:30:33] (03CR) 10coren: [C: 032 V: 032] "WFM" [operations/software] - 10https://gerrit.wikimedia.org/r/97557 (owner: 10coren) [21:30:41] I like to depool a few hosts at a time, let them drain [21:30:46] then restart them and repool them [21:30:52] then do a few more [21:30:55] how many is a few? [21:31:09] look at the network utilization [21:31:18] make sure you don't saturate links [21:31:27] * ori-l nods [21:31:29] depool as many as the hardware can take :) [21:33:37] well, actually [21:33:47] do i really need to restart nginx if no configuration changes occurred on the host? [21:33:54] that looks to be the common case; ssl1004 was unusual [21:37:09] !log Depooling ssl1004 to restart nginx [21:37:25] Logged the message, Master [21:38:06] I'm so annoyed that we don't have 5xx counts per varnish backend [21:40:26] Ryan_Lane: notice: /File[/etc/nginx/sites-available/.svn]/ensure: removed :D [21:41:43] ori-l: yes [21:42:09] because you made some changes [21:42:21] and we want to make sure the system is in a working state after a change [21:42:57] OK [21:45:38] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (202941) [21:47:09] (03CR) 10jenkins-bot: [V: 04-1] dynamicproxy: Increase filesize limit for uploads. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97625 (owner: 10Yuvipanda) [21:47:21] took you long enough! [21:47:24] wait [21:47:25] wat [21:47:26] -1 [21:47:40] oh, for PS1 [21:47:56] look at PS2! [21:49:31] ori-l: Yeah, that was first on my list today [21:49:40] heads up--fundraising is doing a 100% test now [21:49:57] since when? [21:51:34] paravoid: stared about 11 min ago [21:51:49] started. typing is hrad [21:59:39] !log finished updating / restarting eqiad ssls [21:59:52] Logged the message, Master [22:00:26] (03PS1) 10Yuvipanda: tools: Remove redundant declration [operations/puppet] - 10https://gerrit.wikimedia.org/r/97629 [22:00:31] andrewbogott: ^ [22:00:35] merge? 
[22:05:31] PROBLEM - LVS HTTPS IPv4 on mediawiki-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:05:32] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:05:41] PROBLEM - LVS HTTPS IPv6 on wikiquote-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [22:05:41] PROBLEM - LVS HTTPS IPv4 on wikisource-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:05:41] PROBLEM - LVS HTTPS IPv4 on wikidata-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:05:45] PROBLEM - HTTPS on ssl1 is CRITICAL: Connection refused [22:05:51] PROBLEM - LVS HTTPS IPv6 on wiktionary-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [22:06:01] PROBLEM - LVS HTTPS IPv6 on mediawiki-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [22:06:02] PROBLEM - LVS HTTPS IPv4 on wikiquote-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:06:02] ori-l: hey [22:06:12] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [22:06:12] PROBLEM - LVS HTTPS IPv6 on wikipedia-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [22:06:17] argh [22:07:01] nginx: [emerg] bind() to [2620:0:860:ed1a::c]:443 failed (99: Cannot assign requested address) [22:07:02] nginx: configuration file /etc/nginx/nginx.conf test failed [22:07:03] on ssl1 [22:07:10] hey [22:07:12] YuviPanda: That's redundant because it's included in the inherited class? [22:07:21] It's not the cause of the duplicate definition problem... [22:07:35] ori-l so what's up ? [22:07:37] ori-l: sooo.... [22:07:42] looks like that change didn't work so well :D [22:07:44] nginx: [emerg] bind() to [2620:0:860:ed1a::c]:443 failed (99: Cannot assign requested address) [22:07:45] nginx: configuration file /etc/nginx/nginx.conf test failed [22:07:49] on ssl1 [22:07:56] eqiad went fine [22:07:56] depool ssl1 [22:08:00] oh [22:08:01] hrm, why can't assign requested address... [22:08:08] pmtpa shouldn't be an issue [22:08:20] depool it anyway [22:08:22] done argh [22:08:26] we still have a little traffic going to pmtpa [22:08:29] yeah, i didn't think it was getting any traffic [22:08:32] or had alerting set up [22:08:36] super sorry :/ [22:08:41] nah, no worries [22:08:51] RECOVERY - LVS HTTPS IPv6 on wiktionary-lb.pmtpa.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.294 second response time [22:08:54] ssl1 is the least of our problems [22:09:00] oh? [22:09:01] RECOVERY - LVS HTTPS IPv6 on mediawiki-lb.pmtpa.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.300 second response time [22:09:01] RECOVERY - LVS HTTPS IPv4 on wikiquote-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.348 second response time [22:09:11] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.296 second response time [22:09:11] RECOVERY - LVS HTTPS IPv6 on wikipedia-lb.pmtpa.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.302 second response time [22:09:13] hehe, it's ok , just "eep! dinging!" 
and yeah, tampa is sorta barely limping along there [22:09:22] ok, back to the interview i'm giving [22:09:22] Ryan_Lane: https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-1day&from=-1%20day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=staircase&target=color(cactiStyle(alias(reqstats.5xx,%225xx%20resp/min%22)),%22blue%22)&target=color(cactiStyle(alias(reqstats.500,%22500%20resp/min%22)),%22red%22) [22:09:31] RECOVERY - LVS HTTPS IPv4 on mediawiki-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.304 second response time [22:09:32] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.309 second response time [22:09:38] ah [22:09:39] Ryan_Lane: also https://graphite.wikimedia.org/render/?title=HTTP%204xx%20Responses%20-1day&from=-1%20day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=staircase&target=color%28cactiStyle%28alias%28reqstats.4xx,%224xx%20resp/min%22%29%29,%22blue%22%29 although that probably correlates with fundraising's test [22:09:41] RECOVERY - LVS HTTPS IPv6 on wikiquote-lb.pmtpa.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 69263 bytes in 0.306 second response time [22:09:42] RECOVERY - LVS HTTPS IPv4 on wikisource-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 69261 bytes in 0.312 second response time [22:09:42] RECOVERY - LVS HTTPS IPv4 on wikidata-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1020 bytes in 0.188 second response time [22:09:49] Ryan_Lane: also, packet loss [22:10:02] ah. right. so those are likely unrelated to this [22:10:15] paravoid: https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&x=0.5&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception>ype=stack&glegend=show&aggregate=1&embed=1 [22:10:21] so it's mediawiki, probably [22:10:29] fluorine:/a/mw-log/fatal.log should have the details [22:10:32] paravoid: does that follow our esams traffic curve ? [22:10:35] gotta duck into a meeting [22:10:55] ori-l: there's that bug I mentioned before that's certainly part of the problem [22:11:04] ori-l: that MatmaRex pointed out, to be fair :) [22:11:08] andrewbogott: i thought it was from the inherited class since the other roles don't have it, but doesn't seem to be the case [22:11:34] ori-l: I don't feel sufficiently experienced with mediawiki to perform two backports into all wikis [22:11:50] paravoid: happy do it in 50 mins [22:11:53] paravoid: don't be worried, it's mediawiki, it's rock solid [22:11:56] you can't hurt it [22:12:14] well, i can do it in the background [22:17:09] (03PS1) 10Andrew Bogott: Replace some weird class syntax. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97631 [22:17:13] YuviPanda: ^ seems to help, I know not why. [22:17:24] * andrewbogott can only hope that… someone can explain [22:18:59] (03CR) 10Andrew Bogott: [C: 032] Replace some weird class syntax. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97631 (owner: 10Andrew Bogott) [22:19:24] andrewbogott: oh wow that's... weird [22:19:57] Does anyone know what the timeout is for the frontend varnishes? 
[22:20:00] andrewbogott: jenkins is being super slow today [22:20:48] (03PS1) 10Dzahn: fix the SSL cert chain for *.planet.wm [operations/puppet] - 10https://gerrit.wikimedia.org/r/97633 [22:22:57] !log upload replacement arbcom election config [22:23:12] Logged the message, Master [22:23:42] (03CR) 10Dzahn: [C: 032] fix the SSL cert chain for *.planet.wm [operations/puppet] - 10https://gerrit.wikimedia.org/r/97633 (owner: 10Dzahn) [22:24:51] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [22:25:58] mhoover, so… if you can get to bastion but not to virt1000 then... [22:26:07] are you forwarding your key? [22:26:21] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.37 ms [22:26:34] it appears as if it's being forwarded, but i'll double check. [22:27:07] just wanted to make sure the keys were in place before i started troubleshooting :) [22:27:09] thx man [22:27:25] Working now? [22:28:09] mhoover: I'm pretty sure the keys are there, but I haven't done per-user keys before so I'm not especially confident. [22:29:32] ori-l: Any idea why https://gerrit.wikimedia.org/r/#/c/97631/ fixed an actual bug rather than being a no-op? [22:29:59] andrewbogott: ok, cool. i'll dump the agent (might have to many keys loaded and the wrong statements in my ssh config) - lemme run through it real quick [22:30:36] you don't, for example, have a homedir on virt1000 which seems like a bad sign [22:34:08] hrmmm... [22:35:06] (03CR) 10Dzahn: "this fixed "incomplete" and "extra certs", "contains anchor" remains. but that's a MAY in RFC 2119 e.g." [operations/puppet] - 10https://gerrit.wikimedia.org/r/97633 (owner: 10Dzahn) [22:36:00] (03PS1) 10Andrew Bogott: Give mike an actual account to use his sudo from. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97636 [22:36:05] mhoover: I predict that ^ will help [22:36:22] hehe [22:36:54] it'll be 5 or so before I can apply that [22:37:05] um… 5 minutes I mean, not 5:00 [22:37:43] (03CR) 10Andrew Bogott: [C: 032] Give mike an actual account to use his sudo from. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97636 (owner: 10Andrew Bogott) [22:40:37] (03PS1) 10Yuvipanda: dynamicproxy: Don't include API by default in class [operations/puppet] - 10https://gerrit.wikimedia.org/r/97637 [22:40:37] andrewbogott: ^ [22:41:14] (03CR) 10Andrew Bogott: [C: 032] dynamicproxy: Don't include API by default in class [operations/puppet] - 10https://gerrit.wikimedia.org/r/97637 (owner: 10Yuvipanda) [22:43:28] YuviPanda: Keep in mind that without the API installed you have no way of recovering proxy config in case of a crash [22:43:46] andrewbogott: that's fine. This is for tools, and so is transient [22:43:53] ...? [22:44:45] andrewbogott: as in, tools are supposed to register for URL / port / host when they start [22:45:20] andrewbogott: so if there is a crash big enough to kill redis data, then yeah, we just restart the jobs and things are peachy [22:45:21] ah, I see. OK. [22:45:25] Yep, makes sense. [22:45:33] andrewbogott: :) [22:46:09] andrewbogott: if you have class { 'foo': } and class { 'foo': }, there's a multiple def'n error iirc [22:46:18] if you have class { 'foo': } and include foo, there isn't one [22:46:20] furthermore, [22:46:29] if you have class { 'foo': param => 'buzz', } and include foo [22:46:40] the class is parametrized [22:47:23] What we saw was sicker than that… class {'foo':} was causing a multiple-definition error even though the class that declared it was /not/ included. 
[22:47:40] so the 'include foo' amounts to "instantiate 'foo' if it isn't declared anywhere, or if a declaration already exists, use it -- parameters and all [22:47:48] so we were getting a multiple-definition failure that pointed to a line in a class that wasn't included. [22:48:21] Which makes me think that puppet simply can't parse class {'foo':} and loses its mind [22:48:29] no, it definitely can [22:48:34] we use it in lots of places [22:48:39] yeah :9 [22:48:39] :( [22:48:41] you probably just overlooked a tortuous dependency relation [22:48:59] Possible but we looked for a long time! [22:49:13] It was https://dpaste.de/o9WB [22:49:26] and we made a related change (killed API!) that wouldn't have worked if there was a dependency between those two [22:49:53] (03PS1) 10Andrew Bogott: Create an admin::labs class with one member: mike. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97639 [22:50:20] mwalker: ping [22:50:31] paravoid: what's up? [22:50:40] ... or down [22:50:47] two things [22:51:05] first, HideBanners is still on the top 5 of URLs to backend [22:51:11] (03CR) 10Andrew Bogott: [C: 032] Create an admin::labs class with one member: mike. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97639 (owner: 10Andrew Bogott) [22:51:15] i.e. your caching change hasn't probably been deployed [22:51:20] that's true [22:51:34] I can deploy it today; but only if greg-g releases me [22:51:41] well, you're running a FR test today [22:51:47] (greg-g is on vacation) [22:51:55] yes; and we'll be running them all week [22:51:55] so either don't run your FR test, or fix it :) [22:52:18] second [22:52:23] https://graphite.wikimedia.org/render/?title=HTTP 4xx Responses -1day&from=-1 day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=staircase&target=color(cactiStyle(alias(reqstats.4xx,"4xx resp/min")),"blue") [22:52:51] I haven't troubleshot this yet at all, but seems very aligned to the start of your test so far [22:52:55] * greg-g delegated to rob-la, and paravoid is also a good sane person ;) [22:53:01] * greg-g goes back to being on vacation [22:53:10] * greg-g grumbles at unified irc account ;) [22:53:13] hehe [22:53:15] greg-g: GO AWAY [22:53:32] /away [22:53:37] :) [22:53:57] now's the part of the show where I apply six tiny patches in quick succession because each one is 'trivial' and 'sure to work' [22:54:11] RobH: as greg's official delegation; paravoid has an issue; I have a fix -- may I deploy: https://gerrit.wikimedia.org/r/#/c/97150/ [22:54:31] K4-713: ^ [22:54:33] mwalker: wrong rob [22:54:38] heh [22:54:50] oh; right... robla! ^ [22:55:02] buh? [22:55:18] (03PS1) 10Andrew Bogott: I guess we need a group for mike to belong to... [operations/puppet] - 10https://gerrit.wikimedia.org/r/97640 [22:55:21] so, without it, no tests, right mwalker ? [22:55:21] Oh, CN? [22:55:24] no FR tests, that is [22:55:35] greg-g: seems like it [22:55:36] (03PS1) 10Ottomata: Conditionally setting content attribute for net-topology.sh file. [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/97641 [22:55:51] ah, caching, good stuff [22:55:53] (03CR) 10Ottomata: [C: 032 V: 032] Conditionally setting content attribute for net-topology.sh file. [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/97641 (owner: 10Ottomata) [22:55:55] it's not causing huge operational problems [22:55:59] marktraceur: did you get your stuff deployed? 
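For reference on the duplicate-definition exchange between andrewbogott and ori-l above: the behaviour being described comes down to Puppet's two ways of declaring a class. A minimal sketch, using a hypothetical class name rather than anything from operations/puppet:

class foo($param = 'default') {
  notify { "foo declared with ${param}": }
}

# Safe: 'include' is idempotent, so repeated includes coexist happily.
include foo
include foo

# Not safe (left commented out): a second resource-like declaration of the
# same class is a compile error ("Duplicate declaration: Class[Foo] is
# already declared"), and so is a resource-like declaration that happens to
# be evaluated *after* an include of the same class -- which is how the
# failure can end up pointing at a manifest you did not think was involved.
#
#   class { 'foo': param => 'buzz' }   # duplicate, given the includes above
#
# The working combination is the other way around: one parametrised
# class { 'foo': param => 'buzz' } declared first, with any later
# 'include foo' reusing that declaration, parameters and all.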
[22:56:06] since it's not multiplied by 10 this time and hitting the poor small mobile cluster [22:56:43] so it /can/ wait [22:56:56] the 40x error count on the other hand is probably more serious [22:56:56] (03PS1) 10Ottomata: Updating cdh4 module to fix puppet error in labs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97642 [22:57:04] mwalker: how many more tests and when do you have? [22:57:08] (03CR) 10Ottomata: [C: 032 V: 032] Updating cdh4 module to fix puppet error in labs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/97642 (owner: 10Ottomata) [22:57:13] (03CR) 10Andrew Bogott: [C: 032] I guess we need a group for mike to belong to... [operations/puppet] - 10https://gerrit.wikimedia.org/r/97640 (owner: 10Andrew Bogott) [22:57:13] robla: Yeah, it's set [22:57:20] I started debugging that and hit HideBanners [22:57:48] ottomata: go ahead and merge my one-liner along with your patch [22:57:51] mwalker: go for it [22:58:04] greg-g: probably every day from now until we make all our money -- we've another full test today that I know of [22:58:14] mwalker: yeah, then go [22:58:18] * greg-g waves to robla  [22:58:26] * greg-g goes back to ignoring -operations [22:58:54] ah, thanks andrewbogott got it [22:59:17] greg-g: I can kickban you if it helps [23:00:19] (03PS1) 10Ottomata: Ah! Need == 'present' for the net-toplogy.sh content check. [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/97643 [23:00:34] (03CR) 10Ottomata: [C: 032 V: 032] Ah! Need == 'present' for the net-toplogy.sh content check. [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/97643 (owner: 10Ottomata) [23:01:10] (03PS1) 10Ottomata: Updating cdh4 module again with fix for net-topology.sh [operations/puppet] - 10https://gerrit.wikimedia.org/r/97644 [23:01:19] (03PS2) 10Ottomata: Updating cdh4 module again with fix for net-topology.sh [operations/puppet] - 10https://gerrit.wikimedia.org/r/97644 [23:01:23] (03CR) 10Ottomata: [C: 032 V: 032] Updating cdh4 module again with fix for net-topology.sh [operations/puppet] - 10https://gerrit.wikimedia.org/r/97644 (owner: 10Ottomata) [23:01:49] mhoover: ok, /now/ try it? [23:01:58] robla: the patch we were talking about is already staged; along with https://gerrit.wikimedia.org/r/#/c/96646/ and https://gerrit.wikimedia.org/r/#/c/93602/ [23:01:59] virt1000 again, that's the only host I've changed so far [23:02:12] checking [23:02:30] may I deploy those as well; or would you prefer I cherry pick? [23:02:56] *staged in CentralNotice's deploy branch [23:03:01] * robla looks [23:03:05] hmm, nope [23:03:19] can you tell me the last few digits of the key being used? [23:03:51] 641gXdkB mhoover@wikimedia.org [23:04:12] That's in ls /home/mhoover/.ssh/authorized_keys [23:04:28] ok, i'll double check one sec... [23:04:48] looks correct, but i'll verify [23:07:00] paravoid: re the 4XX issue; do you want me to tell you when megan launches another 100% test? [23:07:19] or ... do you want us to hold off on that? [23:07:30] mwalker: does deploying those two turn this from a syncfile into a scap? [23:07:35] didn't she just did a few hours ago? [23:07:52] robla: shouldn't... /me looks at the patches again [23:08:05] about 21:30 UTC? [23:08:29] paravoid: yep yep; but we turned it off some minutes ago [23:09:08] how many minutes? [23:09:11] robla: I would probably do a sync-dir on the centralnotice-extension; no need for a scap [23:09:17] paravoid: looking [23:09:40] mwalker: ok, go for it. 
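On the per-user access being wired up for mhoover above: a rough sketch of the account, group and key resources involved, using Puppet's built-in types. This is a generic illustration, not the actual account/admin classes in operations/puppet; the names and key material are placeholders.

group { 'labsadmin':
  ensure => present,
}

user { 'mhoover':
  ensure     => present,
  home       => '/home/mhoover',
  managehome => true,   # without a home directory there is nowhere for authorized_keys to live
  shell      => '/bin/bash',
  groups     => ['labsadmin'],
  require    => Group['labsadmin'],
}

ssh_authorized_key { 'mhoover@wikimedia.org':
  ensure => present,
  user   => 'mhoover',
  type   => 'ssh-rsa',
  key    => 'AAAAB3...placeholder-public-key...',
}

ssh_authorized_key manages ~/.ssh/authorized_keys for the target user, which is why the account (and its home directory) has to exist first -- hence the require and managehome above.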
[23:10:22] paravoid: 2251 is when it came down [23:10:34] anyone have a puppet manifest that they're particularly fond of that I can use as a template? (package install, config files, db schema tie in?) [23:11:00] Coren: I need to leave in five… can I leave you with the task of figuring out mhoover's access? If you look at node "virt1000.wikimedia.org" it should be obvious what I'm trying to do (albeit not with fine results so far) [23:11:03] !log ori synchronized php-1.23wmf4/includes/WikiPage.php 'Avoid extra parsing in prepareContentForEdit() I2c34baaf8' [23:11:17] Logged the message, Master [23:11:20] robla: ok; prepping [23:11:22] !log ori synchronized php-1.23wmf4/tests/phpunit/includes/ArticleTablesTest.php 'Avoid extra parsing in prepareContentForEdit() I2c34baaf8' [23:11:34] mwalker: https://graphite.wikimedia.org/render/?title=HTTP%204xx%20Responses%20-1day&from=-1%20day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=staircase&target=color%28cactiStyle%28alias%28reqstats.4xx,%224xx%20resp/min%22%29%29,%22blue%22%29 [23:11:36] Logged the message, Master [23:11:42] !log ori synchronized php-1.23wmf4/extensions/TemplateData 'Update TemplateData to master for I0782ea669' [23:11:45] still problems. thx andrew, i'll keep messing with it [23:11:56] Logged the message, Master [23:12:17] (done) [23:12:33] ori-l: thank you! [23:12:38] paravoid: ya, I agree that looks like us -- we went full throttle at 2142 [23:12:43] mhoover: can you try right now while I watch the log? [23:12:45] whee backports, thanks ori-l [23:13:16] yes, trying [23:13:34] ok, tried [23:14:23] welp, I don't know enough to interpret this. It says Connection closed by [23:15:01] the only thing i can spot with ssh -vvv is [23:15:02] Roaming not allowed by server [23:15:18] how many keys do you have in your agent? [23:15:22] :( I must be missing something. [23:15:26] just one at the moment [23:15:31] previously it was saying invalid user, but no longer. [23:15:48] setting that, then promptly going away to run errands! [23:15:49] mhoover: you try with -i /path/to/your/key ? [23:15:50] win [23:16:21] yeah, the same key operation works to the bast host [23:16:33] mhoover: sorry, I really have to run. Hopefully another op will take up the cause, otherwise we'll have to try more tomorrow. [23:16:54] no prob man, thanks for trying. ttyl [23:17:05] * andrewbogott hates leaving in the middle of a puzzle [23:18:41] sorry for not announcing my syncs to the channel earlier [23:19:01] too many context-switches, got distracted. [23:19:12] Coren, for reference, I'm trying to copy the pattern that allows hashar access to Jenkins boxes. But for mhoover + labs boxes. [23:19:44] could be a perms thing on my home dir or ssh dir [23:20:04] or the authorized_keys file. [23:20:07] your perms look ok, and it doesn't seem to be rejecting you for public key [23:20:17] mwalker: 2013-11-25 22:24:36 mw1022 metawiki: [fcc383fc] /wiki/Special:CentralNoticeBanners/edit/2009_Notice32 Exception from line 1298 of /usr/local/apache/common-local/php-1.23wmf4/extensions/CentralNotice/includes/Banner.php: Banner doesn't exist! [23:20:51] you're comin in through bast1001 via proxycommand right? [23:22:28] hm oh well gone [23:22:36] and so am I, toooo sleepy [23:23:20] ori-l: I don't have awesome error handling there -- so that's "somewhat" expected [23:23:31] if someone tries to edit a banner that... doesn't exist :p [23:23:49] was that something you triggered; or just something you saw? 
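On the "anyone have a puppet manifest ... I can use as a template?" question at 23:10: the usual starting point is the package -> config file -> service pattern, plus a guarded exec for one-off steps such as a schema load. Everything below (module name, paths, commands) is a placeholder sketch, not an existing operations/puppet module.

class mything {
  package { 'mything':
    ensure => present,
  }

  file { '/etc/mything/mything.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    # a real module would use template('mything/mything.conf.erb') or a
    # puppet:/// file source; inline content keeps the sketch self-contained
    content => "db_host = localhost\ndb_name = mything\n",
    require => Package['mything'],
  }

  service { 'mything':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/mything/mything.conf'],
  }

  # one-off schema load, guarded by a stamp file so it only runs once
  exec { 'mything-load-schema':
    command => '/bin/sh -c "/usr/bin/mysql mything < /usr/share/mything/schema.sql && touch /var/lib/mything/.schema-loaded"',
    creates => '/var/lib/mything/.schema-loaded',
    require => Package['mything'],
  }
}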
[23:25:17] saw [23:25:55] only one such exception today [23:26:05] just happened tho [23:26:35] (03PS1) 10GWicke: WIP Bug 56282: Gzip all Parsoid HTML before storing it in Varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/97647 [23:27:25] *nods* I'll take a wander; it's interesting because I'm curious where that link came from [23:27:59] PROBLEM - Puppet freshness on rhodium is CRITICAL: No successful Puppet run for 2d 21h 9m 57s [23:28:01] gwicke: what do you mean "make sure that [23:28:03] we don't vary on Accept-Encoding." [23:28:04] ? [23:28:21] (03PS1) 10Matanya: wikidata_singlenode : lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/97648 [23:28:31] paravoid: we have clients that don't send the proper accept, and we want to only have a single gzipped copy in varnish [23:28:56] so we don't want to vary [23:29:04] varnish overrides the vary anyway [23:29:06] varnish will decompress for clients that need it [23:29:09] overrides/ignores [23:29:26] I am not so sure about that [23:29:32] we had to roll back a change that relied on that [23:29:37] (03CR) 10jenkins-bot: [V: 04-1] wikidata_singlenode : lint clean [operations/puppet] - 10https://gerrit.wikimedia.org/r/97648 (owner: 10Matanya) [23:30:05] } else if (params->http_gzip_support && [23:30:05] !strcasecmp(H_Accept_Encoding, (const char*)((*v1)+2))) { [23:30:08] /* [23:30:11] * If we do gzip processing, we do not vary on Accept-Encoding, [23:30:14] * because we want everybody to get the gzip'ed object, and [23:30:18] that's from [23:30:19] static int [23:30:19] vry_cmp(const uint8_t * const *v1, uint8_t * const *v2) [23:30:34] !log mwalker synchronized php-1.23wmf4/extensions/CentralNotice 'Updating CentralNotice to master (mostly for caching reasons' [23:30:49] Logged the message, Master [23:30:56] mwalker: thanks :) [23:31:16] can someone merge https://gerrit.wikimedia.org/r/#/c/97625/ [23:31:17] paravoid: that assumes that a gzipped object is in cache? [23:31:20] andrew +2'd it [23:31:24] but forgot to hit submit? [23:31:35] gwicke: yes [23:31:42] paravoid: wmf5 was already on master; so I didn't do that one [23:31:49] that's what I am trying to enforce [23:31:53] but you should be seeing a decrease in my traffic [23:32:19] YuviPanda: k [23:32:32] previously I added gzip support to Parsoid, but this was conditional on the accept headers (HTTP compliant) [23:32:35] YuviPanda: done [23:32:47] mwalker: well, I will next time you'll run a banner test I guess [23:32:48] if the non-gzip request came first, this would result in a non-gzip response to varnish [23:32:53] and thus two copies in cache [23:32:59] ori-l: thanke! [23:36:04] gwicke: varnish will always set Accept-Encoding: gzip when talking to the backend [23:36:25] gwicke: irrespective of whether the client accepts it or not [23:36:51] gwicke: so if a non-gzip request comes, varnish will fetch a gzipped response from the backend, store it in cache, uncompress it and serve it to the client [23:37:30] hmm, interesting- I did not perform a tcp dump when I tested this last, so that might be possible [23:38:14] it did not seem to ignore the Vary: Accept-Encoding the express framework sent out though [23:38:45] or maybe there was another reason for unexpected cache misses [23:43:10] paravoid: do you think that ensuring gzip compression in the Parsoid varnishes is a good idea? [23:43:30] you mean instead of doing it with Node? 
[23:43:35] the alternative would be to always return gzip-encoded content without the vary header from Parsoid [23:43:52] no, vary shouldn't matter [23:44:22] the alternative would be to make node (express?) obey accept-encoding and serve gzipped content when asked to [23:44:41] that's what we had earlier [23:44:53] and why didn't it work? [23:45:37] I can try to re-enable it, it is well possible that there was another bug that made gzip look like the likely culprit when it was not [23:46:13] we've had at least gunzip bug with varnish that we know is still outstanding [23:46:17] it doesn't happen always thought [23:46:48] and in most cases varnish doesn't do gunzip at all, since most user agents accept gzip content [23:47:02] yeah, VE sadly doesn't yet [23:47:04] (an exception is Range requests, so we have these disabled for text-lb/mobile-lb right now) [23:47:28] so it's quite possible you were hitting that bug [23:48:48] the symptoms looked as if varnish was varying on Accept-Encoding (cache miss except from identical client), but we didn't investigate it too thoroughly as we were dealing with other issues too [23:49:36] I'll push a patch to parsoid master and test that on betalabs [23:50:03] or some other vm [23:50:12] (03CR) 10Dzahn: "eh yea, role::bugzilla already exists, including the misc:: classes, i'd change that to actually make the switch" [operations/puppet] - 10https://gerrit.wikimedia.org/r/94075 (owner: 10Dzahn) [23:58:03] okay
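A sketch of where the gzip-in-Varnish discussion above points. Gerrit change 97647 is marked WIP, so this is not its actual content: with Varnish 3, forcing beresp.do_gzip in vcl_fetch stores a single gzipped copy of each Parsoid response, and Varnish gunzips on delivery for the rare client that does not send Accept-Encoding: gzip, which matches the behaviour paravoid describes. The file path and the way the fragment would be wired into the cache role are assumptions.

file { '/etc/varnish/parsoid-gzip.inc.vcl':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0444',
  content => "sub vcl_fetch {
  // gzip uncompressed HTML from the backend before it enters the cache,
  // so only one (compressed) copy is stored per URL
  if (beresp.http.Content-Type ~ \"text/html\" && !beresp.http.Content-Encoding) {
    set beresp.do_gzip = true;
  }
}
",
  # in practice this would notify the varnish service defined elsewhere
  # in the cache role; omitted here to keep the sketch standalone
}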