[00:01:10] (CR) Dzahn: [C: 2] dsh: split the misc-servers into dc-specific files [operations/puppet] - https://gerrit.wikimedia.org/r/92599 (owner: Dzahn)
[00:08:30] PROBLEM - twemproxy process on arsenic is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker
[00:11:50] PROBLEM - Host wikibooks-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::4
[00:12:11] RECOVERY - Host wikibooks-lb.pmtpa.wikimedia.org_ipv6 is UP: PING WARNING - Packet loss = 37%, RTA = 31.29 ms
[00:12:29] Hey opsen, does anyone know why I still do not get SMS pages despite getting email pages?
[00:12:47] hrmm, lemme take look at contact file
[00:14:06] RoanKattouw: you are just set for notify by email
[00:14:08] not sms
[00:14:14] but i can change it now to both (like most ops)
[00:14:16] sound good?
[00:14:20] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000
[00:14:38] I thought that's how I was supposed to have been set up ...
[00:14:39] heh
[00:14:40] nope!
[00:14:42] edits on https://wikitech.wikimedia.org/wiki/Tampa_cluster appreciated. we have moved more stuff than it says, i will also get back to editing it, but if you know about something, please go ahead and edit
[00:14:47] so you want it changed right?
[00:15:20] Yeah get me SMS pages
[00:15:24] Do I have off hours configured?
[00:16:03] yep
[00:16:07] yer set for pdt awake hours
[00:16:16] so it will email then but not page.
[00:16:27] What are "awake hours" exactly?
[00:16:55] Like, 8am-midnight or something?
[00:17:06] its in public repo icinga
[00:17:07] uhh
[00:17:18] alias PDT 8 am till midnight, 7 days a week
[00:17:19] yep
[00:17:20] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:23] except its gmt
[00:17:30] so its shuffled for daylight savings no?
[00:19:18] What's the UTC offset they used?
[00:19:35] Did they do 1500-0700 or 1600-0800 UTC?
[00:19:59] RoanKattouw: you can have custom hours if you want to, it's public puppet files/icinga/timeperiods.cfg
[00:20:23] 15 alias CET 8 am till midnight, 7 days a week 16 sunday 07:00-23:00
[00:20:27] Alright, was just wondering
[00:20:48] Thanks for giving me the path
[00:21:01] RobH: When I travel to a different timezone, do I notify someone to change my config?
[00:21:54] Hah, it's 1600-0800, so the name is a lie, it's PST
[00:22:41] RoanKattouw: you are the special case who gets paged but can't change it self-service, so i guess the answer is yes
[00:22:48] RoanKattouw: its public repo
[00:22:57] oh wait, its not
[00:22:58] hrmmm
[00:23:01] EST and EDT_early are correctly labeled, but I'm amused that we have no straight EDT or EST_early :)
[00:23:04] it's public to define a new timezone
[00:23:07] schedule is, but yea
[00:23:12] it's not public to attach a timezone to a contact
[00:23:14] but timezone is set in private file
[00:23:15] RobH: The TZ defs are public so I can fix those
[00:23:25] so while they can be added, assignment needs ops
[00:23:32] Right
[00:23:37] or ops can decide roan can do it himself for his own contact only
[00:23:41] which im pretty ok with.
[00:23:42] Shall I just drop a note to ops@ when I travel?
[00:23:54] yea, atleast for the first time
[00:23:55] RoanKattouw: they will appear a bit random because the way they have been added was pretty much everybody added the one they wanted for their contact :)
[00:24:03] then i bet everyone just agrees for you to change your own then ;]
[00:24:07] so feel free to have a better one
[00:24:40] Yeah I figured that the reason we have EDT_early but not PDT_early is because there is one early bird person and they're back east
[00:24:45] and yea, ops-requests "switch roan to roan_pdt" would be fine
[00:24:59] We also don't have any Australia configs so I guess that means Tim is 24x7?
[00:25:37] hmm, but i once used Australia too when i was in Darwin
[00:25:52] maybe removed in some cleanup
[00:26:00] if it wasnt applied to anyone it may have been
[00:26:13] would be nice to define them in a bit better naming and order.
[00:26:31] a little more systematic, yep
[00:26:32] I'm already working on that a bit
[00:26:37] cool
[00:26:54] Just for the existing ones
[00:27:56] Renaming PDT to PST because that's what it is, adding real PDT, adding EDT_early and EST to complement EST and EDT_early
[00:28:04] Should probably add CEST as well
[00:28:21] Also, this means our configs will now actually be DST-aware and we'll have to switch everyone's next week
[00:29:05] i'm pretty sure we had more zones in the past ..
[00:29:13] nice
[00:29:19] i bet the pdt was right at one part
[00:29:28] but then it changed when daylight ended
[00:29:37] and whoever changed it was easier to change there than on every contact
[00:29:43] * RobH did not do it, but that makes sense to him
[00:32:58] Right
[00:33:38] So DST ends on Sunday, so apparently no one minded their paging hours having been off by an hour since like March
[00:34:15] But once my change is merged, it'll fix the definition of "PDT" to actually be correct, and so if it lands after Sunday we'll want to change PDT contacts to PST
[00:35:58] (CR) Legoktm: "Caused bug 56358." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90934 (owner: Amire80)
[00:38:07] off for food, ttyl (or on a gerrit change)
[00:39:21] RobH: Hey quick question (since mut ante just left): those aliases like "alias PDT 8 am till midnight, 7 days a week", are they ever used in the config file, or is it only the timeperiod_name like PDT_awake_hours?
[00:39:37] I ask because some of the aliases have typos in them and I don't want to break anyone's config by fixing them
[00:39:58] yea aliases are not used that i can see
[00:40:00] all like host_notification_period PDT_awake_hours
[00:40:24] yea, no aliases used
[00:40:36] i suppose that was pulled for info elsewhere at some point, dunno.
[00:40:40] OK
[00:41:03] you should be able to treat that just like a comment here
[00:41:10] /aways
[00:44:47] (PS1) Catrope: Clean up Icinga timezone definitions [operations/puppet] - https://gerrit.wikimedia.org/r/92604
[00:49:16] (CR) RobH: [C: 1] "looks right to me, i just don't want to merge something and ensure it doesn't break when its this late in day." [operations/puppet] - https://gerrit.wikimedia.org/r/92604 (owner: Catrope)
[00:54:04] (CR) Catrope: "Yeah let's not merge changes to paging right before dinnertime :)" [operations/puppet] - https://gerrit.wikimedia.org/r/92604 (owner: Catrope)
[00:56:30] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours
[00:58:36] (CR) Bsitu: "The span should be defined in HTML template inside email formatter" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90934 (owner: Amire80)
[01:20:21] (CR) Bsitu: "If this can not be defined inside the template, then we need to strip out the span tag for text version" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90934 (owner: Amire80)
[01:32:03] (PS1) Reedy: Remove codereview specific config file, collaps into CommonSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92605
[01:36:50] (PS1) Reedy: Remove $wgSubversionProxy [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92606
[09:53:24] (PS1) ArielGlenn: remove dup wtp1003 entry [operations/dns] - https://gerrit.wikimedia.org/r/92622
[09:56:45] !log jenkins migrated mediawiki-core jobs from PHPUnit 3.7.24 to 3.7.28
{{gerrit|I8651f2fb0e09d4e868c2139b5a7d1b640de61784}}
[09:57:44] paravoid: my sponsor is Michael Prokop (mika), he wrote jenkins-debian-glue a bunch of shell script to easily create package with Jenkins
[09:58:00] yeah I know him
[09:58:00] paravoid: so he did review, caught a few issues like description / copyright
[09:58:12] and tried out some a rebuild on unstable :-]
[09:58:33] he's the grml author
[09:59:49] (CR) ArielGlenn: [C: 2] remove dup wtp1003 entry [operations/dns] - https://gerrit.wikimedia.org/r/92622 (owner: ArielGlenn)
[10:01:01] paravoid: and he is willing to setup a Gerrit/Zuul system as well :D
[10:06:21] !log jenkins migrated mediawiki jobs from PHPUnit 3.7.24 to 3.7.28 {{gerrit|Ib8eb320ec5c2e9f8332f09c50e06cfed0d7f5fdf}}
[10:06:29] bah morebot is dead again
[10:07:05] apergos: mind restarting morebots on tools labs? doc at https://wikitech.wikimedia.org/wiki/Morebots#Example:_restart_the_ops_channel_morebot
[10:29:09] huh well my attempt to log in over there just failed :-/
[10:29:25] I wonder if I am a member of the project
[10:29:42] evry time they move these bots around something like that happens
[10:30:13] you know, since this does logging for ops (a production service) maybe it shouldn't live in labs, just sayin...
[10:30:59] I have the same feeling :]
[10:31:12] though the idea is to offload us to volunteers whenever they can help
[10:31:28] I am happy to have volunteers do it, but look... I'm doing it instead
[10:31:34] :-/
[10:35:09] hi, we have a varnish package in our git/gerrit, that we have modified from the upstream. What's the better way to apply a patch that was implemented in the upstream?
[10:35:59] yurik_: have a look at previous changes?
https://gerrit.wikimedia.org/r/#/q/project:operations/debs/varnish,n,z
[10:36:15] https://gerrit.wikimedia.org/r/#/c/87347/
[10:36:25] apparently just stick your patch under debian/patches/
[10:36:28] yurik_: debian/patches
[10:36:37] "not a big fan of forking" -- a bit too late for that
[10:36:54] -plus is a varnish fork anyway
[10:37:04] and we need patches on top of -plus too
[10:37:05] paravoid: if android was able to de-fork into mainstream kernel, so can we
[10:37:11] sigh
[10:37:18] life is just not easy, eh?
[10:37:23] and they're well aware of our patches
[10:38:57] paravoid: are you saying i should just copy/paste the patch into a file in that dir and build?
[10:39:15] try it at least? :)
[10:39:18] and here i am, trying to use the power of git to merge and do other crazy things :(
[10:39:50] its not like i build deb packages everyday (or ever)
[10:39:56] might take a few min
[10:39:59] esp on win
[10:40:00] (PS1) Odder: en.planet - add Wikistaycation blog [operations/puppet] - https://gerrit.wikimedia.org/r/92628
[10:40:39] !log restarted morebots (failed to rejoin after netsplit)
[10:41:02] Logged the message, Master
[10:56:55] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours
[10:59:40] yurik_: what you linked to is a 8 line patch
[10:59:54] that seems much more trivial than a 3.0.4 upgrade :)
[11:02:39] paravoid: i know, that's why i'm attempting the package build.
Currently strugling with automake regenerating everything to 1.11.3, and then complaining about aclocal being 1.11.1
[11:02:45] and later -- lib/libvarnishapi/Makefile.am:35: HAVE_LD_VERSION_SCRIPT does not appear in AM_CONDITIONAL
[11:06:54] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[11:08:54] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[12:56:03] paravoid or mark: when i run from varnish root - ./autogen.sh, ./configure, make, make install, do i need to do anything else to apply the extra patches?
[13:00:07] debian/patches you mean?
[13:00:20] debian/patches are used when you build the package
[13:00:34] if you build manually, they won't be applied, obviously
[13:02:52] paravoid: i thought they are included as part of one of the steps above. When should I apply them during the build process, and which tool should i usse?
[13:24:10] paravoid: do you know who is admin for RT?
[13:25:27] (PS2) Krinkle: Remove old tampa search config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92016 (owner: Chad)
[13:28:16] (CR) jenkins-bot: [V: -1] Remove old tampa search config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92016 (owner: Chad)
[13:42:50] oh my god
[13:54:21] (CR) Lcarr: "Those hosts did exist and were decommissioned. The IPs were temporarily moved to sq50, since they occupied several roles and pybal would " [operations/dns] - https://gerrit.wikimedia.org/r/92613 (owner: ArielGlenn)
[13:54:27] (CR) Lcarr: [C: 2] remove entries for sq87-106, these hosts never existed [operations/dns] - https://gerrit.wikimedia.org/r/92613 (owner: ArielGlenn)
[13:59:46] (CR) Lcarr: "FYI the bug has been fixed in pybal. was weird edge case" [operations/dns] - https://gerrit.wikimedia.org/r/92613 (owner: ArielGlenn)
[14:07:05] has anyone changed gerrit's host key?
i got a men-in-the-middle warning
[14:07:13] http://pastebin.com/Y6aaudDx
[14:07:44] (PS1) Lcarr: decommissioned cp1-10 [operations/dns] - https://gerrit.wikimedia.org/r/92642
[14:08:10] hasn't changed as far as i know yurik
[14:08:34] lemme see my key i have for that and compare
[14:08:48] thx
[14:10:29] i have a different fingerprint
[14:10:34] lovelly
[14:10:34] 1024 dc:e9:68:7b:99:1b:27:d0:f9:fd:ce:6a:2e:bf:92:e1 blah (RSA)
[14:10:39] PROBLEM - Host wikidata-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::12
[14:10:41] PROBLEM - Host wikisource-lb.pmtpa.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:ed1a::5
[14:10:59] RECOVERY - Host wikidata-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 35.43 ms
[14:11:02] sorry took me a second, i needed to make the fingerprint from the pub key in my known hosts
[14:11:05] LeslieCarr: should we nuke all my access right now just to be safe?
[14:11:28] mark: got an opinion on that ?
[14:11:29] RECOVERY - Host wikisource-lb.pmtpa.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 35.97 ms
[14:11:39] mark will be happy to nuke my access ;)
[14:11:45] hey
[14:11:49] got pages
[14:11:52] what's up?
[14:12:18] (CR) Lcarr: [C: 2] decommissioned cp1-10 [operations/dns] - https://gerrit.wikimedia.org/r/92642 (owner: Lcarr)
[14:12:28] meh, pmtpa
[14:12:36] paravoid: http://pastebin.com/Y6aaudDx -- is it safe?
[14:12:36] and only v6
[14:12:55] did anyone do anything LVS related?
[14:13:20] yurik_: huh, interesting
[14:13:36] yurik_: as long as you never went through and logged in anyway, there shouldn't be a problem
[14:13:53] well, define "logged in"
[14:13:59] not like i use a password
[14:14:14] ssh stopped you because it looks like a MitM
[14:14:21] as long as you didn't override that it should be fine
[14:15:15] i haven't, but the q remains if there is an MitM
[14:15:35] run a tcptraceroute to that host/port
[14:15:35] LeslieCarr: how did you convert known_hosts to fingerprints
[14:15:57] ssh-keygen -lf .ssh/known_hosts
[14:16:48] or above i had copied that specific key into its own file named "blah" (hence the "blah" in the line)
[14:16:58] yurik_: you did paste a different fingerprint on pastebin
[14:17:24] dc:e9:68:7b:99:1b:27:d0:f9:fd:ce:6a:2e:bf:92:e1 (the correct one) vs 83:fe:34:4b:16:2c:9e:95:1d:f6:d7:7d:ee:28:03:02 (the one you pasted)
[14:17:33] akosiaris: correct
[14:17:45] 2048 83:fe:34:4b:16:2c:9e:95:1d:f6:d7:7d:ee:28:03:02 ytterbium.wikimedia.org (RSA)
[14:17:49] i didn't paste anything here
[14:17:55] is what my known hosts says
[14:18:05] 22 vs. 29418 probably?
[14:18:52] lol
[14:19:09] 1024 dc:e9:68:7b:99:1b:27:d0:f9:fd:ce:6a:2e:bf:92:e1 [208.80.154.81]:29418 (RSA)
[14:19:15] yes that must be it. I have the same symptom
[14:19:25] hrm hrm hrm
[14:19:29] for ssh gerrit instead of ssh -p 29418 gerrit
[14:19:32] stupid ports
[14:20:32] heh
[14:20:34] http://pastebin.com/bEBKRYRx
[14:20:39] mark ^ tracert
[14:21:00] no you need tcptraceroute, not a normal traceroute
[14:21:01] so the old key must be manganese
[14:21:13] dc:e9 is the key on port 22, 83:fe is the key on port 29418 (gerrit)
[14:21:14] mark, i'm on windows, remember?
:)
[14:21:31] that's your problem :)
[14:21:55] i tried running tcptraceroute on a virt host, but it only traced it to the host machine
[14:22:04] paravoid: other way around
[14:22:14] mark, forgive me, i thought security was yours :-P
[14:22:46] * apergos grabs popcorn
[14:22:51] hehe
[14:23:08] ok, as i take it, no cornerns, i'm cleaning up my hosts file
[14:23:23] :-)
[14:27:20] oh I have concerns, just not about MitM attacks at the moment :P
[14:28:51] hehe. btw, it allows me to do git pull, but refuses git review
[14:28:54] with pubkey
[14:29:11] ohh well, i will just reply with the patch :)
[14:55:17] hashar: you mentioned getting a review for gear?
[14:57:39] paravoid: yup you asked me earlier who was my sponsor for python-gear .
[14:57:46] are these in svn?
[14:57:49] then disappeared mysteriously :-]
[14:57:50] yeah
[14:58:00] got reviewed by some folks from the python module team as well
[14:58:06] ok
[14:59:20] paravoid: here it is http://anonscm.debian.org/viewvc/python-modules/packages/python-gear/trunk/debian/
[14:59:21] :-]
[14:59:43] I know
[15:00:11] does being listed in the Uploaders: field grant me some specific rights ?
[15:00:16] or I should become a DD as well ?
[15:00:39] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:00:41] no, uploaders doesn't get you upload rights, confusingly enough
[15:01:04] there's an intermediate step before becoming DD though
[15:01:07] I got used to be confused, the last encounter was figuring out 'any' versus 'all'
[15:01:29] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 1 logical drive(s), 4 physical drive(s)
[15:01:34] I am not ready to even apply as a DD, I got a ton of documentation to read still and would like to try out more packages before even thinking about becoming a dd
[15:02:00] yeah, I wouldn't suggest you to go through the process at this time either
[15:02:07] you can be a DM :)
[15:02:33] that is, a Debian Maintainer, which is also different from being a maintainer of a package :P
[15:02:47] https://wiki.debian.org/DebianMaintainer
[15:03:53] ohhh
[15:04:39] so noobie -> DM -> debian new members process -> DD -> social events -> root
[15:04:45] that is going to take long
[15:04:48] root?
[15:04:55] @debian.org \O/
[15:05:07] yeah... no :)
[15:05:18] maybe in ten years or so I will have escalated through the whole process to help with sysadmin
[15:06:10] * jeremyb makes a hashar for DSA sign and puts it on a stick
[15:06:30] in my closet :)
[15:06:49] then I will be able to watch you poop, unlike the NSA
[15:06:58] (hopefully the NSA doesn't have yet a webcam in our closets)
[15:07:09] i said closet! i don't have a toilet in there
[15:07:49] hashar: "social events" are not before DD though :P
[15:09:27] hashar: DSA has multiple meanings btw. https://wiki.debian.org/Glossary#dsa (and it's an algorithm too)
[15:09:51] jeremyb: in France we call the toilet "water closest"
[15:10:04] jeremyb: more exactly we refer to them either by "toilettes" or "WC3
[15:10:06] WC
[15:10:39] hashar: i meant storage area. so it can be stored until you're actually running :)
[15:11:01] paravoid: so DM is essentially the same as a DD but on a restricted set of packages. Sounds good.
[15:11:23] no voting for DM
[15:11:30] what jeremyb said
[15:11:44] it's an important difference
[15:11:47] also DD comes in two flavors. uploading and non-uploading
[15:12:59] does DM have porterbox or other machine access? alioth non -guest? email alias?
[15:13:30] http://lists.debian.org/debian-newmaint/2006/02/msg00012.html :D
[15:14:16] default access to alioth resources like collab-maint?
[15:14:43] hah hashar
[15:15:05] hashar: time flies
[15:15:20] hashar: https://nm.debian.org/public/person/paravoid
[15:15:27] can we stop talking about me? :P
[15:15:43] well that explain the process
[15:16:00] it's a lot simpler nowadays
[15:16:07] it took me exactly a year
[15:16:14] and it was one of the short ones back then
[15:16:24] of the very few short ones
[15:16:29] a year after I applied to become a DD that is
[15:16:34] well there was a time when it was even shorter
[15:16:56] some people in NYC have been DD since last century
[15:26:02] jeremyb: that nm page is nice, we could generate a genealogy of the DDs
[15:26:40] hashar: sure :)
[15:40:25] !log Broke jenkins for a couple hours while doing some path migration. All jobs got updated in the process and failed jobs got retriggered.
[15:40:39] Logged the message, Master
[16:06:33] (PS3) Hashar: Remove old tampa search config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92016 (owner: Chad)
[16:11:33] <^d> mutante: Ping
[16:13:29] So my VPS provider gave me 18446744073709551616 IPv6 addresses to play with.
[16:14:09] And they also changed the way the addresses are added to the machine. Anyone who wants to help get the new translatewiki.net server IPv6-ed?
[16:15:05] <^d> siebrand: Can I have 1817466888 ± 9551616 to use then? :p
[16:15:35] ^d: ehr... I guess so (?)
[16:16:01] <^d> I figured if you had 18446744073709551616 you could spare a few :p
[16:18:11] so that's a /64
[16:18:29] i needed some help from bc :P
[16:31:44] (PS1) Dzahn: dsh cleanup.
these servers are decom in racktables RT #6123 [operations/puppet] - https://gerrit.wikimedia.org/r/92660
[16:37:04] PROBLEM - Backend Squid HTTP on amssq32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:37:14] PROBLEM - Frontend Squid HTTP on amssq32 is CRITICAL: Connection timed out
[16:37:55] RECOVERY - Backend Squid HTTP on amssq32 is OK: HTTP OK: HTTP/1.0 200 OK - 1423 bytes in 0.453 second response time
[16:38:04] RECOVERY - Frontend Squid HTTP on amssq32 is OK: HTTP OK: HTTP/1.0 200 OK - 1408 bytes in 0.741 second response time
[16:38:32] (PS1) Dzahn: add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662
[16:39:07] (PS2) Dzahn: add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662
[16:40:27] (CR) Chad: [C: 1] add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[16:40:37] (CR) Dzahn: [C: 2] dsh cleanup. these servers are decom in racktables RT #6123 [operations/puppet] - https://gerrit.wikimedia.org/r/92660 (owner: Dzahn)
[16:41:07] (CR) Dzahn: [C: 2] add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[16:41:41] ^d: waiting for verified
[16:42:04] ^d: other thing i wonder, shouldn't terbium and arsenic both be in 'apaches' as well
[16:42:19] but terbium wasn't either
[16:42:37] that would make sync-apache work on them
[16:42:51] <^d> But they don't need it?
[16:43:03] <^d> They're not part of the apache pool, they're only meant for crons & one-off tasks.
[16:43:15] ok, alright then
[16:43:36] just cause you said 09:36 <^d> So /usr/local/apache/common/ isn't setup right
[16:43:45] and that would be synced by that
[16:44:10] but yep, it's like terbium, so ok
[16:44:46] no jenkins?
[16:45:16] <^d> Just finished running, just slow in responding it seems.
[16:45:33] <^d> nvm, that was for 92660
[16:46:40] <^d> Hmm :\
[16:46:48] (CR) Chad: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[16:47:14] (CR) Dzahn: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/92660 (owner: Dzahn)
[16:47:33] <^d> 92660 seems to have run, but not responded yet :\
[16:49:40] (PS1) Andrew Bogott: Switch to using uwsgi for the proxy api [operations/puppet] - https://gerrit.wikimedia.org/r/92664
[16:55:53] <^d> mutante: I guess just go ahead and verify yourself? Jenkins isn't going to check the dsh files anyway :p
[16:57:02] (CR) jenkins-bot: [V: -1] add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[16:57:08] ^d: it just finished for the first one
[16:57:10] and oops heh
[16:57:15] (CR) jenkins-bot: [V: -1] add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[16:57:19] <^d> Silly jenkins
[16:57:46] and where was my merge on irc? hmm
[16:57:51] of the other one
[16:58:37] i see, fixing
[17:00:55] (PS3) Dzahn: add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662
[17:04:51] (CR) Dzahn: [C: 2] add arsenic to dsh groups, treat like terbium [operations/puppet] - https://gerrit.wikimedia.org/r/92662 (owner: Dzahn)
[17:05:18] ^d: merged on sockpuppet
[17:06:20] <^d> I guess we need to run puppet on tin and then scap?
[17:09:29] ^d: sounds right, yea
[17:09:35] want me to run on tin?
[17:09:44] <^d> That'd be great. I can do scap myself
[17:10:10] doing so
[17:10:55] <^d> yurik: You've got a deploy window right now, what all you doing for that?
[17:11:12] ^d: not using it
[17:11:22] <^d> Ah cool, thanks.
[17:11:25] np
[17:11:27] mutante: do you have ability to help me with RT?
[17:11:30] feel free to crash it
[17:12:21] <^d> greg-g: Gonna run scap.
No new code going out, got a new terbium-like box to get in sync.
[17:12:26] aude: yea, give me a minute
[17:12:30] mutante: ok
[17:12:40] <^d> yurik's not using his window, so shouldn't be stepping on toes
[17:13:19] * yurik_ 's toes have been hardened by ops ;)
[17:13:28] sees grain-ensure , salt grains .. and finished catalog
[17:13:46] ^d: awesome, go forth and such
[17:13:58] I recommend steel toed boots
[17:14:38] ^d: mediawiki-installation:arsenic on tin
[17:15:00] <^d> Hmm?
[17:16:02] ^d: saying it's done, dsh group updated on tin
[17:16:19] <^d> Ah k, cool.
[17:17:01] !log demon Started syncing Wikimedia installation... : No changes, making sure arsenic is synced
[17:17:17] Logged the message, Master
[17:17:36] <^d> !log aborted sync
[17:17:51] Logged the message, Master
[17:18:03] <^d> Copying wikiversions dat and cdb files to apaches...arsenic: rsync: mkdir "/usr/local/apache/common-local" failed: Permission denied (13)
[17:18:03] <^d> arsenic: rsync error: error in file IO (code 11) at main.c(605) [Receiver=3.0.9]
[17:18:05] <^d> mutante: ^
[17:18:56] <^d> cd /usr/local/apache && mkdir common-local && chown mwdeploy:mwdeploy common-local
[17:18:58] <^d> Should do it
[17:19:24] ^d: try again
[17:19:46] 0 lrwxrwxrwx 1 root root 12 May 15 23:40 common -> common-local
[17:19:49] 4.0K drwxr-xr-x 2 mwdeploy mwdeploy 4.0K Oct 30 17:19 common-local
[17:20:31] !log demon Started syncing Wikimedia installation... : No changes, making sure arsenic is synced
[17:20:45] <^d> Looks good, no errors so far.
[17:20:45] Logged the message, Master
[17:22:35] (PS2) Andrew Bogott: Switch to using uwsgi for the proxy api [operations/puppet] - https://gerrit.wikimedia.org/r/92664
[17:23:00] <^d> Unrelated, but worth noting:
[17:23:18] <^d> snapshot[24]: sudo: no tty present and no askpass program specified
[17:24:19] <^d> arsenic: /usr/local/bin/mwversionsinuse: line 5: /usr/local/apache/common-local/multiversion/activeMWVersions: No such file or directory
[17:24:19] <^d> arsenic: Unable to read wikiversions.dat or it is empty
[17:24:19] <^d> ...
[17:24:20] <^d> arsenic: install: cannot change owner and permissions of `/usr/local/apache/uncommon': No such file or directory
[17:24:22] <^d> arsenic: Unable to create /usr/local/apache/uncommon, please re-run this script as root.
[17:24:24] <^d> Blahhh
[17:25:42] you need mkdir /usr/local/apache/uncommon ?
[17:26:05] <^d> Perhaps. But also curious why all this wasn't done via puppet or something else.
[17:26:18] <^d> Reedy: Did we have these problems when setting up terbium? :)
[17:26:27] i created uncommon for you, mwdeploy:mwdeploy
[17:27:33] <^d> Well the uncommon bit is less important. I'm curious why we're not getting multiversion
[17:33:07] (PS1) MaxSem: Enable mobile redirect for wikimania2014.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/92671
[17:33:30] * ^d bangs on arsenic with a wrench
[17:36:45] (CR) Dzahn: [C: 1] Enable mobile redirect for wikimania2014.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/92671 (owner: MaxSem)
[17:38:49] !log demon Finished syncing Wikimedia installation...
: No changes, making sure arsenic is synced
[17:39:03] Logged the message, Master
[17:39:11] (PS2) Dzahn: delete search.wikimedia.org Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/91132
[17:39:36] (CR) Dzahn: [C: 2] delete search.wikimedia.org Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/91132 (owner: Dzahn)
[17:40:26] MaxSem: O NO U DIDNT
[17:40:31] :D
[17:40:34] <^d> dangit.
[17:40:41] * ^d fumes
[17:41:26] is arsenic not workin?
[17:42:33] heh, daniel is updtin me
[17:42:35] annoying issues
[18:04:00] ^d: is search.wikimedia.org = "new" search?
[18:04:12] <^d> No
[18:04:31] ok, so do we care about monitoring for this ? http://search.wikimedia.org/?lang=en&site=wikipedia&search=foo&limit=5
[18:04:37] returning something
[18:04:46] not anymore?
[18:04:54] <^d> It works.
[18:04:57] <^d> It's a thing.
[18:05:07] <^d> Silly apple dictionary thing.
[18:05:08] apple dictionary
[18:05:14] ah, yea, true
[18:05:23] dont stir the mountain (apple)
[18:05:28] =P
[18:05:34] <^d> mutante: new search is search.svc.eqiad.wmnet ;-)
[18:05:36] ehm.. yea, damn, well there's an RT. now let's forget about it :)
[18:05:49] Didn't Tim essentially re-write it?
[18:05:58] yep
[18:06:01] its across apaches now
[18:06:02] yes, he did, so it moved to apache-config
[18:06:12] and we deleted it from puppet/files/apache/sites
[18:06:16] just now
[18:06:42] just wanted to know if there is any relation between cirrus and the usage of search.wm
[18:08:43] Requestor hashar added . Requestor dzahn deleted
[18:21:44] who wants to help me deploy a robots.txt tweak?
[18:21:56] https://gerrit.wikimedia.org/r/#/c/92552/
[18:25:35] <^d> RobH, mutante: Where are we on arsenic?
[18:27:59] <^d> brion: Should just be a matter of pulling and sync-file, right?
[18:28:09] ^d: yeah
[18:29:02] (CR) Chad: [C: 2] Whitelist API mobileview for robots.txt [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92552 (owner: Brion VIBBER)
[18:29:03] ^d: uhhh, what do you mean?
[18:29:17] whats wrong with it?
[18:29:23] (it should be identical to terbium)
[18:29:30] (Merged) jenkins-bot: Whitelist API mobileview for robots.txt [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92552 (owner: Brion VIBBER)
[18:29:32] <^d> puppet setup's right, and we added it to dsh.
[18:29:42] <^d> But nothing's been sync'd to it
[18:29:51] <^d> Other than wikiversions.txt|dat
[18:29:54] hrmm
[18:30:16] i didnt include all the terbium stuff, ie: left out maintaince script calls
[18:30:16] perhaps those calls (set to disabled) install that other stuff.
[18:30:38] im gonna install them set to false
[18:30:40] see if it doesnt fix.
[18:31:03] ...does that make sense orn o?
[18:31:07] wheeeee
[18:31:08] * RobH checks into what they install first
[18:31:27] thx ^d
[18:31:45] hahahaaha
[18:31:51] ^d: i dont think anything requires mediawiki at present
[18:32:03] since those maintainance scripts are what require it.
[18:32:07] =P
[18:32:27] mediawiki? pssh that's some bloated php crap. delete!
[18:32:36] haha, yea
[18:32:38] i'm going to put in a single script call to disabled
[18:32:42] and let puppet handle it
[18:33:38] i'd fix with proper dependency chain
[18:33:41] but its a 90 day server
[18:33:44] so no.
[18:34:08] ^d: ^
[18:36:38] (PS1) RobH: fixing mediawiki addtion to arsenic [operations/puppet] - https://gerrit.wikimedia.org/r/92679
[18:38:02] (CR) RobH: [C: 2] fixing mediawiki addtion to arsenic [operations/puppet] - https://gerrit.wikimedia.org/r/92679 (owner: RobH)
[18:38:52] <^d> Hehe :)
[18:39:05] so that was my bad
[18:39:15] i didnt realize that none of the entries other than script calls required mediawiki
[18:39:19] running it now on host
[18:39:36] !log demon synchronized robots.txt 'New robots.txt changes for mobileview'
[18:39:41] <^d> brion: ^
[18:39:52] woot
[18:39:53] Logged the message, Master
[18:40:04] ^d: thanks!
[18:40:15] i'll keep an eye out for explosions on the API servers ;)
[18:40:24] <^d> http://en.wikipedia.org/robots.txt doesn't show yet. Wonder if we cache something :)
[18:42:19] ^d: i see it . Allow: /w/api.php?action=mobileview&
[18:42:33] <^d> I'm not, hmm
[18:42:41] <^d> Ah there it is
[18:43:03] <^d> brion: You're live :)
[18:43:10] thanks ^d :D
[18:43:13] <^d> yw
[18:43:17] you've made some nerds at google very happy
[18:45:42] <^d> Haha, sync-file sync'd the robots.txt to my broken server :p
[18:46:37] heh
[18:54:25] PROBLEM - Disk space on wtp1008 is CRITICAL: DISK CRITICAL - free space: / 355 MB (3% inode=85%):
[19:00:55] RECOVERY - search indices - check lucene status page on search20 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.132 second response time
[19:10:22] Reedy: hi, are you done with the docroots? I tried it on betacluster but still doesn't work
[19:11:39] I fixed the broken/missing symlinks
[19:11:56] its working fine on the cluster...
[19:12:25] Reedy: i was hoping for http://www.wikipedia.beta.wmflabs.org/
[19:12:44] do you know if anything is trying to get it to work as in production
[19:12:48] anyone
[19:13:00] is that specific vhost actually enabled on the cluster?
[19:13:19] how do you mean?
[19:13:20] its really not a complex setup [19:13:28] http://en.wikipedia.beta.wmflabs.org/ works [19:13:35] ah, that again :) [19:13:41] hi aude :) [19:13:48] yes, but its a different vhost completely [19:14:13] simple vhost, simple docrooty [19:14:17] Reedy: what about http://wikipedia.beta.wmflabs.org/ [19:14:18] symlink [19:15:19] i want to start playing with extract2 and make a custom one for zero (anything to make paravoid happy!), hence was hoping to have a playground that actually runs production code [19:15:26] no Antoine [19:15:50] I'm not at a computer ATM. on my phone... [19:16:27] * yurik_ wishes to be just like Reedy, who does deployments from phone! [19:18:27] the specific error message on www.wiki suggests its being handled by the *.wikipedia vhost which is wrong [19:18:41] Reedy: i think hashar is not around atm [19:18:43] its trying for lang www project wikipedia [19:18:49] wwwwiki [19:19:09] part of me now wants to create wwwwiki... [19:19:47] Reedy: do you know who knows enough about it to fix it? 
and btw, wwwwiki.com is already taken :) [19:20:02] (03PS1) 10Cmjohnson: Removing dns entries for srv302 -srv500 [operations/dns] - 10https://gerrit.wikimedia.org/r/92686 [19:20:26] I was meaning the dbname of wwwwiki to go with madness like test wiki data wiki [19:20:43] I should be able to fix it [19:21:03] first thing is finding the Apache config repo for labs again [19:21:14] separate gerrit project iirc [19:21:55] (03CR) 10Cmjohnson: [C: 032] Removing dns entries for srv302 -srv500 [operations/dns] - 10https://gerrit.wikimedia.org/r/92686 (owner: 10Cmjohnson) [19:22:29] !log dns update [19:22:44] Logged the message, Master [19:27:34] !log removing decom'd srv* from /etc/dsh/groups nrpe-ext-stores, nagios [19:27:49] Logged the message, Master [19:44:35] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: Making sure wikiversions is up to date on arsenic [19:44:50] Logged the message, Master [19:45:22] !log demon synchronized multiversion 'Making sure multiversion is up to date on arsenic' [19:45:36] Logged the message, Master [19:46:14] !log demon Started syncing Wikimedia installation... : No changes, making sure arsenic is synced [19:46:25] Logged the message, Master [19:51:25] <^d> RobH, mutante: arsenic is just fine now. Thanks for all your help. [19:51:49] PROBLEM - check_job_queue on arsenic is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , commonswiki (12163), enwiki (46441), Total (66781) [19:56:02] !log demon Finished syncing Wikimedia installation... : No changes, making sure arsenic is synced [19:56:16] Logged the message, Master [20:02:54] ^d: awesome =] [20:03:14] ^d: since you insist on mediawiki being installed on your mediawiki scripting host ;] [20:03:26] <^d> :p [20:03:27] <^d> http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Miscellaneous+eqiad&h=arsenic.eqiad.wmnet&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS [20:04:44] <^d> RobH: Do we have HT enabled? 
[20:05:01] nope [20:05:05] dual cpu quad core [20:05:12] i can enable with reboot if you want? [20:05:27] <^d> That'd be great, lemme halt my scripts. [20:05:44] <^d> Ok done. [20:06:05] ok, rebooting [20:06:33] RECOVERY - search indices - check lucene status page on search19 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.110 second response time [20:08:33] PROBLEM - Host arsenic is DOWN: PING CRITICAL - Packet loss = 100% [20:09:35] bootin up into bios now [20:12:15] ^d: HT enabled, booting now [20:12:37] <^d> Wheee [20:15:23] RECOVERY - Host arsenic is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [20:15:53] RECOVERY - twemproxy process on arsenic is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [20:15:58] ^d: all yours again [20:16:11] <^d> Thank you sir [20:16:11] heh, useless twemproxy... [20:16:23] thats the result of me not wanting to make a special manifest for this ;P [20:16:29] since its short term [20:21:55] (03PS1) 10Dzahn: remove decom'ed hosts from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/92759 [20:22:44] <^d> RobH: This is totally awesome with HT, I can double my number of processes :p [20:24:06] cool [20:24:21] (03PS2) 10Dzahn: remove decom'ed hosts from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/92759 [20:24:28] yea, we have off on many things by default, but more and more are able to use it so we find ourselves enabling more often now [20:24:29] (03CR) 10jenkins-bot: [V: 04-1] remove decom'ed hosts from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/92759 (owner: 10Dzahn) [20:25:15] <^d> terbium could benefit from it too at some point if it's not there already. [20:25:21] <^d> Since it's a lot of crons & one-off things. [20:25:26] (03PS3) 10Dzahn: remove decom'ed hosts from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/92759 [20:25:27] <^d> And batch jobs. [20:25:50] lemme see ifits on [20:26:04] its off. 
[20:26:13] ^d: I imagine we need to schedule that downtime? [20:26:25] i imagine folks are runnin shit. [20:26:30] yea [20:30:29] (03CR) 10Dzahn: [C: 032] en.planet - add Wikistaycation blog [operations/puppet] - 10https://gerrit.wikimedia.org/r/92628 (owner: 10Odder) [20:31:31] jenkins already said +1 but then still "needs verified" [20:31:41] and it got way slower [20:33:44] (03CR) 10RobH: [C: 032] en.planet - add Wikistaycation blog [operations/puppet] - 10https://gerrit.wikimedia.org/r/92628 (owner: 10Odder) [20:33:51] daniel isnt nuts, its fucked up. [20:34:08] it has +1 verified [20:34:09] 2 x +2 and one +1 by jenkins , but not enough for merge [20:34:14] yet it says needs verified to continue [20:34:30] jenkins is probably busy? [20:34:34] its +1 not +2 verified [20:34:38] yea, let's just wait a bit more [20:34:39] i dont know why it would do that. [20:34:40] http://integration.wikimedia.org/zuul/ [20:34:50] yea but apply +1 then a +2 later? [20:34:52] seems odd [20:35:02] RobH: full tests for +2 are only ran for trusted users [20:35:11] yes, but daniel is [20:35:18] all of ops are [20:35:29] seems odd it would apply any +1 tests at all in this case [20:35:32] the patch submitter is what matters [20:35:42] oh. [20:35:43] that makes sense, submitter is odder [20:35:43] apparnetly odder isn't on the trusted list [20:35:47] that explains it [20:35:48] maybe he should be [20:35:53] MatmaRex: thx ! [20:36:01] well, we can manually +2 [20:36:06] (they are not run for everyone because apparently no one has bothered to properly sandbox them yet and you could do a rm -rf / to break test runners or something) [20:36:08] or can we trigger it to do it? [20:36:21] cuz we would want it to do it for real testing rather than manual override. [20:36:24] it was a recent change though? [20:36:26] MatmaRex: ? [20:36:33] mutante: what was a recent change? 
[20:36:49] i don't recall having to manually verify for merging changes by odder [20:36:50] mutante: the not-running-tests-for-everyone? it's been that was since ever for core, dunno about for puppe [20:36:51] t [20:37:05] yea, puppet, because it's planet config [20:37:05] hrmm [20:37:06] mutante: some repos don't havy any "unsafe" tests [20:37:14] such as operations/mediawiki-config, i think [20:37:18] ok [20:37:18] and most of odder's changes are there [20:37:43] (03CR) 10Dzahn: [V: 032] en.planet - add Wikistaycation blog [operations/puppet] - 10https://gerrit.wikimedia.org/r/92628 (owner: 10Odder) [20:37:58] I think the question is when was operations/puppet set to not run full testing for nonverified [20:38:02] cuz i think it used to run them all the time. [20:38:03] (i think i should charge per character typed for consulting. :P) [20:38:05] something changed about this i think [20:38:07] but im not 100% certain. [20:38:09] for the puppet repo [20:38:16] or odder's account specifically,,shrug [20:38:30] mutante: you can have him added to the trusted list [20:38:38] mutante: it's in operations/zuul-config [20:38:42] yea but i wanna know when it changed now! ;] [20:39:00] mutante: submit a patch, have hashar +2 it :D [20:39:27] MatmaRex: ok, thanks;) [20:49:19] MatmaRex: mutante yeah that, and some other folks can +2 / deploy as well [20:49:30] I want to eventually have zuul-config managed by ryan deployment system [20:49:31] (03CR) 10Dzahn: [C: 031] Clean up Icinga timezone definitions [operations/puppet] - 10https://gerrit.wikimedia.org/r/92604 (owner: 10Catrope) [20:57:53] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [21:02:36] * apergos raises an eyebrow [21:02:41] the method for adding trusted gerrit users is messy [21:02:46] that is a hell of a regex. [21:02:59] oh yeah, that's the one with puppet disabled [21:03:04] ok, who disabled it? 
speak up [21:03:06] ahh, hence hashar's comment about ryan deployment systems [21:03:25] the regex, hehe [21:03:27] apergos: are we supposed to look into analytics machines now? [21:03:41] if puppet's not running on a host I want to know why [21:04:11] in this case I know on ly that it was disabled by a human [21:04:26] that's not very helpful for getting a sense of when it's going to be enabled again [21:04:38] thanks obama. [21:04:49] puppet ran fine before you were president! [21:04:58] er [21:05:01] not so much :-P [21:05:13] my statement makes just as much sense as every single other thanks obama statement ;] [21:05:41] hahaha [21:06:02] well... 'thanks for your great oversight of the nsa' I think there's some grounds for something there... [21:06:03] can't you disable puppet with a puppet change? :D [21:06:12] this was from the command line [21:06:36] and obviously it's not cool for me to just re-enable it but I'd like to know at least what's going on on the host [21:10:47] trusting odder https://gerrit.wikimedia.org/r/#/c/92773/1 [21:11:00] MatmaRex: [21:11:33] mutante: +1'd [21:11:42] thx [21:11:56] pong [21:12:26] deploying [21:12:39] twkozlowski: you will have test run for you whenever you submit a patch in Gerrit \O/ [21:12:47] twkozlowski: merged your latest planet addition, noticed jenkins doesn't run tests on it, asked around why, hashar and matmaxrex explained, making patch to have you whitelisted [21:13:16] Yeah, I just noticed a minute ago [21:13:32] Thanks for the merge & whitelist [21:13:41] Zuul reloading, the change will be applied soon [21:13:59] thanks:) [21:14:05] aka whenever the red box on https://integration.wikimedia.org/zuul/ disappear [21:15:39] twkozlowski: https://meta.wikimedia.org/w/index.php?title=Planet_Wikimedia%2FNew_language&diff=6054239&oldid=5496889 [21:16:27] twkozlowski: could use some some cleanup though.. 
the "new language" thing, and some didn't reply if they still want it [21:17:50] (03CR) 10RobH: [C: 032] Clean up Icinga timezone definitions [operations/puppet] - 10https://gerrit.wikimedia.org/r/92604 (owner: 10Catrope) [21:21:01] mutante: I see, perhaps setting up an /archive would do here for the time being? [21:21:22] and out of curiosity, why don't we translate 'All times are UTC.' and 'Powered by'? [21:22:11] oh, there's https://meta.wikimedia.org/wiki/Planet_Wikimedia/New_language/Archives - nice [21:31:58] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [21:32:39] twkozlowski: translating those is possible, no real reason not to, just needs some more variables in the template (puppet:///templates/planet/index.html.tmpl.erb) and the matching values in role/planet.pp [21:33:26] (03CR) 10RobH: [C: 032] remove decom'ed hosts from DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/92759 (owner: 10Dzahn) [21:35:08] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:35:58] PROBLEM - Disk space on wtp1008 is CRITICAL: DISK CRITICAL - free space: / 352 MB (3% inode=85%): [21:40:49] https://meta.wikimedia.org/w/index.php?title=Planet_Wikimedia/New_language&diff=6211242&oldid=6054239 mutante [21:56:07] (03PS6) 10Andrew Bogott: Added the system module and the system::role class. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 [21:56:25] (03PS1) 10Dzahn: remove alsted.wm.org IP from DNS, decom'ed [operations/dns] - 10https://gerrit.wikimedia.org/r/92783 [21:57:10] (03PS1) 10Dzahn: complete misc_ dsh groups with servers still having IPs [operations/puppet] - 10https://gerrit.wikimedia.org/r/92784 [21:58:04] yay mor things going away [21:58:06] (03CR) 10Dzahn: [C: 032] remove alsted.wm.org IP from DNS, decom'ed [operations/dns] - 10https://gerrit.wikimedia.org/r/92783 (owner: 10Dzahn) [21:58:09] * apergos does a little happy dance [21:58:26] (03CR) 10Andrew Bogott: [C: 032] Added the system module and the system::role class. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92338 (owner: 10Andrew Bogott) [22:02:22] (03PS4) 10Andrew Bogott: Add a 'generic' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 [22:02:32] (03CR) 10Dzahn: [C: 032] complete misc_ dsh groups with servers still having IPs [operations/puppet] - 10https://gerrit.wikimedia.org/r/92784 (owner: 10Dzahn) [22:20:00] greg-g: anyone deplyoing? [22:20:30] yurik_: not that I know of [22:20:50] maybe i can jump in and push a few zero things out [22:21:09] what are they? :) [22:22:04] the usual breaking stuff [22:22:27] there is an issue where we show free banner even when its not free [22:22:33] and ask for confirmation when we shouldn't [22:22:50] different partners - different configurations [22:23:59] huh, suck [22:24:03] yurik_: yeah, go for it [22:24:16] yurik_: can you link me the gerrit urls when you have a chance? [22:24:48] greg-g: https://gerrit.wikimedia.org/r/#/c/92789/ [22:36:52] (03PS1) 10MarkTraceur: Add BetaFeatures and MultimediaViewer to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [22:40:02] marktraceur: wait, question.... [22:40:07] Answer! 
[22:40:10] https://gerrit.wikimedia.org/r/#/c/92793/1/wmf-config/InitialiseSettings.php [22:40:21] Yup [22:40:23] that tells me that it'll go everywhere tomorrow, that's not right though, right? [22:40:26] Oh, no [22:40:37] greg-g: I thought it would get staged normally, clearly I haven't done this before :) [22:40:45] right, so that's the confusing part [22:40:47] greg-g: So only for test, test2, and mediawiki.org [22:40:51] greg-g: about to push out [22:40:55] * greg-g nods [22:41:02] marktraceur: and testwikidata [22:41:16] but, yeah, so, the code is deployed in stages, but the config is the same among them all [22:42:02] (03PS2) 10MarkTraceur: Add BetaFeatures and MultimediaViewer to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [22:42:09] Oh, testwikidata? [22:42:38] (03PS3) 10MarkTraceur: Add BetaFeatures and MultimediaViewer to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [22:42:54] I think that's the variable name [22:43:02] testwikidatawiki* [22:43:06] its confusing. [22:43:07] Bollocks [22:43:08] Yeah [22:43:10] testwikidatawiki [22:43:13] :) [22:43:16] (03PS4) 10MarkTraceur: Add BetaFeatures and MultimediaViewer to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [22:43:29] testwikidatawikidotwikitest [22:43:37] mediamediamedia [22:43:43] wikiwikiwiki [22:44:03] UploadWizard UploadWizard UploadWizard [22:44:13] not the same ring to it [22:44:21] (03CR) 10Andrew Bogott: [C: 032] Add a 'generic' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 (owner: 10Andrew Bogott) [22:44:27] btw, greg-g, i'm only updating 22 - there doesn't seems to be 23 out there yet [22:45:14] ?? [22:45:26] 1.23wmf1 is indeed out there [22:45:42] https://www.mediawiki.org/wiki/Special:Version [22:45:47] yurik_: ^ [22:46:10] greg-g: oh, i was looking for 1.22-wmf23. funny. 
[22:46:12] will do [22:46:14] :) [22:46:22] number confusion [22:46:25] collision [22:46:39] I didn't even notice that before [22:46:52] Hrm [22:46:57] let's do that next time, too. Have 1.23wmf23 be the last wmf* [22:47:03] then 1.24wmf24 etc etc [22:47:07] :) [22:47:14] 1.23wmf2 doesn't exist, so...I guess...not possible? [22:47:29] greg-g, I've CC'd you on a thread about a LD - are we OK to deploy this today? [22:47:35] wmf2 isn't out until tomorrow [22:47:35] https://www.mediawiki.org/wiki/MediaWiki_1.23/Roadmap [22:47:36] Or maybe it goes into master? [22:47:40] * marktraceur never done [22:47:55] marktraceur: yeah, if you get it in master before you go to sleep tonight, it'll be in wmf2 [22:47:59] Cool cool [22:48:04] !log yurik synchronized php-1.22wmf22/extensions/ZeroRatedMobileAccess/ [22:48:18] Logged the message, Master [22:48:18] MaxSem: yeah, /me looks at gerrit change [22:48:40] MaxSem: just this? https://gerrit.wikimedia.org/r/92676 [22:48:45] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [22:48:51] yurik_: yurik http://www.wikipedia.beta.wmflabs.org/w/extract2.php?template=Www.wikipedia.org_template15 [22:48:57] http://meta.wikimedia.beta.wmflabs.org/wiki/Www.wikipedia.org_template [22:49:51] Or even [22:49:52] http://www.wikipedia.beta.wmflabs.org/ [22:50:47] Reedy: what about http://www.wikipedia.beta.wmflabs.org :) [22:50:51] but very nice work!!! [22:51:12] What about it? 
[22:51:15] (03PS1) 10Odder: (bug 56384) Configure $wgImportSources for dewikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92797 [22:51:18] http://www.wikipedia.beta.wmflabs.org/ [22:51:22] WIKIPEDIA GOES HERE [22:51:42] Reedy: sorry - http://wikipedia.beta.wmflabs.org [22:51:46] copy paste error [22:52:05] Reedy: and also http://m.wikipedia.beta.wmflabs.org/ [22:52:26] http://zero.wikipedia.beta.wmflabs.org/ has a different page alltogether [22:52:27] beta hasn't got half the redirects that wikipedia has [22:52:30] Yes [22:52:35] Because it thinks it's a different wiki [22:52:45] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:53:00] ok, but thanks for getting the extract to run - will play with it tomorrow [22:54:16] what's with the job queue alerts? [22:54:32] ugh [22:55:38] ori-l: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=job_queue [22:55:51] they work on arsenic and terbium [22:55:57] and the queue is actually large [22:56:09] https://gdash.wikimedia.org/dashboards/jobq/deploys [22:56:11] but they don't work on fenari/hume and need to be removed there i'd say [22:56:11] that's awesome, they're critical for two independent reasons [22:56:45] so ignore the CHECK_NRPE timeout ones [22:56:49] but do look at the other ones [22:57:12] sycing 23 ext/Zero [22:57:13] yurik_: If you wanted somerandomdomainthaticouldnotcareabout.wikipedia.beta.wmflabs.org to point to some mobile specific portal (or on production) it needs its own vhost (extract2 rewrite is in there), and adding to the allowed template array [22:57:24] greg-g, yes - only that rev (sorry I was re-checking with team) [22:57:30] MaxSem: no worries [22:57:53] (03PS1) 10Reedy: wiktionary.wikipedia.org isn't in DNS. 
Remove rewrite [operations/apache-config] - 10https://gerrit.wikimedia.org/r/92799 [22:57:58] MaxSem: well, pending that yurik_ finishes soon and the jobqueue isn't exploded before of him, you're good to go for the LD [22:58:14] but, check with yurik_ and ori-l (for those two respective problems) before doing so [22:58:15] greg-g: looking to sync https://gerrit.wikimedia.org/r/#/c/92791/ today if i can [22:58:16] ok, preparing commits [22:58:27] greg-g, MaxSem should be almost done [22:58:33] its pushing 23 [22:59:02] gah, everyone wants in! [22:59:11] (03PS1) 10Reedy: Remove defining of MEDIAWIKI constant, done in WebStart.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92800 [22:59:30] max, ve, ori [22:59:52] (03PS5) 10MarkTraceur: Add BetaFeatures and MultimediaViewer to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [23:00:10] hehe, greg is popular today [23:00:19] ugh, so [23:00:30] (03PS6) 10MarkTraceur: Add three Multimedia extensions to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [23:00:45] ugh [23:00:50] 1) make sure that jobqueue issue isn't a real issue (ori, can I trust you to do this?) [23:00:58] 2) pending 1, max goes after yuri's done [23:01:04] greg-g: no [23:01:05] 3) VE goes after max [23:01:10] .. 
[23:01:11] fine [23:01:41] Reedy: thx, i will need to do dynamic html generation for www.m.*, m.*, zero.wp --- if its zero, it will do our own magic, if not - regular extract2 stuff [23:01:44] the jobq numbers went down https://gdash.wikimedia.org/dashboards/jobq/deploys [23:01:45] i mean, i can try and be helpful, but i know so little about the job queue, it'd be hubris to commit to fixing something [23:02:03] Reedy: any suggestions if i can try my code on the cluster without commiting it to master/ [23:02:10] !log yurik synchronized php-1.23wmf1/extensions/ZeroRatedMobileAccess/ [23:02:17] greg-g, MaxSem done [23:02:21] Logged the message, Master [23:02:49] AaronSchulz: if you didn't want to do anything you wanted to do today, could you look at the job queue alerts: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=job_queue [23:02:53] yurik_: thanks [23:03:18] ok, deploying my stuff [23:03:49] MaxSem: one second [23:04:07] Reedy: how often is https://bugzilla.wikimedia.org/show_bug.cgi?id=56269 in prod? [23:04:14] greg-g: job queue length doesn't seem especially aberrant: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Miscellaneous+pmtpa&h=www.wikimedia.org&jr=&js=&v=71395&m=Global_JobQueue_length [23:04:17] yurik_: your deploys had an effect on jobqueue, I think: https://gdash.wikimedia.org/dashboards/jobq/deploys [23:04:27] (03PS2) 10Reedy: Remove defining of MEDIAWIKI constant, done in WebStart.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92800 [23:04:33] the alert threshold is kind of lame [23:04:45] AaronSchulz: ok, I'll accept that [23:04:45] AaronSchulz: I can check on fluorine [23:04:49] MaxSem: go ahead :) [23:04:56] ok [23:05:08] greg-g: "fix the threshold"...there's a backlog item ;) [23:05:10] James_F: who's doing your deploy? [23:05:10] greg-g: want me to roll it back? they really shouldn't [23:05:18] yurik_: nah, I think it's over now [23:05:19] greg-g: Roan. See [[Deployments]]. 
[23:05:23] yurik_: just an fyi [23:05:25] James_F: thanks [23:05:32] greg-g: No worries. [23:05:44] RoanKattouw_away: you're up after Max is done. MaxSem please tell Roan when you're done. [23:06:06] ori-l: you're after VE/Roan if there's time (cut off at 4:45, 15 minutes each) [23:06:18] ayeaye, thanks [23:07:21] !log maxsem synchronized php-1.22wmf22/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/92676/' [23:07:34] Logged the message, Master [23:08:48] !log maxsem synchronized php-1.23wmf1/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/92676/' [23:08:55] RoanKattouw_away, greg-g, I'm done [23:09:01] OK [23:09:04] Logged the message, Master [23:09:05] I just got here so let me do my prep first [23:09:08] Is anyone after me? [23:09:12] ori-l is [23:09:13] Oh ori-l is after me [23:09:30] ori-l: Where are you at w/ prep? I have to like put in submodule changes in Gerrit still [23:09:38] ori-l: so, after today (and a couple other days) I'mma gonna put that pushbot from etsy on my list for realz :) [23:09:51] RoanKattouw: I still need to have my patch merged, so you're way ahead of the game here :P [23:10:19] greg-g: yes, definitely [23:10:24] i mean, good idea. [23:11:09] * greg-g nods [23:11:53] whats' going on? 855 Fatal error: Class 'MFResourceLoaderModule' not found in /usr/local/apache/common-local/php-1.23wmf1/includes/resourceloader/ResourceLoader.php on line 408 [23:12:10] MaxSem: ^ [23:12:13] greg-g: ^ [23:12:23] ergh [23:12:39] my change was a couple lines of JS [23:12:58] is it zero? [23:13:08] MF presumably means MobileFrontend [23:14:40] yurik_: wherea re you seeing this? [23:14:51] fatal monitor [23:14:57] on fenari [23:15:08] there are only 8000 of them in fatal.log [23:15:10] no biggie [23:15:15] shush you [23:15:29] * yurik_ throws digital snowball at ori [23:16:10] not sure how to diagnose this, but it points to Max's deploy [23:16:15] yurik_: when did those start coming in? 
[23:16:23] a few min ago [23:16:26] * greg-g is making guesses right now, btw [23:16:29] reverting whatever I deployed [23:16:31] about 10 min after zero [23:16:32] well wait [23:16:43] will that exascerbate it? [23:16:49] its only in 23 [23:17:09] it's not in the autoloader, is it? [23:17:16] it is [23:17:23] !log maxsem synchronized php-1.23wmf1/extensions/MobileFrontend/ 'Shit hit fan' [23:17:30] I poked for a minute before reverting [23:17:34] looked right [23:17:37] Logged the message, Master [23:18:18] have the fatals stopped coming in? [23:18:24] no :0 [23:18:50] seems like they are dropping [23:19:33] errr [23:19:35] nope, back up [23:19:37] bleh [23:20:01] (03PS1) 10Chad: Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92808 [23:20:51] OK so there's nothing wrong with the code in wmf1 [23:21:06] I tried instantiating the class using eval.php and it works fine [23:21:08] greg-g: is there a place to get a normal stacktrace? [23:21:21] So there must be something wrong with the server or the wiki it's happening on [23:21:23] yes, but I forget where it is off hand [23:21:25] Reedy: ^ [23:21:42] fluorine [23:21:47] /a/mw-log [23:21:59] right, flourine [23:22:00] It's wikidatawiki [23:22:11] Does it have Zero but not MF maybe? [23:22:31] Yup [23:22:42] https://www.wikidata.org/wiki/Special:Version [23:22:52] Extension dependancies are bad, mmm'kay? 
[23:22:56] dependencies [23:23:12] 'wmgMobileFrontend' => array( [23:23:12] 'default' => true, [23:23:12] 'wikidata' => false, // Disabled due to lack of mobile domain setup [23:23:12] ), [23:23:12] 'wmgZeroRatedMobileAccess' => array( [23:23:13] 'default' => true, [23:23:15] ), [23:23:17] Sigh [23:23:19] lol [23:23:25] bleh [23:23:25] yurik_: ZeroRatedMobileAccess uses MFResourceLoader module in MobileFrontend, which means it depends on MF, but wikidata has Zero without MF [23:23:41] i got it, who wants to fix :) [23:23:41] Have the fatals stopped? [23:23:47] me [23:23:58] thanks RoanKattouw [23:23:58] Why was it loading resources anyway? [23:24:09] also, why wasn't it before :) [23:24:12] Elsie: No: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception&gtype=stack&glegend=show&aggregate=1&embed=1 [23:24:24] (CR) jenkins-bot: [V: -1] Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92808 (owner: Chad) [23:24:31] ResourceLoader->getModule('mobile.zero.tem...') [23:24:38] As part of the modules=startup request [23:24:41] fatals are going down [23:24:42] greg-g: i could either get our patch reverted, or we can change configs [23:24:56] Wikidata JS/CSS is probably completely broken right now, if modules=startup is really 500ing [23:25:05] (PS1) Reedy: Disable Zero on wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92809 [23:25:36] there you have your answer [23:25:41] (CR) Reedy: [C: 2] Disable Zero on wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92809 (owner: Reedy) [23:25:44] * aude cries :( [23:25:46] yep [23:25:57] (PS2) Chad: Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - 
https://gerrit.wikimedia.org/r/92808 [23:25:58] greg-g, RoanKattouw: I'm forfeiting my slot, prod too crazy, will go another day. [23:25:58] (PS1) Chad: Update LuceneSearchRequest maxqueue to 600 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92810 [23:25:58] The children in Africa want Wikidata via SMS. [23:26:03] (PS1) MaxSem: Enable Zero only with MF [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92811 [23:26:03] aude: i just knew our paths would cross [23:26:05] ori-l: wise man [23:26:16] (PS2) Chad: Update LuceneSearchRequest maxqueue to 600 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92810 [23:26:17] (PS3) Chad: Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92808 [23:26:30] someday we'll have mobile frontend [23:26:35] hmm, actually fatals went down slightly and then up slightly, continuing to go up slowly [23:26:42] That day shal not be this day [23:26:49] yurik_, https://gerrit.wikimedia.org/r/92811 [23:26:51] (CR) Chad: [C: 2] Update LuceneSearchRequest maxqueue to 600 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92810 (owner: Chad) [23:26:51] yeah, they won't go away until reedy syncs [23:26:52] is wikidata the only wiki w/o it? [23:26:55] I read that SHA1. [23:26:58] as [23:27:04] * Elsie pets Reedy. [23:27:04] ori-l: hey [23:27:15] * yurik_ gives reedy a big cookie [23:27:15] MaxSem: just put it in the block that enables MFE [23:27:17] (Merged) jenkins-bot: Disable Zero on wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92809 (owner: Reedy) [23:27:18] paravoid: hey [23:27:31] ori-l: did you have any luck with the memcache investigation? [23:27:46] fatals dropping again [23:27:47] alright, I need to run in 3 minutes [23:28:12] paravoid: no, not yet. 
[23:28:26] so, once zero is disabled on wikidata, and things look to be back to normal, Roan can fix VE if he promises not to break wikidata (or any other wiki) ;) [23:28:27] meh, I still need to re-deploy my stuff I reverted [23:28:48] !log reedy synchronized wmf-config/InitialiseSettings.php 'Disable Zero on wikidataw' [23:28:49] Go for it MaxSem [23:28:58] * greg-g sighs [23:28:59] I'll wait till you tell me you're done [23:29:02] Logged the message, Master [23:29:02] fatal are still pretty high [23:29:07] give it a sec [23:29:22] !log demon synchronized wmf-config/PoolCounterSettings-eqiad.php 'Upping search maxqueue to 600' [23:29:23] ok, 100 down [23:29:31] yurik_: That display lags, too [23:29:37] Logged the message, Master [23:29:46] There's certainly no new ones coming in [23:29:48] !log demon synchronized wmf-config/PoolCounterSettings-pmtpa.php 'Upping search maxqueue to 600' [23:29:52] dropping fast now [23:29:57] It's the share of fatals out of the most recent 1000 log lines [23:30:04] Logged the message, Master [23:30:11] So it's probably already stopped, it's just waiting for other stuff to push it out of the top 1000 lines of the log [23:30:13] where 1000 log lines can be minutes or seconds [23:30:14] :D [23:30:17] Exactly [23:30:27] https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception&gtype=stack&glegend=show&aggregate=1&embed=1 shows it's stopped [23:30:28] whew, looks good [23:30:39] MaxSem, yurik_: I think you owe the engineering list a postmortem [23:30:39] it seems fine now [23:30:42] yei!!! team awesome [23:30:42] that was a bit of a shit-show [23:30:51] RoanKattouw: hey, that's a useful graph :) [23:30:54] !log maxsem synchronized php-1.23wmf1/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/92676/' [23:30:55] what happened? 
[23:31:00] kaldari: Thank ori-l , he rigged that up [23:31:01] ori-l: shush [23:31:06] or maybe I should just wait for the postmortem :) [23:31:07] alright, I need to run to the bus [23:31:09] now if someone could explain how in the world could it have happened... [23:31:09] Logged the message, Master [23:31:12] RoanKattouw, done [23:31:15] thanks [23:31:17] ori-l: it's true though :( [23:31:23] paravoid: ZeroRatedMobileAccess was changed to depend on MobileFrontend, which exploded because wikidata has Zero but not MF [23:31:31] greg-g: yeah, dunno what i did to deserve that shush [23:31:33] Why it started exploding just now and didn't explode before, we don't know yet [23:31:41] paravoid: flood of fatals in prod during LD [23:32:01] related to zero assuming MFE is loaded everywhere (it isn't) [23:32:05] MaxSem, greg-g: OK then I'm gonna go next [23:32:08] ori-l: I've seen worse :) [23:32:54] yurik_, MaxSem: Could you guys figure out when that dependency on MF appeared in ZeroRated? [23:33:06] checking... [23:33:08] RoanKattouw, RL module [23:33:08] (PS2) MaxSem: Wrap all mobile.php in MF existence check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92811 [23:33:17] ^^ should be safer:) [23:33:25] I know what the dependency is, I'm asking *when* it was added [23:33:28] RoanKattouw: zero always relied on MF from what i know [23:33:41] yeah, but implicitly, with hooks [23:33:42] +1 for postmortem [23:33:43] Like, between 1.22wmfNN and 1.23wmf1 most likely? [23:33:51] MaxSem: could be that singleton we started using [23:33:59] Yeah the RL module is a hard dependency [23:34:12] modules=startup instantiates all modules so it can ask them what their mtime is [23:34:21] Which means that if your module inherits a class that doesn't exist, startup will explode [23:34:31] (and a Zero module inherited a class from MF) [23:35:14] Hmm, I suppose the reason it exploded during the LD is that it invalidated the cache for the startup module, perhaps? 
I would need to read the code to figure out how exactly that 304 logic works
[23:36:59] !log catrope synchronized php-1.23wmf1/extensions/VisualEditor/ 'VE cherry-pick'
[23:37:10] RoanKattouw, per log it appears to have exploded right after Zero pushes at 23:02
[23:37:14] Logged the message, Master
[23:37:29] OK, and you said those Zero pushes were JS-only?
[23:37:30] not after the MF deployment
[23:37:45] _my_ pushes were JS-only
[23:37:53] Aha
[23:37:59] So who pushed Zero and what did they push?
[23:38:13] 23:02 logmsgbot: yurik synchronized php-1.23wmf1/extensions/ZeroRatedMobileAccess/ <-- that one?
[23:38:19] yup
[23:38:36] same minute, the graph went up https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception&gtype=stack&glegend=show&aggregate=1&embed=1
[23:38:41] k
[23:38:43] ok, found it
[23:39:47] why do you ask, re: memcached? just checking, or is there some indication that things have gotten critical?
[23:39:54] er, that was @paravoid
[23:40:00] ori-l: no indication
[23:40:09] I pointed out back in Sept and we kinda dropped the ball
[23:40:16] so I was excited that you cared now
[23:40:25] and wanted to learn about the progress
[23:40:31] yes, give me a bit more time
[23:40:35] but i won't drop it
[23:40:38] sorry, didn't mean to pressure you
[23:40:54] RoanKattouw: MaxSem, it was part of the styles added by adam (reviewed by me) https://gerrit.wikimedia.org/r/#/c/83133/39/ZeroRatedMobileAccess.php
[23:41:02] np, some pressure is good
[23:41:24] I think the main issue was wmf-config
[23:41:41] that allowed to enable Zero separately from MF
[23:41:43] ori-l: feel free to keep me updated and ping me if you need to brainstorm this or any help
[23:41:52] (or not, no worries)
[23:42:10] who's gonna write a post-mortem?
[23:42:14] ok, will do, appreciate the offer.
[23:42:16] MaxSem: MFResourceLoaderModule is the only ref there
[23:42:30] singleton was in prod for a while, never caused any issues
[23:42:44] yurik_, because the singleton is used by hooks
[23:42:45] i suspect that's because it never gets called without mobile
[23:42:54] correct
[23:43:12] dr0ptp4kt: ^
[23:43:39] MaxSem: i guess having 39 patch revisions is a sign that it has bugs :)
[23:43:52] any objections against https://gerrit.wikimedia.org/r/#/c/92811/ ?
[23:44:11] awjr, kaldari, yurik_, dr0ptp4kt ^^
[23:45:39] looks reasonable to me MaxSem
[23:45:54] seems like a good idea
[23:47:08] (CR) Yurik: [C: 1] Wrap all mobile.php in MF existence check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92811 (owner: MaxSem)
[23:47:26] (CR) MaxSem: [C: 2] Wrap all mobile.php in MF existence check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92811 (owner: MaxSem)
[23:47:36] (Merged) jenkins-bot: Wrap all mobile.php in MF existence check [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92811 (owner: MaxSem)
[23:48:02] ^d, Warning: Search backend highlighted a redirect (Hugo "Hurley" Reyes) but didn't return it. [Called from CirrusSearchResult::findRedirectTitle in /usr/local/apache/common-local/php-1.22wmf22/extensions/CirrusSearch/includes/CirrusSearchSearcher.php at line 821] in /usr/local/apache/common-local/php-1.22wmf22/includes/debug/Debug.php on line 296
[23:48:29] <^d> Hmm, where'd you see this?
[23:49:55] ^d, in fatalmonitor
[23:50:10] They come and go
[23:50:46] reedy@fenari:~$ grep -c "Search backend highlighted a redirect" /home/wikipedia/syslog/apache.log
[23:50:46] 42
[23:50:48] haha
[23:51:18] !log maxsem synchronized wmf-config/mobile.php 'https://gerrit.wikimedia.org/r/92811'
[23:51:34] Logged the message, Master
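Editor's aside on the failure mode discussed in this log: a startup request builds every registered module to ask for its mtime, so one module whose base class was never loaded breaks the whole request, and an existence check at registration time avoids that. The sketch below is a rough Python analogy under those stated assumptions. Every name except `MFResourceLoaderModule` (mentioned in the log) is hypothetical, and none of this is MediaWiki's actual ResourceLoader code or the content of the merged config change.

```python
# Hypothetical stand-ins throughout; this is not MediaWiki code.
loaded_classes = {}  # class name -> class, filled in as "extensions" load


class CoreModule:
    """A module that has no dependency on any optional extension."""

    def mtime(self):
        return 1


def make_zero_module():
    # Raises KeyError when the MF-provided base class was never loaded,
    # standing in for a fatal "class not found" error.
    base = loaded_classes["MFResourceLoaderModule"]
    return type("ZeroModule", (base,), {})()


def startup_manifest(factories):
    # Build every registered module just to ask for its mtime, so a single
    # broken module takes the whole manifest down with it.
    return {name: make().mtime() for name, make in factories.items()}


def register_modules(guarded):
    factories = {"core": CoreModule}
    # The existence check: only register the dependent module when its
    # base class is actually available on this wiki.
    if not guarded or "MFResourceLoaderModule" in loaded_classes:
        factories["zero"] = make_zero_module
    return factories


try:
    startup_manifest(register_modules(guarded=False))
    unguarded_result = "ok"
except KeyError:
    unguarded_result = "exploded"

guarded_result = startup_manifest(register_modules(guarded=True))
```

The guard trades a hard failure for a silently missing module on wikis without the dependency, which is the behaviour wanted when one extension can be enabled separately from the other.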