[00:00:01] But I think it's missing some wikis?
[00:00:08] the restart was .. maybe half an hour
[00:00:14] Oh, okay.
[00:00:37] Dereckson, i don't know the specific url - its part of the dynamic resource loader. But it works now just fine, so go ahead with 23
[00:00:58] Leah: i'd also have to restart the ircd itself but users would hate me
[00:01:17] Let me try making an edit.
[00:01:21] And see if the channel appears.
[00:01:23] Dereckson, it seems like i just needed to wait a bit longer for all servers to get it
[00:01:35] k
[00:01:43] 23:
[00:01:56] !log dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/lib/d3-global.js: Graph: match modern module loading in core (1/3) (duration: 00m 26s)
[00:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:02:22] !log dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/lib/topojson-global.js: Graph: match modern module loading in core (2/3) (duration: 00m 26s)
[00:02:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:02:48] !log dereckson@tin Synchronized php-1.27.0-wmf.23/extensions/Graph/extension.json: Graph: match modern module loading in core (3/3) (duration: 00m 26s)
[00:02:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:02:56] yurik: here you are ^
[00:02:58] Operations, DBA, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2262206 (Volans)
[00:03:24] mutante: Okay, making an edit worked. rc-pmtpa is in #wikimediafoundation.org now on irc.wikimedia.org.
[00:03:27] And snitch picked up the edit.
[00:03:41] Leah: :) great
[00:03:44] Operations, Tracking: Upgrade Wikimedia servers to Ubuntu Trusty (14.04) (tracking) - https://phabricator.wikimedia.org/T65899#2262208 (Neil_P._Quinn_WMF) For the curious, the ticket @MoritzMuehlenhoff is referring to is T123525.
[00:03:49] I hadn't realized there was https://phabricator.wikimedia.org/T134247 and a recent restart. I think that explains it. :-)
[00:03:50] Leah: so then it is that bug that i linked that is an old bug
[00:03:56] Leah: about pre-creating all channels
[00:04:03] Yeah, my bot can join the channels before an edit happens.
[00:04:06] That part is fine.
[00:04:12] ok!
[00:04:14] Dereckson, testing...
[00:04:27] Leah: what should i do about an ircd restart?
[00:04:29] mutante: wmfwiki is the only channel with .org, I think.
[00:04:43] mutante: Are any more restarts needed now?
[00:04:56] Or you mean in the future?
[00:05:03] (CR) BBlack: [C: 2] Bump WDQS cache to 5 mins [puppet] - https://gerrit.wikimedia.org/r/286776 (owner: Smalyshev)
[00:05:09] bot and ircserver are separate
[00:05:14] the bot should be done now
[00:05:21] the ircserver would need one just to close the ticket
[00:05:23] Right. Does ircd need to be restarted?
[00:05:25] about "pmtpa" in the hostname
[00:05:27] and the motd
[00:05:53] MOTD shouldn't require a restart, I don't think.
[00:05:54] it doesnt actually matter for the clients
[00:06:01] well the server hostname does
[00:06:05] Yeah.
[00:06:05] and the motd includes $hostname
[00:06:32] well, technically it does not. here's the thing
[00:06:41] Does the hostname matter?
[00:06:55] Dereckson, could you try it - navigate and activate any of the graphs - https://www.mediawiki.org/wiki/Extension:Graph/Demo#Vega_2.0_interactive_examples
[00:06:55] 20:06 -!- - irc.pmtpa.wikimedia.org Message of the Day -
[00:07:09] "irc.pmtpa.wikimedia.org" is the problematic part?
[00:07:14] https://phabricator.wikimedia.org/T133328
[00:07:24] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:07:26] ah, just started working - i really think like there is a 5 min delay there somewhere
[00:07:28] https://phabricator.wikimedia.org/T133328#2261358
[00:07:39] Dereckson, all good, thanks a lot!
[00:08:06] mutante: I think a restart can wait. It'll happen during the next security upgrade or OS upgrade, right?
[00:08:07] Leah: yea, so the actual motd that you edit with "vimotd" starts one line after that
[00:08:20] Leah: yea, like kernel upgrade for example
[00:08:30] Seems fine to wait.
[00:08:34] ok
[00:08:46] I don't think most people read the MOTD on irc.wikimedia.org. :-)
[00:08:47] Thanks for testing yurik.
[00:08:51] So SWAT is done.
[00:08:55] yep
[00:09:22] Leah: while that is true it is also pretty much guaranteed that when it gets any kind of attention somebody will say "pmtpa? lol"
[00:09:37] Leah: and we are not even changing the bot name :)
[00:09:54] Yeah, "rc-pmtpa" is a lovely bot name.
[00:10:00] Like someone stepped on a keyboard.
[00:10:04] it would be easy to change
[00:10:08] Sure.
[00:10:24] but nobody knows what it breaks
[00:13:14] PROBLEM - Host mr1-ulsfo is DOWN: CRITICAL - Network Unreachable (198.35.26.194)
[00:13:14] PROBLEM - Host mr1-ulsfo IPv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ffff::6
[00:13:18] Leah: the part about your bots joining the channels ... that would be surprising
[00:13:24] PROBLEM - Host asw-ulsfo.mgmt.ulsfo.wmnet is DOWN: PING CRITICAL - Packet loss = 100%
[00:13:44] Leah: because the custom patch by Fred.. that is in our ircd is that only opers can create channels
[00:14:36] The deployment documentation says that the MediaWiki installations on terbium are in /usr/local/apache/common/, but there doesn't seem to be any such directory
[00:15:40] kaldari: see /srv/mediawiki/
[00:15:55] outdated docs about /usr/local/
[00:16:07] mutante: Thanks. I'll update the docs!
[00:16:12] :) cool
[00:16:36] mutante: "/list" on either server looks a lot bigger than "/wii rc-pmtpa"
[00:17:09] Maybe I'm mis-counting.
[00:17:35] PROBLEM - Host mr1-ulsfo.oob is DOWN: PING CRITICAL - Packet loss = 100%
[00:18:11] Or maybe the bot restart means more channels exist than would have?
[00:18:13] Leah: try joining a random channel that is not a project
[00:18:25] For example, #yi.wiktionary has 1 user (not rc-pmtpa).
[00:18:38] mutante: I joined [#fjdsklfdjsf]
[00:18:39] ah, well this could make sense
[00:18:51] it's not about joining them, it's about _creating_ them
[00:19:00] so if you were already in it, and the bot leaves
[00:19:01] you stay
[00:19:11] I don't know what "creating" means.
[00:19:18] Most IRC servers allow a person to join any channel.
[00:19:31] And joining auto-creates the channel.
[00:19:37] yes, but this is custom because exactly this one thing is not like most
[00:19:46] Weird.
[00:19:52] hold on ...
[00:20:10] "creating" means joining when there are no existing users
[00:20:13] being the first
[00:20:39] see this https://github.com/wikimedia/operations-debs-ircd-ratbox/blob/master/ircd-ratbox-notalk.patch
[00:20:45] Hmmm, maybe "/join #ffdsjfdsf" isn't actually joining/creating.
[00:20:45] 2007 :)
[00:20:59] When I try to speak in there, irssi just closes the window.
[00:21:08] yea, that's how it looks
[00:21:14] Ugh.
[00:21:19] so that's "normal"
[00:21:28] !log ran mwscript maintenance/updateCollation.php --wiki=ruwiktionary --force
[00:21:31] I don't know why the behavior of this ircd is different than like every other.
[00:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:21:40] also see https://phabricator.wikimedia.org/T134271#2261992 for more rabbit hole
[00:22:24] Leah: i think .. because Fred and maybe others back in 2007 didnt want to deal with spam and trolls
[00:22:41] it's supposed to be readonly
[00:22:48] I remember Fred.
[00:22:55] I don't remember him being around in 2007, though.
[00:23:06] 2009, yeah.
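The join-versus-create distinction described above can be sketched in a few lines of Python. This is only a toy model of the rule as stated in the log (a channel exists while it has members, and only opers may be the first to join); the real behaviour lives in the C patch linked above. The names `Server`, `join`, `part` and `is_oper` are hypothetical, and treating rc-pmtpa as an oper is an assumption.

```python
class Server:
    """Toy model: a channel exists only while it has at least one member."""

    def __init__(self):
        self.channels = {}  # channel name -> set of nicks

    def join(self, nick, channel, is_oper=False):
        members = self.channels.get(channel)
        if members is None:
            # Joining an empty channel would create it; only opers may do that.
            if not is_oper:
                return False
            members = self.channels[channel] = set()
        members.add(nick)
        return True

    def part(self, nick, channel):
        members = self.channels.get(channel, set())
        members.discard(nick)
        if not members:
            # The last member leaving destroys the channel.
            self.channels.pop(channel, None)


server = Server()
print(server.join("leah", "#fjdsklfdjsf"))                      # refused: would create
print(server.join("rc-pmtpa", "#yi.wiktionary", is_oper=True))  # oper creates
print(server.join("leah", "#yi.wiktionary"))                    # joining an existing channel
```

Under this model a user who is already in a channel keeps it alive after the bot leaves, which would be consistent with #yi.wiktionary having one user that is not rc-pmtpa.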
[00:23:17] !log mwscript namespaceDupes.php aswikisource --merge --fix (T133505)
[00:23:18] eh, sorry
[00:23:20] --- ircd-ratbox-2.2.8/modules/core/m_join.c 2007-01-22 10:12:25.000000000 -0800
[00:23:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:23:25] +++ ircd-ratbox-notalk-2.2.8/modules/core/m_join.c 2009-06-15 15:30:31.000000000 -0700
[00:23:27] Yeah.
[00:23:37] That makes a bit more sense.
[00:23:41] T133505: Page on aswikisource not accessible via page title, only via "curid" - https://phabricator.wikimedia.org/T133505
[00:23:56] (CR) Zhuyifei1999: [C: 1] Commons: Restrict changetags userright [mediawiki-config] - https://gerrit.wikimedia.org/r/286522 (https://phabricator.wikimedia.org/T134196) (owner: Rillke)
[00:24:28] Leah: having that custom feature now makes it harder to replace the ircd with a different one
[00:24:35] Operations, ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2262227 (RobH)
[00:24:41] I guess?
[00:24:48] inspircd has about a trillion config options.
[00:25:01] I imagine it would be fine with some tweaking.
[00:25:20] you can do all that in core inspircd, i was going to open a bug about it
[00:25:29] but the package was already built
[00:25:40] Leah: charybdis https://phabricator.wikimedia.org/T134271
[00:25:48] Operations, ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2262244 (RobH) email sent to support.
[00:25:53] Leah: if you have other options and comments please do
[00:26:21] mutante: I'd rather stop using ircd than upgrade it.
[00:26:28] I think we were getting close with rcstream.wikimedia.org or something.
[00:26:44] Leah: no, that's just what people think
[00:26:51] but it's not replacing it
[00:27:08] for $reasons
[00:27:16] Well, we should figure out what those reasons are.
[00:28:09] Operations, Wikimedia-IRC-RC-Server: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#2260405 (MZMcBride) There was activity at some point to deprecate irc.wikimedia.org altogether. I wonder what happened with that.
[00:28:24] Leah: "Is irc.wikimedia.org deprecated in favor of stream.wikimedia.org? My understanding is that the current answer is no, so this task is not resolved and won't be resolved for some time."
[00:28:31] most of reasons, but when it was first discussed is because no one wants to update and rewrite all their stuff
[00:28:38] -- MzMcBride - 2015, Jul 28
[00:29:06] Leah: https://phabricator.wikimedia.org/T87780#2031332
[00:29:33] Yeah, just found it.
[00:29:40] That MzMcBride guy sure is smat.
[00:29:42] Operations, Wikimedia-IRC-RC-Server: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#2262262 (Dzahn) >>! In T134271#2262258, @MZMcBride wrote: > There was activity at some point to deprecate irc.wikimedia.org altogether. I wonder what happened with that. s...
[00:29:43] Smart.
[00:29:47] :)
[00:29:58] Operations, Wikimedia-IRC-RC-Server: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#2262264 (MZMcBride) I'm thinking of T87780#2031332, I guess.
[00:30:11] Oh, hah.
[00:30:24] hehee
[00:31:52] "#mediawiki.wikipedia", "#wikidata.wikipedia", "#wikimediafoundation.org", "#commons.wikimedia"
[00:31:55] What a naming convention.
[00:32:08] Operations, ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2262266 (RobH) UnitedLayer called me, seems they hadn't even read my support ticket yet and I was the contact for issues. They are aware of the loss of the B side tower PDU in our cabinet 1.23. They esti...
[00:32:43] Leah: compare with https://noc.wikimedia.org/conf/highlight.php?file=all.dblist
[00:32:49] Leah: bah, where is that even configured?
[00:32:59] frwiki → #fr.wikipedia
[00:33:06] commonswiki → #commons.wikipedia
[00:33:19] right
[00:33:20] mutante: Somewhere in InitialiseSettings.php I think?
[00:33:25] arf but it uses #commons.wikimedia
[00:33:30] Operations, Wikimedia-IRC-RC-Server: IRC RC server still mentions pmtpa on various places - https://phabricator.wikimedia.org/T133328#2262267 (Dzahn) 17:09 < mutante> Leah: what should i do about an ircd restart? 17:09 < Leah> mutante: Are any more restarts needed now? 17:09 < Leah> Or you mean in the f...
[00:33:45] mutante: its a "historic" thing, but there is code to rename to some of them in Common or InitSettings
[00:33:56] Yeah.
[00:33:59] Krinkle was doing some of it iirc
[00:34:01] We special-case a few.
[00:34:52] #species.wikimedia works :)
[00:35:58] #wikimediafoundation.org must be the strangest.
[00:37:19] 'wmgRC2UDPPrefix' => array( in InitialiseSettings.php.
[00:38:21] Some of these wikis can't possibly by using irc.wikimedia.org.
[00:38:24] Like internalwiki.
[00:38:32] That would leak private data.
[00:39:16] 'legalteamwiki' => "#legalteam.wikipedia\t",
[00:39:23] I somehow doubt that channel is getting messages.
[00:39:48] for sure :)
[00:41:17] Leah: internal.wikimedia.org
[00:41:44] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 75.23 ms
[00:42:34] RECOVERY - Host mr1-ulsfo.oob is UP: PING OK - Packet loss = 0%, RTA = 74.75 ms
[00:43:11] mutante: What about it?
[00:43:24] RECOVERY - Host mr1-ulsfo IPv6 is UP: PING OK - Packet loss = 0%, RTA = 88.47 ms
[00:43:59] Leah: not much, just another obscure one that is probably in the list but the bot cant get edits
[00:44:35] RECOVERY - Host asw-ulsfo.mgmt.ulsfo.wmnet is UP: PING OK - Packet loss = 0%, RTA = 77.46 ms
[00:44:40] Yeah, we should probably clean up the list.
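The channel naming discussed above (a default rule plus a few special cases) could be sketched roughly as follows. This is an illustration only, not the actual configuration, which per the log lives in 'wmgRC2UDPPrefix' in InitialiseSettings.php. The function name `channel_for` and the `SPECIAL_CASES` table are hypothetical, the default rule (strip the "wiki" suffix and append ".wikipedia") is an assumption inferred from the examples quoted in the log, and other project families such as wiktionary are not handled.

```python
# Hypothetical overrides, using only channel names quoted in the log.
SPECIAL_CASES = {
    "commonswiki": "#commons.wikimedia",
    "wmfwiki": "#wikimediafoundation.org",
}


def channel_for(dbname):
    """Map a wiki database name to its recent-changes channel (sketch)."""
    if dbname in SPECIAL_CASES:
        return SPECIAL_CASES[dbname]
    if not dbname.endswith("wiki"):
        raise ValueError("unsupported database name: %s" % dbname)
    # Assumed default: frwiki -> #fr.wikipedia
    return "#" + dbname[: -len("wiki")] + ".wikipedia"


for name in ("frwiki", "commonswiki", "legalteamwiki", "wmfwiki"):
    print(name, "->", channel_for(name))
```

Keeping the exceptions in a lookup table mirrors the "we special-case a few" remark and makes it easier to audit which wikis should not have a channel at all.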
[00:45:58] btw, the configuration of the IRCD wasnt public, but i'll change it
[00:46:18] because of the oper passwods
[00:46:44] but we'll split it so that only the actually private parts are in private repo and not the whole file
[00:48:38] Operations, ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2262278 (RobH) They moved mr1-ulsfo as requested, still awaiting call back from them regarding completion of that and repairs. It seems that msw1-ulsfo is a netgear, and only single pdu. Its down, but no...
[00:49:34] PROBLEM - Juniper alarms on asw-ulsfo.mgmt.ulsfo.wmnet is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms
[00:50:45] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:51:18] thats due to one of the pdus being down on the access switch (the juniper asw-ulsfo alarm)
[00:55:13] (PS1) Dzahn: ircserver: move ircd.conf to public repo [puppet] - https://gerrit.wikimedia.org/r/286783 (https://phabricator.wikimedia.org/T134271)
[00:56:59] (PS1) Eevans: Blacklist some meta-table Cassandra metrics [puppet] - https://gerrit.wikimedia.org/r/286784 (https://phabricator.wikimedia.org/T134016)
[01:00:55] (PS1) Dzahn: ircserver: don't use TS6 protocol, no other servers [puppet] - https://gerrit.wikimedia.org/r/286785 (https://bugzilla.wikimedia.org/134271)
[01:02:20] (PS2) Dzahn: ircserver: don't use TS6 protocol, no other servers [puppet] - https://gerrit.wikimedia.org/r/286785 (https://bugzilla.wikimedia.org/134271)
[01:03:16] (PS2) Dzahn: ircserver: move ircd.conf to public repo (WIP) [puppet] - https://gerrit.wikimedia.org/r/286783 (https://phabricator.wikimedia.org/T134271)
[01:03:24] (CR) Dzahn: [C: -1] ircserver: move ircd.conf to public repo (WIP) [puppet] - https://gerrit.wikimedia.org/r/286783 (https://phabricator.wikimedia.org/T134271) (owner: Dzahn)
[01:18:36] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:22:28] RECOVERY - Juniper alarms on asw-ulsfo.mgmt.ulsfo.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[01:30:36] PROBLEM - Juniper alarms on asw-ulsfo.mgmt.ulsfo.wmnet is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms
[01:35:19] Operations, Traffic: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262310 (Krenair)
[02:08:09] Operations, Traffic, Browser-Support-Internet-Explorer: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262326 (Danny_B)
[02:08:21] Operations, Traffic, Browser-Support-Internet-Explorer: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262293 (Danny_B)
[02:10:33] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review, WMF-deploy-2016-05-01_(1.27.0-wmf.23): Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2262328 (Krenair) Just waiting for labs replication to start working now...
[02:11:57] (CR) Hoo man: "On which grounds was this re-enabled? The questions raised weren't properly answered." [mediawiki-config] - https://gerrit.wikimedia.org/r/284091 (https://phabricator.wikimedia.org/T126741) (owner: Yurik)
[02:13:35] (CR) Yurik: "Hoo, Lydia, Stas, and I had a hangout meeting, and decided that having this capability does not interfere with the plans for the wikidata " [mediawiki-config] - https://gerrit.wikimedia.org/r/284091 (https://phabricator.wikimedia.org/T126741) (owner: Yurik)
[02:14:03] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review, WMF-deploy-2016-05-01_(1.27.0-wmf.23): Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2262330 (Krenair) Actually, maybe I was wrong: https://jam.wikipedia.org/api/rest_v1/page/html/User%3AK...
[02:15:05] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262335 (Danny_B) @AxelBoldt Please specify exact IE and Windows version (ie. including build #), thanks. In IE 11.0.9600.17914 @ Win7 U...
[02:16:23] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262293 (brion) Works for me in IE 11 on a Win7 vm.
[02:20:31] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262293 (BBlack) >>! In T134332#2262335, @Danny_B wrote: > @AxelBoldt Please specify exact IE and Windows version (ie. including build #)...
[02:23:43] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.22) (duration: 09m 19s)
[02:23:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:02] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262356 (Danny_B) >>! In T134332#2262351, @BBlack wrote: >>>! In T134332#2262335, @Danny_B wrote: >> @AxelBoldt Please specify exact IE a...
[02:31:43] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262358 (BBlack) That's fascinating. Is it only commons and meta like the original description, or all our sites? Probably completely u...
[02:36:58] Operations, Internet-Archive, Wikimedia-Planet, Upstream: wordpress.com seems to have blocked us from fetching feeds - https://phabricator.wikimedia.org/T133818#2262361 (Dzahn) Wordpress admins reacted and contact us a few days ago. They could not confirm the 500 errors on their side. Upon furt...
[02:45:19] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262363 (AxelBoldt) My information: IE Version 11.0.9600.18282CO, Windows 7 Enterprise 64bit Service Pack 1, build 7601, 7601.23391.amd64...
[02:46:35] Operations, Internet-Archive, Wikimedia-Planet, Upstream: wordpress.com seems to have blocked us from fetching feeds - https://phabricator.wikimedia.org/T133818#2262364 (Dzahn) using curl from the planet1001 machine also actually times out , while it works fine from my laptop
[02:50:19] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262365 (Danny_B) Any add-ons by any chance?
[02:51:30] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262366 (Danny_B) @AxelBoldt To clarify: by "Wikimedia sites" you mean only *.wikimedia.org, right?
[02:56:40] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262368 (BBlack) It's a bit perplexing that this is only affecting sites in `wikimedia.org` for you. That's not really a technical disti...
[02:57:09] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.23) (duration: 17m 12s)
[02:57:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:57:39] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262369 (AxelBoldt) Yes, I mean only *.wikimedia.org sites. There is in fact one suspicious add-on enabled: "DefaultTab Browser Helper"...
[02:58:42] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262370 (Krenair) @AxelBoldt, would you mind giving wikitech.wikimedia.org and gerrit.wikimedia.org a try?
[03:02:27] (PS2) BearND: Deploy mobileapps using scap3 [puppet] - https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147)
[03:04:04] Dereckson: re: the failed feed updates.. planet is a liar
[03:04:26] PING lb.wordpress.com (192.0.78.13) 56(84) bytes of data.
[03:04:26] 64 bytes from 192.0.78.13: icmp_seq=1 ttl=61 time=0.163 ms
[03:04:26] 64 bytes from 192.0.78.13: icmp_seq=2 ttl=61 time=0.161 ms
[03:04:44] eh wrong paste
[03:05:01] so first of all, it calls this: # capture http status
[03:05:18] but then it just does some stuff like:
[03:05:22] 75 if not data.has_key("status"):...
[03:05:27] else:
[03:05:34] data.status = 500
[03:05:56] so if anything fails it sets it actively to 500, but calls it "capturing" the status, yea right
[03:06:14] then, with more debugging i can see the real error is a timeout:
[03:06:30] 'bozo': 1, 'bozo_exception': URLError(timeout('timed out',)
[03:06:41] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed May 4 03:06:41 UTC 2016 (duration 9m 32s)
[03:06:45] which is also funny because planet-code tries to detect that timeout.. and that also fails
[03:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:07:10] and in the end it's just that the planet machine cant talk to the Wordpress LB at all.. while others can
[03:07:24] no ping, no curl, no traceroute with ICMP or TCP
[03:08:05] so i mailed them about that.. let's see what happens next
[03:08:11] laters !
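The feed-status problem described above (a missing HTTP status being reported as 500 when the real failure is a timeout) can be illustrated with a short Python 3 sketch. This is not the planet code itself; it assumes a feedparser-style result dictionary with the 'status', 'bozo' and 'bozo_exception' keys quoted in the log, and the function name `describe_fetch` is hypothetical.

```python
import socket
from urllib.error import URLError


def describe_fetch(data):
    """Summarise a fetch result without inventing an HTTP status."""
    if "status" in data:
        # The server actually answered, so report what it said.
        return "HTTP %d" % data["status"]
    exc = data.get("bozo_exception")
    if isinstance(exc, URLError) and isinstance(exc.reason, socket.timeout):
        # No response at all: a network timeout, not a server-side 500.
        return "timeout"
    if exc is not None:
        return "error: %r" % (exc,)
    return "unknown"


result = {"bozo": 1, "bozo_exception": URLError(socket.timeout("timed out"))}
print(describe_fetch(result))
```

Reporting the timeout separately would have pointed at the network path between the planet host and the Wordpress load balancer rather than at an error on the remote side.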
afk
[03:13:11] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262389 (AxelBoldt) None of maps.wikimedia.org, stats.wikimedia.org, phabricator.wikimedia.org, meta.wikimedia.org, commons.wikimedia.org...
[03:17:31] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262390 (Krenair) Did it get stuck with "Waiting for commons.wikimedia.org" or return you a funny error on those two?
[03:18:22] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262397 (BBlack) >>! In T134332#2262389, @AxelBoldt wrote: > DNS problems should apply to all browsers equally, no? Well, sure, but so s...
[03:24:50] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262401 (AxelBoldt) I just reinstalled the browser and all Wikimedia sites work fine now. I am sorry for having wasted everybody's time.
[03:25:22] Operations, Traffic, Browser-Support-Internet-Explorer, TestMe: Internet Explorer 11.0 hangs on wikimedia.org sites - https://phabricator.wikimedia.org/T134332#2262402 (AxelBoldt) Open>Invalid
[03:27:50] (CR) BearND: "Removed mobileapps/deploy from hieradata/common/role/deployment.yaml." [puppet] - https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: BearND)
[05:03:37] Operations, Wikimedia-IRC-RC-Server, Patch-For-Review: Replace ircd-ratbox with something newer/maintained - https://phabricator.wikimedia.org/T134271#2260405 (Danny_B) [[ https://github.com/freenode/ircd-seven | ircd-seven ]] may be considered as well as a [[ https://upload.wikimedia.org/wikipedia/c...
[05:17:34] (CR) Thcipriani: [C: 1] Deploy mobileapps using scap3 [puppet] - https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: BearND)
[05:55:53] same
[06:20:26] (CR) Jcrespo: [C: -1] Switched to pt-heartbeat lag detection on s6 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/243116 (https://phabricator.wikimedia.org/T111266) (owner: Aaron Schulz)
[06:20:38] !log restarting elasticsearch server elastic2015.codfw.wmnet (T110236)
[06:20:39] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[06:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:21:42] (CR) Jcrespo: Switched to pt-heartbeat lag detection on s6 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/243116 (https://phabricator.wikimedia.org/T111266) (owner: Aaron Schulz)
[06:30:35] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:15] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:35] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:04] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:22] hi
[06:32:54] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:44] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:56] were
[06:56:25] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:56:55] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:57:06] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:57:37] (PS1) Muehlenhoff: Amend imagemagick policy to also include the URL decoder [puppet] - https://gerrit.wikimedia.org/r/286790
[06:57:45] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:55] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:35] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:09] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262594 (santhosh) fonts-lohit-mlym and [[ http://packages.ubuntu.com/trusty/fonts-smc | fonts-smc ]] has Malayalam fonts - all support latest unicode versions. Loh...
[07:04:31] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262598 (MoritzMuehlenhoff) @Santosh : Thanks, I'll add the fonts-smc package, then.
[07:07:30] (CR) Mobrovac: [C: 1] Deploy mobileapps using scap3 [puppet] - https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: BearND)
[07:07:47] (PS6) Aaron Schulz: Switched to pt-heartbeat lag detection on s6 [mediawiki-config] - https://gerrit.wikimedia.org/r/243116 (https://phabricator.wikimedia.org/T111266)
[07:17:59] (CR) Jcrespo: [C: -1] Switched to pt-heartbeat lag detection on s6 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/243116 (https://phabricator.wikimedia.org/T111266) (owner: Aaron Schulz)
[07:26:01] (CR) Mobrovac: [C: -1] "cxserver/deploy is already present in hieradata/common/role/deployment.yaml so you need to remove it from there." [puppet] - https://gerrit.wikimedia.org/r/286395 (https://phabricator.wikimedia.org/T120104) (owner: KartikMistry)
[07:37:12] PROBLEM - puppet last run on mw2014 is CRITICAL: CRITICAL: puppet fail
[07:54:55] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262679 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[08:01:51] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262699 (MoritzMuehlenhoff) Our app servers are currently running trusty, but we're planning to migrate to jessie in the next months. jessie is already configured t...
[08:04:20] RECOVERY - puppet last run on mw2014 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[08:08:36] (PS1) Jcrespo: Depool db1058 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/286792 (https://phabricator.wikimedia.org/T125028)
[08:10:36] (CR) Jcrespo: [C: 2] Depool db1058 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/286792 (https://phabricator.wikimedia.org/T125028) (owner: Jcrespo)
[08:14:30] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1058 for reimage (duration: 00m 36s)
[08:14:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:19:37] !log stopping db1058 mysql for backup in preparation for reimage
[08:19:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:22:45] Operations, Discovery, Monitoring, Wikidata, Wikidata-Query-Service: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#2262722 (Smalyshev) I think we need to at least put monitor on whatever "varnish latency" counts and alert say if it's over 30 s.
[08:27:47] Operations, CirrusSearch, Discovery, Discovery-Search-Backlog, and 4 others: "Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium - https://phabricator.wikimedia.org/T132751#2262738 (hashar) The last event in logstash is at 2016-05-03T10:15:02.000Z and cam...
[08:28:05] Operations, CirrusSearch, Discovery, Discovery-Search-Backlog, and 4 others: "Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium - https://phabricator.wikimedia.org/T132751#2262739 (hashar)
[08:32:34] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262755 (santhosh) ttf-malayalam-fonts is renamed to fonts-smc based on debian font package naming conventions introduced recently. All font packages in debian foll...
[08:32:54] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#355946 (KartikMistry) +1 for fonts-smc.
[08:34:03] Operations, Wikimedia-General-or-Unknown, I18n, Upstream: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2262758 (MoritzMuehlenhoff) I'm convinced, then :-) Will prepare a patch later
[08:34:06] (PS1) Jcrespo: Prepare db1058 for jessie reimage [puppet] - https://gerrit.wikimedia.org/r/286795 (https://phabricator.wikimedia.org/T125028)
[08:35:15] (CR) Jcrespo: [C: 2] Prepare db1058 for jessie reimage [puppet] - https://gerrit.wikimedia.org/r/286795 (https://phabricator.wikimedia.org/T125028) (owner: Jcrespo)
[08:42:18] !log restarting elasticsearch server elastic2016.codfw.wmnet (T110236)
[08:42:18] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[08:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:52:17] Operations, ops-eqiad, DC-Ops, netops: asw-d-eqiad SNMP failures - https://phabricator.wikimedia.org/T112781#2262790 (fgiunchedi) I just noticed JNX_ALARMS flapping in icinga with "no response from remote host" and asw-d-eqiad, possibly related to this too? ```lines=5 neon:/var/log/icinga$ fgrep...
[08:56:58] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 07Varnish: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262798 (10hashar) [09:04:58] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: Upgrade db1069 - https://phabricator.wikimedia.org/T134349#2262819 (10jcrespo) [09:10:11] 06Operations, 06Discovery, 10Monitoring, 10Wikidata, 10Wikidata-Query-Service: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#2262856 (10Addshore) Just looking at the other things I am recording right now but it may infact make sense to put a monitor on the... [09:13:32] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review, 07Varnish: Beta cluster varnish sets overly broad domain on GeoIP cookie - https://phabricator.wikimedia.org/T133936#2262859 (10Aklapper) [09:13:36] 06Operations, 10Traffic, 10Wikidata, 07Varnish: Varnish seems to sometimes mangle uncompressed API results - https://phabricator.wikimedia.org/T133866#2262861 (10Aklapper) [09:13:39] 06Operations, 10Traffic, 10Wikimedia-Apache-configuration, 07Varnish: Data passed to HHVM ($_SERVER variables) is a mixed bag of already-decoded and non-decoded nonsense - https://phabricator.wikimedia.org/T132629#2262864 (10Aklapper) [09:13:42] 06Operations, 10Analytics-EventLogging, 10RESTBase, 06Services, and 3 others: RESTBase should handle the X-Analytics header - https://phabricator.wikimedia.org/T133139#2262863 (10Aklapper) [09:13:51] 06Operations, 10MobileFrontend, 10Traffic, 13Patch-For-Review, and 2 others: Stop default redirecting Samsung Smart TVs to mobile web - https://phabricator.wikimedia.org/T127021#2262871 (10Aklapper) [09:13:53] 06Operations, 06Performance-Team, 10Traffic, 07Varnish: Understand and improve streaming behaviour from Varnish - https://phabricator.wikimedia.org/T126015#2262873 (10Aklapper) [09:14:03] 06Operations, 10MediaWiki-API, 10Traffic, 07Varnish: Evaluate the feasibility of cache 
invalidation for the action API - https://phabricator.wikimedia.org/T122867#2262880 (10Aklapper) [09:15:49] !log restarting enwiki-labs reimports (lag could happen temporarily) [09:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:20:02] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262964 (10hashar) The flow of traffic comes from deployment-changeprop, which hit restbase02 and then the text cache frontend. Marko has stopped the change prop pr... [09:24:15] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 630.86 seconds [09:25:31] (03PS5) 10KartikMistry: cxserver: scap3 migration [puppet] - 10https://gerrit.wikimedia.org/r/286395 (https://phabricator.wikimedia.org/T120104) [09:30:12] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2263008 (10hashar) p:05Unbreak!>03Normal So the root cause is apparently change prop sending too many updates that ends up overloading the varnish text frontend.... [09:35:46] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262742 (10mobrovac) >>! In T134346#2263008, @hashar wrote: > So the root cause is apparently change prop sending too many updates that ends up overloading the varni... 
[09:43:47] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Comment inline, -1ing but I may have missed something" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/282160 (owner: 10Muehlenhoff) [09:44:07] !log Updated cxserver to 45596ac [09:44:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:44:45] PROBLEM - Disk space on tin is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=67%) [09:46:44] I'll take a look at tin [09:47:46] 06Operations, 10OTRS, 07Upstream: Investigate OTRS 5.0.6 memory leak - https://phabricator.wikimedia.org/T126448#2263030 (10akosiaris) p:05High>03Low Since we got a very effective mitigation in place, I am lowering priority. [09:49:38] 06Operations, 06Performance-Team, 13Patch-For-Review: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#2263036 (10ori) I am generating a dump of some 10,000,000 key/value pairs from mc100[1..4], but my internet connection is so slow that working on the cluster is... [09:49:50] 06Operations, 10DBA: mysql user and group should be a system user/group - https://phabricator.wikimedia.org/T100501#2263037 (10jcrespo) p:05Normal>03Low [09:50:02] 06Operations, 10DBA: mysql user and group should be a system user/group - https://phabricator.wikimedia.org/T100501#1314171 (10jcrespo) [09:50:34] 06Operations, 10DBA, 06Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2263042 (10jcrespo) [09:50:36] RECOVERY - Disk space on tin is OK: DISK OK [09:50:39] godog: tin was me, i created a dump of memcached key/values for testing purposes and it grew to 4gb. 
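A minimal Python sketch of the kind of pre-flight check that could stop a large dump from filling a root partition, as happened on tin above. The function names, the 10% reserve threshold and the example sizes are assumptions chosen for illustration; nothing here is taken from the tooling actually used on tin.

```python
import shutil

GIB = 1024 ** 3


def fits_with_reserve(total: int, free: int, needed: int,
                      reserve_fraction: float = 0.10) -> bool:
    """Return True if writing `needed` bytes still leaves at least
    `reserve_fraction` of the filesystem's total size free."""
    return free - needed >= total * reserve_fraction


def can_write_dump(path: str, needed: int,
                   reserve_fraction: float = 0.10) -> bool:
    """Apply the same rule to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return fits_with_reserve(usage.total, usage.free, needed, reserve_fraction)


# Illustrative numbers only: a 37 GiB filesystem with 6 GiB free and a
# 4 GiB dump. 2 GiB would remain, which is below the 3.7 GiB reserve.
print(fits_with_reserve(37 * GIB, 6 * GIB, 4 * GIB))  # False
```

Keeping the arithmetic in `fits_with_reserve` separate from the `shutil.disk_usage` call makes the rule easy to test without touching a real filesystem.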
[09:50:45] ori: how big is that dump ^ tin just ran out of disk space [09:50:49] ah good timing [09:51:16] PROBLEM - swift-container-updater on ms-be3004 is CRITICAL: Connection refused by host [09:51:25] PROBLEM - RAID on ms-be3004 is CRITICAL: Connection refused by host [09:51:26] PROBLEM - swift-object-replicator on ms-be3004 is CRITICAL: Connection refused by host [09:51:34] heh the vg has another 95G, we could extend / [09:51:37] PROBLEM - puppet last run on ms-be3004 is CRITICAL: Connection refused by host [09:51:41] in 2016 4gb is not a lot, but since we like to waste time on menial problems we make sure to allocate just barely enough for the root partition [09:51:55] PROBLEM - swift-account-replicator on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - swift-account-auditor on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - DPKG on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - swift-container-auditor on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - swift-object-updater on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - swift-object-server on ms-be3004 is CRITICAL: Connection refused by host [09:51:56] PROBLEM - salt-minion processes on ms-be3004 is CRITICAL: Connection refused by host [09:52:05] PROBLEM - configured eth on ms-be3004 is CRITICAL: Connection refused by host [09:52:15] PROBLEM - swift-account-server on ms-be3004 is CRITICAL: Connection refused by host [09:52:15] PROBLEM - very high load average likely xfs on ms-be3004 is CRITICAL: Connection refused by host [09:52:26] PROBLEM - Check size of conntrack table on ms-be3004 is CRITICAL: Connection refused by host [09:52:27] PROBLEM - swift-object-auditor on ms-be3004 is CRITICAL: Connection refused by host [09:52:37] PROBLEM - swift-container-replicator on ms-be3004 is CRITICAL: Connection refused by host [09:52:46] PROBLEM - dhclient process on ms-be3004 is CRITICAL: Connection
refused by host [09:52:46] PROBLEM - swift-account-reaper on ms-be3004 is CRITICAL: Connection refused by host [09:52:57] PROBLEM - swift-container-server on ms-be3004 is CRITICAL: Connection refused by host [09:53:21] (03CR) 10Alexandros Kosiaris: [C: 031] Add LVS configuration for EventBus in codfw (DNS reverse/wmnet config already in place). [puppet] - 10https://gerrit.wikimedia.org/r/286621 (https://phabricator.wikimedia.org/T121558) (owner: 10Elukey) [09:53:38] it'd be nice to have a single check that verifies that all configured swift services are running, as opposed to eight checks for eight swift-* services :-/ [09:54:19] indeed [09:54:36] more visible when it breaks in this way :P [09:55:14] ori: re: tin and small / part of the reason is l10nupdate caches not being cleaned up https://phabricator.wikimedia.org/T130317 [09:55:16] i deleted the dump on tin, but with / at 31G out of 37G, it is bound to explode again soon [09:56:37] (03PS1) 10Mobrovac: Beta: RESTBase: Contact the MW API directly [puppet] - 10https://gerrit.wikimedia.org/r/286803 (https://phabricator.wikimedia.org/T134346) [09:57:55] RECOVERY - swift-account-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [09:57:56] RECOVERY - swift-account-auditor on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [09:57:56] RECOVERY - DPKG on ms-be3004 is OK: All packages OK [09:57:56] RECOVERY - swift-container-auditor on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:57:56] RECOVERY - swift-object-server on ms-be3004 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [09:57:56] RECOVERY - salt-minion processes on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:57:56] RECOVERY - swift-object-updater on ms-be3004 is OK: PROCS OK: 1 process 
with regex args ^/usr/bin/python /usr/bin/swift-object-updater [09:58:06] RECOVERY - configured eth on ms-be3004 is OK: OK - interfaces up [09:58:11] 06Operations, 10Deployment-Systems, 06Release-Engineering-Team, 03Scap3: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2263057 (10ori) p:05Normal>03High @mmodell, blocking this on porting l10nupdate to scap doesn't seem reasonable. Could you simply make pruning... [09:58:16] RECOVERY - swift-account-server on ms-be3004 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [09:58:16] RECOVERY - very high load average likely xfs on ms-be3004 is OK: OK - load average: 7.08, 7.39, 8.22 [09:58:26] RECOVERY - Check size of conntrack table on ms-be3004 is OK: OK: nf_conntrack is 19 % full [09:58:26] RECOVERY - swift-object-auditor on ms-be3004 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [09:58:28] that was nrpe failing to properly start during a restart, looking into it [09:58:45] RECOVERY - swift-container-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [09:58:46] RECOVERY - dhclient process on ms-be3004 is OK: PROCS OK: 0 processes with command name dhclient [09:58:46] RECOVERY - swift-account-reaper on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [09:58:57] RECOVERY - swift-container-server on ms-be3004 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [09:59:02] godog: replied on that task, going to bed, good night! [09:59:16] RECOVERY - swift-container-updater on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [09:59:20] ori: thanks! 
I've bumped tin's / space too [09:59:26] RECOVERY - RAID on ms-be3004 is OK: OK: optimal, 12 logical, 12 physical [09:59:26] RECOVERY - swift-object-replicator on ms-be3004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [09:59:34] thanks [09:59:37] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures [09:59:43] !log root@tin:/# lvresize -r -v --size +30G /dev/mapper/tin--vg-root [09:59:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:05:11] !log restarting elasticsearch server elastic2017.codfw.wmnet (T110236) [10:05:12] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [10:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:07:08] (03CR) 10Mobrovac: [C: 031] "Tested on beta, works." [puppet] - 10https://gerrit.wikimedia.org/r/286803 (https://phabricator.wikimedia.org/T134346) (owner: 10Mobrovac) [10:08:55] (03CR) 10Jcrespo: [C: 032] Beta: RESTBase: Contact the MW API directly [puppet] - 10https://gerrit.wikimedia.org/r/286803 (https://phabricator.wikimedia.org/T134346) (owner: 10Mobrovac) [10:17:30] (03PS3) 10Elukey: Add LVS configuration for EventBus in codfw (DNS reverse/wmnet config already in place). [puppet] - 10https://gerrit.wikimedia.org/r/286621 (https://phabricator.wikimedia.org/T121558) [10:18:43] (03CR) 10Elukey: [C: 032] Add LVS configuration for EventBus in codfw (DNS reverse/wmnet config already in place). 
[puppet] - 10https://gerrit.wikimedia.org/r/286621 (https://phabricator.wikimedia.org/T121558) (owner: 10Elukey) [10:23:18] !log restarting db1058 for reimaging to jessie T125028 [10:23:18] T125028: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028 [10:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:23:41] FYI I am working on lvs2006 atm [10:25:54] !log updating pybal/LVS with codfw eventbus config on lvs2006 [10:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:29:08] !log rolling restart of parsoid in eqiad to pick up openssl update [10:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:32:31] mmmm the logs on pybal are a bit weird, it complained for a bit about wtp2XXX hosts not up [10:33:46] !log restarting elasticsearch server elastic2018.codfw.wmnet (T110236) [10:33:47] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [10:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:33:58] 06Operations, 10ops-eqiad: db1058 does not come up after restart - https://phabricator.wikimedia.org/T134360#2263194 (10jcrespo) [10:34:02] (03PS1) 10Dereckson: Shakespeare in London throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) [10:34:34] (03CR) 10jenkins-bot: [V: 04-1] Shakespeare in London throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) (owner: 10Dereckson) [10:42:34] (03PS2) 10Dereckson: Shakespeare in London throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) [10:42:36] 06Operations, 10procurement: Certificate renewal for stream.wikimedia.org - https://phabricator.wikimedia.org/T134361#2263223 
(10MoritzMuehlenhoff) [10:42:40] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263236 (10hashar) [10:43:09] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263249 (10hashar) ``` # grep -n spdy /etc/nginx/sites-enabled/unified 6: listen [::]:443 default_server def... [10:43:24] 06Operations, 10procurement: Certificate renewal for toolserver.org - https://phabricator.wikimedia.org/T134363#2263250 (10MoritzMuehlenhoff) [10:45:41] (03PS1) 10Alexandros Kosiaris: conftool: eventbus in codfw as well [puppet] - 10https://gerrit.wikimedia.org/r/286808 [10:46:09] ahhhh [10:46:21] all right makes sense... [10:48:28] (03CR) 10Elukey: [C: 032] conftool: eventbus in codfw as well [puppet] - 10https://gerrit.wikimedia.org/r/286808 (owner: 10Alexandros Kosiaris) [10:48:38] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263270 (10hashar) 05Open>03Resolved a:03hashar I have moved out the config f... [10:51:56] (03CR) 10DCausse: [C: 031] Collect pending_tasks count metric from elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/286756 (https://phabricator.wikimedia.org/T134240) (owner: 10Gehel) [10:54:11] (03PS1) 10Alexandros Kosiaris: lvs: Add eventbus codfw site [puppet] - 10https://gerrit.wikimedia.org/r/286809 [10:54:54] 06Operations, 10ops-eqiad: db1058 does not come up after restart - https://phabricator.wikimedia.org/T134360#2263332 (10jcrespo) Resetting the interface does not do anything. Also trying to power it up from the web interface. Console output after power on is inexistent. 
[10:55:21] (03PS2) 10Elukey: lvs: Add eventbus codfw site [puppet] - 10https://gerrit.wikimedia.org/r/286809 (owner: 10Alexandros Kosiaris) [10:55:47] (03PS2) 10Gehel: Collect pending_tasks count metric from elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/286756 (https://phabricator.wikimedia.org/T134240) [10:56:05] (03CR) 10Alexandros Kosiaris: [C: 031] "Looks fine, how do you want guys to coordinate the migration ?" [puppet] - 10https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: 10BearND) [10:57:25] (03CR) 10Gehel: [C: 032] Collect pending_tasks count metric from elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/286756 (https://phabricator.wikimedia.org/T134240) (owner: 10Gehel) [10:57:31] (03CR) 10Elukey: [C: 032] lvs: Add eventbus codfw site [puppet] - 10https://gerrit.wikimedia.org/r/286809 (owner: 10Alexandros Kosiaris) [10:57:41] (03PS3) 10Elukey: lvs: Add eventbus codfw site [puppet] - 10https://gerrit.wikimedia.org/r/286809 (owner: 10Alexandros Kosiaris) [10:57:47] (03PS1) 10Dereckson: Namespace configuration for gl.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286811 (https://phabricator.wikimedia.org/T134041) [10:58:05] gehel you stole my merge position in gerrit :P [10:58:29] Sorry for that... the queue is not that visible... [10:58:39] ahahahah no no I am kidding [10:58:40] elukey: don't worry, I'll be slower next time [10:59:17] * jynus would love to do an inappropriate joke [11:00:30] jynus: shoot :P [11:01:36] * elukey hides [11:01:41] {{File:Sting.ogg}} [11:02:07] ahahhaah [11:02:52] * gehel is reading back. It is worse than he initially thought.
[11:03:17] * gehel needs to think much more before writing anything in public [11:03:26] !log restarting elasticsearch server elastic2019.codfw.wmnet (T110236) [11:03:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:03:42] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [11:04:39] 06Operations, 10hardware-requests: new labstore hardware for eqiad - https://phabricator.wikimedia.org/T126089#2263357 (10MoritzMuehlenhoff) [11:04:42] 06Operations, 10ops-eqiad, 06DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2263354 (10MoritzMuehlenhoff) 05Resolved>03Open Reopening the ticket, since wmf4727-test.eqiad.wmnet is still up and running and hooked into puppet and salt (it's not in... [11:06:33] 06Operations, 10Traffic: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263360 (10Danny_B) p:05High>03Low WFM. @YmKavishwar Does it still not work for you? [11:06:46] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263363 (10Danny_B) [11:07:42] (03PS1) 10Dereckson: Remove redundant NS_PROJECT entries from wgNamespacesAliases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286812 (https://phabricator.wikimedia.org/T131023) [11:09:16] !log removed obsolete mediawiki-math-texvc/imagemagick from nobelium [11:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:09:38] (03CR) 10Hashar: "Has been cherry picked on beta and CI puppet masters (in labs)." 
[puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [11:10:05] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2263371 (10MoritzMuehlenhoff) p:05Triage>03High [11:10:15] 06Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: Allow RelEng nova log access - https://phabricator.wikimedia.org/T133992#2263372 (10MoritzMuehlenhoff) p:05Triage>03Normal [11:10:49] 06Operations, 10Traffic: confctl: give regexen more freedom - https://phabricator.wikimedia.org/T134323#2263373 (10MoritzMuehlenhoff) p:05Triage>03Low [11:10:51] 06Operations, 10Ops-Access-Requests: Allow mobrovac to run puppet on SC(A|B) - https://phabricator.wikimedia.org/T134251#2263374 (10MoritzMuehlenhoff) p:05Triage>03Normal [11:10:58] 06Operations, 10Traffic: confctl select needs a -y flag? - https://phabricator.wikimedia.org/T134324#2263375 (10MoritzMuehlenhoff) p:05Triage>03Low [11:11:44] 06Operations, 10procurement: Certificate renewal for toolserver.org - https://phabricator.wikimedia.org/T134363#2263378 (10MoritzMuehlenhoff) p:05Triage>03High [11:11:57] 06Operations, 10procurement: Certificate renewal for stream.wikimedia.org - https://phabricator.wikimedia.org/T134361#2263382 (10MoritzMuehlenhoff) p:05Triage>03High [11:12:12] 06Operations, 10ops-eqiad: db1058 does not come up after restart - https://phabricator.wikimedia.org/T134360#2263384 (10MoritzMuehlenhoff) a:03Cmjohnson [11:13:29] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263386 (10YmKavishwar) still its not working. i have try to open my lapton and mobile both but its not work. [11:15:28] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263387 (10YmKavishwar) @Danny_B webise not open means its really need high priority. 
[11:16:36] !log updating Pybal/LVS for codfw eventbus on lvs2003 [11:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:16:53] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2262627 (10KartikMistry) Works for me. [11:17:24] (03PS3) 10Dereckson: Enable UploadsLink at Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [11:17:44] (03CR) 10Dereckson: "PS3: add reference to task" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [11:19:46] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263393 (10YmKavishwar) >>! In T134343#2263389, @KartikMistry wrote: > Works for me. good. still not working for me. [11:20:13] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263396 (10BBlack) 05Resolved>03Open This is because we're halfway through the... [11:20:48] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263399 (10Dereckson) p:05Low>03Triage @YmKavishwar Could you copy/paste the full message of the errors you have? [11:21:08] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263401 (10BBlack) Oh, I see now also that puppet auto-upgraded the package for you... [11:22:44] mobrovac: curl http://eventbus.svc.codfw.wmnet:8085/v1/topics :) [11:25:34] elukey: \o/ ! 
[11:25:35] 06Operations, 10ops-codfw, 06Analytics-Kanban, 06DC-Ops, and 5 others: setup kafka2001 & kafka2002 - https://phabricator.wikimedia.org/T121558#2263404 (10elukey) LVS configuration set up on lvs200[36]: curl http://eventbus.svc.codfw.wmnet:8085/v1/topics Pending verification, but the work should be done :) [11:25:36] thnx! [11:26:12] elukey: this curl reminds me, we have a bug there, as it doesn't show the topic names with dc prefixes [11:26:22] more tech debt [11:26:23] * mobrovac sighs [11:27:14] mobrovac: let me know if there is anything else left to do [11:28:50] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263408 (10YmKavishwar) >>! In T134343#2263399, @Dereckson wrote: > @YmKavishwar Could you copy/paste the full message of the errors you have? Exception encountered, of type "InvalidArgumentExcept... [11:30:35] (03CR) 10KartikMistry: [C: 031] Translate: Use Apertium via cxserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [11:33:25] mobrovac: those two cert procurement tasks are in the wrong space [11:33:32] They should be in S4 so I can't access them [11:36:41] 06Operations, 10MediaWiki-ResourceLoader, 10Traffic: commons.wikimedia.org home page has 404s loaded from JS (RL?) 
- https://phabricator.wikimedia.org/T134368#2263419 (10BBlack) [11:36:46] p858snake: thanks, I've moved these [11:38:27] without looking the stats, I would be surprised if toolserver.org was still highly used enough to warrant keeping it around [11:40:56] p858snake: we have links everywhere to toolserver.org [11:43:12] just checked the date, only killed 1 year, 10 months, 3 days ago, felt a lot longer [11:43:25] toolserver.org could use letsencrypt.org anyways probably [11:43:29] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263454 (10YmKavishwar) its true not working only for me. after logout its acceced, after login its not acceced. iam only one contributer and local sysop here. [11:45:57] (03PS3) 10Nikerabbit: Translate: Use Apertium via cxserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) [11:51:22] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263464 (10Danny_B) Ah, so there are other pre-conditions necessary to replicate... Please provide as //complete// as possible description of what you have done right before it stopped working for... 
[11:51:27] !log restarting elasticsearch server elastic2020.codfw.wmnet (T110236) [11:51:28] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [11:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:55:44] (03PS1) 10BBlack: cache_upload HTTP/2 switch [puppet] - 10https://gerrit.wikimedia.org/r/286816 (https://phabricator.wikimedia.org/T96848) [11:55:46] (03PS1) 10BBlack: cache_text HTTP/2 switch [puppet] - 10https://gerrit.wikimedia.org/r/286817 (https://phabricator.wikimedia.org/T96848) [11:55:48] (03PS1) 10BBlack: remove do_spdy conditional, all h2, 1/2 [puppet] - 10https://gerrit.wikimedia.org/r/286818 (https://phabricator.wikimedia.org/T96848) [11:55:50] (03PS1) 10BBlack: remove do_spdy hieradata, all h2, 2/2 [puppet] - 10https://gerrit.wikimedia.org/r/286819 (https://phabricator.wikimedia.org/T96848) [11:56:17] \o/ [11:57:03] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 265.30 seconds [12:00:12] !log starting cache_upload HTTP/2 switch process [12:00:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:00:37] (03CR) 10BBlack: [C: 032] cache_upload HTTP/2 switch [puppet] - 10https://gerrit.wikimedia.org/r/286816 (https://phabricator.wikimedia.org/T96848) (owner: 10BBlack) [12:01:47] (03PS1) 10Hashar: beta: disable spdy on Nginx tlsproxies [puppet] - 10https://gerrit.wikimedia.org/r/286821 (https://phabricator.wikimedia.org/T134362) [12:05:22] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263549 (10hashar) Based on @bblack input patch set `tlsproxy... [12:06:18] (03CR) 10Hashar: "I have cherry picked it on the beta cluster puppetmaster. 
Diff output pasted on T134362 and shows spdy is replaced by http2 :}" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/286821 (https://phabricator.wikimedia.org/T134362) (owner: 10Hashar) [12:08:05] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263567 (10YmKavishwar) i have try to internet explorer, chrome and UC Browser (on mobile). after logout its worked. no problem with other wikiproject. my mediawiki usernamne is YmKavishwar. this p... [12:08:39] !log restarting elasticsearch server elastic2021.codfw.wmnet (T110236) [12:08:39] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [12:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:10:33] !log cache_upload HTTP/2 switch process complete [12:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:11:19] (03PS1) 10ArielGlenn: for 'other' datasets html page, avoid links outside of the directory tree [puppet] - 10https://gerrit.wikimedia.org/r/286823 [12:11:43] (03PS1) 10Muehlenhoff: Add salt grain for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/286824 [12:12:23] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263571 (10YmKavishwar) i have try to login this wiki with AWB but login failed. error mesage: you have provide ileage username. [12:12:25] (03CR) 10BBlack: [C: 032] cache_text HTTP/2 switch [puppet] - 10https://gerrit.wikimedia.org/r/286817 (https://phabricator.wikimedia.org/T96848) (owner: 10BBlack) [12:12:34] !log starting cache_text HTTP/2 switch process [12:12:41] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2262627 (10Luke081515) @YmKavishwar If you got this error message, mediawiki should always show the stack trace. Can you post that too? 
[12:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:13:09] 06Operations, 10ops-ulsfo: ulsfo power issues - https://phabricator.wikimedia.org/T134370#2263575 (10faidon) [12:13:17] (03PS2) 10ArielGlenn: for 'other' datasets html page, avoid links outside of the directory tree [puppet] - 10https://gerrit.wikimedia.org/r/286823 [12:13:54] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263588 (10Danny_B) >>! In T134343#2263571, @YmKavishwar wrote: > i have try to login this wiki with AWB but login failed. error mesage: you have provide ileage username. Did you type your name ma... [12:14:22] (03PS3) 10ArielGlenn: for 'other' datasets html page, avoid links outside of the directory tree [puppet] - 10https://gerrit.wikimedia.org/r/286823 [12:15:19] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263602 (10YmKavishwar) maybe its problem related with this page ? https://gu.wikiquote.org/wiki/%E0%AA%AE%E0%AB%80%E0%AA%A1%E0%AA%BF%E0%AA%AF%E0%AA%BE%E0%AA%B5%E0%AA%BF%E0%AA%95%E0%AA%BF:Newuserme... [12:15:49] (03CR) 10ArielGlenn: [C: 032] for 'other' datasets html page, avoid links outside of the directory tree [puppet] - 10https://gerrit.wikimedia.org/r/286823 (owner: 10ArielGlenn) [12:15:54] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263603 (10YmKavishwar) >>! In T134343#2263588, @Danny_B wrote: >>>! In T134343#2263571, @YmKavishwar wrote: >> i have try to login this wiki with AWB but login failed. error mesage: you have provi... [12:17:15] (03PS1) 10Muehlenhoff: Also add canaries to the parsoid debdeploy groups [puppet] - 10https://gerrit.wikimedia.org/r/286827 [12:19:06] (03PS3) 10Mobrovac: Change-Prop: Enable summary and definition updates. 
[puppet] - 10https://gerrit.wikimedia.org/r/286539 (owner: 10Ppchelko) [12:20:18] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263608 (10Danny_B) I logged in with admin rights and it worked for me as well. [12:21:53] (03CR) 10Mobrovac: [C: 031] "Cherry-picked in beta, works as advertised." [puppet] - 10https://gerrit.wikimedia.org/r/286539 (owner: 10Ppchelko) [12:22:41] (03CR) 10Alexandros Kosiaris: [C: 032] Change-Prop: Enable summary and definition updates. [puppet] - 10https://gerrit.wikimedia.org/r/286539 (owner: 10Ppchelko) [12:23:06] !log cache_text HTTP/2 switch process complete [12:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:23:19] 06Operations, 10Traffic, 07TestMe: gu.wikiquote.org URL Do Not work - https://phabricator.wikimedia.org/T134343#2263610 (10Aklapper) Can you please provide a screenshot of the problem? [12:24:18] (03PS2) 10Muehlenhoff: Also add canaries to the parsoid debdeploy groups [puppet] - 10https://gerrit.wikimedia.org/r/286827 [12:24:33] (03CR) 10Muehlenhoff: [C: 032 V: 032] Also add canaries to the parsoid debdeploy groups [puppet] - 10https://gerrit.wikimedia.org/r/286827 (owner: 10Muehlenhoff) [12:24:56] (03PS2) 10Muehlenhoff: Add salt grain for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/286824 [12:25:43] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add salt grain for stat1004 [puppet] - 10https://gerrit.wikimedia.org/r/286824 (owner: 10Muehlenhoff) [12:27:11] heh I just noticed the HTTP/2 on my firefox [12:31:00] (03PS1) 10Mobrovac: Changeprop: Re-render only enwiktionary definitions [puppet] - 10https://gerrit.wikimedia.org/r/286830 [12:32:18] (03PS2) 10Mobrovac: Changeprop: Re-render only enwiktionary definitions [puppet] - 10https://gerrit.wikimedia.org/r/286830 [12:34:11] !log restarting blazegraph (T134238) [12:34:12] T134238: Query service fails with "Too many open files" - 
https://phabricator.wikimedia.org/T134238 [12:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:36] 06Operations, 06Discovery, 03Discovery-Search-Sprint, 07Elasticsearch, 13Patch-For-Review: Publish "pending_tasks" count from Elastic search cluster to graphite - https://phabricator.wikimedia.org/T134240#2263644 (10Gehel) [12:42:51] PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [12:43:12] PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [12:43:13] (03CR) 10Nemo bis: [C: 031] "+1 to enabling Apertium in $wgTranslateTranslationServices" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [12:45:45] ignore ^ [12:47:03] akosiaris: ping re https://gerrit.wikimedia.org/r/#/c/286830/ [12:47:17] stopped changeprop in the meantime [12:47:33] (03CR) 10Alexandros Kosiaris: [C: 032] Changeprop: Re-render only enwiktionary definitions [puppet] - 10https://gerrit.wikimedia.org/r/286830 (owner: 10Mobrovac) [12:50:15] !log restarting elasticsearch server elastic2022.codfw.wmnet (T110236) [12:50:16] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [12:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:57:02] RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy [13:01:37] 06Operations, 10ops-codfw, 06Analytics-Kanban, 06DC-Ops, and 5 others: setup kafka2001 & kafka2002 - 
https://phabricator.wikimedia.org/T121558#2263744 (10elukey) [13:03:12] 06Operations, 06Analytics-Kanban, 13Patch-For-Review: Upgrade stat1001 to Debian Jessie - https://phabricator.wikimedia.org/T76348#2263748 (10elukey) [13:03:29] 06Operations, 06Analytics-Kanban, 13Patch-For-Review: Upgrade stat1001 to Debian Jessie - https://phabricator.wikimedia.org/T76348#798408 (10elukey) [13:08:29] RECOVERY - changeprop endpoints health on scb1002 is OK: All endpoints are healthy [13:11:15] (03PS1) 10BBlack: varnishxcps: fix CP header parsing for H2= [puppet] - 10https://gerrit.wikimedia.org/r/286836 (https://phabricator.wikimedia.org/T118892) [13:11:58] (03CR) 10BBlack: [C: 032 V: 032] varnishxcps: fix CP header parsing for H2= [puppet] - 10https://gerrit.wikimedia.org/r/286836 (https://phabricator.wikimedia.org/T118892) (owner: 10BBlack) [13:12:05] (03PS1) 10Hashar: contint: drop libcurl4-gnutls-dev [puppet] - 10https://gerrit.wikimedia.org/r/286837 (https://phabricator.wikimedia.org/T134378) [13:12:49] (03PS1) 10ArielGlenn: rsync dumps script fixes [puppet] - 10https://gerrit.wikimedia.org/r/286838 [13:20:22] (03PS2) 10ArielGlenn: rsync dumps script fixes [puppet] - 10https://gerrit.wikimedia.org/r/286838 [13:21:39] (03CR) 10ArielGlenn: [C: 032] rsync dumps script fixes [puppet] - 10https://gerrit.wikimedia.org/r/286838 (owner: 10ArielGlenn) [13:24:44] !log restarting elasticsearch server elastic2023.codfw.wmnet (T110236) [13:24:45] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [13:24:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:32:52] (03PS1) 10Jcrespo: Depool db1065 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286841 (https://phabricator.wikimedia.org/T133250) [13:34:18] 06Operations, 06Performance-Team, 10Traffic, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2263903 (10BBlack) text and upload 
have been converted now as well, so all cache clusters have made the HTTP/2 switch. I've also done a last-minute fixup to the varnishxcps script (it... [13:39:00] (03PS1) 10Elukey: Add fake database password for hue [labs/private] - 10https://gerrit.wikimedia.org/r/286844 (https://phabricator.wikimedia.org/T127990) [13:39:24] (03CR) 10Elukey: [C: 032 V: 032] Add fake database password for hue [labs/private] - 10https://gerrit.wikimedia.org/r/286844 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [13:40:43] (03PS1) 10Elukey: Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) [13:40:57] !log rolling restart of apertium in sca1* for openssl update [13:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:41:40] !log jmm@palladium conftool action : set/pooled=no; selector: sca1001.eqiad.wmnet [13:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:42:33] !log jmm@palladium conftool action : set/pooled=yes; selector: sca1001.eqiad.wmnet [13:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:42:51] (03CR) 10jenkins-bot: [V: 04-1] Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [13:44:28] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 633 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5708253 keys - replication_delay is 633 [13:46:02] (03PS2) 10Elukey: Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) [13:47:10] (03CR) 10jenkins-bot: [V: 04-1] Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 
(https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [13:48:29] * elukey writes 100 times: use parse validate and puppet lint before sending a code review [13:49:12] (03PS3) 10Elukey: Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) [13:49:35] !log jmm@palladium conftool action : set/pooled=no; selector: sca1002.eqiad.wmnet [13:49:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:49:53] !log jmm@palladium conftool action : set/pooled=yes; selector: sca1002.eqiad.wmnet [13:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:52:46] (03PS1) 10Mobrovac: Change prop: Add the rule for MobileApps re-renders [puppet] - 10https://gerrit.wikimedia.org/r/286847 [13:54:06] (03PS4) 10Elukey: Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) [13:57:44] (03CR) 10Elukey: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/2665/" [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [13:58:39] (03PS2) 10Mobrovac: Change prop: Add the rule for MobileApps re-renders [puppet] - 10https://gerrit.wikimedia.org/r/286847 [14:05:01] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:07:22] (03CR) 10Ottomata: Add beta-specific access.conf exceptions in scap::target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/286754 (https://phabricator.wikimedia.org/T121721) (owner: 1020after4) [14:08:59] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [14:09:20] 06Operations, 10Traffic, 06Zero, 13Patch-For-Review: Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two - https://phabricator.wikimedia.org/T124482#2264028 (10BBlack) It's been 8.8 days since 10-minute TTL expiry, and the rates are low enough that we definitely don't have any kind o... [14:09:39] PROBLEM - eventlogging-service-eventbus endpoints health on kafka1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:09:52] ouch [14:09:55] checking [14:10:16] hm [14:11:26] ottomata: o/ [14:11:48] HMMM [14:12:03] UnknownError: TopicMetadata(topic='codfw.test.event', error=-1, partitions=[]) ?? [14:12:12] !log restarting elasticsearch server elastic2024.codfw.wmnet (T110236) [14:12:13] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [14:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:13:05] hm, elukey kafka2001 is doing it too [14:13:18] (03CR) 10Rush: Add beta-specific access.conf exceptions in scap::target (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/286754 (https://phabricator.wikimedia.org/T121721) (owner: 1020after4) [14:13:27] elukey: i'm going to restart it on 2001 [14:13:38] ottomata: where did you find that log? [14:13:39] RECOVERY - eventlogging-service-eventbus endpoints health on kafka1002 is OK: All endpoints are healthy [14:14:21] en. 
[VyoDsApAMFUAAJGhRiEAAAAB] 2016-05-04 14:14:08: Fatal exception of type MWException [14:14:44] (no more exception) [14:14:50] elukey: /var/log/eventlogging/eventlogging-service-eventbus.log [14:14:52] but: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "/mathoid/local/v1/":): {\displaystyle \hat{H} | \psi_n(t) \rangle = i \hbar \frac{\partial}{\partial t} | \psi_n(t) \rangle } [14:15:12] 06Operations, 10ops-ulsfo: ulsfo power issues - https://phabricator.wikimedia.org/T134370#2264050 (10RobH) [14:15:14] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2264051 (10RobH) [14:15:37] ottomata: yeah I am on kafka1002 and I was checking that file but didn't find it [14:16:13] Dereckson: link? [14:16:20] https://en.wikipedia.org/wiki/Quantum_pseudo-telepathy [14:16:22] to the page that presents the error [14:16:24] kk [14:16:42] elukey: its in the stack trace of Uncaught exception POST /v1/events (10.192.0.139)#012HTTPServerRe ... [14:16:42] Page renders correctly in private browser logged out. [14:16:45] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2264057 (10RobH) Something bad did happen, we lost all power to one of the two PDU towers in 1.23. It seems the power loss event fried some power supplies, since power has been restored but they are offline... [14:17:10] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:17:23] Dereckson: i use mathml / svg and i see no errors on that page [14:18:06] ottomata: not sure why I can't see the exception, the log finishes at May 3rd for me [14:18:12] why is icinga complaining? 
all's looking good on scb1001 [14:18:17] anyhow, I'll figure it out later [14:18:20] PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: puppet fail [14:18:32] urandom: that you ^ ? [14:18:38] mobrovac: ctrl + maj + r led me to a [VyoEogpAMEcAABFSep0AAABQ] 2016-05-04 14:18:10: Fatal exception of type MWException [14:18:44] mobrovac: no [14:18:50] mobrovac: did yall start producing a bunch more to eventbus recently? [14:18:50] and then another, I've the figure correctly [14:18:55] like a few hours ago? [14:19:18] ottomata: yup [14:19:32] Dereckson: still can't see it. is it still happening for you? [14:19:40] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.32.178, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., BadStatusLine(,))) [14:19:47] ok, i see that, i think we need to spawn some more processes, wasn't aware yall were about to do that [14:19:48] doh [14:19:59] wtf? [14:20:10] i touch nothing [14:21:03] so elukey i'm not sure if this is related, but i have been watching eventbus throughput, and while the 1 processes running on each node now should be able to handle the current rate, it is getting close to where I think we should spawn more [14:21:07] there are 2 ways to do this [14:21:09] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [14:21:20] cassandra is very active on eb1008 [14:21:27] tornado supports multiple processes internally, so we can just tell it to run more [14:21:43] ottomata: yes, spawn more [14:21:45] OR [14:21:45] we could puppetize more processes on different ports, and configure LVS appropriately [14:22:02] both would work. 
i think we should try the tornado stuff first [14:22:13] the docs say that this may be more difficult to monitor, but i'm not sure why really [14:22:31] !log installing sys schema on dbs with performance_schema enabled [14:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:22:38] ottomata: tornado uses one master process that listen()s and workers that accept() connections? [14:23:37] mobrovac: http://www.tornadoweb.org/en/stable/guide/running.html#processes-and-ports [14:24:06] <_joe_> ottomata: lvs can only do 1:1 ports [14:24:18] <_joe_> so you might want to put nginx in front of it [14:24:38] <_joe_> lvs-dr, I mean, which is the one we use [14:24:47] and you were laughing at us for using nodejs for this [14:24:49] ;) [14:25:33] <_joe_> mobrovac: I think uwsgi is by far the best way to handle multiple runners for a webapp [14:25:46] <_joe_> that is, with python [14:26:05] <_joe_> for node, I used phusion passenger in the past and it worked pretty well [14:26:44] <_joe_> I am not against managing everything within the app, it just takes more glue to do things "correctly" [14:27:01] ottomata: requests tripled during the past hours as you were saying https://grafana.wikimedia.org/dashboard/db/eventbus?panelId=1&fullscreen [14:27:29] <_joe_> (passenger manages silent rolling restarts of the app, for instance, which is pretty nice and we don't have) [14:28:25] sure, but at this point using ngix and god-knows-what to scale the proxy service looks like super-overkill to me [14:28:27] (03PS1) 10Ottomata: Increase eventbus num processors to 8 [puppet] - 10https://gerrit.wikimedia.org/r/286851 [14:28:51] mobrovac: we try this first, in my local testing it well, we haven't tried it in prod though [14:29:00] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
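The 14:22 question (one master that listen()s, workers that accept()) describes the general pre-fork pattern. Below is a minimal stdlib sketch of that pattern, assuming a Unix host; it is not Tornado's code and not the eventbus service. `start_prefork_workers` and `ask_worker` are hypothetical names introduced for illustration only.

```python
import os
import socket


def start_prefork_workers(num_workers, requests_per_worker=1):
    """Bind one listening socket, then fork workers that all accept() on it.

    Hypothetical helper for illustration; returns (port, worker_pids).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.settimeout(5)  # idle workers give up instead of waiting forever
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: inherits the already-listening socket and competes with
            # its siblings for incoming connections on the same port.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            except OSError:
                pass  # timed out or socket error: just exit
            finally:
                os._exit(0)
        pids.append(pid)

    listener.close()  # the parent never accepts; the children keep the socket open
    return port, pids


def ask_worker(port):
    """Connect once and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while True:
            data = client.recv(64)
            if not data:
                break
            chunks.append(data)
    return int(b"".join(chunks))
```

Because every worker shares one port, a 1:1 port mapping in front of it (as noted for lvs-dr at 14:24) still reaches all workers; the trade-off raised in the channel is that per-worker monitoring and restarts need extra glue.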
[14:29:40] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5673207 keys - replication_delay is 0 [14:29:45] (03CR) 10Ottomata: [C: 032] Increase eventbus num processors to 8 [puppet] - 10https://gerrit.wikimedia.org/r/286851 (owner: 10Ottomata) [14:29:51] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [14:30:13] ottomata: we need to find a solution right now, everything is hung up on that service now [14:30:45] !log change-prop: stopping the service until we scale the http proxy service [14:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:31:38] mobrovac: is it not working? [14:32:04] (03PS1) 10Rush: scap access.conf entries for labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/286852 (https://phabricator.wikimedia.org/T121721) [14:32:30] from what i can tell it is still working mobrovac what's hung up? [14:32:47] ottomata: restbase started getting hung up because of proxy service time outs [14:33:14] hmm [14:33:20] ottomata: ETIMEDOUT for URI http://eventbus.svc.eqiad.wmnet:8085/v1/events [14:33:22] on rb [14:33:56] PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [14:34:15] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [14:34:19] mobrovac: just increased workers to 8 on each node [14:34:22] try now? 
[14:34:25] PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [14:34:26] ottomata: more than 100k of those in the last 15 mins [14:35:13] hmm, that is not good that eventbus doesn't give any indication of that hm. [14:35:25] sorry mobrovac i should have told yall not to go over a certain threshold until we upped processes [14:35:42] it had been running in initial deployment single process settings since we deployed it [14:36:00] ottomata: usually it should be ok, but we might have stumbled upon a template edit or something [14:36:56] ottomata: ok, starting changeprop back up, let's see [14:37:35] hm, nm, puppet started it for me it seems [14:37:56] RECOVERY - changeprop endpoints health on scb1002 is OK: All endpoints are healthy [14:38:16] PROBLEM - Check that eventlogging-service-eventbus is running on kafka1001 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [14:38:16] RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy [14:38:27] haha [14:38:30] shoudl change it to at least 1 [14:38:31] haha [14:39:16] PROBLEM - Check that eventlogging-service-eventbus is running on kafka2002 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [14:39:22] ottomata: ^^^ [14:39:29] yeah that's just a dumb alarm [14:39:36] PROBLEM - Check that eventlogging-service-eventbus is running on kafka2001 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [14:39:36] PROBLEM - 
restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:39:38] ah right [14:39:42] ottomata: chaging the alarm [14:39:45] restbase critical again [14:40:05] mobrovac: elukey noticed that 1002 hadn't applied the num procs setting, i just restarted eventbus now [14:41:13] hm checking things [14:41:27] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [14:41:53] mobrovac: hi there, saw mobileapps flapping, anything i can do to help right now? [14:42:30] hey there mdholloway, no all's under control so to speak [14:42:42] these are semi-expected [14:42:53] or at least, known [14:43:09] ok, cool [14:43:32] (03PS1) 10Muehlenhoff: Enable base::firewall on kraz [puppet] - 10https://gerrit.wikimedia.org/r/286856 [14:44:35] RECOVERY - puppet last run on restbase-test2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:45:56] PROBLEM - Check that eventlogging-service-eventbus is running on kafka1002 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [14:48:00] ACKNOWLEDGEMENT - Check that eventlogging-service-eventbus is running on kafka1001 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus Elukey Increased the number of processes, wrong check, the service is running. [14:48:00] ACKNOWLEDGEMENT - Check that eventlogging-service-eventbus is running on kafka1002 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus Elukey Increased the number of processes, wrong check, the service is running. 
[14:48:00] ACKNOWLEDGEMENT - Check that eventlogging-service-eventbus is running on kafka2001 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus Elukey Increased the number of processes, wrong check, the service is running. [14:48:00] ACKNOWLEDGEMENT - Check that eventlogging-service-eventbus is running on kafka2002 is CRITICAL: PROCS CRITICAL: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus Elukey Increased the number of processes, wrong check, the service is running. [14:49:28] elukey: you fixing process alarm? [14:49:33] just make it say at least one :) [14:49:52] ottomata: ah yeah trying to figure out where they are defined, but yes I am working on it :) [14:50:08] twentyafterfour: hey, here? [14:51:15] ottomata: elukey: how are the service and kafka doing? [14:51:32] elukey: in eventlogging/service/service.pp [14:51:47] near bottom [14:51:47] change [14:51:49] "/usr/lib/nagios/plugins/check_procs -c 1:1 [14:51:51] to "/usr/lib/nagios/plugins/check_procs -c 1: [14:52:15] thanks [14:52:17] (03PS1) 10Jcrespo: Cleanup my.cnf by grouping options and enable skip-slave-start [puppet] - 10https://gerrit.wikimedia.org/r/286858 [14:52:29] mobrovac: https://grafana.wikimedia.org/dashboard/db/eventbus - the last change seems to have improved the throughput [14:55:38] 06Operations, 07Performance: Package and deploy Mcrouter as a replacement for twemproxy - https://phabricator.wikimedia.org/T132317#2264146 (10Joe) I just have one doubt about this: are we using the memcached binary protocol at all anywhere? [14:56:03] hmm elukey kafka1001 is the leader for all topics! 
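The check_procs change at 14:51 (`-c 1:1` to `-c 1:`) can be read through the standard Nagios threshold range syntax. Below is a minimal sketch of that syntax, assuming check_procs follows the usual plugin guidelines; `parse_range` and `is_critical` are hypothetical helper names, not part of any Nagios package.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range into (low, high, inside_alerts).

    "10"    -> 0..10
    "10:"   -> 10..infinity
    "~:10"  -> -infinity..10
    "1:1"   -> exactly 1
    "@1:5"  -> alert when the value is INSIDE 1..5
    """
    inside_alerts = spec.startswith("@")
    if inside_alerts:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "0", spec
    low = -math.inf if low_text == "~" else float(low_text or 0)
    high = math.inf if high_text == "" else float(high_text)
    return low, high, inside_alerts


def is_critical(process_count, spec):
    """Return True when process_count violates the critical range."""
    low, high, inside_alerts = parse_range(spec)
    inside = low <= process_count <= high
    return inside if inside_alerts else not inside


# The master plus 8 workers gives 9 matching processes.
print(is_critical(9, "1:1"))  # old check: exactly one process required
print(is_critical(9, "1:"))   # new check: at least one process required
```

With the old range the 9 processes fall outside 1..1, which matches the PROCS CRITICAL alerts above; with `1:` any count of one or more passes.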
[14:56:17] or, at least ones that are getting data [14:56:38] hmm, that isn't true [14:56:39] hm [14:56:56] i'm just looking at messages in per sec and seeing that kafka1001 has way ore [14:56:56] ottomata: EB is now handling 1.3 K msg/sec [14:56:57] more [14:57:08] 06Operations, 07Performance: Package and deploy Mcrouter as a replacement for twemproxy - https://phabricator.wikimedia.org/T132317#2264149 (10Joe) a:03Joe [14:57:21] elukey: ottomata: that's likely changeprop catching up [14:57:57] (03CR) 10Jcrespo: [C: 032] Depool db1065 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286841 (https://phabricator.wikimedia.org/T133250) (owner: 10Jcrespo) [14:59:04] (03PS1) 10Elukey: Require at least one Event Logging process running for EventBus. [puppet] - 10https://gerrit.wikimedia.org/r/286859 [14:59:20] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1065 for maintenance (duration: 00m 35s) [14:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:59:53] mobrovac: i think resource_change having only 1 partition is going to be a problem [15:00:05] anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T1500). [15:00:05] Nikerabbit Thiemo_WMDE: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:13] (03CR) 10Elukey: [C: 032] Require at least one Event Logging process running for EventBus. 
[puppet] - 10https://gerrit.wikimedia.org/r/286859 (owner: 10Elukey) [15:00:19] hey, cmjohnson1 [15:00:20] o7 [15:00:25] jouncebot: ping [15:01:08] jouncebot: next [15:01:08] In 0 hour(s) and 58 minute(s): UploadsLink deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T1600) [15:01:12] o/ [15:01:19] I can SWAT this morning [15:01:32] ottomata: yeah, possibly [15:02:03] (03PS4) 10Thcipriani: Translate: Use Apertium via cxserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [15:02:18] (03CR) 10KartikMistry: [C: 031] Translate: Use Apertium via cxserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [15:02:32] :) [15:02:40] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [15:02:58] mobrovac: yeah, it means that all messages are produced to a single broker [15:03:01] that won't scale well [15:03:21] (03Merged) 10jenkins-bot: Translate: Use Apertium via cxserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286632 (https://phabricator.wikimedia.org/T133008) (owner: 10Nikerabbit) [15:04:27] Hi. [15:04:51] (03PS1) 10Jcrespo: Depool db1065, now for real [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286862 [15:05:26] Dereckson: o/ [15:06:39] ^I need to apply this for hardware issues [15:06:56] RECOVERY - Check that eventlogging-service-eventbus is running on kafka2001 is OK: PROCS OK: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [15:07:05] ottomata --^ [15:07:06] thank you Luke081515 ! :) [15:07:08] oops! [15:07:09] ahh [15:07:10] elukey: [15:07:13] :) [15:07:13] jynus: kk, one sec, I'll sync and get out of your way. 
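The concern about resource_change having a single partition can be illustrated with a toy model: each Kafka partition has exactly one leader broker, and producers write to that leader. The sketch below is a simplification under stated assumptions: round-robin leader placement and a CRC32 key hash stand in for Kafka's real placement and partitioner, and `assign_leaders` / `produce_counts` are hypothetical names.

```python
import zlib
from collections import Counter


def assign_leaders(num_partitions, brokers):
    """Assumed round-robin leader placement; real Kafka placement differs."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def produce_counts(keys, num_partitions, brokers):
    """Count how many messages each broker receives as partition leader."""
    leaders = assign_leaders(num_partitions, brokers)
    counts = Counter()
    for key in keys:
        # CRC32 is a stand-in hash so the sketch is deterministic.
        partition = zlib.crc32(key.encode()) % num_partitions
        counts[leaders[partition]] += 1
    return counts


brokers = ["kafka1001", "kafka1002"]
keys = ["page-%d" % i for i in range(1000)]

# One partition: every message lands on the single leader.
print(produce_counts(keys, 1, brokers))

# Several partitions: the load can spread across leaders.
print(produce_counts(keys, 4, brokers))
```

In this model, adding workers to the HTTP proxy raises how fast events can be accepted, but with one partition the write load for the topic still converges on one broker.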
[15:07:17] running puppet on the others [15:07:39] elukey: let's keep an eye on this eventbus num_processes thing, especially since this is the first time we've run it like this in prod [15:08:28] !log thcipriani@tin Synchronized wmf-config/ProductionServices.php: SWAT: Translate: Use Apertium via cxserver Part I [[gerrit:286632]] (duration: 00m 29s) [15:08:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:09:01] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Translate: Use Apertium via cxserver Part II [[gerrit:286632]] (duration: 00m 28s) [15:09:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:09:14] ^ Nikerabbit check please [15:09:29] https://test.wikipedia.org/w/api.php?action=translationaids&format=jsonfm&title=Translations%3ATranslate+test%2FPage+display+title%2Fca as expected [15:09:34] jynus: feel free to sync depool [15:09:51] thcipriani, thanks, it will take 5 seconds only [15:10:21] (03CR) 10Jcrespo: [C: 032] Depool db1065, now for real [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286862 (owner: 10Jcrespo) [15:10:37] also works on meta [15:11:22] 06Operations, 06Discovery, 03Discovery-Search-Sprint, 07Elasticsearch, 13Patch-For-Review: Publish "pending_tasks" count from Elastic search cluster to graphite - https://phabricator.wikimedia.org/T134240#2264234 (10Gehel) Data is now available in the [[ https://grafana.wikimedia.org/dashboard/db/elastic... [15:11:30] Nikerabbit: awesome, thanks for checking. 
[15:11:39] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Really depool db1065 for maintenance (duration: 00m 26s) [15:11:40] Dereckson, https://gerrit.wikimedia.org/r/286812 looks like something that should be well tested before proper deployment [15:11:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:11:52] thcipriani, done [15:11:57] Thiemo_WMDE: ping for SWAT [15:12:02] jynus: thanks :) [15:12:44] mediawiki.org ok https://www.mediawiki.org/w/index.php?title=Special:Translate&group=page-Educational+hub&language=ca&filter=!translated&action=translate [15:12:55] Krenair: any idea for test? [15:13:32] 07Blocked-on-Operations, 06Operations, 06Services, 06WMDE-Analytics-Engineering, and 2 others: scale graphite deployment (tracking) - https://phabricator.wikimedia.org/T85451#2264239 (10GWicke) [15:13:48] thcipriani: I'm here. [15:13:57] PROBLEM - Check for gridmaster host resolution UDP on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [15:14:00] Krenair: what about first sync on mw1017, and test there if the namespaces still work? [15:14:21] dblist settings without tricky + are trustable by the way [15:14:26] RECOVERY - Check that eventlogging-service-eventbus is running on kafka1001 is OK: PROCS OK: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [15:15:31] Dereckson, first put it on tin and use eval.php/mwrepl to check the resulting variables, then sync-common on mw1017 and test that way [15:15:50] Okay I've an idea of testing method: 1. deploy on mw1017. 2. mwrepl > $wgNamespaceAliases, ensure we still have a NS_PROJECT (okay we share the same idea) [15:16:00] Thiemo_WMDE: change looks fine, this only be in effect on testwikis until they roll forward later this afternoon, just FYI. 
[15:16:14] other way around [15:16:20] So mwrepl on Tin, testing through X-Wikimedia-Debug on mw1017 browser side second? [15:16:30] 06Operations, 07Performance, 05codfw-rollout: Package and deploy Mcrouter as a replacement for twemproxy - https://phabricator.wikimedia.org/T132317#2264258 (10Joe) [15:16:33] thcipriani: Sounds correct for me. [15:16:38] kk [15:17:19] yes [15:17:27] PROBLEM - Check for gridmaster host resolution UDP on labs-ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [15:17:57] RECOVERY - Check for gridmaster host resolution UDP on labs-ns1.wikimedia.org is OK: DNS OK - 0.072 seconds response time (tools-grid-master.tools.eqiad.wmflabs. 60 IN A 10.68.20.158) [15:17:57] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: puppet fail [15:18:21] 06Operations, 07Performance, 05codfw-rollout: Package and deploy Mcrouter as a replacement for twemproxy - https://phabricator.wikimedia.org/T132317#2194315 (10Joe) this would also allow replicating redis traffic, but there is one big caveat: data would flow unencrypted between datacenters AFAICS, and that i... [15:18:22] ok, so on the next patch, I'll merge, run a sync-common on tin, check with the repl, sync-common mw1017, check with repl, sync-file from tin. Sound right Krenair Dereckson ? [15:18:33] (03PS1) 10Eevans: Update collector version (both branches) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/286865 (https://phabricator.wikimedia.org/T134016) [15:19:00] 06Operations, 06Performance-Team, 10Traffic, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2264271 (10Jdforrester-WMF) \o/ [15:19:05] yes [15:19:13] * Dereckson nods. [15:19:16] (03CR) 10Ottomata: [C: 031] "Danke!" 
[puppet] - 10https://gerrit.wikimedia.org/r/286852 (https://phabricator.wikimedia.org/T121721) (owner: 10Rush) [15:19:18] RECOVERY - Check for gridmaster host resolution UDP on labs-ns0.wikimedia.org is OK: DNS OK - 0.125 seconds response time (tools-grid-master.tools.eqiad.wmflabs. 60 IN A 10.68.20.158) [15:19:33] (03PS2) 10Thcipriani: Remove redundant NS_PROJECT entries from wgNamespacesAliases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286812 (https://phabricator.wikimedia.org/T131023) (owner: 10Dereckson) [15:19:58] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286812 (https://phabricator.wikimedia.org/T131023) (owner: 10Dereckson) [15:21:44] (03Merged) 10jenkins-bot: Remove redundant NS_PROJECT entries from wgNamespacesAliases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286812 (https://phabricator.wikimedia.org/T131023) (owner: 10Dereckson) [15:24:53] !log changed catchpoint 'Static Assets' checks from (deprecated) https://bits.wikimedia.org/static-current/resources/assets/poweredby_mediawiki_88x31.png to https://meta.wikimedia.org/w/resources/assets/poweredby_mediawiki_88x31.png - T107430 [15:24:53] T107430: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430 [15:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:25:36] (03PS1) 10Hashar: contint: move File[/srv/localhost-worker] out of role [puppet] - 10https://gerrit.wikimedia.org/r/286869 [15:26:16] 06Operations, 10Traffic, 06Zero, 13Patch-For-Review: Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two - https://phabricator.wikimedia.org/T124482#2264298 (10BBlack) Checked watchmouse + catchpoint, didn't find any hardcoded IP refs there (but did find a bits.wm.o ref to kill in wat... 
[15:26:28] 06Operations, 10DBA, 07Performance, 07RfC, 05codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2234475 (10jcrespo) [15:27:03] Dereckson: Krenair angwiki print var_dump($wgNamespaceAliases['Wikipedia'] === NS_PROJECT); bool(true) [15:27:05] on tin [15:27:46] going to mw1017 [15:28:06] !log restarting elasticsearch server elastic1001.eqiad.wmnet (T110236) [15:28:07] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [15:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:29:33] Dereckson: Krenair live on mw1017 [15:29:53] https://de.wikivoyage.org/wiki/WV:Foo stills redirect to https://de.wikivoyage.org/wiki/Wikivoyage:Foo @ mw1017 [15:29:58] (03PS2) 10BBlack: decom mobile IPs from LVS/caches [puppet] - 10https://gerrit.wikimedia.org/r/285229 (https://phabricator.wikimedia.org/T124482) [15:30:57] RECOVERY - Check that eventlogging-service-eventbus is running on kafka2002 is OK: PROCS OK: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [15:31:04] kk, going ahead with the sync-file [15:31:46] looks ok [15:32:05] (03PS2) 10BBlack: mobile IP DNS decom [dns] - 10https://gerrit.wikimedia.org/r/285227 (https://phabricator.wikimedia.org/T124482) [15:32:16] (03CR) 10Mobrovac: [C: 031] Update collector version (both branches) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/286865 (https://phabricator.wikimedia.org/T134016) (owner: 10Eevans) [15:33:08] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove redundant NS_PROJECT entries from wgNamespacesAliases [[gerrit:286812]] (duration: 00m 34s) [15:33:09] ^ Dereckson check please [15:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:33:48] thcipriani: looks 
good [15:33:58] Dereckson: cool, thanks! [15:33:59] (tested ang.wikt, ang.wikip, de.wikiv) [15:34:14] 06Operations, 10DBA, 07Performance, 07RfC, 05codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2264329 (10jcrespo) One option would be to shift sharding to the same model than external storage. Shard by a static key value, and separate... [15:34:42] Hmmm, strange it's good also on outreach. [15:34:48] !log removing old mobile IPs from actual production config (no longer in use) - T124482 [15:34:49] T124482: Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two - https://phabricator.wikimedia.org/T124482 [15:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:35:12] ah no it's not [15:35:39] thcipriani: I'm submitting a fix to restore it for Outreach (not in wikipedia dblist) [15:36:04] Dereckson: ok [15:36:39] !log thcipriani@tin Synchronized php-1.27.0-wmf.23/includes/htmlform/HTMLFormField.php: SWAT: Fix HTMLFormField calling Message::setContext with null [[gerrit:286855]] (duration: 00m 25s) [15:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:36:54] Thiemo_WMDE: ^ check please [15:38:08] (03PS1) 10Hashar: contint: move package_builder setup to its own class [puppet] - 10https://gerrit.wikimedia.org/r/286873 [15:38:31] 06Operations, 10ops-codfw: sinistra - RAID failure - https://phabricator.wikimedia.org/T134187#2264350 (10Papaul) failed disk: slot 3 [15:38:42] (03PS3) 10Thcipriani: Shakespeare in London throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) (owner: 10Dereckson) [15:38:58] (03PS1) 10Dereckson: Restore Wikipedia: namespace alias on outreach [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286875 [15:39:02] Here you are ^ [15:39:04] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) (owner: 10Dereckson) [15:39:17] thcipriani: Checked, the bug I had disappeared. Thanks! [15:39:28] Thiemo_WMDE: great! Thanks for checking. [15:39:42] (03Merged) 10jenkins-bot: Shakespeare in London throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286807 (https://phabricator.wikimedia.org/T134353) (owner: 10Dereckson) [15:40:29] (286875 added to [[Deployments]]) [15:41:46] (03PS2) 10Thcipriani: Restore Wikipedia: namespace alias on outreach [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286875 (owner: 10Dereckson) [15:41:56] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286875 (owner: 10Dereckson) [15:42:32] (03Merged) 10jenkins-bot: Restore Wikipedia: namespace alias on outreach [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286875 (owner: 10Dereckson) [15:43:47] !log thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: Shakespeare in London throttle rule [[gerrit:286807]] (duration: 00m 26s) [15:43:54] ^ Dereckson throttle rule sync'd [15:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:44:34] ok [15:44:54] (03PS1) 10Dereckson: Enable NewUserMessage for SUL accounts too on gu.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286877 (https://phabricator.wikimedia.org/T134253) [15:45:11] May I add this change to the SWAT? 
^ [15:45:33] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Restore Wikipedia: namespace alias on outreach [[gerrit:286875]] (duration: 00m 27s) [15:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:45:41] Dereckson: ^ check please [15:45:42] RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:45:55] (03CR) 10BBlack: [C: 032] decom mobile IPs from LVS/caches [puppet] - 10https://gerrit.wikimedia.org/r/285229 (https://phabricator.wikimedia.org/T124482) (owner: 10BBlack) [15:46:05] 286875 outreach tested, works fine [15:46:07] Dereckson: sure, you can add that to SWAT. [15:46:25] thanks, editing the wikitech table [15:46:41] (03PS2) 10Thcipriani: Namespace configuration for gl.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286811 (https://phabricator.wikimedia.org/T134041) (owner: 10Dereckson) [15:46:53] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286811 (https://phabricator.wikimedia.org/T134041) (owner: 10Dereckson) [15:47:35] (03Merged) 10jenkins-bot: Namespace configuration for gl.wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286811 (https://phabricator.wikimedia.org/T134041) (owner: 10Dereckson) [15:48:43] (03PS1) 10Hashar: contint: regroup PHP definitions in contint::packages::php [puppet] - 10https://gerrit.wikimedia.org/r/286879 [15:50:13] !log thcipriani@tin Synchronized php-1.27.0-wmf.23/extensions/ProofreadPage/ProofreadPage.namespaces.php: SWAT: Localize namespaces Page and Index in Galician [[gerrit:286820]] (duration: 00m 26s) [15:50:19] ^ Dereckson check please [15:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:51:33] thcipriani: hmmmm non testable right now, gl.wikisource is still wmf22 until today MediaWiki train [15:51:55] Dereckson: right. ok. 
[15:51:55] (03PS2) 10Hashar: contint: drop libcurl4-gnutls-dev [puppet] - 10https://gerrit.wikimedia.org/r/286837 (https://phabricator.wikimedia.org/T134378) [15:52:45] (03CR) 10Hashar: "Rebased on top of https://gerrit.wikimedia.org/r/286879 which introduces contint::packages::php . This way the absent is in the same class" [puppet] - 10https://gerrit.wikimedia.org/r/286837 (https://phabricator.wikimedia.org/T134378) (owner: 10Hashar) [15:52:57] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Namespace configuration for gl.wikisource [[gerrit:286811]] (duration: 00m 27s) [15:53:01] ^ Dereckson check please [15:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:53:32] Looks good. [15:53:35] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2264482 (10RobH) I've created a case with Juniper support for the cr1-ulsfo power supply replacement. 2016-0504-0555 [15:53:51] (03PS2) 10Thcipriani: Enable NewUserMessage for SUL accounts too on gu.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286877 (https://phabricator.wikimedia.org/T134253) (owner: 10Dereckson) [15:54:02] 06Operations, 10Continuous-Integration-Config: Create a CI check for puppet/mediawiki-config to detect misspelled hostnames - https://phabricator.wikimedia.org/T134399#2264486 (10faidon) [15:54:03] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286877 (https://phabricator.wikimedia.org/T134253) (owner: 10Dereckson) [15:54:43] (03Merged) 10jenkins-bot: Enable NewUserMessage for SUL accounts too on gu.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286877 (https://phabricator.wikimedia.org/T134253) (owner: 10Dereckson) [15:56:50] 06Operations, 10Continuous-Integration-Config, 06Release-Engineering-Team: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264503 
(10hashar) [15:57:00] (03CR) 10Ottomata: Add external database configuration for Hue (Analytics) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [15:57:03] 06Operations, 10Continuous-Integration-Config: Create a CI check for puppet/mediawiki-config to detect misspelled hostnames - https://phabricator.wikimedia.org/T134399#2264486 (10hashar) [15:57:05] 06Operations, 10Continuous-Integration-Config, 06Release-Engineering-Team: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2218145 (10hashar) [15:58:56] 06Operations, 10ops-codfw, 06DC-Ops, 10Traffic: lvs2002 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T126321#2264512 (10Papaul) @BBlack Okay, please let me know when you want to do this. Thanks. [15:59:30] Dereckson: sorry, laptop froze [15:59:57] 06Operations: Puppet-manage redis.conf - https://phabricator.wikimedia.org/T134400#2264513 (10faidon) [16:00:04] legoktm rillke: Dear anthropoid, the time has come. Please deploy UploadsLink deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T1600).
[16:00:45] !log REALLY (from active LVS) removing old mobile IPs from actual production config (no longer in use) - T124482 [16:00:46] T124482: Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two - https://phabricator.wikimedia.org/T124482 [16:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:02:07] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage for SUL accounts too on gu.wikiquote [[gerrit:286877]] (duration: 00m 28s) [16:02:14] ^ Dereckson check please [16:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:03:04] !log restarting elasticsearch server elastic1002.eqiad.wmnet (T110236) [16:03:05] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [16:03:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:03:59] 06Operations, 10DBA, 07Performance, 07RfC, 05codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2234475 (10GWicke) Another option worth considering would be to store the parser cache in Cassandra, and leverage its multi-DC and sharding f... [16:04:59] 06Operations, 10ops-eqiad, 06DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2264556 (10RobH) a:05RobH>03Cmjohnson I'm assigning this to Chris, since he spun up the test and will have to do the disk wipe. Since it seems there is agreement to not... [16:06:58] thcipriani: I've asked two users to visit https://gu.wikiquote.org/ and I don't see any welcome message. 
[16:07:24] 06Operations, 10Continuous-Integration-Config, 06Release-Engineering-Team: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264567 (10hashar) ```grep: the -P option only supports a single pattern``` Will need to use Extended Regular Expressions. ``` $ cat wrong... [16:08:11] 06Operations, 10MediaWiki-General-or-Unknown, 10Monitoring, 06Performance-Team: edit.success in graphite never reached zero during codfw switchover - https://phabricator.wikimedia.org/T133177#2264569 (10faidon) [16:08:14] But then, during previous deployments, it was only after some hours we had NewUserMessage ok feedback. [16:08:15] thcipriani: are you still SWATing? [16:08:49] Dereckson: hmm, the code has definitely sync'd. [16:08:54] legoktm: just finished. [16:08:58] 06Operations, 10Continuous-Integration-Config, 06Release-Engineering-Team: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264572 (10hashar) And James pointed to https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/master/build/typos.json [16:10:30] (03PS3) 10Legoktm: Add UploadsLink to production extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286494 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [16:10:32] (03PS4) 10Legoktm: Enable UploadsLink at Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [16:10:47] (03CR) 10Legoktm: [C: 032] Add UploadsLink to production extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286494 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [16:11:27] (03CR) 10BBlack: [C: 032] mobile IP DNS decom [dns] - 10https://gerrit.wikimedia.org/r/285227 (https://phabricator.wikimedia.org/T124482) (owner: 10BBlack) [16:12:19] (03Merged) 10jenkins-bot: Add UploadsLink to production extension-list [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/286494 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [16:12:46] 06Operations, 10Traffic, 07Performance: missing SPDY coalesce for upload.wm.o for images ref'd in projects' page outputs - https://phabricator.wikimedia.org/T116132#2264583 (10BBlack) [16:12:47] thcipriani: I've checked their on wiki config, it looks good, we'll see in the next hours if some users receive it or not. [16:12:59] Dereckson: ok, sounds good. Thank you for checking. [16:13:05] Oh wonderful: https://gu.wikiquote.org/wiki/%E0%AA%B8%E0%AA%AD%E0%AB%8D%E0%AA%AF%E0%AA%A8%E0%AB%80_%E0%AA%9A%E0%AA%B0%E0%AB%8D%E0%AA%9A%E0%AA%BE:El_pitareio [16:13:08] We've got one :) [16:13:12] So tested, works fine. [16:13:40] !log legoktm@tin Started scap: Build l10n cache for UploadsLink deployment - T130018 [16:13:41] T130018: Review and deploy Extension:UploadsLink to Wikimedia Commons - https://phabricator.wikimedia.org/T130018 [16:13:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:14:21] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003, stat1002 and bast1001 for Dstrine - https://phabricator.wikimedia.org/T133953#2264587 (10DStrine) >>! In T133953#2250517, @Dzahn wrote: > @DStrine While we are awaiting manager approval, you could already create a SSH keypair and upload the pu... 
[16:15:18] 06Operations, 06Analytics-Kanban, 10DNS, 10Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2264602 (10Ottomata) [16:15:22] (03PS1) 10Andrew Bogott: Increase the cache size for the Labs dns recursor [puppet] - 10https://gerrit.wikimedia.org/r/286897 (https://phabricator.wikimedia.org/T124680) [16:15:31] 06Operations, 06Analytics-Kanban, 10DNS, 10Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2197243 (10Ottomata) a:03Ottomata [16:19:14] !log restarting blazegraph (T134238) [16:19:15] T134238: Query service fails with "Too many open files" - https://phabricator.wikimedia.org/T134238 [16:20:12] (03Abandoned) 10Addshore: Add ganglia link to codfw too [software/tendril] - 10https://gerrit.wikimedia.org/r/284184 (owner: 10Addshore) [16:21:14] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 709 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5690294 keys - replication_delay is 709 [16:22:24] 06Operations, 10DBA, 07Performance, 07RfC, 05codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#2264634 (10aaron) >>! In T133523#2264538, @GWicke wrote: > Another option worth considering would be to store the parser cache in Cassandra,... 
[16:25:13] (03PS2) 10Andrew Bogott: Increase the cache size for the Labs dns recursor [puppet] - 10https://gerrit.wikimedia.org/r/286897 (https://phabricator.wikimedia.org/T124680) [16:27:38] (03PS1) 10ArielGlenn: use full paths and explicitly invoke bash for kiwix rsync cron script [puppet] - 10https://gerrit.wikimedia.org/r/286902 [16:30:02] (03CR) 10ArielGlenn: [C: 032] use full paths and explicitly invoke bash for kiwix rsync cron script [puppet] - 10https://gerrit.wikimedia.org/r/286902 (owner: 10ArielGlenn) [16:30:13] (03CR) 10Andrew Bogott: [C: 032] Increase the cache size for the Labs dns recursor [puppet] - 10https://gerrit.wikimedia.org/r/286897 (https://phabricator.wikimedia.org/T124680) (owner: 10Andrew Bogott) [16:31:38] (03PS3) 10Andrew Bogott: Increase the cache size for the Labs dns recursor [puppet] - 10https://gerrit.wikimedia.org/r/286897 (https://phabricator.wikimedia.org/T124680) [16:31:40] (03PS1) 10Andrew Bogott: Labs DNS: Change the cache ttls back to defaults. [puppet] - 10https://gerrit.wikimedia.org/r/286905 (https://phabricator.wikimedia.org/T124680) [16:32:04] (03Abandoned) 10Bmansurov: Reduce sampling rate for language switcher [mediawiki-config] - 10https://gerrit.wikimedia.org/r/272724 (https://phabricator.wikimedia.org/T127212) (owner: 10Bmansurov) [16:32:14] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: puppet fail [16:34:33] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [16:35:06] 06Operations, 10media-storage: Look into enabling HTTPS for Swift traffic - https://phabricator.wikimedia.org/T127455#2264650 (10fgiunchedi) a:03fgiunchedi [16:36:15] 06Operations, 10Traffic: Upgrade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2264652 (10BBlack) [16:36:17] 06Operations, 10Traffic, 07HTTPS: Outbound HTTPS for varnish backend instances - https://phabricator.wikimedia.org/T109325#2264651 (10BBlack) [16:36:35] 06Operations, 
06Performance-Team, 10Traffic, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2264653 (10ori) Thanks so much, @bblack. Rolling this out at this pace with no significant interruptions is very impressive. [16:37:06] !log restarting elasticsearch server elastic1003.eqiad.wmnet (T110236) [16:37:07] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [16:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:44:33] 06Operations, 06Discovery, 03Discovery-Search-Sprint, 07Elasticsearch, 13Patch-For-Review: Publish "pending_tasks" count from Elastic search cluster to graphite - https://phabricator.wikimedia.org/T134240#2264663 (10EBernhardson) should we also get an icinga alert in there? (maybe different task though) [16:46:36] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003, stat1002 and bast1001 for Dstrine - https://phabricator.wikimedia.org/T133953#2264667 (10K4-713) I approve this access request. 
[16:47:31] 06Operations, 10Traffic, 05codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2264674 (10BBlack) [16:48:41] 06Operations, 10Traffic, 05codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2264708 (10BBlack) [16:48:43] 06Operations, 10Traffic, 05codfw-rollout: Enable VCL source-DC switching via confd - https://phabricator.wikimedia.org/T127482#2264707 (10BBlack) [16:49:10] 06Operations, 10Traffic, 05codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2264674 (10BBlack) [16:49:29] 06Operations, 10Traffic, 05codfw-rollout: Enable VCL applayer datacenter-switch via confd - https://phabricator.wikimedia.org/T127485#2264709 (10BBlack) [16:50:13] 06Operations, 10Traffic, 07HTTPS, 05codfw-rollout: Outbound HTTPS for varnish backend instances - https://phabricator.wikimedia.org/T109325#2264715 (10BBlack) [16:50:25] 06Operations, 10Traffic, 10Wikimedia-Apache-configuration, 07HTTPS, 05codfw-rollout: Enable HTTPS on internal MediaWiki appserver virtual service hostnames - https://phabricator.wikimedia.org/T109315#2264716 (10BBlack) [16:50:54] 06Operations, 10Traffic, 07HTTPS, 05codfw-rollout: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580#2264717 (10BBlack) [16:51:25] paravoid: sorry I didn't see your ping before, what's up? 
[16:52:42] 06Operations, 10ops-codfw: rack/setup/deploy maps200[1-4] - https://phabricator.wikimedia.org/T134406#2264720 (10Papaul) [16:54:01] 06Operations, 10ops-codfw: rack/setup/deploy maps200[1-4] - https://phabricator.wikimedia.org/T134406#2264740 (10Papaul) p:05Triage>03Normal a:03Papaul [16:56:05] (03PS5) 10Elukey: Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) [16:56:30] 06Operations, 10ops-codfw: sinistra - RAID failure - https://phabricator.wikimedia.org/T134187#2264747 (10Papaul) Dell Customer Communication Hi Papaul, I’ve just submitted dispatch # 318524567 for this hard drive to arrive tomorrow. Let me know when you get it, and can confirm the issue is resolved.... [16:58:34] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:00:56] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003, stat1002 and bast1001 for Dstrine - https://phabricator.wikimedia.org/T133953#2264782 (10Dzahn) @DStrine [[ https://www.mediawiki.org/wiki/Gerrit/Tutorial#Generate_a_new_SSH_key | This ]] is about Gerrit but the "Generate a new SSH key" sectio... 
[17:01:56] ^ didnt we also have a link that explained ssh-keygen etc but was production specific [17:03:00] (03CR) 10Ottomata: [C: 031] Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [17:03:28] (03CR) 10Elukey: [C: 032] Add external database configuration for Hue (Analytics) [puppet] - 10https://gerrit.wikimedia.org/r/286845 (https://phabricator.wikimedia.org/T127990) (owner: 10Elukey) [17:03:30] PROBLEM - HHVM rendering on mw1213 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.005 second response time [17:05:16] (03PS2) 10BBlack: remove do_spdy conditional, all h2, 1/2 [puppet] - 10https://gerrit.wikimedia.org/r/286818 (https://phabricator.wikimedia.org/T96848) [17:05:25] 06Operations, 05codfw-rollout: test2wiki has no recent changes before the 20 april - https://phabricator.wikimedia.org/T133225#2225647 (10aaron) AFAIK, ori ran the queries (deleting rows for sanity to time overlaps) and the RC rebuild scripts during the last roll-out. I wonder if this is related to that? 
[17:05:30] RECOVERY - HHVM rendering on mw1213 is OK: HTTP OK: HTTP/1.1 200 OK - 68412 bytes in 0.107 second response time [17:05:34] (03CR) 10BBlack: [C: 032 V: 032] remove do_spdy conditional, all h2, 1/2 [puppet] - 10https://gerrit.wikimedia.org/r/286818 (https://phabricator.wikimedia.org/T96848) (owner: 10BBlack) [17:05:34] !log legoktm@tin Finished scap: Build l10n cache for UploadsLink deployment - T130018 (duration: 51m 53s) [17:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:06:31] (03PS2) 10BBlack: remove do_spdy hieradata, all h2, 2/2 [puppet] - 10https://gerrit.wikimedia.org/r/286819 (https://phabricator.wikimedia.org/T96848) [17:06:43] T130018: Review and deploy Extension:UploadsLink to Wikimedia Commons - https://phabricator.wikimedia.org/T130018 [17:06:54] (03CR) 10Legoktm: [C: 032] Enable UploadsLink at Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [17:07:21] (03Merged) 10jenkins-bot: Enable UploadsLink at Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [17:07:55] (03PS1) 10Elukey: Revert "Add external database configuration for Hue (Analytics)" [puppet] - 10https://gerrit.wikimedia.org/r/286913 [17:08:03] (03PS2) 10Elukey: Revert "Add external database configuration for Hue (Analytics)" [puppet] - 10https://gerrit.wikimedia.org/r/286913 [17:09:06] !log legoktm@tin Synchronized wmf-config/: Enable UploadsLink at Wikimedia Commons - T130018 (duration: 00m 43s) [17:09:07] T130018: Review and deploy Extension:UploadsLink to Wikimedia Commons - https://phabricator.wikimedia.org/T130018 [17:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:09:29] \o/ [17:10:36] all done :) [17:10:45] Steinsplitter: you can disable the MyUploads gadget now [17:11:39] Yay more Multimedia extensions. 
;-) [17:11:47] (03PS1) 10Elukey: Add database_name to hiera config. [puppet] - 10https://gerrit.wikimedia.org/r/286915 [17:12:17] (03CR) 10Elukey: [C: 032 V: 032] Add database_name to hiera config. [puppet] - 10https://gerrit.wikimedia.org/r/286915 (owner: 10Elukey) [17:14:11] !log restarting elasticsearch server elastic1004.eqiad.wmnet (T110236) [17:14:12] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [17:14:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:14:44] (03PS3) 10BBlack: remove do_spdy hieradata, all h2, 2/2 [puppet] - 10https://gerrit.wikimedia.org/r/286819 (https://phabricator.wikimedia.org/T96848) [17:14:54] (03CR) 10BBlack: [C: 032 V: 032] remove do_spdy hieradata, all h2, 2/2 [puppet] - 10https://gerrit.wikimedia.org/r/286819 (https://phabricator.wikimedia.org/T96848) (owner: 10BBlack) [17:15:19] twentyafterfour: hey, sorry, I was in a meeting -- are you here now perhaps? 
[17:16:03] 06Operations, 10Traffic, 07Performance: missing SPDY coalesce for upload.wm.o for images ref'd in projects' page outputs - https://phabricator.wikimedia.org/T116132#2264938 (10BBlack) [17:16:21] 06Operations, 06Performance-Team, 10Traffic, 13Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2264936 (10BBlack) 05Open>03Resolved a:03BBlack [17:17:23] 765614 Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [17:17:26] sigh [17:17:31] that was just temporary [17:17:57] paravoid: I am here now [17:18:03] awesome :) [17:18:03] so [17:18:06] 06Operations, 10Traffic, 07Performance: missing H2 coalesce for upload.wm.o for images ref'd in projects' page outputs - https://phabricator.wikimedia.org/T116132#2264939 (10BBlack) [17:18:34] https://phabricator.wikimedia.org/T132078#2250108 [17:18:50] if you click on either one of the two attachments, and then on "Download file" [17:19:08] you get a phab exception, "Declining to emit response with unsafe HTTP header: <'Content-Disposition', 'attachment [...]" [17:19:26] weird.. [17:19:29] yeah :) [17:21:09] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0] [17:21:10] paravoid: from the code in question...
// Attackers may perform an "HTTP response splitting" attack by making [17:21:12] // the application emit certain types of headers containing newlines: [17:21:40] PHP has built-in protections against HTTP response-splitting, but they are of dubious trustworthiness: http://news.php.net/php.internals/57655 [17:22:30] so I wonder where the newline is introduced [17:23:30] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5684019 keys - replication_delay is 0 [17:24:07] we did just recently deploy a varnishd code patch designed to prevent response-splitting, too [17:24:14] I'm not really sure how relevant that is to all the above [17:25:02] the timing just seems odd. that went out like yesterday, and here it is being discussed in an open question above [17:25:55] I'm thinking the filename there has a newline or null character ... [17:26:09] it doesn't seem like phab scrubs the filenames before setting the header [17:26:13] I'll write up a patch [17:27:34] <3 [17:29:03] bblack: do you have the varnishd patch id handy? [17:29:06] I'm curious! [17:29:48] paravoid: https://github.com/varnish/Varnish-Cache/commit/85e8468bec9416bd7e16b0d80cb820ecd2b330c3 [17:30:08] legoktm: done [17:30:17] that commit was only in 3.0.7, which we never upgraded to for reasons related to other patch conflicts with custom legacy plus stuff [17:30:21] interesting! 
[17:30:26] paravoid: ...indeed it's just phab isn't sanitizing the filenames [17:30:33] moritzm rolled that and the other safe-ish 3.0.7 bugfixes into a new 3.0.6 package for us [17:30:43] * twentyafterfour just live-hacked a fix, will submit a proper patch after scrum of scrums [17:30:50] and I pushed that package out over the past couple of days [17:30:59] see also: http://www.openwall.com/lists/oss-security/2016/04/16/1 [17:31:06] !log livehacked phabricator/src/aphront/response/AphrontFileResponse.php to fix filename with newlines [17:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:31:20] twentyafterfour: thank you so much! [17:31:31] major kudos :) [17:33:06] you're welcome, no problem at all [17:33:56] (03Abandoned) 10Elukey: Revert "Add external database configuration for Hue (Analytics)" [puppet] - 10https://gerrit.wikimedia.org/r/286913 (owner: 10Elukey) [17:38:41] 06Operations, 10CirrusSearch, 06Discovery, 06Discovery-Search-Backlog, and 4 others: "Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium - https://phabricator.wikimedia.org/T132751#2264992 (10EBernhardson) [17:49:53] !log restarting elasticsearch server elastic1005.eqiad.wmnet (T110236) [17:49:54] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [17:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:53:41] PROBLEM - puppet last run on mw2133 is CRITICAL: CRITICAL: puppet fail [17:54:30] RECOVERY - Check that eventlogging-service-eventbus is running on kafka1002 is OK: PROCS OK: 9 processes with command name python, args /srv/deployment/eventlogging/eventbus/bin/eventlogging-service @/etc/eventlogging.d/services/eventbus [18:03:18] (03PS1) 10Glaisher: Configure $wgCheckUserCAMultiLock for CentralAuth wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286927 (https://phabricator.wikimedia.org/T128605) 
[18:10:32] !log restarting elasticsearch server elastic1006.eqiad.wmnet (T110236) [18:10:33] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236 [18:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:11:29] 06Operations, 10ops-ulsfo: repair/replace pem1 in cr1-ulsfo - https://phabricator.wikimedia.org/T134419#2265084 (10RobH) [18:12:21] 06Operations, 10ops-ulsfo: repair/replace pem1 in cr1-ulsfo - https://phabricator.wikimedia.org/T134419#2265084 (10RobH) Faidon provided the output, since it seems my user doesn't have permission to run: request support information I've sent the output (thx Faidon!) to Juniper support. [18:22:35] RECOVERY - puppet last run on mw2133 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:30:34] 06Operations, 10Internet-Archive, 10Wikimedia-Planet, 07Upstream: wordpress.com seems to have blocked us from fetching feeds - https://phabricator.wikimedia.org/T133818#2265130 (10Dzahn) That was just me not setting the proxy the right away.. it works with curl and https_proxy.. but nevertheless.. when usi... [18:31:15] _joe_: during the parsoid/ocg deploy window today I'm like to try deploying those puppet changes to decommission ocg1003 [18:31:19] *I'd like [18:33:00] PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:38:10] (03PS2) 10Dzahn: Enable base::firewall on kraz [puppet] - 10https://gerrit.wikimedia.org/r/286856 (owner: 10Muehlenhoff) [18:39:10] (03CR) 10Dzahn: [C: 032] "yea, it was supposed to be on here from the beginning..somehow missed" [puppet] - 10https://gerrit.wikimedia.org/r/286856 (owner: 10Muehlenhoff) [18:39:40] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [18:40:00] PROBLEM - aqs endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:41:42] RECOVERY - aqs endpoints health on aqs1001 is OK: All endpoints are healthy [18:46:21] RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy [18:46:47] <_joe_> cscott: it's a bit late here (almost 9 PM) you'd need to find another opsen if possible [18:46:59] <_joe_> cscott: you have my +1 on the hostname/fqdn change [18:47:15] _joe_: sure, ok. [18:47:20] <_joe_> or, I can deploy it tomorrow morning EU time and verify it doesn't break anything [18:47:43] _joe_: did you review the other patch, which actually adds ocg1003 to the ocg config as a decommissioned host? [18:48:34] <_joe_> cscott: yeah but that one, i wanted to merge myslf when i want to actually depool it [18:49:17] <_joe_> but if some other opsen is available, feel free to do that as well [18:50:46] I wanted to merge it, check that the code I wrote works, and then check that the "remove host from cache" script actually works. [18:50:59] <_joe_> cool, go on :) [18:50:59] I mean, I tested locally of course, but you know... [18:51:06] <_joe_> yes of course :) [18:52:05] <_joe_> but I am honestly too tired to assist you in case something needs to be repaired [18:53:00] i'm pretty confident i can fix any problems, i don't mean to make you stay up late. [18:53:28] we should definitely go through the process together for ocg1002 or so, assuming this works for ocg1003. maybe early morning eastern time would be good for both of us. [18:53:40] i think there's a pre-existing early morning ocg deploy opportunity on tuesdays. [18:55:30] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:59:44] (03PS1) 10Hashar: Adjust /typos to use extended regular expressions [puppet] - 10https://gerrit.wikimedia.org/r/286938 (https://phabricator.wikimedia.org/T133047) [18:59:54] jouncebot: next [18:59:54] In 0 hour(s) and 0 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T1900) [18:59:57] ... 
[19:00:03] rounding is terrible [19:00:04] hashar: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T1900). Please do the needful. [19:00:27] (03PS1) 10Hashar: group1 wikis to 1.27.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286939 [19:01:06] 1253637 Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [19:01:06] thcipriani ^^ [19:01:13] looks like SWAT related [19:01:39] deployment time? [19:01:44] aude: yeah [19:02:45] (03CR) 10Hashar: "That causes a huge spam of" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [19:03:12] legoktm: Dereckson https://gerrit.wikimedia.org/r/#/c/286517/4 causes a spam of undefined variable wmgUseUploadsLink :( [19:03:58] (03PS1) 10Aude: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 [19:04:14] hashar: i think we will need that ^ [19:04:28] hashar: there was a dedicated deployment window to deploy this new extension today [19:05:06] https://wikitech.wikimedia.org/wiki/Deployments#Wednesday.2C.C2.A0May.C2.A004 [19:05:12] legoktm, rillke [19:05:23] patch is https://gerrit.wikimedia.org/r/#q,286517,n,z [19:05:30] looks like there is some load order issue :( [19:05:31] so yes that matches. [19:06:25] (03PS2) 10Aude: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 [19:06:30] (03PS4) 10Rush: Increase the cache size for the Labs dns recursor [puppet] - 10https://gerrit.wikimedia.org/r/286897 (https://phabricator.wikimedia.org/T124680) (owner: 10Andrew Bogott) [19:06:39] what i dont get is that wmgUseUploadsLink is defined in InitialiseSettings and in CommonSettings.php it is only used AFTER InitialiseSettings has been loaded [19:07:06] hashar: could be something switched to extension registration? 
[19:07:41] that is from CommonSettings.php [19:07:43] 827 if ( $wmgUseUploadsLink ) { [19:07:49] it should be defined [19:08:20] errr, i see the related swat patch [19:08:57] could be if ( !empty( $wmgUseUploadsLink ) ) [19:12:12] 06Operations, 06WMF-Legal, 07Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104#2265233 (10Dzahn) >>! In T132104#2227051, @Slaporte wrote: >>>! In T132104#2215041, @Dzahn wrote: >> @Slaporte I am able to help with setting up a simple static site... [19:14:15] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2265242 (10RobH) So Juniper needs the S/N of the defective PEM, which is absent and won't output via software. I'll go onsite tomorrow to pull PEM info for both asw-ulsfo and cr1-ulsfo. [19:14:54] !log hashar@tin Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 33s) [19:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:15:53] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 41s) [19:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:16:03] pfff [19:17:13] I am reverting it [19:17:16] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 702 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5703427 keys - replication_delay is 702 [19:17:26] !log wikimedia.ru looks down - hello Moscow? https://meta.wikimedia.org/wiki/Wikimedia_Russia [19:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:44] (03PS1) 10Hashar: Revert "Enable UploadsLink at Wikimedia Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286941 [19:17:48] aude: I don't think so for if ( !empty( $wmgUseUploadsLink ) ) [19:18:18] yurik: wikimedia.ru seems broken, got any contacts? 
[19:18:32] (03CR) 10Hashar: [C: 032] Revert "Enable UploadsLink at Wikimedia Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286941 (owner: 10Hashar) [19:18:38] mutante, sec [19:18:41] what's the issue? [19:18:46] InitialiseSettings array keys become scalar values when their value is scalar, here we have true/false [19:18:55] yurik: try to open the website. http://wikimedia.ru/ [19:19:08] oh, like completelly down [19:19:10] ok, sec [19:19:11] yea [19:19:13] (03Merged) 10jenkins-bot: Revert "Enable UploadsLink at Wikimedia Commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286941 (owner: 10Hashar) [19:19:51] mutante: yurik: no DNS record for www.wikimedia.ru or wikimedia.ru [19:19:53] i noticed this stuff when the planet feeds start throwing errors [19:20:02] ugh [19:20:11] are they.. seized or what [19:20:32] ru-planet.log:ERROR:planet.runner:Error 503 while updating feed http://wikimedia.ru/blog/feeds/latest/ [19:21:00] I love the HTTP codes we have in the Venus log. [19:21:10] Never a good indication of the real status. [19:21:16] yea, lol, it calls it "getting the HTTP status" [19:21:17] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.48.110:7231/en.wikipedia.org/v1/transform/wikitext/to/html/Foobar: Generic connection error: HTTPConnectionPool(host=u10.64.48.110, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/transform/wikitext/to/html/Foobar (Caused by ProtocolEr [19:21:22] but then it sets it to 500 itself [19:21:34] you probably saw my comment from last night? [19:21:37] !log hashar@tin Synchronized wmf-config: Reverting https://gerrit.wikimedia.org/r/#/c/286517/ due to wmgUseUploadsLink being undefined. 
T130018 (duration: 00m 29s) [19:21:43] T130018: Review and deploy Extension:UploadsLink to Wikimedia Commons - https://phabricator.wikimedia.org/T130018 [19:21:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:21:46] yup [19:22:14] holy f***** [19:22:18] it still spam [19:22:26] :D [19:22:43] wmgUseUploadsLink is gone from operations/mediawiki-config [19:22:45] That's a good news hashar. [19:22:50] A very good news. [19:22:55] but I still have app servers spurting the undefined variable [19:23:34] A wmgUseUploadsLink variable lost elsewhere in the codebase is better than issues in InitialiseSettings array to variables transformation. [19:23:55] my issue is I still have: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [19:24:03] but that line is no more wmgUseUploadsLink [19:24:09] so we are not syncing properly [19:24:24] looking at the patch, i don't understand how it was a problem :/ [19:24:39] neither do I [19:24:47] unless we had CommonSettings.php synced [19:24:52] we could make an inquiry [19:24:53] but it wasn't hugely spamming... only moderately [19:24:53] but InitialiseSettings.php to not be synced [19:25:07] that is thousands of lines per seconds [19:25:14] May 4 19:21:28 mw1142: message repeated 111105 times: [19:25:16] trying a grep on CommonSettings.php on several app servers? [19:25:29] that much? [19:25:30] only on mw1142? 
[19:25:40] various app servers [19:26:17] ssh fluorine.eqiad.wmnet tail -F /a/mw-log/hhvm.log | grep wmgUseUploadsLink [19:26:52] Dereckson: i hacked more debug into planet, found this: [19:26:59] 143 WARNING:planet.runner:wtf {'feed': {}, 'bozo': 1, 'bozo_exception': URLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)'),), 'entries': []} [19:27:03] 144 ERROR:planet.runner:Error 500 while updating feed https://timotijhof.net/category/tools/feed/ [19:27:11] Dereckson: but that's not the wordpress issue [19:27:15] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2265304 (10BBlack) Rack 1.23 also has half our cache and LVS servers there, and bast4001. Should we look (physically? or via serial console?) to make sure we didn't damage any power supplies on these as well? [19:27:31] it's a Comodo SSL certificate [19:27:59] and http redirects to https [19:28:06] ostriches: thcipriani: got a mediawiki config caching issue somehow :( [19:28:14] Verified by: Not specified [19:28:15] ? [19:29:00] Touch initialisesettings & sync it [19:29:17] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [19:29:18] Your connection to this website is not encrypted. 
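(Annotation on the log-volume exchange above.) Syslog folds identical consecutive lines into "message repeated N times" entries, so a plain grep undercounts how often a notice actually fired. A minimal sketch of a per-host tally follows. It assumes a simplified "Mon D HH:MM:SS host: ..." layout modelled on the single line quoted in the channel, and it assumes the repeated text follows the "times:" marker on the same line. The function name and the sample lines in the usage note are hypothetical, not taken from the real hhvm.log.

```python
import re
from collections import Counter

# Assumed layout, modelled on the line quoted in the channel:
#   "May 4 19:21:28 mw1142: message repeated 111105 times: ..."
REPEAT_RE = re.compile(
    r"^\w{3}\s+\d+ [\d:]{8} (?P<host>\S+): message repeated (?P<n>\d+) times:"
)
PLAIN_RE = re.compile(r"^\w{3}\s+\d+ [\d:]{8} (?P<host>\S+): ")


def tally(lines, needle):
    """Count occurrences of `needle` per host, expanding repeat markers."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        repeat = REPEAT_RE.match(line)
        if repeat:
            # A repeat marker stands for N additional copies of the message.
            counts[repeat["host"]] += int(repeat["n"])
            continue
        plain = PLAIN_RE.match(line)
        if plain:
            counts[plain["host"]] += 1
    return counts
```

Feeding it the output of the tail/grep command quoted above would give one total per app server, which makes it easier to tell a single noisy host from a fleet-wide problem.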
[19:29:21] did a revert, that has touched it and sycned [19:29:23] will try again [19:30:09] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 25s) [19:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:30:15] ohhhh [19:30:33] maybe that is just syslog that buffered [19:31:03] (03PS2) 10Hashar: group1 wikis to 1.27.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286939 [19:32:16] it is gone from logstash [19:32:37] (03CR) 10Hashar: [C: 032] group1 wikis to 1.27.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286939 (owner: 10Hashar) [19:32:38] mutante: https://phabricator.wikimedia.org/P3001 [19:32:57] PROBLEM - Restbase root url on restbase1009 is CRITICAL: HTTP CRITICAL - No data received from host [19:32:59] so the messages I was seeing on hhvm.log were just syslog flushing its buffer [19:33:04] (03Merged) 10jenkins-bot: group1 wikis to 1.27.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286939 (owner: 10Hashar) [19:33:25] !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.23 [19:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:34:42] mutante: I've an hypothesis: the webserver is configured to only offer the final certificate, not the intermediate one [19:34:58] RECOVERY - Restbase root url on restbase1009 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.030 second response time [19:35:08] (03PS1) 10Ottomata: Add analytics.wikmiedia.org pointing at misc cluster (stat1001) [dns] - 10https://gerrit.wikimedia.org/r/286948 (https://phabricator.wikimedia.org/T132407) [19:35:27] Dereckson: sounds like a good hypothesis, yep [19:36:10] Dereckson: updating https://phabricator.wikimedia.org/T133577 [19:36:42] fatal monitor for .23 barely has any events https://logstash.wikimedia.org/#dashboard/temp/AVR9RV34jK4nptUtko32 [19:37:03] (03CR) 
10Ottomata: [C: 032] Add analytics.wikmiedia.org pointing at misc cluster (stat1001) [dns] - 10https://gerrit.wikimedia.org/r/286948 (https://phabricator.wikimedia.org/T132407) (owner: 10Ottomata) [19:38:28] mutante: https://www.ssllabs.com/ssltest/analyze.html?d=timotijhof.net&s=141.138.169.210 "This server's certificate chain is incomplete. Grade capped to B." [19:38:31] confirmed [19:39:32] the missing one is COMODO RSA Domain Validation Secure Server CA [19:40:20] Krinkle: you need to create a text file with two certificates: the public key of yours + this COMODO RSA Domain Validation Secure Server CA immediately after cat yourcertificate comodocertificate >> bundle works [19:40:42] aude: you had a change ? [19:40:48] Krinkle: if not, we can't access your server to recover your blog feed [19:41:28] (03PS3) 10Hashar: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 (owner: 10Aude) [19:42:40] (03PS2) 10Alex Monk: Make udpmxircecho conform to pep8 [puppet] - 10https://gerrit.wikimedia.org/r/286683 [19:43:31] (03PS1) 10Ottomata: Add analytics.wikimedia.org to list of domains served by misc varnish backend stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286950 (https://phabricator.wikimedia.org/T132407) [19:46:47] PROBLEM - Restbase root url on restbase1009 is CRITICAL: HTTP CRITICAL - No data received from host [19:48:38] RECOVERY - Restbase root url on restbase1009 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.004 second response time [19:48:54] Dereckson: pretty much everything that claims "500" is actually bozo_exception': URLError(timeout('timed out', [19:49:27] aude: ah the wikidata epoch bump https://gerrit.wikimedia.org/r/#/c/286940/ [19:49:34] Dereckson: also some that are not on wordpress.com and i still wonder what part changed.. 
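(Annotation on the certificate-chain exchange above.) The fix described is to serve one file holding the site's own certificate followed by the intermediate CA certificate. A minimal sketch of building such a bundle is below. The function name is hypothetical, the check is only a rough sanity test on the PEM markers, and nothing here validates that the certificates actually chain to each other.

```python
BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def build_bundle(leaf_pem, *intermediate_pems):
    """Concatenate PEM texts, leaf first, then intermediates in the order given."""
    blocks = []
    for pem in (leaf_pem, *intermediate_pems):
        text = pem.strip()
        # Rough sanity check only: the text must look like PEM certificate data.
        if not (text.startswith(BEGIN) and text.endswith(END)):
            raise ValueError("expected PEM certificate text")
        blocks.append(text)
    # One trailing newline so the result is a well-formed text file.
    return "\n".join(blocks) + "\n"
```

The order matters: the server's own certificate goes first and the intermediate after it, which is the same order as the `cat yourcertificate comodocertificate` command quoted in the channel.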
[19:49:34] (03CR) 10Hashar: [C: 032] Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 (owner: 10Aude) [19:49:59] (03Merged) 10jenkins-bot: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 (owner: 10Aude) [19:50:05] Hi pictures are no showing for me. [19:50:05] https://commons.wikimedia.org/wiki/File:FEZ_trial_gameplay_HD.webm [19:50:21] It shows in chrome A little image [19:50:40] The default image that shows when it carn't load or find the image. [19:51:05] paladox, is this for all images or just the linked webm? [19:51:09] hashar: thanks [19:51:12] All [19:51:17] !log hashar@tin Synchronized wmf-config/: Bump cache epoch for Wikidata - https://gerrit.wikimedia.org/r/#/c/286940/ (duration: 00m 30s) [19:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:51:30] (03CR) 10Hashar: "Deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286940 (owner: 10Aude) [19:51:57] paladox, just realising which channel this is, it might be worth moving the discussion to #wikimedia-tech ? [19:52:29] paladox: works for me [19:52:32] Oh, i think it's just chrome cache even though i didn't really use chrome. It loads properly in Internet Explorer. [19:52:37] paladox: must be some codec issue with your browser maybe [19:52:58] Yep probaly. [19:53:59] paladox: if you have the player showing up, at the bottom right you can change Ogg/Webm and resolution. [19:54:17] hashar: Yep, doing f5 seemed to fix it. [19:54:22] Thanks [19:55:04] hmmm [19:55:34] Hallo [19:56:34] (03PS1) 10Hashar: Typo tests (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/286953 [19:56:57] aude: did the epoch cache work? 
[19:57:17] I once requested access to stat1002 - https://phabricator.wikimedia.org/T122524 [19:57:41] but it looks like I only have access to stat1003 [19:57:41] (03CR) 10jenkins-bot: [V: 04-1] Typo tests (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/286953 (owner: 10Hashar) [19:57:48] And I need stat1002 [19:57:58] hashar: i think so but maybe there still is a bug [19:58:06] (03PS1) 10BBlack: LE: refactor renewal check, add info output [puppet] - 10https://gerrit.wikimedia.org/r/286954 [19:58:20] (03PS2) 10BBlack: LE: refactor renewal check, add info output [puppet] - 10https://gerrit.wikimedia.org/r/286954 [19:58:29] Dereckson: do fr.planet feeds _actually_ not get updated (except that littletony87) ? can you confirm one more time? [19:58:37] (03CR) 10Nuria: [C: 031] Add analytics.wikimedia.org to list of domains served by misc varnish backend stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286950 (https://phabricator.wikimedia.org/T132407) (owner: 10Ottomata) [19:58:39] (03CR) 10BBlack: [C: 032 V: 032] LE: refactor renewal check, add info output [puppet] - 10https://gerrit.wikimedia.org/r/286954 (owner: 10BBlack) [19:58:48] (03PS2) 10Hashar: Typo tests (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/286953 [19:59:15] mutante: https://fr.planet.wikimedia.org/ looks good, with last posts from two blogs, DarkoNeko and Wikimedia France [19:59:29] Dereckson: including wordpress.com feeds. what ..the .. [19:59:32] aharoni, how are you actually using 1003 to connect to the mysql slaves? do you have your own mysql login? [19:59:44] (03CR) 10jenkins-bot: [V: 04-1] Typo tests (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/286953 (owner: 10Hashar) [20:00:04] gwicke cscott arlolra subbu bearND mdholloway: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T2000). Please do the needful. 
[20:00:28] Krenair: I'm actually not using stat1003 at the moment. I need stat1002, and I have instructions from Ellery about what to do there. [20:00:30] (03Abandoned) 10Hashar: Typo tests (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/286953 (owner: 10Hashar) [20:01:02] aharoni, did you ever use 1003? if not you should probably reopen and ask for it to be corrected [20:01:40] (03CR) 10BBlack: [C: 031] Add analytics.wikimedia.org to list of domains served by misc varnish backend stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286950 (https://phabricator.wikimedia.org/T132407) (owner: 10Ottomata) [20:02:08] mutante: perhaps Automattic looks at the issue in their side? [20:02:12] !log starting deploy of parsoid sha b0d015fa [20:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:02:41] 06Operations, 10Internet-Archive, 10Wikimedia-Planet, 13Patch-For-Review: fr.planet doesn't update as expected - https://phabricator.wikimedia.org/T133573#2265437 (10Dzahn) 05Open>03Resolved @Dereckson the only issue left on fr.planet that i see is now: http://littletony87.unblog.fr/feed/ .. Unicode... [20:02:46] 06Operations, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation, and 4 others: schedule a daily run of ContentTranslation analytics scripts - https://phabricator.wikimedia.org/T122479#2265442 (10Amire80) [20:02:50] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265439 (10Amire80) 05Resolved>03Open @ellery says that I will definitely need stat1002 t... 
[20:04:57] !log Completed group1 wikis to 1.27.0-wmf.23 [20:05:01] deployment done [20:05:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:05:23] (03PS1) 10Ottomata: Set up analytics.wikimedia.org site on stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286957 (https://phabricator.wikimedia.org/T132407) [20:05:29] (03PS2) 10Hashar: Adjust /typos to use extended regular expressions [puppet] - 10https://gerrit.wikimedia.org/r/286938 (https://phabricator.wikimedia.org/T133047) [20:05:47] !log synced new code; restarted parsoid on wtp1001 as canary [20:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:07:10] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265466 (10Ottomata) Please clarify how you will run these queries. If MySQL, then you only n... [20:08:10] (03PS2) 10Ottomata: Set up analytics.wikimedia.org site on stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286957 (https://phabricator.wikimedia.org/T132407) [20:08:13] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265469 (10Krenair) Yes, can we please start talking in terms of groups instead of hosts? 
[20:08:17] (03CR) 10Ottomata: [C: 032 V: 032] Set up analytics.wikimedia.org site on stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286957 (https://phabricator.wikimedia.org/T132407) (owner: 10Ottomata) [20:08:44] !log finished deploying parsoid sha b0d015fa (T134017) [20:08:45] T134017: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017 [20:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:09:08] James_F, Krenair ^ [20:09:19] does that deal with jamwiki? [20:09:35] yes. [20:09:45] deployed your sitematrix update on the parsoid end. [20:09:50] (03PS1) 10Ottomata: Use content instead of source param in statistics::sites::analytics [puppet] - 10https://gerrit.wikimedia.org/r/286958 [20:10:00] Hey yall, do you guys have a process for when a new wiki project is added? [20:10:07] there's some analytics stuff that depends on this [20:10:17] and if we could get in there as folks who should be notified [20:10:24] it would help us avoid some job alerts and failuers [20:10:49] ottomata, there is .. i don't have the wikipage handy .. Krenair should know. [20:10:51] A new wiki project ottomata? [20:11:01] You just mean a new wiki? [20:11:04] (03CR) 10Ottomata: [C: 032] Use content instead of source param in statistics::sites::analytics [puppet] - 10https://gerrit.wikimedia.org/r/286958 (owner: 10Ottomata) [20:11:18] cscott, all done with parsoid [20:11:20] Krenair: ottomata - https://wikitech.wikimedia.org/wiki/Add_a_wiki ? [20:11:26] yes [20:11:34] ^ that's for adding new wikis, it doesn't cover adding a new *project* [20:11:35] perfect, thanks i'll edit [20:11:38] uhhh [20:12:09] Krenair: is "Jamaican Wikipedia" a wiki or a project? :) [20:12:11] new wiki [20:12:16] hashar: there is an issue with some labels and descriptions not appearing on wikidata items [20:12:16] 'project' == things like en, jam, etc, eh no? 
[20:12:16] we already have Wikipedias [20:12:21] afaik people call those projects [20:12:23] project = wikipedia, wikinews [20:12:26] haha [20:12:26] wikibooks [20:12:27] etc. [20:12:28] so much ambiguity [20:12:36] i've heard otherwise...:p [20:12:40] but yes, i mean new wiki then [20:12:47] or new project really [20:12:52] ottomata, which files do I need to change for you when making a new wiki? [20:12:57] we might want to put wikidata extension back on the previous branch (wmf22) for now [20:13:07] Krenair: uhh there is a whitelist for something lemme find it [20:13:10] (03CR) 10Dzahn: "@Muehlenhoff turns base::firewall is already in the role class, so i saw no puppet change. but the ferm service was not started and puppet" [puppet] - 10https://gerrit.wikimedia.org/r/286856 (owner: 10Muehlenhoff) [20:13:10] i was just going to add a notification [20:13:11] ok [20:13:18] we saw some alarms about something a few days ago [20:13:20] for jam [20:13:21] even if i can fix, i think most wikidata develoeprs are at a biergarden and not available [20:13:23] I'd rather we have a complete checklist documented ottomata [20:13:28] and we only figure what it was because i heard about it in the ops meeting [20:13:31] ok, looking [20:13:41] (gimme a few mins, in the middle of something tooo) [20:13:42] (wiki)?(med|ped|man)ia(wiki)? 
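(Annotation on the pattern quoted just above.) A quick way to see which names the expression covers is to try it against a few candidates. This is only a sketch: matching the whole string with fullmatch is an assumption about how the pattern was meant to be read, and the helper name is hypothetical.

```python
import re

# The pattern exactly as quoted in the channel.
PATTERN = re.compile(r"(wiki)?(med|ped|man)ia(wiki)?")


def is_wiki_ish(name):
    """Return True if the whole name fits the quoted pattern."""
    return PATTERN.fullmatch(name) is not None


candidates = ["wikipedia", "wikimedia", "mediawiki", "wikimania", "wikibooks"]
matched = [name for name in candidates if is_wiki_ish(name)]
```

Here `matched` holds the first four names. "wikibooks" is left out, since "books" is not one of the three middle alternatives.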
[20:13:55] Not just 'notify person x, they'll know what to do in analytics' because person x might leave without telling anyone what they did [20:14:37] (03PS1) 10Ottomata: Use template() in statistics::sites::analytics [puppet] - 10https://gerrit.wikimedia.org/r/286959 [20:14:57] (03CR) 10Ottomata: [C: 032 V: 032] Use template() in statistics::sites::analytics [puppet] - 10https://gerrit.wikimedia.org/r/286959 (owner: 10Ottomata) [20:16:23] Krenair: i think it is this [20:16:29] we should verify with joal but he is offline for the day [20:16:56] i think this is so that garbage pageview data doesn't make it into the pageview api or seomthign [20:17:12] hashar: are you available? [20:17:26] this? [20:17:31] cscott: more or less, depend on the topic :) [20:17:36] oh sorry [20:17:38] https://github.com/wikimedia/analytics-refinery/blob/master/static_data/pageview/whitelist/whitelist.tsv [20:18:05] ... You guys made a list of wikis without documenting at Add_a_wiki? [20:18:28] actually, maybe ottomata is a better person to help, since he's the most recent person to push a puppet change AFAICT. i just want some hand-holding deploying a puppet patch: https://gerrit.wikimedia.org/r/286068 [20:18:56] it's been +1'ed twice, it just need to be deployed. i'm pretty sure i've deployed puppet patches in the past, but the procedure at https://wikitech.wikimedia.org/wiki/Puppet#Making_changes doesn't look familiar to me. [20:18:57] ottomata, which wikis are supposed to be on this list? I checked a couple of private ones and they don't show up [20:19:23] maybe the process has been changed since I last did it? or maybe i'm looking at the wrong wiki page? [20:19:25] Dereckson: ok, so .. everything works again but i dont know why [20:19:51] cscott: yeah for that you want to sync with ops. 
I lack merge rights / puppet master access [20:20:04] Krenair: i'm not totally sure [20:20:07] and overall most probably cant connect to the OCG machines to babysit the puppet patch :( [20:20:08] i don't have all the context here [20:20:15] i'm making a phab ticket now and will CC the right people [20:20:19] ottomata, needs to be figured out before it can be added [20:20:24] sure thing [20:20:30] wikimanias are also not there? [20:20:39] (03PS1) 10Aude: Put wikidata back on wmf/1.27.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286961 (https://phabricator.wikimedia.org/T134432) [20:20:44] In fact there are only 4 *.wikimedias, this cannot be right [20:20:47] aude: :( [20:20:49] hashar: ^ [20:20:49] Krenair: https://phabricator.wikimedia.org/T134433 [20:21:08] it's not th emost terrible bug but shouldn't leave wikidata broken while i fix [20:21:09] hashar: oh, hm, i thought you were the one who'd helped me with puppet in the far distant past. i guess i'm misremembering. [20:21:14] not a high priority, but someone will eventually follow up with you [20:21:19] to figure this out [20:21:20] thank you! [20:21:29] i don't think parser cache needs to be bumped again [20:21:38] ottomata: can you help me with https://gerrit.wikimedia.org/r/286068 (see backlog above)? [20:22:19] cscott: you just need a merge? [20:22:22] aude: could it be that the l10n cache needs to be rebuild ? [20:22:30] i see some trustworhty +1s there, so I'm ahppy to do so [20:22:56] hashar: did we do that after we bumped the submodule yesterday? [20:23:00] it'd be a blind merge for me, if you tell me a host that that will run on i'll log in and run puppet to make sure it works [20:23:09] i believe so. i'd thought that in the past i'd been able to do that myself, but i must be misremembering. i think i'm getting it mixed up with jenkins/zuul hacking, on which hashar definitely tutored me in the past. 
[20:23:09] but i can reproduce the issue locally, so think that's probably not the reason [20:23:22] (03PS2) 10Ottomata: Add analytics.wikimedia.org to list of domains served by misc varnish backend stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286950 (https://phabricator.wikimedia.org/T132407) [20:23:23] aude: sal would know. I havent for sure [20:23:26] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265533 (10ellery) @Amire80 needs to run hive queries to count the number times users navigat... [20:23:29] (03CR) 10Ottomata: [C: 032 V: 032] Add analytics.wikimedia.org to list of domains served by misc varnish backend stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/286950 (https://phabricator.wikimedia.org/T132407) (owner: 10Ottomata) [20:23:35] cscott: on the beta cluster yes :) [20:23:49] hashar: oh, maybe that's the difference. [20:23:56] hashar: i thought the regular i18n update does that [20:24:25] aude: yeah I can imagine it doing the refresh [20:24:39] lets roll it back and think later [20:24:56] (03CR) 10Hashar: [C: 032] Put wikidata back on wmf/1.27.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286961 (https://phabricator.wikimedia.org/T134432) (owner: 10Aude) [20:25:31] (03PS2) 10Ottomata: Use FQDN for OCG hostnames. [puppet] - 10https://gerrit.wikimedia.org/r/286068 (https://phabricator.wikimedia.org/T133864) (owner: 10Cscott) [20:25:39] (03CR) 10Ottomata: [C: 032 V: 032] Use FQDN for OCG hostnames. [puppet] - 10https://gerrit.wikimedia.org/r/286068 (https://phabricator.wikimedia.org/T133864) (owner: 10Cscott) [20:25:46] thanks [20:26:04] as said, it's probably an easy fix but then not sure i can get anyone to review the patch tonight [20:26:07] ok cscott, merged [20:26:13] dunno what that is gonna do though... 
[20:26:18] or where it will run :) [20:26:48] ottomata: it should affect the ocg machines in prod and beta. i can log in and check to see if it's taken effect. [20:26:51] (03Merged) 10jenkins-bot: Put wikidata back on wmf/1.27.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286961 (https://phabricator.wikimedia.org/T134432) (owner: 10Aude) [20:26:57] won't actually do anything until i actually restart ocg though. [20:28:25] !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: Wikidata back to 1.27.0-wmf.22 due to T134432. Poke T131557. [20:28:26] T134432: Missing labels and descriptions in "other languages" box - https://phabricator.wikimedia.org/T134432 [20:28:26] T131557: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557 [20:28:29] aude: wikidata should be back to .22 now [20:28:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:30:18] ok thanks [20:30:45] looks better for now [20:31:16] (03PS1) 10Gehel: Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) [20:32:08] !log catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Echo/Hooks.php: Fix fatal (T134428) (duration: 00m 33s) [20:32:08] T134428: Call to a member function getUnreadCounts() on a non-object (boolean) - https://phabricator.wikimedia.org/T134428 [20:32:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:32:18] (03CR) 10jenkins-bot: [V: 04-1] Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) (owner: 10Gehel) [20:32:32] (03CR) 10Cscott: "Compiler diff, doesn't look right:" [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) (owner: 10Cscott) [20:33:35] (03PS2) 10Gehel: Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 
(https://phabricator.wikimedia.org/T119915) [20:33:39] ottomata, Krenair : does Ellery's comment on https://phabricator.wikimedia.org/T122524 answer your questions? [20:33:49] ottomata: can you trigger puppet on deployment-pdf01.eqiad.wmflabs and deployment-pdf02.eqiad.wmflabs? (or tell me how, I have root on those machines). Assuming the change looks good in labs, we can trigger puppet on the production ocg machines. [20:34:07] think so [20:34:09] sudo puppet agent -t [20:34:17] (Sorry about rushing, but it's pretty late in this hemisphere :) ) [20:35:22] (03CR) 10Smalyshev: Add response time checks to WDQS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) (owner: 10Gehel) [20:35:30] (03CR) 10Smalyshev: [C: 031] Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) (owner: 10Gehel) [20:35:42] 06Operations, 10Internet-Archive, 10Wikimedia-Planet, 13Patch-For-Review: fr.planet doesn't update as expected - https://phabricator.wikimedia.org/T133573#2265581 (10Dzahn) [20:35:44] 06Operations, 10Internet-Archive, 10Wikimedia-Planet, 07Upstream: wordpress.com seems to have blocked us from fetching feeds - https://phabricator.wikimedia.org/T133818#2265579 (10Dzahn) 05Open>03Resolved Updates just started working again. I made no change on our side and not sure what happened. But i... [20:35:51] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 13Patch-For-Review, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2265584 (10Krenair) @jcrespo/@volans: I think we need a maintain-replicas.pl run? 
[20:37:06] (03PS1) 10Addshore: Switch my planet feed from cats to tags [puppet] - 10https://gerrit.wikimedia.org/r/286995 [20:37:10] ottomata: hm, no change on deployment-pdf01.eqiad.wmflabs after running `sudo puppet agent -t`: [20:37:10] Info: Retrieving plugin [20:37:10] Info: Loading facts in /var/lib/puppet/lib/facter/initsystem.rb [20:37:10] Info: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb [20:37:10] Info: Loading facts in /var/lib/puppet/lib/facter/labsprojectfrommetadata.rb [20:37:11] Info: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb [20:37:13] Info: Loading facts in /var/lib/puppet/lib/facter/apt.rb [20:37:13] Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb [20:37:13] _joe_, godog: do you guys have any time to meet about T97562? [20:37:13] T97562: Get cache relay daemon reviewed and usable - https://phabricator.wikimedia.org/T97562 [20:37:16] Info: Loading facts in /var/lib/puppet/lib/facter/lldp.rb [20:37:16] Info: Loading facts in /var/lib/puppet/lib/facter/ganeti.rb [20:37:17] Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb [20:37:18] Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb [20:37:20] Info: Caching catalog for deployment-pdf01.deployment-prep.eqiad.wmflabs [20:37:20] Info: Applying configuration version '1462393883' [20:37:23] Notice: Finished catalog run in 45.45 seconds [20:37:24] how do i check if that configuration version is the right one? [20:38:52] oh cscott it probably needs synced on deployment puppetmaster [20:38:53] might have to wait [20:38:55] in meeting... [20:38:58] it'll happen eventually [20:39:28] ok, no worries. i appreciate the help, and i'd rather take it slow when there's a risk of breaking prod. 
;) [20:42:08] 06Operations, 10ops-ulsfo: power loss in ulsfo cabinet 1.23 - https://phabricator.wikimedia.org/T134330#2265627 (10RobH) I'll check all servers for error LEDs when onsite, as well as checking the power supply LEDS of all devices. [20:46:43] hashar: um, what? [20:46:51] hashar: it was a temporary deployment issue [20:47:29] !log catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo/Hooks.php: Fix fatal (T134428) (duration: 00m 32s) [20:47:30] T134428: Call to a member function getUnreadCounts() on a non-object (boolean) - https://phabricator.wikimedia.org/T134428 [20:47:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:51:14] legoktm: hello [20:51:31] (03CR) 10Legoktm: "No, the patch was fine. It was a temporary deployment issue because I used sync-dir wmf-config/ instead of doing sync-file InitialiseSetti" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [20:51:57] legoktm: I did sync file InitialiseSettings and CommonSettings and apparently the spam was not going off [20:52:27] legoktm: we can put it back ! [20:52:43] hashar: the app servers buffer log messages locally, so they often show up late in the logs :( [20:53:18] jouncebot: next [20:53:18] In 2 hour(s) and 6 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T2300) [20:53:49] (03PS1) 10Legoktm: Revert "Revert "Enable UploadsLink at Wikimedia Commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 [20:53:57] (03PS2) 10Legoktm: Revert "Revert "Enable UploadsLink at Wikimedia Commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 [20:54:05] legoktm: noooo [20:54:09] no revert revert [20:54:09] no? 
[20:54:12] that is a mess to read :D [20:54:18] cherry pick the reverted patch on tip of master [20:54:29] so you preserve the original commit message :=) [20:54:51] also I did sync Common/Initialise files [20:54:55] okay [20:54:57] but that did not cleared out the spam https://logstash.wikimedia.org/#dashboard/temp/AVR9jQsECsPTNesWGpTe :( [20:55:16] (03PS3) 10Gehel: Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) [20:55:25] initialise settings was sync-file at 21:15:53 [20:55:25] (03PS3) 10Legoktm: Enable UploadsLink at Wikimedia Commons (try #2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 (https://phabricator.wikimedia.org/T130018) [20:55:35] legoktm: better: ) [20:55:57] and the notice only went away when I did the revert [20:56:35] hrrm. [20:56:47] I looked at it for a few minutes [20:56:51] then gave up / reverted [20:57:07] I have really Zero idea why it happens really :( [20:57:17] maybe the conf cache does not refresh somehow [20:57:25] (03CR) 10Rillke: "> That causes a huge spam of" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [20:58:27] hashar: so, I'm going to merge that, sync out initialisesettings, then use mwscript eval.php on an app server to verify $wmgUseUploadsLink is defined, and then sync out commonsettings [20:58:35] sound good? [20:58:41] legoktm: maybe InitialiseSettings.php context is loaded in CommonSettings.php further below than line 827 ? [20:59:14] (03CR) 10Mobrovac: [C: 04-1] "This is effectively blocked until I398ac38e091e865023e532af4469f51dd666b985 is fixed, merged and present on tin so I'm changing my vote o" [puppet] - 10https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: 10BearND) [20:59:15] hashar: no, because above the UploadsLink line there are other variables that are used from InitialiseSettings? 
[20:59:21] (03PS2) 10Cscott: Decommission ocg1003. [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) [20:59:37] legoktm: yeah that is what has lead me to give up :] [20:59:49] maybe we can sync-common a single app server to debug on? [20:59:53] instead of the whole cluster [20:59:58] (cant remember how to that) [21:00:17] oh yeah, lets do that [21:00:25] I can stage it on mw1017 [21:00:34] (03CR) 10Legoktm: [C: 032] Enable UploadsLink at Wikimedia Commons (try #2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 (https://phabricator.wikimedia.org/T130018) (owner: 10Legoktm) [21:01:00] (03Merged) 10jenkins-bot: Enable UploadsLink at Wikimedia Commons (try #2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 (https://phabricator.wikimedia.org/T130018) (owner: 10Legoktm) [21:01:38] cscott: i synced deployment-puppetmaster, running puppet now, do you know where to look to see if your change got applied while we weren't looking? [21:01:54] (03CR) 10Hashar: "Well your patch is entirely legit. Legoktm approved it and I would have approved it as well." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286517 (https://phabricator.wikimedia.org/T130018) (owner: 10Rillke) [21:02:04] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5698985 keys - replication_delay is 0 [21:02:19] (03CR) 10Hashar: "+1 been talking about it with Kunal on irc." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286998 (https://phabricator.wikimedia.org/T130018) (owner: 10Legoktm) [21:02:50] ... 
[21:05:11] May 4 21:04:42 mw1017: #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [21:05:36] but [21:05:54] I've requested https://commons.wikimedia.org/wiki/User:Legoktm 10 times now with x-wikimedia-debug and only one notice [21:06:03] niceee [21:06:11] ottomata: /etc/ocg/mw-ocg-service.js looking at the "config.coordinator.hostname" line, which *ought* to have a FQDN after puppet runs. [21:06:26] ottomata: doesn't seem to have it yet on deployment-pdf01 [21:08:11] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Enable UploadsLink on Wikimedia Commons (1/2) - try 2 (duration: 00m 28s) [21:08:14] legoktm: I have no idea how the whole mess works really. Maybe some HHVM thread is kept alive with an old version a half baked version of the code [21:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:08:49] hashar: I think the logging is broked [21:08:52] May 4 19:21:28 mw1158: message repeated 29232 times: [ #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827] [21:08:52] May 4 21:04:42 mw1017: #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [21:08:52] May 4 19:21:06 mw1159: message repeated 28107 times: [ #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827] [21:09:06] look at the timestamps >.< [21:09:13] yeah that is the syslog buffering I guess (hope) [21:09:28] with date being the first time of the event [21:09:38] it is flushed whenever another event is generated [21:09:49] got caught by that one while tailling the log file [21:09:56] then logstash had the spam as well [21:10:06] legoktm: lets sync everywhere [21:10:25] legoktm@mw1018:~$ mwscript eval.php --wiki=commonswiki [21:10:25] > echo $wmgUseUploadsLink; [21:10:25] 1 [21:11:21] !log legoktm@tin Synchronized 
wmf-config/CommonSettings.php: Enable UploadsLink on Wikimedia Commons (2/2) - try 2 (duration: 00m 30s) [21:11:25] it is coming back :) [21:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:11:45] WTF [21:12:22] (03PS4) 10Gehel: Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) [21:12:25] well [21:12:28] just one spike [21:12:39] maybe due to files recaching somehow [21:12:51] hmm [21:12:56] it's still going though [21:13:01] May 4 21:12:46 mw1008: #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [21:13:02] hm ya cscott doesn't have it [21:13:02] hm [21:13:05] touch InitialiseSettings.php and sync it ? [21:13:42] ottomata: maybe i should check puppet-compiler, perhaps the labs hosts don't have a FQDN set in puppet? (but that would be weird) [21:13:45] * legoktm does [21:13:50] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 27s) [21:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:14:06] am looking on deploy puppetmaster, it is missing your change [21:14:08] looking... [21:14:31] May 4 21:14:12 mw1008: #012Notice: Undefined variable: wmgUseUploadsLink in /srv/mediawiki/wmf-config/CommonSettings.php on line 827 [21:14:33] 06Operations, 06Analytics-Kanban, 13Patch-For-Review: Upgrade stat1001 to Debian Jessie - https://phabricator.wikimedia.org/T76348#2265713 (10ezachte) Issues fixed. Thanks, elukey, and others. [21:14:36] * legoktm logs in to test [21:15:11] ah, the repo is broke there, bad rebase [21:15:11] hm [21:16:15] hashar: I logged into one of the appservers with notices, ran mwscript eval.php, and the variable is defined. [21:16:42] :( [21:18:39] I have an idea [21:19:19] (03PS3) 10Cscott: Decommission ocg1003. 
[puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) [21:20:06] (03PS1) 10Legoktm: Suppress undefined $wmgUseUploadsLink warnings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287000 [21:20:14] (03CR) 10jenkins-bot: [V: 04-1] Decommission ocg1003. [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) (owner: 10Cscott) [21:20:52] (03CR) 10Gehel: [C: 032] Add response time checks to WDQS [puppet] - 10https://gerrit.wikimedia.org/r/286992 (https://phabricator.wikimedia.org/T119915) (owner: 10Gehel) [21:21:05] hashar: I'm not sure if that is a good idea ^ [21:21:12] but it would stop the warnings. [21:21:20] legoktm: looking at CommonSettings.php the invalidation time is based on "$wmfConfigDir/." [21:21:38] 9330dbd38fc85998d8bdea67224ea4063a60d490 [21:21:38] Should I try touching the directory? [21:21:43] yeah [21:21:43] (03PS4) 10Cscott: Decommission ocg1003. [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) [21:21:58] which is like [21:22:01] ok cscott ran puppet on deployment-pdf01 and it applied your change [21:22:05] breaking a decade old convention [21:22:40] (03PS1) 10Ottomata: Remove base::firewall from stat1002 and stat1004 analytics cluster clients [puppet] - 10https://gerrit.wikimedia.org/r/287001 (https://phabricator.wikimedia.org/T134422) [21:23:00] !log legoktm@tin Synchronized wmf-config/: touch the directory this time (duration: 00m 37s) [21:23:07] ostriches: touching InitialiseSettings.php is no more the way to invalidate the cache!!! 
https://phabricator.wikimedia.org/rOMWC9330dbd38fc85998d8bdea67224ea4063a60d490 [21:23:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:23:28] tada [21:23:33] whaaaa [21:23:36] :=) [21:23:37] that worked [21:23:39] * legoktm hugs hashar [21:23:45] so different FS / different os whatever [21:23:57] and changing the files in a dir dont necessarly change the mtime [21:24:10] rillke: we should be good now :) [21:24:18] (03Abandoned) 10Legoktm: Suppress undefined $wmgUseUploadsLink warnings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287000 (owner: 10Legoktm) [21:24:30] (03CR) 10Ottomata: [C: 032] Remove base::firewall from stat1002 and stat1004 analytics cluster clients [puppet] - 10https://gerrit.wikimedia.org/r/287001 (https://phabricator.wikimedia.org/T134422) (owner: 10Ottomata) [21:24:33] oh man it is 11pm again [21:24:40] thanks! [21:24:46] going to take a short break then I will fill a task about it [21:24:52] hashar: It did when I tested it on tin :) [21:25:05] ostriches: will fill a bug about it [21:25:10] !log deploying new icinga check on response time for WDQS [21:25:15] It updated mtimes anytime the content of the files changed. [21:25:17] the directory mtime is not always updated [21:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:25:27] (03CR) 10Thcipriani: "We're all currently at the reading team's offsite ("all" == myself, BearND, and Mholloway). Ideally, we could coordinate this rollout to h" [puppet] - 10https://gerrit.wikimedia.org/r/286695 (https://phabricator.wikimedia.org/T129147) (owner: 10BearND) [21:25:30] (03CR) 10Cscott: "@Dzahn: ok, put it in hiera, and things appear to actually work! Compiler diff, looks right:" [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) (owner: 10Cscott) [21:25:46] hashar: Sure, but it should for the cases I was trying to cover.... [21:25:48] And did in testing... 
[21:25:57] yeah but it does not [21:26:07] at least not for all hosts / modification types [21:26:22] I got some interesting scenario. Will copy write something to a task [21:26:42] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265768 (10Dzahn) a:05Dzahn>03None [21:26:45] but surely using wmfconfig dir instead of InitialiseSettings.php is a very good idea [21:27:06] then I am not sure how the FS updates the parent dir [21:27:19] and some app servers might have different mount options [21:27:33] like i dont know --speed-stuff-up (does not update parent dir info) [21:27:35] or whatever else [21:29:10] 06Operations, 10Ops-Access-Requests, 10Analytics, 10ContentTranslation-Analytics, 10MediaWiki-extensions-ContentTranslation: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#2265769 (10Dzahn) it will be handled, i'm just giving it back to the pool because access requ... [21:29:10] ottomata: w00t. ran puppet on deployment-pdf02 as well, looked fine, restarted ocg on both nodes, ran tests, seems to work fine. (and updated https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL) [21:29:25] great [21:30:00] ottomata: can we run puppet on ocg100[123].eqiad.wmnet now? (i don't have root on those boxes, but i have perms sufficient to restart ocg once puppet is run.) 
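Note on the directory-mtime discussion above: on POSIX-style filesystems a directory's mtime changes when entries are created, removed, or renamed, not when an existing file's contents are rewritten in place. The sketch below only illustrates that general behaviour in a scratch directory. It makes no claim about how scap or rsync actually wrote files to the app servers; the helper names and the reuse of the file name `InitialiseSettings.php` are purely illustrative assumptions.

```python
import os
import tempfile


def dir_mtime_around(action):
    """Return (before, after) mtime of a scratch directory around `action`."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "InitialiseSettings.php")  # illustrative name only
        with open(path, "w") as f:
            f.write("<?php // v1\n")
        # Pin the directory mtime to a known old value so any update is visible.
        os.utime(d, (1_000_000_000, 1_000_000_000))
        before = os.stat(d).st_mtime
        action(path)
        after = os.stat(d).st_mtime
        return before, after


def edit_in_place(path):
    """Append to the existing file; the directory entry itself is untouched."""
    with open(path, "a") as f:
        f.write("// v2\n")


def replace_via_rename(path):
    """Write a temp file and rename it over the original (changes directory entries)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("<?php // v2\n")
    os.replace(tmp, path)


if __name__ == "__main__":
    print("in-place edit:", dir_mtime_around(edit_in_place))
    print("rename-based replace:", dir_mtime_around(replace_via_rename))
```

If a sync writes files in place, a cache keyed on the directory's mtime would not notice the change, which would be consistent with touching the directory being what finally cleared the notices. Whether that is what happened here is left to the task hashar mentions.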
[21:32:50] (03PS2) 10Rush: scap access.conf entries for labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/286852 (https://phabricator.wikimedia.org/T121721) [21:35:18] !log catrope@tin Synchronized php-1.27.0-wmf.22/extensions/Echo/Hooks.php: Debug logging for seemingly unattached users (duration: 00m 25s) [21:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:36] (03CR) 10Rush: [C: 032] scap access.conf entries for labs deployments [puppet] - 10https://gerrit.wikimedia.org/r/286852 (https://phabricator.wikimedia.org/T121721) (owner: 10Rush) [21:37:59] ottomata: ping? don't mean to rush you, but I have to leave to pick up my kid from daycare in about 30 min, and i'd rather not leave an untested undeployed puppet change in the tree... [21:40:26] pffff [21:40:29] I have lost my task [21:41:58] 06Operations, 10OCG-General, 13Patch-For-Review, 05codfw-rollout: Use FQDNs instead of hostnames in the download urls sent to Mediawiki - https://phabricator.wikimedia.org/T133864#2265818 (10cscott) Puppet patch merged and deployed on labs machines, OCG restarted, seems to work fine. Waiting for @Ottomata... [21:41:59] cscott: looking [21:42:02] its probably already applied [21:42:14] ottomata: thanks. yeah, i can't tell if it's there or not w/o root. [21:42:22] ja config.coordinator.hostname = "ocg1001.eqiad.wmnet"; [21:42:39] ottomata: ok, cool. i'll restart the service and run tests and make sure everything's peachy. 
[21:42:59] cscott: looks good on the other 2 as well [21:43:38] !log restarting OCG after puppet deploy of https://gerrit.wikimedia.org/r/286068 [21:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:44:20] (03PS1) 10Rush: labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) [21:44:40] (03PS2) 10Rush: labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) [21:45:12] 06Operations, 06Discovery, 10Monitoring, 10Wikidata, and 3 others: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#2265837 (10Gehel) a:03Gehel [21:45:22] 06Operations, 06Discovery, 10Monitoring, 10Wikidata, and 3 others: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#1839912 (10Gehel) [21:45:41] ottomata: tests look good. thanks for your help! [21:46:45] 06Operations, 10Traffic, 07HTTPS: letsencrypt puppetization: upgrade for scalability - https://phabricator.wikimedia.org/T134447#2265856 (10BBlack) [21:47:29] 06Operations, 10OCG-General, 05codfw-rollout: Document eqiad/codfw transition plan for OCG - https://phabricator.wikimedia.org/T133164#2265875 (10cscott) [21:47:46] 06Operations, 10OCG-General, 13Patch-For-Review, 05codfw-rollout: Use FQDNs instead of hostnames in the download urls sent to Mediawiki - https://phabricator.wikimedia.org/T133864#2265872 (10cscott) 05Open>03Resolved a:03cscott OK, deployed on production, OCG restarted, everything looks fine. [21:47:58] cscott: yw! [21:49:00] (03CR) 10Andrew Bogott: [C: 031] labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) (owner: 10Rush) [21:49:37] (03Abandoned) 10Andrew Bogott: Labs DNS: Change the cache ttls back to defaults. 
[puppet] - 10https://gerrit.wikimedia.org/r/286905 (https://phabricator.wikimedia.org/T124680) (owner: 10Andrew Bogott) [21:51:31] 06Operations, 06Discovery, 10Monitoring, 10Wikidata, and 3 others: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#2265898 (10Smalyshev) [21:51:45] 06Operations, 06Discovery, 10Monitoring, 10Wikidata, and 3 others: Create response time monitoring for WDQS endpoint - https://phabricator.wikimedia.org/T119915#1839912 (10Smalyshev) 05Open>03Resolved [21:51:49] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [21:51:57] (03PS3) 10Rush: labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) [21:52:11] (03PS4) 10Rush: labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) [21:54:12] (03CR) 10Rush: [C: 032] labs pdns adjustments for perf [puppet] - 10https://gerrit.wikimedia.org/r/287004 (https://phabricator.wikimedia.org/T124680) (owner: 10Rush) [21:54:20] ostriches: legoktm: filled the cache madness as https://phabricator.wikimedia.org/T134448 [21:54:53] thanks :) [21:55:07] 06Operations, 06Analytics-Kanban, 10DNS, 10Traffic, 13Patch-For-Review: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2265931 (10Ottomata) [21:55:23] (03PS1) 10Chad: Revert "Invalidate InitialiseSettings cache anytime config changes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287006 [21:55:30] PROBLEM - Check for gridmaster host resolution UDP on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [21:55:37] (03PS2) 10Chad: Revert "Invalidate InitialiseSettings cache anytime config changes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287006 [21:55:45] (03CR) 10Chad: [C: 032 V: 032] Revert "Invalidate InitialiseSettings cache anytime config changes" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/287006 (owner: 10Chad) [21:56:52] !log demon@tin Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 28s) [21:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:57:26] Already reverted and resolved :P [21:57:27] (03CR) 10Cscott: "I ran out of time in yesterday's deploy window to try this; the FQDN patch ate up all the time I had. The puppet compiler output below lo" [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) (owner: 10Cscott) [21:58:06] (03CR) 10Cscott: "(Where by "yesterday's deploy window" I mean "today's deploy window", which will be "yesterday" by the time _joe_ reads this...)" [puppet] - 10https://gerrit.wikimedia.org/r/286070 (https://phabricator.wikimedia.org/T84723) (owner: 10Cscott) [21:59:19] RECOVERY - Check for gridmaster host resolution UDP on labs-ns1.wikimedia.org is OK: DNS OK - 0.110 seconds response time (tools-grid-master.tools.eqiad.wmflabs. 60 IN A 10.68.20.158) [22:00:29] ostriches: lets keep the task around though. [22:01:01] ostriches: cause invalidating based on the directory is a great idea. Alternatively we could have scap to always touch a dummy file such as conf-cache.touch or whatever [22:01:19] Nah it's all actually a kludge. 
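Note on the sentinel-file idea above: a minimal sketch of invalidating a cache against a dedicated marker file whose mtime is bumped on every deploy. Everything here is hypothetical. The name `conf-cache.touch` is taken from hashar's suggestion, while `cache_is_stale`, `touch`, and `demo` are invented for illustration and are not part of scap or mediawiki-config.

```python
import os
import tempfile


def cache_is_stale(cache_path, sentinel_path):
    """True if the cache file is missing or older than the sentinel file."""
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except FileNotFoundError:
        return True
    return os.stat(sentinel_path).st_mtime > cache_mtime


def touch(path, when=None):
    """Create the file if needed and set its mtime (current time if `when` is None)."""
    with open(path, "a"):
        pass
    os.utime(path, None if when is None else (when, when))


def demo():
    """Walk through missing cache, fresh cache, then a deploy bumping the sentinel."""
    with tempfile.TemporaryDirectory() as d:
        cache = os.path.join(d, "conf-cache")
        sentinel = os.path.join(d, "conf-cache.touch")
        touch(sentinel, 1_000_000_000)
        missing = cache_is_stale(cache, sentinel)   # no cache file yet
        touch(cache, 1_000_000_100)
        fresh = cache_is_stale(cache, sentinel)     # cache newer than sentinel
        touch(sentinel, 1_000_000_200)              # simulated deploy
        stale = cache_is_stale(cache, sentinel)
        return missing, fresh, stale


if __name__ == "__main__":
    print(demo())
```

The point of a dedicated marker is that its mtime changes only when the deploy tool touches it explicitly, so the check does not depend on how individual config files or their parent directory happen to be written.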
[22:01:25] I think we should just move towards less state-dependent config :) [22:02:35] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: puppet fail [22:03:54] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:06:31] (03PS1) 10Rush: labs pdns don't add reuseport for now [puppet] - 10https://gerrit.wikimedia.org/r/287008 [22:06:50] (03PS2) 10Rush: labs pdns don't add reuseport for now [puppet] - 10https://gerrit.wikimedia.org/r/287008 [22:08:36] (03CR) 10Rush: [C: 032] labs pdns don't add reuseport for now [puppet] - 10https://gerrit.wikimedia.org/r/287008 (owner: 10Rush) [22:17:24] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: puppet fail [22:18:04] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:19:24] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:30:02] (03PS1) 10Dzahn: update URLs of redirected feeds [puppet] - 10https://gerrit.wikimedia.org/r/287014 (https://phabricator.wikimedia.org/T134436) [22:32:35] (03PS2) 10Dzahn: update URLs of redirected feeds [puppet] - 10https://gerrit.wikimedia.org/r/287014 (https://phabricator.wikimedia.org/T134436) [22:32:58] (03CR) 10Dzahn: [C: 032] update URLs of redirected feeds [puppet] - 10https://gerrit.wikimedia.org/r/287014 (https://phabricator.wikimedia.org/T134436) (owner: 10Dzahn) [22:34:01] (03PS3) 10Dzahn: planet: update URLs of redirected feeds [puppet] - 10https://gerrit.wikimedia.org/r/287014 (https://phabricator.wikimedia.org/T134436) [22:44:13] (03PS1) 10Dzahn: add base::firewall on argon, remove from role::mw_rc_irc [puppet] - 10https://gerrit.wikimedia.org/r/287018 [22:44:53] (03PS2) 10Dzahn: add base::firewall on argon, remove from role::mw_rc_irc [puppet] - 10https://gerrit.wikimedia.org/r/287018 [22:45:35] (03PS1) 1020after4: librarize phab extensions repo [puppet] - 
10https://gerrit.wikimedia.org/r/287021 (https://phabricator.wikimedia.org/T128797) [22:46:52] (03PS2) 10Dzahn: Switch my planet feed from cats to tags [puppet] - 10https://gerrit.wikimedia.org/r/286995 (owner: 10Addshore) [22:47:24] (03CR) 10Dzahn: [C: 032] Switch my planet feed from cats to tags [puppet] - 10https://gerrit.wikimedia.org/r/286995 (owner: 10Addshore) [22:47:43] (03PS2) 1020after4: librarize phab extensions repo [puppet] - 10https://gerrit.wikimedia.org/r/287021 (https://phabricator.wikimedia.org/T128797) [22:52:49] (03PS3) 1020after4: librarize phab extensions repo [puppet] - 10https://gerrit.wikimedia.org/r/287021 (https://phabricator.wikimedia.org/T128797) [23:00:04] RoanKattouw ostriches Krenair Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160504T2300). [23:00:04] RoanKattouw James_F: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:09] * James_F waves. [23:02:58] I'll do it [23:03:46] The sync on the VE one is a little more complicated than normal, sorry. [23:06:02] No worries [23:06:10] Cite first, then a recursive submodule update for VE? [23:06:40] Please. 
[23:15:36] 06Operations, 10Ops-Access-Requests: "design-team" list archiving or abandoning - https://phabricator.wikimedia.org/T134454#2266108 (10Volker_E) [23:18:53] !log catrope@tin Synchronized php-1.27.0-wmf.23/extensions/Cite: SWAT (duration: 00m 52s) [23:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:21:16] !log catrope@tin Synchronized php-1.27.0-wmf.23/extensions/VisualEditor: SWAT (duration: 00m 29s) [23:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:44] James_F: ---^^ please verify [23:24:52] (03CR) 10Catrope: [C: 032] Remove emailuser override for hewiki, no longer needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286105 (https://phabricator.wikimedia.org/T133927) (owner: 10Catrope) [23:24:59] Checking. [23:26:17] Hmm. [23:26:20] Darn cache. [23:27:52] RoanKattouw: What did you sync of VE? All of it? [23:27:57] Yes [23:28:04] git submodule update --recursive extensions/VisualEditor [23:28:08] Hmm. Change not apparent. [23:28:08] sync-dir extensions/VisualEditor [23:28:16] Does --recursive do the sub-module? [23:28:20] Yes [23:28:36] * James_F hits MW and does a new ?debug=true load again. [23:28:42] Don't use ?debug=true [23:28:52] It works in Beta Cluster. [23:28:52] That will get you staler code, not fresher code [23:28:55] Oh. [23:28:57] * James_F sighs. [23:29:00] When did that happen? [23:29:09] 2011 [23:29:17] ? No. [23:29:27] I've been needing to do ?debug=true to verify deploys for years now. [23:29:27] Debug mode has always been crap [23:29:42] Yes, crap. But less crap than waiting 10 minutes for the deploy to work. [23:29:42] You shouldn't need to. Try patience (5 mins) + incognito instead [23:29:46] That works for me [23:30:32] Well, it isn't working in prod the way it works in master. [23:30:54] wait, since when ?debug=true works on beta cluster? [23:31:05] it serves super old code sometimes. there's a bug about it. 
[23:31:05] But it's also not broken. [23:31:14] MatmaRex: Not beta cluster, production. [23:31:24] I suppose you may not have cherry-picked all that you needed to? [23:31:38] Maaaybe? It seemed to work locally. [23:31:41] * James_F re-checks. [23:31:44] (ah, the other 'it'.) [23:31:51] Yeah. [23:32:43] It looks OK. [23:32:54] Eurgh. Call it good enough and we'll debug further in the morning. [23:32:57] (03PS3) 10Catrope: Remove emailuser override for hewiki, no longer needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286105 (https://phabricator.wikimedia.org/T133927) [23:33:04] (03CR) 10Catrope: [C: 032] Remove emailuser override for hewiki, no longer needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286105 (https://phabricator.wikimedia.org/T133927) (owner: 10Catrope) [23:33:10] In 20 minutes RL's cache might even have fixed itself enough for it to suddenly work in prod. [23:33:33] (03Merged) 10jenkins-bot: Remove emailuser override for hewiki, no longer needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286105 (https://phabricator.wikimedia.org/T133927) (owner: 10Catrope) [23:33:57] (03PS1) 10Aude: Put wikidata back on wmf/1.27.0-wmf.23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287031 [23:34:52] RoanKattouw: maybe when you are done swat, i'd like to deploy a patch for wikidata and then put it back on wmf.23 [23:35:04] assuming it's too late for adding things to swat [23:36:20] Sure [23:36:55] I'm deploying my config patch now and then I'll be done [23:36:59] ok [23:37:12] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Remove emailuser override for hewiki, no longer needed (duration: 00m 33s) [23:37:16] There, all done [23:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:37:41] waiting for jenkins anyways [23:37:50] before +2 [23:45:31] (03PS1) 10Dzahn: acme-setup: only accept ASCII letters as cert ID [puppet] - 10https://gerrit.wikimedia.org/r/287032 
(https://phabricator.wikimedia.org/T134447)
[23:48:14] (PS2) Dzahn: acme-setup: only accept ASCII letters as unique cert ID [puppet] - https://gerrit.wikimedia.org/r/287032 (https://phabricator.wikimedia.org/T134447)
[23:49:41] (PS3) Dzahn: acme-setup: only accept ASCII letters as unique cert ID [puppet] - https://gerrit.wikimedia.org/r/287032 (https://phabricator.wikimedia.org/T134447)
[23:52:45] (CR) Alex Monk: "what about digits?" [puppet] - https://gerrit.wikimedia.org/r/287032 (https://phabricator.wikimedia.org/T134447) (owner: Dzahn)
[23:53:19] deploying
[23:54:31] !log aude@tin Synchronized php-1.27.0-wmf.23/extensions/Wikidata: Fix bug in other languages box: T134432 (duration: 02m 18s)
[23:54:32] T134432: Missing labels and descriptions in "other languages" box - https://phabricator.wikimedia.org/T134432
[23:54:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:55:12] looks good (on test.wikidata)
[23:55:36] (PS1) Catrope: Remove useless Echo footer notice overrides in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/287033
[23:55:38] (PS1) Catrope: Enable cross-wiki notifications by default in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/287034 (https://phabricator.wikimedia.org/T130655)
[23:55:40] (PS1) Catrope: Enable cross-wiki notifications by default in production [mediawiki-config] - https://gerrit.wikimedia.org/r/287035 (https://phabricator.wikimedia.org/T130655)
[23:55:53] (CR) Catrope: [C: -2] "Not until May 12th" [mediawiki-config] - https://gerrit.wikimedia.org/r/287035 (https://phabricator.wikimedia.org/T130655) (owner: Catrope)
[23:57:14] (CR) Aude: [C: 2] Put wikidata back on wmf/1.27.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/287031 (owner: Aude)
[23:57:15] PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: puppet fail
[23:57:54] (PS2) Aude: Put wikidata back on wmf/1.27.0-wmf.23
[mediawiki-config] - https://gerrit.wikimedia.org/r/287031
[23:58:03] spank69
[23:58:12] (CR) Aude: Put wikidata back on wmf/1.27.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/287031 (owner: Aude)
[23:58:18] (CR) Aude: [C: 2] Put wikidata back on wmf/1.27.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/287031 (owner: Aude)
[23:58:27] * twentyafterfour changes his password :O
[23:58:31] hahaha
[23:58:33] o_O
[23:58:49] f'in gdm
[23:59:22] (Merged) jenkins-bot: Put wikidata back on wmf/1.27.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/287031 (owner: Aude)