[06:52:49] serviceops, MediaWiki-extensions-Linter, Parsoid, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Joe) >>! In T249745#6095306, @Pchelolo wrote: > In general this is happening quite a lot, [[...
[07:25:02] serviceops, MediaWiki-extensions-Linter, Parsoid, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Joe) In another case, we had ~ 100 errors corresponding to a spike in latency from the backen...
[07:48:34] serviceops, Operations, Traffic, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe)
[09:48:33] serviceops, Operations, Traffic, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) Looking at our existing event schemas, [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/event-schemas/+/master/jsonschema/resource_change/1.0.0.ya...
[12:57:40] hi all, i restarted php-fpm on mw2 ~20 mins ago and it has just started alerting on cache-hit ratio
[12:57:53] 1) this is still fairly fine right as mw2 is not in use?
[12:58:07] 2) any idea why it took 20 mins to alert
[12:58:35] 3) restarts are also happening for mw1, although much less aggressively; should i stop these?
[12:58:59] _joe_: mutante:
[13:01:57] recoveries are coming in now
[13:11:56] jbond42: 1) yea, should be fine for mw2 2) not really, check_interval is 1 3) if you are using $ sudo -i /usr/local/sbin/restart-php7.2-fpm that is the safe method. if less aggressively means "You should never run restart on more than 10% of the servers in a cluster at the same time." then it's ok
[13:12:35] that was source: https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#PHP7_opcache_health
[13:13:05] mutante: for the mw one i'm running via cumin with `-b1 -s 15` i.e. one at a time with a 15-second sleep interval between each one
[13:13:52] "restart-php7.2-fpm depools the server, restarts php-fpm, then repools the server"
[13:14:08] are you using that script or directly systemctl?
[13:14:16] that script
[13:14:21] sudo cumin -b 1 -s 15 'mw1*' 'restart-php7.2-fpm'
[13:14:33] ok, that should be good
[13:14:42] thanks
[13:16:06] better to use aliases, you might miss some (mwdebug, mwmaint...)
[13:16:44] i don't know the definition of "at the same time" but assuming -b 1 means that is not. and what volans said +1
[13:17:05] -b 1 just does one at a time yes
[13:17:36] yea, i meant the "how much time has to pass until it's not considered the same time"
[13:17:45] and ack on the aliases
[13:18:01] ahh
[13:38:27] serviceops, MediaWiki-extensions-Linter, Parsoid, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Ottomata) > a single event on april 24th at 16:30. At that time, we had what looks like a pr...
[13:44:54] serviceops, Operations, Traffic, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (BBlack) I like the root event timestamp info. We could potentially put in future rules to help by ignoring ancient purges, in some cases (e.g. if we can guarante...
[13:49:16] serviceops, Operations, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (JMeybohm)
[15:12:21] <_joe_> ottomata: when a repo is not to be used, *mark it as such in the README*
[15:12:23] <_joe_> :P
[15:12:33] just did _joe_ sorrrryyyy
[15:12:39] <_joe_> ahah
[15:17:35] <_joe_> ottomata: so I need a "meta" section in the message?
[15:17:38] <_joe_> and a $schema too?
[15:18:12] yes https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas#Event_meta_data
[15:19:56] _joe_: if i could go back to 2015 now, i would not use meta. it was partly a compromise with gabriel about some stuff, and also an effort at standardizing some common fields, when we didn't have a good way of $ref/including them from a common place.
[15:21:48] <_joe_> ottomata: ack
[15:24:34] heh you are not the only one to not know that mediawiki/event-schemas is no more
[15:24:34] https://gerrit.wikimedia.org/r/c/mediawiki/event-schemas/+/593528
[15:24:44] the docs don't reference that anymore... do they?
[15:24:48] where did you find it?
[15:28:02] <_joe_> I'm pretty sure I found it in docs
[15:54:38] <_joe_> ottomata: why is "dt" needed?
[15:56:26] _joe_: it isn't strictly needed but is good to have for almost everything
[15:56:30] we might be able to get away with it here
[15:56:33] without it*
[15:56:39] but, yours will be the only event stream that doesn't have it
[15:56:56] <_joe_> right, I was thinking of abusing it to set the root dt
[15:57:03] <_joe_> but it's a second-order optimization
[15:57:11] <_joe_> it makes no sense to do it
[15:57:20] meta.dt is used as the event timestamp in kafka
[15:57:35] <_joe_> I might propose to make the "signature" attribute optional in root_event
[15:57:37] i think kafka falls back to broker receive time if one isn't set by the producer though
[15:57:43] that sounds good
[16:01:14] serviceops, Operations, Traffic, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) After a discussion on the patch, it was clearer to me that some information can't be removed from the message, and that makes `resource_change` the perfect f...
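A minimal sketch of the event shape discussed above, assuming Python 3 and only the fields mentioned in the log: a `$schema`, a `meta` section whose `dt` serves as the Kafka timestamp, and a `root_event` with its own `dt` and an optional `signature`. The schema URI, the stream name `resource-purge`, and all helper names are hypothetical. The receive-time fallback only models the behaviour guessed at in the chat, not the Kafka client's actual logic. `is_ancient()` illustrates BBlack's idea of ignoring ancient purges by their root event timestamp.

```python
"""Hypothetical purge event with $schema, meta and root_event (field layout assumed)."""
from datetime import datetime, timedelta, timezone
from typing import Optional

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"


def iso_utc(ts: datetime) -> str:
    """Format an aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    return ts.astimezone(timezone.utc).strftime(ISO_FMT)


def parse_iso_utc(value: str) -> datetime:
    """Parse the 'Z'-suffixed strings produced by iso_utc()."""
    return datetime.strptime(value, ISO_FMT).replace(tzinfo=timezone.utc)


def make_purge_event(uri: str, root_dt: datetime, now: datetime,
                     signature: Optional[str] = None) -> dict:
    event = {
        "$schema": "/resource_change/1.0.0",  # assumed schema URI
        "meta": {
            "stream": "resource-purge",       # hypothetical stream name
            "dt": iso_utc(now),               # event time, used as the Kafka timestamp
            "uri": uri,                       # not required by resource_change, per the log
        },
        "root_event": {"dt": iso_utc(root_dt)},
    }
    if signature is not None:                 # signature treated as optional
        event["root_event"]["signature"] = signature
    return event


def kafka_timestamp_ms(event: dict, received_at: datetime) -> int:
    """Use meta.dt when present; otherwise fall back to the supplied receive time."""
    dt = event.get("meta", {}).get("dt")
    ts = parse_iso_utc(dt) if dt else received_at
    return int(ts.timestamp() * 1000)


def is_ancient(event: dict, now: datetime, max_age: timedelta) -> bool:
    """True when the root event is older than max_age, i.e. a candidate to ignore."""
    root_dt = parse_iso_utc(event["root_event"]["dt"])
    return now - root_dt > max_age
```

For example, an event whose root event is two days old is flagged by `is_ancient(ev, now, timedelta(days=1))`, and an event without `meta.dt` makes `kafka_timestamp_ms()` return the supplied receive time instead.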
[16:01:32] <_joe_> ottomata: say I create a golang package to interact with our events - would that be of general use?
[16:54:03] <_joe_> ottomata: uhm I see uri is not *required* in resource_change
[20:29:54] mcrouter cert renewal is scripted now: https://wikitech.wikimedia.org/w/index.php?title=Mcrouter&diff=1864782&oldid=1855921
[21:34:26] serviceops, Operations, Patch-For-Review, Performance-Team (Radar), and 2 others: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (Andrew)
[21:34:33] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (Andrew)
[21:37:30] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (Andrew)
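A minimal sketch of the rolling-restart pattern from the exchange between 13:11 and 13:18, assuming Python 3. Hosts restart one batch at a time with a fixed sleep between batches (the `-b 1 -s 15` behaviour described for cumin), and the batch size is capped at 10% of the cluster, per the runbook quote. `restart_one` is a hypothetical callback standing in for `restart-php7.2-fpm` (depool, restart php-fpm, repool). The sketch does not call cumin and, as a simplification, restarts hosts within a batch sequentially rather than in parallel.

```python
"""Hypothetical rolling-restart planner; restart_one() and host names are illustrative."""
import time
from typing import Callable, Iterator, List


def max_batch_size(cluster_size: int, max_percent: int = 10) -> int:
    """Largest batch keeping at most max_percent of the cluster restarting (minimum 1)."""
    return max(1, cluster_size * max_percent // 100)


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` hosts."""
    for i in range(0, len(hosts), size):
        yield hosts[i:i + size]


def rolling_restart(hosts: List[str], restart_one: Callable[[str], None],
                    batch_size: int = 1, sleep_s: float = 15.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    limit = max_batch_size(len(hosts))
    if batch_size > limit:
        raise ValueError(
            f"batch of {batch_size} exceeds the 10% limit ({limit}) for {len(hosts)} hosts")
    groups = list(batches(hosts, batch_size))
    for n, group in enumerate(groups):
        for host in group:          # sequential within a batch (simplification)
            restart_one(host)       # stand-in for depool, restart, repool
        if n < len(groups) - 1:     # sleep only between batches, not after the last
            sleep(sleep_s)
```

With the default `batch_size=1`, three hosts produce three restarts and two 15-second sleeps; asking for a batch of 5 on a 20-host cluster raises `ValueError` because the cap is 2.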