[00:22:39] is https://phabricator.wikimedia.org/T236955 a serviceops thing?
[00:32:55] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (Dzahn) Here's the thing with the beta setup: In production the `use_php` Hiera key is what turns a classic (non-MW) parsoid server into a MW-parsoi...
[00:36:04] cdanis: it's more for MediaWiki config SWAT
[01:01:24] cdanis: volunteers mostly take care of the site-requests work queue. Somebody will make the wmf-config patch that is needed and post it for SWAT typically. And the folks who do that stuff will pull in others with more rights if something tricky is needed.
[01:18:27] bd808: ack, thanks :)
[03:22:47] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) With that change, 40% of the urls don't OOM anymore.
[11:34:35] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (hashar) a:jijiki Phabricator edit conflict. @jijiki is indeed working on it :]
[11:36:18] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (jijiki) Cherry picked and packaged.
Please ping when you test it so I can upload it:)
[11:38:41] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (jijiki) Please ping if there are more things to be done for this task:)
[11:39:30] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (jijiki) p:Triage→High
[14:54:43] akosiaris: so I should just get rid of the logstash output stream for eventgate chart?
[14:54:47] and only always use stdout?
[14:58:50] if so:
[14:58:51] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/547549
[15:00:55] godog:
[15:01:04] how do you handle cross DC logstash stuff?
[15:01:18] do we need to set up mirrormaker for your logging kafka clusters?
[15:02:07] ottomata: producers can produce to either codfw or eqiad, and we consume from both
[15:02:24] so no mirrormaker involved, but I appreciate the help!
[15:02:32] e.g. logstash in eqiad consumes from kafka in eqiad and kafka in codfe?
[15:02:36] codfw?
[15:02:38] that's correct yeah
[15:02:39] hm
[15:03:05] so we usually do DC topic prefixes
[15:03:10] so if data gets replicated between DCs
[15:03:15] there aren't any circular replication loops
[15:03:25] would it be ok to use separate topics in different DCs?
[15:03:36] you could still consume from both, but you'd use a different topic for eqiad and codfw logs
[15:04:10] we don't have to use a topic prefix tho.
[15:04:26] hmm but
[15:04:28] we probably should
[15:04:29] hm
[15:04:48] it would make integration into other systems (i.e. hive) easier
[15:04:53] since we can use all the same logic
[15:04:54] hmm
[15:05:07] yeah because we will probably want to mirror those topics into kafka-jumbo
[15:05:10] yeah topic prefix sounds good to me if it makes things easier on your end
[15:05:26] ok
[15:05:36] will that make things confusing in logstash for users?
[15:05:40] if there are 2 different topics?
[15:05:45] not sure if the topic name is surfaced at all
[15:05:45] ?
[15:05:53] * ottomata does not know much about logstash
[15:06:10] no topic names aren't surfaced to users in logstash
[15:06:21] it is for transport only really
[15:06:29] ok
[15:06:56] but yeah we'll configure logstash to consume from all prefixes and that should do it
[15:07:35] ok
[15:07:36] cool
[15:08:56] out of curiosity how are topic names used/surfaced in e.g. hive?
[15:09:19] currently
[15:09:34] camus consumes them as is in per topic directories in hdfs as raw json data
[15:09:37] then
[15:09:50] a spark job (refine) comes around and regex matches patterns out of directory hierarchies
[15:09:53] so
[15:09:57] eqiad.mediawiki.revision_create
[15:10:00] becomes table
[15:10:08] mediawiki_revision_create with a datacenter=eqiad partition
[15:10:21] the regex is configurable
[15:10:27] so we could support non prefixed topics
[15:10:35] it just means we have to configure something to do it and run it differently
[15:11:06] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/refine.pp#L77-L93
[15:15:46] ah! thanks, yeah I don't think it is worth the trouble
[15:22:02] godog: another q
[15:22:18] i was considering naming the stream (which becomes the topic) mediawiki.client.error
[15:22:24] since we are just working on mediawiki rn
[15:22:38] but, will we want to be able to log from other clients, e.g. mobile apps?
[15:22:40] and, if so
[15:22:51] should all client errors just go to the same stream?
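(Editor's note: the topic-prefix convention described above, where eqiad.mediawiki.revision_create becomes table mediawiki_revision_create with a datacenter=eqiad partition, can be illustrated with a small sketch. This is not the actual Refine job, which is a Spark job matching directory hierarchies in HDFS with a configurable regex. The pattern, the function name topic_to_table, and the rule of replacing dots and hyphens with underscores are all assumptions made for illustration.)

```python
import re

# Hypothetical pattern: "<datacenter>.<stream name>". Only the two
# datacenters named in the discussion (eqiad, codfw) are assumed here.
TOPIC_PATTERN = re.compile(r"^(?P<datacenter>eqiad|codfw)\.(?P<stream>.+)$")


def topic_to_table(topic):
    """Map a DC-prefixed topic name to (table name, partition dict).

    Returns None for topics without a recognised datacenter prefix,
    which would need separate configuration in this sketch.
    """
    match = TOPIC_PATTERN.match(topic)
    if match is None:
        return None
    # Assumed normalisation: dots and hyphens become underscores.
    table = re.sub(r"[.-]", "_", match.group("stream"))
    return table, {"datacenter": match.group("datacenter")}


if __name__ == "__main__":
    for name in ("eqiad.mediawiki.revision_create",
                 "codfw.mediawiki.client.error",
                 "mediawiki.client.error"):
        print(name, "->", topic_to_table(name))
```

(In this sketch both eqiad.mediawiki.client.error and codfw.mediawiki.client.error would map to the same table and differ only in the datacenter partition, which is the "same logic" benefit mentioned in the conversation. An unprefixed topic falls through and returns None.)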
[15:24:31] ottomata: you should, just doublecheck that the logs are there in a good format
[15:24:51] we are going to deprecate and eventually remove logstash support from all charts
[15:24:53] akosiaris: will when deployed to staging ya?
[15:25:00] for new eventgate-logging-external
[15:25:04] fine by me
[15:25:06] coo
[15:28:19] ottomata: heh good question, I'm leaning towards a single stream and clients should identify themselves in metadata
[15:30:16] ok
[15:30:28] if you have naming ideas or prefs let me know!
[15:31:19] godog: should we maybe just have a single eventgate instance for both external and internal clients
[15:31:26] in case internal clients wanted to use this for some reason?
[15:31:40] eventgate-logging instead of eventgate-logging-external
[15:31:57] and set up public routing to eventgate-logging, but internal clients just use .svc url?
[15:33:43] ottomata: in a meeting now
[15:33:54] will get back to you after that
[15:33:57] k
[16:01:46] ottomata: yeah I'm not sure tbh, I haven't thought at all about internal clients. I'm leaning towards going with -external now and not widen the scope too much
[16:33:04] BTW, I presume the operations/debs/hhvm repo can be archived?
[17:34:59] serviceops, Cleanup, Repository-Admins, HHVM: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Dzahn) # Remove from https://doc.wikimedia.org/ if present (requires Continuous-Integration-Infrastructure shell user to delete directly from the server). On doc1001...
[18:24:14] serviceops, Cleanup, Repository-Admins, HHVM: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Dinoguy1000) Open→Stalled Marking as stalled per note at the top of the description (please undo if I've overstepped my bounds here or anything).
[18:47:02] serviceops, Cleanup, Repository-Admins, HHVM: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Jdforrester-WMF) >>! In T237038#5624181, @Dzahn wrote: > re: # Remove from https://doc.wikimedia.org/ if present (requires Continuous-Integration-Infrastructure shell...
[18:47:41] serviceops, Cleanup, Repository-Admins, HHVM: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Jdforrester-WMF)
[19:10:24] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (Krinkle) >>! In T234283#5619101, @gerritbot wrote: > Change 546219 **merged** by Effie Mouzeli:...
[19:33:37] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (jijiki) @Krinkle we are looking into it, tx
[21:17:43] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (Dzahn) still a problem i think: {P9515}