[00:32:01] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5893528, @Joe wrote: > A few notes: > - We cannot really worry too much about stale keys over failovers - we...
[00:33:25] cscott: James_F: the servergroup on scandium is "misc". i'll look at a patch to change that to either parsoid or parsoid-test i guess and add you.
[00:34:36] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5839106, @elukey wrote: > Couple of random thoughts: > > * we should check the diff between our mcrouter ver...
[00:35:00] mutante: That'd be great, thanks!
[00:44:32] the change looks trivial: https://gerrit.wikimedia.org/r/c/operations/puppet/+/575383/2/hieradata/role/common/parsoid/testing.yaml but it does all this: https://puppet-compiler.wmflabs.org/compiler1001/21141/scandium.eqiad.wmnet/
[00:50:26] -SetEnvIf Request_URI "." SERVERGROUP=misc
[00:50:26] +SetEnvIf Request_URI "." SERVERGROUP=parsoid
[00:50:38] +Environment="SERVICE_CLUSTER=parsoid"
[00:50:57] Exec[systemd daemon-reload for envoyproxy.service]: Triggered 'refresh'
[00:51:15] [scandium:/etc/apache2/conf-available] $ grep SERVERGROUP 50-wikimedia-cluster.conf
[00:51:18] SetEnvIf Request_URI "." SERVERGROUP=parsoid
[00:51:28] James_F: cscott: fixed
[00:51:55] scandium is now treated as a member of parsoid cluster like the other wtp*
[00:51:56] Cool.
[00:52:04] But not load-balanced by lvs?
[00:52:44] yes, that's right
[00:52:48] Good. :-)
[00:52:52] it's not in lvs::configuration
[00:52:59] OK, that makes things very simple for us.
[00:53:00] this is just the environment variable
[00:53:28] you can amend that patch and avoid "scandium" hardcoding
[00:53:54] it's based on the role parsoid::testing now
[00:54:24] so "parsoid" and "parsoid::testing" both make the server member of the parsoid cluster
[00:54:31] in this sense
[00:54:46] * James_F nods.
[00:55:04] that is .. unless we think we need a separate cluster called parsoid_test and change the check to do "parsoid or parsoid_test"
[00:57:18] For just one machine, it's fine.
[01:00:06] i should have said "not in conftool-data" but yea, it's not in there, it is not pooled like wtp servers
[01:03:48] it changes little things, like in icinga: scandium now moved here: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?servicegroup=parsoid_eqiad&style=overview
[01:39:33] serviceops, Operations, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Jdforrester-WMF)
[01:42:17] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5891129, @jijiki wrote: > **Other comments** > * It is not completely predictable how fast a mcrouter will fa...
[01:44:00] serviceops, Operations, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Jdforrester-WMF) `ldap` is already installed in CI, so no need for RelEng to change things from our end.
[01:53:33] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access - https://phabricator.wikimedia.org/T245877 (Dzahn) @cscott @ssastry Actually you already have deployment access. So this is just about adding arlol...
[01:54:54] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn)
[02:00:01] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @Arlolra Has been added to deployers. This would solve this ticket n...
[02:00:46] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) a:Sbailey
[06:08:16] https://gerrit.wikimedia.org/r/c/operations/puppet/+/574902
[06:08:23] https://gerrit.wikimedia.org/r/c/operations/puppet/+/575382
[06:27:24] https://gerrit.wikimedia.org/r/c/operations/puppet/+/575383
[06:28:28] the last one was a fix to scandium for parsoid deployment.. the others to add more eqiad appservers and canaries to dsh.. feel free. and cya later
[10:59:57] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (aaron) >>! In T240684#5917524, @jijiki wrote: > > Secondly, regardless of when we use the gutter pool, do you think we need to cle...
[12:24:38] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 3 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (hnowlan)
[12:48:59] serviceops, Beta-Cluster-Infrastructure, Operations, observability, Patch-For-Review: Stream a subset of mediawiki apache logs to logstash - https://phabricator.wikimedia.org/T244472 (jijiki)
[12:49:36] serviceops, Beta-Cluster-Infrastructure, Operations, observability, Patch-For-Review: Stream a subset of mediawiki apache logs to logstash - https://phabricator.wikimedia.org/T244472 (jijiki) Open→Resolved Thank you @herron and @fgiunchedi for your help!
[13:30:02] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (LarsWirzenius) I admit I don't understand PHP or MW, but... would a simpler approach be for scap to wait 10s b...
[13:56:33] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Krinkle) >>! In T236104#5926376, @LarsWirzenius wrote: > […] would a simpler approach be for scap to wait 10s...
[14:01:21] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Krinkle) I'm wondering if it'd be easier if we simply generate this cache upfront the way we do for localisati...
[14:41:08] serviceops, Operations, Scap: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (jijiki) Great, we merged this patch! Do we have a plan of how we will communicate this to the deployers when we release scap as well as how to test that it is all good in production? Thank you!
[14:42:20] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (LarsWirzenius) Okay, there's clearly things here I'm not getting yet. Scratch the idea of having scap wait ten...
[14:49:03] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Joe) >>! In T236104#5926454, @Krinkle wrote: > I'm wondering if it'd be easier if we simply generate this cach...
[15:23:35] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Arlolra) >>! In T245877#5925447, @Dzahn wrote: > @Arlolra Has been added to...
[15:29:39] James_F, mutante: while we're looking at this, can we figure out what to do about beta as well?
[15:30:17] should there be a parsoid::beta role (maybe there already is) and do we need a different SERVERGROUP for that, or can we just use SERVERGROUP=='parsoid' to mean "exports the Parsoid REST API"
[15:55:07] serviceops, MediaWiki-General, Operations, observability: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685 (colewhite) As I think about it more, it's the wire format being wholly incompatible with Prometheus format. In order to make it work, StatsD requires a lot of confi...
[16:04:19] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Sbailey) Greg, do you really want me to post my public key in this form? or...
[16:04:21] serviceops, Operations, Scap: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (LarsWirzenius) I had a discussion with Tyler just now. We plan the following: * add to docs/ in the source tree; this ends up in doc.wikimedia.org * add to debian/changelog in the source tre...
[16:09:58] serviceops, Operations, Scap: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (jijiki) >>! In T217924#5926912, @LarsWirzenius wrote: > I had a discussion with Tyler just now. We plan the following: > > * add to docs/ in the source tree; this ends up in doc.wikimedia.or...
[16:15:34] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Sbailey) Subbu says yes, so here it is: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABA...
[16:38:41] serviceops, MediaWiki-General, Operations, observability: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685 (colewhite) In response to @Joe's concerns: > What happens if redis is overwhelmed/down? How can we control timeouts? The library has a configurable timeout with a...
[16:39:33] serviceops, Operations, Patch-For-Review, Performance-Team (Radar), and 2 others: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (jijiki) Open→Stalled This is stalled until we completely stop using redis and we have put the gutter pool in production
[16:39:38] serviceops, Operations: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (jijiki)
[17:40:57] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parsoid* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki)
[17:42:42] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parsoid* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (Dzahn) let's rename the ticket to parse* to match what is in DNS and the wiki page, ok?
[18:03:08] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki)
[18:03:19] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jijiki) done, sigh
[18:13:37] serviceops: Measure segfaults in mediawiki and parsoid servers - https://phabricator.wikimedia.org/T246470 (jijiki)
[18:13:50] serviceops: Measure segfaults in mediawiki and parsoid servers - https://phabricator.wikimedia.org/T246470 (jijiki) p:Triage→Medium
[18:27:02] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (Dzahn) linking ticket and gerrit change that adds the _new_ parse* servers to go with this: T243112 https://gerrit.wikimedia.org/r/c/operations/pup...
[18:49:14] serviceops, Deployments, Patch-For-Review, Performance-Team (Radar), and 3 others: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Jdforrester-WMF) >>! In T236104#5926454, @Krinkle wrote: > I'm wondering if it'd be easier if we simply genera...
[19:10:50] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) Thanks @Sbailey yes, that's correct. We actually want it on the ticke...
[19:11:05] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) a:Sbailey→Dzahn
[20:21:23] serviceops, Parsing-critical-path, Parsoid-PHP, MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Patch-For-Review: Craft a deployment strategy to transition Parsoid/PHP from a faux extension to a composer library without breaking incoming requests - https://phabricator.wikimedia.org/T240055 (Dzahn)
[20:21:31] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) Open→Resolved @Sbailey You now have root access to the parsoi...
[20:22:40] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @Sbailey Here you see an example SSH config for jumping via bastion h...
[20:48:49] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests, Patch-For-Review: Give all members of the Parsing team production `deployment` access ( add arlolra to deployers) - https://phabricator.wikimedia.org/T245877 (Dzahn) @arlolra and @cscott You also have root on parsoid::testing (scandium...
[20:54:02] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki) >>! In T240684#5925272, @aaron wrote: > Yeah, the eviction ("tko") delay should be low to avoid prolonging DB traffic spike...
[21:39:03] serviceops, Operations, Performance-Team, Patch-For-Review: Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (jijiki) @aaron Please have a look at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/569541/ and let us know if it reflects...
[21:58:11] serviceops, Operations, ops-eqiad, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Dzahn) p:Medium→High Raising priority because the Needed-by date has arrived. Could we have a status update @Cmjohnson? Is...
[22:31:48] serviceops, ChangeProp, CPT Initiatives (Containerise Services): Update Kafka-dev Helm chart to work with Kubernetes version 1.16 or newer - https://phabricator.wikimedia.org/T246501 (holger.knust)