[04:03:29] <_joe_> that's amazing
[04:03:36] <_joe_> I'm proud of you cdanis
[04:04:04] <_joe_> now you just need to pipe that through a perl script that transforms that json into html
[05:50:35] what is that mess?
[05:52:31] more specifically what is this: _etcd._tcp.eqiad.wmnet srv ??
[05:52:46] s/srv//
[06:16:46] an SRV record pointing to the servers offering the etcd service, aka conf[1004-1006].eqiad.wmnet in that case
[06:17:54] a primitive service discovery based on DNS
[06:18:21] please excuse my not very smart questions
[06:18:43] the underscores are there to indicate it's a name that should not be used widely, or...?
[06:19:11] it's part of the SRV naming format
[06:19:14] _service._proto.name.
[06:19:47] and one more (sorry), is there an implied _udp one of these?
[06:21:36] so that one is a TCP service, but yeah, you could have a _udp record if needed
[06:27:09] "An underscore (_) is prepended to the service identifier to avoid collisions with DNS labels that occur in nature." going to ask questions about natural habitat and mating behavior in a minute
[07:11:38] the new buster-based pool counters have been in use for two weeks and TTBOMK there have been no issues. is everyone okay with dropping the old jessie VMs now or would anyone prefer to keep them around a little longer?
[07:29:55] sounds reasonable
[07:31:21] ack, if there are no further objections, I'll drop these later in the day
[08:19:18] <_joe_> moritzm: kill! kill!
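The SRV naming scheme discussed above (`_service._proto.name`, with an SRV record listing the hosts that offer a service) can be sketched in Python. This is an illustration only, not how the etcd clients here actually resolve `_etcd._tcp.eqiad.wmnet`. The helper names are hypothetical, no DNS lookup is performed, and the priority, weight and port values in the example are made up. The ordering follows the priority/weight selection rule from RFC 2782.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvName:
    service: str  # e.g. "etcd"
    proto: str    # e.g. "tcp" or "udp"
    domain: str   # e.g. "eqiad.wmnet"


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = tried first
    weight: int    # relative share within one priority level
    port: int
    target: str


def parse_srv_name(name: str) -> SrvName:
    """Split an owner name of the form _service._proto.name (RFC 2782)."""
    labels = name.rstrip(".").split(".")
    if len(labels) < 3 or not all(label.startswith("_") for label in labels[:2]):
        raise ValueError(f"not an SRV owner name: {name!r}")
    return SrvName(labels[0][1:], labels[1][1:], ".".join(labels[2:]))


def order_targets(records, rng=random):
    """Order SRV records the way RFC 2782 asks clients to try them:
    ascending priority, and a weighted random order within one priority."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first, so they are only picked when the draw is 0.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            draw = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= draw:
                    ordered.append(group.pop(i))
                    break
    return ordered


if __name__ == "__main__":
    print(parse_srv_name("_etcd._tcp.eqiad.wmnet"))
    # Hypothetical records: the real priority, weight and port are not in this log.
    records = [SrvRecord(0, 1, 2379, f"conf{n}.eqiad.wmnet") for n in (1004, 1005, 1006)]
    for rec in order_targets(records):
        print(rec.target, rec.port)
```

The first `print` shows `SrvName(service='etcd', proto='tcp', domain='eqiad.wmnet')`. A `_udp` name such as the hypothetical `_foo._udp.example.org` parses the same way, with `proto='udp'`.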
[08:23:52] <_joe_> jijiki: so regarding https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528176/
[08:24:04] <_joe_> last week I migrated mw1270 to be php7 only
[08:24:17] <_joe_> there were a couple of unexpected effects
[08:24:33] <_joe_> 1 - a 10% increase in the median and 75th percentiles
[08:24:52] <_joe_> 2 - we filled up apc, which resulted in severe degradation of performance
[08:25:16] <_joe_> so I looked at the server and I saw a lot of messages of php-fpm spawning new workers
[08:25:29] <_joe_> which is due to us using the process manager 'dynamic'
[08:25:39] <_joe_> which was a good idea while php-fpm had little traffic
[08:25:48] <_joe_> it would use just the resources it needed
[08:26:01] <_joe_> but that's an expensive procedure we don't really need at full steam
[08:26:08] <_joe_> so I switched to pm 'static'
[08:26:47] <_joe_> also, it seemed we had some queued requests in php-fpm, so I raised the number of workers for appservers from 1.5 * ncores to 2 * ncores
[08:27:20] <_joe_> the result of applying that config (and bumping up apc's space) on mw1270 was getting the perf back where it was before
[08:27:49] ok that makes sense
[08:27:56] do you mind if I add all that in a new task
[08:28:27] just to keep the process of it
[08:28:41] <_joe_> it can even be a comment on the switch traffic task
[08:28:50] sure
[08:28:59] I'll do it
[08:29:51] and I think with the new dashboards, it makes sense to look into the api servers we switched
[08:29:58] a bit further
[08:30:13] I don't recall such issues when we switched an api server to php7
[08:30:23] <_joe_> yeah there weren't
[08:30:33] <_joe_> I checked the perf of the 2 100% api servers too
[08:31:18] and nothing notable?
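The php-fpm tuning _joe_ describes above (switching the process manager from 'dynamic' to 'static', and raising appserver workers from 1.5 * ncores to 2 * ncores) can be sketched as a small Python helper that renders a pool fragment. This is not the actual puppet-managed config: the function names, the round-down rule, and emitting only these two directives are assumptions of the sketch. `pm` and `pm.max_children` are standard php-fpm pool directives; with `pm = static`, php-fpm keeps a fixed number (`pm.max_children`) of children instead of forking workers on demand.

```python
import math
import os


def fpm_workers(ncores: int, factor: float) -> int:
    """php-fpm children for a core count and a per-core multiplier.

    Rounding down (and never going below 1) is an assumption of this sketch.
    """
    return max(1, math.floor(ncores * factor))


def pool_snippet(ncores: int, factor: float = 2.0) -> str:
    """Render a minimal php-fpm pool fragment that uses the static process manager.

    With pm = static, php-fpm runs a fixed pool of pm.max_children workers,
    so it never pays the cost of forking workers under load as pm = dynamic does.
    """
    return "\n".join([
        "pm = static",
        f"pm.max_children = {fpm_workers(ncores, factor)}",
    ])


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print("old (1.5 * ncores):", fpm_workers(cores, 1.5))
    print("new (2 * ncores):  ", fpm_workers(cores, 2.0))
    print(pool_snippet(cores))
```

For a hypothetical 48-core appserver, this moves the pool from 72 to 96 workers. The static pool trades some idle memory for never having to spawn workers while requests are already queued.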
[08:31:44] <_joe_> nope
[08:31:46] <_joe_> they're ok
[08:32:03] <_joe_> which is strange if you think the apis get way more req/s than the appservers
[08:32:16] that is what I was about to say
[08:32:20] strange and upsetting :p
[08:33:05] unless there is something specific
[08:33:20] or some specific things that appservers do that api servers don't do
[08:33:28] that cause this exhaustion
[08:34:42] ok I will write those things down on the task
[08:35:01] tx joe
[08:36:06] <_joe_> I think appservers do more rendering
[08:36:13] <_joe_> of wikitext to html
[08:36:20] <_joe_> so more parsing, which is very expensive
[08:36:22] <_joe_> anyways
[08:41:39] could be
[08:56:26] serviceops, Operations, Performance-Team (Radar), User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (jijiki) We switched mw1270 to PHP7 but we came across the following issues * a 10% increase in the median a...
[09:01:21] _joe_ akosiaris oi
[09:01:31] <_joe_> sigh sorry
[09:05:58] sorry
[09:53:58] serviceops, Operations, PHP 7.2 support: Don't monitor HHVM on PHP7 only servers - https://phabricator.wikimedia.org/T228643 (jijiki) Open→Invalid @Dzahn you are right, I am marking this as invalid.
[09:54:02] serviceops, Operations, Performance-Team (Radar), User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (jijiki)
[10:07:51] _joe_: does it make sense to make php7 default on mwmain* servers ?
[10:08:00] <_joe_> jijiki: at some point yes
[10:08:17] <_joe_> there should be a patch somewhere already
[10:08:24] I was thinking that since we'll switch them all
[10:08:36] <_joe_> sure
[10:09:09] <_joe_> https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/425027/
[10:09:51] ok I will update this one
[10:09:53] tx
[10:23:45] we need to update our 7.2 packages to catch up, I'll work on this in the next days. deb.sury.org (from which we get the sources to rebuild in component/php72) is at 7.2.20 and not yet updated to .21, which was released four days ago. I think I'll go with .20 for now, as we're probably going to be upgrading to a .22 or .23 release when the GC bug is fixed/backported to 7.2.x
[10:23:49] thoughts/objections?
[10:26:24] moritzm: due to wikimania/vacations etc
[10:26:40] our team will be a league of 1 for a few days next week
[10:27:07] unless we can have something this week and try it out
[10:29:49] I'll deal with the rollout, I'm just asking for input on .20 vs .21 :-)
[10:31:59] <_joe_> moritzm: makes sense
[10:32:15] <_joe_> moritzm: I think we're still unsure what the GC bug really is, tim was looking into it
[10:32:47] <_joe_> https://bugs.php.net/bug.php?id=78379 heh
[10:32:50] I posted a reproducer earlier in the day, but it'll still be a long way to get that fixed/backported etc
[10:33:10] and it's my understanding that this is only relevant for Parsoid/PHP
[10:33:27] I'll look into preparing 7.2.20 packages later in the day
[10:34:42] <_joe_> wow the last comment in that bug, lol
[13:17:05] serviceops, CPT Initiatives (Session Management Service (CDP2)), Core Platform Team Workboards (Green), User-Clarakosi, User-Eevans: Package table_properties utility for Debian - https://phabricator.wikimedia.org/T226551 (holger.knust)
[14:49:56] serviceops, Operations: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (Andrew) From the man page: ` Unless your machine is one with lots of relatively untrusted users, such as an ISP or school, you don't need this program; `find ... -exec rm ...' wor...
[17:50:40] serviceops, CPT Initiatives (Session Management Service (CDP2)), Core Platform Team Workboards (Green), User-Clarakosi, User-Eevans: Package table_properties utility for Debian - https://phabricator.wikimedia.org/T226551 (holger.knust)
[19:29:44] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)): Pick a simple (short-term) deployment option for scandium - https://phabricator.wikimedia.org/T229858 (ssastry) We are just going to go with the simplest strategy here. * scandium is already configured as an appserver and will...
[19:32:55] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)): Pick a simple (short-term) deployment option for scandium - https://phabricator.wikimedia.org/T229858 (ssastry)
[22:00:06] jijiki: mw-maintenance-cronjob-72-move-progress-o-meter: 7.4 %
[22:00:48] 1 moved, 1 deleted that did not work. 2/27
[22:01:08] but i'd rather move some more individually than all at once with that other patch
[22:01:16] will update progress later today
[22:21:56] mutante: it could increase our confidence if it keeps working ok
[22:23:07] I just felt like there would be a lot of tedious work to move one by one and then remove the PHP= leftovers
[22:23:52] i dont mind the tedious work because it means the possible issues get stretched out over time instead of one large merge, i guess
[22:24:10] and they seem to have different risk levels
[22:24:16] from trivial to wikidata
[22:24:59] ok what if we switch the tricky ones manually
[22:25:24] and then merge the patch to switch them all
[22:30:03] i was going to merge the simple ones first and leave wikidata last as it says on the ticket
[22:30:25] well, yea. but also i dont really know yet which are the tricky ones. and if i do them one-by-one i will find out
[22:30:55] we have a good record here since all async jobs are on php7
[22:31:09] and nothing failed (well, that we know of)
[22:32:27] ok let's continue for now with what you think is best
[22:32:38] and we can revisit tomorrow
[22:32:56] thank you for the help!
[22:32:59] maybe something in between, i dont need to make 27 patch sets
[22:33:15] haha, yeah we'll fegure it out
[22:33:19] figure*
[22:33:20] alright, and you're welcome
[22:33:41] also, i got reminded there is still wikitech .. hmpf
[22:33:50] which i started but did not get to finish
[22:35:21] i hope to maybe find more jobs that don't even work anymore and can be removed
[22:35:41] checking that first by manually running them.. we dont easily notice, with some of them just being silent and not logging
[22:35:54] they should probably all log to logstash