[00:00:02] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Reedy)
[01:14:36] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[01:17:20] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[01:27:45] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[01:42:04] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[01:55:25] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[02:08:15] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[02:16:37] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[06:35:53] serviceops, Operations: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (elukey) >>! In T151304#5396403, @Andrew wrote: > So that suggests that it's probably wise to keep using it on cloud VMs, as long as it still works. That said, I'm not sure that we couldn't just > /dev/...
[06:50:45] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)), Patch-For-Review: Pick a simple (short-term) deployment option for scandium - https://phabricator.wikimedia.org/T229858 (Tgr) Note, you'll have to do something along the lines of https://gerrit.wikimedia.org/r/c/mediawiki...
[07:32:11] serviceops, Operations, Patch-For-Review: Migrate pool counters to Stretch/Buster - https://phabricator.wikimedia.org/T224572 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `poolcounter1001.eqiad.wmnet` - poolcounter1001.eqiad.wmnet - Removed from Puppet...
[07:36:48] serviceops, Operations, Patch-For-Review: Migrate pool counters to Stretch/Buster - https://phabricator.wikimedia.org/T224572 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `poolcounter1003.eqiad.wmnet` - poolcounter1003.eqiad.wmnet - Removed from Puppet...
[07:54:57] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)), Patch-For-Review: Pick a simple (short-term) deployment option for scandium - https://phabricator.wikimedia.org/T229858 (Joe) My main worry is that anything you could do would be wiped out by the next scap run, unless we...
[07:56:27] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[08:10:00] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[08:37:22] serviceops, Operations: Migrate pool counters to Stretch/Buster - https://phabricator.wikimedia.org/T224572 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `poolcounter2001.codfw.wmnet` - poolcounter2001.codfw.wmnet - Removed from Puppet master and PuppetDB...
[08:48:18] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[09:46:42] o/
[09:46:55] qq - anything against me merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528629/ ?
[09:47:05] it is to prevent hhvm restarts on php only hosts
[09:51:08] thanks :)
[09:51:40] as follow up, I noticed that the prometheus hhvm exporter is still deployed on mw1348
[09:51:52] probably known, but we could add an option to avoid ot
[09:51:54] *it
[09:52:00] <_joe_> uhm is it?
[09:52:05] <_joe_> that's indeed wrong
[09:52:18] <_joe_> I didn't go as far as remove it from the role, meh
[10:03:19] serviceops, Operations: Migrate pool counters to Stretch/Buster - https://phabricator.wikimedia.org/T224572 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `poolcounter2002.codfw.wmnet` - poolcounter2002.codfw.wmnet - Removed from Puppet master and PuppetDB...
[11:23:15] serviceops, Operations, Patch-For-Review: Migrate pool counters to Buster - https://phabricator.wikimedia.org/T224572 (MoritzMuehlenhoff)
[11:24:58] serviceops, Operations, Patch-For-Review: Migrate pool counters to Buster - https://phabricator.wikimedia.org/T224572 (MoritzMuehlenhoff) Open→Resolved We now have the main pool counters running on Buster using the stock Debian package of poolcounter (poolcounter1004, poolcounter1005, poolcou...
[13:23:53] _joe_: will you have a few minutes today to take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/528527 ?
[13:24:14] <_joe_> cdanis: sure
[13:24:41] <_joe_> cdanis: oh you did it :P
[13:25:52] indeed, although the silly one-liner is now almost 30 lines of proper script
[13:29:10] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)), Patch-For-Review: Pick a simple (short-term) deployment option for scandium - https://phabricator.wikimedia.org/T229858 (ssastry) >>! In T229858#5398587, @Joe wrote: > My main worry is that anything you could do would be...
[13:41:31] serviceops, Operations: Update component/php72 to 7.2.20 - https://phabricator.wikimedia.org/T230024 (MoritzMuehlenhoff)
[13:50:28] <_joe_> cdanis: I think I last saw shuf(1) used in the early 2000s
[13:50:31] <_joe_> :P
[13:50:38] <_joe_> that's a nice throwback
[13:50:46] hey, I'm actually kinda proud of that function :P
[13:56:34] <_joe_> cdanis: I have one doubt though, doesn't $4 coming out of that dig including the dreaded final dot?
[13:56:43] _joe_: it's still a valid hostname
[13:57:12] <_joe_> curl correctly interprets it?
[13:57:16] <_joe_> ok then
[13:57:18] yep
[13:57:22] <_joe_> I was convinced of the contrary
[13:57:35] anything that accepts a host name should
[13:58:22] final dot in domain names is meant to always be valid, when DNS resolvers resolve
[13:58:47] a trailing dot is just the DNS equivalent of a leading /
[14:00:46] * liw idly runs "dig -t NS ."
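For readers following the `$4` question: `dig +short` prints each SRV record as `priority weight port target`, and the target is fully qualified, so it ends in a dot. As noted in the exchange, that form is still a valid hostname, so stripping the dot is optional. The sketch below is only an illustration of what field 4 looks like and how one could normalize it; `srv_targets` is a hypothetical helper and is not taken from the change under review.

```shell
# Hypothetical helper, not the script from the Gerrit change discussed above.
# Reads `dig +short -t SRV NAME` style lines on stdin
# ("priority weight port target.") and prints "target port" per record,
# with the trailing root dot removed from the target.
srv_targets() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4, $3 }'
}

# Canned input, so no DNS lookup is needed to try it:
printf '%s\n' '10 50 443 host1.example.org.' | srv_targets
```

Lines that do not have exactly four fields are skipped, which also covers empty output.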
[14:05:37] <_joe_> cdanis I'm aware of the function of the dot in dns notation
[14:05:52] <_joe_> It just never occurred to me anything working with the web would swallow it
[14:06:10] <_joe_> cdanis: for instance gerrit never lets me down
[14:06:13] <_joe_> https://gerrit.wikimedia.org./r/#/c/operations/puppet/+/528527/
[14:06:25] interesting
[14:06:28] https://en.wikipedia.org./wiki/Main_Page works fine
[14:06:46] phab is fine as well
[14:07:01] heh, gnome terminal's URL detection doesn't detect those as URLs
[14:07:10] and grafana
[14:07:16] <_joe_> liw: neither xfce
[14:07:35] glowing-bear knows they are URLs :)
[14:07:41] <_joe_> I knew gerrit wouldn't let me down
[14:07:48] <_joe_> chrome too :P
[14:07:50] it might not even be gerrit, could just be gerrit's apache config
[14:08:49] <_joe_> cdanis: I added a couple comments
[14:09:33] <_joe_> the fact that the shell overwrites any file on the left side of > independently of the fact if that command is reached in execution is still not making sense to me after all those years
[14:10:26] left side?
[14:10:31] <_joe_> right side
[14:10:33] <_joe_> sorry
[14:10:40] <_joe_> my usual difficulty with right and left
[14:10:49] just checking
[14:12:43] thanks for the review _joe_
[14:13:29] <_joe_> cdanis: I'm not 100% sure you /need/ pipefail, but it's better safe than sorry
[14:13:37] for sure
[14:13:47] it helps make resolve-srv a bit more correct as well
[14:13:53] <_joe_> the other concern, though, is sadly real
[14:14:35] <_joe_> the shell acquires the filehandle for the file on the right of > when it starts executing the line, and it closes it afterwards, wiping the file out
[14:14:48] <_joe_> that's the reason why you can't substitute-in-place in the shell
[14:14:53] oh, I know, it was actually a choice
[14:15:12] <_joe_> you prefer to have an empty file
[14:15:13] showing nothing is maybe better than showing possibly-stale data
[14:15:19] <_joe_> than a stale but correct one?
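The trade-off in this exchange can be sketched in a few lines of shell. This is an illustration only, not the script from the change being reviewed; `generate`, `write_direct`, and `write_via_temp` are hypothetical names. With a plain `>` redirect the shell opens and truncates the target while setting up the command, before the command runs, so a failing command leaves an empty file. Writing to a temporary file and renaming it on success keeps the previous (possibly stale) contents instead.

```shell
# Hypothetical stand-in for the real command; it fails without printing anything.
generate() { return 1; }

# Option 1: redirect straight into the target. The shell truncates "$1"
# before generate runs, so a failure leaves an empty file behind.
write_direct() {
    generate > "$1"
}

# Option 2: write to a temporary file next to the target and rename it into
# place only on success. On failure the old contents of "$1" survive.
write_via_temp() {
    _tmp="$(mktemp "$1.XXXXXX")" || return 1
    if generate > "$_tmp"; then
        mv "$_tmp" "$1"
    else
        rm -f "$_tmp"
        return 1
    fi
}
```

Both functions return non-zero when `generate` fails, so a caller (or a systemd unit) still sees the error in either case; the difference is only in what readers of the file see in the meantime.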
[14:15:41] it lets you know something is up, at least
[14:15:45] <_joe_> it depends on the function you want to achieve, but yeah then it's a good choice
[14:15:56] anyway I'm not strongly convinced either way
[14:16:12] and we have other ways of knowing that a systemd::timer::job is broken
[14:16:18] so a temporary file it is
[14:16:27] <_joe_> yeah I was about to say
[14:16:36] <_joe_> if the script ends in error, systemd will take notice
[14:23:40] hahah, TIL that dig(1) returns with exit code 0 on NXDOMAIN
[14:24:56] <_joe_> wut
[14:25:00] <_joe_> sigh
[14:25:08] you two, how dare, wasting those teases for brandon in a channel he's not in...
[14:25:11] it's fine, I have a nice workaround
[14:41:47] oh hooray, tracing through puppet code
[15:00:42] Q: I'm troubleshooting a perf issue that I cannot replicate locally, assuming that I can replicate it in k8s staging, is that an appropriate place to do some light service benchmarking? And if so, from where is the best place to conduct the tests?
[15:17:39] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki)
[15:28:36] I guess I dropped a hot potato :)
[15:42:19] I have some opinions, but I wouldn't call them informed ones
[15:47:34] cdanis: shoot :)
[15:48:19] if the staging cluster *isn't* appropriate for that, I'm inclined to say that's a bug
[15:48:55] yeah
[15:49:10] I also think it'd be nice if it were easy to run some load generation in pods on the staging cluster
[15:49:33] honestly, it's the nature of the bug I'm trying to track down that it the throughput doesn't need to be high at all
[15:49:49] doesn't *need* to be, anyway
[15:49:58] but I'm not sure where to run it from
[15:50:16] because I'd also need the tools to test with there
[15:51:20] also... the way I was told to access the staging instance was by a port-forward with kubectl, which I think means I'd need to run from the deploy host
[15:51:53] but I've since come to suspect it's also available as kubernetes1001.eqiad.wmnet:8081
[15:53:27] related: I don't think prometheus collection of the staging service is working either, that makes this process more difficult
[15:55:24] that is definitely a bug
[15:57:57] serviceops, Performance-Team: Create warmup procedure for MediaWiki app servers - https://phabricator.wikimedia.org/T230037 (Krinkle)
[15:59:02] serviceops, Performance-Team: Create warmup procedure for MediaWiki app servers - https://phabricator.wikimedia.org/T230037 (Krinkle)
[16:59:07] serviceops, Operations, cloud-services-team, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki)
[17:06:09] mutante: I switched wikidata crons with Amir
[17:06:38] if that was what was keeping us from running all crons in php7
[17:06:58] maybe we are ready now :)
[17:11:42] jijiki: oh, cool! very nice. see the table i made. there are somewhat different categories depending which other script they invoke. the ones with a straight "mwscript" commandline were easy but at least one other though hardcodes "php" as RUNNER. some Bash calling PHP
[17:12:40] then one thing would be to change php7 as the default of php-cli
[17:12:59] and then check the ones that might have something hardcoded?
[17:15:51] yea. one is checking "foreachwikiindblist", one is checking "mwscriptwikiset"
[17:16:13] there was one that had HHVM mentioned in the command line
[17:16:22] but i lost my comment it seems.. grrm
[17:17:19] oh yea, that was the wikidata one you did! how nice
[17:17:21] PHP='hhvm -vEval.Jit=1'
[17:19:08] yeah
[17:21:55] so that is removed now ,right (still waiting for git pull for some reason)
[17:22:13] ok, i will see what i can do next in a bit
[17:23:06] 13:21 < addshore> what mediawiki related cronjobs do we have that run every 2 hours?
[17:23:10] maybe answer that, heh
[17:23:39] maybe i should add a column to that table how often stuff runs and then move it to wiki
[17:23:52] :D
[17:27:03] addshore: hah, yea, i am looking now
[17:27:32] are you asking related to anything wikidata?
[17:27:45] 0 */2 * * * /usr/local/bin/mwscriptwikiset extensions/FlaggedRevs/maintenance/updateStats.php
[17:27:48] ^ here's one
[17:28:18] just catching up on my emails and there is one regarding a spike in something wikidata related happening every 2 hours, any cron running may not be wikidata related though
[17:29:30] 10 */2 * * * /usr/local/bin/foreachwiki extensions/CirrusSearch/maintenance/saneitizeJobs.php
[17:30:38] saneitize oder sanitize
[17:31:14] addshore: i think that's it. those 2. FlaggedRevs updateStats and CirrusSearch jobs
[17:31:22] hmm, okay
[17:31:39] its probably neither of them, oh well,
[17:32:28] seems to start just after half past the hour every 2 hours
[17:32:56] might be down to some in mediawiki or in wikibase cache ttl, i'll do some digging
[17:33:27] alright
[17:43:35] <_joe_> urandom: sorry I was not reading here
[17:43:49] <_joe_> yes the staging cluster is the right place where to test a bug in a prod-like setting
[17:44:05] <_joe_> assuming it's not sharing the cassandra with production, but that you know better than me :P
[17:44:24] What about where to run the test?
[17:45:42] _joe_: ^^
[17:46:45] And, wherever that is I'd need wrk (benchmark util) installed, and a Lua module (cjson)
[17:56:01] <_joe_> ok, I would say this needs more attention that I can give you now, but I guess mwmaint1002 is the right place? or deploy100*?
[18:03:33] _joe_: sure, no worries
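On the two-hourly cron question raised at 17:23: the two entries quoted in the log use `*/2` in the hour field, which matches hours 0, 2, ..., 22, at minute 0 and minute 10 respectively. A small sketch makes the daily fire times explicit; `cron_fire_times` is a hypothetical helper written for this note and only handles the simple `MIN */STEP * * *` shape. Neither schedule lands near half past the hour, which is consistent with the conclusion in the log that these jobs are probably not the source of the spike.

```shell
# Hypothetical helper: prints the HH:MM times at which a
# "MIN */STEP * * *" cron entry fires during one day.
cron_fire_times() {
    _minute="$1"
    _step="$2"
    _hour=0
    while [ "$_hour" -lt 24 ]; do
        printf '%02d:%02d\n' "$_hour" "$_minute"
        _hour=$((_hour + _step))
    done
}

cron_fire_times 0 2     # schedule of the FlaggedRevs updateStats entry
cron_fire_times 10 2    # schedule of the CirrusSearch saneitizeJobs entry
```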