[05:45:40] Morning
[05:54:10] <_joe_> hello!
[05:54:21] <_joe_> I didn't expect you up at this hour fsero
[05:55:37] Yeah first kindergarten day for daughter
[06:21:18] wow!
[08:11:51] wow, time flies, she's already in school :-P
[09:08:57] serviceops, Operations: conftool: upgrade fleet to use existing python3-conftool - https://phabricator.wikimedia.org/T226965 (Joe) p:Triage→Normal a:Joe
[09:14:46] serviceops, Operations: conftool: upgrade fleet to use existing python3-conftool - https://phabricator.wikimedia.org/T226965 (Joe) What we need to do is: [] Upgrade python3-etcd to the latest version [] Upgrade python3-conftool to the latest version [] Remove python-conftool if present
[10:32:35] serviceops, Operations: conftool: upgrade fleet to use existing python3-conftool - https://phabricator.wikimedia.org/T226965 (Joe) This is blocked until https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/491412 is merged and deployed. I'll take care of it during the week.
[10:47:52] serviceops, MediaWiki-Logging, Operations, Wikimedia-Logstash, and 8 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (Joe) After installing wmerrors on the test servers, these are my results: - **OOM errors** are now correctly treated: we get th...
[10:51:52] serviceops, MediaWiki-Logging, Operations, Wikimedia-Logstash, and 8 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (Joe)
[10:54:39] <_joe_> akosiaris, fsero can you please update the SRE meeting document?
[11:33:19] see -operations for some puppet failure due to '/etc/php/7.2/fpm/conf.d/20-wmerrors.ini'
[11:37:29] patch available here but needs review https://gerrit.wikimedia.org/r/c/operations/puppet/+/519990
[11:40:36] in particular it's not clear whether 'cli' was intentionally excluded, so it should be [] or ['fpm'], or whether it should be included, so ['cli'] or ['cli','fpm']
[12:49:43] akosiaris: interested in what you think about T226988 (particularly the part about including build information in the `/healthz` response)
[12:50:40] urandom: https://phabricator.wikimedia.org/T226988#5296162
[12:50:43] :D
[12:50:51] both fine by me
[12:52:00] urandom: something to think about doing -- https://www.robustperception.io/exposing-the-software-version-to-prometheus
[12:52:25] (not sure offhand if the prometheus go client already exports this)
[12:52:44] cdanis: auh, yes, good idea
[12:52:49] I'll look into that
[13:45:18] serviceops, Operations, observability, PHP 7.2 support, and 2 others: [Regression] fatal-errors.php action=segfault results in a 503 error under php7-fpm. - https://phabricator.wikimedia.org/T223336 (Joe) As I explained in T187147#5295715, my understanding is that in case of a segfault php-fpm fa...
[14:38:11] serviceops, Operations, observability, PHP 7.2 support, and 2 others: [Regression] fatal-errors.php action=segfault results in a 503 error under php7-fpm. - https://phabricator.wikimedia.org/T223336 (Joe) Using a modified version of `furl` that now supports unix sockets, for segfaults I get: `lan...
[16:16:30] serviceops, Operations, Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (MoritzMuehlenhoff) >>! In T226236#5291604, @hashar wrote: > Ah eventually I found the entry: > > ` > Name: thirdparty/k...
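For the build-information question on T226988, the robustperception.io article linked at 12:52 describes a common pattern: expose a metric whose value is always 1 and put the build details in its labels. The following is a minimal Python sketch of that pattern in the Prometheus text exposition format, plus a `/healthz` JSON body that carries the same fields. The metric name `myservice_build_info`, the label names, and the `status` field are hypothetical. The chat leaves open whether the Prometheus Go client already exports such a metric.

```python
import json


def _escape_label_value(value):
    # Prometheus text format escapes backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_info_metric(name, labels):
    """Render a gauge that is always 1 and whose labels carry build metadata.

    `name` and the label keys are hypothetical; pick whatever the service uses.
    """
    pairs = ",".join(
        f'{key}="{_escape_label_value(value)}"' for key, value in sorted(labels.items())
    )
    return (
        f"# HELP {name} Build information; the value is always 1.\n"
        f"# TYPE {name} gauge\n"
        f"{name}{{{pairs}}} 1\n"
    )


def healthz_body(labels):
    """Hypothetical /healthz JSON payload that repeats the same build fields."""
    return json.dumps({"status": "ok", "build": labels}, sort_keys=True)


if __name__ == "__main__":
    # Example values are made up for illustration.
    info = {"version": "1.0.0", "revision": "abc123"}
    print(build_info_metric("myservice_build_info", info), end="")
    print(healthz_body(info))
```

Because the value is constant, a query can join on the labels (for example, counting instances per `version`) without the version becoming a separate time series per value change in any other metric.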
[16:42:03] serviceops, Operations, observability: Gather metrics on request status codes, latencies from the MediaWiki appservers - https://phabricator.wikimedia.org/T226815 (colewhite) @Joe afaik, we're using mtail for this kind of metrics gathering. Some examples are the [[ https://github.com/wikimedia/puppe...
[16:52:14] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Krinkle)
[21:18:07] serviceops, Continuous-Integration-Infrastructure, Operations, Release-Engineering-Team (Kanban): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (Dzahn) a:Dzahn
[21:22:15] serviceops, Operations, ops-codfw, User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (jijiki)
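On T226815 (16:42), colewhite points to mtail for deriving request status-code and latency metrics from the appserver logs. mtail programs are written in mtail's own language, so the following is not an mtail program. It is a rough Python sketch of the same aggregation idea, built on two assumptions: a hypothetical log format of one `<status> <duration_seconds>` pair per line, and made-up histogram bucket bounds. Neither is the real appserver log format nor any existing WMF configuration.

```python
import bisect
from collections import Counter

# Hypothetical latency bucket upper bounds in seconds (not from any real config).
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]


def aggregate(lines):
    """Count requests per status code and build a cumulative latency histogram.

    Assumes each line looks like '<status> <duration_seconds>'; anything else is skipped.
    Returns (status_counts, cumulative_bucket_counts, total_seconds), where the last
    cumulative bucket corresponds to le="+Inf".
    """
    status_counts = Counter()
    per_bucket = [0] * (len(BUCKETS) + 1)  # final slot catches durations above the last bound
    total_seconds = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # does not match the assumed format
        try:
            duration = float(fields[1])
        except ValueError:
            continue
        status_counts[fields[0]] += 1
        # First bucket whose upper bound is >= duration (Prometheus "le" is inclusive).
        per_bucket[bisect.bisect_left(BUCKETS, duration)] += 1
        total_seconds += duration
    # Prometheus histogram buckets are cumulative.
    cumulative, running = [], 0
    for count in per_bucket:
        running += count
        cumulative.append(running)
    return status_counts, cumulative, total_seconds
```

For example, `aggregate(["200 0.03", "200 0.2", "503 3.0"])` counts two 200s and one 503, and places the three durations in the `le="0.05"`, `le="0.25"` and `le="5.0"` buckets respectively before accumulating.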