[00:55:25] serviceops, Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['phab2001.codfw.wmnet'] ` Of which those *...
[06:30:55] serviceops, Operations, wikitech.wikimedia.org, PHP 7.2 support, Patch-For-Review: switch wikitech to PHP 7.2 - https://phabricator.wikimedia.org/T223393 (Joe) For the record, the above script works once you do ` sudo apt-get install php-apcu php-bcmath php-bz2 php-cli php-common php-curl php...
[06:51:07] serviceops, Operations, wikitech.wikimedia.org, PHP 7.2 support, Patch-For-Review: switch wikitech to PHP 7.2 - https://phabricator.wikimedia.org/T223393 (Joe) Oh I forgot to add: the list of unloadable extensions could be found in the php7.2-fpm log. You also need to restart php7.2-fpm for i...
[07:48:55] serviceops, Operations, RESTBase-API, TechCom, and 2 others: Decide whether to keep violating OpenAPI/Swagger specification in our REST services - https://phabricator.wikimedia.org/T217881 (mobrovac)
[07:48:58] serviceops, Operations, RESTBase, RESTBase-API, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (mobrovac) Open→Resolved This has now been deployed.
[07:50:02] serviceops, Operations, RESTBase, RESTBase-API, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (mobrovac)
[07:51:41] serviceops, Operations, RESTBase, RESTBase-API, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (mobrovac)
[08:34:17] serviceops, RESTBase, Core Platform Team (RESTBase Split (CDP2)), Core Platform Team Kanban (Doing), and 3 others: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[08:35:34] serviceops, RESTBase, Core Platform Team (RESTBase Split (CDP2)), Core Platform Team Kanban (Doing), and 3 others: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[08:36:13] serviceops, RESTBase, Core Platform Team (RESTBase Split (CDP2)), Core Platform Team Kanban (Doing), and 3 others: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[10:32:08] _joe_: I suggest we first merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/507939/
[10:32:31] and then I will push another patch in regex.yaml
[10:32:40] to target one api server
[10:32:46] in codfw
[10:32:49] <_joe_> I see a -1 from me
[10:32:51] <_joe_> fix it
[10:32:54] yeah, I fixed it
[10:33:03] I didn't
[10:33:05] ?
[10:33:06] <_joe_> no let's use hosts/hostname.yaml
[10:33:40] I don't mind
[10:33:45] I commented on your comments
[10:34:22] <_joe_> well the first comment stands.
[10:34:27] <_joe_> and the second too
[10:36:27] <_joe_> anyways, nitpicks
[10:37:11] my take is that there were some comments on the rendered files as well
[10:40:58] _joe_: what is the verdict ?
[10:41:00] :D
[10:46:26] <_joe_> add those slashes please.
[10:46:42] <_joe_> err dashes
[10:46:51] <_joe_> also brb
[10:47:15] <_joe_> the first comment should be addressed is what I'm saying
[10:49:06] it didn't show up on https://puppet-compiler.wmflabs.org/compiler1001/16353/mw1222.eqiad.wmnet/, but yes sorry
[10:49:59] <_joe_> didn't it?
[10:50:08] <_joe_> all those modified resources
[10:50:10] <_joe_> are files
[10:50:25] <_joe_> btw there's something wrong with the compiler
[10:51:42] when I was experimenting with the dashes, I would see the change on the compiler
[10:51:49] the extra new line or not
[10:51:55] <_joe_> \n\n # Set the variable if php7 is required\n SetEnvIf Cookie \"PHP_ENGINE=php7\"
[10:51:57] <_joe_> vs
[10:52:25] <_joe_> \n\n # Set the variable if php7 is required\n SetEnvIf Cookie \"PHP_ENGINE=php7\" backend=php7\n\n
[10:52:47] <_joe_> so it looks equal
[10:52:54] <_joe_> so why say it's modified
[10:54:03] <_joe_> ok then, let's merge it as-is
[10:54:07] hangon
[10:54:29] shouldn't it be SetEnvIf Cookie \"PHP_ENGINE=php7\"\n backend=php7\n\n
[10:54:32] <_joe_> yeah I pasted a bit more in the second line, but it's ok
[10:54:59] <_joe_> the \n we're interested in is not modified
[10:55:20] ok, let's see what gives
[11:15:39] ok I found why the changes
[11:15:41] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/507939/11/modules/mediawiki/templates/apache/mediawiki-vhost.conf.erb
[11:15:46] L113
[11:16:03] puppet-run said
[11:16:05] -
[11:16:07] +
[11:16:09] \ No newline at end of file
[11:18:03] I will push a fix for it, it is a bit silly
[11:24:00] <_joe_> ah great
[11:24:04] <_joe_> bbl
[11:24:25] bb
[12:11:19] serviceops, Operations, User-jijiki: Investigate increase in GET ops registered by mcrouter for the mediawiki appserver cluster - https://phabricator.wikimedia.org/T223647 (elukey) The best guess that I can give after checking the HHVM memcached extension's code is that the following might be the rea...
[13:09:29] Percona has gone full Kubernetes https://youtu.be/sSD735mH2-8?t=2074 CC marostegui
[13:12:20] <_joe_> waddaya mean
[13:15:35] if you cannot watch the full 5 minute presentation, just read this one slide: https://youtu.be/sSD735mH2-8?t=2331
[13:18:08] (that slide is not that interesting technically, but means they are thinking seriously about containers)
[13:30:40] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), Wikimedia-production-error: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Joe) after even the first simplest tests, it's absolutely clear to me that running with...
[13:34:29] ok "an operating system for the cloud" already makes me grit my teeth
[13:36:41] secure by default = tls? ic.
[13:36:56] * apergos takes themselves and their snark back out of the channel...
[13:45:43] serviceops, Operations, User-jijiki: Investigate increase in GET ops registered by mcrouter for the mediawiki appserver cluster - https://phabricator.wikimedia.org/T223647 (Joe) Open→Resolved we transitioned our first appserver (api) to full php7 and it confirms indeed the theory https://graf...
[14:18:06] serviceops, Operations, User-jijiki: Investigate increase in GET ops registered by mcrouter for the mediawiki appserver cluster - https://phabricator.wikimedia.org/T223647 (jijiki) {F29279472} :D
[15:27:08] serviceops, Operations, wikitech.wikimedia.org, PHP 7.2 support, Patch-For-Review: switch wikitech to PHP 7.2 - https://phabricator.wikimedia.org/T223393 (Dzahn) >>! In T223393#5223143, @Joe wrote: > Once you've done that, I'd say let's be bold and just change the proxy/rewrite rules from ht...
[15:33:37] akosiaris: I guess you saw https://gerrit.wikimedia.org/r/c/mediawiki/services/kask/+/513222, it really was that simple
[15:33:56] akosiaris: though I think I am going to bike-shed myself into renaming s/hostnames/hosts/
[15:35:54] urandom: nice! Btw this approach saves us from having to reissue certs. The DNS RR approach would need us to reissue certs for the cassandra clusters in question in order to add above said DNS RR as SAN
[15:37:17] akosiaris: yeah, I guess it should have been like this all along
[15:37:38] probably something that survived from the earlier prototypes unquestioned, or something
[16:00:04] urandom: kask-staging-ddcc7b5d8-xgfqc 1/1 Running 0 33s
[16:00:09] ok running finally
[16:00:14] but I have cheated
[16:00:26] I am using the cassandra superuser account
[16:00:50] and I should not I guess. I am trying to figure out the roles in restbase-dev1004-a but list roles isn't helpful
[16:01:05] NoHostAvailable: ...
[16:01:33] akosiaris: for restbase-dev?
[16:01:37] I also took the liberty to create the sessions keyspace (with replication 1, probably not what we want in the actual production keyspace) and values table
[16:01:57] yup
[16:01:58] akosiaris@restbase-dev1004:~$ sudo cqlsh --ssl --cqlshrc=/etc/cassandra-a/cqlshrc restbase-dev1004-a
[16:02:29] akosiaris: you can use c-cqlsh {instance id} if you prefer
[16:02:38] like, c-cqlsh a
[16:02:44] TLS
[16:02:46] TIL
[16:02:55] that's more useful indeed
[16:03:52] anyway, I am guessing we want to create users/roles and keyspaces
[16:03:59] akosiaris: yeah
[16:04:17] akosiaris: so... the former we can do from puppet
[16:04:25] sort of
[16:05:23] I need to dig up the specifics, but ultimately puppet creates a file (/etc/cassandra-{id}/adduser.cql), and then someone has to execute it (c-cqlsh a -f /etc/cassandra-{id}/adduser.cql)
[16:06:16] akosiaris: as for the keyspace and table, that is entirely manual here for the time being, and is being sussed out as part of T220246
[16:07:02] yup, I expected as much
[16:08:30] akosiaris: this is kind of weird because we're sharing a cluster with RESTBase; I'm not sure what to do exactly
[16:09:06] akosiaris: the approach with RESTBase was that the `restb` user was applicable for all keyspaces
[16:09:30] akosiaris: and there's really no way to know them upfront (as it stands now, anyway)
[16:09:54] maybe we should just use the `restb` user for restbase-dev as a one-off?
[16:10:03] I am fine with that
[16:10:47] lemme dig the adduser.cql stuff for sessionstore
[16:11:00] and let's leave restbase-dev as is for now
[16:12:06] ah that one is actually already done. I see it on sessionstore1001
[16:12:08] great
[16:12:18] the Cassandra sessionstore cluster has its own user that owns all tables
[16:12:25] this is not going to fly forever methinks
[16:12:31] but yeah, it'll work now
[16:12:35] yup, true
[16:26:07] urandom: curl -X POST --data lala http://kubestage1001.eqiad.wmnet:8081/sessions/v1/sss
[16:26:07] worked fine
[16:26:21] and I can get the value back as well, we are fine
[16:26:26] cool!
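The round trip tested above (POST a value to /sessions/v1/<key>, then read it back) could be sketched in Python as below. This is only a sketch under assumptions, not Kask's client: the base URL and path come from the curl command in the log, the GET on the same path is inferred from "I can get the value back", and all function names are hypothetical.

```python
import urllib.request

# Base URL copied from the curl command in the log; treat it as an example value.
BASE_URL = "http://kubestage1001.eqiad.wmnet:8081"


def session_url(base_url: str, key: str) -> str:
    """Build the /sessions/v1/<key> URL used in the curl test."""
    return f"{base_url.rstrip('/')}/sessions/v1/{key}"


def build_store_request(base_url: str, key: str, value: bytes) -> urllib.request.Request:
    """Prepare the equivalent of `curl -X POST --data <value> <url>`."""
    req = urllib.request.Request(session_url(base_url, key), data=value, method="POST")
    # curl --data sends this content type by default; set it explicitly here.
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def store_value(base_url: str, key: str, value: bytes, timeout: float = 5.0) -> int:
    """Send the POST and return the HTTP status code."""
    with urllib.request.urlopen(build_store_request(base_url, key, value), timeout=timeout) as resp:
        return resp.status


def fetch_value(base_url: str, key: str, timeout: float = 5.0) -> bytes:
    """Read the stored value back (assumes GET on the same path returns it)."""
    with urllib.request.urlopen(session_url(base_url, key), timeout=timeout) as resp:
        return resp.read()
```

Calling store_value(BASE_URL, "sss", b"lala") followed by fetch_value(BASE_URL, "sss") would mirror the manual check; both raise urllib.error.HTTPError or URLError if the service rejects the request or is unreachable.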
[16:26:29] looking now into monitoring and stuff
[17:11:13] serviceops, Continuous-Integration-Config, Epic, Release-Engineering-Team (Kanban): Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[17:12:51] serviceops, Continuous-Integration-Config, Epic, Release-Engineering-Team (Kanban): Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF) a:Jdforrester-WMF
[17:21:41] urandom: and there you go https://grafana.wikimedia.org/d/000001590/sessionstore?refresh=1m&orgId=1&var-dc=eqiad%20prometheus%2Fk8s-staging&var-service=mathoid&from=now-1h&to=now
[17:21:59] first draft, and I had to leave a curl in a for loop to generate artificial traffic
[17:22:09] but we already got most stuff out of it
[17:23:26] akosiaris: this is awesome!
[17:23:36] akosiaris: should it be showing anything atm?
[17:23:42] urandom: btw, the nodejs services include /healthz in the statistics
[17:23:50] it should. some minor requests
[17:23:53] it doesn't ?
[17:24:02] cause I can see them
[17:24:03] akosiaris: nothing I can see
[17:24:17] everything is "no data points"
[17:24:18] hmm weird
[17:24:26] indeed the link does not work
[17:24:51] there we go
[17:25:17] ah dammit... the link has the wrong var in it
[17:25:28] https://grafana.wikimedia.org/d/000001590/sessionstore?refresh=1m&orgId=1&var-dc=eqiad%20prometheus%2Fk8s-staging&var-service=sessionstore&from=now-1h&to=now
[17:25:31] urandom: that ^
[17:25:35] sorry, wrong c/p
[17:28:20] akosiaris: yeah, this is great
[17:30:21] akosiaris: btw, is logging working as expected? There seemed to be some confusion about whether we'd need to prefix that @cee cookie
[17:31:11] haven't tested that yet
[17:31:16] ok
[17:32:16] whether or not that is the case, I've recently noticed that gocql and the http module (at least), take it upon themselves to do some logging, the default of which goes to stdout, but this is obviously not JSON formatted
[17:32:57] I'm not sure how to fix that, and am hoping that at least it doesn't break anything (that non JSON formatted output will be ignored)
[17:33:54] hmm, good that you told me, I may have to take that into account
[17:34:20] said logging only happens under rare, exceptional errors, and so far I haven't seen where it did not correspond with handled errors that Kask propagated accordingly as its own error logs
[17:34:42] so it's mostly a question of whether or not it hurts anything when (if) that happens
[17:52:15] urandom: I deployed the hosts change as well just right now. Worked fine.
[17:52:36] which reminds me. We need to schedule a training for how things are different deployment wise on this platform
[17:52:53] akosiaris: yes, that would be awesome
[17:53:05] akosiaris: all of this is still quite opaque to me
[18:08:24] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), Wikimedia-production-error: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Krinkle) More errors, that I presume are due to corruption due to being impossible based...
[18:18:53] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), Wikimedia-production-error: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Krinkle) The mw1321 issue seems to have recovered (based on querying `host:mw1321` on th...
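On the 17:32 logging concern (plain-text lines from gocql and the http module mixed into JSON logs on stdout), a tolerant consumer could look roughly like the Python sketch below. It assumes one record per line and an optional `@cee:` prefix (the "@cee cookie" mentioned at 17:30); the function names are hypothetical, and this is not a description of how the actual log pipeline treats such lines.

```python
import json
from typing import Iterable, Iterator, Optional

# Optional marker in front of a structured record (assumed form of the "@cee cookie").
CEE_COOKIE = "@cee:"


def parse_log_line(line: str) -> Optional[dict]:
    """Return the JSON object carried by a log line, or None if it is not structured."""
    text = line.strip()
    if text.startswith(CEE_COOKIE):
        text = text[len(CEE_COOKIE):].lstrip()
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        # Plain-text output (e.g. from a library logging on its own) is skipped.
        return None
    # Only JSON objects count as log records; bare numbers or strings are ignored.
    return record if isinstance(record, dict) else None


def structured_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the lines that parse as JSON objects, dropping everything else."""
    for line in lines:
        record = parse_log_line(line)
        if record is not None:
            yield record
```

Feeding it sys.stdin or a container's stdout stream would pass structured records through and silently drop the rest, which is the "ignored" behaviour hoped for above; whether dropping or flagging such lines is preferable is a separate choice.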
[18:52:09] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), Wikimedia-production-error: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Krinkle) @joe I'm excluding php7.2 from all Logstash monitoring related to MediaWiki for...
[18:59:40] serviceops, Operations, Performance-Team (Radar), User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (Krinkle)
[19:28:59] serviceops, Operations, Performance-Team (Radar), User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (Jdforrester-WMF)
[19:30:29] serviceops, Operations, Performance-Team (Radar), User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (Jdforrester-WMF)
[19:31:18] serviceops, Operations, Traffic, User-jijiki: Allow directing a percentage of API traffic to PHP7 - https://phabricator.wikimedia.org/T219129 (Jdforrester-WMF) Given this is being used now, does this count as Resolved? Or do you want to keep this open until clean-up?