[01:04:50] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)), Patch-For-Review: Deploy Parsoid-PHP with Mediawiki to scandium for RT and performance testing - https://phabricator.wikimedia.org/T228069 (Dzahn) >>! In T228069#5364513, @Joe wrote: > @Mutante I think we need to do what...
[01:14:37] serviceops, Parsoid-PHP, CPT Initiatives (Parsoid REST API in PHP (CDP2)), Patch-For-Review: Deploy Parsoid-PHP with Mediawiki to scandium for RT and performance testing - https://phabricator.wikimedia.org/T228069 (Dzahn) >>! In T228069#5364513, @Joe wrote: > [] make the HHVM installation option...
[09:11:57] fsero: I 'll try and use helmfile to deploy https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/526389. I 'll reach out if I mess it up ;-)
[09:12:26] great
[09:12:30] im around
[09:33:24] fsero: success!
[09:35:31] akosiaris: \o/
[09:37:28] akosiaris: regarding applying helmfile in eqiad and codfw, do you think it should be doable next monday? considering depooling per service and applying one by one
[09:38:14] <_joe_> +1
[09:38:28] termbox is not a problem, i guess zotero/citoid isn't too the ones that worries me the most are sessionstore urandom and eventgate ottomata
[09:38:49] but i guess depooling them should be okay
[09:52:24] serviceops, Operations: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (jijiki)
[11:29:04] serviceops, Operations: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (elukey) I have created tmpreaper_1.6.13+nmu1+deb9u1+wmf1_amd64.deb on boron, with the following patch: ` elukey@boron:~/tmpreaper-1.6.13+nmu1+deb9u1$ cat patches/no-log-enoent.patch Index: tmpreaper-1....
[11:54:26] serviceops, Operations: tmpreaper possible race condition - https://phabricator.wikimedia.org/T151304 (faidon) We can start by responding to [[ https://bugs.debian.org/763858 | Debian bug #763858 ]] with your fix and see if the maintainer is willing to incorporate this!
[12:52:39] o/
[12:52:53] fsero: is there anything I have to do to apply the calico change we merged?
[12:52:59] i did a helmfile diff but didn't see anything
[12:53:03] (in stagign)
[12:54:36] I applied it
[12:54:38] So no
[12:54:45] :)
[12:59:41] _joe_: around? do you mind giving a +1 on https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/525684
[13:04:30] hm
[13:04:31] k
[13:14:52] _joe_: btw I am ready to proceed on dbctl deploy, did some testing by hand on mwdebug2002 and everything looks good
[13:15:12] <_joe_> cdanis: I am around, gimme 20 minutes
[13:22:27] hm fsero am getting https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.07.30/eventgate-wikimedia?id=AWxDBltI-pVO1PagZ5in&_g=h@b74aee6
[13:22:37] i think this is fine in staging.
[13:23:04] http timeout to schema.svc
[13:23:33] not sure how to verify if it is something i'm doing wrong...no curl or telnet on the pod :)
[13:23:55] I did launch a wmfdebug container and was able to curl
[13:25:16] You can launch one as well
[13:25:51] you can just launch one into the same pod?!
[13:26:19] No not in the same pod
[13:26:33] I launched a new deploy in the same namespace
[13:36:47] hm, fsero it indeed works fine in staging.
[13:37:07] i didn't do a wmfdebug, but i got a shell in a pod
[13:37:15] launched a nodejs repl
[13:37:21] and then just used the http lib to make a request
[13:37:30] staging works, eqiad times out
[13:39:24] codfw also times out
[13:40:15] That's expected otto
[13:40:22] Sorry maybe I didn't clarified that
[13:40:28] I only applied it on staging
[13:40:31] oh
[13:40:48] I was hoping to move eqiad and codfw to be managed via helmfile as well
[13:40:51] sorry i should have clarified too! I said (in staging) before
[13:41:14] Is ok for eventgate to be depooled in one dc
[13:41:17] it wasn't working in staging too, but for a different reason, the http request worked
[13:41:32] And recreate cluster namespaces for some time?
[13:41:45] it should be, eventgate-analytics only produces to kafka in eqiad anyway
[13:41:46] from codfw
[13:42:01] Ack and depoling in eqiad for some time?
[13:42:05] should be fine
[13:42:11] Great
[13:42:12] as long as all requests get routed
[13:42:30] eventgate-main is a little bit more annoying, but still everything should work there too.
[13:44:15] We only need to delete namespace and recreate so we are talking worst case scenario 10 mins downtime and one dc at a time
[13:44:32] Regarding Calico policy do you need it on eqiad and codfw as well?
[13:45:45] codfw too
[13:58:36] fsero: we doing that now ? :) or should I make a ticket and schedule?
[13:58:54] I can do it now
[13:59:02] Lemme a couple minutes
[13:59:07] yeehaw
[14:14:48] done ottomata
[14:17:24] i think it works, thank you!
[14:18:58] yes it does, awesome!
[14:19:03] painless
[14:40:45] dbctl looks good on mwdebug1001
[14:40:51] continuing with some real appservers
[14:44:00] tx cdanis !
[14:44:02] good luck !
[14:44:14] 🤞
[14:45:30] * volans turn off laptop and phone
[14:47:34] I feel so loved :P
[14:47:54] btw the best part about 'scatter crap around production' is that it has the same scansion as 'teenage mutant ninja turtles'
[14:48:39] rotfl
[14:49:11] is there a them song then?
[14:49:16] *theme
[14:49:31] https://phab.wmfusercontent.org/file/data/dxnbmgt6vu5zg72d3knv/PHID-FILE-mqjgc5i4jzyoft3kdygq/image.png
[14:49:37] * jijiki has a scap sticker
[14:50:20] jijiki: jealous
[14:50:25] nice (the image)
[14:50:30] hahaah
[14:50:41] apergos: http://glench.com/tmnt/ and for extra fun look at how it's implemented
[14:53:40] that's frickin hilarious
[14:53:56] http://glench.com/tmnt/#Full_Revision_History_DUMPS
[14:54:30] amazing the fonts there are nowadays...
[15:05:16] cdanis: I can hook you up with a scap sticker the next time we are IRL
[15:05:34] all laptops need more flying pigs
[15:08:39] bd808: <3
[16:21:40] _joe_: can you help me interpret this? https://logstash.wikimedia.org/goto/7d9ac1578a2513fa2a3a8ea0cd016523
[16:22:40] looks like not a concern re: dbctl, probably just a bad user request tried several times?
[16:25:56] mm that is an error message *from* mysql itself so I am going to say yes
[16:28:12] cdanis: FYI I've an errand to run, I'll reconnect later. Call if needed.
[16:28:16] np
[16:33:19] <_joe_> cdanis: I'm not sure
[16:34:26] <_joe_> cdanis: Error: 1062 Duplicate entry for ...
[16:34:43] <_joe_> so yeah, they tried to insert a record twice
[16:34:47] yeah that's from mysql, about someone adding a duplicate block entry
[16:34:50] <_joe_> probably re-POST ing a form
[16:36:14] I have not seen any errors that seem like problems related to db loadbalancing
[16:36:20] about to continue to all mw canaries
[16:47:16] there are a handful of query timeouts on mw canaries, but they started a minute before my scap run began
[16:47:35] (also it's only about 30 requests total)
[17:18:56] <_joe_> yeah don't overfreak out :D
[17:21:57] don't worry, I'm not :)