[00:00:03] Not hard, I guess
[00:01:36] I mean, I can wait until tomorrow morning rather easily. Longer I guess, if necessary
[00:01:54] ok, Alex should be back at work his morning tomorrow, so hopefully this is fixed by the time you're working
[01:10:54] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (ssastry) >>! In T236275#5613714, @Dzahn wrote: > @ssastry The error in production should hopefully be gone now: > > ` > root@wtp1026:/etc/sudoers.d...
[06:31:22] <_joe_> urandom: I did tell you earlier there was a problem with helm deployments and to call me in case of emergency
[08:24:33] there is cron complaining about "/bin/sh: 1: /usr/local/bin/hhvm-needs-restart: not found" FYI
[08:29:31] that's covered by Effie's recent patches, the rollout is ongoing
[08:29:47] cool
[09:42:25] <_joe_> akosiaris: around? I want to deploy https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/544158/
[10:03:28] jynus: hmm mutant.e fixed that
[10:03:38] I will have another check
[10:03:47] something might be lingering
[10:04:48] tx jynus, it is just 2 hosts, I will check what is up
[10:05:53] so if it is only spamming, low prio
[10:06:21] just mentioned in case it could create a worse issue (php server restart or something)
[10:07:35] serviceops, Wikidata, Wikidata-Termbox: Error when executing helmfile commands for the termbox service - https://phabricator.wikimedia.org/T236709 (Tarrow) @Pablo-WMDE not sure if you want to have this flagged up as this week's incident manager. I think being unable to deploy urgent bugfixes counts a...
[10:09:28] serviceops, Wikidata, Wikidata-Termbox: Error when executing helmfile commands for the termbox service - https://phabricator.wikimedia.org/T236709 (Joe) @Tarrow if it's an urgent bugfix we can just revert the change to let you deploy immediately. Please let's coordinate on IRC, and sorry for the inco...
[10:10:28] <_joe_> sigh
[10:10:35] <_joe_> ok, deploying the revert
[10:17:46] serviceops, Wikidata, Wikidata-Termbox: Error when executing helmfile commands for the termbox service - https://phabricator.wikimedia.org/T236709 (Pablo-WMDE) Discussed that with Jakob yesterday and am watching this, was under the assumption that it would unlock itself during the course of today - o...
[10:19:22] serviceops, Wikidata, Wikidata-Termbox: Error when executing helmfile commands for the termbox service - https://phabricator.wikimedia.org/T236709 (Joe) @Tarrow @Pablo-WMDE can someone try the release to staging? I should have fixed the rbac roles there. It should've fixed your issues. I am proceedi...
[10:24:09] serviceops, Wikidata, Wikidata-Termbox: Error when executing helmfile commands for the termbox service - https://phabricator.wikimedia.org/T236709 (Joe) Open→Resolved I have just tested and I can easily run `helmfile diff` on termbox now, in all environments. Resolving for now
[11:02:27] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) I've created a copy of the bacula database on the bacula9 one, and then ran: ` sudo -u bacula ./update_mysql_tables -h m1-master.eqiad.wmnet bacul...
[11:19:58] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (hashar) >>! In T234418#5594961, @MoritzMuehlenhoff wrote: > Where does this error show up, in...
[11:32:01] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (jijiki) a:jijiki
[11:33:31] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Test-Coverage: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 (hashar) a:jijiki→None
[12:01:20] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (akosiaris)
[13:19:03] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) //TODO: Move log from /var/lib/bacula/log to /var/log/bacula/log
[13:29:17] serviceops, Operations, PHP 7.2 support: Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (jijiki)
[14:36:11] serviceops, Machine vision, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway)
[14:36:59] serviceops, Machine vision, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) p:Triage→High
[14:37:02] serviceops, Machine vision, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway)
[14:48:24] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog (Kanban): Configure Google Cloud Vision credentials in production - https://phabricator.wikimedia.org/T236426 (Mholloway)
[14:48:35] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway)
[14:57:40] serviceops, Operations, User-jijiki: mw2225 keeps sending cronspam for hhvm-needs-restart - https://phabricator.wikimedia.org/T236799 (jijiki)
[14:57:56] serviceops, Operations, User-jijiki: mw2225 keeps sending cronspam for hhvm-needs-restart - https://phabricator.wikimedia.org/T236799 (jijiki) p:Triage→Low
[15:03:24] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team: Ensure apcu incr/decr are atomic - https://phabricator.wikimedia.org/T236800 (Krinkle)
[15:04:36] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team: Ensure apcu incr/decr are atomic - https://phabricator.wikimedia.org/T236800 (Krinkle) I asked upstream APCu maintainers what their recommendation is. Upstream issue at Their answer w...
[15:04:47] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Ensure apcu incr/decr are atomic - https://phabricator.wikimedia.org/T236800 (Krinkle)
[15:08:46] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway)
[15:28:29] serviceops, DBA, Operations, Goal: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[15:34:28] serviceops, Operations: Reimage mwdebug1002 and mw1317 - https://phabricator.wikimedia.org/T236806 (jijiki)
[15:34:40] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (Ottomata) a:jcrespo Jaime, assigning to you, feel free to undo or reassign if this is not correct.
[15:35:11] serviceops, DBA, Operations, Patch-For-Review: Backups on buster hosts fail to run - https://phabricator.wikimedia.org/T235838 (Ottomata) a:jcrespo Jaime, assigning to you, feel free to undo or reassign if this is not correct.
[15:36:11] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Thanks, it is indeed correct and this just happened today (even if alex did most of the work). Not closing because it is high...
[15:40:17] serviceops, Operations: Set up LVS for parsoid/PHP - https://phabricator.wikimedia.org/T233722 (Ottomata) a:Dzahn @dzahn, assigning to you, feel free to undo or reassign if this is not correct.
[15:41:13] serviceops, Operations: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Ottomata) a:Dzahn @Dzahn, assigning to you, feel free to undo or reassign if this is not correct.
[15:43:40] serviceops, Operations, Release Pipeline, Goal, Release-Engineering-Team (Pipeline): Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (Ottomata) a:akosiaris @akosiaris, assigning to you, feel free to undo or reassign if this is not correct.
[15:46:15] serviceops, DBA, Operations, Patch-For-Review: Backups on buster hosts fail to run - https://phabricator.wikimedia.org/T235838 (jcrespo) p:High→Low We believe this is fixed after T236406. Keeping it open until all hosts run at least once.
[16:43:23] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (Dzahn) Error was still there on next attempt. Fix by joe to add missing sudo: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/deploy/+...
[16:48:15] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (Dzahn) Prod problem is gone now - beta problem is still open.
[16:53:29] <_joe_> mutante: in beta you just need to run systemctl restart php7.2-fpm service
[16:53:39] <_joe_> it has no smart restart script nor something for opcache
[16:54:46] _joe_: ok! so deploy-local is _trying_ to restart but then gets a file not found. so they see error messages
[16:54:58] <_joe_> yes
[16:55:11] <_joe_> this needs to change in the parsoid/deploy repo
[16:55:27] ok, ACK. thanks
[17:14:19] https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/deploy/+/546985/1/scap/environments/beta/checks.yaml
[17:45:59] serviceops, Operations: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn) Open→Resolved
[18:21:04] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (WDoranWMF)
[19:13:41] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Joe) We can raise the memory limit for parsoid-php a bit. We first need to find out if such requests use a reasonable amount of memory or not.
[19:28:38] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) For the record: * https://gerrit.wikimedia.org/r/c/operations/puppet/+/546928 (and followup https://gerrit.wikimedia.org/r/c...
[19:30:28] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) I thought this error was due to a backup attempt, pre-patch. However, after I ran it manually, it failed again: ` 29-Oct 19...
[19:36:32] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) That wasn't enough: ` 158826 Full 0 0 Error 29-Oct-19 19:35 install1002.wikimedia.org-Monthly-1st-Wed...
[19:36:38] serviceops, Mobile-Content-Service, Operations, Page Content Service, and 4 others: New Service Request: wikifeeds - https://phabricator.wikimedia.org/T223469 (Ottomata) @Joe can you assign this to someone if it is in the 'Doing' status? :)
[19:40:38] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Ottomata) a:Anomie @Anomie, assigning to you as it see...
[19:47:28] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) I got it, it was the storage daemon that hadn't been restarted, not the clients (that is why the director could connect, but...
[19:54:14] serviceops, Operations: Reimage mwdebug1002 and mw1317 - https://phabricator.wikimedia.org/T236806 (Ottomata) p:Triage→Normal a:jijiki @jijiki assigning to you but feel free to undo or re-assign.
[20:33:55] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) It looks like what we need to do is ensure that our outbound requests can use an...
[21:09:29] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Izno) The Syrian civil war template is a known issue on wiki even with legacy parser. We've had to remove details at least once from the template before. Just fyi.
[22:42:59] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) I see an `http_proxy` value defined in hieradata/common.yaml, but is it exposed...
[22:53:38] !log ganeti1003 - gnt-instance remove ununpentium.wikimedia.org (T236748)
[23:11:34] serviceops, Operations: Set up LVS for parsoid/PHP - https://phabricator.wikimedia.org/T233722 (Dzahn) " lvs parsoid-php workaround " https://gerrit.wikimedia.org/r/c/operations/puppet/+/545619
[23:13:24] serviceops, Operations: Set up LVS for parsoid/PHP - https://phabricator.wikimedia.org/T233722 (Dzahn) Open→Resolved a:Dzahn→Joe
[23:13:28] serviceops, Operations: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn)