[08:00:55] serviceops, Operations, Core Platform Team Backlog (Watching / External), Patch-For-Review, and 2 others: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (jijiki)
[08:05:27] serviceops, Operations, Core Platform Team Backlog (Watching / External), Patch-For-Review, and 2 others: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (jijiki) After discussing with @Pchelolo, we believe that in order to migrate the rest, we could migrate ~25% of job...
[11:38:33] serviceops, Operations, Patch-For-Review, Release-Engineering-Team-TODO (201907), Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (fsero) it seems that container synchronization is broken a...
[13:51:44] hashar: https://gerrit.wikimedia.org/r/c/integration/config/+/523804
[14:05:11] tarrow: around?
[15:20:02] serviceops, Operations, ops-codfw, User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (Papaul) a: Papaul→MoritzMuehlenhoff Replaced both 500GB disks with 250GB disks. All yours for re-imaging
[15:29:51] fsero: yes!
[15:30:01] but clearly not paying enough attention to IRC
[15:40:42] :-)
[15:42:58] tarrow: I'm going to leave now for today, the issue you had yesterday should be mitigated
[15:43:13] however if you cannot deploy your new image i recommend you to deploy a new one
[15:43:38] fsero: thanks! I'll take a look at deploying a newly built one
[15:43:55] Sorry for abandoning you yesterday with it
[16:09:01] serviceops, Operations, ops-codfw: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (Papaul) Indeed the server is not showing the Smart Storage Battery status. Let's try to upgrade the server firmware since the last upgrade was from 2015. @fgiunchedi Let me know when we can de...
[16:11:16] serviceops, Operations, ops-codfw: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (Papaul) {F29791228}
[16:57:18] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Pchelolo)
[16:57:50] serviceops, ChangeProp, Operations, Release Pipeline, and 4 others: Migrate cpjobqueue to kubernetes - https://phabricator.wikimedia.org/T220399 (Pchelolo)
[16:58:54] serviceops, Operations, ops-codfw: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (Eevans) @Papaul you can take the server down as needed.
[17:00:14] serviceops, Operations, Core Platform Team (Services Operations): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (Pchelolo)
[17:46:41] serviceops, Operations, Release Pipeline, Core Platform Team (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (Pchelolo)
[18:12:42] serviceops, Operations, ops-codfw: (OoW) MCE errors on mw2181 / temperature warnings - https://phabricator.wikimedia.org/T205240 (Dzahn) a: MoritzMuehlenhoff→None
[18:14:10] serviceops, Operations, ops-codfw: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (Papaul) After the firmware upgrade we still have the Smart Storage Battery problem; since the server is out of warranty, we cannot have the part replaced.
[18:17:13] jijiki: do you know about mw2181?
after papaul finished the firmware upgrade i just did "scap pull" but that fails with "mwscript not found"
[18:17:31] originally this was just reported for having temperature issues
[18:25:03] serviceops, Operations, ops-codfw: (OoW) MCE errors on mw2181 / temperature warnings - https://phabricator.wikimedia.org/T205240 (Dzahn) Running 'scap pull' on this host (to sync mw code before repooling) fails with "sudo: /usr/local/bin/mwscript: command not found".
[19:02:07] serviceops, Release-Engineering-Team: 'scap pull' stopped working on appservers ? - https://phabricator.wikimedia.org/T228328 (Dzahn)
[19:04:07] serviceops, Release-Engineering-Team: 'scap pull' stopped working on appservers ? - https://phabricator.wikimedia.org/T228328 (Jdforrester-WMF) Is this a circular dependency? `mwscript` (== `/usr/local/bin/mwscript`) is not part of the appserver base?
[19:04:34] serviceops, Release-Engineering-Team: 'scap pull' stopped working on appservers ? - https://phabricator.wikimedia.org/T228328 (thcipriani) `refreshMessageBlobs` was added in T222539 One of two solutions: * Install `scap::scripts` on all appservers rather than canary appservers * rethink how this is incl...
[19:08:03] I am afraid I don't know, mutante
[19:08:15] serviceops, Operations, ops-codfw: (OoW) MCE errors on mw2181 / temperature warnings - https://phabricator.wikimedia.org/T205240 (Dzahn) Made a separate task for the scap pull issue. Repooled the server anyways.
[19:08:35] I just depooled it right before I left
[19:10:58] serviceops, Operations, ops-codfw: (OoW) MCE errors on mw2181 / temperature warnings - https://phabricator.wikimedia.org/T205240 (Dzahn) Open→Resolved a: Dzahn mcelog has not been written to since Oct 10 2018. No new thermal events after that. So not sure if that tells us much about the f...
[19:11:59] effie: i made a separate task and it already has replies.
https://phabricator.wikimedia.org/T228328 and the server i repooled and closed that. no worries.
[19:12:26] it was only offline a short time but when some mw servers are down for days.. we need scap pull back
[19:12:32] alright
[19:13:16] that server had no more temperature alerts since a long time ago. so not sure if that firmware upgrade was actually related. but we need to do it before complaining more to Dell anyways
[19:15:37] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[19:25:13] serviceops, Operations, Patch-For-Review, Release-Engineering-Team-TODO (201907), Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (Jdforrester-WMF) We're seeing this happening now on contint...
[19:35:08] serviceops, Release-Engineering-Team: 'scap pull' stopped working on appservers ? - https://phabricator.wikimedia.org/T228328 (thcipriani) >>! In T228328#5342778, @thcipriani wrote: > `refreshMessageBlobs` was added in T222539 > > One of two solutions: > * Install `scap::scripts` on all appservers rathe...
[19:39:22] serviceops, Operations, Patch-For-Review, Release-Engineering-Team-TODO (201907), Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (thcipriani) For that particular image I can recreate locall...
[20:39:54] serviceops, MediaWiki-Logging, Operations, Wikimedia-Logstash, and 8 others: Port mediawiki/php/wmerrors to PHP7 and deploy - https://phabricator.wikimedia.org/T187147 (tstarling) Is this blocking deployment of PHP 7?
[20:54:44] serviceops, Machine vision, Operations, Reading-Infrastructure-Team-Backlog (Kanban), and 2 others: Update open_nsfw-- for Wikimedia production deployment - https://phabricator.wikimedia.org/T225664 (Mholloway)
[21:34:51] serviceops, Operations: SRE FY2019 Q4 goal: complete the transition to PHP7 - https://phabricator.wikimedia.org/T219127 (Fito)
[22:33:48] serviceops, Operations, ops-codfw, User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (Dzahn) a: MoritzMuehlenhoff→Dzahn
[22:35:47] !log reimaging mw2250 after disks have been replaced
[22:55:56] serviceops, Machine vision, Operations, Reading-Infrastructure-Team-Backlog (Kanban), and 2 others: Update open_nsfw-- for Wikimedia production deployment - https://phabricator.wikimedia.org/T225664 (Mholloway)
[23:19:12] serviceops, Machine vision, Operations, Reading-Infrastructure-Team-Backlog (Kanban), and 2 others: Update open_nsfw-- for Wikimedia production deployment - https://phabricator.wikimedia.org/T225664 (Mholloway) @Joe I've updated the fork at https://github.com/mdholloway/nsfwoid according to your...