[00:00:50] RoanKattouw: little spike of "PHP Warning: Use of undefined constant NS_LQT_SUMMARY_TALK - assumed 'NS_LQT_SUMMARY_TALK'" sorts of errors just now. am guessing just a transient deploy-related error, but figured i'd mention since i'm about to take off for the day.
[00:01:06] Thanks for the heads up, I'll monitor it
[00:01:27] What was the timestamp?
[00:01:44] first one 2020-04-28 23:51:34
[00:02:17] 23:51:31 Updating ExtensionMessages-1.35.0-wmf.28.php
[00:02:17] 23:51:32 Updating LocalisationCache for 1.35.0-wmf.28 using 30 thread(s)
[00:02:22] Guessing it's related to that then
[00:02:29] yep, makes sense.
[00:03:04] * brennen → out
[00:03:54] Yeah, looks like they were all on deploy1001, and all from cawikibooks
[00:04:12] ErrorException from line 72 of /srv/mediawiki-staging/php-1.35.0-wmf.28/extensions/LiquidThreads/i18n/Lqt.namespaces.php: PHP Warning: Use of undefined constant NS_LQT_SUMMARY_TALK - assumed 'NS_LQT_SUMMARY_TALK' (this will throw an Error in a future version of PHP)
[00:04:17] (and some other line numbers)
[00:20:06] PROBLEM - snapshot of s8 in codfw on db1115 is CRITICAL: Last snapshot for s8 at codfw (db2100.codfw.wmnet:3318) taken on 2020-04-28 20:42:07 is 1089 GB, but previous one was 1620 GB, a change of 32.8% https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[00:20:07] Yeah this looks like it's an artifact of how the i18n rebuild script works
[00:43:44] !log catrope@deploy1001 Finished scap: Update WikimediaMessages with new i18n messages for T248421 (duration: 55m 23s)
[00:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:43:53] T248421: Deploy Quicksurveys extension on all Wikipedias (for a Growth study) - https://phabricator.wikimedia.org/T248421
[00:47:09] (CR) Catrope: [C: +2] Add Config for Growth Study Quick Survey [mediawiki-config] - https://gerrit.wikimedia.org/r/592788 (https://phabricator.wikimedia.org/T248421) (owner: Nray)
[00:47:32] Operations,
LDAP-Access-Requests: Add Eamedina to `wmf` LDAF group - https://phabricator.wikimedia.org/T251358 (eamedina)
[00:48:04] (Merged) jenkins-bot: Add Config for Growth Study Quick Survey [mediawiki-config] - https://gerrit.wikimedia.org/r/592788 (https://phabricator.wikimedia.org/T248421) (owner: Nray)
[00:50:57] nray: Alright, we're almost there! Your config patch is on the test server (mwdebug1002), please use the WikimediaDebug browser extensions to test it there
[00:51:09] thank you, testing now
[00:51:50] (CR) BryanDavis: "Nice work Jbond. One more small layout nit documented inline that I noticed when playing with different screen widths." (1 comment) [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: Jbond)
[00:54:32] RoanKattouw: tested and things looks great!
[00:54:39] you have my +2
[00:55:24] OK here we go
[00:56:30] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable Growth Study QuickSurvey on enwiki (with sample size 0, for testing) (T248421) (duration: 01m 10s)
[00:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:56:36] T248421: Deploy Quicksurveys extension on all Wikipedias (for a Growth study) - https://phabricator.wikimedia.org/T248421
[01:11:12] Can someone please take a look at the stack trace for `XqjS8wpAMNcAAZ2@F8oAAAEN`?
Don't want to file a phab task if it would be a dup of an existing one
[01:12:47] yeah
[01:13:34] Function: WatchedItemStore::duplicateEntry
[01:13:34] Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.16.101)
[01:13:39] I'm guessing it's gonna be a dupe
[01:13:54] okay, not reporting then
[01:14:13] also
[01:14:14] 2020-04-29 01:05:58 [XqjS8wpAMNcAAZ2@F8oAAAEN] mw1373 enwiki 1.35.0-wmf.28 exception ERROR: [XqjS8wpAMNcAAZ2@F8oAAAEN] /w/index.php?title=Special:MovePage&action=submit InvalidArgumentException from line 137 of /srv/mediawiki/php-1.35.0-wmf.28/includes/deferred/LinksUpdate.php: The Title object yields no ID. Perhaps the page doesn't exist? {"exception_id":"XqjS8wpAMNcAAZ2@F8oAAAEN","exception_url":"/w/index.php?title=Special:
[01:14:14] MovePage&action=submit","caught_by":"other"}
[01:14:33] yeah it was while doing some moves
[01:16:56] RoanKattouw: Thank you for deploying! Things looks good from my side (I tested on production). Is the process complete?
[01:17:15] Yes, we're all done! Sorry for not clarifying that
[01:18:01] Awesome, thanks so much for taking the time. I owe you a beer (or any other beverage)!
[04:38:33] (PS1) Jeena Huneidi: scaffold: Fix bugs in service.yaml, deployment.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/593093 (https://phabricator.wikimedia.org/T251363)
[04:54:50] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - cloudelasticlb_9243: Servers cloudelastic1003.wikimedia.org are marked down but pooled: cloudelasticlb6_9243: Servers cloudelastic1001.wikimedia.org are marked down but pooled: cloudelasticlb6_8243: Servers cloudelastic1001.wikimedia.org are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[04:56:40] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[05:16:53] Please see https://phabricator.wikimedia.org/T251364
[05:17:06] Not sure if it should be unbreak now
[05:28:47] Operations, SRE-Access-Requests: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (Dzahn) a:Dzahn
[05:47:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1114', diff saved to https://phabricator.wikimedia.org/P11070 and previous config saved to /var/cache/conftool/dbconfig/20200429-054733-marostegui.json
[05:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:24] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic Wikisource - https://phabricator.wikimedia.org/T251339 (jayantanth)
[06:06:36] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic Wikisource - https://phabricator.wikimedia.org/T251339 (jayantanth)
[06:07:32] !log ats-tls restart on cp3064 - T249335
[06:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:38] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[06:15:16] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic
Wikisource - https://phabricator.wikimedia.org/T251339 (jayantanth)
[06:17:28] !log ats-tls restart on cp[3050,3058] - T249335
[06:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:34] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[06:19:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1103:3314', diff saved to https://phabricator.wikimedia.org/P11071 and previous config saved to /var/cache/conftool/dbconfig/20200429-061941-marostegui.json
[06:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3311 and 3312 for reimage', diff saved to https://phabricator.wikimedia.org/P11072 and previous config saved to /var/cache/conftool/dbconfig/20200429-062254-marostegui.json
[06:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:01] (PS1) Marostegui: install_server: Reimage db1105 to buster [puppet] - https://gerrit.wikimedia.org/r/593098
[06:31:27] (CR) Marostegui: [C: +2] install_server: Reimage db1105 to buster [puppet] - https://gerrit.wikimedia.org/r/593098 (owner: Marostegui)
[06:32:45] !log stop mysql on db1105 for reimage
[06:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:02] (PS1) Dzahn: admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349)
[06:39:14] (CR) jerkins-bot: [V: -1] admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[06:43:05] (CR) Gilles: [C: +1] Add lazy-loading to Wikimedia Foundation powered-by icon [mediawiki-config] - https://gerrit.wikimedia.org/r/593038 (https://phabricator.wikimedia.org/T239377) (owner: Jforrester)
[06:47:14]
(PS2) Dzahn: admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349)
[06:47:33] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[06:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:23] (CR) jerkins-bot: [V: -1] admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[06:50:04] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:50:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:00] (CR) Elukey: [C: +1] "Looks good, my only fear would be that the kdc on 1001 and 2001 for some reason get restarted at the same time, causing a little unavailab" [puppet] - https://gerrit.wikimedia.org/r/592966 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[06:54:58] Operations, Goal: FY2019-20 Q4 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (jcrespo)
[06:55:02] (PS1) QEDK: Update jvwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/593172
[06:56:17] (PS3) Dzahn: admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349)
[06:57:23] (PS2) QEDK: Update jvwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050)
[06:57:25] (CR) jerkins-bot: [V: -1] admins: new admin group to manage bulk jobs on Phabricator [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[06:57:27] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:58:29] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:01:10] (PS1) Marostegui: Revert "install_server: Reimage db1105 to buster" [puppet] - https://gerrit.wikimedia.org/r/593176
[07:01:12] (PS1) QEDK: Update Phabricator task for jvwiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/593177 (https://phabricator.wikimedia.org/T251050)
[07:03:17] (CR) Dzahn: "I don't know why this is triggering "ERROR at setup of DataTest.test_absent_members". @jbond, do you know?" [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[07:04:13] Operations, SRE-Access-Requests, Patch-For-Review: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (Dzahn) p:Triage→Medium
[07:08:55] Operations, MediaWiki-Cache, MediaWiki-General: SimpleCacheWithBagOStuff : Cache key contains characters that are not allowed - https://phabricator.wikimedia.org/T251368 (elukey)
[07:09:29] addshore: o/ - when you have a minute, can you tell me what tags are best for --^?
[07:09:42] I tried to look some wikibase ones but I have no idea
[07:11:36] (PS2) Muehlenhoff: Enable base::service_auto_restart for KDC [puppet] - https://gerrit.wikimedia.org/r/592966 (https://phabricator.wikimedia.org/T135991)
[07:14:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11073 and previous config saved to /var/cache/conftool/dbconfig/20200429-071431-marostegui.json
[07:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:42] (CR) Marostegui: [C: +2] Revert "install_server: Reimage db1105 to buster" [puppet] - https://gerrit.wikimedia.org/r/593176 (owner: Marostegui)
[07:15:15] (CR) Muehlenhoff: "It gets spread out via cron_splay(), so 1440 possibile different minutes, that seems acceptable. Going forward we could add a report which" [puppet] - https://gerrit.wikimedia.org/r/592966 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[07:16:52] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic Wikisource - https://phabricator.wikimedia.org/T251339 (Dzahn) Hi @jayantanth the list has been created. You should have received mail with a random password to login as initial admin: list info page: https://lists.wikimedia.org/mailm...
[07:17:10] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic Wikisource - https://phabricator.wikimedia.org/T251339 (Dzahn) Open→Resolved a:Dzahn
[07:21:18] (CR) Muehlenhoff: "The patch itself looks fine, but I'm not keen on the approach here: We're allowing new people on the Phabricator server / raising the risk" [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[07:22:02] (CR) Muehlenhoff: [C: +2] Enable base::service_auto_restart for KDC [puppet] - https://gerrit.wikimedia.org/r/592966 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[07:22:13] PROBLEM - puppet last run on an-coord1001 is CRITICAL: CRITICAL: Puppet last ran 22 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:22:20] elukey: I think that is patched in this train
[07:22:32] elukey: let me find the ticket
[07:22:37] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:22:51] wikidata-campsite would be the best tag right now though
[07:23:29] The last patch merged in https://phabricator.wikimedia.org/T245396 should fix the issue elukey which is going out with .30
[07:23:54] Operations, MediaWiki-Cache, MediaWiki-General: SimpleCacheWithBagOStuff : Cache key contains characters that are not allowed - https://phabricator.wikimedia.org/T251368 (Addshore)
[07:24:47] addshore: <3
[07:24:57] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:27:27] RECOVERY - puppet last run on an-coord1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:28:39] (CR) Dzahn: "Tbh, I feel that if we had suggested to simply add Max to the existing phabricator-admin group it would have been accepted easily." [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[07:29:42] (PS1) Dzahn: remove https://transparency-private.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/593181
[07:29:48] Operations: FY2019-20 Q4 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (jcrespo)
[07:29:53] Operations, Traffic, netops, Patch-For-Review: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459 (jcrespo)
[07:30:24] Operations: FY2019-20 Q4 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (jcrespo)
[07:30:50] Operations: FY2019-20 Q4 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (jcrespo)
[07:30:53] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (jcrespo)
[07:31:11] Operations, Wikimedia-Mailing-lists: Create mailing list for Indic Wikisource - https://phabricator.wikimedia.org/T251339 (jayantanth) @Dzahn Thanks for your quick action.
[07:31:44] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11074 and previous config saved to /var/cache/conftool/dbconfig/20200429-073144-marostegui.json
[07:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:59] Operations, SRE-Access-Requests, Patch-For-Review: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (Dzahn) https://gerrit.wikimedia.org/r/c/operations/puppet/+/593166 has been created. Please also se...
[07:32:35] * elukey brb
[07:33:26] (CR) Dzahn: "@Dereckson I am not a native French speaker, what do you think about: https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Faux-nez/Maitreidmry ?" [puppet] - https://gerrit.wikimedia.org/r/592438 (https://phabricator.wikimedia.org/T251001) (owner: Dereckson)
[07:35:53] (CR) Muehlenhoff: [C: +2] Update acmechief config for second IDP staging host [puppet] - https://gerrit.wikimedia.org/r/592975 (owner: Muehlenhoff)
[07:36:08] (CR) Giuseppe Lavagetto: [C: +1] "This looks really nice! Next step will be setting up CI for this repository."
[debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: JMeybohm)
[07:37:11] (CR) Muehlenhoff: [C: +2] "JFTR, after running Puppet in the KDCs there are now running at 08:16 and 17:15" [puppet] - https://gerrit.wikimedia.org/r/592966 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[07:39:20] (PS2) Dzahn: remove https://transparency-private.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362)
[07:39:44] (CR) Vgutierrez: [C: +1] cache: move to purged [puppet] - https://gerrit.wikimedia.org/r/592928 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:41:08] (CR) Muehlenhoff: [C: +1] "Looks good, one more comment inline" (1 comment) [debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: JMeybohm)
[07:42:38] (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/22196/miscweb1002.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) (owner: Dzahn)
[07:43:33] (CR) Dzahn: [C: +1] "I also like how this removes the need for having LDAP config on here." [puppet] - https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) (owner: Dzahn)
[07:45:14] (CR) Dzahn: "@Reuven This should add tests for all "misc" sites now being unified on miscweb1002/2002 (previously spread across 2 sets of VMs).
Maybe w" [puppet] - https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: Dzahn)
[07:46:11] (CR) Giuseppe Lavagetto: [C: +1] Build with vendor, improve packaging [debs/helm] - https://gerrit.wikimedia.org/r/592689 (owner: JMeybohm)
[07:46:51] (PS3) Dzahn: httpbb: add tests for miscweb sites [puppet] - https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650)
[07:46:55] (CR) Giuseppe Lavagetto: [C: +2] scaffold: Fix bugs in service.yaml, deployment.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/593093 (https://phabricator.wikimedia.org/T251363) (owner: Jeena Huneidi)
[07:47:02] (CR) Dzahn: "removed transparency-private. this site will be removed entirely with https://gerrit.wikimedia.org/r/c/operations/puppet/+/593181" [puppet] - https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: Dzahn)
[07:47:04] (PS4) JMeybohm: Add debian directory and .gitreview [debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305)
[07:47:16] (Merged) jenkins-bot: scaffold: Fix bugs in service.yaml, deployment.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/593093 (https://phabricator.wikimedia.org/T251363) (owner: Jeena Huneidi)
[07:48:31] (CR) JMeybohm: Add debian directory and .gitreview (1 comment) [debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: JMeybohm)
[07:49:21] (CR) Muehlenhoff: [C: +1] "Ship it" [debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: JMeybohm)
[07:50:09] (CR) Dzahn: [C: +2] "@Dereckson All of these have explicit "blog has been removed" or similar message or are permanent 404. Going ahead."
[puppet] - https://gerrit.wikimedia.org/r/592649 (https://phabricator.wikimedia.org/T168459) (owner: Dzahn)
[07:50:31] (PS3) Dzahn: planet: remove some more feeds that don't exist anymore [puppet] - https://gerrit.wikimedia.org/r/592649 (https://phabricator.wikimedia.org/T168459)
[07:54:40] <_joe_> !log restarting php-fpm on mw1288 (workers die in SIGILL status)
[07:54:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:57] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.071 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[07:55:10] !log Upgrade mysql on x1 master (without restarting) in preparation for tomorrow's upgrade - T250701
[07:55:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:16] T250701: Restart extension1 (x1) database primary master (db1120) - https://phabricator.wikimedia.org/T250701
[07:55:43] RECOVERY - PHP7 rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 75363 bytes in 0.175 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[07:56:14] Operations, SRE-Access-Requests, serviceops-radar, Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (Dzahn)
[07:56:52] Operations, SRE-Access-Requests, serviceops-radar, Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (Dzahn) a:hnowlan→Dzahn
[07:59:12] Operations, Cognate, ContentTranslation, DBA, and 10 others: Restart extension1 (x1) database primary master (db1120) - https://phabricator.wikimedia.org/T250701 (Marostegui) I have updated the mysql package to 10.1.43-2 Tomorrow I will issue the restart + mysql_upgrade
[08:00:27] Operations, SRE-Access-Requests, serviceops-radar, Core Platform Team (Icebox): Onboarding Hugh Nowlan -
https://phabricator.wikimedia.org/T242309 (Dzahn) @hnowlan Do you see a signature other than your own with `gpg --list-sigs 63514D67ADFD2615`? Could you `gpg --armor --output hnowlan.pub --ex...
[08:02:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11075 and previous config saved to /var/cache/conftool/dbconfig/20200429-080206-marostegui.json
[08:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:48] (CR) Muehlenhoff: [C: +1] "Ship it" [debs/helm] - https://gerrit.wikimedia.org/r/592689 (owner: JMeybohm)
[08:05:07] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:07:59] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22701 bytes in 0.274 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:07:59] !log installing openldap security updates
[08:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:09] Operations, Android-app-Bugs, Page Content Service, Product-Infrastructure-Team-Backlog, and 4 others: Incorrect language variant returned for PCS endpoints - https://phabricator.wikimedia.org/T249284 (ema) >>! In T249284#6091252, @Pchelolo wrote: > Tagging #traffic - seems like Varnish/ATS layer...
[08:12:27] (PS1) Dzahn: add TXT record to wikimedia.org for haveibeenpwned.com verification [dns] - https://gerrit.wikimedia.org/r/593187 (https://phabricator.wikimedia.org/T246357)
[08:13:59] (CR) Ema: [C: +2] cache: move to purged [puppet] - https://gerrit.wikimedia.org/r/592928 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[08:15:07] (CR) Jbond: "hi Daniel," (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: Dzahn)
[08:15:19] (CR) Muehlenhoff: "Looks good, but I think we should also drop "authnz_ldap" from profile::misc:_apps::httpd, then (and that class can also get the os_versio" [puppet] - https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) (owner: Dzahn)
[08:18:33] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:20:03] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:20:10] (CR) Jbond: [C: +1] add graceful-restart to CRs [homer/public] - https://gerrit.wikimedia.org/r/577564 (https://phabricator.wikimedia.org/T191667) (owner: CDanis)
[08:23:36] Operations, Core Platform Team, Traffic: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (ema) >>! In T250781#6076720, @Pchelolo wrote: > AFAIK cdnPurgeJob is only involved if the delayed purge is required if reboundDelay option is set. For every rebound purge job there's...
[08:23:51] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22701 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:28:53] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:30:59] FFS :)
[08:31:21] !log restart ats-tls on cp[3054,3064]
[08:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:27] Operations, ops-ulsfo, decommission: decommission backup4001 - https://phabricator.wikimedia.org/T161904 (Dzahn)
[08:32:19] !log upgrade to ATS 8.1 on cp4032 - T249335
[08:32:21] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22567 bytes in 2.515 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:32:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:26] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[08:35:52] Operations, Traffic: Puppet cleanup after purged transition - https://phabricator.wikimedia.org/T251374 (ema)
[08:36:01] Operations, Traffic: Puppet cleanup after purged transition - https://phabricator.wikimedia.org/T251374 (ema) p:Triage→Medium
[08:36:12] Operations, DBA, Phabricator: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (Dzahn)
[08:36:26] (CR) JMeybohm: "> Patch Set 3: Code-Review+1" [debs/helm3] - https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: JMeybohm)
[08:36:47] Operations, Traffic, Patch-For-Review: Create vhtcpd replacement - https://phabricator.wikimedia.org/T249583 (ema) Open→Resolved purged deployed
fleet-wide.
[08:39:30] (CR) Vgutierrez: [C: +1] vcl: introduce wm_admission_policies [puppet] - https://gerrit.wikimedia.org/r/588945 (https://phabricator.wikimedia.org/T249809) (owner: Ema)
[08:45:48] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper
[08:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:12] (PS1) Elukey: role::statistics::private: factor out wmde/discovery crons out [puppet] - https://gerrit.wikimedia.org/r/593191 (https://phabricator.wikimedia.org/T249754)
[08:49:00] (CR) JMeybohm: [C: +2] Build with vendor, improve packaging [debs/helm] - https://gerrit.wikimedia.org/r/592689 (owner: JMeybohm)
[08:49:01] !log gerrit1002 - gzipping gerrit.log.2020-04* files in /var/log/gerrit (T243808)
[08:49:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:09] T243808: gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808
[08:49:39] (CR) Ayounsi: "> Patch Set 10:" (1 comment) [software/netbox-extras] - https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: CRusnov)
[08:50:01] (CR) Alexandros Kosiaris: [C: +1] "\o/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/592988 (owner: Jbond)
[08:52:08] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
[08:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:40] (CR) Ayounsi: [C: +1] reports cables: Add extra regexp to support more active interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/589750 (owner: CRusnov)
[08:56:13] (PS13) Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939)
[08:56:15] (PS1) Vgutierrez: ATS: Re-enable the session ID based cache [puppet] -
https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567)
[08:56:55] (PS2) Vgutierrez: ATS: Re-enable the TLS session ID based cache [puppet] - https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567)
[08:57:07] Operations, Gerrit, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn) Open→Resolved Disk space alert is OK since almost a month, i gzipped the existing logs and beyond that i don't think it's worth the...
[08:57:10] Operations, Gerrit, vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (Dzahn)
[08:57:54] (CR) Jbond: ferm-status: add KUBE to list of ignored chains (1 comment) [puppet] - https://gerrit.wikimedia.org/r/592988 (owner: Jbond)
[08:58:01] Operations, serviceops: Chaos Engineering - Stop for x hours one or more mc10xx memcached shards - https://phabricator.wikimedia.org/T251378 (elukey)
[08:59:03] Operations, serviceops: Chaos Engineering - Stop for x hours one or more mc10xx memcached shards - https://phabricator.wikimedia.org/T251378 (elukey)
[09:00:02] Operations, Gerrit, vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (Dzahn) a:Dzahn→QChris @QChris fyi, this is the dedicated test machine for the gerrit upgrade, you can feel free to use it. I confirmed your shell user exists. also see T243808#60257...
[09:00:18] (03PS2) 10Elukey: role::statistics::private: factor out wmde/discovery crons out [puppet] - 10https://gerrit.wikimedia.org/r/593191 (https://phabricator.wikimedia.org/T249754) [09:01:51] (03PS1) 10Ema: cache: remove profile::cache::base::use_purged [puppet] - 10https://gerrit.wikimedia.org/r/593193 (https://phabricator.wikimedia.org/T251374) [09:01:53] (03PS1) 10Ema: varnish: remove varnish::htcppurger [puppet] - 10https://gerrit.wikimedia.org/r/593194 (https://phabricator.wikimedia.org/T251374) [09:01:55] (03PS1) 10Ema: prometheus: remove node_vhtcpd [puppet] - 10https://gerrit.wikimedia.org/r/593195 (https://phabricator.wikimedia.org/T251374) [09:01:57] (03PS1) 10Ema: purged: use nrpe::monitor_systemd_unit_state [puppet] - 10https://gerrit.wikimedia.org/r/593196 (https://phabricator.wikimedia.org/T251374) [09:01:59] (03PS1) 10Ema: purged: make purge_host_regex default to undef [puppet] - 10https://gerrit.wikimedia.org/r/593197 (https://phabricator.wikimedia.org/T251374) [09:02:10] 10Operations, 10Release-Engineering-Team-TODO: Should 'doc' machines (i.e. doc1001) have contint-roots as a group? - https://phabricator.wikimedia.org/T245691 (10Dzahn) 05Stalled→03Declined Alright, based on the last comment i will call it declined then. Reopen if the need arises, of course. 
[09:03:19] (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/22198/" [puppet] - 10https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [09:03:46] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/22199/" [puppet] - 10https://gerrit.wikimedia.org/r/593191 (https://phabricator.wikimedia.org/T249754) (owner: 10Elukey) [09:03:57] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [09:04:08] (03PS3) 10Vgutierrez: ATS: Re-enable the TLS session ID based cache [puppet] - 10https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567) [09:04:09] 10Operations, 10serviceops: Chaos Engineering - Stop for x hours one or more mc10xx memcached shards - https://phabricator.wikimedia.org/T251378 (10Joe) I think we should run 3 different tests, and I would run them for 1 host first. [] Stop memcached completely [] drop all packets directed to port 11211 [] dro... 
[09:04:12] (03CR) 10Elukey: [C: 03+2] role::statistics::private: factor out wmde/discovery crons out [puppet] - 10https://gerrit.wikimedia.org/r/593191 (https://phabricator.wikimedia.org/T249754) (owner: 10Elukey) [09:04:45] the WMF chi cluster is sadly known, a big reindex is in progress :( [09:04:51] (03CR) 10Ema: [C: 03+1] ATS: Re-enable the TLS session ID based cache [puppet] - 10https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [09:05:41] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 3.697 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [09:05:46] (03CR) 10Vgutierrez: [C: 03+2] ATS: Re-enable the TLS session ID based cache [puppet] - 10https://gerrit.wikimedia.org/r/593192 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [09:06:54] 10Operations, 10Privacy Engineering, 10WMF-Legal, 10Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104 (10Dzahn) I would be happy to help converting this to a simple static site on our own infra. The amount of work would be limited (as long as... 
[09:07:10] (03PS1) 10Ema: cache: remove nginx-ats transition leftover [puppet] - 10https://gerrit.wikimedia.org/r/593198 [09:10:18] !log starting rolling restart of ats-tls to enable the TLS session ID based cache - T170567 [09:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:25] T170567: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 [09:11:46] (03CR) 10Vgutierrez: [C: 03+1] cache: remove nginx-ats transition leftover [puppet] - 10https://gerrit.wikimedia.org/r/593198 (owner: 10Ema) [09:12:25] (03PS2) 10Muehlenhoff: Enable idp-test1001 as second IDP staging server [puppet] - 10https://gerrit.wikimedia.org/r/592976 (https://phabricator.wikimedia.org/T233930) [09:14:15] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO, 10Release-Engineering-Team (CI & Testing services): Assess whether we should still disable seccomp in Docker for CI - https://phabricator.wikimedia.org/T249729 (10hashar) 05Open→03Resolved a:03hashar Per my previo... [09:17:17] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO, 10Release-Engineering-Team (CI & Testing services): Assess whether we should still disable seccomp in Docker for CI - https://phabricator.wikimedia.org/T249729 (10MoritzMuehlenhoff) 05Resolved→03Open I'm reopening t... 
[09:18:17] (03PS1) 10JMeybohm: package_builder: Add dh-exec to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/593200 [09:19:10] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/593200 (owner: 10JMeybohm) [09:19:42] (03CR) 10Ema: [C: 03+2] cache: remove nginx-ats transition leftover [puppet] - 10https://gerrit.wikimedia.org/r/593198 (owner: 10Ema) [09:20:27] (03PS1) 10JMeybohm: Drop obsolete build dependency [debs/helm] - 10https://gerrit.wikimedia.org/r/593201 [09:21:26] (03PS14) 10Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) [09:22:44] (03CR) 10JMeybohm: [C: 03+2] package_builder: Add dh-exec to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/593200 (owner: 10JMeybohm) [09:22:57] 10Operations, 10LDAP-Access-Requests, 10observability, 10serviceops, 10Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (10Dzahn) 05Open→03Stalled [09:24:10] 10Puppet, 10Wikimedia Meet: Puppetize the account manager - https://phabricator.wikimedia.org/T251034 (10Majavah) Would like to learn some puppet so happy to help too, if I will be useful in any way [09:24:19] (03CR) 10Addshore: [C: 03+1] Add new properties to wmgWBRepoPreferredPageImagesProperties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592939 (https://phabricator.wikimedia.org/T249811) (owner: 10Hoo man) [09:27:11] (03CR) 10Jbond: "updated thanks" (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond) [09:28:41] 10Puppet, 10Wikimedia Meet: Puppetize the account manager - https://phabricator.wikimedia.org/T251034 (10Dzahn) @Majavah I am happy to add you as reviewer or CC to any puppet changes in Gerrit. Have you logged in there before? 
See https://www.mediawiki.org/wiki/Gerrit [09:30:40] 10Puppet, 10Wikimedia Meet: Puppetize the account manager - https://phabricator.wikimedia.org/T251034 (10Majavah) @Dzahn I might have done [[ https://gerrit.wikimedia.org/r/#/q/owner:%22Majavah+%253Chi%2540tassu.me%253E%22 | something ]] sometimes. [09:30:49] !log disable puppet for puppetdb upgrade [09:30:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:58] !log disable puppet fleet wide for puppetdb upgrade [09:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:03] 10Puppet, 10Wikimedia Meet: Puppetize the account manager - https://phabricator.wikimedia.org/T251034 (10Dzahn) @Majavah Very cool! I will add you so you can see the puppet changes and get notified. [09:35:07] 10Puppet, 10Wikimedia Meet: Puppetize the account manager - https://phabricator.wikimedia.org/T251034 (10Majavah) @Dzahn (for the record, the intended meaning of T251034#6092621 was to tell that I've been around for a while and learning Puppet has been on my bucket list for a while) [09:38:45] !log puppet enabled fleetwide [09:38:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:40:07] ohh that explains why my run-puppet-agent was breaking on A:cp :) [09:42:58] (03PS4) 10Ema: vcl: introduce wm_admission_policies [puppet] - 10https://gerrit.wikimedia.org/r/588945 (https://phabricator.wikimedia.org/T249809) [09:43:33] (03CR) 10Ema: [C: 03+2] vcl: introduce wm_admission_policies [puppet] - 10https://gerrit.wikimedia.org/r/588945 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [09:48:46] (03PS3) 10Ema: cache: use profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/584902 [09:50:28] (03PS4) 10Dzahn: admins: new admin group to manage bulk jobs on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) [09:50:30] (03CR) 10Dzahn: admins: new admin group to manage bulk 
jobs on Phabricator (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593166 (https://phabricator.wikimedia.org/T251349) (owner: 10Dzahn) [09:55:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1103:3314', diff saved to https://phabricator.wikimedia.org/P11078 and previous config saved to /var/cache/conftool/dbconfig/20200429-095527-marostegui.json [09:55:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1105:3311 and 3312 after reimage', diff saved to https://phabricator.wikimedia.org/P11079 and previous config saved to /var/cache/conftool/dbconfig/20200429-095545-marostegui.json [09:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1097:3314', diff saved to https://phabricator.wikimedia.org/P11080 and previous config saved to /var/cache/conftool/dbconfig/20200429-095629-marostegui.json [09:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:06] (03PS2) 10Volans: homer: add a diff check that sends an email [puppet] - 10https://gerrit.wikimedia.org/r/593007 (https://phabricator.wikimedia.org/T249224) [09:59:23] (03PS3) 10Ema: vcl: move 'exp' admission policy to wm_admission_policies [puppet] - 10https://gerrit.wikimedia.org/r/589341 (https://phabricator.wikimedia.org/T249809) [10:00:20] !log reimaging db2087 to buster T250666 [10:00:21] (03CR) 10Volans: [C: 03+2] homer: add a diff check that sends an email [puppet] - 10https://gerrit.wikimedia.org/r/593007 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [10:00:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:26] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [10:00:44] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform 
Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10jbond) it's getting harder and harder to use signatures with gpg. Anyway the key available on [[ http://keys.gnupg.net/pks/lookup?op=get&search=0x8B4... [10:02:09] 10Operations, 10serviceops, 10Kubernetes: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10JMeybohm) [10:03:29] (03CR) 10Ema: "pcc is fine: https://puppet-compiler.wmflabs.org/compiler1003/22201/" [puppet] - 10https://gerrit.wikimedia.org/r/589341 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [10:06:28] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10hnowlan) Here's a copy of the signed key https://people.wikimedia.org/~hnowlan/hnowlan.asc which is the same as the one on keys.gnupg.net. [10:07:50] (03CR) 10Hnowlan: [C: 03+2] Rerender mobile-html on wikidata description changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/593066 (https://phabricator.wikimedia.org/T250209) (owner: 10Ppchelko) [10:10:43] 10Operations, 10Release-Engineering-Team, 10Core Platform Team Workboards (Clinic Duty Team), 10Performance Issue, 10Wikimedia-database-error: WikiPage::updateCategoryCounts causing replication lag due to long-running writes on commonswiki - https://phabricator.wikimedia.org/T240405 (10Aklapper) @CCicale... 
[10:15:59] (03PS1) 10Kormat: install_server: Allow reimage of db2087 [puppet] - 10https://gerrit.wikimedia.org/r/593205 (https://phabricator.wikimedia.org/T250666) [10:19:02] (03CR) 10Marostegui: [C: 03+1] install_server: Allow reimage of db2087 [puppet] - 10https://gerrit.wikimedia.org/r/593205 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [10:27:47] (03CR) 10Dzahn: [C: 03+1] "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) (owner: 10Dzahn) [10:28:18] 10Operations, 10serviceops, 10Kubernetes: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10JMeybohm) [10:28:46] (03PS3) 10Dzahn: remove https://transparency-private.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) [10:29:21] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10Sustainability (MediaWiki-MultiDC): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Joe) Since [[ https://github.com/wikimedia/operations-software-purged | purged ]] is now in production, and that we have s... [10:29:40] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10Sustainability (MediaWiki-MultiDC): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Joe) At a later time, we could think of changing the logic, and make purges avoid race conditions, removing the need for t... [10:29:43] (03CR) 10Dereckson: "Could be enough, yes, it's a sockpuppet investigation page. 
Perhaps I would have preferred an explicit request to remove the feed, for exa" [puppet] - 10https://gerrit.wikimedia.org/r/592438 (https://phabricator.wikimedia.org/T251001) (owner: 10Dereckson) [10:29:53] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) ` gpg --keyserver keys.gnupg.net --search-keys hnowlan@wikimedia.org gpg: data source: http://82.148.229.254:11371 (1) Hugh Nowlan 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) ` wget https://people.wikimedia.org/~hnowlan/hnowlan.asc .. ` [10:33:43] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10jbond) >>! In T242309#6092868, @Dzahn wrote: > ` > gpg --keyserver keys.gnupg.net --search-keys hnowlan@wikimedia.org > > ` Yes i know its very anno... [10:34:20] jbond42: i don't get why i still don't see it. https://phabricator.wikimedia.org/T242309#6092868 i had similar issues with another user and then suddenly it worked but still dont know why [10:34:44] that includes completely removing it from keyring and importing it again from different servers or people.wm.org [10:35:05] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [debs/helm] - 10https://gerrit.wikimedia.org/r/593201 (owner: 10JMeybohm) [10:35:41] mutante: im not sure about the one on people but i validated the one at http://keys.gnupg.net/pks/lookup?op=get&search=0x63514D67ADFD2615 [10:36:22] gpg --keyserver keys.gnupg.net --recv-keys 0x63514D67ADFD2615 [10:36:24] gpg --list-sigs 63514D67ADFD2615 [10:36:28] ... 
not there [10:37:11] 10Operations, 10DBA: Make enabling reimaging for db hosts more humane - https://phabricator.wikimedia.org/T251392 (10Kormat) [10:37:54] 10Operations, 10DBA: Make enabling reimaging for db hosts more humane - https://phabricator.wikimedia.org/T251392 (10Kormat) [10:38:31] (03PS1) 10Volans: keyholder: notify the agent on auth changes [puppet] - 10https://gerrit.wikimedia.org/r/593207 [10:38:33] (03PS1) 10Volans: homer: improve check diff email message [puppet] - 10https://gerrit.wikimedia.org/r/593208 (https://phabricator.wikimedia.org/T249224) [10:38:39] (03CR) 10Hashar: "recheck" [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [10:39:19] (03CR) 10Kormat: [C: 03+2] install_server: Allow reimage of db2087 [puppet] - 10https://gerrit.wikimedia.org/r/593205 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [10:39:23] mutante: ahh if it were that easy ;) [10:39:24] wget http://keys.gnupg.net/pks/lookup\?op\=get\&search\=0x63514D67ADFD2615\&options\=mr -O - | gpg --import [10:39:28] jbond42: next i manually copy/pasted it from the http link and .. it works.. wtf [10:39:57] mutante: yes that's what i had to do to get riccardo's [10:40:06] wow, how strange. thank you [10:40:21] --recv-keys should work too :p meh [10:40:26] yes network is a bit foobared and np [10:41:46] i tried recv-keys with a few different servers and never got the new sigs. i know hnowlan also took part in the key signing party at all hands so should have many more than just mine and vol.ans [10:42:01] meh :( [10:42:21] when adding kormat i had the same issue at first and then i just deleted it and repeated and suddenly it worked [10:42:24] I don't think I got any signatures from the signing party fwiw [10:42:46] hnowlan: ahh ok [10:43:11] hnowlan: alright, let's get this done finally:) will add you now and ping you in a few to test pwstore [10:43:23] mutante: thank you! 
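The workaround described above (fetching the armored key through the keyserver's HTTP lookup and piping it into `gpg --import`, rather than relying on `--recv-keys`) can be sketched as a small shell helper. This is only a minimal sketch, not tooling used in the channel: the function names `hkp_url` and `fetch_and_import` are hypothetical, while the server name, key ID, and lookup parameters are the ones quoted in the log.

```shell
# Hypothetical helper: build the HTTP lookup URL for a key ID, using the
# same query parameters as the wget command quoted in the log
# (op=get returns the key, options=mr asks for machine-readable output).
hkp_url() {
  # $1 = keyserver host, $2 = key ID (e.g. 0x63514D67ADFD2615)
  printf 'http://%s/pks/lookup?op=get&search=%s&options=mr\n' "$1" "$2"
}

# Hypothetical helper: download the armored key over plain HTTP and feed
# it to gpg on stdin, bypassing `gpg --recv-keys`.
fetch_and_import() {
  wget -q -O - "$(hkp_url "$1" "$2")" | gpg --import
}

# Example usage (needs network access, so it is left commented out):
# fetch_and_import keys.gnupg.net 0x63514D67ADFD2615
# gpg --list-sigs 63514D67ADFD2615
```

Quoting the URL inside `"$(...)"` avoids the backslash escaping of `?`, `=`, and `&` that the interactive command needed.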
[10:43:32] the copy on people.wm.org still seems to be without sigs.. but dunno [10:44:13] btw, Moritz said somewhere in the long term we should add our keys to our DNS zone files. sounds nice [10:45:43] did my signature never appear in a public server? [10:47:00] volans: if you manually copy it from http://keys.gnupg.net/pks/lookup?op=get&search=0x63514D67ADFD2615 then you see sigs but if you do "gpg --keyserver keys.gnupg.net --recv-keys " then you don't. even though it's the same server [10:48:52] (03PS1) 10Ema: vcl: add public_clouds_shutdown to labs hiera [puppet] - 10https://gerrit.wikimedia.org/r/593211 [10:49:18] mutante: interesting [10:49:19] and weird [10:49:24] yea [10:50:05] did they decide to not export signatures via gpg protocol anymore? [10:50:17] I'm having the same behaviour with other servers too [10:52:05] until now i thought it is just one of the servers that is stripping signatures [10:52:32] Moritz said we should only use the sks servers btw [10:52:40] (03CR) 10Ema: [C: 03+2] vcl: add public_clouds_shutdown to labs hiera [puppet] - 10https://gerrit.wikimedia.org/r/593211 (owner: 10Ema) [10:53:27] (03CR) 10Muehlenhoff: [C: 03+2] Enable idp-test1001 as second IDP staging server [puppet] - 10https://gerrit.wikimedia.org/r/592976 (https://phabricator.wikimedia.org/T233930) (owner: 10Muehlenhoff) [10:53:54] (03CR) 10Hashar: "I don't know why dpkg-source does not complain about the introduction of the ".gitreview" file :]" (032 comments) [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [10:54:06] before i said the opposite because i found some stackoverflow comment about sks being under attack and how Debian now defaults to keys.openpgp.org [10:54:24] under answer 2 on https://unix.stackexchange.com/questions/530778/what-is-debians-default-gpg-keyserver-and-where-is-it-configured [10:55:00] let's check out the "add to our own DNS" option :) [10:55:43] (03CR) 10Muehlenhoff: [C: 
03+1] Add debian directory and .gitreview (031 comment) [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [10:58:08] mutante: I got it unchanged from ipv4.pool.sks-keyservers.net too and in the web UI it has the signatures [10:58:23] and I don't have john's one locally [10:59:10] volans: *nod*. strange [10:59:55] when adding kormat i had the same issue and then asked him to just directly give me an export, had to remove it completely from keychain, reimport again.. then they were there [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1100). [11:00:04] hoo: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:01:17] (03PS2) 10Hoo man: Add new properties to wmgWBRepoPreferredPageImagesProperties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592939 (https://phabricator.wikimedia.org/T249811) [11:01:19] (03CR) 10Hoo man: [C: 03+2] Add new properties to wmgWBRepoPreferredPageImagesProperties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592939 (https://phabricator.wikimedia.org/T249811) (owner: 10Hoo man) [11:02:12] (03Merged) 10jenkins-bot: Add new properties to wmgWBRepoPreferredPageImagesProperties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592939 (https://phabricator.wikimedia.org/T249811) (owner: 10Hoo man) [11:04:28] mutante: keys.openpgp.org definitely does strip signatures as it uses a newer protocol to avoid the issues with signature spamming. it's possible other networks have moved to the same software/started stripping sigs in HKP. 
As to the DNS option, that really needs DNSSEC for us to be able to correctly trust things [11:05:03] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) It worked after i manually copy/pasted the key from http://keys.gnupg.net/pks/lookup?op=get&search=0x63514D67ADFD2615 and imported it. Then... [11:05:13] !log hoo@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add new properties to wmgWBRepoPreferredPageImagesProperties (T249811) (duration: 01m 18s) [11:05:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:21] T249811: Add more properties to the PageImages list - https://phabricator.wikimedia.org/T249811 [11:06:20] jbond42: ok, ACK! it seems like the other servers just started stripping as well very recently [11:06:24] (03CR) 10Muehlenhoff: [C: 03+2] Remove now obsolete check for jessie [puppet] - 10https://gerrit.wikimedia.org/r/592913 (owner: 10Muehlenhoff) [11:06:48] ack [11:06:56] i missed that [11:07:19] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [11:08:46] (03CR) 10Muehlenhoff: [C: 03+2] phragile: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/592914 (owner: 10Muehlenhoff) [11:09:11] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 9.242 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [11:13:18] (03PS1) 10Kormat: install_server: switch db2087 to buster [puppet] - 10https://gerrit.wikimedia.org/r/593213 (https://phabricator.wikimedia.org/T250666) [11:15:59] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - 
https://phabricator.wikimedia.org/T242309 (10Dzahn) added Hugh to the .users file in pwstore, reencrypted files and pushed them. It should work now. [11:16:10] (03CR) 10Jcrespo: [C: 03+1] install_server: switch db2087 to buster [puppet] - 10https://gerrit.wikimedia.org/r/593213 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [11:18:23] (03CR) 10Kormat: [C: 03+2] install_server: switch db2087 to buster [puppet] - 10https://gerrit.wikimedia.org/r/593213 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [11:20:42] (03PS4) 10Ema: vcl: move 'exp' admission policy to wm_admission_policies [puppet] - 10https://gerrit.wikimedia.org/r/589341 (https://phabricator.wikimedia.org/T249809) [11:26:11] PROBLEM - PyBal backends health check on lvs1014 is CRITICAL: PYBAL CRITICAL - CRITICAL - cloudelasticlb_9243: Servers cloudelastic1003.wikimedia.org are marked down but pooled: cloudelasticlb6_9243: Servers cloudelastic1003.wikimedia.org are marked down but pooled: cloudelasticlb6_8243: Servers cloudelastic1003.wikimedia.org are marked down but pooled: cloudelasticlb_8243: Servers cloudelastic1003.wikimedia.org are marked down b [11:26:11] /wikitech.wikimedia.org/wiki/PyBal [11:27:23] RECOVERY - PyBal backends health check on lvs1014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:29:31] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) 05Stalled→03Resolved Hugh confirmed he could get the mgmt password and login to a random host. 
Closing :) [11:30:11] 10Operations, 10SRE-Access-Requests, 10serviceops-radar, 10Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (10Dzahn) [11:38:18] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/593207 (owner: 10Volans) [11:40:42] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/593181 (https://phabricator.wikimedia.org/T188362) (owner: 10Dzahn) [11:40:54] (03PS2) 10Dzahn: ATS: switch backends for wikiworkshop static sites [puppet] - 10https://gerrit.wikimedia.org/r/592612 (https://phabricator.wikimedia.org/T247650) [11:44:29] (03CR) 10Dzahn: [C: 03+2] ATS: switch backends for wikiworkshop static sites [puppet] - 10https://gerrit.wikimedia.org/r/592612 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:46:23] (03CR) 10Dzahn: "[cumin1001:~] $ httpbb /tmp/test_miscweb.yaml --hosts=miscweb1002.eqiad.wmnet,bromine.eqiad.wmnet,vega.codfw.wmnet,miscweb2002.codfw.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/592612 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:46:47] (03CR) 10Dzahn: "https://wikiworkshop.org:" [puppet] - 10https://gerrit.wikimedia.org/r/592612 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:54:19] (03PS2) 10Muehlenhoff: Remove component integration for Puppet 5 / Facter 3 on jessie/stretch [puppet] - 10https://gerrit.wikimedia.org/r/583028 [11:55:23] (03CR) 10Ema: [C: 03+2] vcl: move 'exp' admission policy to wm_admission_policies [puppet] - 10https://gerrit.wikimedia.org/r/589341 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [11:58:47] !log running puppet on cp-ats - switching backends of wikiworkshop.org [11:58:48] (03PS6) 10Ema: vcl: 10M cutoff for the 'exp' admission policy [puppet] - 10https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809) [11:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:04] 
Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1200) [12:01:41] ^ Please note that T251404 may block the train from rolling out on time [12:01:42] T251404: `editingold` when creating a new page - https://phabricator.wikimedia.org/T251404 [12:01:45] (03CR) 10Ema: "pcc after rebase: https://puppet-compiler.wmflabs.org/compiler1001/22203/" [puppet] - 10https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [12:02:19] (03PS4) 10Dzahn: httpbb: add tests for miscweb sites [puppet] - 10https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) [12:13:00] 10Operations, 10LDAP-Access-Requests: Add Naïké Nembetwa Nzali to "wmf" ldap group - https://phabricator.wikimedia.org/T250821 (10Naike) Hello @CDanis , The wiki account with the username NNzali is indeed mine. I don't think I need to do anything else than what you've listed. Thanks! [12:13:58] (03PS1) 10Muehlenhoff: Install python3-setuptools-scm in package builder [puppet] - 10https://gerrit.wikimedia.org/r/593217 [12:15:07] (03PS1) 10Dzahn: remove bromine.eqiad.wmnet and vega.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) [12:15:08] (03CR) 10Ssingh: [C: 03+1] "+1 if it matters as it does fix the problem :)" [puppet] - 10https://gerrit.wikimedia.org/r/593217 (owner: 10Muehlenhoff) [12:15:42] (03CR) 10Muehlenhoff: [C: 03+2] Install python3-setuptools-scm in package builder [puppet] - 10https://gerrit.wikimedia.org/r/593217 (owner: 10Muehlenhoff) [12:17:51] (03CR) 10Dzahn: "the backends for all the static sites have been switched in steps, there should be no more traffic now besides monitoring ->" [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [12:17:54] (03CR) 10JMeybohm: [C: 03+2] Drop obsolete build dependency [debs/helm] - 
10https://gerrit.wikimedia.org/r/593201 (owner: 10JMeybohm) [12:18:49] (03PS1) 10Andrew Bogott: wmcs-instancepurge.py: fix hiera settings to match profile code [puppet] - 10https://gerrit.wikimedia.org/r/593219 (https://phabricator.wikimedia.org/T251152) [12:19:36] (03CR) 10Dzahn: "possibly superseded by https://gerrit.wikimedia.org/r/591304 now" [puppet] - 10https://gerrit.wikimedia.org/r/579601 (https://phabricator.wikimedia.org/T246763) (owner: 10Thcipriani) [12:20:05] (03PS2) 10Andrew Bogott: wmcs-instancepurge.py: fix hiera settings to match profile code [puppet] - 10https://gerrit.wikimedia.org/r/593219 (https://phabricator.wikimedia.org/T251152) [12:24:36] (03CR) 10Andrew Bogott: [C: 03+2] wmcs-instancepurge.py: fix hiera settings to match profile code [puppet] - 10https://gerrit.wikimedia.org/r/593219 (https://phabricator.wikimedia.org/T251152) (owner: 10Andrew Bogott) [12:25:47] (03PS1) 10Dzahn: decom bromine.eqiad.wmnet and vega.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/593221 (https://phabricator.wikimedia.org/T247650) [12:26:03] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool db2087 for reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11081 and previous config saved to /var/cache/conftool/dbconfig/20200429-122602-kormat.json [12:26:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:10] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [12:26:31] PROBLEM - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [12:27:41] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets 
[12:28:13] RECOVERY - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.010 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [12:28:46] (03PS2) 10Dzahn: remove bromine.eqiad.wmnet and vega.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) [12:29:31] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:33:31] PROBLEM - MariaDB Slave IO: s6 on db2087 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [12:33:33] PROBLEM - MariaDB Slave SQL: s7 on db2087 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [12:33:44] kormat: ^ [12:33:49] PROBLEM - mysqld processes on db2087 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [12:34:16] marostegui: oh crap. what did i forget.. [12:34:21] downtime? 
[12:34:56] marostegui: it's been a couple of hours since i submitted the change to disable notifications for the host [12:35:12] I am checking and only a few checks got the notifications disabled [12:35:13] That's weird [12:35:19] let's check puppet runs in icinga [12:35:25] The logs I mean [12:37:04] So the puppet ran did disable notifications: https://phabricator.wikimedia.org/P11082 [12:37:15] That specific check looks disabled there but yet alerted [12:37:32] And also: Apr 29 11:17:05 icinga1001 puppet-agent[180139]: (/Stage[main]/Icinga/Systemd::Service[icinga]/Service[icinga]) Triggered 'refresh' from 2 events [12:38:11] marostegui: you'll have to run puppet on the monitored hosts and on icinga itself .probably [12:38:19] mutante: the change was applied 2h ago [12:38:29] uhm, ok [12:38:35] so puppet probably ran on the host too [12:38:40] Let's check that too [12:38:55] the host is currently being reimaged [12:39:02] I will check the central logs [12:40:24] last time it ran: Apr 29 12:06:59 db2087 puppet-agent[16664]: Applying configuration version '(9e7fc4aeeb) Ema - vcl: move 'exp' admission policy to wm_admission_policies' [12:40:26] so yeah, it ran [12:40:31] let me create a task to follow up [12:41:12] marostegui: thanks :) [12:43:17] (03PS3) 10Dzahn: remove bromine.eqiad, vega.codfw and webserver_misc_static role [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) [12:44:52] pro-tip: to see recent puppet runs logs go to puppetboard.w.o [12:46:11] (03PS1) 10Arturo Borrero Gonzalez: aptrepo: cleanup unused openstack components [puppet] - 10https://gerrit.wikimedia.org/r/593223 [12:47:15] (03CR) 10Dzahn: "This also adds backup::host to miscweb* because it was on bromine/vega before but not on webserver_misc_apps." 
[puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [12:47:51] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Marostegui) [12:47:53] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime [12:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:59] could someone review the change for https://phabricator.wikimedia.org/T251404 (by DannyS712), so it can be merged into master and hopefully backported in SWAT or so on? [12:49:08] (03PS2) 10Arturo Borrero Gonzalez: aptrepo: cleanup unused openstack components [puppet] - 10https://gerrit.wikimedia.org/r/593223 [12:49:24] ^ I'm available if anyone has questions about it [12:50:13] DannyS712, good :) [12:50:24] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [12:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:33] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Marostegui) [12:54:20] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Marostegui) Interestingly, after the host was reimaged and dropped from icinga...when it got created again, it did have all the notifications disabled. [12:56:38] (03PS1) 10Ottomata: Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) [12:57:22] (03PS1) 10Hnowlan: changeprop: disable purge_varnish [deployment-charts] - 10https://gerrit.wikimedia.org/r/593226 (https://phabricator.wikimedia.org/T248677) [13:00:04] liw and brennen: #bothumor I � Unicode. All rise for Mediawiki train - European+American Version deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1300). [13:00:35] (03CR) 10jerkins-bot: [V: 04-1] Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [13:02:04] (03PS1) 10Lars Wirzenius: group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593227 [13:02:05] (03CR) 10Lars Wirzenius: [C: 03+2] group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593227 (owner: 10Lars Wirzenius) [13:02:07] (03PS2) 10Ottomata: Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) [13:02:09] (03Merged) 10jenkins-bot: group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593227 (owner: 10Lars Wirzenius) [13:04:18] !log liw@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30 [13:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:23] !log liw@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 04s) [13:05:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:01] (03CR) 10Giuseppe Lavagetto: cergen: Add script for renewing mcrouter certs. 
(035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/589076 (owner: 10RLazarus) [13:06:42] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate readinglists to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/589706 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [13:06:43] group1 at wmf.30 [13:07:20] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate pageassessments to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/589695 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [13:07:33] 10Operations, 10observability, 10Graphite, 10Patch-For-Review: Duplicate definitions found in Icinga configuration - https://phabricator.wikimedia.org/T211692 (10Dzahn) 05Resolved→03Open a:05Volans→03None Just found a new new duplicate ones. Recycling the ticket if i may. [13:07:41] @liw if the fix for editingold gets +2 in the next 2 hours can you deploy it within the train window, or does it need to wait for swat? [13:08:07] 10Operations, 10observability, 10Graphite, 10Patch-For-Review: Duplicate definitions found in Icinga configuration - https://phabricator.wikimedia.org/T211692 (10Dzahn) Warning: Duplicate definition found for service 'LVS HTTPS IPv4 #page' on host 'search.svc.eqiad.wmnet' (config file '/etc/nagios/nagios_s... 
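Note on the duplicate-definition warning quoted just above (T211692): one way to spot such duplicates before Icinga complains is to count (host_name, service_description) pairs in the generated object config. The following is only a minimal sketch, assuming the standard Nagios/Icinga 1 "define service { ... }" syntax. The sample config, the check_command values and the function names are made up for illustration and are not taken from the puppet repository.

```python
import re
from collections import Counter

# Matches one "define service { ... }" block. Assumes no '}' inside a block body.
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def parse_directives(body):
    """Turn the body of one block into a dict of directive -> value."""
    directives = {}
    for line in body.splitlines():
        line = line.split(";", 1)[0].strip()  # ';' starts an inline comment
        if not line or line.startswith("#"):  # '#' is a comment only at line start
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            directives[parts[0]] = parts[1].strip()
    return directives


def find_duplicate_services(config_text):
    """Return sorted (host_name, service_description) pairs defined more than once."""
    counts = Counter()
    for match in SERVICE_BLOCK.finditer(config_text):
        d = parse_directives(match.group(1))
        if "host_name" in d and "service_description" in d:
            counts[(d["host_name"], d["service_description"])] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample; only the host and service names come from the log above.
SAMPLE = """
define service {
    host_name               search.svc.eqiad.wmnet
    service_description     LVS HTTPS IPv4 #page
    check_command           check_https
}
define service {
    host_name               search.svc.eqiad.wmnet
    service_description     LVS HTTPS IPv4 #page
    check_command           check_https
}
define service {
    host_name               search.svc.eqiad.wmnet
    service_description     LVS HTTP IPv4
    check_command           check_http
}
"""

if __name__ == "__main__":
    for host, service in find_duplicate_services(SAMPLE):
        print(f"duplicate: {service!r} on {host}")
```

A check like this only reports the symptom. The patches discussed in the channel address the cause, which is the same service being exported from more than one place.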
[13:08:26] 10Operations, 10observability: Duplicate definitions found in Icinga configuration - https://phabricator.wikimedia.org/T211692 (10Dzahn) [13:08:44] DannyS712, I would prefer SWAT [13:09:11] okay; lets see if it gets +2 first before scheduling that [13:09:25] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate generatecaptcha to periodic_job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/589688 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [13:10:17] (03PS1) 10Dzahn: prometheus/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) [13:11:46] (03PS2) 10Dzahn: prometheus/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) [13:13:38] (03PS1) 10Dzahn: LVS/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593230 (https://phabricator.wikimedia.org/T211692) [13:15:07] (03PS5) 10JMeybohm: Add debian directory and .gitreview [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) [13:15:43] https://phabricator.wikimedia.org/T251409 and https://phabricator.wikimedia.org/T251408 filed [13:15:55] (03PS2) 10Dzahn: LVS/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593230 (https://phabricator.wikimedia.org/T211692) [13:17:32] @liw I caused at least one :) [13:19:06] (03CR) 10JMeybohm: Add debian directory and .gitreview (033 comments) [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [13:19:18] (03CR) 10jerkins-bot: [V: 04-1] LVS/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593230 (https://phabricator.wikimedia.org/T211692) (owner: 10Dzahn) [13:20:30] DannyS712, I'm glad you're availablt to fix :) [13:20:46] patch sent [13:20:51] Now, 
lets see if I caused the other one [13:22:21] (03PS3) 10Dzahn: LVS/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593230 (https://phabricator.wikimedia.org/T211692) [13:23:51] nope, don't think I caused that one [13:24:18] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Dzahn) It's notable that: - now all services on db2087 DO have disabled notifications - in the pasted log above the "host_name" parameter only shows up for "idp-... [13:28:18] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Marostegui) >>! In T251407#6093444, @Dzahn wrote: > It's notable that: > > - in the screenshot pasted above only some services have disabled notifications but al... [13:28:45] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Dzahn) From my experience puppet has to run first on the monitored host and afterwards on icinga itself and likely more than once. Maybe it ran once and then the... [13:31:50] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10Marostegui) Puppet ran at least twice on the host itself and it did on Icinga as well, likes it ran 3 times on the icinga host between the first disablement and r... 
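Note on T251407 as discussed above: the open question is which services on db2087 still had notifications enabled after puppet ran. A rough way to answer that is to read the monitoring server's status file and list the services whose notifications_enabled flag is still 1. This is a hedged sketch, assuming the Nagios/Icinga 1 status.dat layout ("servicestatus { key=value ... }"). The sample data is invented, and the third service name and the second host are placeholders rather than real checks.

```python
import re

# One "servicestatus { ... }" block; the closing brace sits on its own line.
STATUS_BLOCK = re.compile(r"servicestatus\s*\{(.*?)\n\s*\}", re.DOTALL)


def services_still_notifying(status_text, host):
    """Return sorted service descriptions on `host` with notifications enabled."""
    noisy = []
    for match in STATUS_BLOCK.finditer(status_text):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        if fields.get("host_name") == host and fields.get("notifications_enabled") == "1":
            noisy.append(fields.get("service_description", ""))
    return sorted(noisy)


# Invented sample in status.dat style, for illustration only.
SAMPLE = """
servicestatus {
    host_name=db2087
    service_description=MariaDB Slave IO: s6
    notifications_enabled=1
    }
servicestatus {
    host_name=db2087
    service_description=mysqld processes
    notifications_enabled=1
    }
servicestatus {
    host_name=db2087
    service_description=example disabled check
    notifications_enabled=0
    }
servicestatus {
    host_name=other-host
    service_description=mysqld processes
    notifications_enabled=1
    }
"""

if __name__ == "__main__":
    for name in services_still_notifying(SAMPLE, "db2087"):
        print(name)
```

Comparing this list before and after each puppet run on the Icinga host would show whether the change reached only some checks, which is what the screenshot in the task suggests.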
[13:31:54] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [13:32:04] 10Operations, 10Release-Engineering-Team, 10Core Platform Team Workboards (Clinic Duty Team), 10Performance Issue, 10Wikimedia-database-error: WikiPage::updateCategoryCounts causing replication lag due to long-running writes on commonswiki - https://phabricator.wikimedia.org/T240405 (10CCicalese_WMF) Per... [13:41:12] (03PS13) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [13:44:37] (03CR) 10jerkins-bot: [V: 04-1] wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [13:47:20] (03PS1) 10Dzahn: puppetize meet-accountmanager (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) [13:50:28] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [13:50:43] (03CR) 10jerkins-bot: [V: 04-1] puppetize meet-accountmanager (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) (owner: 10Dzahn) [13:52:18] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate db_lag_stats_reporter to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/589672 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [13:56:26] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate cirrussearch to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/589680 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [13:57:38] RECOVERY - Ensure traffic_exporter binds on 
port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22712 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [13:59:23] (03CR) 10Ladsgroup: "One note: We currently have two types of account manager: A server (the uwsgi one that is exposed over port 5000, the node is meet-auth.eq" [puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) (owner: 10Dzahn) [14:00:01] (03PS14) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:02:15] (03PS15) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:04:28] (03PS16) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:07:52] (03PS6) 10Hashar: Add debian directory and .gitreview [debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [14:08:17] (03CR) 10jerkins-bot: [V: 04-1] wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [14:09:04] (03CR) 10Hashar: [C: 03+1] "I have just dropped an extra newline in the commit message." 
[debs/helm3] - 10https://gerrit.wikimedia.org/r/592967 (https://phabricator.wikimedia.org/T251305) (owner: 10JMeybohm) [14:10:59] (03CR) 10Ppchelko: [C: 03+2] changeprop: disable purge_varnish [deployment-charts] - 10https://gerrit.wikimedia.org/r/593226 (https://phabricator.wikimedia.org/T248677) (owner: 10Hnowlan) [14:11:17] (03Merged) 10jenkins-bot: changeprop: disable purge_varnish [deployment-charts] - 10https://gerrit.wikimedia.org/r/593226 (https://phabricator.wikimedia.org/T248677) (owner: 10Hnowlan) [14:11:29] (03PS1) 10Elukey: role::kafka::jumbo::broker: enable TLS for the mirror maker [puppet] - 10https://gerrit.wikimedia.org/r/593234 (https://phabricator.wikimedia.org/T250250) [14:11:31] 10Operations, 10serviceops, 10Kubernetes, 10User-fsero, 10User-jijiki: Support kubernetes Egress networkpolicies in our helm charts - https://phabricator.wikimedia.org/T249927 (10akosiaris) a:05akosiaris→03apakhomov [14:12:57] (03PS1) 10Andrew Bogott: wmcs-instancepurge: set envfile so that mwopenstackclients has permission [puppet] - 10https://gerrit.wikimedia.org/r/593235 (https://phabricator.wikimedia.org/T251152) [14:15:20] (03PS17) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:15:24] (03CR) 10Andrew Bogott: [C: 03+2] wmcs-instancepurge: set envfile so that mwopenstackclients has permission [puppet] - 10https://gerrit.wikimedia.org/r/593235 (https://phabricator.wikimedia.org/T251152) (owner: 10Andrew Bogott) [14:17:30] 10Operations: smart-data-dump fails with ValueError when trying to parse a date - https://phabricator.wikimedia.org/T251413 (10Kormat) [14:18:35] (03CR) 10jerkins-bot: [V: 04-1] wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [14:19:00] kormat: you can add tags 
on the create form :) [14:19:10] (03PS1) 10Ssingh: Update the changelog for v0.1.2 release [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/593236 [14:19:16] 10Operations: smart-data-dump fails with ValueError when trying to parse a date - https://phabricator.wikimedia.org/T251413 (10Kormat) Ah. It fails due to a locale issue. It's running `facter --version`, which produces this output for me: ` root@db2087:~# facter --version 2020-04-29 14:18:37.215447 WARN puppetl... [14:19:41] 10Operations, 10InternetArchiveBot, 10Traffic: Support TLSv1.3 - https://phabricator.wikimedia.org/T251414 (10Vgutierrez) [14:20:28] (03CR) 10Ssingh: [C: 03+2] "No code change; updated changelog for new release." [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/593236 (owner: 10Ssingh) [14:21:00] (03Abandoned) 10Niedzielski: [prod] [beta] [Vector] remove outdated config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585304 (owner: 10Niedzielski) [14:21:34] (03PS1) 10Jbond: ferm-status: refactor and add support for multiple tables [puppet] - 10https://gerrit.wikimedia.org/r/593237 [14:21:42] (03PS2) 10Seddon: Uncoupling graphoid on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592924 [14:23:20] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/22209/kafka-jumbo1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/593234 (https://phabricator.wikimedia.org/T250250) (owner: 10Elukey) [14:23:34] (03CR) 10Dzahn: "@Ladsgroup In which cloud VPS project is this running please?" [puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) (owner: 10Dzahn) [14:25:05] mutante: that'll be https://www.mediawiki.org/wiki/Wikimedia_Meet [14:26:06] (03CR) 10Majavah: "> @Ladsgroup In which cloud VPS project is this running please?" 
[puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) (owner: 10Dzahn) [14:26:11] (03PS1) 10Ssingh: Update the Debian package for the v0.1.2 release [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/593240 [14:26:44] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/593240 (owner: 10Ssingh) [14:27:42] (03CR) 10Ssingh: [C: 03+2] Update the Debian package for the v0.1.2 release [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/593240 (owner: 10Ssingh) [14:28:04] (03CR) 10Dzahn: "thanks Majavah" [puppet] - 10https://gerrit.wikimedia.org/r/593233 (https://phabricator.wikimedia.org/T251034) (owner: 10Dzahn) [14:28:23] 10Operations: git-pbuilder incorrectly copies DIST=stretch package files into results/buster-amd64 on deneb.codfw.wmnet - https://phabricator.wikimedia.org/T250803 (10JMeybohm) [14:28:39] (03CR) 10Ayounsi: [C: 03+1] homer: improve check diff email message [puppet] - 10https://gerrit.wikimedia.org/r/593208 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [14:28:50] (03PS4) 10Jbond: sre.wdqs.data-transfer: manage ferm rules required for transfer [cookbooks] - 10https://gerrit.wikimedia.org/r/589289 (https://phabricator.wikimedia.org/T206951) [14:29:04] (03CR) 10Jbond: "updated" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/589289 (https://phabricator.wikimedia.org/T206951) (owner: 10Jbond) [14:30:53] (03CR) 10Volans: "replies to comments inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/589076 (owner: 10RLazarus) [14:31:59] (03PS2) 10Mholloway: MachineVision: Update image withholding term list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591406 [14:33:11] 10Operations, 10DBA: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (10jcrespo) [14:36:02] (03CR) 10Mholloway: [C: 
03+2] MachineVision: Update image withholding term list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591406 (owner: 10Mholloway) [14:36:03] 10Operations, 10DBA, 10DC-Ops: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (10jcrespo) [14:37:02] (03Merged) 10jenkins-bot: MachineVision: Update image withholding term list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591406 (owner: 10Mholloway) [14:37:03] RhinosF1: ack, thanks [14:37:08] np [14:39:56] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update image withholding term list (duration: 01m 06s) [14:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:48] !log re-ran extension/MachineVision/maintenance/withholdImages.php on commonswiki [14:49:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:43] (03PS1) 10Kormat: Revert "install_server: Allow reimage of db2087" [puppet] - 10https://gerrit.wikimedia.org/r/593247 (https://phabricator.wikimedia.org/T250666) [14:53:12] (03PS18) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:56:14] !log upload cescout 0.1.2-1 to apt.wm.o (buster) [14:56:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:20] (03PS19) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [14:57:49] (03PS1) 10Dzahn: sslcert: add parameter to support cergen private keys [puppet] - 10https://gerrit.wikimedia.org/r/593249 [14:58:30] (03PS1) 10Ayounsi: Change blackhole term scope [homer/public] - 10https://gerrit.wikimedia.org/r/593250 
(https://phabricator.wikimedia.org/T226742) [15:00:55] (03CR) 10Hashar: [C: 03+1] "Indeed, it is a legacy domain that has not been used in ages." [dns] - 10https://gerrit.wikimedia.org/r/591340 (owner: 10Dzahn) [15:01:37] (03CR) 10Hashar: "A little less madness, that domain has not served anything for years. Thank you for the cleanup!" [puppet] - 10https://gerrit.wikimedia.org/r/591338 (owner: 10Dzahn) [15:01:45] (03CR) 10Hashar: [C: 03+1] ci: remove integration.mediawiki.org [puppet] - 10https://gerrit.wikimedia.org/r/591338 (owner: 10Dzahn) [15:03:39] (03PS2) 10Dzahn: sslcert: add parameter to support cergen private keys [puppet] - 10https://gerrit.wikimedia.org/r/593249 [15:04:53] (03PS1) 10Cwhite: smart: try to handle error messages when checking facter version [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) [15:05:38] (03CR) 10Kormat: [C: 03+2] Revert "install_server: Allow reimage of db2087" [puppet] - 10https://gerrit.wikimedia.org/r/593247 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [15:06:28] (03CR) 10Dzahn: "I watched httpd logs for a while and the only request i still see is for http://static-bugzilla.wikimedia.org/bug1.html from Icinga monito" [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [15:08:10] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [15:08:18] (03CR) 10Volans: "Thanks for converting our chat into reality! 
Looks mostly ok, few nits and a dissertation with myself inline :)" (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593237 (owner: 10Jbond) [15:08:21] there is a reindex in progress :( -^ [15:09:52] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 6.155 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [15:12:03] (03PS1) 10Dzahn: add fake private keys for ganeti in cergen location [labs/private] - 10https://gerrit.wikimedia.org/r/593252 [15:12:20] !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling db2087 in s6 and s7 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11085 and previous config saved to /var/cache/conftool/dbconfig/20200429-151219-kormat.json [15:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:28] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [15:12:38] (03CR) 10Dzahn: "doing https://gerrit.wikimedia.org/r/593252 to be able to compile this" [puppet] - 10https://gerrit.wikimedia.org/r/593249 (owner: 10Dzahn) [15:13:46] (03CR) 10Dzahn: [V: 03+2 C: 03+2] "empty files to be able to compile https://gerrit.wikimedia.org/r/c/operations/puppet/+/593249" [labs/private] - 10https://gerrit.wikimedia.org/r/593252 (owner: 10Dzahn) [15:14:28] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10Papaul) [15:17:31] (03PS1) 10Dzahn: add content into fake private keys for ganeti in cergen location [labs/private] - 10https://gerrit.wikimedia.org/r/593255 [15:17:45] (03PS1) 10Papaul: DNS: Remove mgmt asset tag for es200[1-4] [dns] - 10https://gerrit.wikimedia.org/r/593256 [15:18:59] (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt asset tag for es200[1-4] [dns] - 10https://gerrit.wikimedia.org/r/593256 (owner: 10Papaul) 
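Note on T251413 and the "smart: try to handle error messages when checking facter version" patch above: the failure came from `facter --version` printing a timestamped warning line in addition to the version, so the first token looked like a date. A tolerant parser can skip any line that does not look like a version. This is a sketch of that idea only, not the code from the actual patch. The function name, the warning text and the version number in the sample are placeholders.

```python
import re

# A version line starts with MAJOR.MINOR.PATCH followed by whitespace or end of line.
VERSION_LINE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\s|$)")


def parse_facter_version(output):
    """Return (major, minor, patch) from `facter --version` output.

    Lines that are not a version (for example timestamped warnings) are skipped.
    Raises ValueError if no version line is present.
    """
    for line in output.splitlines():
        match = VERSION_LINE.match(line.strip())
        if match:
            return tuple(int(part) for part in match.groups())
    raise ValueError("no version line found in facter output")


# Placeholder output: the warning text is NOT the real message from the task,
# and the version number is made up.
NOISY_OUTPUT = (
    "2020-04-29 14:18:37.215447 WARN placeholder warning text\n"
    "3.11.0\n"
)

if __name__ == "__main__":
    print(parse_facter_version(NOISY_OUTPUT))
```

The timestamp line is rejected because "2020-04-29" has no dots between its first three numbers, so only the real version line matches.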
[15:19:21] (03CR) 10Volans: [C: 03+2] keyholder: notify the agent on auth changes [puppet] - 10https://gerrit.wikimedia.org/r/593207 (owner: 10Volans) [15:19:25] (03PS2) 10Dzahn: add content into fake private keys for ganeti in cergen location [labs/private] - 10https://gerrit.wikimedia.org/r/593255 [15:19:33] (03CR) 10Volans: [C: 03+2] homer: improve check diff email message [puppet] - 10https://gerrit.wikimedia.org/r/593208 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [15:19:35] (03CR) 10Dzahn: [V: 03+2 C: 03+2] add content into fake private keys for ganeti in cergen location [labs/private] - 10https://gerrit.wikimedia.org/r/593255 (owner: 10Dzahn) [15:20:04] mutante: feel free to merge mines too if they are listed [15:20:13] volans: i get 2 of your changes. feel free to merge mine as well. we run into the bug when there is ... (lol) :) [15:20:23] (03CR) 10Filippo Giunchedi: [C: 04-1] smart: try to handle error messages when checking facter version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [15:20:24] ..one in puppet and one in labs/private [15:20:33] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10Papaul) [15:20:33] I have 2 in puppet [15:20:35] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10Papaul) 05Open→03Resolved complete [15:20:41] volans: merged [15:20:44] thanks! 
[15:20:51] i am not sure if mine is synced now [15:21:06] there is a bug if they are mixed users and mixed repo [15:21:17] ah, i did not submit :) all good [15:22:10] (03CR) 10Filippo Giunchedi: "Nit inline, other than that LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) (owner: 10Dzahn) [15:22:15] mutante: your lock is still there though [15:22:29] "No changes to merge. [15:22:56] aaaand I need to revert mine [15:23:26] (03CR) 10Filippo Giunchedi: [C: 03+1] LVS/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593230 (https://phabricator.wikimedia.org/T211692) (owner: 10Dzahn) [15:24:02] (03PS1) 10Volans: Revert "keyholder: notify the agent on auth changes" [puppet] - 10https://gerrit.wikimedia.org/r/593257 [15:24:21] OK: puppet-merge on puppetmaster2002.codfw.wmnet (labs) succeeded [15:24:25] should be clean [15:24:39] (03CR) 10jerkins-bot: [V: 04-1] Revert "keyholder: notify the agent on auth changes" [puppet] - 10https://gerrit.wikimedia.org/r/593257 (owner: 10Volans) [15:24:43] I can confirm it in a moment [15:25:11] (03CR) 10RLazarus: "I expect the git and openssl sections to need minor fixes after being rewritten. 
Feel free to point out the bugs if it brings you joy, but" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/589076 (owner: 10RLazarus) [15:26:07] (03PS2) 10Volans: Revert "keyholder: notify the agent on auth changes" [puppet] - 10https://gerrit.wikimedia.org/r/593257 [15:26:38] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:27:31] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/22216/" [puppet] - 10https://gerrit.wikimedia.org/r/593249 (owner: 10Dzahn) [15:27:56] (03CR) 10Volans: [C: 03+2] Revert "keyholder: notify the agent on auth changes" [puppet] - 10https://gerrit.wikimedia.org/r/593257 (owner: 10Volans) [15:28:03] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/22216/ganeti1003.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/593249 (owner: 10Dzahn) [15:28:59] (03CR) 10Dzahn: prometheus/icinga: avoid duplicate service definitions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) (owner: 10Dzahn) [15:29:14] (03PS3) 10Dzahn: prometheus/icinga: avoid duplicate service definitions [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) [15:29:40] (03PS4) 10Ema: cache: use profile::lvs::realserver [puppet] - 10https://gerrit.wikimedia.org/r/584902 [15:31:30] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22713 bytes in 0.260 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:33:20] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [15:33:38] !jouncebot next [15:33:38] a Python reminder bot for deployments. 
see https://wikitech.wikimedia.org/wiki/Tool:Jouncebot [15:33:49] jouncebot next [15:33:49] In 2 hour(s) and 26 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1800) [15:34:05] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/593228 (https://phabricator.wikimedia.org/T211692) (owner: 10Dzahn) [15:35:19] 10Operations, 10Analytics, 10Wikimedia-Logstash, 10observability, 10Performance-Team (Radar): Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#6072710, @Krinkle wrote: >>>! In T126989#5076715, @gerritbot... [15:35:56] off [15:37:02] (03PS2) 10Ppchelko: Rerender mobile-html on wikidata description changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/593066 (https://phabricator.wikimedia.org/T250209) [15:37:07] (03CR) 10Ppchelko: [C: 03+2] Rerender mobile-html on wikidata description changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/593066 (https://phabricator.wikimedia.org/T250209) (owner: 10Ppchelko) [15:37:29] (03Merged) 10jenkins-bot: Rerender mobile-html on wikidata description changes [deployment-charts] - 10https://gerrit.wikimedia.org/r/593066 (https://phabricator.wikimedia.org/T250209) (owner: 10Ppchelko) [15:43:09] (03PS2) 10Cwhite: smart: try to handle error messages when checking facter version [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) [15:43:31] (03CR) 10Cwhite: smart: try to handle error messages when checking facter version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [15:45:26] (03CR) 10Jbond: "updated" (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593237 (owner: 10Jbond) [15:45:40] (03CR) 10Cwhite: [C: 03+1] Collect kaios_app.error stream into logstash clienterror input [puppet] - 
10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [15:46:13] (03PS2) 10Jbond: ferm-status: refactor and add support for multiple tables [puppet] - 10https://gerrit.wikimedia.org/r/593237 [15:47:23] (03PS1) 10Hnowlan: changeprop: Fix bucketing for servicerunner gc metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/593262 (https://phabricator.wikimedia.org/T248677) [15:48:29] 10Operations, 10netops, 10Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (10Papaul) - Documentation in place - Add action-shutdown [15:54:58] (03CR) 10Filippo Giunchedi: "Minor thing inline, LGTM otherwise" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [15:55:22] (03PS1) 10Nray: Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) [15:58:12] !log joal@deploy1001 Started deploy [analytics/aqs/deploy@c87c8e2]: Analytics regular weekly deploy [15:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:27] (03PS3) 10Cwhite: smart: try to handle error messages when checking facter version [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) [15:59:59] fdans: Thanks a milion for the helpful test-url in dpeloying AQS :) [16:00:02] (03CR) 10Herron: [C: 03+1] Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [16:00:03] (03CR) 10Cwhite: smart: try to handle error messages when checking facter version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [16:03:16] (03CR) 10Alexandros Kosiaris: [C: 03+1] "/me likes. 
Nice!" [puppet] - 10https://gerrit.wikimedia.org/r/593249 (owner: 10Dzahn) [16:04:58] (03PS3) 10DannyS712: More cleanup of InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/582879 (https://phabricator.wikimedia.org/T231178) [16:05:11] !log joal@deploy1001 Finished deploy [analytics/aqs/deploy@c87c8e2]: Analytics regular weekly deploy (duration: 06m 59s) [16:05:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:32] !log joal@deploy1001 Started deploy [analytics/refinery@6460d05]: Regular analytics weekly train [6460d05] [16:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:38] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10Sustainability (MediaWiki-MultiDC): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10BBlack) >>! In T133821#6092865, @Joe wrote: > - Define a schema for a "url purge message". If I can throw in another $0.0... [16:12:07] (03PS4) 10DannyS712: More cleanup of InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/582879 (https://phabricator.wikimedia.org/T231178) [16:12:28] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10Sustainability (MediaWiki-MultiDC): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Pchelolo) I volunteer RESTBase stack to be a guinea pig for this. Currently we already are processing all the purges via k... [16:13:40] (03CR) 10Ppchelko: [C: 03+2] changeprop: Fix bucketing for servicerunner gc metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/593262 (https://phabricator.wikimedia.org/T248677) (owner: 10Hnowlan) [16:16:14] (03PS2) 10Hnowlan: changeprop: Fix bucketing for servicerunner gc metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/593262 (https://phabricator.wikimedia.org/T248677) [16:16:27] (03PS5) 10RLazarus: cergen: Add script for renewing mcrouter certs. 
[puppet] - 10https://gerrit.wikimedia.org/r/589076 [16:19:19] liw: OK for me to deploy a back-port for T251408 now? [16:19:19] T251408: Call to undefined method MediaWiki\Extension\MachineVision\Hooks::onRollbackComplete() - https://phabricator.wikimedia.org/T251408 [16:19:48] (Code is only in group1, so impact is immediate.) [16:20:09] James_F, yes [16:20:33] Thanks. [16:20:38] I [16:20:48] I'll do T251404, whilst I'm here. [16:20:48] T251404: `editingold` when creating a new page - https://phabricator.wikimedia.org/T251404 [16:20:52] @James_F should I remove it from the swat deployments page then? [16:21:03] -Should I remove them both? [16:21:04] DannyS712: I'll do that once it's deployed. [16:22:35] (03PS3) 10Ottomata: Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) [16:22:36] (03CR) 10Hnowlan: [V: 03+2 C: 03+2] changeprop: Fix bucketing for servicerunner gc metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/593262 (https://phabricator.wikimedia.org/T248677) (owner: 10Hnowlan) [16:23:08] (03Merged) 10jenkins-bot: changeprop: Fix bucketing for servicerunner gc metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/593262 (https://phabricator.wikimedia.org/T248677) (owner: 10Hnowlan) [16:23:27] 10Operations, 10netops, 10Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (10Papaul) @ayounsi for the restore process when the interface is shutdown do you want for us to setup a recovery timeout or manually restore the interface? if we are going to use the... 
[16:24:12] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler1003/22218/logstash1007.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [16:24:21] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/593225 (https://phabricator.wikimedia.org/T248615) (owner: 10Ottomata) [16:35:14] (03PS20) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [16:39:53] (03CR) 10RLazarus: "Thank you for adding these!" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [16:41:07] (03PS21) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [16:45:07] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . 
[16:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:49] (03CR) 10Filippo Giunchedi: [C: 03+1] smart: try to handle error messages when checking facter version [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [16:46:34] (03PS22) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [16:50:23] (03PS23) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [16:51:15] (03PS4) 10Ottomata: refine.pp - Slight refactor to use new unified refine tranform functions [puppet] - 10https://gerrit.wikimedia.org/r/592756 (https://phabricator.wikimedia.org/T238230) [16:53:46] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/includes/EditPage.php: EditPage::showHeader - only warn editing an old revision if it exists T251404 (duration: 01m 06s) [16:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:53] T251404: `editingold` when creating a new page - https://phabricator.wikimedia.org/T251404 [16:54:15] !log jforrester@deploy1001 sync-file aborted: Fix hook handling for hook T251408 (duration: 00m 02s) [16:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:22] T251408: Call to undefined method MediaWiki\Extension\MachineVision\Hooks::onRollbackComplete() - https://phabricator.wikimedia.org/T251408 [16:55:29] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/extensions/MachineVision/src/Hooks.php: Fix hook handling for hook T251408 (duration: 01m 05s) [16:55:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:02] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP 
requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [17:03:16] Prod clear BTW, sorry for not mentioning. [17:04:18] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [17:04:40] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22720 bytes in 0.270 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [17:05:58] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.008 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [17:13:14] PROBLEM - Check systemd state on an-presto1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:13:22] PROBLEM - Check systemd state on an-presto1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:13:39] Presto is being handled --^ [17:15:02] RECOVERY - Check systemd state on an-presto1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:15:10] RECOVERY - Check systemd state on an-presto1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:18:06] 10Operations, 10Android-app-Bugs, 10Page Content Service, 10Product-Infrastructure-Team-Backlog, and 4 others: Incorrect language variant returned for PCS endpoints - https://phabricator.wikimedia.org/T249284 (10Pchelolo) You've just challenged my core belief - I was 100% sure Varnish layer was not passing... [17:19:09] 10Operations, 10Parsoid, 10RESTBase, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (10Pchelolo) #traffic This seems like a borderline... [17:24:41] !log joal@deploy1001 Finished deploy [analytics/refinery@6460d05]: Regular analytics weekly train [6460d05] (duration: 77m 08s) [17:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:26:15] (03CR) 10Cwhite: [C: 03+2] smart: try to handle error messages when checking facter version [puppet] - 10https://gerrit.wikimedia.org/r/593251 (https://phabricator.wikimedia.org/T251413) (owner: 10Cwhite) [17:29:10] 10Operations, 10Parsoid, 10RESTBase, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (10BBlack) Yeah, the linked Lua code is, I think, t... 
[17:31:28] PROBLEM - Check systemd state on an-presto1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:33:18] RECOVERY - Check systemd state on an-presto1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:35:07] 10Operations, 10Patch-For-Review: smart-data-dump fails with ValueError when trying to parse a date - https://phabricator.wikimedia.org/T251413 (10colewhite) we added the ability to ignore stderr during the facter version check. this should mitigate situations like pre-locale configuration from affecting the... [17:35:17] 10Operations, 10Patch-For-Review: smart-data-dump fails with ValueError when trying to parse a date - https://phabricator.wikimedia.org/T251413 (10colewhite) 05Open→03Resolved p:05Triage→03Medium [17:36:46] (03PS1) 10Zoranzoki21: Add awa lang [dns] - 10https://gerrit.wikimedia.org/r/593280 (https://phabricator.wikimedia.org/T251371) [17:41:28] (03PS1) 10Elukey: role::presto::worker: adjust -Xmx size [puppet] - 10https://gerrit.wikimedia.org/r/593281 [17:41:32] (03PS2) 10Zoranzoki21: Add Awadhi (awa) lang [dns] - 10https://gerrit.wikimedia.org/r/593280 (https://phabricator.wikimedia.org/T251371) [17:42:45] (03CR) 10Elukey: [C: 03+2] role::presto::worker: adjust -Xmx size [puppet] - 10https://gerrit.wikimedia.org/r/593281 (owner: 10Elukey) [17:44:13] (03PS4) 10RhinosF1: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) [17:44:36] !log elukey@cumin1001 START - Cookbook sre.presto.roll-restart-workers [17:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:10] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Search%23Administration [17:49:52] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 3.768 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [17:54:47] !log elukey@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) [17:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:50] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:56:49] !log joal@deploy1001 Started deploy [analytics/refinery@6460d05] (thin): Regular analytics weekly train THIN [6460d05] [17:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:57] !log joal@deploy1001 Finished deploy [analytics/refinery@6460d05] (thin): Regular analytics weekly train THIN [6460d05] (duration: 00m 08s) [17:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:36] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Morning SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1800). [18:00:05] DannyS712: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:02:13] I can SWAT today! 
[18:03:16] DannyS712: your patch is WIP [18:05:04] (03CR) 10Urbanecm: [C: 04-1] "This change is ready for review." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/582879 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712) [18:07:50] (03PS24) 10Andrew Bogott: wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) [18:09:59] (03CR) 10DannyS712: More cleanup of InitialiseSettings.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/582879 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712) [18:11:44] (03CR) 10Andrew Bogott: [C: 03+2] wmcs pdns: stop using primary/secondary language for resolvers and recursors [puppet] - 10https://gerrit.wikimedia.org/r/593035 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [18:18:10] (03PS1) 10Urbanecm: Assign oathauth-verify-user to stewards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593286 (https://phabricator.wikimedia.org/T251447) [18:18:32] (03CR) 10RhinosF1: [C: 03+1] Assign oathauth-verify-user to stewards [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593286 (https://phabricator.wikimedia.org/T251447) (owner: 10Urbanecm) [18:23:15] (03PS1) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [18:24:40] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [18:26:28] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 5.526 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [18:26:31] (03CR) 10jerkins-bot: [V: 04-1] Refactor and DRY camus module templates [puppet] - 
10https://gerrit.wikimedia.org/r/593288 (owner: 10Ottomata) [18:30:34] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:32:00] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [18:32:24] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:33:42] (03CR) 10Urbanecm: [C: 04-2] "not to be deployed without an okay" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593286 (https://phabricator.wikimedia.org/T251447) (owner: 10Urbanecm) [18:33:42] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [18:36:24] (03PS2) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [18:39:32] (03CR) 10jerkins-bot: [V: 04-1] Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 (owner: 10Ottomata) [18:41:36] (03PS3) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [18:44:45] (03CR) 10jerkins-bot: [V: 04-1] Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 (owner: 10Ottomata) [18:48:18] (03PS4) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [18:50:03] (03PS2) 10Andrew Bogott: Openstack packaging: set the stage for Rocky on Buster [puppet] 
- 10https://gerrit.wikimedia.org/r/592997 (https://phabricator.wikimedia.org/T251294) [18:50:05] (03PS1) 10Andrew Bogott: Rename labtestservices2003 to cloudservices2003-dev [puppet] - 10https://gerrit.wikimedia.org/r/593292 [18:51:44] (03CR) 10jerkins-bot: [V: 04-1] Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 (owner: 10Ottomata) [18:52:27] (03PS1) 10Andrew Bogott: rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 [18:52:57] (03CR) 10jerkins-bot: [V: 04-1] rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 (owner: 10Andrew Bogott) [18:53:03] (03CR) 10jerkins-bot: [V: 04-1] Openstack packaging: set the stage for Rocky on Buster [puppet] - 10https://gerrit.wikimedia.org/r/592997 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [18:55:26] (03PS3) 10Andrew Bogott: Openstack packaging: set the stage for Rocky on Buster [puppet] - 10https://gerrit.wikimedia.org/r/592997 (https://phabricator.wikimedia.org/T251294) [18:55:28] (03PS2) 10Andrew Bogott: Rename labtestservices2003 to cloudservices2003-dev [puppet] - 10https://gerrit.wikimedia.org/r/593292 [18:55:35] 10Operations, 10Traffic: ats-tls ran out of FDs on cp1089 - https://phabricator.wikimedia.org/T248736 (10Ottomata) Is there a more permanent fix? Any idea why ATS was leaking the socket FDs? 
[18:56:53] (03PS5) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [18:59:32] (03PS2) 10Andrew Bogott: rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 [18:59:37] (03CR) 10Andrew Bogott: [C: 03+2] Openstack packaging: set the stage for Rocky on Buster [puppet] - 10https://gerrit.wikimedia.org/r/592997 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [18:59:51] (03CR) 10Andrew Bogott: [C: 03+2] Rename labtestservices2003 to cloudservices2003-dev [puppet] - 10https://gerrit.wikimedia.org/r/593292 (owner: 10Andrew Bogott) [18:59:57] (03CR) 10jerkins-bot: [V: 04-1] rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 (owner: 10Andrew Bogott) [19:00:04] liw and brennen: My dear minions, it's time we take the moon! Just kidding. Time for Mediawiki train - European+American Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T1900). 
[19:00:50] (03CR) 10Jhedden: [C: 03+2] cloudvps: enable monitoring for projects using shinken [puppet] - 10https://gerrit.wikimedia.org/r/593054 (https://phabricator.wikimedia.org/T250206) (owner: 10Jhedden) [19:01:34] (03PS3) 10Andrew Bogott: rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 [19:02:00] (03CR) 10jerkins-bot: [V: 04-1] rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 (owner: 10Andrew Bogott) [19:02:57] !log depooling and stopping the updater on wdqs2008 for some query tests (wdqs-internal) [19:03:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:06] (03PS11) 10Cwhite: smart: add tests for _parse_smart_info and _parse_smart_attributes [puppet] - 10https://gerrit.wikimedia.org/r/587877 (https://phabricator.wikimedia.org/T199236) [19:11:11] (03CR) 10BryanDavis: [C: 03+1] "Perfect, no. Hella better than before, yes!" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond) [19:17:11] (03CR) 10Jdlrobson: [C: 03+1] Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [19:27:16] (03CR) 10Niedzielski: [C: 03+1] Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [19:29:03] (03CR) 10Urbanecm: [C: 03+1] Add Awadhi (awa) lang [dns] - 10https://gerrit.wikimedia.org/r/593280 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [19:33:35] (03PS4) 10Andrew Bogott: rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 [19:34:10] !log repool wdqs2008 [19:34:15] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:34:56] (03CR) 10Andrew Bogott: [C: 03+2] rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 (owner: 10Andrew Bogott) [19:35:01] (03PS5) 10Andrew Bogott: rename labtestservices2003 to cloudservices2003-dev [dns] - 10https://gerrit.wikimedia.org/r/593294 [19:42:32] (03PS2) 10Jforrester: Update Phabricator task for jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593177 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [19:42:50] I'll sling out a comment-only patch. [19:42:54] (03CR) 10Jforrester: [C: 03+2] Update Phabricator task for jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593177 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [19:43:37] (03CR) 10Jforrester: "These are very blurry; can we re-render them more nicely?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [19:43:59] (03Merged) 10jenkins-bot: Update Phabricator task for jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593177 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [19:46:03] (03CR) 10Cwhite: [C: 03+2] smart: add tests for _parse_smart_info and _parse_smart_attributes [puppet] - 10https://gerrit.wikimedia.org/r/587877 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [19:46:07] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: (doc-only) Fix Phabricator task reference for jvwiki logo (duration: 01m 05s) [19:46:11] (03CR) 10QEDK: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [19:46:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:38] 10Operations, 10Parsoid, 10RESTBase, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): HTTP 400 Error when trying to save an 
edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (10matmarex) >>! In T250815#6091098, @Pchelolo wrot... [19:51:35] (03PS6) 10Ottomata: Refactor and DRY camus module templates [puppet] - 10https://gerrit.wikimedia.org/r/593288 [19:51:49] (03PS3) 10QEDK: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) [19:54:23] 10Operations, 10Parsoid, 10RESTBase, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (10Pchelolo) [19:56:22] (03CR) 10Ottomata: "I ran a property-diff script on the output from https://puppet-compiler.wmflabs.org/compiler1001/22230/ and compared it with the property" [puppet] - 10https://gerrit.wikimedia.org/r/593288 (owner: 10Ottomata) [19:56:42] (03PS4) 10QEDK: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) [19:59:54] (03PS5) 10QEDK: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) [20:00:04] halfak and accraze: #bothumor My software never has bugs. It just develops random features. Rise for Services – Graphoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T2000). 
[20:04:33] (03PS6) 10QEDK: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) [20:05:07] (03CR) 10QEDK: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [20:07:17] 10Operations, 10Android-app-Bugs, 10Page Content Service, 10Product-Infrastructure-Team-Backlog, and 4 others: Incorrect language variant returned for PCS endpoints - https://phabricator.wikimedia.org/T249284 (10bearND) > I am inclined to say, that maybe unsetting it in RESTBase actually makes more sense -... [20:09:12] !log andrew@cumin1001 START - Cookbook sre.hosts.downtime [20:09:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:36] !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [20:11:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:47] (03CR) 10VolkerE: [C: 03+1] Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [20:22:54] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:24:46] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:32:00] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:35:42] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:37:10] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration [20:38:52] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration [20:42:35] (03PS1) 10Andrew Bogott: Add hiera host def for cloudservices2003-dev [puppet] - 10https://gerrit.wikimedia.org/r/593310 [20:44:46] (03CR) 10Andrew Bogott: [C: 03+2] Add hiera host def for cloudservices2003-dev [puppet] - 10https://gerrit.wikimedia.org/r/593310 (owner: 10Andrew Bogott) [20:48:22] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10colewhite) [20:48:45] (03CR) 10Jforrester: [C: 03+1] "> Patch Set 6:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [20:54:37] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10colewhite) [20:54:43] (03PS1) 10Andrew Bogott: Add entries for 
cloudservices2003-dev/codfw1dev-recursor1/codfw1dev-ns1 [dns] - 10https://gerrit.wikimedia.org/r/593313 [20:55:07] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10colewhite) [20:56:45] 10Operations, 10observability, 10Patch-For-Review: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 (10colewhite) per T224564, wezen is no longer around and upgrading mtail should resolve the issue. [20:58:35] (03PS1) 10Cwhite: aptrepo: add mtail component for controlled mtail upgrade [puppet] - 10https://gerrit.wikimedia.org/r/593314 (https://phabricator.wikimedia.org/T251466) [21:01:59] (03PS1) 10Volans: homer: fix email formatting [puppet] - 10https://gerrit.wikimedia.org/r/593317 (https://phabricator.wikimedia.org/T249224) [21:02:22] (03Abandoned) 10Addshore: Add basic Dockerfile to run docker-pkg [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/404084 (owner: 10Addshore) [21:03:18] (03PS2) 10Andrew Bogott: Add entries for cloudservices2003-dev/codfw1dev-recursor1/codfw1dev-ns1 [dns] - 10https://gerrit.wikimedia.org/r/593313 [21:04:14] (03Abandoned) 10Addshore: Do not set useEntitySourceBasedFederation for wikibase client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499768 (owner: 10Addshore) [21:07:32] (03CR) 10Andrew Bogott: [C: 03+2] Add entries for cloudservices2003-dev/codfw1dev-recursor1/codfw1dev-ns1 [dns] - 10https://gerrit.wikimedia.org/r/593313 (owner: 10Andrew Bogott) [21:09:33] (03PS1) 10Andrew Bogott: codfw1dev designate: add ns1 to our list of servers [puppet] - 10https://gerrit.wikimedia.org/r/593320 [21:10:34] (03CR) 10Andrew Bogott: [C: 03+2] codfw1dev designate: add ns1 to our list of servers [puppet] - 10https://gerrit.wikimedia.org/r/593320 (owner: 10Andrew Bogott) [21:14:27] Deploying two UBNs. 
[21:16:47] (03PS1) 10Volans: homer: send email only on successful diff [puppet] - 10https://gerrit.wikimedia.org/r/593321 (https://phabricator.wikimedia.org/T249224) [21:17:01] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/extensions/Quiz/includes/Quiz.php: Don't crash if quiz attempts to include a bad title T251409 (duration: 01m 06s) [21:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:17:12] T251409: Argument 1 passed to Parser::fetchTemplateAndTitle() must be an instance of Title, null given, called in /srv/mediawiki/php-1.35.0-wmf.30/extensions/Quiz/includes/Quiz.php on line 258 - https://phabricator.wikimedia.org/T251409 [21:18:39] PROBLEM - Host 208.80.153.83 is DOWN: PING CRITICAL - Packet loss = 100% [21:19:00] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/extensions/GlobalBlocking/includes/api/ApiQueryGlobalBlocks.php: T251430 Unconditionally select gb_timestamp (duration: 01m 05s) [21:19:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:08] T251430: API / GlobalBlocks: PHP Notice: Undefined property: stdClass::$gb_timestamp - https://phabricator.wikimedia.org/T251430 [21:19:14] thanks on that one James_F, cscott. [21:19:52] Happy to get crap fixed. 
:-) [21:22:18] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/GlobalBlocking/includes/api/ApiQueryGlobalBlocks.php: T251430 Unconditionally select gb_timestamp (duration: 01m 06s) [21:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:15] (03CR) 10Volans: [C: 03+2] homer: fix email formatting [puppet] - 10https://gerrit.wikimedia.org/r/593317 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [21:23:33] (03CR) 10Volans: [C: 03+2] homer: send email only on successful diff [puppet] - 10https://gerrit.wikimedia.org/r/593321 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [21:23:59] (03PS2) 10Volans: homer: send email only on successful diff [puppet] - 10https://gerrit.wikimedia.org/r/593321 (https://phabricator.wikimedia.org/T249224) [21:24:42] (03PS1) 10Cwhite: mtail: add flag to install mtail apt component [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) [21:34:03] !log volker-e@deploy1001 Started deploy [design/style-guide@c4956c3]: Deploy design/style-guide: [21:34:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:11] !log volker-e@deploy1001 Finished deploy [design/style-guide@c4956c3]: Deploy design/style-guide: (duration: 00m 08s) [21:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:37] (03PS5) 10RhinosF1: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) [21:44:11] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={cloud_dev_pdns,cloud_dev_pdns_rec,swagger_check_cxserver_cluster_eqiad} site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [21:47:25] * RhinosF1 here until after SWAT [21:54:55] PROBLEM - Ensure traffic_exporter binds 
on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:59:57] !log upgrading RAID firmware on labsdb1011 T249188 [22:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:00:04] T249188: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 [22:02:49] PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [22:05:51] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22734 bytes in 0.278 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:11:35] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:15:39] RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [22:22:31] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22727 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:41:32] (03CR) 10Bstorm: "First thing I'm noticing in QA is that this seems to exit out quicker when running a `webservice stop`. 
The pods are still terminating wh" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis) [22:41:43] (03CR) 10Bstorm: [C: 04-1] Replace pykube with a custom API client [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis) [22:44:16] (03CR) 10Bstorm: [C: 04-1] "The problem appears to be content-type headers https://github.com/kubernetes/kubernetes/issues/22904" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis) [22:50:55] (03CR) 10Bstorm: [C: 04-1] "According to the replies here https://github.com/kubernetes-client/javascript/issues/19, the OpenAPI spec should have the correct content " [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis) [22:52:03] (03PS1) 10Jhedden: cloudvps: metricsinfra alert on puppet agent disabled state [puppet] - 10https://gerrit.wikimedia.org/r/593342 (https://phabricator.wikimedia.org/T250206) [22:55:24] (03CR) 10Jhedden: [C: 03+2] cloudvps: metricsinfra alert on puppet agent disabled state [puppet] - 10https://gerrit.wikimedia.org/r/593342 (https://phabricator.wikimedia.org/T250206) (owner: 10Jhedden) [22:55:52] (03PS2) 10Nray: Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) [23:00:05] RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200429T2300). [23:00:05] nray and RhinosF1: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:31] o/ here and ready to roll! [23:00:52] hi [23:01:26] I'll do the SWAT [23:01:50] ty [23:02:38] (03CR) 10Catrope: [C: 03+2] Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) (owner: 10RhinosF1) [23:03:41] (03Merged) 10jenkins-bot: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) (owner: 10RhinosF1) [23:06:25] RhinosF1: Your patch is on mwdebug1002, please test [23:08:59] RoanKattouw: please run updateArticleCount.php :) [23:09:06] Oh right lol [23:09:07] One secodn [23:13:20] (03CR) 10Catrope: [C: 03+2] Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [23:15:42] (03PS1) 10Krinkle: contint: Add redirect from stale doc dirs to current ones [puppet] - 10https://gerrit.wikimedia.org/r/593344 [23:20:52] RoanKattouw: ping me please when the script has ran (don't forget --update) [23:21:11] It's still syncing sorry [23:21:18] Or rather I had a typo in the sync command, so it's now syncing [23:21:20] np [23:22:13] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set $wgArticleCount to any on trwikisource (duration: 01m 06s) [23:22:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:06] (03PS1) 10Jforrester: Drop Sentry, Part I: Stop loading it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593347 (https://phabricator.wikimedia.org/T91649) [23:23:08] (03PS1) 10Jforrester: Drop Sentry, Part II: Stop configuring it for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593348 (https://phabricator.wikimedia.org/T91649) [23:23:10] (03PS1) 10Jforrester: Drop Sentry, 
Part III: Drop from i18n build step [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593349 (https://phabricator.wikimedia.org/T91649) [23:23:19] RoanKattouw: Ping when you're done so I can cleanse prod of Sentry? ;-) [23:23:29] RhinosF1: Update script has run [23:24:21] RECOVERY - snapshot of s8 in eqiad on db1115 is OK: Last snapshot for s8 at eqiad (db1116.eqiad.wmnet:3318) taken on 2020-04-29 20:37:30 (1034 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [23:25:15] RoanKattouw: still seeing the previous value 4443 on https://tr.wikisource.org/wiki/%C3%96zel:%C4%B0statistikler - task reports we expect >6000 [23:25:20] (03PS3) 10Catrope: Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [23:25:29] Counting articles...found 7508. [23:25:31] Is what it told me [23:25:43] RoanKattouw: did you do --update ? [23:25:45] (03CR) 10Catrope: Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [23:25:48] (03CR) 10Catrope: [C: 03+2] Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [23:25:55] Oh whoops [23:25:55] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (10Krinkle) >>! In T243056#6091211, @MaxSem wrote: > Mhm, maybe put it in a subdomain of mediawiki.org? That's not un... [23:25:57] Now I did [23:26:15] RoanKattouw: perfect! 
don't forget to !log it [23:26:30] !log Ran updateArticleCount.php on trwikisource [23:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:51] (03Merged) 10jenkins-bot: Set Growth Screener Survey sample rate to 0.1% and limit to anon audience [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593265 (https://phabricator.wikimedia.org/T248421) (owner: 10Nray) [23:27:14] nray, James_F: clear from my end now! I'm off to sleep. [23:27:17] RoanKattouw: ty! [23:27:44] Cool. [23:28:38] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (10Jdforrester-WMF) Why not just put it in Labs where other toy projects go? [23:29:24] RoanKattouw: Clear? [23:29:33] No one more config patch [23:29:41] Kk, standing back. [23:31:16] nray: Your patch is on mwdebug1002, please test [23:31:26] k, will check now. Thank you Roan [23:33:51] (03PS1) 10Jforrester: Drop CodeReview, Part I: Stop loading it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593350 (https://phabricator.wikimedia.org/T116948) [23:33:53] (03PS1) 10Jforrester: Drop CodeReview, Part II: Stop configuring it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593351 (https://phabricator.wikimedia.org/T116948) [23:33:55] (03PS1) 10Jforrester: Drop CodeReview, Part III: Drop from i18n build step [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593352 (https://phabricator.wikimedia.org/T116948) [23:34:31] (03CR) 10Jforrester: [C: 04-2] "Not yet, sadly." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593350 (https://phabricator.wikimedia.org/T116948) (owner: 10Jforrester) [23:36:03] RoanKattouw: things looks good from my side. 
You have my +2 [23:37:40] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set Growth Screener survey sample rate to 0.1% and limit to anons only (T248421) (duration: 01m 05s) [23:37:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:47] T248421: Deploy Quicksurveys extension on all Wikipedias (for a Growth study) - https://phabricator.wikimedia.org/T248421 [23:42:04] James_F: All done [23:43:50] thanks Roan! I'm signing off [23:49:22] (03PS2) 10Jforrester: Drop Sentry, Part I: Stop loading it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593347 (https://phabricator.wikimedia.org/T91649) [23:49:27] (03CR) 10Jforrester: [C: 03+2] Drop Sentry, Part I: Stop loading it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593347 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:50:15] (03Merged) 10jenkins-bot: Drop Sentry, Part I: Stop loading it anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593347 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:50:53] (03PS2) 10Jforrester: Drop Sentry, Part II: Stop configuring it for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593348 (https://phabricator.wikimedia.org/T91649) [23:51:43] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T91649 Drop Sentry, Part I: Stop loading it anywhere (duration: 01m 05s) [23:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:50] T91649: Deploy Sentry (JavaScript error logging) to production, configured to log only a limited subset of users/pages - https://phabricator.wikimedia.org/T91649 [23:52:03] (03CR) 10Jforrester: [C: 03+2] Drop Sentry, Part II: Stop configuring it for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593348 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:52:09] (03PS2) 10Jforrester: Drop Sentry, Part III: Drop from 
i18n build step [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593349 (https://phabricator.wikimedia.org/T91649) [23:53:01] (03Merged) 10jenkins-bot: Drop Sentry, Part II: Stop configuring it for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593348 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:53:07] (03CR) 10Krinkle: [C: 04-1] noc.wikimedia.org: highlight.php should not append .txt to dblist URLs (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: 10Urbanecm) [23:54:02] * Krinkle updates https://wikitech.wikimedia.org/wiki/Technical_debt/Extensions [23:55:46] If you insist. ;-) [23:55:54] (03CR) 10Jforrester: [C: 03+2] Drop Sentry, Part III: Drop from i18n build step [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593349 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:55:58] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T91649 Drop Sentry, Part II: Stop configuring it for production or Beta Cluster (duration: 01m 05s) [23:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:47] (03Merged) 10jenkins-bot: Drop Sentry, Part III: Drop from i18n build step [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593349 (https://phabricator.wikimedia.org/T91649) (owner: 10Jforrester) [23:57:02] 10Operations, 10Commons, 10MediaWiki-File-management, 10Thumbor, 10Traffic: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly - https://phabricator.wikimedia.org/T236240 (10AntiCompositeNumber) [23:57:03] Production is clear. 
[23:57:04] 10Operations, 10Commons, 10Thumbor: Thumbnailing page 2 of c:File:Mimořádné opatření - zákaz vývozu desinfekce rukou.pdf generates a non-fatal Ghostscript error that is piped to imagemagick - https://phabricator.wikimedia.org/T247473 (10AntiCompositeNumber) [23:58:26] Krinkle: Can we maintain that list on a proper wiki with VE and so on instead of a scrap heap?