[00:00:37] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:03:29] (CR) Dzahn: "This is somewhat a special case as it's using hiera_array and as far as I can tell it should be replaced with lookup() and specifying the " [puppet] - https://gerrit.wikimedia.org/r/630694 (owner: Dzahn)
[00:06:11] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:07:44] Operations, Puppet: Stop introducing new code expanded from erb templates - https://phabricator.wikimedia.org/T200984 (Dzahn) This seems somewhat but not exactly a duplicate of T254480. The other ticket would be resolved once we have removed all cases of it from the repo. Then this ticket would be sti...
[00:07:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:12:00] Operations, Puppet: Stop introducing new code expanded from erb templates - https://phabricator.wikimedia.org/T200984 (Dzahn) also see: T148494 and T245266 for open tasks about adding lint checks for shell scripts.
[00:12:54] Operations, MediaWiki-Documentation, User-Dereckson, patch-welcome: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950 (Dereckson) Actually, you can use the compiler of redirect.dat as a software to output to stdout the result: `ruby module...
[00:13:01] Operations, MediaWiki-Documentation, User-Dereckson, patch-welcome: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950 (Dereckson) a:Dereckson
[00:14:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:16:23] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:17:48] (PS1) Dereckson: Redirect svn.wikimedia.org/doc properly [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109550)
[00:22:28] (CR) Dereckson: "Hi Valentin, I see you were the last to refactor the redirects compiler, could you check the problem we try to solve and the solution offe" [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109550) (owner: Dereckson)
[00:23:26] (PS1) Dzahn: cdh/hiveserver2: add shebang, fix bashisms [puppet] - https://gerrit.wikimedia.org/r/631889 (https://phabricator.wikimedia.org/T95064)
[00:24:40] (PS2) Dereckson: Redirect svn.wikimedia.org/doc properly [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109950)
[00:29:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:31:21] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:31:30] (PS3) Dereckson: Redirect svn.wikimedia.org/doc properly [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109950)
[00:31:45] (PS1) Dzahn: turn a couple scripts without bashisms into sh scripts [puppet] - https://gerrit.wikimedia.org/r/631890 (https://phabricator.wikimedia.org/T95064)
[00:32:21] (PS4) Dereckson: Redirect svn.wikimedia.org/doc properly [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109950)
[00:33:00] (sorry for the flood, chmod and typo issue)
[00:35:54] (PS1) Dzahn: opernstack: turn bash scripts without bashisms into sh scripts [puppet] - https://gerrit.wikimedia.org/r/631891 (https://phabricator.wikimedia.org/T95064)
[00:36:22] dereckson: flooding is alright in this channel :) as long as you don't get kicked like wikibugs sometimes
[00:36:47] also, haven't seen you online in a while i think. cheers
[00:38:50] hi :)
[00:40:45] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:41:21] (PS1) Dzahn: dumps/homer/trafficserver: turn bash scripts into sh scripts [puppet] - https://gerrit.wikimedia.org/r/631892 (https://phabricator.wikimedia.org/T95064)
[00:45:55] (PS1) Dzahn: admins/bd808: add bash shebang to .bash scripts [puppet] - https://gerrit.wikimedia.org/r/631893 (https://phabricator.wikimedia.org/T95064)
[00:47:37] (PS2) Dzahn: admins/bd808: add bash shebang to .bash scripts [puppet] - https://gerrit.wikimedia.org/r/631893 (https://phabricator.wikimedia.org/T95064)
[00:55:28] (PS1) Dzahn: admins/ori: add bash shebang to .z.sh [puppet] - https://gerrit.wikimedia.org/r/631895 (https://phabricator.wikimedia.org/T95064)
[00:59:25] (PS1) Razzi: Archive Maxmind database files to hadoop only [puppet] - https://gerrit.wikimedia.org/r/631896 (https://phabricator.wikimedia.org/T264152)
[00:59:48] (PS1) Dzahn: admins/rush: add shebangs to shell scripts [puppet] - https://gerrit.wikimedia.org/r/631897 (https://phabricator.wikimedia.org/T95064)
[01:00:37] (CR) Dzahn: "sorry for adding both users, I can't remember for the life of me which is the right one:)" [puppet] - https://gerrit.wikimedia.org/r/631897 (https://phabricator.wikimedia.org/T95064) (owner: Dzahn)
[01:02:06] (CR) Razzi: "This is very much a work-in-progress.
An additional change that is requested in the task is moving this from the current node, stat1007, t" [puppet] - https://gerrit.wikimedia.org/r/631896 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[01:15:22] (PS2) Dzahn: phabricator: don't create chk_phuser shell script from erb (WIP) [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480)
[01:16:12] (PS3) Dzahn: phabricator: don't create chk_phuser shell script from erb [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480)
[01:17:12] (CR) jerkins-bot: [V: -1] phabricator: don't create chk_phuser shell script from erb [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480) (owner: Dzahn)
[01:27:17] (PS4) Dzahn: phabricator: don't create chk_phuser shell script from erb [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480)
[01:28:17] (CR) jerkins-bot: [V: -1] phabricator: don't create chk_phuser shell script from erb [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480) (owner: Dzahn)
[01:30:52] (PS5) Dzahn: phabricator: don't create chk_phuser shell script from erb [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480)
[01:41:55] PROBLEM - Check systemd state on ms-be2055 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:41:55] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/25652/phab1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/630257 (https://phabricator.wikimedia.org/T254480) (owner: Dzahn)
[01:51:52] (PS1) Dzahn: recommendation_api: hiera->lookup, data types [puppet] - https://gerrit.wikimedia.org/r/631898
[01:58:48] (PS1) Dzahn: bird/piwki/elasticsearch: replace hiera with lookup, data types [puppet] - https://gerrit.wikimedia.org/r/631899
[01:59:47] (CR) jerkins-bot: [V: -1] bird/piwki/elasticsearch: replace hiera with lookup, data types [puppet] - https://gerrit.wikimedia.org/r/631899 (owner: Dzahn)
[02:05:03] (PS2) Dzahn: bird/piwki/elasticsearch: replace hiera with lookup, data types [puppet] - https://gerrit.wikimedia.org/r/631899
[02:07:50] (PS1) Dzahn: stdlib: update to version 5.2 [puppet] - https://gerrit.wikimedia.org/r/631900
[02:08:37] (CR) Dzahn: "https://gerrit.wikimedia.org/r/631900" [puppet] - https://gerrit.wikimedia.org/r/631522 (owner: Dzahn)
[02:10:37] (PS2) Dzahn: stdlib: update to version 5.2.0 [puppet] - https://gerrit.wikimedia.org/r/631900
[02:10:53] (PS3) Dzahn: stdlib: update to version 5.2.0 [puppet] - https://gerrit.wikimedia.org/r/631900
[02:12:09] (CR) Dzahn: "what's with the "+/- 0" files in the diff?" [puppet] - https://gerrit.wikimedia.org/r/631900 (owner: Dzahn)
[02:12:46] (CR) DannyS712: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/631900 (owner: Dzahn)
[02:14:27] (CR) Dzahn: "ah, thanks. yes.
that makes the review more annoying of course 😊" [puppet] - https://gerrit.wikimedia.org/r/631900 (owner: Dzahn)
[02:20:19] RECOVERY - Check systemd state on ms-be2055 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:23:46] (PS1) Dzahn: stdlib: change file mode from 755 to 644 for acceptance test files [puppet] - https://gerrit.wikimedia.org/r/631901
[02:24:29] (PS4) Dzahn: stdlib: update to version 5.2.0 [puppet] - https://gerrit.wikimedia.org/r/631900
[02:27:50] (PS2) Dzahn: stdlib: change file mode from 755 to 644 for spec files [puppet] - https://gerrit.wikimedia.org/r/631901
[02:28:14] (PS5) Dzahn: stdlib: update to version 5.2.0 [puppet] - https://gerrit.wikimedia.org/r/631900
[02:29:02] (CR) Dzahn: "much smaller now after rebasing on top of Ibe9f9cae0a9a88e2e1d" [puppet] - https://gerrit.wikimedia.org/r/631900 (owner: Dzahn)
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201003T0700)
[07:34:31] (CR) ArielGlenn: [C: +1] "Fine for the kiwix rsync script, but note that in dumps/manifests/web/fetches/kiwix.pp it's invoked with bash anyways :-)" [puppet] - https://gerrit.wikimedia.org/r/631892 (https://phabricator.wikimedia.org/T95064) (owner: Dzahn)
[07:40:51] PROBLEM - Check systemd state on ms-be2055 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:53:04] (PS1) ArielGlenn: remove snapshot01 from scap targets in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/631907 (https://phabricator.wikimedia.org/T245402)
[07:54:00] (CR) ArielGlenn: [C: +2] remove snapshot01 from scap targets in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/631907 (https://phabricator.wikimedia.org/T245402) (owner: ArielGlenn)
[08:11:39] RECOVERY - Memcached on mw2271 is OK: TCP OK - 0.037 second response time on 10.192.48.93 port 11210 https://wikitech.wikimedia.org/wiki/Memcached
[08:20:39] RECOVERY - Check systemd state on ms-be2055 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:43:02] (Abandoned) Elukey: Add profile::hadoop::worker::gpu to Hadoop workers' role [puppet] - https://gerrit.wikimedia.org/r/630861 (https://phabricator.wikimedia.org/T255138) (owner: Elukey)
[08:43:17] (Abandoned) Elukey: role::analytics_test_cluster::coordinator: add analytics users without ssh keys [puppet] - https://gerrit.wikimedia.org/r/630218 (https://phabricator.wikimedia.org/T262660) (owner: Elukey)
[08:43:26] (Abandoned) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165 (owner: Elukey)
[08:53:07] (CR) Elukey: "Very good start, I left some comments as follow up, lemme know your thoughts.
Very close to merge :)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/631849 (https://phabricator.wikimedia.org/T262660) (owner: Razzi)
[09:14:21] (PS1) Elukey: Import the config module from Spicerack [software/pywmflib] - https://gerrit.wikimedia.org/r/631909 (https://phabricator.wikimedia.org/T257905)
[09:18:59] (PS2) Elukey: Import the config module from Spicerack [software/pywmflib] - https://gerrit.wikimedia.org/r/631909 (https://phabricator.wikimedia.org/T257905)
[09:25:01] (PS1) Elukey: Import the phabricator module from Spicerack [software/pywmflib] - https://gerrit.wikimedia.org/r/631910 (https://phabricator.wikimedia.org/T257905)
[10:11:19] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:12:59] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:21:19] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:22:59] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:43:05] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=pdu_sentry4 site=eqsin https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:44:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:11:31] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:13:09] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:55:37] Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (akosiaris) Open→Resolved The upgrade has happened successfully and tickets for followup work that is required as a result of this...
[12:11:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:13:25] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:28:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:29:49] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:44:51] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={routinator,swagger_check_restbase_esams} site={codfw,esams} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:46:33] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:37:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={atlas_exporter,swagger_check_restbase_esams} site={eqiad,esams} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:38:47] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:38:19] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 181 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:39:59] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 8 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:43:05] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:46:23] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:28:13] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:29:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:50:01] (PS1) Urbanecm: Restrict 'flow-hide' right to autoconfirmed users on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/631930 (https://phabricator.wikimedia.org/T264489)
[15:50:18] (CR) Urbanecm: [C: +2] "emergency action per https://phabricator.wikimedia.org/T264489" [mediawiki-config] - https://gerrit.wikimedia.org/r/631930 (https://phabricator.wikimedia.org/T264489) (owner: Urbanecm)
[15:51:01] (Merged) jenkins-bot: Restrict 'flow-hide' right to autoconfirmed users on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/631930 (https://phabricator.wikimedia.org/T264489) (owner: Urbanecm)
[15:52:54] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: emergency: 840545f1d9115ea6b672cecce1762d850d8b1f54: Restrict flow-hide right to autoconfirmed users on zhwiki (T264489) (duration: 01m 17s)
[15:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:19] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:26:59] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:30:16] (CR) RhinosF1: [C: -1] "Shouldn't the existing private wikis with true explicity set be removed now it will be default?"
[mediawiki-config] - https://gerrit.wikimedia.org/r/631423 (https://phabricator.wikimedia.org/T258356) (owner: Urbanecm)
[16:49:32] (CR) Urbanecm: "> Patch Set 1: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/631423 (https://phabricator.wikimedia.org/T258356) (owner: Urbanecm)
[16:50:29] (PS2) Urbanecm: Enable bot passwords at all fishbowl and private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/631423 (https://phabricator.wikimedia.org/T258356)
[16:52:15] (CR) RhinosF1: [C: +1] "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/631423 (https://phabricator.wikimedia.org/T258356) (owner: Urbanecm)
[16:55:37] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:57:19] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:28:26] Operations, Mail, Security: Don't get a mail to confirm my email address - https://phabricator.wikimedia.org/T264504 (Urbanecm)
[18:02:45] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:04:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:39:45] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:41:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:53:07] Operations, Mail, Security: Don't get a mail to confirm my email address - https://phabricator.wikimedia.org/T264504 (Reedy)
[19:09:53] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:14:06] Operations, SRE-Access-Requests: Change urbanecm's SSH production key - https://phabricator.wikimedia.org/T264345 (Urbanecm) Open→Resolved a:herron
[19:14:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:26:40] (CR) BryanDavis: [C: +1] admins/bd808: add bash shebang to .bash scripts [puppet] - https://gerrit.wikimedia.org/r/631893 (https://phabricator.wikimedia.org/T95064) (owner: Dzahn)
[20:05:33] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:07:13] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:23:15] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 45.34 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[20:28:17] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: (C)60 le (W)70 le 71.8 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[20:28:53] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:32:17] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:53:31] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[21:03:41] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[21:08:45] RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 3 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[21:17:58] (PS1) Andrew Bogott: nova fullstack test: run tests every 10 minutes and increase timeouts [puppet] - https://gerrit.wikimedia.org/r/631947
[21:18:52] (CR) Andrew Bogott: [C: +2] nova fullstack test: run tests every 10 minutes and increase timeouts [puppet] - https://gerrit.wikimedia.org/r/631947 (owner: Andrew Bogott)
[21:39:47] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:41:29] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:04:01] PROBLEM - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is CRITICAL: /{domain}/v1/feed/onthisday/{type}/{month}/{day} (retrieve selected events on January 15) timed out before a response was received https://wikitech.wikimedia.org/wiki/Wikifeeds
[22:08:23] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_wikifeeds_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:10:37] RECOVERY - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[22:11:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:18:33] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:20:17] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:43:57] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:45:39] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:52:23] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:54:05] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:04:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:05:49] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:15:09] (PS3) Ladsgroup: mailman3: Start mailman3 [puppet] - https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536)
[23:16:55] (CR) jerkins-bot: [V: -1] mailman3: Start mailman3 [puppet] - https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536) (owner: Ladsgroup)
[23:20:43] Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Ladsgroup) >>! In T261031#6474717, @Aklapper wrote: > Wondering if backporting https://gitlab.com/mailman/m...
[23:24:41] (PS4) Ladsgroup: mailman3: Start mailman3 [puppet] - https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536)
[23:27:50] (CR) jerkins-bot: [V: -1] mailman3: Start mailman3 [puppet] - https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536) (owner: Ladsgroup)
[23:36:15] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:37:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:39:31] (PS1) Ladsgroup: mailman: Make apache serve with utf-8 charset [puppet] - https://gerrit.wikimedia.org/r/631952 (https://phabricator.wikimedia.org/T261031)
[23:44:05] (PS5) Ladsgroup: mailman3: Start mailman3 [puppet] - https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536)
[23:47:01] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Puppetize mailman3 - https://phabricator.wikimedia.org/T256536 (Ladsgroup) So https://gerrit.wikimedia.org/r/c/operations/puppet/+/608163 is cherry-picked on the standalone puppet master and it works to some degree which is nice. This is going to...
[23:50:45] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Puppetize mailman3 - https://phabricator.wikimedia.org/T256536 (Ladsgroup) BTW I put these hiera values in the mailman puppetmaster (`/var/lib/git/labs/private/hieradata/labs/mailman/common.yaml`) `lang=yaml profile::mailman3::db_host: mailman-db...