[00:32:53] Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T245324 (ops-monitoring-bot)
[00:32:55] Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (Krenair) So this error was the memory usage problem on puppetdb03 I mentioned above - puppetdb won't work without postgresql, which can't start bec...
[00:34:16] Operations, ops-eqiad, cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T245324 (bd808)
[00:35:14] Operations, ops-eqiad, cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T245324 (bd808)
[00:35:17] Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (bd808)
[01:01:40] !log ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕗🍺 sudo systemctl restart hive-server2.service ; sudo systemctl restart hive-metastore.service
[01:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:01:44] Operations, Traffic, Patch-For-Review: ats-tls performance issues under production load - https://phabricator.wikimedia.org/T244538 (Krinkle) >>! From `#wikimedia-traffic`: > - vgutierrez: what kind of DNS queries was ats-tls performing though? Surely not the address to localhost varnish-...
[02:15:49] (PS1) Dzahn: add apt2001.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/572377
[02:26:56] (PS1) Dzahn: doc: add envoy for TLS termination on doc1001 [puppet] - https://gerrit.wikimedia.org/r/572378 (https://phabricator.wikimedia.org/T210411)
[02:29:12] (PS1) Dzahn: add doc.discovery.wmnet for use in envoy config [dns] - https://gerrit.wikimedia.org/r/572380 (https://phabricator.wikimedia.org/T210411)
[02:41:22] (PS1) Dzahn: wmcs::monitoring: add envoy for TLS termination for grafana-labs [puppet] - https://gerrit.wikimedia.org/r/572381 (https://phabricator.wikimedia.org/T210411)
[02:45:53] (PS1) Dzahn: ATS: switch backend URL to https for grafana-labs [puppet] - https://gerrit.wikimedia.org/r/572382 (https://phabricator.wikimedia.org/T210411)
[02:50:26] (PS1) Dzahn: add grafana-labs.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/572385 (https://phabricator.wikimedia.org/T210411)
[02:52:07] (PS1) Dzahn: add graphite-labs.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/572387 (https://phabricator.wikimedia.org/T210411)
[02:55:09] (PS1) Dzahn: ATS: switch backend URL to https/discovery for graphite-labs [puppet] - https://gerrit.wikimedia.org/r/572391 (https://phabricator.wikimedia.org/T210411)
[03:07:55] (PS1) Dzahn: site: add installserver::light role on new install servers [puppet] - https://gerrit.wikimedia.org/r/572394
[03:23:06] (PS1) Dzahn: ATS: remove commented webperf2001 from backend config [puppet] - https://gerrit.wikimedia.org/r/572395
[03:25:35] Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[03:26:08] Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn) meanwhile there is another one in ATS backend.yaml.
added [ ] cloudweb2001-dev.wikimedia.org - http://labtesthorizon.wikimedia.org , http://labtestwikitech.wikimed...
[03:31:17] (PS1) Dzahn: ATS: remove commented blubberoid non-discovery record from backend [puppet] - https://gerrit.wikimedia.org/r/572396
[03:34:11] Operations, ops-eqiad, DC-Ops: mr1-eqiad.wikimedia.org - Duplicate IP on mgmt network - https://phabricator.wikimedia.org/T245320 (Peachey88)
[05:43:49] (PS2) Gergő Tisza: Increase Commons linkpurge rate limit for patrollers [mediawiki-config] - https://gerrit.wikimedia.org/r/572339 (https://phabricator.wikimedia.org/T245214)
[07:07:04] (CR) Brian Wolff: [C: +1] Increase Commons linkpurge rate limit for patrollers [mediawiki-config] - https://gerrit.wikimedia.org/r/572339 (https://phabricator.wikimedia.org/T245214) (owner: Gergő Tisza)
[07:38:24] (PS1) Gergő Tisza: Make the logstash and authmanager-statsd Monolog handlers compatible [mediawiki-config] - https://gerrit.wikimedia.org/r/572401
[07:39:36] (CR) Gergő Tisza: "Oops, this broke the dashboard because the logstash handler overwrites the statsd one:" [mediawiki-config] - https://gerrit.wikimedia.org/r/477005 (owner: Brian Wolff)
[16:01:00] (PS1) Andrew Bogott: keystone fernet key rotation: delete files during rsync [puppet] - https://gerrit.wikimedia.org/r/572413 (https://phabricator.wikimedia.org/T243418)
[16:30:47] (CR) Andrew Bogott: [C: +2] keystone fernet key rotation: delete files during rsync [puppet] - https://gerrit.wikimedia.org/r/572413 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[19:19:26] (PS2) Krinkle: admin: change matrix.php column "grp" to "groups" [puppet] - https://gerrit.wikimedia.org/r/556281
[19:47:36] Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (Krenair) Thanks to Andrew it seems to be running well now.
I've copied across /var/lib/puppet/volatile to sort a lot of swift/GeoIP failures.
[20:23:08] Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (Krenair) Also copied /etc/conftool-state/mediawiki.yaml to sort out mediawiki::state for mwmaint01 I've also taken /root and /home and put them at...
[21:27:42] (PS1) Alex Monk: Misc work to make puppet run in codfw1dev again following Icad66f70 [puppet] - https://gerrit.wikimedia.org/r/572421 (https://phabricator.wikimedia.org/T242607)
[21:53:31] (PS2) Alex Monk: Misc work to make puppet run in codfw1dev again following Icad66f70 [puppet] - https://gerrit.wikimedia.org/r/572421 (https://phabricator.wikimedia.org/T242607)
[21:58:52] (CR) Krinkle: Scrape webperf Prometheus metrics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) (owner: Dave Pifke)
[21:59:19] (PS2) Krinkle: Scrape webperf Prometheus metrics [puppet] - https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) (owner: Dave Pifke)
[22:01:03] (CR) Krinkle: [C: +1] Make the logstash and authmanager-statsd Monolog handlers compatible [mediawiki-config] - https://gerrit.wikimedia.org/r/572401 (owner: Gergő Tisza)
[22:06:16] (PS3) Alex Monk: Misc work to make puppet run in codfw1dev again following Icad66f70 [puppet] - https://gerrit.wikimedia.org/r/572421 (https://phabricator.wikimedia.org/T242607)
[22:30:45] (CR) Jforrester: [C: +1] admin: change matrix.php column "grp" to "groups" [puppet] - https://gerrit.wikimedia.org/r/556281 (owner: Krinkle)