[00:10:33] (03PS6) 10Jforrester: Variant configuration: Allow for YAML-based inheritance of configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538129 [00:10:35] (03PS2) 10Jforrester: Variant configuration: Move some dblist configuration into YAML [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539414 [00:10:37] (03PS1) 10Jforrester: Variant configuration: Move some all-wiki configuration from CS to all.yaml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539436 [00:11:53] (03PS3) 10Jforrester: Variant configuration: Move some dblist configuration into YAML [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539414 [00:15:34] (03PS2) 10Jforrester: Variant configuration: Move some all-wiki configuration from CS to all.yaml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539436 [00:15:36] (03PS4) 10Jforrester: Variant configuration: Move some dblist configuration into YAML [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539414 [00:18:44] (03PS8) 10Andrew Bogott: novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - 10https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709) [00:22:15] (03CR) 10Andrew Bogott: [C: 03+2] novaproxy: support hiera config for blocking ips, user agents, referers [puppet] - 10https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709) (owner: 10Andrew Bogott) [00:29:57] (03PS1) 10Bstorm: labstore: add visualeditor project to dumps mounts [puppet] - 10https://gerrit.wikimedia.org/r/539437 (https://phabricator.wikimedia.org/T164992) [00:53:52] !log hotfixing phabricator fatal exception refs T233998 [00:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:53:55] T233998: ArgumentCountError on Phabricator - https://phabricator.wikimedia.org/T233998 [01:37:29] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1003 is CRITICAL: 5.044e+06 ge 5e+06 
https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [02:58:23] (03PS4) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [02:59:40] (03CR) 10CRusnov: [C: 03+1] "LGTM." [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 (owner: 10Ayounsi) [03:12:37] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1003 is OK: (C)5e+06 ge (W)1e+06 ge 9.668e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [03:21:27] (03PS5) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [03:30:59] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [03:32:37] RECOVERY - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [03:34:27] (03PS6) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [03:39:10] (03PS7) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [03:46:35] (03PS8) 10CRusnov: netbox: Setup automated DNS generation [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) [03:53:16] (03CR) 10CRusnov: "puppet compiler is happy now, although for some reason it doesn't put 'authdns1001.wikimedia.org' in the allowed hosts (it does put 2001 a" [puppet] - 10https://gerrit.wikimedia.org/r/539182 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [03:54:35] (03CR) 10KartikMistry: "> This new global is landed in master and will ship in 1.34.0-wmf.25;" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/538867 (https://phabricator.wikimedia.org/T232986) (owner: 10KartikMistry) [04:19:17] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:55:24] (03PS1) 10Vgutierrez: ATS: Gather metrics regarding parent servers [puppet] - 10https://gerrit.wikimedia.org/r/539446 (https://phabricator.wikimedia.org/T231627) [04:56:33] RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [04:56:53] RECOVERY - haproxy failover on dbproxy1019 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [04:56:55] RECOVERY - haproxy failover on dbproxy1011 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [04:57:21] RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [04:58:08] (03PS2) 10Vgutierrez: ATS: Gather metrics regarding parent servers [puppet] - 10https://gerrit.wikimedia.org/r/539446 (https://phabricator.wikimedia.org/T231627) [05:01:00] (03CR) 10Vgutierrez: [C: 03+2] ATS: Gather metrics regarding parent servers [puppet] - 10https://gerrit.wikimedia.org/r/539446 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [05:18:38] 10Operations, 10ops-eqiad, 10serviceops: mw1286.mgmt is down - https://phabricator.wikimedia.org/T234009 (10jijiki) [05:19:11] ACKNOWLEDGEMENT - SSH mw1286.mgmt on mw1286.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds Effie Mouzeli T234009 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [05:23:56] !log remove tcp-mss clamping from cr1-eqiad - T232602 [05:24:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:24:01] T232602: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 [05:30:26] !log remove tcp-mss clamping from cr2-eqord - T232602 [05:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:33] T232602: GRE MTU mitigations - Tracking - 
https://phabricator.wikimedia.org/T232602 [05:42:31] !log remove tcp-mss clamping from cr2-eqiad - T232602 [05:42:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:42:36] T232602: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 [05:56:59] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 26086 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [06:10:12] (03PS1) 10Volans: Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) [06:12:22] (03CR) 10jerkins-bot: [V: 04-1] Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [06:16:04] ^ for elastic1025, shards are relocating and should be fine soon enough [06:16:23] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [06:16:35] <_joe_> onimisionipe: ack, thanks :) [06:22:09] 10Operations, 10SRE-tools, 10netops, 10Goal, 10Patch-For-Review: Configuration management for network operations - https://phabricator.wikimedia.org/T228388 (10Volans) [06:22:56] 10Operations, 10Traffic: ATS fails to log the used SSLCurve when the SSL session is being reused - https://phabricator.wikimedia.org/T234011 (10Vgutierrez) [06:27:11] (03PS2) 10Volans: Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) [06:29:45] (03PS2) 10Alexandros Kosiaris: rsyslog::input::file: Fix regular expression [puppet] - 10https://gerrit.wikimedia.org/r/539418 [06:29:47] (03PS6) 10Alexandros Kosiaris: rsyslog: Support adding metadata to input, 
default to off [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) [06:29:49] (03PS2) 10Alexandros Kosiaris: rsyslog: Support adding cee tag to input file [puppet] - 10https://gerrit.wikimedia.org/r/539419 (https://phabricator.wikimedia.org/T207200) [06:29:51] (03PS7) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [06:30:05] (03CR) 10jerkins-bot: [V: 04-1] Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [06:31:24] 10Operations, 10serviceops: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 (10jijiki) @Dzahn Is it ok if you upgrade on phab*? [06:32:42] (03PS3) 10Volans: Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) [06:34:42] 10Operations, 10serviceops, 10HHVM, 10Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (10jijiki) [06:35:55] (03PS1) 10Effie Mouzeli: hiera: install php-fpm on maintenance servers [puppet] - 10https://gerrit.wikimedia.org/r/539458 (https://phabricator.wikimedia.org/T229792) [06:39:49] (03CR) 10Effie Mouzeli: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/18636/mwmaint1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/539458 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [06:42:47] (03PS4) 10Volans: Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) [06:43:27] (03PS1) 10Effie Mouzeli: noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) [06:45:25] (03PS5) 10Volans: Homer: setup private repo [puppet] - 10https://gerrit.wikimedia.org/r/539453 
(https://phabricator.wikimedia.org/T228388) [06:46:00] (03CR) 10jerkins-bot: [V: 04-1] noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [06:48:17] (03PS1) 10Elukey: Remove absented python2 libs from Analytics profiles [puppet] - 10https://gerrit.wikimedia.org/r/539460 (https://phabricator.wikimedia.org/T204734) [06:49:49] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:50:29] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [06:50:38] (03CR) 10Elukey: [C: 03+2] Remove absented python2 libs from Analytics profiles [puppet] - 10https://gerrit.wikimedia.org/r/539460 (https://phabricator.wikimedia.org/T204734) (owner: 10Elukey) [06:51:58] (03CR) 10Volans: "Compiler is failing because it doesn't find any host matching the homer profile, but I think is an issue with the compiler maybe not up to" [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [06:52:55] 10Operations, 10Core Platform Team, 10Performance-Team, 10TechCom-RFC, and 6 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10awight) [06:53:12] (03CR) 10Elukey: "Ready to go Marcel?" 
[puppet] - 10https://gerrit.wikimedia.org/r/539385 (owner: 10Mforns) [06:54:24] (03PS1) 10Alexandros Kosiaris: add_ip6_mapped: Ignore errors if ip token set fails [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) [06:56:19] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:56:57] (03PS2) 10Effie Mouzeli: noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) [06:56:59] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [06:58:01] (03Abandoned) 10Muehlenhoff: Initial Kerberos KDC/kadminserver profiles/roles (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/494242 (owner: 10Muehlenhoff) [07:00:13] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:00:19] mmm seems to be a lot of Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.48.172) [07:00:22] marostegui: --^ [07:00:45] that is s8 master right? 
[07:00:48] elukey: that's wikidata, and most likely because of the migration script from Amir1 which puts some load [07:01:09] all right, ack :) [07:01:20] * elukey blames Amir1 :) [07:06:03] amir1 can give you a hug and make it all go away [07:06:36] btw item terms inserts are failing [07:06:53] *wbt [07:06:58] at least some of the time [07:08:00] and we have some memory exhaustion from php too [07:08:13] /srv/mediawiki/php-1.34.0-wmf.24/vendor/wikibase/data-model/src/Entity/EntityId.php on line 229 [07:08:19] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:08:31] (03CR) 10Mforns: [C: 03+1] "Yes, refinery-source was deployed yesterday. Thanks Luca!" [puppet] - 10https://gerrit.wikimedia.org/r/539385 (owner: 10Mforns) [07:09:34] a bot is trying to set some claims it seems [07:09:37] and these are failing [07:10:51] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:11:34] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:14:49] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:18:11] (03CR) 10Alexandros Kosiaris: "@jbond This is a preliminary approach to resolve the issue in the linked task, but it does reopen a race condition issue as the token migh" [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) (owner: 10Alexandros Kosiaris) [07:18:59] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:19:51] (03PS1) 10Effie Mouzeli: mediawiki: switch search.wikimedia.org to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539465 (https://phabricator.wikimedia.org/T229792) [07:21:17] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:22:11] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:22:47] (03PS1) 10Effie Mouzeli: mediawiki: switch wwwportals to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) [07:24:11] (03CR) 10Kosta Harlan: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222) (owner: 10Kosta Harlan) [07:24:13] (03PS2) 10Effie 
Mouzeli: mediawiki: switch wwwportals to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) [07:24:19] (03Abandoned) 10Kosta Harlan: Echo: Enable poll for updates feature on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530639 (https://phabricator.wikimedia.org/T219222) (owner: 10Kosta Harlan) [07:25:15] (03PS2) 10Muehlenhoff: Switch auth1002/auth2001 to role::test [puppet] - 10https://gerrit.wikimedia.org/r/539145 [07:25:27] (03PS2) 10Effie Mouzeli: mediawiki: switch search.wikimedia.org to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539465 (https://phabricator.wikimedia.org/T229792) [07:28:02] 10Operations, 10Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (10Aklapper) @HMarcus: For future reference, please explain who "we" is and also see https://www.mediawiki.org/wiki/How_to_report_a_bug - Thanks a lot! :) [07:32:51] (03CR) 10Muehlenhoff: [C: 03+2] Switch auth1002/auth2001 to role::test [puppet] - 10https://gerrit.wikimedia.org/r/539145 (owner: 10Muehlenhoff) [07:36:12] (03PS2) 10Muehlenhoff: Use correct database name for PuppetDB Postgres replication check [puppet] - 10https://gerrit.wikimedia.org/r/539346 [07:36:12] !log swift eqiad-prod: remove ms-be1027 - T233289 [07:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:16] T233289: Unable to power on ms-be1027 - https://phabricator.wikimedia.org/T233289 [07:36:43] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:40:37] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:41:31] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:41:47] (03CR) 10Muehlenhoff: [C: 03+2] Use correct database name for PuppetDB Postgres replication check [puppet] - 10https://gerrit.wikimedia.org/r/539346 (owner: 10Muehlenhoff) [07:42:20] now is when I wish we could filter these icinga alerts by wiki (e.g. "ignore exceptions from wikidata") [07:42:36] (03PS1) 10Filippo Giunchedi: Remove ms-be1027 production entries [dns] - 10https://gerrit.wikimedia.org/r/539491 (https://phabricator.wikimedia.org/T233289) [07:43:33] apergos: your wish shall be granted once enough pieces on the alerting infra are in place :) [07:43:47] (03CR) 10Filippo Giunchedi: [C: 03+2] Remove ms-be1027 production entries [dns] - 10https://gerrit.wikimedia.org/r/539491 (https://phabricator.wikimedia.org/T233289) (owner: 10Filippo Giunchedi) [07:43:51] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:44:56] 10Operations, 10ops-eqiad, 10decommission, 10Patch-For-Review: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (10fgiunchedi) a:05fgiunchedi→03None [07:45:34] 10Operations, 10ops-eqiad, 10decommission, 10Patch-For-Review: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (10fgiunchedi) a:03Cmjohnson @Cmjohnson host is 
ready for decom! thanks [07:48:27] 10Operations, 10ops-eqiad, 10decommission, 10Patch-For-Review: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (10fgiunchedi) [07:49:43] 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10fgiunchedi) >>! In T232367#5527463, @Cmjohnson wrote: > @fgiunchedi I see you said for raid Partitioning/Raid: "use existing ms-be setup" Unfortunately my memory is not that great... [07:51:47] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [07:54:26] 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10fgiunchedi) I see this task is for 6x hosts and parent T228461 is for 9x, wanted to make sure that's expected/wanted ? [07:54:29] ooooohhh, can't wait! 
[07:55:21] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:08:51] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:08:59] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:15:21] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:16:47] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:19:12] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:20:02] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:23:34] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 28754 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [08:25:52] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [08:27:32] (03CR) 10Muehlenhoff: "Ack, let's rather work towards obsoleting this; I'll work on a backport to stretch of the ferm version with proper AAAA support today" [puppet] - 10https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T153468) (owner: 10Hashar) [08:27:52] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:28:08] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:31:46] PROBLEM - MediaWiki codfw exceptions and 
fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:32:00] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:32:39] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM :-) thanks!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [08:36:51] (03CR) 10Arturo Borrero Gonzalez: toolforge-kubernetes: restructure pod security policies (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) (owner: 10Bstorm) [08:37:13] the mw exceptions/fatals are wikidata, correct? is there an expectation on when they'll be gone? [08:37:18] apergos: ^ ? [08:38:14] they are wikidata, and they will go away when we have less intense editing by bots or when Amir's script finishes [08:38:35] for context the script is a wb terms migration script of some sort, it was started a couple weeks ago and I don't know how far along it is [08:39:46] 10Operations, 10Traffic: ATS fails to log the used SSLCurve when the SSL session is being reused - https://phabricator.wikimedia.org/T234011 (10Vgutierrez) p:05Triage→03Normal [08:41:30] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234018 (10ops-monitoring-bot) [08:42:36] apergos: ack, thanks for the context [08:42:43] 👍 [08:42:55] Amir1: you might know how long the wb terms migration script might still take ? 
[08:43:48] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234018 (10aborrero) [08:45:11] 10Operations: Evaluate SSO solutions - https://phabricator.wikimedia.org/T220362 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This was done a while ago, we've settled on Apereo CAS. [08:45:29] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:45:47] (03PS2) 10Elukey: analytics::refinery::job::Refine: bump up refinery jar version to v0.0.101 [puppet] - 10https://gerrit.wikimedia.org/r/539385 (owner: 10Mforns) [08:45:49] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: NDA Request from WMDE employee Verena - https://phabricator.wikimedia.org/T233807 (10Michael_Jahn_WMDE) I approve Verena's request on behalf of Wikimedia Deutschland. [08:45:54] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234018 (10aborrero) p:05Triage→03High [08:46:36] 10Operations: Evaluate SSO solutions - https://phabricator.wikimedia.org/T220362 (10Vgutierrez) do we have the rationale of this choice documented/explained somewhere? 
[08:48:19] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:48:23] godog: a month, the old table is 2B rows [08:49:18] (03CR) 10Filippo Giunchedi: [C: 03+2] rsyslog: Support adding metadata to input, default to off [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [08:51:31] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:51:34] (03CR) 10Elukey: [C: 03+2] analytics::refinery::job::Refine: bump up refinery jar version to v0.0.101 [puppet] - 10https://gerrit.wikimedia.org/r/539385 (owner: 10Mforns) [08:57:34] (03CR) 10Filippo Giunchedi: "LGTM overall, see inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [08:57:55] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [08:58:10] Amir1: ack! ow, a month still to go ? [08:59:10] <_joe_> ok, I think it's clear it needs to be reevaluated [08:59:18] <_joe_> I don't think we can keep this rate of fatals [08:59:32] <_joe_> godog: why is the alert saying site=codfw btw? 
[09:00:01] (03CR) 10Filippo Giunchedi: [C: 03+2] rsyslog: Support adding cee tag to input file [puppet] - 10https://gerrit.wikimedia.org/r/539419 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [09:00:35] (03CR) 10Filippo Giunchedi: [C: 03+2] rsyslog::input::file: Fix regular expression [puppet] - 10https://gerrit.wikimedia.org/r/539418 (owner: 10Alexandros Kosiaris) [09:01:04] (03CR) 10Arturo Borrero Gonzalez: "The code layout I would use, just to be consistent with other similar code we have, is to:" [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [09:01:07] PROBLEM - MediaWiki codfw exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=codfw https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [09:01:25] PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:01:36] _joe_: yeah it shouldn't, duplicated with eqiad [09:01:45] I'll fix that [09:02:43] RECOVERY - MediaWiki codfw exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=codfw+prometheus/ops [09:03:01] RECOVERY - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:03:02] it's not the script by itself, it's the script interacting with the bot [09:03:05] PROBLEM - Check whether ferm is active by checking the default input chain on ganeti2001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [09:03:27] PROBLEM - Check systemd state on ganeti2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:04:09] the bot is doing about 35 edits a minute (!) [09:06:46] !log running a few ferm tests on cp1008, puppet disabled [09:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:31] <_joe_> apergos: 35 edits a minute is not much [09:07:43] for one bot by itself it is [09:08:00] <_joe_> it shouldn't matter is my point. 
[09:08:57] total edits/min is about 200 [09:09:51] ordinarily it wouldn't matter but the combo is apparently enough to be a problem [09:09:56] anyways, be back in a little while [09:10:22] (03CR) 10Arturo Borrero Gonzalez: openstack: add designate API ferm rules to haproxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [09:18:07] (03PS1) 10Ladsgroup: mediawiki: Make the rebuildItemTerms script slower [puppet] - 10https://gerrit.wikimedia.org/r/539498 [09:21:32] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: Make the rebuildItemTerms script slower [puppet] - 10https://gerrit.wikimedia.org/r/539498 (owner: 10Ladsgroup) [09:22:40] (03CR) 10Effie Mouzeli: "LGTM https://puppet-compiler.wmflabs.org/compiler1001/18641/" [puppet] - 10https://gerrit.wikimedia.org/r/539498 (owner: 10Ladsgroup) [09:23:26] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] mediawiki: Make the rebuildItemTerms script slower [puppet] - 10https://gerrit.wikimedia.org/r/539498 (owner: 10Ladsgroup) [09:23:47] (03PS1) 10Filippo Giunchedi: mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 [09:25:12] _joe_: ^ [09:27:14] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 (owner: 10Filippo Giunchedi) [09:29:59] that's a timeout/abort [09:30:03] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [09:30:04] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:56] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Marostegui) @Cmjohnson did this happen yesterday in the end?
[09:33:11] (03PS1) 10Volans: prospector: disable McCabe complexity check [cookbooks] - 10https://gerrit.wikimedia.org/r/539500 [09:33:58] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/18642/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/539499 (owner: 10Filippo Giunchedi) [09:34:21] (03CR) 10Filippo Giunchedi: [C: 03+2] mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 (owner: 10Filippo Giunchedi) [09:34:31] (03PS2) 10Filippo Giunchedi: mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 [09:35:19] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 (owner: 10Filippo Giunchedi) [09:35:22] 10Operations, 10ops-eqiad, 10DBA: es1019 IPMI and its management interface are unresponsive (again2) - https://phabricator.wikimedia.org/T233698 (10Marostegui) >>! In T233698#5527397, @Cmjohnson wrote: > @Marostegui Can you depool it leave it for us to do when we get a free moment. It's an easy thing to do... [09:36:54] (03PS3) 10Filippo Giunchedi: mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 [09:40:25] (03CR) 10Filippo Giunchedi: [C: 03+2] mediawiki: fix and comment alerts based on logstash metrics [puppet] - 10https://gerrit.wikimedia.org/r/539499 (owner: 10Filippo Giunchedi) [09:40:57] (03PS2) 10Effie Mouzeli: hiera: install php-fpm on maintenance servers [puppet] - 10https://gerrit.wikimedia.org/r/539458 (https://phabricator.wikimedia.org/T229792) [09:42:05] (03PS3) 10Alexandros Kosiaris: rsyslog::input::file: Fix regular expression [puppet] - 10https://gerrit.wikimedia.org/r/539418 [09:42:14] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] "thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/539418 (owner: 10Alexandros Kosiaris) [09:43:04] (03PS7) 10Alexandros Kosiaris: rsyslog: Support adding metadata to input, default to off [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) [09:43:21] (03CR) 10Alexandros Kosiaris: [V: 03+2] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [09:43:36] (03PS3) 10Alexandros Kosiaris: rsyslog: Support adding cee tag to input file [puppet] - 10https://gerrit.wikimedia.org/r/539419 (https://phabricator.wikimedia.org/T207200) [09:43:46] (03CR) 10Alexandros Kosiaris: [V: 03+2] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/539419 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [09:44:36] (03CR) 10Jbond: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) (owner: 10Alexandros Kosiaris) [09:50:09] (03PS2) 10Alexandros Kosiaris: add_ip6_mapped: Ignore errors if ip token set fails [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) [09:50:11] (03PS8) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [09:50:13] (03PS1) 10Alexandros Kosiaris: rsyslog::config: Support passing a file mode [puppet] - 10https://gerrit.wikimedia.org/r/539502 [09:51:03] (03PS2) 10Alexandros Kosiaris: rsyslog::config: Support passing a file mode [puppet] - 10https://gerrit.wikimedia.org/r/539502 [09:51:06] (03PS9) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [09:51:38] (03CR) 10Volans: [C: 03+1] "> Patch Set 7:" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) 
(owner: 10Ayounsi) [09:52:13] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=foo instance=kafkamon1001:9501 job=burrow partition={0,1,2} site=eqiad topic=rsyslog-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad [09:52:13] r-consumer_group=All [09:53:28] (03Abandoned) 10Ladsgroup: mediawiki: Use mediawiki::errorpage instead of a hhvm-fatal-error.php.erb [puppet] - 10https://gerrit.wikimedia.org/r/511078 (https://phabricator.wikimedia.org/T113114) (owner: 10Ladsgroup) [09:55:23] godog: kafka consumer lag is horrible --^ [09:55:41] * Urbanecm is going to do an emergency deploy [09:57:26] (03CR) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [09:57:51] (03PS10) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [09:58:13] elukey: thanks I'll take a look [09:59:07] ah yeah that might be a kafkacat client, judging by the consumer group 'foo' [09:59:19] (03PS1) 10Urbanecm: New throttle rule for Czech course [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539503 (https://phabricator.wikimedia.org/T234024) [09:59:32] akosiaris: that might be an involuntary side effect of the kafkacat command I gave you earlier (the consumer lag) [09:59:37] (03CR) 10Urbanecm: [C: 03+2] New throttle rule for Czech course [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539503 (https://phabricator.wikimedia.org/T234024) (owner: 10Urbanecm) [09:59:39] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: install php-fpm on maintenance servers [puppet] - 
10https://gerrit.wikimedia.org/r/539458 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:01:02] (03PS3) 10Effie Mouzeli: hiera: install php-fpm on maintenance servers [puppet] - 10https://gerrit.wikimedia.org/r/539458 (https://phabricator.wikimedia.org/T229792) [10:01:04] godog: ah indeed that's me [10:01:07] (03Merged) 10jenkins-bot: New throttle rule for Czech course [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539503 (https://phabricator.wikimedia.org/T234024) (owner: 10Urbanecm) [10:01:16] I wasn't aware I destroyed something, sorry [10:01:37] (03PS1) 10Filippo Giunchedi: swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) [10:02:02] akosiaris: nah nothing to worry about, I didn't realize there could be alerts going off [10:02:14] and I only ran it for some 10-20 secs [10:02:20] !log urbanecm@deploy1001 Synchronized wmf-config/throttle.php: New throttle rule for Czech course (T234024) (duration: 00m 59s) [10:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:25] T234024: Lift account creation throttle for Czech Senior citizens write Wikipedia course - https://phabricator.wikimedia.org/T234024 [10:02:32] anyway, I guess I was the one consuming, not whatever should normally be consuming [10:02:44] anyway, indeed things go just fine to kafka [10:02:51] * Urbanecm is done [10:03:02] so either the consumer or more probably logstash is dropping the message [10:03:42] note that we're using different consumer groups, so logstash shouldn't be affected in any ways consuming wise [10:03:52] (03CR) 10jenkins-bot: New throttle rule for Czech course [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539503 (https://phabricator.wikimedia.org/T234024) (owner: 10Urbanecm) [10:03:58] oh... so what did I do then? just make it slow? 
[10:04:34] heh not 100% sure yet, if kafkacat isn't running anymore my understanding is that the consumer group should be gone now [10:04:53] elukey: thoughts on ^ ? [10:06:42] (03CR) 10Giuseppe Lavagetto: [C: 04-1] noc: switch catch_all to php-fpm (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:07:43] godog: yes I think so, IIRC Burrow's prometheus exporter might not like the fact that metrics are gone and keep them around [10:07:49] lemme try to restart it [10:07:53] (03CR) 10Alexandros Kosiaris: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/539502 (owner: 10Alexandros Kosiaris) [10:08:46] (03PS2) 10Filippo Giunchedi: swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) [10:11:01] (03CR) 10Filippo Giunchedi: [C: 03+1] "PCC https://puppet-compiler.wmflabs.org/compiler1001/18644/" [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) (owner: 10Filippo Giunchedi) [10:11:11] bbiab [10:12:34] mmm nope, the consumer group is still there lagging [10:12:41] but it is not affecting the other ones [10:12:52] it is only an indication that the consumer group is stuck/slow [10:12:56] kafka is completely fine [10:14:11] why is it there though in the first place? [10:14:18] I did stop kafkacat with ctrl-c [10:14:25] I have even logged out of the machine [10:15:40] 10Operations, 10Analytics, 10Fundraising-Backlog, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10jrobell) Hi @Nuria and all, Jerrie is full time staff with a req number and Erin is a full time contr... [10:15:58] I don't recall exactly how kafka cleans up the cgroups, but it is not only something on the client side.
One of the kafka brokers acts as coordinator for all the consumers in the consumer group, and state is maintained since it might happen that some new consumer joins, some leaves, etc.. [10:16:05] (03PS1) 10Effie Mouzeli: hiera: reduce php-fpm workers on maintenance servers [puppet] - 10https://gerrit.wikimedia.org/r/539505 (https://phabricator.wikimedia.org/T229792) [10:16:19] (03CR) 10Jbond: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/537468 (owner: 10Jbond) [10:17:50] ah and also kafka stores the offsets for the cgroup in a special topic [10:18:14] but! [10:18:15] https://github.com/linkedin/Burrow/wiki/http-request-remove-consumer-group [10:19:15] interesting, for some reason I was imagining automatic clean up for stale consumer groups [10:19:22] ditto [10:19:58] * akosiaris did not even know what consumer groups was 2 hours ago btw :P [10:20:00] just cleaned up in burrow [10:20:01] but I can see how you'd want to actually keep the consumer group offsets in case producers join later, e.g. a logstash restart [10:20:41] curl -X DELETE localhost:8101/v3/kafka/logging-eqiad/consumer/foo [10:21:06] (03CR) 10jerkins-bot: [V: 04-1] ipmi: use run instead of checkouput [software/spicerack] - 10https://gerrit.wikimedia.org/r/537468 (owner: 10Jbond) [10:21:27] interesting, yeah the consumer group 'foo' is still there in lafka [10:21:30] kafka event [10:21:30] ok it worked, but also with a restart of the prometheus exporter [10:21:41] root@logstash1012:~# kafka-consumer-groups --list --bootstrap-server localhost:9092 [10:22:17] I think we'll have to clean it up in kafka [10:23:09] it shouldn't be a huge deal, and I think that the clean up happens probably at some point, maybe following topic retention settings? [10:23:16] jbond42: https://github.com/psf/requests/issues/5079 [10:23:21] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [10:23:26] there you go --^ [10:23:38] elukey: yeah indeed kafka should clean it up automatically at some point [10:23:41] logstash1012:~# kafka-consumer-groups --describe --group foo --bootstrap-server localhost:9092 [10:23:43] ack thanks volans [10:24:01] feel free to merge bypassing jenkins it should be transient [10:24:08] and in any case unrelated to your patch [10:24:13] godog: one thing that we should check is if the prometheus burrow exporter can now clean up stale metrics [10:24:20] ack ok cheers [10:24:32] (03CR) 10Jbond: [V: 03+2 C: 03+2] ipmi: use run instead of checkouput [software/spicerack] - 10https://gerrit.wikimedia.org/r/537468 (owner: 10Jbond) [10:24:40] elukey: indeed [10:25:06] I mean the lag is real in the sense that the consumer group is there and lagging in kafka [10:25:17] just waiting for clean up [10:25:57] (03CR) 10jenkins-bot: ipmi: use run instead of checkouput [software/spicerack] - 10https://gerrit.wikimedia.org/r/537468 (owner: 10Jbond) [10:27:44] elukey akosiaris I'm for wait-and-see if kafka is going to clean it up at some point [10:28:56] ok [10:30:35] 10Operations: Revisit Tomcat deployment of CAS - https://phabricator.wikimedia.org/T233950 (10jbond) FYI the current cas-overlay does use Tomcat; however, it is embedded, which is recommended by Apereo to ensure dependencies are all correct, but of course we still may want to split things out [10:30:38] godog: https://github.com/jirwin/burrow_exporter/issues/17 [10:30:41] :( [10:30:45] akosiaris: sorry about the scrambling btw, I wasn't expecting kafkacat with consumer groups to have these consequences [10:30:57] well, at least we learned something [10:31:01] let's blame alex for once! :P [10:31:22] sure, why not?
[10:31:42] * elukey sends wikilove to akosiaris [10:31:44] lolz [10:31:50] elukey: meh re: the issue [10:32:14] we'll have to find a less intrusive way to tap into kafka-logging [10:33:56] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [10:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:34:09] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [10:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:34:13] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2047.codfw.wmnet` - db2047.codfw.wmnet (**PASS**) - Downtimed host on Ic... [10:34:56] (03CR) 10Effie Mouzeli: [V: 03+1] "OK https://puppet-compiler.wmflabs.org/compiler1001/18645/" [puppet] - 10https://gerrit.wikimedia.org/r/539505 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:35:04] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] hiera: reduce php-fpm workers on maintenance servers [puppet] - 10https://gerrit.wikimedia.org/r/539505 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:35:51] godog: what do you mean? For consumer lag? [10:36:02] (03PS1) 10Marostegui: site.pp: Remove db2047 [puppet] - 10https://gerrit.wikimedia.org/r/539507 (https://phabricator.wikimedia.org/T231852) [10:36:19] (03PS1) 10Marostegui: wmnet: Remove production DNS entries for db2047 [dns] - 10https://gerrit.wikimedia.org/r/539508 (https://phabricator.wikimedia.org/T231852) [10:36:46] (03PS2) 10Marostegui: site.pp: Remove db2047 [puppet] - 10https://gerrit.wikimedia.org/r/539507 (https://phabricator.wikimedia.org/T231852) [10:37:09] elukey: yeah, maybe not creating consumer groups at all [10:37:12] (03CR) 10Marostegui: [C: 03+2] "Thanks Anomie, it was more a sanity check. 
This is just used by DBAs." [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) (owner: 10Marostegui) [10:37:44] (03Merged) 10jenkins-bot: sX-pager.sql: Remove partitioning from logging table [software] - 10https://gerrit.wikimedia.org/r/539319 (https://phabricator.wikimedia.org/T233625) (owner: 10Marostegui) [10:37:46] (03CR) 10Marostegui: [C: 03+2] site.pp: Remove db2047 [puppet] - 10https://gerrit.wikimedia.org/r/539507 (https://phabricator.wikimedia.org/T231852) (owner: 10Marostegui) [10:37:56] (03CR) 10Marostegui: [C: 03+2] wmnet: Remove production DNS entries for db2047 [dns] - 10https://gerrit.wikimedia.org/r/539508 (https://phabricator.wikimedia.org/T231852) (owner: 10Marostegui) [10:38:06] elukey: the use case we were discussing with akosiaris was to literally tap into the stream of logs, to see what's in kafka before logstash does [10:38:16] without interfering with logstash of course [10:38:37] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (10Marostegui) [10:39:10] godog: IIRC it should be possible to use kafkacat as regular consumer without cgroup.. that shouldn't mess with any metric/alarm/etc.. 
[10:39:14] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (10Marostegui) a:05RobH→03Papaul Host ready for @Papaul for onsite steps and switch disablement [10:39:18] but no idea about the use vase [10:39:20] *case [10:41:48] it should yeah, that might be the answer [10:41:58] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite200[12] to spares pool - https://phabricator.wikimedia.org/T199321 (10MoritzMuehlenhoff) [10:42:21] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) Very strange, the debian install works in setting up raids and lvm volumes, but fail when installing grub. I noticed that the host has multiple huge disks (4TB each), and they all ge... [10:42:46] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite200[12] to spares pool - https://phabricator.wikimedia.org/T199321 (10MoritzMuehlenhoff) a:05MoritzMuehlenhoff→03RobH These are now ready to be wiped/reclaimed as spares. [10:43:33] 10Operations: Banning IPs / subnets from accessing login/validation endpoint - https://phabricator.wikimedia.org/T233945 (10jbond) Relevant https://apereo.github.io/cas/development/installation/Configuring-Authentication-Throttling.html [10:45:00] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Cmjohnson) @Marostegui the tech couldn’t make it in time yesterday and were scheduled today 1100-1300 local time. [10:45:32] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Marostegui) >>! In T229452#5529113, @Cmjohnson wrote: > @Marostegui the tech couldn’t make it in time yesterday and were scheduled > today 1100-1300 local time. Great!... 
[10:49:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json [10:49:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:19] T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 [10:51:11] 10Operations: Revisit Tomcat deployment of CAS - https://phabricator.wikimedia.org/T233950 (10MoritzMuehlenhoff) Ack, what I meant was using the Tomcat packages as shipped in Debian [10:54:51] 10Operations: CLI tools for CAS administration - https://phabricator.wikimedia.org/T233940 (10jbond) relevent: https://apereo.github.io/cas/6.0.x/monitoring/Monitoring-Statistics.html [10:55:35] (03CR) 10Jcrespo: "> Patch Set 11:" [puppet] - 10https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo) [11:05:46] 10Operations: Integrate CAS into backup infrastructure - https://phabricator.wikimedia.org/T233936 (10jbond) i think ultimately all the none volatile data like u2f registration and audit data should live in one of the [[ https://apereo.github.io/cas/6.0.x/configuration/Configuration-Properties.html#fido-u2f-couc... [11:06:29] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10MoritzMuehlenhoff) >>! In T233142#5529100, @elukey wrote: > Very strange, the debian install works in setting up raids and lvm volumes, but fail when installing grub. I noticed that the host... [11:07:45] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) >>! In T233142#5529150, @MoritzMuehlenhoff wrote: >>>! 
In T233142#5529100, @elukey wrote: >> Very strange, the debian install works in setting up raids and lvm volumes, but fail when... [11:09:17] 10Operations: Create a staging environment for CAS - https://phabricator.wikimedia.org/T233930 (10jbond) As a side note to this, I have been using a [[ https://github.com/b4ldr/cas-overlay-template/tree/6.0 | local overlay ]] with an [[ https://github.com/b4ldr/cas-overlay-template/tree/6.0/docker/ldap | ldap dock... [11:12:36] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) The new recipe seems to have worked! [11:17:54] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 26796 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [11:18:43] (03PS1) 10Jbond: remove TOTP support [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/539512 [11:24:40] (03PS1) 10Mathew.onipe: query_service: properly adapt query_service profile [puppet] - 10https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297) [11:26:10] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/539512 (owner: 10Jbond) [11:29:44] 10Operations, 10Patch-For-Review: Add U2F/FIDO as second factor for CAS - https://phabricator.wikimedia.org/T233937 (10MoritzMuehlenhoff) John and I have discussed next steps on IRC: Initially we'll make U2F opt-in via a memberOf/LDAP check. At a later step we'll add TOTP support (ideally in a way that allows...
[11:32:38] (03PS1) 10Jbond: apereo_cas: configure to used MFA based obn ldap group membership [puppet] - 10https://gerrit.wikimedia.org/r/539515 (https://phabricator.wikimedia.org/T233937) [11:36:16] (03PS2) 10Jbond: apereo_cas: configure to used MFA based obn ldap group membership [puppet] - 10https://gerrit.wikimedia.org/r/539515 (https://phabricator.wikimedia.org/T233937) [11:40:26] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [12:02:33] (03PS1) 10KartikMistry: Enable CX out of beta in Tagalog and Central Bikol Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539517 (https://phabricator.wikimedia.org/T233006) [12:03:29] 10Operations, 10Beta-Cluster-Infrastructure, 10DNS, 10Traffic, and 4 others: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468 (10MoritzMuehlenhoff) I've bui... 
[12:06:16] !log install gnupg2 security update from Buster 10.1 point release [12:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:46] !log reimaging auth2001 to buster [12:14:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:50] (03PS1) 10Elukey: Change partman recipe for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539518 (https://phabricator.wikimedia.org/T233142) [12:15:04] (03CR) 10Alexandros Kosiaris: "https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler-test/273/console PCC happy, merging" [puppet] - 10https://gerrit.wikimedia.org/r/539502 (owner: 10Alexandros Kosiaris) [12:15:22] (03PS3) 10Alexandros Kosiaris: rsyslog::config: Support passing a file mode [puppet] - 10https://gerrit.wikimedia.org/r/539502 [12:15:28] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] rsyslog::config: Support passing a file mode [puppet] - 10https://gerrit.wikimedia.org/r/539502 (owner: 10Alexandros Kosiaris) [12:16:31] (03CR) 10Elukey: [C: 03+2] Change partman recipe for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539518 (https://phabricator.wikimedia.org/T233142) (owner: 10Elukey) [12:16:41] (03PS2) 10Elukey: Change partman recipe for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539518 (https://phabricator.wikimedia.org/T233142) [12:16:43] (03CR) 10Elukey: [V: 03+2 C: 03+2] Change partman recipe for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539518 (https://phabricator.wikimedia.org/T233142) (owner: 10Elukey) [12:17:10] (03PS11) 10Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) [12:17:12] (03PS1) 10Alexandros Kosiaris: rsyslog: Correctly parse docker logs [puppet] - 10https://gerrit.wikimedia.org/r/539519 (https://phabricator.wikimedia.org/T207200) [12:18:26] godog: I split off the contentious part in 
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539519/ so that https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/538627/ is easier to land [12:18:29] mmm I failed to puppet-merge in codfw [12:18:37] maybe a race with you akosiaris ? [12:18:40] elukey: probably [12:18:43] I was merging [12:18:50] what do you mean puppet-merge in codfw? [12:18:58] (03PS1) 10KartikMistry: Update cxserver to 2019-09-26-034732-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/539520 (https://phabricator.wikimedia.org/T233834) [12:19:01] just a few of the backends ? [12:19:12] or you literally ran the command on puppetmaster2001 ? [12:19:14] yes sorry, it failed on rhodium and puppetmaster2001 [12:19:30] nono I ran it on 1001 [12:19:36] I got it too btw [12:19:49] failed on puppetmaster2002, correction [12:20:03] anyway, I am sure that the next puppet merge will fix it [12:20:22] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 23217 MB (4% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [12:22:05] (03CR) 10Alexandros Kosiaris: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [12:22:36] (03PS1) 10Elukey: site.pp: update comments for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539521 [12:22:49] trying with --^ [12:23:58] (03CR) 10Elukey: [C: 03+2] site.pp: update comments for krb2001 [puppet] - 10https://gerrit.wikimedia.org/r/539521 (owner: 10Elukey) [12:25:16] akosiaris: ack, thanks! I'll take a look [12:25:25] (03CR) 10CDanis: [C: 03+1] swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) (owner: 10Filippo Giunchedi) [12:25:34] all good!
[12:30:54] !log installing glib2.0 security updates on Buster [12:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:33:47] elukey: cool, thanks [12:35:51] 10Operations, 10DBA, 10Data-Services, 10cloud-services-team: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (10Marostegui) a:05Marostegui→03None db1124 (sanitarium), db2094 (sanitarium), labsdb1009, 1010, 1011 and 1012 are clean. I have created the `nqowiki_p` o... [12:36:53] (03CR) 10CDanis: "This looks good to me, thanks!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/536586 (https://phabricator.wikimedia.org/T162123) (owner: 10Filippo Giunchedi) [12:37:37] (03CR) 10CDanis: [C: 03+1] swiftrepl: handle empty srcobjects when scanning for dstobjects [software] - 10https://gerrit.wikimedia.org/r/537610 (https://phabricator.wikimedia.org/T231110) (owner: 10Filippo Giunchedi) [12:37:42] !log killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated) [12:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:54] (03CR) 10Alexandros Kosiaris: "> I'm not sure this is something we should deploy globally. 
if a server has a AAAA record then it may be more desirable for the server to " [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) (owner: 10Alexandros Kosiaris) [12:40:49] (03CR) 10CDanis: [C: 03+1] "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) (owner: 10Filippo Giunchedi) [12:41:39] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [12:41:40] (03CR) 10CDanis: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/510139 (owner: 10Jbond) [12:41:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:45] (03PS3) 10Alexandros Kosiaris: add_ip6_mapped: Ignore errors if ip token set fails [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) [12:43:54] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [12:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:40] (03PS1) 10Alexandros Kosiaris: Disable ip6 mapped addresses for ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/539523 (https://phabricator.wikimedia.org/T233906) [12:44:52] (03Abandoned) 10Alexandros Kosiaris: add_ip6_mapped: Ignore errors if ip token set fails [puppet] - 10https://gerrit.wikimedia.org/r/539462 (https://phabricator.wikimedia.org/T233906) (owner: 10Alexandros Kosiaris) [12:47:50] (03PS1) 10Elukey: site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) [12:48:42] (03CR) 10jerkins-bot: [V: 04-1] site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [12:48:50] !log installing openldap security updates on Buster [12:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:40] krb2001.eqiad.wmnet [12:50:43] ok Luca keep going 
[12:50:45] sigh [12:52:21] (03PS2) 10Elukey: site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) [12:52:49] 10Operations, 10media-storage: bring swift eqiad to one zone per row - https://phabricator.wikimedia.org/T138496 (10fgiunchedi) 05Open→03Resolved Row balancing has occurred naturally as we've cycled through hardware [12:52:56] (03PS1) 10Marostegui: db1081: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/539525 [12:53:15] (03CR) 10Muehlenhoff: [C: 03+1] site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [12:54:29] (03CR) 10Marostegui: [C: 03+2] db1081: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/539525 (owner: 10Marostegui) [12:55:51] 10Operations, 10media-storage: High CPU usage from swift-proxy on frontend machines - https://phabricator.wikimedia.org/T156143 (10fgiunchedi) 05Open→03Declined Hasn't reoccurred through multiple depool cycles, in the meantime swift has been upgraded too, declining [12:56:04] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [12:56:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:56] second kerberos node is rising [12:57:44] (03CR) 10Filippo Giunchedi: [C: 03+2] swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) (owner: 10Filippo Giunchedi) [12:57:52] (03PS3) 10Filippo Giunchedi: swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) [12:58:16] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [12:58:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:27] (03PS2) 10Alexandros Kosiaris: Disable ip6 mapped addresses for ganeti hosts 
[puppet] - 10https://gerrit.wikimedia.org/r/539523 (https://phabricator.wikimedia.org/T233906) [12:58:34] (03PS1) 10Marostegui: db2113: Specify it is the candidate master for s5 [puppet] - 10https://gerrit.wikimedia.org/r/539527 [12:58:39] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Disable ip6 mapped addresses for ganeti hosts [puppet] - 10https://gerrit.wikimedia.org/r/539523 (https://phabricator.wikimedia.org/T233906) (owner: 10Alexandros Kosiaris) [13:01:00] (03PS4) 10Filippo Giunchedi: swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) [13:01:05] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] swift: introduce servers_per_port for object-server [puppet] - 10https://gerrit.wikimedia.org/r/539504 (https://phabricator.wikimedia.org/T222366) (owner: 10Filippo Giunchedi) [13:01:58] (03PS2) 10Marostegui: db2113: Specify it is the candidate master for s5 [puppet] - 10https://gerrit.wikimedia.org/r/539527 [13:02:24] * godog body-checked marostegui [13:02:37] sorry ! 
[13:02:51] hahaha [13:03:02] !log Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7 [13:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:26] (03CR) 10Marostegui: [C: 03+2] db2113: Specify it is the candidate master for s5 [puppet] - 10https://gerrit.wikimedia.org/r/539527 (owner: 10Marostegui) [13:06:48] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [13:07:58] (03CR) 10Jhedden: "> Patch Set 7:" [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:08:02] 10Operations, 10media-storage: ms-be1034 crash - https://phabricator.wikimedia.org/T214838 (10fgiunchedi) 05Open→03Declined Will be done as part of {T141756}, resolving [13:08:18] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [13:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:30] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [13:08:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:34] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `dbproxy1005.eqiad.wmnet` - dbproxy1005.eqiad.wmnet (**PASS**) - Downt... 
[13:09:06] !log reboot ganeti2001 T233906 [13:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:10] T233906: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 [13:09:52] (03PS1) 10Marostegui: wmnet: Remove DNS entries for dbproxy1005 [dns] - 10https://gerrit.wikimedia.org/r/539528 (https://phabricator.wikimedia.org/T231967) [13:10:00] PROBLEM - Host ganeti2001 is DOWN: PING CRITICAL - Packet loss = 100% [13:10:16] RECOVERY - Host ganeti2001 is UP: PING WARNING - Packet loss = 66%, RTA = 186.37 ms [13:10:38] (03PS1) 10Marostegui: site.pp: Remove references to dbproxy1005 [puppet] - 10https://gerrit.wikimedia.org/r/539530 (https://phabricator.wikimedia.org/T231967) [13:11:10] RECOVERY - Check whether ferm is active by checking the default input chain on ganeti2001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:11:14] RECOVERY - Check systemd state on ganeti2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:11:21] 10Operations, 10Analytics, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) ` elukey@krb2001:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 32G 0 32G 0% /dev tmpfs 6.3G... 
[13:11:39] (03CR) 10Marostegui: [C: 03+2] site.pp: Remove references to dbproxy1005 [puppet] - 10https://gerrit.wikimedia.org/r/539530 (https://phabricator.wikimedia.org/T231967) (owner: 10Marostegui) [13:11:44] (03CR) 10Marostegui: [C: 03+2] wmnet: Remove DNS entries for dbproxy1005 [dns] - 10https://gerrit.wikimedia.org/r/539528 (https://phabricator.wikimedia.org/T231967) (owner: 10Marostegui) [13:12:24] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10Marostegui) a:05RobH→03Cmjohnson [13:12:38] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission dbproxy1005.eqiad.wmnet - https://phabricator.wikimedia.org/T231967 (10Marostegui) Host ready for switch disablement and on-site steps [13:14:19] (03PS3) 10Effie Mouzeli: noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) [13:14:21] (03CR) 10Filippo Giunchedi: swift: add swiftrepl (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/536586 (https://phabricator.wikimedia.org/T162123) (owner: 10Filippo Giunchedi) [13:14:31] (03PS10) 10Filippo Giunchedi: swift: add swiftrepl [puppet] - 10https://gerrit.wikimedia.org/r/536586 (https://phabricator.wikimedia.org/T162123) [13:14:33] (03PS8) 10Filippo Giunchedi: WIP: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613 [13:14:36] 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) [13:14:48] (03CR) 10jerkins-bot: [V: 04-1] noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [13:15:03] 10Operations: Broken network connection on ganeti2001 after reboot - https://phabricator.wikimedia.org/T233906 (10akosiaris) 05Open→03Resolved We 've sidestepped the problem for now by disabling ip6 mapped 
addresses for ganeti hosts. This solves the chicken-and-egg problem, although we should arguably find a way... [13:15:24] moritzm: ^ this should unblock you for now [13:15:59] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] swiftrepl: handle empty srcobjects when scanning for dstobjects [software] - 10https://gerrit.wikimedia.org/r/537610 (https://phabricator.wikimedia.org/T231110) (owner: 10Filippo Giunchedi) [13:16:09] ack, saw that, thanks. will resume codfw reboots on Monday [13:16:15] (03CR) 10jerkins-bot: [V: 04-1] WIP: turn on swiftrepl on swift frontends [puppet] - 10https://gerrit.wikimedia.org/r/537613 (owner: 10Filippo Giunchedi) [13:16:20] !log reimaging auth1002 to buster [13:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:36] (03PS1) 10Elukey: Add AAAA/PTR IPv6 records for kerb2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/539532 (https://phabricator.wikimedia.org/T233142) [13:18:39] (03PS2) 10Elukey: Add AAAA/PTR IPv6 records for kerb2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/539532 (https://phabricator.wikimedia.org/T233142) [13:18:54] moritzm: is it ok to run dns-update now or better wait? 
[13:19:52] 10Operations, 10Analytics, 10Patch-For-Review, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) 05Open→03Resolved [13:19:55] 10Operations, 10Analytics, 10hardware-requests, 10User-Elukey: codfw: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227425 (10elukey) [13:19:57] 10Operations: Integrate Buster 10.1 point update - https://phabricator.wikimedia.org/T232310 (10MoritzMuehlenhoff) [13:20:32] (03PS3) 10Elukey: site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) [13:21:38] (03CR) 10Jhedden: openstack: add designate API ferm rules to haproxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:21:50] (03CR) 10Elukey: [C: 03+2] site.pp: add role::kerberos::kdc to kdc2001 [puppet] - 10https://gerrit.wikimedia.org/r/539524 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [13:23:54] (03PS1) 10Jhedden: openstack: add designate ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539533 (https://phabricator.wikimedia.org/T223907) [13:24:47] (03CR) 10Jhedden: "Proposing an alternate to https://gerrit.wikimedia.org/r/c/operations/puppet/+/539421" [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:25:23] (03CR) 10Jhedden: "proposing an alternate method to https://gerrit.wikimedia.org/r/c/operations/puppet/+/539421" [puppet] - 10https://gerrit.wikimedia.org/r/539533 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:27:59] (03CR) 10Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1002/18647/" [puppet] - 10https://gerrit.wikimedia.org/r/539533 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:29:02] elukey: seems fine [13:29:42] !log jmm@cumin1001 START - Cookbook 
sre.hosts.downtime [13:29:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:55] !log jmm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:31:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:53] (03PS4) 10Effie Mouzeli: noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) [13:33:08] (03Abandoned) 10Jhedden: openstack: add designate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539421 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:33:52] !log Set candidate masters in dbctl T234039 [13:33:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:55] T234039: Add "candidate master and sanitarium master" on dbctl for the candidate masters and sanitarium masters in eqiad/codfw - https://phabricator.wikimedia.org/T234039 [13:35:55] PROBLEM - Check systemd state on krb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:36:01] (03CR) 10Effie Mouzeli: [V: 03+1] "OK https://puppet-compiler.wmflabs.org/compiler1001/18649/mwmaint1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [13:36:07] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] noc: switch catch_all to php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/539459 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [13:43:44] (03Abandoned) 10Jhedden: openstack: add designate ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539533 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [13:45:20] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234044 (10ops-monitoring-bot) [13:47:04] (03PS1) 10Filippo Giunchedi: swift: open per-port object server ports [puppet] - 10https://gerrit.wikimedia.org/r/539535 (https://phabricator.wikimedia.org/T162123) [13:51:13] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (10Papaul) ` papaul@asw-c-codfw# show | compare [edit interfaces interface-range vlan-private1-c-codfw] - member ge-6/0/16; [edit interfaces interface-range disabled] me... 
[13:51:52] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (10Papaul) [13:56:01] (03CR) 10Giuseppe Lavagetto: [C: 03+1] mediawiki: switch search.wikimedia.org to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539465 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [13:56:42] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Update a comment, then LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [14:02:19] 10Operations, 10ops-eqiad, 10DBA: db1074 crashed: Broken BBU - https://phabricator.wikimedia.org/T231638 (10Jclark-ctr) [14:04:16] (03PS3) 10Effie Mouzeli: mediawiki: switch wwwportals to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) [14:08:50] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [14:09:18] 10Operations, 10DC-Ops, 10decommission: decommission elastic1017 - https://phabricator.wikimedia.org/T234045 (10MoritzMuehlenhoff) a:03Gehel [14:09:41] 10Operations, 10DC-Ops, 10decommission: decommission elastic1017 - https://phabricator.wikimedia.org/T234045 (10MoritzMuehlenhoff) [14:10:30] 10Operations: Migrate ldap/corp replicas to Stretch/Buster - https://phabricator.wikimedia.org/T224557 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [14:17:43] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10elukey) 05Open→03Resolved ` elukey@asw2-a-eqiad> show ethernet-switching interface xe-4/0/37 Routing Ins... 
[14:19:50] (03CR) 10Elukey: [C: 03+2] Add AAAA/PTR IPv6 records for kerb2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/539532 (https://phabricator.wikimedia.org/T233142) (owner: 10Elukey) [14:22:38] 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234044 (10aborrero) [14:22:41] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234018 (10aborrero) [14:26:40] 10Operations, 10netops: Extend firewall rules for new corp LDAP replicas - https://phabricator.wikimedia.org/T234047 (10MoritzMuehlenhoff) [14:27:34] 10Operations, 10ops-eqiad: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet - https://phabricator.wikimedia.org/T232367 (10Jclark-ctr) @fgiunchedi 3 backend systems to replace ms-be101[6-8] [14:28:54] (03PS2) 10Arturo Borrero Gonzalez: toolforge: k8s: haproxy: add proxy redirection for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/527544 (https://phabricator.wikimedia.org/T228500) [14:29:12] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Sorry, I got so deep in rsyslog hell, that I missed the ping. I 've had a look at PCC (https://puppet-compiler.wmflabs.org/compiler1001/18" [puppet] - 10https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: 10Jcrespo) [14:30:26] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, and thanks for splitting the json parsing part out. 
I'm +1 on this on the condition that we merge on Monday" [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [14:30:48] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T234018 (10aborrero) [14:30:53] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10aborrero) [14:31:05] (03CR) 10Filippo Giunchedi: rsyslog: Correctly parse docker logs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/539519 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [14:31:29] (03CR) 10Alexandros Kosiaris: "Sure thing!" [puppet] - 10https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [14:32:06] (03PS1) 10Effie Mouzeli: WIP: mediawiki: remove cleanup apache configs from hhvm [puppet] - 10https://gerrit.wikimedia.org/r/539541 (https://phabricator.wikimedia.org/T229792) [14:32:46] (03CR) 10Filippo Giunchedi: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/539342 (https://phabricator.wikimedia.org/T233956) (owner: 10Filippo Giunchedi) [14:32:58] !log Disable puppet and reload apache on mw* for 539465 and 539488 - T229792 [14:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:02] T229792: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 [14:33:03] (03CR) 10Alexandros Kosiaris: "> It is true that logstash's configuration is complex, at the advantage of centralizing application-specific logic and IMHO it is more eas" [puppet] - 10https://gerrit.wikimedia.org/r/539519 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [14:34:17] (03PS1) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 
10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [14:37:20] (03CR) 10Jhedden: "Compiler results: https://puppet-compiler.wmflabs.org/compiler1002/18654/" [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [14:37:53] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/18653/" [puppet] - 10https://gerrit.wikimedia.org/r/539535 (https://phabricator.wikimedia.org/T162123) (owner: 10Filippo Giunchedi) [14:39:50] !log installing postgresql-common bugfix update from Buster 10.1 point release [14:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:42] (03CR) 10Effie Mouzeli: [V: 03+1] "OK https://puppet-compiler.wmflabs.org/compiler1001/18655/mw1333.eqiad.wmnet/Oj" [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [14:42:52] 10Operations: Integrate Buster 10.1 point update - https://phabricator.wikimedia.org/T232310 (10MoritzMuehlenhoff) [14:45:17] !log installing ncurses bugfix update from Buster 10.1 point release [14:45:18] (03CR) 10Andrew Bogott: [C: 03+1] openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [14:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:08] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki: switch search.wikimedia.org to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539465 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [14:46:18] (03PS3) 10Effie Mouzeli: mediawiki: switch search.wikimedia.org to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539465 (https://phabricator.wikimedia.org/T229792) [14:46:33] (03PS2) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 
(https://phabricator.wikimedia.org/T223907) [14:47:49] (03PS3) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [14:47:51] (03CR) 10CDanis: [C: 03+1] swift: open per-port object server ports [puppet] - 10https://gerrit.wikimedia.org/r/539535 (https://phabricator.wikimedia.org/T162123) (owner: 10Filippo Giunchedi) [14:50:35] (03CR) 10Jhedden: [C: 03+2] openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [14:50:57] (03PS4) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [14:51:15] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] mediawiki: switch wwwportals to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [14:51:24] (03PS4) 10Effie Mouzeli: mediawiki: switch wwwportals to PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/539488 (https://phabricator.wikimedia.org/T229792) [14:59:00] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [14:59:13] (03PS1) 10Elukey: Enable kerberos replication on krb[12]001 [puppet] - 10https://gerrit.wikimedia.org/r/539546 (https://phabricator.wikimedia.org/T226089) [14:59:25] RECOVERY - Check systemd state on krb2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:00:00] (03PS5) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [15:00:02] (03CR) 10Elukey: [C: 03+2] Enable kerberos replication on krb[12]001 
[puppet] - 10https://gerrit.wikimedia.org/r/539546 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [15:00:33] (03PS3) 10Arturo Borrero Gonzalez: toolforge: k8s: haproxy: add proxy redirection for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/527544 (https://phabricator.wikimedia.org/T228500) [15:01:50] (03CR) 10Arturo Borrero Gonzalez: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/527544 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [15:02:28] !log installing usb.ids update from Buster 10.1 point release [15:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:43] (03PS4) 10Arturo Borrero Gonzalez: toolforge: k8s: haproxy: add proxy redirection for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/527544 (https://phabricator.wikimedia.org/T228500) [15:04:34] (03PS6) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [15:06:07] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: haproxy: add proxy redirection for nginx-ingress [puppet] - 10https://gerrit.wikimedia.org/r/527544 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [15:06:45] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [15:08:57] (03PS7) 10Jhedden: openstack: add desginate API ferm rules to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/539542 (https://phabricator.wikimedia.org/T223907) [15:15:17] (03PS1) 10Volans: Refactor execute() into Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539550 [15:15:19] (03PS1) 10Volans: Add commit action to the Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539551 [15:18:31] (03CR) 10jerkins-bot: [V: 04-1] Refactor execute() into Homer class [software/homer] - 
10https://gerrit.wikimedia.org/r/539550 (owner: 10Volans) [15:20:52] (03PS1) 10Strainu: [rowiki] Enable 'deleterevision' for patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539553 (https://phabricator.wikimedia.org/T234051) [15:22:06] (03PS1) 10Andrew Bogott: wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 [15:26:32] (03PS1) 10Jhedden: openstack: update newton keystone apache log format [puppet] - 10https://gerrit.wikimedia.org/r/539556 [15:26:58] 10Operations: Integrate Buster 10.1 point update - https://phabricator.wikimedia.org/T232310 (10MoritzMuehlenhoff) [15:28:27] (03CR) 10Jhedden: [C: 03+2] openstack: update newton keystone apache log format [puppet] - 10https://gerrit.wikimedia.org/r/539556 (owner: 10Jhedden) [15:30:18] (03CR) 10Andrew Bogott: [C: 03+2] wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 (owner: 10Andrew Bogott) [15:30:40] (03PS2) 10Andrew Bogott: wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 [15:32:45] (03CR) 10Ayounsi: [C: 03+1] prospector: disable McCabe complexity check [cookbooks] - 10https://gerrit.wikimedia.org/r/539500 (owner: 10Volans) [15:33:55] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 (owner: 10Andrew Bogott) [15:34:39] !log update pcc facts to add new hosts [15:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:50] PROBLEM - Disk space on ms-be2053 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.16.73: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2053&var-datasource=codfw+prometheus/ops [15:51:52] RECOVERY - Disk space on ms-be2053 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2053&var-datasource=codfw+prometheus/ops [15:54:56] (03PS9) 10Ayounsi: Add cookbook to update Sentry PDUs passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) [15:56:57] (03CR) 10jerkins-bot: [V: 04-1] Add cookbook to update Sentry PDUs passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) (owner: 10Ayounsi) [15:57:46] PROBLEM - Host db1114.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:01:05] cmjohnson1: ^ [16:01:14] jclark-ctr: ^ [16:01:40] !log delete BGP to AS34305 on cr2-esams [16:01:41] https://phabricator.wikimedia.org/T229452 [16:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:24] @xionox host was powered off, Dell tech working on host [16:03:25] thx, please downtime alerts before taking a host down (or working on it) /cc marostegui :) [16:04:19] ACKNOWLEDGEMENT - Host db1114.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Ayounsi https://phabricator.wikimedia.org/T229452 [16:06:19] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [16:06:23] (03CR) 10Herron: "> > It is true that logstash's configuration is complex, at the" [puppet] - 10https://gerrit.wikimedia.org/r/539519 (https://phabricator.wikimedia.org/T207200) (owner: 10Alexandros Kosiaris) [16:09:19] (03PS1) 10Elukey: profile::kerberos::replication: test [puppet] - 10https://gerrit.wikimedia.org/r/539565 [16:12:31] (03PS2) 10Elukey: profile::kerberos::replication: test [puppet] - 10https://gerrit.wikimedia.org/r/539565 [16:27:18] (03PS3) 10Elukey: profile::kerberos::replication: test [puppet] - 10https://gerrit.wikimedia.org/r/539565 [16:31:14] RECOVERY - Host db1114.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.20 ms [16:31:27] (03PS4) 10Elukey: profile::kerberos::replication: 
test [puppet] - 10https://gerrit.wikimedia.org/r/539565 [16:33:44] 10Operations, 10ops-codfw: (OoW) wtp2020: correctable memory errors - https://phabricator.wikimedia.org/T205712 (10ayounsi) 05Resolved→03Open This is alerting again: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wtp2020&service=Memory+correctable+errors+-EDAC- [16:34:53] 10Operations, 10Mail, 10Traffic: Set up basic email infra for w.wiki domain - https://phabricator.wikimedia.org/T216172 (10BBlack) Ping @herron can we move on this? Any current blockers? [16:35:54] XioNoX: the host db1114 was downtimed, but not host db1114.mgmt which is a different item in icinga [16:36:08] yep [16:36:22] no big deal :) [16:41:01] 10Operations, 10netops: Instability of the Level3 link between cr2-eqiad and cr2-esams - https://phabricator.wikimedia.org/T228827 (10ayounsi) Another one (scheduled as 17144179) 2019-09-26 23:32:28 xe-4/1/3 ifOperStatus: down -> up 2019-09-26 22:12:28 xe-4/1/3 ifOperStatus: up -> down [16:41:44] (03PS5) 10Elukey: profile::kerberos::replication: test [puppet] - 10https://gerrit.wikimedia.org/r/539565 [16:46:41] (03Abandoned) 10Elukey: profile::kerberos::replication: test [puppet] - 10https://gerrit.wikimedia.org/r/539565 (owner: 10Elukey) [16:48:02] 10Operations, 10netops: Extend firewall rules for new corp LDAP replicas - https://phabricator.wikimedia.org/T234047 (10ayounsi) There is only a mention of dubnium.wikimedia.org (208.80.154.13) in the analytics firewall filter. If that task is for network devices only, feel free to close it. If it's for all ty... 
[16:48:53] (03PS1) 10Elukey: profile::kerberos::replication: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/539576 [16:50:06] 10Operations, 10Traffic: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 (10BBlack) 05Open→03Resolved [16:50:29] (03PS2) 10Elukey: profile::kerberos::replication: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/539576 [16:52:05] (03Abandoned) 10Lucas Werkmeister (WMDE): wgWBQualityConstraintsCacheCheckConstraintsResults true on testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471001 (https://phabricator.wikimedia.org/T204031) (owner: 10Addshore) [16:53:07] (03CR) 10Elukey: [C: 03+2] profile::kerberos::replication: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/539576 (owner: 10Elukey) [16:56:26] (03CR) 10Ayounsi: [C: 03+2] "Thx!" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) (owner: 10Ayounsi) [16:57:56] (03PS1) 10Herron: exim: add w.wiki to wikimedia_domains [puppet] - 10https://gerrit.wikimedia.org/r/539579 (https://phabricator.wikimedia.org/T216172) [16:58:32] (03CR) 10jerkins-bot: [V: 04-1] Add cookbook to update Sentry PDUs passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/537486 (https://phabricator.wikimedia.org/T233053) (owner: 10Ayounsi) [16:59:24] (03PS2) 10Herron: exim: add w.wiki to wikimedia_domains [puppet] - 10https://gerrit.wikimedia.org/r/539579 (https://phabricator.wikimedia.org/T216172) [17:00:05] (03PS1) 10Elukey: profile::kerberos::replication: fix replicate_krb_database script [puppet] - 10https://gerrit.wikimedia.org/r/539580 [17:03:06] (03PS2) 10Elukey: profile::kerberos::replication: fix replicate_krb_database script [puppet] - 10https://gerrit.wikimedia.org/r/539580 (https://phabricator.wikimedia.org/T226089) [17:03:16] (03CR) 10Herron: [C: 03+2] exim: add w.wiki to wikimedia_domains [puppet] - 10https://gerrit.wikimedia.org/r/539579 (https://phabricator.wikimedia.org/T216172) 
(owner: 10Herron) [17:05:12] PROBLEM - Check systemd state on krb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:06:19] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [17:08:58] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10RobH) p:05Triage→03Normal [17:09:02] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10RobH) p:05Triage→03Normal [17:09:15] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10RobH) [17:09:27] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10RobH) [17:09:44] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10RobH) [17:09:49] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10RobH) [17:14:11] (03CR) 10Urbanecm: [C: 04-2] "Do not merge without approval from Legal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539553 (https://phabricator.wikimedia.org/T234051) (owner: 10Strainu) [17:16:17] 10Operations, 10Mail, 10Traffic: Set up basic email infra for w.wiki domain - https://phabricator.wikimedia.org/T216172 (10herron) 05Open→03Resolved a:03herron Thanks for the ping/reminder! Basic aliasing for w.wiki has been deployed and successfully tested. [17:17:01] 10Operations, 10Mail, 10Traffic: Set up basic email infra for w.wiki domain - https://phabricator.wikimedia.org/T216172 (10BBlack) Awesome, thank you! 
[17:18:47] (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: ingress: make the nginx-ingress's nginx listen in 8080/tcp [puppet] - 10https://gerrit.wikimedia.org/r/539583 (https://phabricator.wikimedia.org/T228500) [17:21:45] (03CR) 10Phamhi: [C: 03+2] toolforge: k8s: ingress: make the nginx-ingress's nginx listen in 8080/tcp [puppet] - 10https://gerrit.wikimedia.org/r/539583 (https://phabricator.wikimedia.org/T228500) (owner: 10Arturo Borrero Gonzalez) [17:29:49] (03PS6) 10Lucas Werkmeister (WMDE): dologmsg: add manpage [puppet] - 10https://gerrit.wikimedia.org/r/513759 (https://phabricator.wikimedia.org/T222244) [17:30:07] 10Operations, 10ops-eqiad, 10serviceops: mw1286.mgmt is down - https://phabricator.wikimedia.org/T234009 (10Dzahn) Just tried to ssh to it now and it works for me. I get to login. [17:30:15] (03PS4) 10Lucas Werkmeister (WMDE): dologmsg: fix variable [puppet] - 10https://gerrit.wikimedia.org/r/511750 [17:31:13] (03CR) 10Ayounsi: "> Patch Set 5:" [puppet] - 10https://gerrit.wikimedia.org/r/539453 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [17:32:08] (03PS2) 10Lucas Werkmeister (WMDE): exec watch in fatalmonitor [puppet] - 10https://gerrit.wikimedia.org/r/499761 [17:34:23] (03CR) 10Ayounsi: "> Patch Set 9: Code-Review+1" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/539192 (owner: 10Ayounsi) [17:45:11] 10Operations, 10ops-eqiad: rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (10RobH) p:05Triage→03Normal [17:45:18] 10Operations, 10ops-eqiad: rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (10RobH) [17:46:26] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [17:46:47] 10Operations, 10ops-eqiad: rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (10RobH) [17:51:04] 10Operations, 10Analytics, 
10Fundraising-Backlog, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10Nuria) @jrobell both need phabricator accounts and ldap accounts (via creating a user in wikitech) o... [17:51:18] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2054 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.184: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [17:51:18] PROBLEM - dhclient process on ms-be2054 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.184: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [17:51:45] 10Operations, 10ops-eqiad, 10serviceops: mw1286.mgmt is down - https://phabricator.wikimedia.org/T234009 (10Dzahn) p:05Triage→03Normal @Cmjohnson This seems to be flapping. Sometimes it works and sometimes it doesn't. Could you check for loose cable and/or switch port, maybe just reconnecting it will do it. 
[17:52:08] papaul: ms-be2054 new installs, right [17:52:15] 10Operations, 10ops-eqiad: rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (10RobH) [17:52:29] mutante: yes [17:52:33] working on it [17:52:47] thanks [17:53:14] PROBLEM - DPKG on ms-be2054 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.184: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [17:53:14] PROBLEM - puppet last run on ms-be2054 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.184: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [17:54:38] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2054 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [17:54:38] RECOVERY - dhclient process on ms-be2054 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [17:55:32] RECOVERY - DPKG on ms-be2054 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [17:58:28] RECOVERY - puppet last run on ms-be2054 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [18:00:48] RECOVERY - Check systemd state on krb1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:04:21] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [18:04:45] (03PS1) 10Jhedden: openstack: update designate config for newton release [puppet] - 10https://gerrit.wikimedia.org/r/539594 (https://phabricator.wikimedia.org/T223907) [18:06:56] (03CR) 10Jhedden: [C: 03+2] openstack: update designate config for newton release [puppet] - 10https://gerrit.wikimedia.org/r/539594 
(https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [18:17:26] !log mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063) [18:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:31] T234063: Free up space on mwdebug* - https://phabricator.wikimedia.org/T234063 [18:19:19] 10Operations, 10Analytics, 10Fundraising-Backlog, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10DStrine) Their phab accounts are @EYener and @jkumalah I have a wiktech account but I have forgotten... [18:22:46] 10Operations, 10Analytics, 10Fundraising-Backlog, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10Nuria) @DStrine : just creating a user/password on https://wikitech.wikimedia.org/wiki/Main_Page is enough [18:22:56] !log mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063) [18:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:00] T234063: Free up space on mwdebug* - https://phabricator.wikimedia.org/T234063 [18:29:19] 10Operations, 10netops: configure BGP route damping on IX sessions - https://phabricator.wikimedia.org/T222424 (10ayounsi) Maybe @jbond too! 
[18:30:45] 10Operations: Extend firewall rules for new corp LDAP replicas - https://phabricator.wikimedia.org/T234047 (10ayounsi) [18:34:17] 10Operations: Extend firewall rules for new corp LDAP replicas - https://phabricator.wikimedia.org/T234047 (10herron) p:05Triage→03Normal [18:44:26] (03PS1) 10Jhedden: openstack: fix nova standby host in designate ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/539603 [18:46:58] (03CR) 10Jhedden: [C: 03+2] openstack: fix nova standby host in designate ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/539603 (owner: 10Jhedden) [18:51:11] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw [18:51:29] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad [19:16:55] 10Operations, 10Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (10herron) Hello, yes generally speaking based upon the production mail logs I am seeing mail from lawroom.com being accepted and sent onwards to google for final delivery. Messages from this domain appears to... 
[19:17:07] 10Operations, 10Mail: Vendor's Emails Not Coming Through - https://phabricator.wikimedia.org/T233991 (10herron) p:05Triage→03Normal [19:17:35] 10Operations, 10OTRS, 10Office-IT, 10Wikimedia-Mailing-lists: Convert glam@wikimedia.org OTRS into a Google Group - https://phabricator.wikimedia.org/T233843 (10herron) p:05Triage→03Normal [19:19:41] 10Operations, 10OTRS, 10Office-IT: Convert glam@wikimedia.org OTRS into a Google Group - https://phabricator.wikimedia.org/T233843 (10Dzahn) [19:19:46] 10Operations, 10Puppet, 10Traffic: Puppet systemd::mask is an anti pattern that has unwanted side effect - https://phabricator.wikimedia.org/T233839 (10herron) p:05Triage→03Normal [19:20:03] 10Operations, 10Wikimedia-Mailing-lists: disable WMFSF, keep archives - https://phabricator.wikimedia.org/T233883 (10herron) p:05Triage→03Normal [19:35:25] (03CR) 10DannyS712: [C: 04-1] "Per task - legal approval needed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/539553 (https://phabricator.wikimedia.org/T234051) (owner: 10Strainu) [19:43:00] (03PS3) 10Andrew Bogott: wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 [19:43:02] (03PS1) 10Andrew Bogott: memcached: add the (now required!) notes_url arg for nrpe_monitor [puppet] - 10https://gerrit.wikimedia.org/r/539613 [19:43:54] (03CR) 10jerkins-bot: [V: 04-1] memcached: add the (now required!) notes_url arg for nrpe_monitor [puppet] - 10https://gerrit.wikimedia.org/r/539613 (owner: 10Andrew Bogott) [19:50:55] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw [19:51:13] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad [20:00:59] (03PS4) 10Andrew Bogott: wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 [20:04:30] (03PS2) 10Andrew Bogott: memcached: add the (now required!) notes_url arg for nrpe_monitor [puppet] - 10https://gerrit.wikimedia.org/r/539613 [20:04:32] (03PS5) 10Andrew Bogott: wmcs postgres: make 'includes' an array [puppet] - 10https://gerrit.wikimedia.org/r/539554 [20:05:42] (03CR) 10Andrew Bogott: [C: 03+2] memcached: add the (now required!) notes_url arg for nrpe_monitor [puppet] - 10https://gerrit.wikimedia.org/r/539613 (owner: 10Andrew Bogott) [20:14:43] 10Operations, 10ops-eqiad, 10DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (10Marostegui) I see this host is back up, so I guess the mainboard has been replaced? [20:18:30] 10Operations, 10Phabricator, 10Traffic, 10Release-Engineering-Team (Development services), and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (10greg) >>! In T226044#5527864, @CDanis wrote: > Just curious -- what's the expected timefr... [20:18:57] 10Operations, 10OTRS, 10Office-IT: Convert glam@wikimedia.org OTRS into a Google Group - https://phabricator.wikimedia.org/T233843 (10Emufarmers) No objection from the OTRS admins. It's always a little sad when we move things to Google Groups, as a matter of both using non-free software and having data stor... 
[20:25:19] 10Operations, 10Cloud-VPS, 10User-fgiunchedi, 10cloud-services-team (Kanban): CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (10Andrew) [20:40:58] (03PS1) 10Paladox: letsencrypt: Sync acme-tiny script from upstream [puppet] - 10https://gerrit.wikimedia.org/r/539618 [20:41:39] (03PS2) 10Paladox: letsencrypt: Sync acme-tiny script from upstream [puppet] - 10https://gerrit.wikimedia.org/r/539618 [20:50:14] (03PS3) 10Paladox: letsencrypt: Sync acme-tiny script from upstream [puppet] - 10https://gerrit.wikimedia.org/r/539618 [20:52:07] 10Operations, 10Puppet, 10Traffic, 10serviceops: Puppet systemd::mask is an anti pattern that has unwanted side effect - https://phabricator.wikimedia.org/T233839 (10Dzahn) [21:05:13] 10Operations, 10Wikimedia-Logstash, 10observability, 10serviceops: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10herron) These logs appear to be nesting the message field inside the exception field, and the message field at the r... [21:05:36] (03PS1) 10Herron: logstash: parse nested JSON in php7.2-fpm exception field [puppet] - 10https://gerrit.wikimedia.org/r/539621 (https://phabricator.wikimedia.org/T233828) [21:11:10] (03PS1) 10Herron: logstash: if php7.2-fpm message field is empty, use exception.message [puppet] - 10https://gerrit.wikimedia.org/r/539623 (https://phabricator.wikimedia.org/T233828) [21:15:05] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10Wikimedia-Incident: Logstash pipeline crashes on non-UTF8 log messages. - https://phabricator.wikimedia.org/T233662 (10ayounsi) [21:17:45] 10Operations, 10Wikimedia-Logstash, 10observability, 10serviceops, 10Patch-For-Review: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10herron) A few ideas to address this: Parse the nested exception field into the root of the lo... 
[21:21:19] 10Operations, 10Wikimedia-Logstash, 10observability, 10serviceops, 10Patch-For-Review: Errors managed by php-wmerrors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10Krinkle) [21:24:42] 10Operations, 10Wikimedia-Logstash, 10observability, 10serviceops, 10Patch-For-Review: Errors managed by php-wmerrors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (10Krinkle) I think this should be fixed at the source in [puppet: php7-fatal-error.php](https:... [21:25:51] (03PS1) 10Paladox: letsencrypt: Fix acme-setup script [puppet] - 10https://gerrit.wikimedia.org/r/539625 [21:27:55] (03PS2) 10Paladox: letsencrypt: Fix acme-setup script [puppet] - 10https://gerrit.wikimedia.org/r/539625 [21:37:35] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [21:38:03] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:43:23] (03PS2) 10Volans: Refactor execute() into Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539550 [21:43:25] (03PS2) 10Volans: Add commit action to the Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539551 [21:47:13] 08Warning Alert for device cr2-eqsin.wikimedia.org - Traffic on tunnel link [21:49:09] bblack ^ emergency Telia maintenance on in eqsin-codfw link, keep using the tunnel or depool eqsin? [21:52:38] (03PS3) 10Volans: Add commit action to the Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539551 [21:57:32] (03PS3) 10Paladox: letsencrypt: Fix acme-setup script [puppet] - 10https://gerrit.wikimedia.org/r/539625 [22:03:55] !log webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. 
names [22:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:20] (03PS4) 10Dzahn: ATS: switch webperf backends to TLS and discovery name [puppet] - 10https://gerrit.wikimedia.org/r/535929 (https://phabricator.wikimedia.org/T210411) [22:04:56] (03CR) 10Dzahn: [C: 03+1] "[webperf1001:~] $ curl https://performance.discovery.wmnet works now" [puppet] - 10https://gerrit.wikimedia.org/r/535929 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn) [22:07:52] (03CR) 10Dzahn: [C: 03+2] ATS: switch webperf backends to TLS and discovery name [puppet] - 10https://gerrit.wikimedia.org/r/535929 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn) [22:08:33] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:09:01] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:10:16] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn) [22:13:05] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn) >>! In T210411#5496180, @Vgutierrez wrote: > Please note that the docker-registry certificate is missing the public hostname: `docker-registry.wikimedia.org` Per I... [22:13:29] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn) https://performance.wikimedia.org switch to https://performance.discovery.wmnet as backend. [22:14:41] (03CR) 10Ayounsi: [C: 03+2] "LGTM & tested." 
[software/homer] - 10https://gerrit.wikimedia.org/r/539550 (owner: 10Volans) [22:15:26] that was brief [22:17:36] (03Merged) 10jenkins-bot: Refactor execute() into Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539550 (owner: 10Volans) [22:18:39] (03CR) 10jenkins-bot: Refactor execute() into Homer class [software/homer] - 10https://gerrit.wikimedia.org/r/539550 (owner: 10Volans) [22:20:00] (03PS13) 10Jeena Huneidi: Add restbase chart (port from local-charts) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T228910) [22:21:42] (03PS1) 10Dzahn: mediawiki::maintenance: add envoy for TLS termination for noc.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/539633 (https://phabricator.wikimedia.org/T210411) [22:22:13] 08̶W̶a̶r̶n̶i̶n̶g Device cr2-eqsin.wikimedia.org recovered from Traffic on tunnel link [22:23:28] (03CR) 10Jeena Huneidi: Add restbase chart (port from local-charts) (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T228910) (owner: 10Jeena Huneidi) [22:24:54] (03PS14) 10Jeena Huneidi: Add restbase chart (port from local-charts) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T228910) [22:25:41] (03PS1) 10Dzahn: site: merge mwmaint servers into a single stanza [puppet] - 10https://gerrit.wikimedia.org/r/539634 [22:26:56] (03PS1) 10Dzahn: add mwmaint.discovery and point to mwmaint1002 [dns] - 10https://gerrit.wikimedia.org/r/539635 (https://phabricator.wikimedia.org/T210411) [22:32:39] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) [22:34:26] 10Operations, 10ops-codfw, 10media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10Papaul) a:05Papaul→03fgiunchedi @fgiunchedi all yours [22:36:34] !log phab2001 - upgrade php7.2 packages to 7.2.22 
(T230024) [22:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:36:38] T230024: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 [22:40:36] 10Operations, 10serviceops: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 (10Dzahn) @jijiki Any examples how it was done on the other servers? Did you keep the locally modified php.ini and fpm/php.ini files? I let the package overwrite but then let puppet revert that. [22:44:09] !log phab2001 - apt-get autoremove - remove unused python and ruby packages [22:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:56] (03CR) 10Ayounsi: "Looks good overall, a few comments inline." (033 comments) [software/homer] - 10https://gerrit.wikimedia.org/r/539551 (owner: 10Volans) [22:49:30] (03CR) 10Dzahn: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/18671/mwmaint1002.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/539633 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn) [22:52:37] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18672/" [puppet] - 10https://gerrit.wikimedia.org/r/539634 (owner: 10Dzahn) [23:08:01] (03CR) 10Alex Monk: "Looks like this was broken in If2a652f8d171d405b29b34adfa3a321266c709c2" [puppet] - 10https://gerrit.wikimedia.org/r/539625 (owner: 10Paladox) [23:09:29] 10Operations, 10ops-eqiad: replace scs-a8-eqiad - https://phabricator.wikimedia.org/T228919 (10RobH) [23:10:13] (03CR) 10Alex Monk: "you could probably fix this by putting an extra + at the end of the 'subjectAltName=' line but this should fix it as well" [puppet] - 10https://gerrit.wikimedia.org/r/539625 (owner: 10Paladox) [23:10:20] 10Operations, 10ops-eqiad: replace scs-a8-eqiad - https://phabricator.wikimedia.org/T228919 (10RobH) a:05Cmjohnson→03Jclark-ctr @Jclark-ctr, I see you received in the scs on the procurement task T228202. 
Can you go ahead and do the first two steps on this, so it is in netbox and trackable? The remainder... [23:12:55] (03PS4) 10Paladox: Gerrit: Allow configuring accountPattern [puppet] - 10https://gerrit.wikimedia.org/r/539211 [23:36:48] (03PS15) 10Jeena Huneidi: Add restbase chart (port from local-charts) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T228910) [23:41:06] (03PS16) 10Jeena Huneidi: Add restbase chart (port from local-charts) [deployment-charts] - 10https://gerrit.wikimedia.org/r/517557 (https://phabricator.wikimedia.org/T228910) [23:57:56] (03CR) 10Dzahn: "we should be able to change this on the LDAP server config side. somehow in the schema file for inetorgperson or so.. hrmmm.." [puppet] - 10https://gerrit.wikimedia.org/r/539211 (owner: 10Paladox)