[00:03:02] (PS3) Bstorm: k8s: Set default requests for the new cluster [software/tools-webservice] - https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202)
[00:06:24] Operations, Packaging, serviceops: package requirements for upgrading deployment_servers to buster - https://phabricator.wikimedia.org/T242480 (Dzahn)
[00:06:29] (CR) jerkins-bot: [V: -1] k8s: Set default requests for the new cluster [software/tools-webservice] - https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: Bstorm)
[00:08:13] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 85337992 and 5 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:09:45] (PS1) Dzahn: devtools (cloud): switch deployment server to deploy1002 [puppet] - https://gerrit.wikimedia.org/r/563616
[00:10:03] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 108 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:10:14] (CR) jerkins-bot: [V: -1] devtools (cloud): switch deployment server to deploy1002 [puppet] - https://gerrit.wikimedia.org/r/563616 (owner: Dzahn)
[00:10:19] (PS2) Dzahn: devtools (cloud): switch deployment server to deploy1002 [puppet] - https://gerrit.wikimedia.org/r/563616
[00:10:24] (CR) Dzahn: [C: +2] devtools (cloud): switch deployment server to deploy1002 [puppet] - https://gerrit.wikimedia.org/r/563616 (owner: Dzahn)
[00:20:53] (PS1) Dzahn: devtools: add Hiera values for a deployment_server in cloud [puppet] - https://gerrit.wikimedia.org/r/563618
[00:22:46] (PS1) Dzahn: codesearch: fix dependency cycle with git::clone [puppet] - https://gerrit.wikimedia.org/r/563619
[00:26:19] (CR) Dzahn: [C: +2] codesearch: fix dependency cycle with git::clone [puppet] - https://gerrit.wikimedia.org/r/563619 (owner: Dzahn)
[00:26:29] (PS2) Dzahn: codesearch: fix dependency cycle with git::clone [puppet] - https://gerrit.wikimedia.org/r/563619
[00:35:02] (CR) Dzahn: "@Legoktm After the follow-ups in the topic branch https://gerrit.wikimedia.org/r/q/topic:%22codesearch%22+(status:open%20OR%20status:merge" [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[00:39:38] (PS4) Bstorm: k8s: Set default requests for the new cluster [software/tools-webservice] - https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202)
[00:45:33] (CR) Bstorm: "Most of this is just implementing the hilarious effort to talk out a needed code change in IRC. However, I did make some adjustments to t" (3 comments) [software/tools-webservice] - https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: Bstorm)
[00:48:11] (CR) Bstorm: [C: +2] k8s: Set default requests for the new cluster [software/tools-webservice] - https://gerrit.wikimedia.org/r/563592 (https://phabricator.wikimedia.org/T236202) (owner: Bstorm)
[00:51:25] (CR) Bstorm: "Actually running this produces an error:" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/563610 (owner: Bstorm)
[00:51:54] (CR) Bstorm: "> Patch Set 1:" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/563610 (owner: Bstorm)
[01:04:59] (PS2) Dzahn: devtools: add Hiera values for a deployment_server in cloud [puppet] - https://gerrit.wikimedia.org/r/563618
[01:18:31] Operations, Wikimedia-General-or-Unknown, serviceops, Performance-Team (Radar), Wikimedia-Incident: Investigate recurrent GET latency spikes on MediaWiki appservers (Oct 2019) - https://phabricator.wikimedia.org/T235872 (Krinkle)
[01:20:23] PROBLEM - Check systemd state on ms-be1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:22:38] Operations, Performance-Team, Traffic, Patch-For-Review, Wikimedia-Incident: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494 (Krinkle)
[01:27:11] Operations, Wikimedia-General-or-Unknown, serviceops, Performance-Team (Radar), Wikimedia-Incident: Investigate recurrent GET latency spikes on MediaWiki appservers (Oct 2019) - https://phabricator.wikimedia.org/T235872 (Krinkle) It is not clear to me whether this is an actual issue or not....
[01:29:10] (PS1) Bstorm: k8s: Don't restart all k8s machinery to reboot a basic webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/563624 (https://phabricator.wikimedia.org/T228499)
[01:45:39] RECOVERY - Check systemd state on ms-be1037 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:49:27] (PS10) Bstorm: Make Kubernetes the default backend and warn when guessing [software/tools-webservice] - https://gerrit.wikimedia.org/r/443190 (https://phabricator.wikimedia.org/T154504) (owner: Nehajha)
[01:59:46] Operations, Internet-Archive, Wikimedia-Portals: www.wikipedia.org/robots.txt should not be a redirect - https://phabricator.wikimedia.org/T242500 (Krinkle)
[02:00:01] Operations, Wikimedia-Portals: www.wikipedia.org/robots.txt should not be a redirect - https://phabricator.wikimedia.org/T242500 (Krinkle)
[02:50:41] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 223 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:56:29] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 32 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:24:17] Operations, Wikimedia-Portals, Regression: www.wikipedia.org/robots.txt should not be a redirect - https://phabricator.wikimedia.org/T242500 (DannyS712)
[03:26:46] PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[03:58:47] (CR) Legoktm: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/563114 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[04:00:59] (PS1) Bstorm: Revert "Add busybox to buster and stretch images" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/563632
[04:02:19] (PS2) Bstorm: Revert "Add busybox to buster and stretch images" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/563632
[04:04:24] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Peachey88)
[04:05:02] (PS1) Legoktm: codesearch: Install docker-ce from thirdparty/kubeadm-k8s component [puppet] - https://gerrit.wikimedia.org/r/563633
[04:06:04] (CR) Bstorm: [C: +2] Revert "Add busybox to buster and stretch images" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/563632 (owner: Bstorm)
[04:06:37] (CR) Legoktm: codesearch: Install docker-ce from thirdparty/kubeadm-k8s component (1 comment) [puppet] - https://gerrit.wikimedia.org/r/563633 (owner: Legoktm)
[05:34:45] !log volker-e@deploy1001 Started deploy [design/style-guide@6a44c69]: Deploy design/style-guide:
[05:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:52] !log volker-e@deploy1001 Finished deploy [design/style-guide@6a44c69]: Deploy design/style-guide: (duration: 00m 08s)
[05:34:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:57] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) What if you try to configure the network manually rather than using DHCP, does it fail too? If you send me the MAC address...
[06:08:43] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui the request is not getting to the DHCP server
[06:10:42] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) And setting the IP, GW etc manually doesn't work either?
[06:13:15] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) I didn't try that but you can try with es2024 or es2021.
[07:08:23] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) When reseting es2021 via IDRAC I saw this and after a couple of powercycles the host booted up: ` !!!! X64 Exception Type -...
[07:35:02] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) I have tried es2024 and setting its IP manually and it seems that it indeed cannot reach the network. However, I do see the...
[07:51:57] PROBLEM - Check systemd state on ms-be1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:55:33] PROBLEM - Check systemd state on ms-be1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:38:11] Operations, Phabricator, Traffic, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) >>! In T238593#5792383, @akosiaris wrote: >> I really can'...
[08:46:01] RECOVERY - Check systemd state on ms-be1037 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:22:13] (PS1) Elukey: Add spark encryption option to Hadoop test's yarn configuration [puppet] - https://gerrit.wikimedia.org/r/563651 (https://phabricator.wikimedia.org/T240934)
[09:23:22] (CR) Elukey: [C: +2] Add spark encryption option to Hadoop test's yarn configuration [puppet] - https://gerrit.wikimedia.org/r/563651 (https://phabricator.wikimedia.org/T240934) (owner: Elukey)
[09:37:37] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:41:18] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:57:31] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:01:05] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:04:54] (PS1) DannyS712: Deploy partial blocks on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/563653 (https://phabricator.wikimedia.org/T218626)
[10:05:18] (PS2) DannyS712: Deploy partial blocks on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/563653 (https://phabricator.wikimedia.org/T218626)
[12:12:04] (CR) Nuria: "Is there documentation on how to make use of these?" [deployment-charts] - https://gerrit.wikimedia.org/r/562623 (https://phabricator.wikimedia.org/T240985) (owner: Ottomata)
[12:43:57] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:25:19] PROBLEM - Disk space on ms-be1039 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdd1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1039&var-datasource=eqiad+prometheus/ops
[14:47:31] PROBLEM - HP RAID on ms-be1039 is CRITICAL: CRITICAL: Slot 3: Failed: 1I:1:6 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[14:47:36] ACKNOWLEDGEMENT - HP RAID on ms-be1039 is CRITICAL: CRITICAL: Slot 3: Failed: 1I:1:6 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T242511 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[14:47:39] Operations, ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T242511 (ops-monitoring-bot)
[15:14:04] PROBLEM - Device not healthy -SMART- on ms-be1039 is CRITICAL: cluster=swift device=cciss,13 instance=ms-be1039:9100 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1039&var-datasource=eqiad+prometheus/ops
[16:34:25] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui ` papaul@asw-a-codfw> show interfaces ge-6/0/12 descriptions Interface Admin Link Description ge-6/0/12...
[16:59:58] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) >>! In T242481#5794821, @Papaul wrote: > @Marostegui > ` > papaul@asw-a-codfw> show interfaces ge-6/0/12 descriptions...
[17:07:46] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 95137280 and 13 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:08:16] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui ` Logical Vlan TAG MAC STP Logical Tagging interface membe...
[17:09:08] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) I just saw this: ` [35393.835622] tg3 0000:01:00.0 eno3: Link is up at 1000 Mbps, full duplex [35393.835634] tg3 0000:01:00...
[17:09:32] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 7464 and 108 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:09:51] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui switch didn't learn any MAC address on that interface ` papaul@asw-a-codfw> show ethernet-switching table interf...
[17:22:02] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) You are not able to see any MAC addresses for es2024?
[17:31:26] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) correct, on the switch side
[17:33:04] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) I have unloaded and then loaded again `bnxt_en` kernel module and I can see the main iface disappearing and then coming bac...
[17:40:46] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:49:46] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:56:54] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui i double check the switch configuration for both es2020 and es2024 and the DNS files from https://gerrit.wikimedia...
[18:09:24] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Papaul) @Marostegui I will focus more on troubleshooting this on the NIC level on Monday. Since the 1GB and 10GB interfaces are on th...
[18:10:26] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 87862328 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:12:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 88424 and 71 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:20:43] Operations, ops-codfw, DBA: Missing Network drivers from Stretch and Buster installer for BRCM 2P 1G BT + 2P 10G SFP NDC - https://phabricator.wikimedia.org/T242481 (Marostegui) Sounds good @papaul! Thank you a lot. Have a good one!!
[19:01:27] Operations, Wikimedia-Mailing-lists: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (Peachey88)
[19:02:47] Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T242511 (Peachey88)
[19:58:35] Operations, Wikimedia-Mailing-lists: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (RhinosF1) Has been done before in T193572 for other lists. Should be as simple as replacing https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mailma...
[20:07:31] (PS1) RhinosF1: Add wikimedia cloud mailing list to mailman’s robots.txt [puppet] - https://gerrit.wikimedia.org/r/563684 (https://phabricator.wikimedia.org/T242520)
[20:09:18] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 38409448 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:11:06] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 175368 and 72 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:12:05] (PS2) RhinosF1: Add wikimedia cloud mailing list to mailman’s robots.txt [puppet] - https://gerrit.wikimedia.org/r/563684 (https://phabricator.wikimedia.org/T242520)
[20:13:36] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (RhinosF1) a:RhinosF1 >>! In T242520#5794935, @gerritbot wrote: > Change 563684 had a related patch set uploaded (by RhinosF1; owner: Rhinos...
[20:14:35] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (RhinosF1) p:Triage→Normal
[20:16:55] Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (Cutmuetia1998)
[20:18:38] Operations, Beta-Cluster-Infrastructure, Lexicographical data, Wikidata, User-DannyS712: PHP fatal error on beta cluster - https://phabricator.wikimedia.org/T242188 (RhinosF1)
[21:13:42] (CR) Ammarpad: "recheck" [puppet] - https://gerrit.wikimedia.org/r/563684 (https://phabricator.wikimedia.org/T242520) (owner: RhinosF1)
[21:44:40] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, serviceops: On beta, scap can't clear opcache on some mw servers - https://phabricator.wikimedia.org/T237033 (hashar) Those settings are for the Puppet roles. Given roles are solely for production, on WMCS the hiera look...
[21:47:52] (CR) RhinosF1: "Recheck what?" [puppet] - https://gerrit.wikimedia.org/r/563684 (https://phabricator.wikimedia.org/T242520) (owner: RhinosF1)
[21:51:19] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (MarcoAurelio) I would prefer not to go forward with this. I don't feel like opening the gate even more to spammers and URL/email address grabbe...
[22:00:18] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (RhinosF1) >>! In T242520#5795026, @MarcoAurelio wrote: > I would prefer not to go forward with this. I don't feel like opening the gate even mo...
[22:00:49] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (RoySmith) I'm fine with an internal search tool instead of external search engines, but the mailing list really does need to be indexed in some...
[22:41:34] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (Platonides) As an external solution, it could be added to an external mailing list archiver such as [[ https://marc.info/?q=about#Add | marc ]]...
[23:11:35] Operations, Wikimedia-Mailing-lists, Patch-For-Review, User-RhinosF1: Allow Cloud mailing list to be indexed - https://phabricator.wikimedia.org/T242520 (Bawolff) Note that the archives do replace @ signs with "at" for whatever good that does. Personally I think ease of finding answers to old que...
[23:12:55] (CR) Brian Wolff: "> Recheck what?" [puppet] - https://gerrit.wikimedia.org/r/563684 (https://phabricator.wikimedia.org/T242520) (owner: RhinosF1)