[01:10:12] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1
[01:26:34] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1
[02:01:10] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[02:01:16] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[04:18:10] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-pol-szl] - https://gerrit.wikimedia.org/r/576628 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[04:18:16] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-oci-fra] - https://gerrit.wikimedia.org/r/577047 (https://phabricator.wikimedia.org/T202360) (owner: KartikMistry)
[04:18:43] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-fra-cat] - https://gerrit.wikimedia.org/r/577216 (https://phabricator.wikimedia.org/T233700) (owner: KartikMistry)
[04:18:48] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/577243 (https://phabricator.wikimedia.org/T233700) (owner: KartikMistry)
[04:18:51] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-separable] - https://gerrit.wikimedia.org/r/577046 (https://phabricator.wikimedia.org/T234182) (owner: KartikMistry)
[05:16:53] !log restart gerrit-replica as it's OOM T247182
[05:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:58] T247182: gerrit-replica seems down - https://phabricator.wikimedia.org/T247182
[05:30:42] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[05:30:50] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[05:32:00] Operations, observability, cloud-services-team (Kanban): Prometheus vs. CPU usage vs. hyperthreading - https://phabricator.wikimedia.org/T193272 (bd808)
[05:35:54] Operations, Puppet, cloud-services-team (Kanban): Puppet class systemd needs to throw a more useful error - https://phabricator.wikimedia.org/T195553 (bd808)
[05:52:32] (PS2) KartikMistry: apertium-separable: Update to new upstream release 0.3.3 [debs/contenttranslation/apertium-separable] - https://gerrit.wikimedia.org/r/577046 (https://phabricator.wikimedia.org/T234182)
[05:53:40] (PS4) KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182)
[05:57:17] (CR) jerkins-bot: [V: -1] apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: KartikMistry)
[06:04:33] (PS5) KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182)
[06:08:34] (CR) jerkins-bot: [V: -1] apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: KartikMistry)
[06:19:08] (PS6) KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182)
[06:23:31] (CR) jerkins-bot: [V: -1] apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: KartikMistry)
[06:26:28] (PS7) KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182)
[06:36:41] (PS3) KartikMistry: Add apertium-pol-szl package [debs/contenttranslation/apertium-pol-szl] - https://gerrit.wikimedia.org/r/576628 (https://phabricator.wikimedia.org/T202276)
[06:36:50] (CR) jerkins-bot: [V: -1] Add apertium-pol-szl package [debs/contenttranslation/apertium-pol-szl] - https://gerrit.wikimedia.org/r/576628 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[06:40:30] (PS4) KartikMistry: Add apertium-pol-szl package [debs/contenttranslation/apertium-pol-szl] - https://gerrit.wikimedia.org/r/576628 (https://phabricator.wikimedia.org/T202276)
[06:43:46] (CR) jerkins-bot: [V: -1] Add apertium-pol-szl package [debs/contenttranslation/apertium-pol-szl] - https://gerrit.wikimedia.org/r/576628 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[07:06:16] (PS1) KartikMistry: apertium: Update dependency and fix conflict [debs/contenttranslation/apertium] - https://gerrit.wikimedia.org/r/577861
[11:29:18] Operations, WMNO-Sami, Wikimedia-Mailing-lists: Create mailing list for WMNO Sámi project - https://phabricator.wikimedia.org/T182093 (jhsoby-WMNO) a:jhsoby-WMNO→None
[12:03:46] PROBLEM - Ensure traffic_server is running for instance backend on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:03:48] PROBLEM - check_trafficserver_log_fifo_purge_backend on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:03:56] PROBLEM - Disk space on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cp3053&var-datasource=esams+prometheus/ops
[12:04:04] PROBLEM - MD RAID on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[12:04:12] PROBLEM - Freshness of OCSP Stapling files -ATS-TLS acme-chief- on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[12:04:14] PROBLEM - Ensure traffic_manager is running for instance tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:04:14] PROBLEM - Varnish HTCP daemon on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[12:04:20] PROBLEM - Ensure traffic_manager is running for instance backend on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:04:20] PROBLEM - Webrequests Varnishkafka log producer on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[12:04:30] PROBLEM - traffic-pool service on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:04:42] PROBLEM - Ensure trafficserver_exporter is running for instance backend on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:04:42] PROBLEM - confd service on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:05:02] PROBLEM - check_trafficserver_backend_config_status on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:05:14] PROBLEM - Check systemd state on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:05:18] PROBLEM - Freshness of OCSP Stapling files -ATS-TLS- on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[12:05:28] PROBLEM - Confd template for /etc/varnish/directors.frontend.vcl on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[12:05:34] PROBLEM - Logs skipped by trafficserver-tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ATS
[12:05:36] PROBLEM - Logs skipped by trafficserver on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ATS
[12:05:36] PROBLEM - configured eth on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:05:38] PROBLEM - Ensure traffic_server is running for instance tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:05:38] PROBLEM - Ensure trafficserver_exporter is running for instance tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:05:54] PROBLEM - Confd vcl based reload on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[12:05:58] PROBLEM - Default ATS Lua configuration file on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ATS
[12:05:58] PROBLEM - TLS Lua configuration file on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ATS
[12:06:00] PROBLEM - check_trafficserver_log_fifo_notpurge_backend on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:06:00] PROBLEM - check_trafficserver_log_fifo_analytics_tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:06:00] PROBLEM - check_trafficserver_log_fifo_tls_tls on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:06:04] PROBLEM - dhclient process on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[12:06:04] PROBLEM - DPKG on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:07:30] PROBLEM - puppet last run on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[12:09:20] PROBLEM - traffic_server backend process restarted on cp3053 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=esams+prometheus/ops&var-instance=cp3053&var-layer=backend
[12:10:12] PROBLEM - traffic_server tls process restarted on cp3053 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=esams+prometheus/ops&var-instance=cp3053&var-layer=tls
[12:10:34] PROBLEM - Varnish frontend child restarted on cp3053 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3053&var-datasource=esams+prometheus/ops
[12:14:30] hmmm
[12:14:33] checking
[12:17:11] OOM killer got rid of varnish cache-main process :/
[12:18:12] RECOVERY - Freshness of OCSP Stapling files -ATS-TLS acme-chief- on cp3053 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[12:18:12] RECOVERY - Ensure traffic_manager is running for instance tls on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --run-root=/srv/trafficserver/tls --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:18:12] RECOVERY - Varnish HTCP daemon on cp3053 is OK: PROCS OK: 1 process with UID = 114 (vhtcpd), args vhtcpd https://wikitech.wikimedia.org/wiki/Varnish
[12:18:20] RECOVERY - Webrequests Varnishkafka log producer on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[12:18:20] RECOVERY - Ensure traffic_manager is running for instance backend on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:18:30] RECOVERY - traffic-pool service on cp3053 is OK: OK - traffic-pool is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:18:42] RECOVERY - Ensure trafficserver_exporter is running for instance backend on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint http://127.0.0.1:3128/_stats --port 9122 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:18:42] RECOVERY - confd service on cp3053 is OK: OK - confd is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:19:02] RECOVERY - check_trafficserver_backend_config_status on cp3053 is OK: OK: configuration is current https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:19:12] RECOVERY - Check systemd state on cp3053 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:19:18] RECOVERY - Freshness of OCSP Stapling files -ATS-TLS- on cp3053 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[12:19:28] RECOVERY - Confd template for /etc/varnish/directors.frontend.vcl on cp3053 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd%23Monitoring
[12:19:32] RECOVERY - Logs skipped by trafficserver-tls on cp3053 is OK: OK: no matches found in journal for unit trafficserver-tls https://wikitech.wikimedia.org/wiki/ATS
[12:19:34] RECOVERY - Logs skipped by trafficserver on cp3053 is OK: OK: no matches found in journal for unit trafficserver https://wikitech.wikimedia.org/wiki/ATS
[12:19:36] RECOVERY - configured eth on cp3053 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:19:38] RECOVERY - Ensure traffic_server is running for instance tls on cp3053 is OK: PROCS OK: 1 process with args /srv/trafficserver/tls/bin/traffic_server -M --run-root=/srv/trafficserver/tls/runroot.yaml --httpport 443 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:19:38] RECOVERY - Ensure trafficserver_exporter is running for instance tls on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint https://127.0.0.1:443/_stats --port 9322 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:19:54] RECOVERY - Confd vcl based reload on cp3053 is OK: reload-vcl successfully ran 94h, 18 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[12:19:58] RECOVERY - TLS Lua configuration file on cp3053 is OK: OK https://wikitech.wikimedia.org/wiki/ATS
[12:19:58] RECOVERY - Default ATS Lua configuration file on cp3053 is OK: OK https://wikitech.wikimedia.org/wiki/ATS
[12:20:00] RECOVERY - check_trafficserver_log_fifo_analytics_tls on cp3053 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /srv/trafficserver/tls/var/log/analytics.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:20:00] RECOVERY - check_trafficserver_log_fifo_tls_tls on cp3053 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /srv/trafficserver/tls/var/log/tls.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:20:00] RECOVERY - check_trafficserver_log_fifo_notpurge_backend on cp3053 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /var/log/trafficserver/notpurge.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:20:04] RECOVERY - dhclient process on cp3053 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[12:20:04] RECOVERY - DPKG on cp3053 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:20:08] RECOVERY - Ensure traffic_server is running for instance backend on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/traffic_server -M --httpport 3128 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:20:10] RECOVERY - puppet last run on cp3053 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[12:20:10] RECOVERY - check_trafficserver_log_fifo_purge_backend on cp3053 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /var/log/trafficserver/purge.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:20:16] RECOVERY - Disk space on cp3053 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cp3053&var-datasource=esams+prometheus/ops
[12:20:26] RECOVERY - MD RAID on cp3053 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[12:24:55] Operations, Traffic: OOM killer killed varnihsd cache-main on cp3053 - https://phabricator.wikimedia.org/T247195 (Vgutierrez)
[12:43:02] RECOVERY - Memory correctable errors -EDAC- on mw1248 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1248&var-datasource=eqiad+prometheus/ops
[14:02:30] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1
[14:04:40] PROBLEM - Hadoop NodeManager on an-worker1087 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[14:05:19] vgutierrez: did we ever fix it so that ats-tls answers unhealthy to pybal when its backing varnish is down?
[14:11:30] that's not a bug
[14:11:34] it's a feature
[14:11:43] so I don't know if it should be fixed :)
[14:11:53] hm
[14:12:31] we can provide an ats-tls healthcheck endpoint if it's needed
[14:13:06] actually we will need it as soon as we are ready to move the port 80 from varnish-fe to ats-tls
[14:30:36] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 75.35 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1
[15:19:49] Operations, cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (Krenair) @aborrero: Hi, do you still need openstack-puppetmaster-01? It's not got any cherry-picks in operations/puppet or labs/private, it's not got...
[16:56:08] (PS1) Reedy: Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185)
[16:58:45] Reedy: should we not add autocreateaccount to * per Urbanecm in that as well
[16:59:02] https://phabricator.wikimedia.org/T247185#5952178
[17:01:11] Operations, cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (Krenair)
[17:02:37] RhinosF1: You really don't need to act as a go between for comments on a task I'm subscribed to
[17:04:16] k
[17:07:42] (PS2) Reedy: Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185)
[17:08:26] (CR) RhinosF1: [C: -1] "Auto create account should be true?" [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185) (owner: Reedy)
[17:08:54] (PS3) Reedy: Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185)
[17:10:07] (CR) RhinosF1: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185) (owner: Reedy)
[17:10:28] (CR) Urbanecm: [C: +1] "lgtm" [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185) (owner: Reedy)
[17:11:18] (PS4) Reedy: Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185)
[17:13:05] (CR) Reedy: [C: +2] Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185) (owner: Reedy)
[17:14:04] (Merged) jenkins-bot: Add wmgDisableAccountCreation [mediawiki-config] - https://gerrit.wikimedia.org/r/578047 (https://phabricator.wikimedia.org/T247185) (owner: Reedy)
[17:15:29] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add wmgDisableAccountCreation (duration: 00m 59s)
[17:15:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:05] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: Add wmgDisableAccountCreation (duration: 00m 56s)
[17:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:28] Operations, cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (Krenair)
[17:58:04] !log restart hadoop-yarn-nodemanger on an-worker1087
[17:58:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:38] RECOVERY - Hadoop NodeManager on an-worker1087 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[18:10:50] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:22:28] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22047 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:48:02] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:54:58] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 22057 bytes in 0.269 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:20:06] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:26:58] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22067 bytes in 0.273 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:15:10] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:24:24] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22065 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server