[00:00:34] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[00:06:08] SRE: upgrade ping offload servers to bullseye (was: ping servers running out of disk) - https://phabricator.wikimedia.org/T273509 (Dzahn) Open→Stalled p:Triage→Low
[00:13:52] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
[00:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:50] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
[00:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:29] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
[00:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:37] (PS1) Dzahn: wmcs::monitoring: replace hiera inside hiera with lookup [puppet] - https://gerrit.wikimedia.org/r/662026 (https://phabricator.wikimedia.org/T209953)
[00:19:28] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
[00:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:23:38] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
[00:23:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:25:48] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
[00:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:56] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
[00:28:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:30:56] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
[00:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:31:41] (CR) Dzahn: [C: +1] ldap: Migrate hiera() to lookup() [puppet] - https://gerrit.wikimedia.org/r/661916 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[00:33:55] (CR) Dzahn: mailman3: Start apache2 for web (1 comment) [puppet] - https://gerrit.wikimedia.org/r/657950 (https://phabricator.wikimedia.org/T256542) (owner: Ladsgroup)
[00:40:34] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1366.eqiad.wmnet'] ` an...
[00:43:12] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2265.codfw.wmnet'] ` an...
[00:46:28] (PS1) Dzahn: profile::rsyslog::udp_json_logback_compat: hiera -> lookup [puppet] - https://gerrit.wikimedia.org/r/662033 (https://phabricator.wikimedia.org/T209953)
[00:46:28] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet
[00:46:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:51] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet
[00:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:58] (CR) Bstorm: [C: +1] "Looks safe enough to me" [puppet] - https://gerrit.wikimedia.org/r/661916 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[00:57:20] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet
[00:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:36] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2559898896 and 190 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:36] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet
[01:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:40] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1152 and 98 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:05:29] (PS1) Dzahn: gerrit: replace certbot cron for cloud with systemd timer [puppet] - https://gerrit.wikimedia.org/r/662035 (https://phabricator.wikimedia.org/T273673)
[01:05:31] (PS1) Dzahn: gerrit: remove code that absented cron [puppet] - https://gerrit.wikimedia.org/r/662036 (https://phabricator.wikimedia.org/T273673)
[01:07:04] (CR) jerkins-bot: [V: -1] gerrit: replace certbot cron for cloud with systemd timer [puppet] - https://gerrit.wikimedia.org/r/662035 (https://phabricator.wikimedia.org/T273673) (owner: Dzahn)
[01:07:18] (CR) jerkins-bot: [V: -1] gerrit: remove code that absented cron [puppet] - https://gerrit.wikimedia.org/r/662036 (https://phabricator.wikimedia.org/T273673) (owner: Dzahn)
[01:13:02] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1008 is OK: (C)5e+06 ge (W)1e+06 ge 6.799e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1008
[01:13:24] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1007 is OK: (C)5e+06 ge (W)1e+06 ge 5.748e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1007
[01:16:24] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1319.eqiad.wmnet'] ` an...
[01:21:20] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1313.eqiad.wmnet'] ` an...
[01:21:34] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[01:22:16] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[01:23:24] SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn) p:Medium→High
[01:25:35] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet
[01:25:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:25:53] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet
[01:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:10] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
[01:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:30:05] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet
[01:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:44:29] Puppet, SRE, puppet-compiler, Patch-For-Review, User-jbond: replace all puppet crons with systemd timers - https://phabricator.wikimedia.org/T273673 (Dzahn) a:jbond→None
[02:15:59] PROBLEM - Check systemd state on ms-be2055 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:18:49] PROBLEM - OSPF status on cr2-eqord is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:20:37] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:20:47] RECOVERY - Check systemd state on ms-be2055 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:39:04] (PS1) Dzahn: site: add mwdebug servers on buster [puppet] - https://gerrit.wikimedia.org/r/662037 (https://phabricator.wikimedia.org/T274023)
[02:53:08] (PS1) Dzahn: trafficserver: add new debug servers to debug routing [puppet] - https://gerrit.wikimedia.org/r/662038 (https://phabricator.wikimedia.org/T274023)
[02:53:41] (CR) Dzahn: [C: -2] "stalled but upcoming" [puppet] - https://gerrit.wikimedia.org/r/662038 (https://phabricator.wikimedia.org/T274023) (owner: Dzahn)
[02:55:01] (CR) Dzahn: [C: -2] "3 and 4 will be buster, 1 and 2 are stretch, until we want to delete them" [puppet] - https://gerrit.wikimedia.org/r/662038 (https://phabricator.wikimedia.org/T274023) (owner: Dzahn)
[02:56:21] (PS2) Dzahn: site: add mwdebug servers on buster [puppet] - https://gerrit.wikimedia.org/r/662037 (https://phabricator.wikimedia.org/T274023)
[02:58:41] (PS3) Dzahn: site: add mwdebug servers on buster [puppet] - https://gerrit.wikimedia.org/r/662037 (https://phabricator.wikimedia.org/T274023)
[02:59:56] (PS4) Dzahn: site: add mwdebug servers on buster [puppet] - https://gerrit.wikimedia.org/r/662037 (https://phabricator.wikimedia.org/T274023)
[03:22:47] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:29:07] (PS5) Dzahn: site: add mwdebug servers on buster [puppet] - https://gerrit.wikimedia.org/r/662037 (https://phabricator.wikimedia.org/T274023)
[03:32:51] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.065 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:34:43] PROBLEM - Disk space on wdqs1009 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=98%): /tmp 0 MB (0% inode=98%): /var/tmp 0 MB (0% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1009&var-datasource=eqiad+prometheus/ops
[03:40:40] !log Deleted dump taking up diskspace on `wdqs1009`, disk space warning will resolve now
[03:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:43:53] PROBLEM - Check systemd state on wdqs1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:51:21] RECOVERY - Check systemd state on wdqs1009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:56:11] RECOVERY - Disk space on wdqs1009 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1009&var-datasource=eqiad+prometheus/ops
[03:58:15] PROBLEM - Host wdqs1013 is DOWN: PING CRITICAL - Packet loss = 100%
[03:58:37] RECOVERY - Host wdqs1013 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[05:23:09] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:23:49] RECOVERY - OSPF status on cr2-eqord is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210206T0800)
[08:47:02] (PS4) Elukey: Set Apache Bigtop 1.5 as default hadoop distro [puppet] - https://gerrit.wikimedia.org/r/661974 (https://phabricator.wikimedia.org/T273711)
[08:48:55] (PS1) Elukey: Set Bigtop 1.5 for Druid and Hue test hosts [puppet] - https://gerrit.wikimedia.org/r/662040
[08:49:37] (CR) Elukey: [C: +2] Set Bigtop 1.5 for Druid and Hue test hosts [puppet] - https://gerrit.wikimedia.org/r/662040 (owner: Elukey)
[08:52:14] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[08:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:57] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[08:52:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:58:57] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[08:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:11] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[08:59:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:28] (PS5) Elukey: Set Apache Bigtop 1.5 as default hadoop distro [puppet] - https://gerrit.wikimedia.org/r/661974 (https://phabricator.wikimedia.org/T273711)
[09:08:55] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 17): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27891/console" [puppet] - https://gerrit.wikimedia.org/r/661974 (https://phabricator.wikimedia.org/T273711) (owner: Elukey)
[09:17:43] (PS6) Elukey: Set Apache Bigtop 1.5 as default hadoop distro [puppet] - https://gerrit.wikimedia.org/r/661974 (https://phabricator.wikimedia.org/T273711)
[09:22:37] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 17): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27892/console" [puppet] - https://gerrit.wikimedia.org/r/661974 (https://phabricator.wikimedia.org/T273711) (owner: Elukey)
[12:46:24] PROBLEM - WDQS SPARQL on wdqs1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[13:21:58] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp5004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:22:37] PROBLEM - ATS TLS has reduced HTTP availability #page on alert1001 is CRITICAL: cluster=cache_upload layer=tls https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[13:24:20] here, checking
[13:24:46] moving to _security
[13:25:10] PROBLEM - Maps edge eqsin on upload-lb.eqsin.wikimedia.org is CRITICAL: /osm-intl/info.json (tile service info for osm-intl) timed out before a response was received: /private-info/info.json (private tile service info for osm-intl) timed out before a response was received: /v4/marker/pin-m-fuel+ffffff.png (Untitled test) timed out before a response was received: /v4/marker/pin-m+ffffff.png (Untitled test) timed out before a respo
[13:25:10] /v4/marker/pin-m+ffffff@2x.png (Untitled test) timed out before a response was received https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:27:10] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp5006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:27:26] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:28:46] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp5003 is CRITICAL: HTTP CRITICAL - No data received from host https://wikitech.wikimedia.org/wiki/Varnish
[13:29:54] RECOVERY - Maps edge eqsin on upload-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:31:22] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:31:26] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp5006 is OK: HTTP OK: HTTP/1.1 200 OK - 413 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:33:04] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:39:22] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:39:46] PROBLEM - Varnish HTTP upload-frontend - port 80 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:40:08] PROBLEM - Varnish HTTP upload-frontend - port 3126 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:40:17] cp5002 noise is expected :)
[13:41:12] PROBLEM - Varnish HTTP upload-frontend - port 3123 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:41:16] PROBLEM - Varnish HTTP upload-frontend - port 3122 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:41:42] PROBLEM - Varnish HTTP upload-frontend - port 3125 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:41:58] PROBLEM - Varnish HTTP upload-frontend - port 3124 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:43:12] PROBLEM - Varnish HTTP upload-frontend - port 3127 on cp5002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[13:45:04] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.451 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:45:28] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:45:34] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.451 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:45:58] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.451 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:46:18] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.601 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:47:32] RECOVERY - Varnish HTTP upload-frontend - port 3127 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:48:12] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.589 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:48:34] RECOVERY - Varnish HTTP upload-frontend - port 80 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 410 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:48:56] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp5002 is OK: HTTP OK: HTTP/1.1 200 OK - 409 bytes in 0.484 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:55:12] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp5003 is OK: HTTP OK: HTTP/1.1 200 OK - 411 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Varnish
[13:59:17] RECOVERY - ATS TLS has reduced HTTP availability #page on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[14:01:48] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp5004 is OK: HTTP OK: HTTP/1.1 200 OK - 411 bytes in 0.580 second response time https://wikitech.wikimedia.org/wiki/Varnish
[14:27:35] SRE, Traffic: Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons - https://phabricator.wikimedia.org/T273741 (jcrespo)
[14:41:18] PROBLEM - SSH on mw2217.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:42:34] RECOVERY - SSH on mw2217.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:47:50] Amir1: normal renaming still okay?
[15:49:07] tabbycat: it should be slowed down but not a big deal
[15:49:34] ack
[16:54:39] (CR) Ladsgroup: mailman3: Start apache2 for web (1 comment) [puppet] - https://gerrit.wikimedia.org/r/657950 (https://phabricator.wikimedia.org/T256542) (owner: Ladsgroup)
[19:21:12] (CR) Gergő Tisza: [C: +1] Set wgGEHelpPanelAskMentor to true for several wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/661448 (https://phabricator.wikimedia.org/T272753) (owner: Urbanecm)
[19:30:09] Daimona: very well done
[19:31:52] I'm debugging on mwdebug1003 right now
[19:39:06] legoktm: what exactly? :D
[19:42:35] Daimona: I think you're being nerd sniped ;pP
[19:42:53] I'm presuming lego is benchmarking your patch
[19:42:54] Oh, I'm an easy target
[19:43:10] Ahhh that patch, right :D It was fun
[19:44:20] Daimona: I posted my results on your patch, it seems slower to me
[19:44:31] Hmmmm I see
[19:45:25] I need to investigate why
[19:47:12] Daimona: in general, function calls are slow, so I think just calling ord() has enough overhead that it might not be able to beat string index access
[19:47:21] heh
[19:47:42] That's correct, I was wondering if it would compensate for the cast and empty check though
[19:50:09] Note, I'm using a poor man's script. But I'm still getting the same result locally.
[19:50:51] that ord is faster?
[19:51:02] Yes
[19:51:04] what PHP version are you using btw? I'm on 7.4
[19:51:08] 7.4.1
[19:51:44] I doubt there's a significant difference between 7.4.1 and 7.4.15
[19:51:47] Using hrtime on 500000000 iterations, it's 19201ms vs 15283ms. I should try using Benchmarker
[19:53:08] It is indeed slower if you remove the string cast and empty check
[19:53:32] But consistently faster otherwise
[19:53:38] hmm
[19:53:51] Wait
[19:54:07] I was looking at your script, but it's not using the actual code, right?
[19:54:41] no, it just copies in the code
[19:55:14] The "new" version uses an if-else instead of a ternary, and the "no safety" obviously doesn't have the empty check
[19:56:17] (brb in a while, you've successfully nerd-sniped me \o/)
[19:56:19] let me switch "new" to use && - and I'm ignoring "no safety"
[19:56:22] haha :D
[19:57:51] huh, that's wild. && is slower than if/else for me
[20:00:08] legoktm: huh, do we have mwdebug1003 now?
[20:00:35] Yep, it's a buster host
[20:01:52] I do worry that we're spending too much time on a microoptimization with very little gain though :p
[20:21:21] lol
[20:21:33] you started it ;)
[22:12:19] legoktm: so what is the final verdict? :P
[22:13:46] I've also been thinking about another possibility, i.e. using a null coalesce with the string offset access
[22:15:42] Which seems basically as performant as ord, at least my benchmarking shows no relevant difference.
[22:25:41] may I ask for the link to the patch?
[22:26:15] No, you may not Platonides
[22:26:41] ok then :P
[22:26:46] :)
[22:28:31] I guess it's just https://gerrit.wikimedia.org/r/c/mediawiki/core/+/662076
[22:28:57] lol https://gerrit.wikimedia.org/r/q/hashtag:"faster-mw-plz"
[22:30:37] lolol
[22:31:01] .o( "no le pidas peras al olmo") [Spanish proverb: "don't ask the elm tree for pears"]
[22:31:02] Yes, that one :)
[22:36:27] is "no le pidas peras al olmo" in one of those patches? O_o
[22:38:30] Platonides: nope, that'd be for Wikimedia Labs :P
[22:40:18] that could suit a logo
[22:40:51] a pear over an Elm
[22:45:12] BioHack LLP
[23:34:37] (CR) Dzahn: [C: +1] "compiles on mwdebug1003 - cherry-picking and then running all the httpbb tests seems best" [puppet] - https://gerrit.wikimedia.org/r/657139 (https://phabricator.wikimedia.org/T272305) (owner: Giuseppe Lavagetto)
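
Editor's note on the 19:44 to 22:15 benchmarking thread: the log does not include the script that was used, so the following is only a minimal sketch of the kind of timing loop being described. It is written in Python rather than PHP, uses `time.perf_counter_ns` as a stand-in for PHP's `hrtime`, and does not reproduce the code from the patch. The two variants (`starts_with_colon_indexed`, `starts_with_colon_call`) and the `bench` helper are hypothetical names chosen to mirror the "guarded index access vs. function call" trade-off from the chat.

```python
import time


def starts_with_colon_indexed(s: str) -> bool:
    """Hypothetical 'index' variant: explicit empty check, then read s[0]."""
    return s != "" and s[0] == ":"


def starts_with_colon_call(s: str) -> bool:
    """Hypothetical 'call' variant: the callee handles the empty string itself."""
    return s.startswith(":")


def bench(fn, arg: str, iterations: int = 200_000) -> int:
    """Return elapsed nanoseconds for `iterations` calls of fn(arg)."""
    start = time.perf_counter_ns()  # monotonic, high-resolution clock
    for _ in range(iterations):
        fn(arg)
    return time.perf_counter_ns() - start


if __name__ == "__main__":
    # Both variants must agree on every input before timing means anything.
    for sample in ("", ":talk", "Main_Page"):
        assert starts_with_colon_indexed(sample) == starts_with_colon_call(sample)

    for fn in (starts_with_colon_indexed, starts_with_colon_call):
        elapsed_ns = bench(fn, "Main_Page")
        print(f"{fn.__name__}: {elapsed_ns / 1e6:.1f} ms")
```

As the chat itself points out, results from a loop like this depend on details such as whether the safety checks are included and how the branch is written, so several runs on the target runtime are more informative than a single number.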