[00:05:55] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1951 bytes in 0.101 second response time
[00:13:55] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:36:16] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1947 bytes in 0.064 second response time
[01:16:23] (CR) Krinkle: [C: 1] "@Dzahn I don't know what those aliases are for but they're certainly not for the current webperf hosts. Maybe it happens to connect to tha" [puppet] - https://gerrit.wikimedia.org/r/433710 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[01:16:47] (CR) Krinkle: [C: 1] "I'd recommend a separate patch simply remove O:webperf and O:ve from that line." [puppet] - https://gerrit.wikimedia.org/r/433710 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[01:19:16] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[01:22:35] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:07:45] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet operation_type={container_status,create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:08:17] (CR) Reedy: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[02:08:45] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[02:08:52] (CR) jerkins-bot: [V: -1] Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[02:18:56] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[02:22:15] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:29:26] (CR) Reedy: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[02:29:58] (CR) jerkins-bot: [V: -1] Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[03:48:49] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[03:52:00] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:37:30] PROBLEM - puppet last run on db2059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:18:59] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[05:22:10] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:32:40] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml]
[06:39:19] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.087 second response time
[06:58:09] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:22:50] RECOVERY - MariaDB Slave Lag: s8 on db2085 is OK: OK slave_sql_lag Replication lag: 42.64 seconds
[07:23:00] RECOVERY - MariaDB Slave Lag: s8 on db2080 is OK: OK slave_sql_lag Replication lag: 23.21 seconds
[07:23:00] RECOVERY - MariaDB Slave Lag: s8 on db2086 is OK: OK slave_sql_lag Replication lag: 18.43 seconds
[07:23:09] RECOVERY - MariaDB Slave Lag: s8 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[07:23:09] RECOVERY - MariaDB Slave Lag: s8 on db2045 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[07:23:20] RECOVERY - MariaDB Slave Lag: s8 on db2079 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[07:23:29] RECOVERY - MariaDB Slave Lag: s8 on db2082 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[07:23:39] RECOVERY - MariaDB Slave Lag: s8 on db2094 is OK: OK slave_sql_lag Replication lag: 0.34 seconds
[07:23:39] RECOVERY - MariaDB Slave Lag: s8 on db2081 is OK: OK slave_sql_lag Replication lag: 0.51 seconds
[07:36:52] !log legoktm@deploy1001 Synchronized php-1.32.0-wmf.6/skins/MonoBook/: Temporarily revert responsive MonoBook (T195625) (duration: 00m 58s)
[07:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:58] T195625: Implement a responsive layout for MonoBook - https://phabricator.wikimedia.org/T195625
[10:02:40] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.084 second response time
[10:30:20] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1962 bytes in 0.089 second response time
[10:50:39] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1945 bytes in 0.098 second response time
[10:54:09] (Abandoned) Zoranzoki21: Add filemover right to the groups of patroller and autoreviewer on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) (owner: Zoranzoki21)
[10:55:36] (CR) Zoranzoki21: [C: 1] "Oh, I had to do this? Thank you Urbanecm!" [mediawiki-config] - https://gerrit.wikimedia.org/r/436524 (https://phabricator.wikimedia.org/T195247) (owner: Urbanecm)
[10:56:45] (PS3) Zoranzoki21: Add sites to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/436211 (https://phabricator.wikimedia.org/T195270)
[12:48:29] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[12:51:40] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:59:59] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1974 bytes in 0.084 second response time
[13:10:10] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.075 second response time
[13:18:13] (CR) Paladox: "Probaly want to go with 2.15.2 as it has some notedb fixes + a ldap log change so we can investigate ldap issues better." [software/gerrit] (stable-2.15) - https://gerrit.wikimedia.org/r/436607 (owner: Chad)
[13:19:40] PROBLEM - Device not healthy -SMART- on db2047 is CRITICAL: cluster=mysql device=cciss,2 instance=db2047:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2047&var-datasource=codfw%2520prometheus%252Fops
[13:32:40] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1959 bytes in 0.075 second response time
[13:42:45] (PS1) Urbanecm: Change mode of IS.php to 644 [mediawiki-config] - https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225)
[13:43:49] (PS1) Urbanecm: Remove ruwiki from MFSpecialCaseMainPage [mediawiki-config] - https://gerrit.wikimedia.org/r/436989 (https://phabricator.wikimedia.org/T196223)
[13:47:31] (PS1) Urbanecm: Change bewikiquote logo [mediawiki-config] - https://gerrit.wikimedia.org/r/436990 (https://phabricator.wikimedia.org/T196134)
[13:53:09] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.073 second response time
[14:17:49] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.083 second response time
[14:23:00] Puppet, Beta-Cluster-Infrastructure, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4250625 (Krenair)
[14:23:05] Operations, Puppet, Beta-Cluster-Infrastructure, media-storage, Patch-For-Review: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#4250623 (Krenair) Resolved>Open cherry-picked, not merged
[14:53:20] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1962 bytes in 0.085 second response time
[15:09:36] Operations: replace tin (new hardware) - https://phabricator.wikimedia.org/T185275#4250678 (Dzahn)
[15:09:40] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4250679 (Dzahn)
[15:10:01] (PS1) Dzahn: deployment::server: add rsync for home dirs [puppet] - https://gerrit.wikimedia.org/r/436992 (https://phabricator.wikimedia.org/T175288)
[15:17:00] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1971 bytes in 0.089 second response time
[15:25:25] (PS2) Dzahn: deployment::server: add rsync for home dirs [puppet] - https://gerrit.wikimedia.org/r/436992 (https://phabricator.wikimedia.org/T175288)
[15:27:11] (CR) Dzahn: [C: 2] deployment::server: add rsync for home dirs [puppet] - https://gerrit.wikimedia.org/r/436992 (https://phabricator.wikimedia.org/T175288) (owner: Dzahn)
[15:34:58] (PS1) Mainframe98: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196255)
[15:36:29] (CR) jerkins-bot: [V: -1] Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196255) (owner: Mainframe98)
[15:37:37] (PS2) Mainframe98: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225)
[15:37:39] (PS1) Dzahn: Revert "deployment::server: add rsync for home dirs" [puppet] - https://gerrit.wikimedia.org/r/436995
[15:38:55] (CR) jerkins-bot: [V: -1] Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: Mainframe98)
[15:40:44] (PS2) Urbanecm: Change mode of IS.php and a few of other files to 644 [mediawiki-config] - https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225)
[15:42:01] (PS3) Aklapper: phabricator: List new and recent assignees [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780)
[15:42:20] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.120 second response time
[15:42:46] (PS3) Urbanecm: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: Mainframe98)
[15:44:16] (CR) Aklapper: "in PS3 I went for "user registered in last six weeks" instead of "less than X assignments in total" as querying the registration date is w" [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[15:51:02] (CR) Dzahn: "unfortunately this new query is also already running over 3 or 4 minutes without a result" [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[15:51:55] (PS1) Urbanecm: Set $wgMetaNamespace to "Вікіцытатнік" on bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/436997 (https://phabricator.wikimedia.org/T196230)
[15:55:43] (CR) Dzahn: "query killed after roughly 8 minutes" [puppet] - https://gerrit.wikimedia.org/r/435984 (https://phabricator.wikimedia.org/T195780) (owner: Aklapper)
[16:22:10] (CR) Dzahn: [C: 1] "> @Dzahn I don't know what those aliases are for but they're certainly not for the current webperf hosts." [puppet] - https://gerrit.wikimedia.org/r/433710 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[16:23:23] (CR) Dzahn: [C: 1] "no, don't simply remove it. Just adjust the class name as it is renamed in this patch." [puppet] - https://gerrit.wikimedia.org/r/433710 (https://phabricator.wikimedia.org/T158837) (owner: Imarlier)
[16:58:38] (CR) Zhuyifei1999: "Will we conflict with https://puppet.com/docs/puppet/5.5/types/mailalias.html?" [puppet] - https://gerrit.wikimedia.org/r/436752 (https://phabricator.wikimedia.org/T196137) (owner: Arturo Borrero Gonzalez)
[17:05:49] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1979 bytes in 0.102 second response time
[17:10:59] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1973 bytes in 0.063 second response time
[19:35:29] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1981 bytes in 0.083 second response time
[19:37:59] PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:45:39] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1969 bytes in 0.069 second response time
[20:03:00] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.124 second response time
[20:04:19] RECOVERY - Check systemd state on restbase-dev1006 is OK: OK - running: The system is fully operational
[20:13:50] (PS3) Reedy: Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626
[20:30:50] PROBLEM - HP RAID on db2047 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:3 - Controller: OK - Battery/Capacitor: OK
[20:30:58] Operations, ops-codfw: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4251009 (ops-monitoring-bot)
[20:49:00] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[20:52:19] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:53:59] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.073 second response time
[21:06:20] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1978 bytes in 0.116 second response time
[21:21:53] (PS3) Huji: Add several rights to eliminators in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/430627 (https://phabricator.wikimedia.org/T176553)
[21:27:59] ACKNOWLEDGEMENT - HP RAID on db2047 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:3 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196246
[21:47:00] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.091 second response time
[21:54:19] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1962 bytes in 0.076 second response time
[22:01:08] (PS4) Andrew Bogott: keystonehooks: only add users to bastion if they have the 'user' role [puppet] - https://gerrit.wikimedia.org/r/436955 (https://phabricator.wikimedia.org/T165337)
[22:02:27] (CR) Andrew Bogott: [C: 2] keystonehooks: only add users to bastion if they have the 'user' role [puppet] - https://gerrit.wikimedia.org/r/436955 (https://phabricator.wikimedia.org/T165337) (owner: Andrew Bogott)
[22:07:49] PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time
[22:10:00] RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.003 second response time
[22:27:19] (CR) Jforrester: [C: 1] "Good to go next SWAT/whatever." [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[22:30:00] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1951 bytes in 0.090 second response time
[22:47:30] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.065 second response time
[22:52:39] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1947 bytes in 0.091 second response time
[23:46:26] Operations, Performance-Team, Traffic, HTTPS: TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4251088 (Krinkle)
[23:54:59] (PS1) Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - https://gerrit.wikimedia.org/r/437052