[00:02:13] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 175 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[00:02:44] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:03:11] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/CirrusSearch: T206967 - Ia23d19cf1e6 (duration: 01m 02s)
[00:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:03:15] T206967: Job cirrusSearchLinksUpdatePrioritized failures "Call to getNamespace() on a non-object" - https://phabricator.wikimedia.org/T206967
[00:50:58] (CR) GTirloni: git-sync-upstream: Send cron mail in case of failures (1 comment) [puppet] - https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261) (owner: GTirloni)
[01:03:44] RECOVERY - Memory correctable errors -EDAC- on wtp2013 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw%2520prometheus%252Fops
[01:04:54] PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on einsteinium is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:05:14] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:05:44] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:06:03] PROBLEM - HTTP availability for Varnish at codfw on einsteinium is CRITICAL: job=varnish-text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:23] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:06:24] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:34] PROBLEM - HTTP availability for Varnish at eqsin on einsteinium is CRITICAL: job=varnish-text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:39] uh
[01:07:04] PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.3526 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[01:07:13] PROBLEM - HTTP availability for Varnish at esams on einsteinium is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:07:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[01:07:34] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[01:07:43] PROBLEM - HTTP availability for Varnish at eqiad on einsteinium is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:07:43] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:07:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.763 second response time
[01:07:43] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[01:07:44] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[01:08:13] RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[01:08:23] PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[01:08:48] Krinkle ^^
[01:08:55] oh
[01:08:57] nvm
[01:09:04] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:19] meh, the clock went back, so it looked like it was just around the time you deployed it.
[01:09:24] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:24] RECOVERY - HTTP availability for Varnish at codfw on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:24] RECOVERY - HTTP availability for Varnish at esams on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[01:09:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:44] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Varnish at eqiad on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Varnish at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:11:04] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:12:04] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[01:16:03] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[01:16:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[01:16:24] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[01:16:33] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[01:17:14] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[01:39:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.569 second response time
[01:42:50] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:16:25] !log repooling wdqs1003 - it has caught up with the others
[02:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:22:03] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.976 second response time
[02:25:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:34:44] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 901.90 seconds
[03:36:13] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:37:53] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:01:43] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:03:23] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:10:23] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 225.49 seconds
[05:28:00] hi
[05:28:09] https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikisource_Hindi
[05:28:31] what needs to be done to create the subdomain?
[05:30:13] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.904 second response time
[05:33:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:40:33] yannf: once it's approved by langcom, a task will be filed in Phabricator and then the relevant patches are created to make the subdomain
[05:40:55] ok thanks
[06:28:23] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:33] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:29:36] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:30:14] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:33:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.534 second response time
[06:36:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:15] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:58:53] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:04] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:53] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:58:32] (CR) Elukey: "Please be patient with me! :)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261) (owner: GTirloni)
[08:05:33] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.649 second response time
[08:08:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:38:04] ACKNOWLEDGEMENT - Check systemd state on db1117 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Banyek The host is fully operational; however, there are 2 service files which shouldn't be there, and those services are reporting as failed: T208151
[10:15:43] PROBLEM - HHVM jobrunner on mw1337 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[10:16:53] RECOVERY - HHVM jobrunner on mw1337 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.008 second response time
[10:30:33] (PS1) Stibba: Update logo for Hebrew Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148)
[10:30:35] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:35:26] (CR) Zoranzoki21: [C: +1] "Hi, thanks for contributing to Wikimedia! This looks good, so I will add +1 to this patch." [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:35:50] (CR) Zoranzoki21: [C: +1] "@Urbanecm Should user update 1.5 and 2?" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:42:05] (CR) Urbanecm: [C: -1] "Logos should be optimized, so less data is transferred. We use http://optipng.sourceforge.net/ as optimizer. Please download optipng " [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:47:39] Operations, Patch-For-Review: httpd class and php7.0 - conflict with mpm_event module - https://phabricator.wikimedia.org/T208108 (Joe) I think we should stop using mod_php *anywhere*. We should really use php-fpm for anything that is not explicitly known not to work with fcgi (and I wonder, what that mi...
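The logo review at 11:42 above asks the contributor to run the PNGs through optipng before resubmitting, so the wiki serves the smallest possible logo files. Below is a minimal sketch of that step, assuming optipng is installed locally; the directory path and script name are illustrative, not part of the actual mediawiki-config tooling.

import subprocess
import sys
from pathlib import Path

def optimize_logos(logo_dir: str) -> None:
    """Losslessly recompress every PNG under logo_dir with optipng."""
    for png in sorted(Path(logo_dir).glob("*.png")):
        before = png.stat().st_size
        # -o7 is optipng's highest (and slowest) lossless optimization level.
        subprocess.run(["optipng", "-o7", str(png)], check=True)
        print(f"{png.name}: {before} -> {png.stat().st_size} bytes")

if __name__ == "__main__":
    # Hypothetical usage: python3 optimize_logos.py path/to/project-logos
    optimize_logos(sys.argv[1] if len(sys.argv) > 1 else ".")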
[13:01:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.396 second response time
[13:05:24] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:32:53] PROBLEM - High lag on wdqs1003 is CRITICAL: 3645 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[13:38:00] !log depooling wdqs1003 again to catch up with the others
[13:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:32] (PS2) Stibba: Update logo for Hebrew Wikivoyage, Add HD hewikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148)
[13:56:23] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 138 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[14:00:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.489 second response time
[14:02:36] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[14:03:04] (PS7) GTirloni: toolforge: refactor/bootstrap service node puppet code [puppet] - https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: Arturo Borrero Gonzalez)
[14:03:44] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:14:18] (CR) GTirloni: "Other than a bin/sbin typo, it seems okay so far. You can check gtirloni-stretch-01.testlabs.eqiad.wmflabs where I ran this on. I've insta" [puppet] - https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: Arturo Borrero Gonzalez)
[14:25:19] (PS1) Stibba: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T150618)
[14:50:51] (CR) Urbanecm: [C: -1] Add Hebrew Wikivoyage HD logo location in InitialiseSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T150618) (owner: Stibba)
[14:55:40] (PS2) Stibba: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148)
[14:56:58] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[16:36:24] !log repooling wdqs1003 - it didn't really catch up with the others, but lag on the others is beginning to go up.
[16:36:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:29] (PS5) Mathew.onipe: elasticsearch: cookbook for service rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[17:14:32] (PS6) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[17:15:13] (PS8) Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[17:16:29] (CR) jerkins-bot: [V: -1] elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[17:17:54] (CR) Mathew.onipe: "> Patch Set 8: Verified-1" [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[17:30:44] !log restart yarn resource manager on an-master1002 to force failover to an-master1001 - T206943
[17:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:48] T206943: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943
[17:37:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.061 second response time
[17:40:43] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:52:24] PROBLEM - High lag on wdqs1003 is CRITICAL: 3628 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[17:54:43] PROBLEM - High lag on wdqs1003 is CRITICAL: 3663 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[17:56:02] * onimisionipe is looking at wdqs1003
[19:17:57] !log depooling wdqs1003 to catch up on lag
[19:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:24] PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[19:45:33] RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[19:50:20] Operations, Cloud-Services, Cloud-VPS: labs precise and jessie instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673 (Krenair) I haven't seen this recently, and precise VMs aren't supported/existing anymore (AFAIK). Shall we close it?
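A note on the recurring alert format above, e.g. "High lag on wdqs1003 is CRITICAL: 3663 ge 3600" and "is OK: (C)3600 ge (W)1200 ge 138": the check compares the current metric value against a warning and a critical threshold with a greater-or-equal ("ge") test, which is why the recoveries in this log report lag values well under the 1200-second warning threshold. The following is only an illustrative sketch of that comparison, not the actual monitoring plugin; the default thresholds are the ones reported for the wdqs lag check.

def classify_lag(value: float, warning: float = 1200.0, critical: float = 3600.0) -> str:
    """Map a metric value to an alert state using greater-or-equal thresholds."""
    if value >= critical:   # e.g. 3663 ge 3600 -> CRITICAL
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"             # e.g. (C)3600 ge (W)1200 ge 138 -> OK

assert classify_lag(3663) == "CRITICAL"
assert classify_lag(138) == "OK"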
[19:55:24] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.369 second response time
[19:58:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:37:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.973 second response time
[20:40:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:57:24] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 168 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[21:26:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.228 second response time
[21:29:34] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:27:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.143 second response time
[22:31:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:51:33] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.624 second response time
[22:54:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:57:14] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:00:03] * Krinkle staging on mwdebug1002
[23:23:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.549 second response time
[23:27:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:27:54] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:30:44] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.642 second response time
[23:34:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:36:28] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/Graph: T184128 - I02da92de33 (duration: 00m 58s)
[23:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:32] T184128: "PHP Warning: data error" from gzdecode() in ApiGraph.php and ApiQueryMapData.php - https://phabricator.wikimedia.org/T184128
[23:37:03] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:38:03] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 75744 bytes in 0.461 second response time
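The final sync above deploys the fix for T184128, where gzdecode() emitted "data error" warnings when handed corrupt compressed blobs. As a loose analogue only (the actual change lives in the PHP extensions named in the task and is not reproduced here), the same class of input can be handled defensively like this:

import gzip
import zlib
from typing import Optional

def safe_gunzip(blob: bytes) -> Optional[bytes]:
    """Return the decompressed payload, or None if the blob is not valid gzip data."""
    try:
        return gzip.decompress(blob)
    except (OSError, zlib.error, EOFError):
        # Corrupt or truncated input: signal failure instead of surfacing a warning.
        return None

assert safe_gunzip(b"not gzip data") is None
assert safe_gunzip(gzip.compress(b"hello")) == b"hello"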