[00:02:13] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 175 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[00:02:44] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:03:11] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/CirrusSearch: T206967 - Ia23d19cf1e6 (duration: 01m 02s)
[00:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:03:15] T206967: Job cirrusSearchLinksUpdatePrioritized failures "Call to getNamespace() on a non-object" - https://phabricator.wikimedia.org/T206967
[00:50:58] (CR) GTirloni: git-sync-upstream: Send cron mail in case of failures (1 comment) [puppet] - https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261) (owner: GTirloni)
[01:03:44] RECOVERY - Memory correctable errors -EDAC- on wtp2013 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw%2520prometheus%252Fops
[01:04:54] PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on einsteinium is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:05:14] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:05:44] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:06:03] PROBLEM - HTTP availability for Varnish at codfw on einsteinium is CRITICAL: job=varnish-text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:23] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:06:24] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:34] PROBLEM - HTTP availability for Varnish at eqsin on einsteinium is CRITICAL: job=varnish-text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:06:39] uh
[01:07:04] PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.3526 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[01:07:13] PROBLEM - HTTP availability for Varnish at esams on einsteinium is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:07:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[01:07:34] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[01:07:43] PROBLEM - HTTP availability for Varnish at eqiad on einsteinium is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:07:43] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:07:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.763 second response time
[01:07:43] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[01:07:44] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[01:08:13] RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[01:08:23] PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[01:08:48] Krinkle ^^
[01:08:55] oh
[01:08:57] nvm
[01:09:04] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:19] meh, the clock went back, so it looked like it was just around the time you deployed it.
[01:09:24] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:24] RECOVERY - HTTP availability for Varnish at codfw on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:24] RECOVERY - HTTP availability for Varnish at esams on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[01:09:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:44] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Varnish at eqiad on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:09:54] RECOVERY - HTTP availability for Varnish at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[01:11:04] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:12:04] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[01:16:03] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[01:16:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[01:16:24] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[01:16:33] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[01:17:14] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[01:39:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.569 second response time
[01:42:50] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:16:25] !log repooling wdqs1003 - it has caught up with the others
[02:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:22:03] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.976 second response time
[02:25:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:34:44] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 901.90 seconds
[03:36:13] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:37:53] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:01:43] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:03:23] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:10:23] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 225.49 seconds
[05:28:00] hi
[05:28:09] https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikisource_Hindi
[05:28:31] what needs to be done to create the subdomain?
[05:30:13] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.904 second response time
[05:33:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:40:33] yannf: once it's approved by langcom, a task will be filed in Phabricator and then the relevant patches are created to make the subdomain
[05:40:55] ok thanks
[06:28:23] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:33] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:29:36] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:30:14] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:33:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.534 second response time
[06:36:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:15] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:58:53] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:04] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:53] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:58:32] (CR) Elukey: "Please be patient with me! :)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261) (owner: GTirloni)
[08:05:33] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.649 second response time
[08:08:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:38:04] ACKNOWLEDGEMENT - Check systemd state on db1117 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Banyek The host is fully operational; however, there are 2 service files which shouldn't be there, and those services are reporting as failed: T208151
[10:15:43] PROBLEM - HHVM jobrunner on mw1337 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[10:16:53] RECOVERY - HHVM jobrunner on mw1337 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.008 second response time
[10:30:33] (PS1) Stibba: Update logo for Hebrew Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148)
[10:30:35] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:35:26] (CR) Zoranzoki21: [C: +1] "Hi, thanks for contributing to Wikimedia! This looks good, so I will add +1 to this patch." [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:35:50] (CR) Zoranzoki21: [C: +1] "@Urbanecm Should user update 1.5 and 2?" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:42:05] (CR) Urbanecm: [C: -1] "Logos should be optimized, so less data is transferred. We use http://optipng.sourceforge.net/ as optimizer. Please download optipng " [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[11:47:39] Operations, Patch-For-Review: httpd class and php7.0 - conflict with mpm_event module - https://phabricator.wikimedia.org/T208108 (Joe) I think we should stop using mod_php *anywhere*. We should really use php-fpm for anything that is not explicitly known not to work with fcgi (and I wonder, what that mi...
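The logo review at 11:42 above asks the contributor to run the PNGs through optipng before resubmitting, so the wiki serves the smallest possible logo files. Below is a minimal sketch of that step, assuming optipng is installed locally; the directory path and script name are illustrative, not part of the actual mediawiki-config tooling.

import subprocess
import sys
from pathlib import Path

def optimize_logos(logo_dir: str) -> None:
    """Losslessly recompress every PNG under logo_dir with optipng."""
    for png in sorted(Path(logo_dir).glob("*.png")):
        before = png.stat().st_size
        # -o7 is optipng's highest (and slowest) lossless optimization level.
        subprocess.run(["optipng", "-o7", str(png)], check=True)
        print(f"{png.name}: {before} -> {png.stat().st_size} bytes")

if __name__ == "__main__":
    # Hypothetical usage: python3 optimize_logos.py path/to/project-logos
    optimize_logos(sys.argv[1] if len(sys.argv) > 1 else ".")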
[13:01:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.396 second response time
[13:05:24] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:32:53] PROBLEM - High lag on wdqs1003 is CRITICAL: 3645 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[13:38:00] !log depooling wdqs1003 again to catch up with the others
[13:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:32] (PS2) Stibba: Update logo for Hebrew Wikivoyage, Add HD hewikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148)
[13:56:23] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 138 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[14:00:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.489 second response time
[14:02:36] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[14:03:04] (PS7) GTirloni: toolforge: refactor/bootstrap service node puppet code [puppet] - https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: Arturo Borrero Gonzalez)
[14:03:44] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:14:18] (CR) GTirloni: "Other than a bin/sbin typo, it seems okay so far. You can check gtirloni-stretch-01.testlabs.eqiad.wmflabs where I ran this on. I've insta" [puppet] - https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: Arturo Borrero Gonzalez)
[14:25:19] (PS1) Stibba: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T150618)
[14:50:51] (CR) Urbanecm: [C: -1] Add Hebrew Wikivoyage HD logo location in InitialiseSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T150618) (owner: Stibba)
[14:55:40] (PS2) Stibba: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148)
[14:56:58] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[16:36:24] !log repooling wdqs1003 - it didn't really catch up with the others, but lag on the others is beginning to go up.
[16:36:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:29] (PS5) Mathew.onipe: elasticsearch: cookbook for service rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[17:14:32] (PS6) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[17:15:13] (PS8) Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[17:16:29] (CR) jerkins-bot: [V: -1] elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[17:17:54] (CR) Mathew.onipe: "> Patch Set 8: Verified-1" [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[17:30:44] !log restart yarn resource manager on an-master1002 to force failover to an-master1001 - T206943
[17:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:48] T206943: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943
[17:37:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.061 second response time
[17:40:43] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:52:24] PROBLEM - High lag on wdqs1003 is CRITICAL: 3628 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[17:54:43] PROBLEM - High lag on wdqs1003 is CRITICAL: 3663 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[17:56:02] * onimisionipe is looking at wdqs1003
[19:17:57] !log depooling wdqs1003 to catch up on lag
[19:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:24] PROBLEM - HHVM jobrunner on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[19:45:33] RECOVERY - HHVM jobrunner on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[19:50:20] Operations, Cloud-Services, Cloud-VPS: labs precise and jessie instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673 (Krenair) I haven't seen this recently, and precise VMs aren't supported/existing anymore (AFAIK). Shall we close it?
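A note on the recurring alert format above, e.g. "High lag on wdqs1003 is CRITICAL: 3663 ge 3600" and "is OK: (C)3600 ge (W)1200 ge 138": the check compares the current metric value against a warning and a critical threshold with a greater-or-equal ("ge") test, which is why the recoveries in this log report lag values well under the 1200-second warning threshold. The following is only an illustrative sketch of that comparison, not the actual monitoring plugin; the default thresholds are the ones reported for the wdqs lag check.

def classify_lag(value: float, warning: float = 1200.0, critical: float = 3600.0) -> str:
    """Map a metric value to an alert state using greater-or-equal thresholds."""
    if value >= critical:   # e.g. 3663 ge 3600 -> CRITICAL
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"             # e.g. (C)3600 ge (W)1200 ge 138 -> OK

assert classify_lag(3663) == "CRITICAL"
assert classify_lag(138) == "OK"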
[19:55:24] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.369 second response time
[19:58:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:37:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.973 second response time
[20:40:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:57:24] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 168 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[21:26:04] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.228 second response time
[21:29:34] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:27:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.143 second response time
[22:31:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:51:33] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.624 second response time
[22:54:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:57:14] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:00:03] * Krinkle staging on mwdebug1002
[23:23:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.549 second response time
[23:27:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:27:54] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:30:44] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.642 second response time
[23:34:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:36:28] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/Graph: T184128 - I02da92de33 (duration: 00m 58s)
[23:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:32] T184128: "PHP Warning: data error" from gzdecode() in ApiGraph.php and ApiQueryMapData.php - https://phabricator.wikimedia.org/T184128
[23:37:03] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:38:03] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 75744 bytes in 0.461 second response time
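The final sync above deploys the fix for T184128, where gzdecode() emitted "data error" warnings when handed corrupt compressed blobs. As a loose analogue only (the actual change lives in the PHP extensions named in the task and is not reproduced here), the same class of input can be handled defensively like this:

import gzip
import zlib
from typing import Optional

def safe_gunzip(blob: bytes) -> Optional[bytes]:
    """Return the decompressed payload, or None if the blob is not valid gzip data."""
    try:
        return gzip.decompress(blob)
    except (OSError, zlib.error, EOFError):
        # Corrupt or truncated input: signal failure instead of surfacing a warning.
        return None

assert safe_gunzip(b"not gzip data") is None
assert safe_gunzip(gzip.compress(b"hello")) == b"hello"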