[00:57:04] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db2066 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[01:16:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db2045 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[01:55:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db2059 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[02:36:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db2038 is OK: OK slave_sql_lag Replication lag: 0.46 seconds
[03:11:34] <icinga-wm>	 PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) timed out before a response was received: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received: /v1/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium, adapt the links to target language wiki.) timed o
[03:11:34] <icinga-wm>	 e was received
[03:11:44] <icinga-wm>	 PROBLEM - eventstreams on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:12:35] <icinga-wm>	 RECOVERY - eventstreams on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.012 second response time
[03:13:34] <icinga-wm>	 RECOVERY - cxserver endpoints health on scb1001 is OK: All endpoints are healthy
[03:20:34] <icinga-wm>	 PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:21:24] <icinga-wm>	 RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[03:24:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 749.99 seconds
[03:30:05] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on scb1002 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:31:04] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on scb1002 is OK: OK ferm input default policy is set
[03:35:15] <icinga-wm>	 PROBLEM - puppet last run on mw2222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:35:15] <icinga-wm>	 PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.test],File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:36:44] <icinga-wm>	 PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:53:44] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 167.74 seconds
[04:01:44] <icinga-wm>	 RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:05:14] <icinga-wm>	 RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:05:14] <icinga-wm>	 RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:05:15] <icinga-wm>	 PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:06:14] <icinga-wm>	 RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.858 second response time
[04:07:55] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on db2052 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:22:14] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for Barack Obama) timed out before a response was received: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received
[04:25:14] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/feed/onthisday/{type}/{mm}/{dd} (retrieve the selected anniversaries for January 15) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve en.wp main page via mobile-sections) timed out before a response was received
[04:26:05] <icinga-wm>	 RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[04:26:05] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[04:26:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[04:26:05] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[04:30:14] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[04:30:16] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[04:54:44] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[04:54:44] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[04:56:54] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[04:56:54] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[04:59:45] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[04:59:45] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[04:59:54] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for Barack Obama) timed out before a response was received
[05:00:44] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[05:23:14] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[05:23:14] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[05:27:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[05:27:15] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:15:04] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:15:04] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:27:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:27:15] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:41:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:41:25] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:43:44] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:43:44] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:45:24] <icinga-wm>	 PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received: /v1/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium, adapt the links to target language wiki.) timed out before a response was received
[06:45:24] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received
[06:46:15] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy
[06:46:24] <icinga-wm>	 RECOVERY - cxserver endpoints health on scb1002 is OK: All endpoints are healthy
[06:46:34] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:46:34] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:50:35] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Zotero alive) is CRITICAL: Could not fetch url http://10.64.16.21:1970/api: Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=bibtex (Caused by ProtocolError(Connection aborted., BadStatusLine(,))): /api (Scrapes sample page) is CRITICAL: Could not fetch url http://1
[06:50:35] <icinga-wm>	 : Generic connection error: HTTPConnectionPool(host=u10.64.16.21, port=1970): Max retries exceeded with url: /api?search=http%3A%2F%2Fexample.comformat=mediawiki (Caused by ProtocolError(Connection aborted., BadStatusLine(,)))
[06:52:29] <mobrovac>	 looking ^
[07:01:33] <logmsgbot>	 !log mobrovac@tin Started restart [electron-render/deploy@8dd5f13]: electron stuck - T174916
[07:01:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:41] <stashbot>	 T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916
[07:09:27] <logmsgbot>	 !log mobrovac@tin Started restart [zotero/translators@a0c41c3]: Zotero eating up memory
[07:09:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:24] <mobrovac>	 this was a trifecta: electron hanging, zotero eating up mem and trending edits losing its offset yet again
[07:13:52] <mobrovac>	 got to love such sunday mornings
[07:17:04] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[09:28:37] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T180854#3772732 (10Qgil) >>! In T180854#3771558, @Tgr wrote: > SSO task is T124691, should probably be a blocker.  Blocker for the pilot or for {T180853}?  > That ce...
[12:08:43] <wikibugs>	 10Operations, 10DBA, 10Wikimedia-Site-requests: Rename user "Arsog1985" to "Sigma'am" on Central Auth - https://phabricator.wikimedia.org/T180903#3772847 (10Linedwell)
[12:09:04] <wikibugs>	 10Operations, 10DBA, 10Wikimedia-Site-requests: Rename user "Arsog1985" to "Sigma'am" on Central Auth - https://phabricator.wikimedia.org/T180903#3772859 (10Linedwell)
[12:45:49] <wikibugs>	 10Operations, 10DBA, 10Wikimedia-Site-requests: Rename user "Arsog1985" to "Sigma'am" on Central Auth : supervision needed - https://phabricator.wikimedia.org/T180903#3772878 (10Framawiki)
[13:29:07] <Nikerabbit>	 could someone reset the topic?
[13:32:06] <Nikerabbit>	 thanks!
[13:34:57] <wikibugs>	 10Operations, 10Wikimedia-Site-requests: {{NUMBEROFARTICLES}} is too low in din.wikipedia.org - https://phabricator.wikimedia.org/T180905#3772918 (10Amire80)
[13:36:20] <paladox>	 your welcome :).
[14:59:22] <wikibugs>	 (03PS8) 10Paladox: javascript: Remove the npm package [puppet] - 10https://gerrit.wikimedia.org/r/386889
[14:59:35] <wikibugs>	 (03Abandoned) 10Paladox: javascript: Remove the npm package [puppet] - 10https://gerrit.wikimedia.org/r/386889 (owner: 10Paladox)
[15:01:28] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] Gerrit: Fix up logstash configuation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox)
[15:02:07] <wikibugs>	 (03CR) 10Paladox: Gerrit: Fix up logstash configuation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox)
[15:04:00] <wikibugs>	 (03CR) 10Paladox: [C: 031] Gerrit: Fix up logstash configuation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox)
[15:05:18] <paladox>	 gehel it's done pursposely it seems. I want to add support for setting socket in gerrit with java code but it wont be in 2.13 and may not be in 2.14.
[15:05:51] <paladox>	 i am just trying to think how we can wrap async around the socket one in java.
[15:06:16] <paladox>	 though for now we have to do it that way until we get it into gerrit's core.
[15:11:43] <gehel>	 paladox: the code you link checks the existence of the config file and bails if it cannot  create the log directory. Looking at that code, it I don't see any reason why the log files themselves should already exist
[15:11:55] <paladox>	 it checks for
[15:12:04] <paladox>	 log4j.configuation
[15:12:08] <paladox>	 in system properties
[15:12:14] <paladox>	 if it exists, it dosent execute the code
[15:12:42] <paladox>	 gehel https://github.com/GerritCodeReview/gerrit/blob/09786353f76b778a76a61a092adf60a41fbc3cfd/java/com/google/gerrit/server/util/SystemLog.java#L54
[15:12:52] <paladox>	 LOG4J_CONFIGURATION = "log4j.configuration";
[15:13:32] <paladox>	 ah here
[15:13:32] <paladox>	 https://github.com/GerritCodeReview/gerrit/blob/09786353f76b778a76a61a092adf60a41fbc3cfd/java/com/google/gerrit/server/util/SystemLog.java#L79
[15:13:37] <paladox>	 ah = and
[15:15:51] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#3773043 (10Nemo_bis) +CC per https://lists.wikimedia.org/pipermail/wikitech-l/2017-November/089137.html
[15:16:01] <gehel>	 paladox: I have to leave (again), but can you try and make sure gerrit actually fails to log if the log files are non-existing? I'll probably be back for a bit later today...
[15:16:19] <paladox>	 ok
[15:16:32] <paladox>	 gehel i've confirmed that gerit fails to log if the file does not exist
[15:17:23] <gehel>	 How does it fails?
[15:17:36] <paladox>	 it shows in /var/log/syslog that the file does not exist
[15:17:45] <paladox>	 we had the wrong path specified for gc_log
[15:17:47] <paladox>	 apparently
[15:18:16] <paladox>	 so it was trying /var/log/gerrit/gc_log
[15:18:46] <paladox>	 i fixed that now. but it was logging to syslog saying that path did not exist
[15:19:46] <paladox>	 <param name="File" value="/var/lib/gerrit2/review_site/gc_log"/>
[15:19:54] <paladox>	 um 
[15:20:06] <paladox>	 never mind, it seems to create the file but not the directory if needed
[15:20:11] <paladox>	 sorry for spam
[15:20:59] <wikibugs>	 (03PS16) 10Paladox: Gerrit: Fix up logstash configuation [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324)
[15:54:07] <wikibugs>	 (03Draft1) 10Paladox: planet: Update template / css / item look [puppet] - 10https://gerrit.wikimedia.org/r/389498
[15:54:11] <wikibugs>	 (03Draft2) 10Paladox: planet: Update template / css / item look [puppet] - 10https://gerrit.wikimedia.org/r/389498
[15:54:14] <wikibugs>	 (03Draft3) 10Paladox: planet: Update template / css / item look [puppet] - 10https://gerrit.wikimedia.org/r/389498
[15:54:18] <wikibugs>	 (03PS4) 10Paladox: planet: Update template / css / item look [puppet] - 10https://gerrit.wikimedia.org/r/389498
[17:45:55] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[17:47:24] <icinga-wm>	 PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[17:54:25] <icinga-wm>	 RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:56:04] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:56:44] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[18:06:27] <elukey>	 the 5xx spike is ores related, commented in https://phabricator.wikimedia.org/T179712#3773092
[18:15:02] <elukey>	 Zayo port on cr2-eqiad down, but no related downtime announced afaics. ---^ 
[18:15:09] <elukey>	 Cc: XioNoX 
[18:30:27] <wikibugs>	 (03Draft2) 10Jayprakash12345: Enable wgNamespacesWithSubpages for hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392312
[18:30:54] <wikibugs>	 (03PS3) 10Jayprakash12345: Enable wgNamespacesWithSubpages for hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392312 (https://phabricator.wikimedia.org/T180913)
[18:31:07] <XioNoX>	 elukey: not an issue, we have plenty of capacity. I will open a ticket if it's not solved by the time I get to my laptop in a few hours 
[18:32:43] <elukey>	 XioNoX: sure sure, I just wanted to ping you and get your opinion, thanks! :)
[18:56:15] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1952 bytes in 0.090 second response time
[19:00:15] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0]
[19:38:06] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Load 0 for db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392315
[19:39:36] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "Load 9 doesn't depool a server." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392315 (owner: 10Marostegui)
[19:40:05] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] "I meant load 0" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392315 (owner: 10Marostegui)
[19:40:27] <wikibugs>	 (03PS2) 10Marostegui: db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392315
[19:40:51] <marostegui>	 jynus: I didn't realise it had replication broken, that is why I set 0
[19:42:57] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool db1100, pool db1071 instead [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392316
[19:44:06] <jynus>	 what do you think of https://gerrit.wikimedia.org/r/#/c/392316/1/wmf-config/db-eqiad.php ?
[19:44:29] <marostegui>	 I didn't chose db1071 in case you wanted it for testing during the week
[19:44:39] <marostegui>	 I don't mind if you prefer your patch or my patch
[19:44:58] <marostegui>	 I would set weight 0 though to db1071
[19:45:04] <icinga-wm>	 PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received
[19:45:06] <marostegui>	 if you want to push your patch
[19:45:16] <marostegui>	 that would be my only comment about it
[19:45:25] <jynus>	 with load 0 it will be still pinged every time
[19:45:33] <jynus>	 it is how shitty is the load balancer
[19:45:38] <marostegui>	 \o/
[19:45:51] <marostegui>	 then, whatever patch you prefer I don't mind
[19:45:55] <icinga-wm>	 RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:46:59] <jynus>	 also probably how bad wikidata code is
[19:47:07] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1100, pool db1071 instead [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392316 (owner: 10Jcrespo)
[19:47:19] <wikibugs>	 (03Abandoned) 10Marostegui: db-eqiad.php: Depool db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392315 (owner: 10Marostegui)
[19:47:23] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Depool db1100, pool db1071 instead [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392316 (owner: 10Jcrespo)
[19:50:03] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 49s)
[19:50:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:33] <jynus>	 I do not know why we have a software load balancer
[19:51:34] <marostegui>	 Interesting, this server crashed already: https://gerrit.wikimedia.org/r/#/c/378193/ so probabbly a rebuild is a good idea
[19:51:45] <jynus>	 if if a server goes trivially down
[19:52:00] <jynus>	 the servers keeps being queried
[19:52:09] <marostegui>	 yeah, that is pretty terrible :(
[19:52:42] <jynus>	 and the only thing keeping the server up
[19:52:46] <jynus>	 is the query kille
[19:52:47] <jynus>	 r
[19:56:15] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.117 second response time
[19:57:35] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0]
[20:10:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[20:21:57] <wikibugs>	 (03PS1) 10Krinkle: noc: Link to Grafana instead of Ganglia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392317
[20:22:07] <wikibugs>	 (03CR) 10Krinkle: [C: 032] noc: Link to Grafana instead of Ganglia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392317 (owner: 10Krinkle)
[20:23:20] <wikibugs>	 (03Merged) 10jenkins-bot: noc: Link to Grafana instead of Ganglia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392317 (owner: 10Krinkle)
[20:23:30] <wikibugs>	 (03CR) 10jenkins-bot: noc: Link to Grafana instead of Ganglia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392317 (owner: 10Krinkle)
[20:28:18] <logmsgbot>	 !log krinkle@tin Synchronized docroot/noc/index.html: noc: Link to Grafana (duration: 00m 49s)
[20:28:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:38] <TabbyCat>	 Krinkle, 
[20:41:41] <TabbyCat>	 A database query error has occurred. This may indicate a bug in the software.[WhHsMApAAD0AAFkrelgAAAAA] 2017-11-19 20:41:16: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError"
[20:42:45] <dereckson>	 TabbyCat: on noc ?
[20:42:54] <TabbyCat>	 on production
[20:43:24] <dereckson>	 I was also puzzled a Meddiawiki error should occur on noc.
[20:44:01] <TabbyCat>	 filed https://phabricator.wikimedia.org/T180919
[20:44:20] <dereckson>	 a query too slow
[20:44:59] <dereckson>	 I can't repro visiting your url
[20:45:12] <dereckson>	 (it correctly prints the user contributions of the maintenance script)
[20:45:31] <TabbyCat>	 special:log not special:contribs
[20:45:41] <dereckson>	 oh fun
[20:45:43] <dereckson>	 /wiki/Special:Log/Maintenance_script
[20:45:47] <dereckson>	 is slow / buggy
[20:46:10] <dereckson>	 (oh, nevermind, I was testing the referrer URL previously)
[21:04:55] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp Altrincham page via mobile-sections-lead returned the unexpected status 504 (expecting: 200)
[21:06:04] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[21:15:23] <wikibugs>	 (03CR) 10Zoranzoki21: [C: 031] "Looks good to me, but someone else must approve" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392312 (https://phabricator.wikimedia.org/T180913) (owner: 10Jayprakash12345)
[21:19:22] <wikibugs>	 (03PS5) 10Paladox: planet: Update template / css / item look [puppet] - 10https://gerrit.wikimedia.org/r/389498
[21:23:10] <wikibugs>	 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T180854#3773265 (10Tgr) Blocker for production, I mean.
[21:57:55] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp Altrincham page via mobile-sections-lead returned the unexpected status 504 (expecting: 200)
[21:59:04] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[21:59:48] <wikibugs>	 (03CR) 10MarcoAurelio: [C: 031] Enable wgNamespacesWithSubpages for hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392312 (https://phabricator.wikimedia.org/T180913) (owner: 10Jayprakash12345)
[22:05:16] <wikibugs>	 10Operations, 10Analytics, 10Research, 10Traffic, and 2 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3773297 (10Tgr)
[22:06:57] <wikibugs>	 10Operations, 10Analytics, 10Research, 10Traffic, and 2 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3773313 (10Tgr)
[22:17:24] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-sections/{title} (retrieve en.wp main page via mobile-sections) is CRITICAL: Test retrieve en.wp main page via mobile-sections returned the unexpected status 504 (expecting: 200)
[22:18:24] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[22:22:49] <wikibugs>	 10Operations, 10Wikimedia-Planet, 10Patch-For-Review: planet.wikimedia.org: replace planet-venus software with rawdog - https://phabricator.wikimedia.org/T180498#3773316 (10Paladox)
[22:40:57] <wikibugs>	 (03PS6) 10Paladox: planet: Improve look and configuation updates [puppet] - 10https://gerrit.wikimedia.org/r/389498 (https://phabricator.wikimedia.org/T180498)
[22:46:44] <wikibugs>	 10Operations, 10Developer-Relations: Bring discourse.mediawiki.org to production - https://phabricator.wikimedia.org/T180853#3773320 (10Qgil)
[23:20:09] <Jamesofur>	 !log removed 2FA for Ask21 T180889
[23:20:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:18] <stashbot>	 T180889: Disable 2FA for Ask21 - https://phabricator.wikimedia.org/T180889
[23:51:37] <wikibugs>	 10Operations, 10I18n: Publish full fallback sequence for generic families (sans, serif) in SVG font rendering - https://phabricator.wikimedia.org/T180923#3773351 (10Arthur2e5)
[23:54:20] <wikibugs>	 10Operations, 10Analytics, 10Research, 10Traffic, and 4 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3773364 (10gh87)
[23:55:47] <wikibugs>	 10Operations, 10I18n: Publish full fallback sequence for generic families (sans, serif) in SVG font rendering - https://phabricator.wikimedia.org/T180923#3773365 (10Arthur2e5)
[23:58:34] <icinga-wm>	 PROBLEM - Long running screen/tmux on graphite1001 is CRITICAL: CRIT: Long running SCREEN process. (PID: 36516, 1734464s 1728000s).