[00:10:30] (PS29) Alex Monk: Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450)
[00:11:28] (CR) jenkins-bot: [V: -1] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: Alex Monk)
[00:12:13] maintain-replicas.py:187:80: E501 line too long (80 > 79 characters)
[00:12:15] * Krenair sighs
[00:12:39] (PS30) Alex Monk: Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450)
[01:54:30] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:19:46] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[02:27:12] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.20) (duration: 13m 07s)
[02:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:30:03] RECOVERY - cassandra service on maps-test2004 is OK: OK - cassandra is active
[02:38:25] PROBLEM - cassandra service on maps-test2004 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed
[03:03:32] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:09:54] !log just took a shit
[03:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:11:23] PROBLEM - puppet last run on lvs1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:28:59] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[03:36:29] RECOVERY - puppet last run on lvs1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:54:49] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:19:51] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:02] Maybe https://twitter.com/wikimediatech/status/782417315606462464 should be deleted/hidden.
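(The E501 failure jenkins-bot reported at 00:12:13 just means one physical line in maintain-replicas.py crossed flake8's default 79-character limit. A minimal sketch of the usual fix, using a made-up assignment purely for illustration, not the actual content of line 187:)

```python
# Hypothetical example only, not the real maintain-replicas.py line 187.
# A single-line definition like this would trip flake8's E501 check
# (default max-line-length is 79):
#
# replica_grants = {'labsdbuser': ['SELECT', 'SHOW VIEW'], 'maintainer': ['ALL PRIVILEGES']}

# Wrapping inside the braces keeps every physical line short and needs
# no backslash continuations:
replica_grants = {
    'labsdbuser': ['SELECT', 'SHOW VIEW'],
    'maintainer': ['ALL PRIVILEGES'],
}
```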
[05:37:12] We really should update the bot to check that the person is identified to NickServ first
[06:40:33] !admin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molina
[06:40:35] !admin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molina
[06:46:16] !admin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molina
[06:46:41] I LOVE ALEXZ
[06:46:55] Shahhssh
[06:47:25] !admin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molin hola alvaro molina hola alvaro molina
[06:50:11] RECOVERY - puppet last run on maps-test2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:51:50] RECOVERY - cassandra service on maps-test2003 is OK: OK - cassandra is active
[06:57:26] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 531 bytes in 0.055 second response time
[06:58:08] Operations, Ops-Access-Requests, netops: Access to network devices - https://phabricator.wikimedia.org/T147061#2682181 (Peachey88)
[06:58:41] PROBLEM - puppet last run on elastic2005 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[molly-guard],Package[ncdu]
[06:59:31] PROBLEM - cassandra service on maps-test2003 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed
[07:20:23] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.065 second response time
[07:24:04] RECOVERY - puppet last run on elastic2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:28:31] Operations, Discovery, Wikidata, Wikidata-Query-Service: Response times of Wikidata Query Service increasing - https://phabricator.wikimedia.org/T147130#2682204 (Gehel)
[07:29:32] Operations, Discovery, Wikidata, Wikidata-Query-Service: 502 Bad Gateway errors while trying to run simple queries with the Wikidata Query Service - https://phabricator.wikimedia.org/T146576#2665333 (Gehel) Open>Resolved a:Gehel As @jcrespo pointed out, the current issue is different...
[07:29:49] !log silencing wdqs response time alerts, it is flapping, related to traffic - T147130
[07:29:51] T147130: Response times of Wikidata Query Service increasing - https://phabricator.wikimedia.org/T147130
[07:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:12:28] (PS2) Urbanecm: Add 1.5 and 2x logos for olowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313658 (https://phabricator.wikimedia.org/T146745)
[08:43:22] PROBLEM - puppet last run on elastic1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
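(On the 05:37:12 suggestion above, requiring NickServ identification before the bot accepts !log: a minimal sketch of one way an IRC bot could check this via the WHOIS account numeric (330). The `irc` connection object and its helpers are hypothetical; a real bot would reuse its IRC library's WHOIS handling.)

```python
# Sketch only: `irc` is a hypothetical connection object exposing send_raw()
# and iter_server_lines(); not any particular bot framework's API.

RPL_WHOISACCOUNT = "330"   # ":server 330 <me> <nick> <account> :is logged in as"
RPL_ENDOFWHOIS = "318"     # ":server 318 <me> <nick> :End of /WHOIS list."


def is_identified(irc, nick):
    """Return True if `nick` is logged in to services, judged by WHOIS numeric 330."""
    irc.send_raw("WHOIS " + nick)
    for line in irc.iter_server_lines():              # hypothetical helper
        parts = line.split()
        if len(parts) >= 4 and parts[1] == RPL_WHOISACCOUNT and parts[3] == nick:
            return True                               # services account present
        if len(parts) >= 2 and parts[1] == RPL_ENDOFWHOIS:
            return False                              # WHOIS ended without a 330 line


def handle_message(irc, sender_nick, message):
    """Only forward !log entries from identified users."""
    if message.startswith("!log") and not is_identified(irc, sender_nick):
        irc.send_raw("PRIVMSG {} :Please identify to NickServ before using !log".format(sender_nick))
        return
    # ... otherwise record the entry in the Server Admin Log as before ...
```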
[08:48:41] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[08:53:11] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[09:10:34] RECOVERY - puppet last run on elastic1032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:10:35] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[09:18:24] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[09:19:28] PROBLEM - MD RAID on relforge1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:21:50] RECOVERY - MD RAID on relforge1001 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[09:39:17] PROBLEM - MD RAID on relforge1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:44:03] RECOVERY - MD RAID on relforge1001 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[09:58:07] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:02:07] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:03:09] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:07:07] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:22:58] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:25:07] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[10:25:28] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:26:11] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[10:28:40] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[10:35:13] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:44:22] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
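(The "X% of data above the critical threshold" alerts that dominate this stretch come from a check that samples a Graphite metric and alerts when too large a fraction of recent datapoints exceed a limit. A rough sketch of that idea, assuming Graphite's standard render API; the host, metric name, and thresholds are placeholders, not the production check's configuration:)

```python
import json
import sys
import urllib.request

GRAPHITE = "https://graphite.example.org"   # placeholder host
METRIC = "reqstats.5xx"                     # placeholder metric name
CRIT_VALUE = 1000.0                         # per-datapoint threshold seen in the alerts
CRIT_PERCENT = 10.0                         # alert if this % of datapoints exceed CRIT_VALUE


def fetch_datapoints():
    """Return the non-null values of the last 10 minutes of the metric."""
    url = "{}/render?target={}&from=-10min&format=json".format(GRAPHITE, METRIC)
    with urllib.request.urlopen(url, timeout=10) as resp:
        series = json.load(resp)
    # Graphite's JSON format is [{"target": ..., "datapoints": [[value, ts], ...]}]
    return [value for value, _ts in series[0]["datapoints"] if value is not None]


def main():
    points = fetch_datapoints()
    if not points:
        print("UNKNOWN: no datapoints returned")
        sys.exit(3)
    over = 100.0 * sum(1 for v in points if v > CRIT_VALUE) / len(points)
    if over >= CRIT_PERCENT:
        print("CRITICAL: {:.2f}% of data above the critical threshold [{}]".format(over, CRIT_VALUE))
        sys.exit(2)
    print("OK: {:.2f}% of data above the threshold [{}]".format(over, CRIT_VALUE))
    sys.exit(0)


if __name__ == "__main__":
    main()
```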
[10:44:44] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:47:04] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:52:03] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[10:52:34] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:06:53] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[11:11:33] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[11:19:22] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:23:32] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:23:42] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:40:19] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:05:30] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 658 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3387384 keys - replication_delay is 658
[12:09:30] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[12:16:39] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[12:21:31] RECOVERY - cassandra service on maps-test2003 is OK: OK - cassandra is active
[12:23:49] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:38:39] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[12:38:40] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
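(The rdb2006 alert at 12:05:30 is about Redis replication lag on port 6479. A minimal sketch, assuming the redis-py client, of how one could read the relevant figures from a replica; the exact lag definition used by the production check may differ, and the hostname below is assumed for illustration:)

```python
import redis  # assumes the redis-py client is installed


def replication_status(host, port):
    """Return (role, seconds since last interaction with the master) for one instance."""
    info = redis.StrictRedis(host=host, port=port, socket_timeout=5).info("replication")
    if info.get("role") != "slave":
        return info.get("role"), 0
    # master_link_status says whether the replication link is up at all;
    # master_last_io_seconds_ago is a rough proxy for how stale the replica is.
    if info.get("master_link_status") != "up":
        return "slave (link down)", float("inf")
    return "slave", info.get("master_last_io_seconds_ago", 0)


if __name__ == "__main__":
    role, lag = replication_status("rdb2006.codfw.wmnet", 6479)  # hostname assumed
    print("{}: last I/O with master {}s ago".format(role, lag))
```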
[12:50:41] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:03:10] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:31:44] <_joe_> seems like the 5xx errors are 500s from commons
[13:33:11] <_joe_> some bot requesting 0px images
[13:33:15] <_joe_> or some app
[14:44:49] PROBLEM - parsoid on wtp1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:47:01] RECOVERY - parsoid on wtp1018 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.271 second response time
[15:33:39] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[15:38:00] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[15:44:50] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[15:51:10] (PS1) Kerberizer: Fix an invalid empty line in the global robots.txt [mediawiki-config] - https://gerrit.wikimedia.org/r/313763 (https://phabricator.wikimedia.org/T146908)
[15:52:01] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[15:55:01] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:01:41] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[16:02:10] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:06:19] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 868271 msg (=800000 warning): ocg_render_job_queue 3004 msg (=3000 critical)
[16:07:03] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 868544 msg (=800000 warning): ocg_render_job_queue 3117 msg (=3000 critical)
[16:07:39] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[16:07:40] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 868828 msg (=800000 warning): ocg_render_job_queue 3226 msg (=3000 critical)
[16:09:29] RECOVERY - cassandra service on maps-test2002 is OK: OK - cassandra is active
[16:12:01] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:19:10] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[16:21:00] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[16:25:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[16:37:24] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[16:38:12] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:12:49] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
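(_joe_'s diagnosis at 13:31-13:33, 500s from Commons caused by something requesting 0px thumbnails, is the kind of thing you can confirm by counting thumbnail requests whose width is 0. A rough sketch over a sampled access log; the file name, field layout, and the "/0px-" URL pattern are assumptions for illustration, not the actual request pipeline:)

```python
import re
from collections import Counter

# Thumbnail URLs generally look like .../thumb/<a>/<ab>/<File>/<N>px-<File>;
# a requested width of 0 cannot be rendered and comes back as an error.
ZERO_PX = re.compile(r"/thumb/.+/0px-")


def count_zero_px_clients(log_path):
    """Tally user agents requesting 0px thumbnails in a sampled access log.

    Assumes one tab-separated request per line with the URL in the 7th field
    and the user agent in the last field; adjust to the real log format.
    """
    agents = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue
            url, agent = fields[6], fields[-1]
            if ZERO_PX.search(url):
                agents[agent] += 1
    return agents


if __name__ == "__main__":
    for agent, hits in count_zero_px_clients("sampled-1000.log").most_common(10):
        print(hits, agent)
```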
[17:39:10] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:59:50] RECOVERY - cassandra service on maps-test2004 is OK: OK - cassandra is active
[18:39:50] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[18:44:42] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[18:59:49] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:10:23] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:36:43] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:50:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:54:40] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:56:00] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[20:00:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:23:15] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:47:51] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[22:11:06] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:37:51] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:42:53] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3290404 keys - replication_delay is 0
[23:23:22] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:33:11] Operations, Wikimedia-Site-requests, I18n, Tracking: Wikis waiting to be renamed (tracking) - https://phabricator.wikimedia.org/T21986#2683321 (Krenair)