[02:24:49] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 46s)
[02:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:59:41] PROBLEM - Apache HTTP on mw1193 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.092 second response time
[03:00:41] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.130 second response time
[03:33:21] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.test],File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[04:00:21] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[04:08:31] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=295.80 Read Requests/Sec=2797.20 Write Requests/Sec=11.30 KBytes Read/Sec=29702.00 KBytes_Written/Sec=108.00
[04:16:31] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=2.50 Read Requests/Sec=0.00 Write Requests/Sec=0.40 KBytes Read/Sec=0.00 KBytes_Written/Sec=5.20
[06:01:51] PROBLEM - haproxy failover on dbproxy1009 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[06:02:01] PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[06:04:52] RECOVERY - haproxy failover on dbproxy1009 is OK: OK check_failover servers up 2 down 0
[06:04:52] RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0
[06:07:51] PROBLEM - haproxy failover on dbproxy1009 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[06:07:51] PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[06:21:51] RECOVERY - haproxy failover on dbproxy1009 is OK: OK check_failover servers up 2 down 0
[06:21:51] RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0
[08:38:51] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:39:52] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:41:51] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 3.152 second response time
[08:42:41] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[09:37:41] RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[09:48:38] 06Operations: Puppet: test non stringified facts across the fleet - https://phabricator.wikimedia.org/T166372#3296715 (10Volans) All but two diffs are related to `$::processorcount`: ``` californium.wikimedia.org - "no_workers": "8", + "no_workers": 8, ...SNIP... - "processes": "8...
[09:48:50] (03PS1) 10Volans: Monitoring: remove spaces from list of interfaces [puppet] - 10https://gerrit.wikimedia.org/r/355896 (https://phabricator.wikimedia.org/T166372)
[10:35:01] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:35:51] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:36:51] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[10:39:51] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 9.715 second response time
[10:41:51] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:41] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 1.576 second response time
[10:52:01] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:53:51] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:56:41] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.608 second response time
[10:58:51] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[11:02:01] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:02:01] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:02:51] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[11:03:01] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[11:03:51] PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:04:41] RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.080 second response time
[11:06:01] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:07:51] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[11:13:05] (03CR) 10Volans: "Compiler results available here: https://puppet-compiler.wmflabs.org/6550/" [puppet] - 10https://gerrit.wikimedia.org/r/355896 (https://phabricator.wikimedia.org/T166372) (owner: 10Volans)
[11:14:51] PROBLEM - HHVM jobrunner on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:16:41] RECOVERY - HHVM jobrunner on mw1168 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 1.350 second response time
[11:19:01] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:19:51] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:22:51] PROBLEM - HHVM jobrunner on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:24:41] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[11:25:01] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 7.892 second response time
[11:28:51] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:30:41] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[11:35:01] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:36:01] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 9.730 second response time
[11:38:01] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:40:41] RECOVERY - HHVM jobrunner on mw1168 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.074 second response time
[11:41:01] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 9.414 second response time
[11:45:01] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:45:02] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:45:02] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:47:01] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[11:48:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[11:48:01] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[12:01:01] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:01:01] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:01:02] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:02:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[12:02:01] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[12:03:01] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[12:06:01] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:07:01] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:08:01] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:09:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[12:12:01] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[12:12:01] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[12:19:01] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:19:01] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:20:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[12:20:03] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[13:19:27] !log restart db1069:3313 mysql instance, stuck on replication
[13:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:11] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[13:37:12] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:37:12] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:38:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[13:38:11] RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[13:38:14] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[13:59:11] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[14:02:11] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:03:01] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[14:03:01] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[14:22:12] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:24:11] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[15:02:11] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:02:12] PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:04:11] RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[15:04:11] RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[15:24:01] (03PS2) 10Alexandros Kosiaris: role::kubernetes::worker: upgrade calico everywhere [puppet] - 10https://gerrit.wikimedia.org/r/355394 (https://phabricator.wikimedia.org/T165024) (owner: 10Giuseppe Lavagetto)
[15:25:33] (03PS3) 10Alexandros Kosiaris: role::kubernetes::worker: upgrade calico everywhere [puppet] - 10https://gerrit.wikimedia.org/r/355394 (https://phabricator.wikimedia.org/T165024) (owner: 10Giuseppe Lavagetto)
[15:29:37] (03CR) 10Alexandros Kosiaris: [C: 032] role::kubernetes::worker: upgrade calico everywhere [puppet] - 10https://gerrit.wikimedia.org/r/355394 (https://phabricator.wikimedia.org/T165024) (owner: 10Giuseppe Lavagetto)
[18:39:43] Ping?
[18:39:53] See https://quarry.wmflabs.org/query/18947
[18:41:37] Whoops, broke the query for a moment, fixed...
[18:42:23] Some recent uploads on Commons are giving a transcode_error of “* An unknown error occurred in storage backend "local-swift-eqiad". * An unknown error occurred in storage backend "local-swift-codfw".”, but only for some resolutions.
[18:44:20] https://quarry.wmflabs.org/query/18950 is another example.
[22:08:55] akosiaris: Ping?
[22:09:03] Since you are ‘on call’...
[22:47:51] PROBLEM - Disk space on labstore1005 is CRITICAL: DISK CRITICAL - free space: /srv/tools 474054 MB (5% inode=83%)
[23:44:23] I need a script that pokes this channel every few hours until someone wakes up…
[23:50:08] Revent: what issue do you have?
[23:50:28] Easiest to just point at examples...
[23:50:30] Revent: did you file a ticket on Phabricator?
[23:50:34] https://quarry.wmflabs.org/query/18951
[23:50:38] https://quarry.wmflabs.org/query/18950
[23:50:43] https://quarry.wmflabs.org/query/18947
[23:51:01] It might be related to some known issue (shrugs)
[23:51:49] Revent: open a task on Phabricator, add Filippo Giunchedi as cc
[23:52:31] Dereckson: Those files (and there are a substantial number more, I’m estimating about a hundred transcodes over the last few days) are ‘persistent’ about which transcodes fail that way, even if reset.
[23:54:53] Revent: add also 'operations' as project (and media-storage)
[23:55:14] (nods) Working on it.
[23:57:52] 06Operations, 06Commons, 10media-storage: More missing 'original' files on Commons - https://phabricator.wikimedia.org/T163068#3185065 (10Dereckson) The first two and the last now work from esams. https://upload.wikimedia.org/wikipedia/commons/6/69/Autonomous_bus_trials_South_Perth_-_3.ogv is still 404.
[23:58:27] Dereckson: ^ That was a different issue.
[23:59:32] I fixed two of those by uploading a new copy from YouTube (the source), and got the author (Anna Frodesiak) to upload a new copy of another one.
[23:59:58] ok