[00:43:45] operations, Commons, MediaWiki-File-management, Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#1856135 (Denniss) https://commons.wikimedia.org/wiki/File:Megabalanus_coccopoma.jpg Another file to run experiments with - L...
[01:27:43] PROBLEM - puppet last run on mw2091 is CRITICAL: CRITICAL: puppet fail
[01:30:49] operations, DBA: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1856148 (Halfak) @Krenair, it may also make sense to add a relevant row to the `logging` table.
[01:54:13] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[02:25:22] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.7) (duration: 10m 04s)
[02:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:09:04] PROBLEM - puppet last run on mw2001 is CRITICAL: CRITICAL: puppet fail
[03:20:23] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [5000000.0]
[03:24:33] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0]
[03:35:52] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:04] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[03:36:24] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[03:36:43] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:44] RECOVERY - puppet last run on mw2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[03:57:02] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Dec 6 03:57:02 UTC 2015 (duration 1h 31m 41s)
[03:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:57:50] why is that taking so long
[04:00:04] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[04:00:39] lol what
[04:01:34] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:45] https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/scap/files/l10nupdate-1;ace7bc642fe4815e62c80daab85fe47a3b2ff608$123-131
[04:02:04] why is it reimplementing foreachwiki?
[04:03:13] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:44] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:40:33] grrrit-wm: are you alive?
[04:40:59] no you aren't
[04:41:54] Krenair: the refreshMessageBlobs script is not fast but Sam knocked 2 hours off of it this week by cleaning up junk in the database. Probably no good reason it isn't using foreachwiki. Worth cleaning up I imagine
[04:42:13] look at SAL
[04:42:55] why did it take 3-4 hours yesterday?
[04:43:56] it's been taking that long for a while but the message wasn't showing hours before
[04:44:24] it pauses for slave lag after every row right now
[04:45:04] Sam made a patch to only pause every 100 rows but we missed the deploy window
[04:45:31] there were also hundreds of bogus lang rows that Sam cleaned out yesterday
[04:45:59] it's generally a slow script though for sure
[05:06:53] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: puppet fail
[05:33:55] PROBLEM - MariaDB disk space on silver is CRITICAL: DISK CRITICAL - free space: / 526 MB (5% inode=80%)
[05:34:22] RECOVERY - puppet last run on mw2121 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[05:40:46] !log silver: apt-get clean for disk space
[05:40:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:40:53] mutante: thanks :)
[05:41:08] I think the issue is that nutcracker is running wild — can you make any sense of that?
[05:41:12] the log is huge
[05:41:28] np, got a page about it. was at 5% and 523M left. after this now 1.2G left and 87%
[05:41:44] RECOVERY - MariaDB disk space on silver is OK: DISK OK
[05:42:15] today’s nutcracker log is 1.1G
[05:42:15] and there was the recovery page
[05:42:18] that doesn’t seem right :)
[05:43:42] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[05:43:54] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[05:44:32] andrewbogott: sorry, don't know besides that nutcracker is doing stuff
[05:44:43] ok — me neither
[05:44:50] it’s vaguely possible that this is normal behavior
[05:44:54] is it really that much more than usual and didn't just run out of disk now
[05:44:59] before logrotate
[05:45:52] andrewbogott: let me gzip log.1
[05:46:13] thanks, that’s what I was thinking of trying as well
[05:46:17] might be that it compresses way down
[05:46:33] if that ends up in size like 2.gz to 7.gz then it's normal
[05:46:36] for that timeframe
[05:47:08] but then there’s this: https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=silver.wikimedia.org&m=cpu_report&s=by+name&mc=2&g=network_report&c=Virtualization+cluster+eqiad
[05:47:33] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:47:33] hm, 96M?
[05:47:41] So I guess it was a normal size
[05:47:44] 96M -rw-r----- 1 nutcracker adm 96M Dec 5 06:26 nutcracker.log.1.gz
[05:47:47] 87M -rw-r----- 1 nutcracker adm 87M Dec 4 06:26 nutcracker.log.2.gz
[05:47:50] 82M -rw-r----- 1 nutcracker adm 82M Dec 3 06:26 nutcracker.log.3.gz
[05:47:52] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:48:13] mutante: ok, I’m convinced that nothing in particular is happening, at least with that logfile
[05:48:21] yea, also, this is just a 10G partition
[05:48:25] weird spikes on that network graph but maybe that’s normal
[05:48:25] and not a separate one for /var
[05:48:27] so..
[05:48:36] that's just easy to fill up in general
[05:48:40] yeah, should figure out about getting those logs someplace else
[05:48:59] for now it's fine. 25% free
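A minimal sketch of the size check run in the exchange above: compress the freshly rotated log (the command actually !logged at 05:50) and compare it against the older rotations to see whether the day's volume is anomalous. Paths come from the conversation; the listing step is illustrative.

    # Compress the just-rotated nutcracker log on silver.
    gzip /var/log/nutcracker.log.1
    # Compare against earlier rotations; if .1.gz lands in the same size
    # range as .2.gz through .7.gz, the day's volume is normal.
    ls -lh /var/log/nutcracker.log.*.gz

The 96M/87M/82M listing pasted at 05:47 is exactly this comparison: the new log compressed to roughly the same size as the previous days', so nothing unusual was happening.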
[05:49:00] But this is the second disk space alert this week, after my never having seen one before
[05:49:08] the rest can be later
[05:49:14] yep
[05:49:20] log to a different place somehow
[05:49:23] definitely good enough for a Saturday night
[05:49:26] yes
[05:49:39] i'll afk again then
[05:50:00] me too. g’night!
[05:50:04] !log silver gzip /var/log/nutcracker.log.1
[05:50:06] good night
[05:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:09:14] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [5000000.0]
[06:24:52] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[06:30:04] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:03] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:23] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:34] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:34] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:03] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 4 failures
[06:35:22] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:55:44] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:56:12] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:56:34] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:56:34] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:57:53] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:58:12] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:53] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:12] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[07:05:12] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[07:10:12] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[07:15:12] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 200 seconds ago with 0 failures
[07:57:43] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:00:32] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures
[08:19:01] hmm, another report of thumbs not getting purged
[08:21:53] PROBLEM - DPKG on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:21:53] PROBLEM - Disk space on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:13] PROBLEM - salt-minion processes on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:14] PROBLEM - grafana-admin.wikimedia.org on krypton is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:22:25] PROBLEM - RAID on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
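An aside on the unpurged-thumbnail reports above (T119038 at 00:43 and the 08:19 remark): a hedged sketch of forcing a re-render by purging the file page through the MediaWiki API. The file name is the one from the task; action=purge has to be sent as a POST, which the curl data options produce.

    # Ask MediaWiki to purge the file description page and its cached
    # renderings (thumbnails are re-generated on the next request).
    curl -s 'https://commons.wikimedia.org/w/api.php' \
      --data-urlencode 'action=purge' \
      --data-urlencode 'titles=File:Megabalanus_coccopoma.jpg' \
      --data-urlencode 'format=json'

If the old thumbnail still serves after this, the problem is in the cache layer in front of MediaWiki rather than in the purge itself, which is what T119038 is tracking.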
[08:22:33] PROBLEM - configured eth on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:43] PROBLEM - dhclient process on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:23:04] PROBLEM - grafana.wikimedia.org on krypton is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:23:13] PROBLEM - puppet last run on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:23:14] PROBLEM - Check size of conntrack table on krypton is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:29:40] thedj: in ams?
[08:30:34] yeah
[09:16:15] (PS4) Yuvipanda: dynamicproxy: Increase websocket timeout [puppet] - https://gerrit.wikimedia.org/r/256882 (https://phabricator.wikimedia.org/T120335)
[09:39:15] (CR) Yuvipanda: [C: 2] k8s: Roll etcd into master role [puppet] - https://gerrit.wikimedia.org/r/257173 (owner: Yuvipanda)
[09:41:12] PROBLEM - NTP on krypton is CRITICAL: NTP CRITICAL: No response from NTP server
[10:00:13] (PS1) Yuvipanda: k8s: Use regular puppet cert path [puppet] - https://gerrit.wikimedia.org/r/257176
[10:01:03] (PS2) Yuvipanda: k8s: Use regular puppet cert path [puppet] - https://gerrit.wikimedia.org/r/257176
[10:01:05] (PS6) Yuvipanda: base: Allow auto puppetmaster switching tuning [puppet] - https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159)
[10:03:15] (CR) Yuvipanda: [C: 2] k8s: Use regular puppet cert path [puppet] - https://gerrit.wikimedia.org/r/257176 (owner: Yuvipanda)
[10:08:52] PROBLEM - SSH on krypton is CRITICAL: Server answer
[10:12:43] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[10:18:04] (PS5) Yuvipanda: dynamicproxy: Increase websocket timeout [puppet] - https://gerrit.wikimedia.org/r/256882 (https://phabricator.wikimedia.org/T120335)
[10:18:55] (Abandoned) Yuvipanda: [WIP] / HACK: Enforce single ssldir for puppet [puppet] - https://gerrit.wikimedia.org/r/256642 (owner: Yuvipanda)
[10:19:13] (Abandoned) Yuvipanda: puppetmaster: Make sure base::puppet is present [puppet] - https://gerrit.wikimedia.org/r/255154 (owner: Yuvipanda)
[10:19:44] (CR) Yuvipanda: [C: 2] dynamicproxy: Increase websocket timeout [puppet] - https://gerrit.wikimedia.org/r/256882 (https://phabricator.wikimedia.org/T120335) (owner: Yuvipanda)
[10:22:23] PROBLEM - SSH on krypton is CRITICAL: Server answer
[10:24:04] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:26:13] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[10:26:53] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:47:44] PROBLEM - SSH on krypton is CRITICAL: Server answer
[10:49:43] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[10:57:33] PROBLEM - SSH on krypton is CRITICAL: Server answer
[11:15:04] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[11:26:43] PROBLEM - SSH on krypton is CRITICAL: Server answer
[11:28:34] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[11:38:33] PROBLEM - SSH on krypton is CRITICAL: Server answer
[11:42:23] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[11:50:03] PROBLEM - SSH on krypton is CRITICAL: Server answer
[11:52:03] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[11:57:53] PROBLEM - SSH on krypton is CRITICAL: Server answer
[12:03:53] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[12:11:42] PROBLEM - SSH on krypton is CRITICAL: Server answer
[12:17:23] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[12:23:13] PROBLEM - SSH on krypton is CRITICAL: Server answer
[12:27:04] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[12:38:54] PROBLEM - SSH on krypton is CRITICAL: Server answer
[12:44:44] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[12:58:14] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:00:23] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:06:13] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:10:03] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:17:52] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:19:43] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:23:32] (CR) Paladox: Gerrit: use Diffusion for repo browsing (again) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/256605 (https://phabricator.wikimedia.org/T110607) (owner: Chad)
[13:25:33] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:27:32] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:39:23] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:41:23] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:48:06] (PS1) Paladox: Fix redirections in gerrit [puppet] - https://gerrit.wikimedia.org/r/257193
[13:49:03] PROBLEM - SSH on krypton is CRITICAL: Server answer
[13:52:59] (CR) Krinkle: Fix redirections in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[13:54:12] (CR) Paladox: Fix redirections in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[13:54:52] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[13:57:18] (CR) Krinkle: Fix redirections in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[13:58:29] (CR) Paladox: Fix redirections in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[14:00:22] (CR) Paladox: Fix redirections in gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[14:00:28] (PS2) Paladox: Fix redirections in gerrit [puppet] - https://gerrit.wikimedia.org/r/257193
[14:02:44] PROBLEM - SSH on krypton is CRITICAL: Server answer
[14:08:33] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[14:24:04] PROBLEM - SSH on krypton is CRITICAL: Server answer
[14:37:54] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[14:43:43] PROBLEM - SSH on krypton is CRITICAL: Server answer
[15:01:43] PROBLEM - Disk space on restbase1008 is CRITICAL: DISK CRITICAL - free space: /var 69958 MB (3% inode=99%)
[15:14:52] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[15:20:42] PROBLEM - SSH on krypton is CRITICAL: Server answer
[15:26:24] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[15:32:32] PROBLEM - SSH on krypton is CRITICAL: Server answer
[16:03:43] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
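The restbase1008 /var alert above is the second low-disk warning of the day after silver. A generic first-response sketch, not what was actually run here: before deciding what to clean or relocate, see which directory on the alerting filesystem is consuming the space.

    # Largest first-level consumers of /var only; -x keeps du from
    # crossing into other mounted filesystems.
    du -xh --max-depth=1 /var 2>/dev/null | sort -hr | head

On a RESTBase host the bulk is typically Cassandra data under /var/lib, so a large result there is expected rather than a leak.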
[16:09:32] PROBLEM - SSH on krypton is CRITICAL: Server answer
[16:11:24] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[16:17:14] PROBLEM - SSH on krypton is CRITICAL: Server answer
[16:25:24] PROBLEM - Disk space on restbase1008 is CRITICAL: DISK CRITICAL - free space: /var 70146 MB (3% inode=99%)
[17:14:12] RECOVERY - Disk space on restbase1008 is OK: DISK OK
[17:36:26] hi, there is some dead-lock in DB
[17:36:38] https://he.wikipedia.org/w/index.php?title=%D7%94%D7%A4%D7%95%D7%A2%D7%9C_%D7%97%D7%93%D7%A8%D7%94&action=purge
[17:39:43] operations: Dead-lock in hewiki DB following page move - https://phabricator.wikimedia.org/T120571#1856635 (eranroz) NEW
[17:42:47] operations: Dead-lock in hewiki DB following page move - https://phabricator.wikimedia.org/T120571#1856643 (IKhitron) Let me guess that we have an infinite loop there - a redirect page to itself.
[17:44:52] operations: Dead-lock in hewiki DB following page move - https://phabricator.wikimedia.org/T120571#1856644 (eranroz) Function: WikiPage::insertRedirectEntry Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.22)
[18:01:54] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: puppet fail
[18:07:38] (CR) Paladox: "Hi, I just tested this on my local gerrit install; the links show correctly but when clicked still don't redirect properly but the links do " [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[18:29:04] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:34:23] (PS1) BBlack: pybal: whitespace-only: align => [puppet] - https://gerrit.wikimedia.org/r/257207
[18:34:25] (PS1) BBlack: pybal: persist journal logs to disk [puppet] - https://gerrit.wikimedia.org/r/257208
[18:34:51] (CR) BBlack: [C: 2 V: 2] pybal: whitespace-only: align => [puppet] - https://gerrit.wikimedia.org/r/257207 (owner: BBlack)
[18:37:04] (PS2) BBlack: pybal: persist journal logs to disk [puppet] - https://gerrit.wikimedia.org/r/257208
[18:49:39] !log reset auth token for User:QuimGil
[18:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:41:58] (PS3) Paladox: Fix redirections in gerrit [puppet] - https://gerrit.wikimedia.org/r/257193
[19:43:18] (PS4) Paladox: Fix redirections in gerrit [puppet] - https://gerrit.wikimedia.org/r/257193
[19:44:45] (PS5) Paladox: Fix redirections in gerrit [puppet] - https://gerrit.wikimedia.org/r/257193
[19:53:02] (CR) Paladox: "branch seems to show like refs/head/master instead of just being master." [puppet] - https://gerrit.wikimedia.org/r/257193 (owner: Paladox)
[21:48:27] !log krypton unresponsive, nothing on console. shutting down, increasing instance ram from 2 to 4g, and rebooting.
[21:48:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
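An aside on the hewiki error filed earlier (17:44, T120571): error 1205 is a lock wait timeout, not a detected deadlock, so the usual triage is to find which transaction holds the row locks that WikiPage::insertRedirectEntry is waiting on. A sketch using standard InnoDB introspection on MariaDB/MySQL of that era; the host is the one quoted in the task comment, and credentials are assumed to come from the environment.

    # Pair up blocked and blocking transactions on the affected DB host.
    mysql -h 10.64.16.22 -e "
      SELECT r.trx_id AS waiting_trx,  r.trx_query AS waiting_query,
             b.trx_id AS blocking_trx, b.trx_query AS blocking_query
      FROM information_schema.INNODB_LOCK_WAITS w
      JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
      JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;"
    # The TRANSACTIONS section of the engine status report carries the
    # same information plus the individual lock details.
    mysql -h 10.64.16.22 -e "SHOW ENGINE INNODB STATUS\G"

If the blocking transaction is the page-move transaction itself, that would fit IKhitron's self-redirect guess: the move holds the page row while the redirect insert waits on it.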
[21:50:32] RECOVERY - SSH on krypton is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[21:50:53] RECOVERY - grafana-admin.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 534 bytes in 0.005 second response time
[21:50:53] RECOVERY - salt-minion processes on krypton is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[21:51:13] RECOVERY - dhclient process on krypton is OK: PROCS OK: 0 processes with command name dhclient
[21:51:13] RECOVERY - configured eth on krypton is OK: OK - interfaces up
[21:51:13] RECOVERY - RAID on krypton is OK: OK: no RAID installed
[21:51:43] RECOVERY - grafana.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 522 bytes in 0.006 second response time
[21:51:43] RECOVERY - DPKG on krypton is OK: All packages OK
[21:51:43] RECOVERY - Disk space on krypton is OK: DISK OK
[21:51:43] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[21:51:44] RECOVERY - Check size of conntrack table on krypton is OK: OK: nf_conntrack is 0 % full
[22:10:03] RECOVERY - NTP on krypton is OK: NTP OK: Offset -0.001266598701 secs
[22:20:12] operations, Deployment-Systems: Make l10nupdate user a system user - https://phabricator.wikimedia.org/T120585#1856865 (bd808) NEW
[22:21:13] operations, Deployment-Systems, Wikimedia-General-or-Unknown, Patch-For-Review, User-bd808: localisationupdate broken on wmf wikis by scap master-master sync changes - https://phabricator.wikimedia.org/T119746#1856874 (bd808) Open>Resolved The nightly l10nupdate cron job seems to be worki...
[22:29:13] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:20] (PS1) BryanDavis: vagrant: Set umask 0002 for wikidev users [puppet] - https://gerrit.wikimedia.org/r/257263 (https://phabricator.wikimedia.org/T120472)
[22:55:48] bd808: ^ let me know if / when you want me to look at / merge
[22:56:15] whenever you have time
[22:56:35] I was going to put it up for puppet swat but that's not until Thursday apparently
[22:56:50] (CR) Yuvipanda: [C: 2] vagrant: Set umask 0002 for wikidev users [puppet] - https://gerrit.wikimedia.org/r/257263 (https://phabricator.wikimedia.org/T120472) (owner: BryanDavis)
[22:57:05] bd808: yeah we cancelled tuesday's because there's a big ldap migration planned
[22:57:17] *nod*
[22:57:51] bd808: done
[22:58:22] thanks
[22:59:23] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:13:03] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0]
[23:20:43] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[23:34:58] (CR) Alex Monk: [C: -1] "Should be merged with I20b202be" [puppet] - https://gerrit.wikimedia.org/r/256437 (https://phabricator.wikimedia.org/T115965) (owner: Reedy)
[23:35:52] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [5000000.0]
[23:41:25] (PS1) Alex Monk: Update my .bashrc [puppet] - https://gerrit.wikimedia.org/r/257267
[23:41:28] (PS1) Alex Monk: Don't reimplement foreachwiki in l10nupdate-1 [puppet] - https://gerrit.wikimedia.org/r/257268
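Krenair's last patch above answers the 04:02 question: the l10nupdate-1 script iterated the wiki list itself instead of using the standard wrapper. A sketch of the shape of that change, with the script path and dblist location used for illustration only (the real loop at l10nupdate-1 lines 123-131 differs in detail):

    # Before: a hand-rolled loop over every wiki in the farm.
    for wiki in $(</srv/mediawiki/dblists/all.dblist); do
        mwscript maintenance/refreshMessageBlobs.php --wiki="$wiki"
    done
    # After: foreachwiki already knows the wiki list and handles the
    # per-wiki invocation, so the script stops duplicating that logic.
    foreachwiki maintenance/refreshMessageBlobs.php

This is orthogonal to the speed fix discussed at 04:44-04:45 (pausing for slave lag every 100 rows instead of every row), which lives inside the PHP script rather than in this shell loop.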
[23:49:23] PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: puppet fail
[23:49:54] PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: puppet fail
[23:51:23] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 1.00% above the threshold [1000000.0]