[00:20:20] <icinga-wm>	 PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:26:29] <icinga-wm>	 RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:47:09] <icinga-wm>	 RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:10:28] <icinga-wm>	 PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[01:11:19] <icinga-wm>	 RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[01:51:48] <jzerebecki>	 !log disabling 2fa for Hym411 T130994  [labswiki]> delete from oathauth_users where id=1363;
[01:51:49] <stashbot>	 T130994: Reset 2FA on wikitech for User:Revi and User:Hym411 - https://phabricator.wikimedia.org/T130994
[01:51:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:53:10] <revi>	 thank you :)
[01:53:28] <jzerebecki>	 yw
[02:19:21] <Dereckson>	 jzerebecki: hi. My account is concerned too, UID is 2362 if they are the same than LDAP.
[02:22:28] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 10m 06s)
[02:22:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:28] <matanya>	 jzerebecki: can you do server side uploads ?
[02:30:14] <jzerebecki>	 matanya: I have no idea how to do those
[02:30:42] <jzerebecki>	 Dereckson: I'm looking at it
[02:31:03] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 27 02:31:03 UTC 2016 (duration 8m 35s)
[02:31:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:35:53] <matanya>	 jzerebecki: https://wikitech.wikimedia.org/wiki/Uploading_large_files
[02:37:14] <jzerebecki>	 matanya: ah yes good to know, why are you asking?
[02:37:27] <matanya>	 have a large file to upload
[02:42:26] <jzerebecki>	 matanya: sorry not now
[02:42:35] <matanya>	 no worries
[02:57:35] <jzerebecki>	 !log disabling 2fa for Dereckson T130892  [labswiki]> delete from oathauth_users where id=402;
[02:57:37] <stashbot>	 T130892: wikitech 2fa provisioning form does so without confirmation - https://phabricator.wikimedia.org/T130892
[02:57:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:58:07] <jzerebecki>	 Dereckson: please verify
[03:01:15] <Dereckson>	 I can login, thanks.
[03:04:51] <Dereckson>	 Now, if I want to really add 2FA, is it safe to visit options page or should I wait some pending bug is fixed?
[03:05:59] <matanya>	  503 Service Temporarily Unavailable 
[03:06:06] <matanya>	 on commons
[03:06:50] <matanya>	 A database query error has occurred. This may indicate a bug in the software.
[03:06:50] <matanya>	     Function: WikiPage::lockAndGetLatest
[03:06:50] <matanya>	     Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.29)
[03:07:23] <matanya>	 I broke commons ;)
[03:08:59] <jzerebecki>	 Dereckson: it is safe to do so, the bug will not appear anymore.
[03:11:50] <Dereckson>	 ok
[03:36:18] <icinga-wm>	 PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:19] <icinga-wm>	 PROBLEM - puppet last run on mw2051 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:19] <icinga-wm>	 RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:02:28] <icinga-wm>	 RECOVERY - puppet last run on mw2051 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:30:28] <icinga-wm>	 PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:28] <icinga-wm>	 PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] <icinga-wm>	 PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:49] <icinga-wm>	 PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:39] <icinga-wm>	 PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[06:43:49] <icinga-wm>	 RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[06:56:19] <icinga-wm>	 RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:56:40] <icinga-wm>	 RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:40] <icinga-wm>	 RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:09] <icinga-wm>	 RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:19:28] <icinga-wm>	 PROBLEM - RAID on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:21:39] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] <icinga-wm>	 PROBLEM - configured eth on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] <icinga-wm>	 PROBLEM - swift-object-updater on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:10] <icinga-wm>	 PROBLEM - swift-container-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:28] <icinga-wm>	 PROBLEM - salt-minion processes on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:30] <icinga-wm>	 PROBLEM - puppet last run on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:39] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:40] <icinga-wm>	 PROBLEM - swift-container-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:48] <icinga-wm>	 PROBLEM - swift-object-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:50] <icinga-wm>	 PROBLEM - SSH on ms-be2016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:59] <icinga-wm>	 PROBLEM - swift-container-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:00] <icinga-wm>	 PROBLEM - swift-container-updater on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:09] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:19] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:19] <icinga-wm>	 PROBLEM - Disk space on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:28] <icinga-wm>	 PROBLEM - swift-account-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:38] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:39] <icinga-wm>	 PROBLEM - dhclient process on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:39] <icinga-wm>	 PROBLEM - DPKG on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:58] <icinga-wm>	 PROBLEM - NTP on ms-be2016 is CRITICAL: NTP CRITICAL: No response from NTP server
[08:50:08] <wikibugs>	 6Operations, 10Traffic: HTTPS error on status.wikimedia.org - https://phabricator.wikimedia.org/T131017#2153719 (10Peachey88)
[08:51:01] <wikibugs>	 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org - https://phabricator.wikimedia.org/T131017#2153707 (10Peachey88) AFAIK we can't fix that, because its hosted externally by watchmouse.
[08:51:30] <wikibugs>	 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org (watchmouse certificate mismatch) - https://phabricator.wikimedia.org/T131017#2153723 (10Peachey88)
[08:52:10] <wikibugs>	 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org (watchmouse certificate mismatch) - https://phabricator.wikimedia.org/T131017#2153707 (10Peachey88)
[08:52:12] <wikibugs>	 6Operations, 10Traffic, 7HTTPS: status.wikimedia.org is using SSL cert from other domain - https://phabricator.wikimedia.org/T34796#2153725 (10Peachey88)
[09:05:49] <wikibugs>	 6Operations, 10Traffic, 7HTTPS: investigate/remove hostname login.m.wikimedia.org - https://phabricator.wikimedia.org/T111998#2153727 (10Peachey88)
[09:05:51] <wikibugs>	 6Operations, 10Traffic, 7Mobile, 13Patch-For-Review: Investigate if login.m.wikimedia.org needs to stay around - https://phabricator.wikimedia.org/T123431#2153728 (10Peachey88)
[12:07:08] <icinga-wm>	 PROBLEM - HHVM rendering on mw1253 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50395 bytes in 0.175 second response time
[12:08:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw1253 is OK: HTTP OK: HTTP/1.1 200 OK - 67886 bytes in 0.070 second response time
[13:28:51] <wikibugs>	 6Operations, 10Wikimedia-Apache-configuration: SVGs without XML prolog (<?xml ... ?>) served with Content-Type text/html from upload.wikimedia.org - https://phabricator.wikimedia.org/T131012#2153826 (10matmarex) I'm not sure if this is an Apache configuration problem or what, I'm really using this to mean "The...
[13:30:58] <grrrit-wm>	 (03CR) 10Luke081515: [C: 04-1] "Needs rebase." [dns] - 10https://gerrit.wikimedia.org/r/276385 (https://phabricator.wikimedia.org/T123431) (owner: 10Dzahn)
[14:01:35] <wikibugs>	 6Operations, 10media-storage: SVGs without XML prolog (<?xml ... ?>) served with Content-Type text/html from upload.wikimedia.org - https://phabricator.wikimedia.org/T131012#2153834 (10Krenair) from upload? I don't think it can be apache ```krenair@tin:~$ set +H krenair@tin:~$ curl -I -H "Host: upload.wikimedi...
[17:07:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:07:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:00] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:09:00] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:39:19] <icinga-wm>	 PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 3 failures
[17:50:29] <grrrit-wm>	 (03Abandoned) 10Tim Landscheidt: testsystem: Move role class to module role [puppet] - 10https://gerrit.wikimedia.org/r/270105 (owner: 10Tim Landscheidt)
[18:04:29] <icinga-wm>	 PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[18:08:59] <icinga-wm>	 RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:11:39] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:11:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:11:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:29] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:17:29] <elukey>	 ---^ still tons of TIME_WAITs showing up in /proc/net/nf_conntrack and not on nestat -tunap
[18:21:18] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:23:15] <elukey>	 and most of the timewaits are from mw1233.eqiad.wmnet. hosts..
[18:23:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:23:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:25:12] <elukey>	 and netstat -tuap shows ESTABLISHED with other kafka hosts or cpXXXX as expected
[18:29:00] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:29:10] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:29:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:40] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:32:40] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:32:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:33:25] <wikibugs>	 6Operations, 10Analytics: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2153998 (10elukey)
[18:33:38] <elukey>	 https://phabricator.wikimedia.org/T131028 filed
[18:35:07] <elukey>	 (happy easter btw :P)
[18:35:36] <Dereckson>	 (thanks, happy easter to you too)
[18:37:05] <elukey>	 :) yesterday it went away by itself, but something definitely changed in the past two days to trigger this
[18:40:39] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:40:39] <Platonides>	 happy easter, elukey and Dereckson :)
[18:41:28] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:41:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:41:39] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:42:09] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:42:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:44:39] <icinga-wm>	 RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[18:46:39] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:46:48] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:46:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:47:19] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:47:23] <elukey>	 mmm the alarms are bouncing, a lot of noise but for the moment nothing critical
[18:50:18] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:50:20] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:50:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:50:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:51:50] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:56:45] <wikibugs>	 6Operations, 10Analytics: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2153998 (10Muehlenhoff) We can easily bump that to 512k, we had the same workaround for the recent job runner problems. We can than properly analyse the root cause on Tuesday. Am 27.03.2016 20:33 schrieb...
[18:58:28] <elukey>	 moritzm: o/ shall we bump nf_conntrack_max for kafka?
[18:59:51] <elukey>	 maybe for role::kafka::analytics::broker ?
[19:06:40] <grrrit-wm>	 (03PS1) 10Elukey: Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) 
[19:06:50] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:07:40] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:07:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:08:06] <elukey>	 all right code review sent
[19:08:29] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:09:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[19:17:17] <Luke|Dinner>	 moritzm: Take a look at https://phabricator.wikimedia.org/T131028#2153998, try to avoid quoting the old mail, if you answer via mail ;)
[19:23:31] <grrrit-wm>	 (03PS2) 10Elukey: Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) 
[20:00:01] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 031] Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) (owner: 10Elukey)
[20:01:54] <grrrit-wm>	 (03CR) 10Elukey: [C: 032] Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) (owner: 10Elukey)
[20:04:29] <elukey>	 !log Increased nf_conntrack_max to ~528k for the kafka brokers (https://gerrit.wikimedia.org/r/279776) 
[20:04:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:05:20] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1022 is OK: OK: nf_conntrack is 46 % full
[20:05:39] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1020 is OK: OK: nf_conntrack is 40 % full
[20:07:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[20:07:59] <icinga-wm>	 PROBLEM - HHVM rendering on mw1123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:07:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[20:08:39] <icinga-wm>	 PROBLEM - Apache HTTP on mw1123 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.011 second response time
[20:08:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 95 % full
[20:09:39] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1012 is OK: OK: nf_conntrack is 47 % full
[20:10:17] <elukey>	 mw1123: [Sun Mar 27 20:07:46 2016] Out of memory: Kill process 19999 (hhvm) score 928 or sacrifice child
[20:11:05] <elukey>	 !log restarted hhvm on mw1123
[20:11:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:11:18] <icinga-wm>	 RECOVERY - HHVM rendering on mw1123 is OK: HTTP OK: HTTP/1.1 200 OK - 67741 bytes in 1.862 second response time
[20:12:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.074 second response time
[20:15:08] <icinga-wm>	 PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 10 failures
[20:19:29] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1018 is OK: OK: nf_conntrack is 47 % full
[20:20:00] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1014 is OK: OK: nf_conntrack is 47 % full
[20:20:18] <icinga-wm>	 RECOVERY - Check size of conntrack table on kafka1013 is OK: OK: nf_conntrack is 47 % full
[20:20:24] <elukey>	 all right, no more alarms, going back to my sofa :)
[20:38:09] <icinga-wm>	 RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:55:42] <eranroz>	 Hi, I contacted an abuse department of ISP provider of a troll who switch IPs and told them I had to  completely block the ISP IP ranges.  I gave them the IP+time of the edits, but they ask me also the source port since the IP is shared. where can I find it?
[20:57:18] <Reedy>	 You can't
[20:57:37] <Reedy>	 I'm not sure what use the source port would really be
[21:19:08] <Dereckson>	 Reedy, eranroz > it's standard NOC abuse procedure to check if the logs are coherent to require dest/src ports too
[21:20:12] <Reedy>	 Is it a shared IP as in it's dynamically assigned?
[21:20:18] <Reedy>	 Or it's shared, as there's NAT behind it?
[21:22:31] <Dereckson>	 Oh hmmm, that's true, we now have ISP (Free in France, Belgacom and Voo in Belgium) who use CGN or something similar for IPv4.
[21:23:04] <Dereckson>	 The information will become relevant, and not only a comprehensive nobody-knows-exactly-why request.
[21:24:18] <grrrit-wm>	 (03PS1) 10Ladsgroup: Flake8 for toollabs [puppet] - 10https://gerrit.wikimedia.org/r/279895 
[21:24:59] <icinga-wm>	 PROBLEM - puppet last run on db1076 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:28:39] <grrrit-wm>	 (03PS1) 10Ladsgroup: Remove unused import in labs [puppet] - 10https://gerrit.wikimedia.org/r/279896 
[21:32:17] <Reedy>	 Dereckson: Y U NO IPv6?
[21:32:17] <Reedy>	 ;)
[21:32:59] <Reedy>	 CGNAT is evil
[21:33:31] <Dereckson>	 Oh sure, I've my HE tunnels, and the three ISP quoted have deployed it all three. And I'm puzzled each time we have a throttle request it's always IPv4 instead to be dual stack.
[21:35:08] <Dereckson>	 Perhaps the relevant ISPs think they only offer a legacy IPv4 fallback solution, and everyone uses IPv6.
[21:37:17] <grrrit-wm>	 (03CR) 10Tim Landscheidt: [C: 031] Remove unused import in labs [puppet] - 10https://gerrit.wikimedia.org/r/279896 (owner: 10Ladsgroup)
[21:39:25] <grrrit-wm>	 (03CR) 10Tim Landscheidt: [C: 031] Flake8 for toollabs [puppet] - 10https://gerrit.wikimedia.org/r/279895 (owner: 10Ladsgroup)
[21:51:28] <icinga-wm>	 RECOVERY - puppet last run on db1076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:53:46] <grrrit-wm>	 (03PS1) 10Dereckson: HD logo for da.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279897 (https://phabricator.wikimedia.org/T131033) 
[21:54:09] <icinga-wm>	 PROBLEM - Apache HTTP on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:55:48] <icinga-wm>	 RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.050 second response time
[21:59:49] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0]
[22:01:20] <grrrit-wm>	 (03PS1) 10Ori.livneh: Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898 
[22:02:54] <grrrit-wm>	 (03CR) 10Ori.livneh: [C: 032] Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898 (owner: 10Ori.livneh)
[22:03:14] * MatmaRex eyes ori
[22:03:18] <grrrit-wm>	 (03Merged) 10jenkins-bot: Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898 (owner: 10Ori.livneh)
[22:04:52] <ori>	 MatmaRex: hi
[22:10:20] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[22:19:28] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[22:21:55] <logmsgbot>	 !log ori@tin Synchronized wmf-config/StartProfiler.php: I1b5c620b85: Better request profiling via XWD header (duration: 00m 33s)
[22:21:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:22:49] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[22:47:38] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2008 is CRITICAL: CRITICAL - load average: 102.42, 100.94, 100.05
[23:40:08] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [5000000.0]
[23:50:30] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]