[00:20:20] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:26:29] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:47:09] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:10:28] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[01:11:19] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[01:51:48] !log disabling 2fa for Hym411 T130994 [labswiki]> delete from oathauth_users where id=1363;
[01:51:49] T130994: Reset 2FA on wikitech for User:Revi and User:Hym411 - https://phabricator.wikimedia.org/T130994
[01:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:53:10] thank you :)
[01:53:28] yw
[02:19:21] jzerebecki: hi. My account is affected too; UID is 2362, if those are the same as the LDAP UIDs.
[02:22:28] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 10m 06s)
[02:22:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:28] jzerebecki: can you do server-side uploads?
[02:30:14] matanya: I have no idea how to do those
[02:30:42] Dereckson: I'm looking at it
[02:31:03] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar 27 02:31:03 UTC 2016 (duration 8m 35s)
[02:31:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:35:53] jzerebecki: https://wikitech.wikimedia.org/wiki/Uploading_large_files
[02:37:14] matanya: ah yes, good to know. Why are you asking?
[02:37:27] I have a large file to upload
[02:42:26] matanya: sorry, not now
[02:42:35] no worries
[02:57:35] !log disabling 2fa for Dereckson T130892 [labswiki]> delete from oathauth_users where id=402;
[02:57:37] T130892: wikitech 2fa provisioning form does so without confirmation - https://phabricator.wikimedia.org/T130892
[02:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:58:07] Dereckson: please verify
[03:01:15] I can log in, thanks.
[03:04:51] Now, if I actually want to add 2FA, is it safe to visit the options page, or should I wait until some pending bug is fixed?
[03:05:59] 503 Service Temporarily Unavailable
[03:06:06] on commons
[03:06:50] A database query error has occurred. This may indicate a bug in the software.
[03:06:50] Function: WikiPage::lockAndGetLatest
[03:06:50] Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.29)
[03:07:23] I broke commons ;)
[03:08:59] Dereckson: it is safe to do so, the bug will not appear anymore.
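For reference, the two 2FA resets above boil down to deleting the affected user's row from labswiki's oathauth_users table. A minimal sketch of that procedure, assuming direct MySQL access to the labswiki database; the $DB_HOST variable and the name 'ExampleUser' are placeholders, and the id shown is the one already logged above:

```
# Placeholders: $DB_HOST and 'ExampleUser'. The oathauth_users.id column is the
# MediaWiki user_id, as in the "where id=1363" statement logged above.
mysql -h "$DB_HOST" labswiki -e "SELECT user_id FROM user WHERE user_name = 'ExampleUser';"
mysql -h "$DB_HOST" labswiki -e "SELECT * FROM oathauth_users WHERE id = 1363;"   # confirm the row exists
mysql -h "$DB_HOST" labswiki -e "DELETE FROM oathauth_users WHERE id = 1363;"     # drops the 2FA enrolment
```

After that the user can log in with password only and re-enrol in 2FA from their preferences, which is what the "please verify" exchange below confirms.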
[03:11:50] ok
[03:36:18] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:19] PROBLEM - puppet last run on mw2051 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:19] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:02:28] RECOVERY - puppet last run on mw2051 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:30:28] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:28] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:49] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:39] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[06:43:49] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[06:56:19] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:56:40] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:40] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:09] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:19:28] PROBLEM - RAID on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:21:39] PROBLEM - very high load average likely xfs on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] PROBLEM - configured eth on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] PROBLEM - swift-object-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:09] PROBLEM - swift-object-updater on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:10] PROBLEM - swift-container-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:28] PROBLEM - salt-minion processes on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:30] PROBLEM - puppet last run on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:39] PROBLEM - swift-account-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:40] PROBLEM - swift-container-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:48] PROBLEM - swift-object-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:25:50] PROBLEM - SSH on ms-be2016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:59] PROBLEM - swift-container-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:00] PROBLEM - swift-container-updater on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:09] PROBLEM - swift-account-reaper on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:19] PROBLEM - swift-object-replicator on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:19] PROBLEM - Disk space on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
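When one of these "puppet last run" alerts does not clear on its own, the usual first step is to re-run the agent by hand on the affected host and read the error. A rough sketch; mw1109 is just one of the hosts from the alerts above, and the state-file path assumes a stock Puppet 3 agent layout:

```
ssh mw1109.eqiad.wmnet                 # any of the alerting hosts
sudo puppet agent --test --noop        # dry run: shows what would fail without changing anything
sudo puppet agent --test               # real run: prints the failing resource and its error
# Summary of the last run (failure counts, timing); path may vary by Puppet version:
sudo cat /var/lib/puppet/state/last_run_summary.yaml
```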
[07:26:28] PROBLEM - swift-account-server on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:38] PROBLEM - swift-account-auditor on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:39] PROBLEM - dhclient process on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:39] PROBLEM - DPKG on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:26:48] PROBLEM - Check size of conntrack table on ms-be2016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:58] PROBLEM - NTP on ms-be2016 is CRITICAL: NTP CRITICAL: No response from NTP server
[08:50:08] 6Operations, 10Traffic: HTTPS error on status.wikimedia.org - https://phabricator.wikimedia.org/T131017#2153719 (10Peachey88)
[08:51:01] 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org - https://phabricator.wikimedia.org/T131017#2153707 (10Peachey88) AFAIK we can't fix that, because it's hosted externally by watchmouse.
[08:51:30] 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org (watchmouse certificate mismatch) - https://phabricator.wikimedia.org/T131017#2153723 (10Peachey88)
[08:52:10] 6Operations, 10Traffic, 7HTTPS: HTTPS error on status.wikimedia.org (watchmouse certificate mismatch) - https://phabricator.wikimedia.org/T131017#2153707 (10Peachey88)
[08:52:12] 6Operations, 10Traffic, 7HTTPS: status.wikimedia.org is using SSL cert from other domain - https://phabricator.wikimedia.org/T34796#2153725 (10Peachey88)
[09:05:49] 6Operations, 10Traffic, 7HTTPS: investigate/remove hostname login.m.wikimedia.org - https://phabricator.wikimedia.org/T111998#2153727 (10Peachey88)
[09:05:51] 6Operations, 10Traffic, 7Mobile, 13Patch-For-Review: Investigate if login.m.wikimedia.org needs to stay around - https://phabricator.wikimedia.org/T123431#2153728 (10Peachey88)
[12:07:08] PROBLEM - HHVM rendering on mw1253 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50395 bytes in 0.175 second response time
[12:08:49] RECOVERY - HHVM rendering on mw1253 is OK: HTTP OK: HTTP/1.1 200 OK - 67886 bytes in 0.070 second response time
[13:28:51] 6Operations, 10Wikimedia-Apache-configuration: SVGs without XML prolog () served with Content-Type text/html from upload.wikimedia.org - https://phabricator.wikimedia.org/T131012#2153826 (10matmarex) I'm not sure if this is an Apache configuration problem or what, I'm really using this to mean "The...
[13:30:58] (03CR) 10Luke081515: [C: 04-1] "Needs rebase." [dns] - 10https://gerrit.wikimedia.org/r/276385 (https://phabricator.wikimedia.org/T123431) (owner: 10Dzahn)
[14:01:35] 6Operations, 10media-storage: SVGs without XML prolog () served with Content-Type text/html from upload.wikimedia.org - https://phabricator.wikimedia.org/T131012#2153834 (10Krenair) from upload? I don't think it can be apache ```krenair@tin:~$ set +H krenair@tin:~$ curl -I -H "Host: upload.wikimedi...
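Krenair's paste above (cut off by the bot) is a header check against upload.wikimedia.org to see which Content-Type comes back for the affected SVG. The same check looks roughly like this; the file path is a placeholder, not the actual file from T131012:

```
# Placeholder path; substitute the SVG from the task. A well-formed SVG should come
# back as image/svg+xml; T131012 is about prolog-less SVGs showing up as text/html.
curl -sI "https://upload.wikimedia.org/wikipedia/commons/x/xx/Example.svg" | grep -i '^content-type:'
```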
[17:07:48] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:07:58] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:00] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:48] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:08:58] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:09:00] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[17:39:19] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 3 failures
[17:50:29] (03Abandoned) 10Tim Landscheidt: testsystem: Move role class to module role [puppet] - 10https://gerrit.wikimedia.org/r/270105 (owner: 10Tim Landscheidt)
[18:04:29] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[18:08:59] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:11:39] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:11:48] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:11:58] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:19] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:29] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:12:38] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:17:29] ---^ still tons of TIME_WAITs showing up in /proc/net/nf_conntrack and not in netstat -tunap
[18:21:18] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:19] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:48] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:22:58] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:23:15] and most of the TIME_WAITs are from mw1233.eqiad.wmnet hosts..
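The comparison being described here, kernel conntrack entries versus sockets the host actually has open, can be reproduced with something like the following. It is host-agnostic; note the TIME_WAIT source breakdown counts both directions of each tracked flow, so treat it as a rough ranking rather than an exact count:

```
# How full the table is (what the Icinga check measures):
cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max

# How many tracked flows are in TIME_WAIT, and which source IPs own them:
grep -c TIME_WAIT /proc/net/nf_conntrack
grep TIME_WAIT /proc/net/nf_conntrack | grep -o 'src=[0-9.]*' | sort | uniq -c | sort -rn | head

# Compare with the sockets the host itself still knows about:
netstat -tunap | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn
```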
[18:23:48] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:23:58] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:25:12] and netstat -tuap shows ESTABLISHED with other kafka hosts or cpXXXX as expected
[18:29:00] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:29:10] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:29:19] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:40] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:49] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:31:58] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:32:40] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:32:59] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:33:25] 6Operations, 10Analytics: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2153998 (10elukey)
[18:33:38] https://phabricator.wikimedia.org/T131028 filed
[18:35:07] (happy easter btw :P)
[18:35:36] (thanks, happy easter to you too)
[18:37:05] :) yesterday it went away by itself, but something definitely changed in the past two days to trigger this
[18:40:39] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:40:39] happy easter, elukey and Dereckson :)
[18:41:28] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:41:38] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:41:39] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:42:09] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:42:19] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:44:39] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[18:46:39] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:46:48] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:46:58] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:47:19] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:47:23] mmm, the alarms are bouncing; a lot of noise, but nothing critical for the moment
[18:50:18] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:50:20] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[18:50:49] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:50:59] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:51:50] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[18:56:45] 6Operations, 10Analytics: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2153998 (10Muehlenhoff) We can easily bump that to 512k, we had the same workaround for the recent job runner problems. We can then properly analyse the root cause on Tuesday. On 27.03.2016 20:33, ... wrote...
[18:58:28] moritzm: o/ shall we bump nf_conntrack_max for kafka?
[18:59:51] maybe for role::kafka::analytics::broker ?
[19:06:40] (03PS1) 10Elukey: Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028)
[19:06:50] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:07:40] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:07:59] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:08:06] all right, code review sent
[19:08:29] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[19:09:38] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full
[19:17:17] moritzm: Take a look at https://phabricator.wikimedia.org/T131028#2153998, and try to avoid quoting the old mail if you answer via mail ;)
[19:23:31] (03PS2) 10Elukey: Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028)
[20:00:01] (03CR) 10Yuvipanda: [C: 031] Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) (owner: 10Elukey)
[20:01:54] (03CR) 10Elukey: [C: 032] Bump nf_conntrack_max temporarily to allow proper investigation. [puppet] - 10https://gerrit.wikimedia.org/r/279776 (https://phabricator.wikimedia.org/T131028) (owner: 10Elukey)
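Outside of Puppet, the interim fix in the change just merged amounts to raising nf_conntrack_max by hand. A sketch of the manual equivalent on a single broker; 524288 is 512k for illustration (the exact value set by the patch is in Gerrit change 279776), and the drop-in file name is made up, since Puppet manages the real sysctl entry on these hosts:

```
# Current usage vs. ceiling:
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# Raise the ceiling immediately (not persistent across reboots)...
sudo sysctl -w net.netfilter.nf_conntrack_max=524288

# ...and persist it via a sysctl.d drop-in (illustrative file name):
echo 'net.netfilter.nf_conntrack_max = 524288' | sudo tee /etc/sysctl.d/60-kafka-conntrack.conf
```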
[20:04:29] !log Increased nf_conntrack_max to ~528k for the kafka brokers (https://gerrit.wikimedia.org/r/279776)
[20:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:05:20] RECOVERY - Check size of conntrack table on kafka1022 is OK: OK: nf_conntrack is 46 % full
[20:05:39] RECOVERY - Check size of conntrack table on kafka1020 is OK: OK: nf_conntrack is 40 % full
[20:07:58] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[20:07:59] PROBLEM - HHVM rendering on mw1123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:07:59] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[20:08:39] PROBLEM - Apache HTTP on mw1123 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.011 second response time
[20:08:59] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 95 % full
[20:09:39] RECOVERY - Check size of conntrack table on kafka1012 is OK: OK: nf_conntrack is 47 % full
[20:10:17] mw1123: [Sun Mar 27 20:07:46 2016] Out of memory: Kill process 19999 (hhvm) score 928 or sacrifice child
[20:11:05] !log restarted hhvm on mw1123
[20:11:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:11:18] RECOVERY - HHVM rendering on mw1123 is OK: HTTP OK: HTTP/1.1 200 OK - 67741 bytes in 1.862 second response time
[20:12:09] RECOVERY - Apache HTTP on mw1123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.074 second response time
[20:15:08] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 10 failures
[20:19:29] RECOVERY - Check size of conntrack table on kafka1018 is OK: OK: nf_conntrack is 47 % full
[20:20:00] RECOVERY - Check size of conntrack table on kafka1014 is OK: OK: nf_conntrack is 47 % full
[20:20:18] RECOVERY - Check size of conntrack table on kafka1013 is OK: OK: nf_conntrack is 47 % full
[20:20:24] all right, no more alarms, going back to my sofa :)
[20:38:09] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:55:42] Hi, I contacted the abuse department of the ISP of a troll who switches IPs, and told them I had to completely block the ISP's IP ranges. I gave them the IP + time of the edits, but they also ask for the source port, since the IP is shared. Where can I find it?
[20:57:18] You can't
[20:57:37] I'm not sure what use the source port would really be
[21:19:08] Reedy, eranroz > it's standard NOC abuse procedure to require dest/src ports too, to check whether the logs are coherent
[21:20:12] Is it a shared IP as in it's dynamically assigned?
[21:20:18] Or is it shared as in there's NAT behind it?
[21:22:31] Oh hmmm, that's true, we now have ISPs (Free in France, Belgacom and Voo in Belgium) who use CGN or something similar for IPv4.
[21:23:04] So the information will become relevant, and not just a nobody-knows-exactly-why request.
[21:24:18] (03PS1) 10Ladsgroup: Flake8 for toollabs [puppet] - 10https://gerrit.wikimedia.org/r/279895
[21:24:59] PROBLEM - puppet last run on db1076 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:28:39] (03PS1) 10Ladsgroup: Remove unused import in labs [puppet] - 10https://gerrit.wikimedia.org/r/279896
[21:32:17] Dereckson: Y U NO IPv6?
[21:32:17] ;)
[21:32:59] CGNAT is evil
[21:33:31] Oh sure, I've got my HE tunnels, and all three of the ISPs quoted have deployed it. And I'm puzzled that each time we have a throttle request it's always IPv4 instead of dual stack.
[21:35:08] Perhaps the relevant ISPs think they only offer a legacy IPv4 fallback solution, and everyone uses IPv6.
[21:37:17] (03CR) 10Tim Landscheidt: [C: 031] Remove unused import in labs [puppet] - 10https://gerrit.wikimedia.org/r/279896 (owner: 10Ladsgroup)
[21:39:25] (03CR) 10Tim Landscheidt: [C: 031] Flake8 for toollabs [puppet] - 10https://gerrit.wikimedia.org/r/279895 (owner: 10Ladsgroup)
[21:51:28] RECOVERY - puppet last run on db1076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:53:46] (03PS1) 10Dereckson: HD logo for da.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279897 (https://phabricator.wikimedia.org/T131033)
[21:54:09] PROBLEM - Apache HTTP on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:55:48] RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.050 second response time
[21:59:49] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0]
[22:01:20] (03PS1) 10Ori.livneh: Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898
[22:02:54] (03CR) 10Ori.livneh: [C: 032] Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898 (owner: 10Ori.livneh)
[22:03:14] * MatmaRex eyes ori
[22:03:18] (03Merged) 10jenkins-bot: Better request profiling via XWD header [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279898 (owner: 10Ori.livneh)
[22:04:52] MatmaRex: hi
[22:10:20] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[22:19:28] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[22:21:55] !log ori@tin Synchronized wmf-config/StartProfiler.php: I1b5c620b85: Better request profiling via XWD header (duration: 00m 33s)
[22:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:22:49] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[22:47:38] PROBLEM - very high load average likely xfs on ms-be2008 is CRITICAL: CRITICAL - load average: 102.42, 100.94, 100.05
[23:40:08] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [5000000.0]
[23:50:30] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
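On the StartProfiler.php change synced at 22:21: the point of the XWD (X-Wikimedia-Debug) header is to let a single request opt in to profiling instead of relying on global sampling. A very rough sketch of how such a request might be sent; the forceprofile=1 attribute is an assumption based on later Wikitech documentation, not something confirmed by this log or by change I1b5c620b85 itself:

```
# Hypothetical attribute syntax -- check the X-Wikimedia-Debug page on wikitech
# for what the deployed configuration actually accepts.
curl -s -o /dev/null -w '%{http_code}\n' \
  -H 'X-Wikimedia-Debug: forceprofile=1' \
  'https://en.wikipedia.org/wiki/Special:BlankPage'
# Depending on configuration, the profile is either appended to the response or
# shipped to the profiling backend for later inspection.
```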