[01:01:21] Shouldn't all our sites which require login passwords for restricted data over the internet require HTTPS?
[01:03:53] Sure
[01:12:30] see PM, hoo
[01:21:51] hoo: modules/mediawiki/files/apache/sites/main.conf search for RW_PROTO
[01:23:12] hoo: once that is set in the vhost add something like: RewriteRule ^(.*)$ https://ru.wikinews.org$1 [R=301,L]
[01:25:58] (PS1) Hoo man: Force HTTPS for graphite [puppet] - https://gerrit.wikimedia.org/r/181949
[01:37:48] (PS2) Hoo man: Force HTTPS for graphite [puppet] - https://gerrit.wikimedia.org/r/181949
[01:43:33] modules/graphite/templates/graphite.nginx.erb
[01:59:10] (PS3) Hoo man: Force HTTPS for graphite [puppet] - https://gerrit.wikimedia.org/r/181949
[02:21:13] (PS1) JanZerebecki: Delete unused classes and templates. [puppet] - https://gerrit.wikimedia.org/r/181952
[02:23:27] (CR) JanZerebecki: [C: 1] Force HTTPS for graphite [puppet] - https://gerrit.wikimedia.org/r/181949 (owner: Hoo man)
[02:46:26] hoo: could you rename https://wikitech.wikimedia.org/wiki/User:Chmarkine/HTTPS to https://wikitech.wikimedia.org/wiki/HTTPS/domains status
[03:13:16] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail
[03:26:49] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[03:39:32] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:39:54] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:40:32] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:40:43] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:44:52] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail
[03:46:33] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:49:03] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail
[03:50:44] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:53:13] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:53:33] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:54:03] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:54:12] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:54:32] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[03:59:04] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:33] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:02:14] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:03:04] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[04:04:35] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:08:04] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:12:43] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:56:53] (PS1) Glaisher: [URGENT] Increase throttle rule on hewiki for public wiki event [mediawiki-config] - https://gerrit.wikimedia.org/r/181955
[04:58:34] anyone up to get that deployed? ^^
[05:48:35] (CR) MaxSem: [C: -2] "I think that once upon a time, we need to actually start declining outrageously belated requests, and this holiday sounds like a good time" [mediawiki-config] - https://gerrit.wikimedia.org/r/181955 (owner: Glaisher)
[06:32:50] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: puppet fail
[06:37:59] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:39:00] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:49] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:48:38] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:49:31] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:30:34] Glaisher: the patch is also not reflecting his request
[07:30:50] and i agree with MaxSem
[11:27:56] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 101.62, 101.18, 98.71
[11:48:57] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 100.03, 100.15, 98.12
[11:55:57] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 108.87, 100.29, 98.23
[12:25:01] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail
[12:37:30] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 102.47, 101.13, 99.32
[12:38:41] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[12:48:10] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 105.98, 102.50, 100.45
[13:01:50] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 106.58, 102.95, 101.26
[13:12:27] (CR) Hoo man: "I agree with MaxSem here... in case this is still relevant, an administrator (or someone with "noratelimit") can create accounts for the u" [mediawiki-config] - https://gerrit.wikimedia.org/r/181955 (owner: Glaisher)
[13:15:41] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 103.65, 101.59, 100.46
[13:37:31] PROBLEM - Host virt1012 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:31] PROBLEM - Host mw1096 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:43] PROBLEM - Host analytics1032 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:43] PROBLEM - Host searchidx1001 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:43] PROBLEM - Host db1027 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:43] PROBLEM - Host elastic1003 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:43] PROBLEM - Host mw1107 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:44] PROBLEM - Host appservers.svc.eqiad.wmnet is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:56] PROBLEM - Host db2003 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:37:57] RECOVERY - Host mw1096 is UP: PING OK - Packet loss = 0%, RTA = 3.13 ms
[13:38:08] RECOVERY - Host mw1107 is UP: PING OK - Packet loss = 0%, RTA = 2.48 ms
[13:38:08] RECOVERY - Host elastic1003 is UP: PING OK - Packet loss = 0%, RTA = 2.83 ms
[13:38:37] RECOVERY - Host analytics1032 is UP: PING OK - Packet loss = 0%, RTA = 2.79 ms
[13:38:38] PROBLEM - Host mw1139 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[13:38:51] RECOVERY - Host db1027 is UP: PING OK - Packet loss = 0%, RTA = 0.96 ms
[13:38:52] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[13:38:52] RECOVERY - Host mw1139 is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms
[13:39:08] RECOVERY - Host db2003 is UP: PING OK - Packet loss = 0%, RTA = 43.21 ms
[13:39:27] RECOVERY - Host searchidx1001 is UP: PING OK - Packet loss = 0%, RTA = 1.58 ms
[13:40:36] RECOVERY - Host appservers.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 3.92 ms
[13:40:57] <_joe_> wth?
[13:41:53] what was that?
[13:42:01] <_joe_> mark: no idea
[13:42:09] <_joe_> I jumped online right now
[13:42:14] me too
[13:43:18] <_joe_> neon maybe?
[13:43:51] yeah
[13:43:55] looking at ganglia that must be it
[13:44:10] There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection timed out
[13:45:31] lots of check_ganglia calls
[13:46:17] <_joe_> that's quite usual btw
[13:46:37] yeah
[13:46:49] well
[13:46:53] nothing to be done now then
[13:46:56] we'll look at it tomorrow
[13:46:56] <_joe_> no 5xx spike... i'd say the third false alarm
[13:47:00] yup
[13:47:00] <_joe_> of the festivities
[13:47:06] <_joe_> :)
[13:47:11] and you're on vacation ;)
[13:47:13] but thanks for checking in!
[13:47:34] <_joe_> eheh, but I've seen "appservers"
[13:47:39] yeah
[13:47:40] appreciated ;)
[13:47:41] <_joe_> I had a panic moment
[13:47:44] hehe
[13:47:57] <_joe_> see you next year, hopefully!
[13:48:02] hopefully ;)
[13:58:18] PROBLEM - very high load average likely xfs on ms-be2011 is CRITICAL: CRITICAL - load average: 106.53, 103.45, 101.52
[14:05:19] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 8.00709266445e-94
[15:18:33] (Abandoned) Glaisher: [URGENT] Increase throttle rule on hewiki for public wiki event [mediawiki-config] - https://gerrit.wikimedia.org/r/181955 (owner: Glaisher)
[16:01:08] (Restored) John F. Lewis: planets: add Varnish statement [puppet] - https://gerrit.wikimedia.org/r/181419 (owner: John F. Lewis)
[16:05:27] (PS1) John F. Lewis: planets: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181984
[16:05:48] (PS2) John F. Lewis: planets: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181984
[16:10:16] (PS1) John F. Lewis: planet: change dns to misc-web [dns] - https://gerrit.wikimedia.org/r/181985
[16:25:26] (PS1) John F. Lewis: map ipv6 on dataset1001 [puppet] - https://gerrit.wikimedia.org/r/181987
[16:38:36] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:45:41] matanya: why can I see four duplicate tickets (exactly) of https://phabricator.wikimedia.org/T84459?
[16:46:01] (just in case you might know)
[16:46:48] oh I see; RT annoyance stuff. I'll merge
[16:52:40] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:57:54] (PS1) Tim Landscheidt: Tools: Fix typo in static nginx configuration [puppet] - https://gerrit.wikimedia.org/r/181989
[17:03:02] PROBLEM - Host dysprosium is DOWN: CRITICAL - Plugin timed out after 15 seconds
[17:03:04] PROBLEM - Host mw1237 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[17:03:12] RECOVERY - Host mw1237 is UP: PING OK - Packet loss = 0%, RTA = 1.75 ms
[17:03:21] RECOVERY - Host dysprosium is UP: PING OK - Packet loss = 0%, RTA = 2.90 ms
[17:04:31] PROBLEM - Host rubidium is DOWN: CRITICAL - Plugin timed out after 15 seconds
[17:04:40] RECOVERY - Host rubidium is UP: PING OK - Packet loss = 0%, RTA = 1.71 ms
[17:11:08] (PS1) Glaisher: Redirect Main_[Pp]age to Wikidata:Main_Page [puppet] - https://gerrit.wikimedia.org/r/181992
[17:14:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[17:21:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[17:33:26] (CR) Hoo man: Redirect Main_[Pp]age to Wikidata:Main_Page (1 comment) [puppet] - https://gerrit.wikimedia.org/r/181992 (owner: Glaisher)
[18:32:13] PROBLEM - Host analytics1035 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:32:14] PROBLEM - Host mw1142 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:32:14] PROBLEM - Host stat1002 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:32:14] PROBLEM - Host wtp1019 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:32:14] PROBLEM - Host cp1038 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:32:21] RECOVERY - Host cp1038 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[18:32:21] RECOVERY - Host mw1142 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[18:32:22] RECOVERY - Host stat1002 is UP: PING OK - Packet loss = 0%, RTA = 1.75 ms
[18:33:00] RECOVERY - Host analytics1035 is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms
[18:33:10] RECOVERY - Host wtp1019 is UP: PING OK - Packet loss = 0%, RTA = 4.61 ms
[18:49:27] (PS3) Nemo bis: Update cached article count monthly to avoid social unrest [puppet] - https://gerrit.wikimedia.org/r/178170
[18:50:02] (CR) Nemo bis: "Rebased, should be ready to go" [puppet] - https://gerrit.wikimedia.org/r/178170 (owner: Nemo bis)
[21:41:11] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: puppet fail
[21:54:31] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:26:35] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 4.42980642416e-110
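
Editor's note: most of this log is PROBLEM/RECOVERY notification pairs, and the 13:37 burst was judged a false alarm in channel. One way to check how long each alert actually stayed open is to pair the notifications by check name. The following is only a rough sketch: the function name `open_durations` and the regular expression are illustrative assumptions, not part of any Wikimedia tooling, and it assumes the log covers a single day so that bare `HH:MM:SS` stamps can be subtracted directly.

```python
import re
from datetime import datetime

# Matches notifications in the form seen in this log, e.g.
# "[03:13:16] PROBLEM - puppet last run on db2005 is CRITICAL: ..."
# DOTALL lets a check name survive a stray line break in the log text.
ALERT = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\]\s+(PROBLEM|RECOVERY)\s+-\s+(.+?)"
    r"\s+is\s+(?:CRITICAL|OK|DOWN|UP)\b",
    re.DOTALL,
)


def open_durations(log_text):
    """Pair each PROBLEM with the next RECOVERY for the same check.

    Returns (closed, still_open): closed is a list of
    (check, seconds_open) tuples in recovery order; still_open is the
    set of checks with no RECOVERY in the log.
    """
    opened = {}
    closed = []
    for stamp, kind, check in ALERT.findall(log_text):
        check = " ".join(check.split())  # normalise whitespace in the name
        when = datetime.strptime(stamp, "%H:%M:%S")
        if kind == "PROBLEM":
            # Repeated PROBLEM lines keep the first timestamp.
            opened.setdefault(check, when)
        elif check in opened:
            seconds = int((when - opened.pop(check)).total_seconds())
            closed.append((check, seconds))
    return closed, set(opened)
```

Passing the whole log text to `open_durations` returns the closed alerts with their durations, plus the checks that never recovered within the log (the repeated ms-be2011 load alert would be one of them, since no RECOVERY for it appears above).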