[00:00:16] operations, Wikimedia-Git-or-Gerrit, Monitoring: Improve monitoring of https://git.wikimedia.org/ - https://phabricator.wikimedia.org/T94320#1159734 (greg)
[00:01:02] i didn't know we plan to retire it. i wonder what the proposed replacement is
[00:01:26] of gitblit? phab
[00:01:38] https://phabricator.wikimedia.org/tag/gitblit-deprecate/
[00:01:42] one tool to rule them all
[00:02:01] and in the downtime bind them
[00:02:09] exactly
[00:11:51] https://meta.wikimedia.org/wiki/System_administrators#List - anyone know about the missing info?
[00:12:12] James Douglas's irc nickname, and on-wiki accounts for Chris Johnson and Marko Obrovac
[00:30:03] james is earldouglas
[00:32:46] got the other two sorted out as well
[00:32:48] thanks jgage
[01:32:07] PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: puppet fail
[01:43:17] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:43:37] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 3 failures
[01:43:56] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:43:56] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 1 failures
[01:44:19] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 2 failures
[01:44:20] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:44:27] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:44:27] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:44:38] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:44:47] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:50:27] RECOVERY - puppet last run on mw2130 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[01:57:28] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[01:58:28] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:58:47] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:58:48] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[01:59:17] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:59:18] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:59:37] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[01:59:38] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[01:59:57] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:00:48] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:05:59] greg-g: https://xkcd.com/303/
[02:09:15] Negative24: that one
[02:09:46] :)
[02:17:35] operations, RESTBase, RESTBase-Cassandra, Security, Patch-For-Review: iptables firewall to limit access to Cassandra services - https://phabricator.wikimedia.org/T92680#1159883 (Eevans)
[02:18:59] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 05m 22s)
[02:19:13] Logged the message, Master
[02:22:29] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-29 02:21:25+00:00
[02:22:35] Logged the message, Master
[02:32:56] !log l10nupdate Synchronized php-1.25wmf23/cache/l10n: (no message) (duration: 03m 20s)
[02:33:03] Logged the message, Master
[02:34:57] operations, RESTBase, RESTBase-Cassandra: secure Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94329#1159901 (Eevans) NEW
[02:35:31] !log LocalisationUpdate completed (1.25wmf23) at 2015-03-29 02:34:28+00:00
[02:35:36] Logged the message, Master
[02:37:22] operations, RESTBase, RESTBase-Cassandra, Security, Patch-For-Review: iptables firewall to limit access to Cassandra services - https://phabricator.wikimedia.org/T92680#1159909 (Eevans)
[02:37:23] operations, RESTBase, RESTBase-Cassandra: use non-default credentials when authenticating to Cassandra - https://phabricator.wikimedia.org/T92590#1159912 (Eevans)
[02:37:24] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: enable authenticated access to Cassandra JMX - https://phabricator.wikimedia.org/T92471#1159911 (Eevans)
[02:37:25] operations, RESTBase, RESTBase-Cassandra: secure Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94329#1159908 (Eevans)
[02:37:27] operations, RESTBase-Cassandra: cassandra - enable Inter-node encryption - https://phabricator.wikimedia.org/T94132#1159910 (Eevans)
[02:39:53] operations, RESTBase, RESTBase-Cassandra: disable Thirft RPC interface on Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94330#1159913 (Eevans) NEW
[02:40:27] operations, RESTBase, RESTBase-Cassandra: disable Thirft RPC interface on Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94330#1159921 (Eevans)
[02:40:27] operations, RESTBase, RESTBase-Cassandra: secure Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94329#1159920 (Eevans)
[03:05:16] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: puppet fail
[03:24:58] RECOVERY - puppet last run on mw2121 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:34:36] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:35:19] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:49:47] PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:51:07] RECOVERY - puppet last run on elastic1014 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[03:51:48] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[04:06:17] RECOVERY - puppet last run on mw2080 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[04:20:06] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail
[04:36:17] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:00:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Mar 29 05:59:39 UTC 2015 (duration 59m 38s)
[06:00:54] Logged the message, Master
[06:10:44] YuviPanda: now
[06:29:37] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:18] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:19] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail
[06:30:38] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:46] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:07] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:17] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: puppet fail
[06:31:58] PROBLEM - puppet last run on mw2096 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:56] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:56] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:17] PROBLEM - puppet last run on mw2097 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:17] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:18] PROBLEM - puppet last run on mw2022 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:18] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:57] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:46:06] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:46:07] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:48] RECOVERY - puppet last run on mw2096 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on mw2022 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:46:56] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:47:07] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:08] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:47] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:37] RECOVERY - puppet last run on mw2097 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:49:36] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:50:08] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 768.456072527
[12:52:51] (PS1) Mobrovac: Citoid: switch from localsettings.js to config.yaml [puppet] - https://gerrit.wikimedia.org/r/200356
[12:54:29] (CR) Mobrovac: [C: -1] "Giving -1 as this patch needs to be deployed in coordination with the upcoming citoid deployment patch." [puppet] - https://gerrit.wikimedia.org/r/200356 (owner: Mobrovac)
[12:57:18] (CR) Chiefwei: [C: 1] Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[14:50:13] operations, ops-esams: Remove all Toolserver equipment - https://phabricator.wikimedia.org/T92518#1160264 (Cmjohnson) Open>Resolved All items were removed from OE10 and OE16. I moved them to decommission rack d4
[14:50:14] operations, ops-esams: Remove knsq16-30 and prepare OE13 for new servers - https://phabricator.wikimedia.org/T92519#1160266 (Cmjohnson)
[17:12:08] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:27:08] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59662 bytes in 0.225 second response time
[17:31:03] hm. nobody restarted giblit that time, but there was a load spike on antimony: load average: 15.06, 21.09, 19.65
[17:48:47] PROBLEM - puppet last run on mw2030 is CRITICAL: CRITICAL: puppet fail
[18:03:27] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 1 failures
[18:08:36] RECOVERY - puppet last run on mw2030 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[18:19:48] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[18:31:37] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:38:27] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[18:49:37] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59662 bytes in 0.623 second response time
[18:54:38] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:56:16] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59662 bytes in 0.683 second response time
[19:05:08] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: puppet fail
[19:21:38] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[19:42:26] (CR) Hashar: [C: -1] proxies: allow filtering by datacenter (3 comments) [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[19:53:42] Coren: I guess you're aware of tools being a bit down/not responding ?
[20:14:00] NotASpy, working for me
[20:14:36] getting No Webservice for https://tools.wmflabs.org/copyvios/
[20:15:12] flickr2commons was dead earlier but is back now (though I've not tested it to see if it's working)
[20:25:26] NotASpy: That would be the tool itself being down. I can check to see if it will simply restart, though.
[20:26:17] ah right, thought, with the other tool that was down (but isn't now) it was tools and not the individual tool
[20:26:36] mind you, the makeref tool was up, so I don't know
[20:28:27] NotASpy: It seems to have responded well to a simple restart.
[20:28:45] perfect, I'll pass on the message to others too
[20:29:26] operations: setup/deploy ganeti2001-2006 - https://phabricator.wikimedia.org/T94042#1160606 (RobH)
[20:32:04] NotASpy: As a rule, the 'no webservice' error message (503) is 99.9% of the time a problem with the specific tool rather than the infrastructure.
[20:32:32] (Though that is not true in the aftermath of a global outage because then the webservices may be down because of that)
[20:53:03] (CR) Faidon Liambotis: [C: -1] "If we do hop-based detection already, why do we need this? This feels unnecessarily specific." [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[21:11:17] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1160674 (TheDJ) This is fixed upstream ??? Awesome, this really was one of the biggest problems with librsvg so far. I'...
[21:18:26] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#461916 (Krenair) I think I made a mistake. Have been trying to deal with the ridiculous backlog on the #Upstream workbo...
[21:30:27] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail
[21:31:46] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail
[21:46:57] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[21:49:56] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:17:27] PROBLEM - puppet last run on mw2010 is CRITICAL: CRITICAL: puppet fail
[22:35:37] RECOVERY - puppet last run on mw2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:57:17] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: puppet fail
[23:17:07] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:29:37] (PS1) Alex Monk: Fix two people's real names in the admin data [puppet] - https://gerrit.wikimedia.org/r/200495