[00:29:54] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[01:17:44] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[01:18:05] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:33:54] !log running checkLocalNames.php --delete on commonswiki & wikidatawiki (CentralAuth)
[01:34:01] Logged the message, Master
[01:37:05] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[02:02:09] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:02:18] Logged the message, Master
[02:03:17] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-08 02:02:13+00:00
[02:03:23] Logged the message, Master
[02:03:41] !log l10nupdate Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 02s)
[02:03:47] Logged the message, Master
[02:04:49] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-08 02:03:45+00:00
[02:04:54] Logged the message, Master
[02:16:18] operations, Analytics-EventLogging, Analytics-Kanban: Eventlogging JS client should warn users when serialized event is more than 1024 chars long and not sent the event - https://phabricator.wikimedia.org/T91918#1098730 (Nuria) NEW a:mforns
[02:16:26] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Mar 8 02:15:23 UTC 2015 (duration 15m 22s)
[02:16:32] Logged the message, Master
[02:16:44] operations, Analytics-EventLogging, Analytics-Kanban: Eventlogging JS client should warn users when serialized event is more than "N" chars long and not sent the event - https://phabricator.wikimedia.org/T91918#1098730 (Nuria)
[02:36:56] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail
[02:55:54] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[03:33:54] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:34:14] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:35:15] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[03:37:45] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:40:24] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[03:42:34] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:44:15] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: puppet fail
[03:45:24] PROBLEM - puppet last run on mw1093 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:49:05] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: puppet fail
[03:51:15] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:52:05] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures
[03:52:28] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail
[03:57:15] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:58:35] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[03:58:54] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[03:59:14] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail
[03:59:15] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[03:59:55] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:25] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:25] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:35] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[04:00:55] RECOVERY - puppet last run on mw1093 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[04:01:04] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:14] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:34] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:14] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[04:02:36] RECOVERY - puppet last run on labstore2001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[04:03:05] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:06:15] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[04:08:07] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[04:08:16] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:08:54] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[04:09:54] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[04:10:05] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[04:11:26] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:11:34] springle, hi
[04:15:56] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[04:16:55] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[04:18:05] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[04:27:02] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about some users - https://phabricator.wikimedia.org/T91920#1098763 (Legoktm) NEW
[04:31:49] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about some users - https://phabricator.wikimedia.org/T91920#1098772 (Krenair) I checked the other slaves, only this particular one (db1068.eqiad.wmnet) is missing that entry.
[04:46:39] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about some users - https://phabricator.wikimedia.org/T91920#1098773 (Legoktm)
[05:00:34] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about some users - https://phabricator.wikimedia.org/T91920#1098787 (Krenair) We found 5 other users who are also on master but not db1068 - and they were all registered on commons within minutes of each ot...
[05:08:34] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about at least 6 users - https://phabricator.wikimedia.org/T91920#1098789 (Krenair)
[05:52:41] (PS1) Tim Landscheidt: Tools: Split local apt repository by OS release [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802)
[05:56:41] (CR) Tim Landscheidt: "Tested successfully on Toolsbeta:" [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[06:11:11] (CR) Yuvipanda: "Yay." [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[06:16:14] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[06:16:25] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 75482 MB (3% inode=99%):
[06:28:45] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:45] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 76083 MB (3% inode=99%):
[06:28:45] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:04] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:14] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:15] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:35] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:45:44] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:35] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:46:55] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:47:04] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[07:04:25] RECOVERY - Disk space on fluorine is OK: DISK OK
[07:14:16] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[07:49:23] (CR) Tim Landscheidt: "There are only three instances that have labsdebrepo explicitly enabled, and each belongs to a different project, i. e. no project has two" [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[07:51:11] (CR) Yuvipanda: "Alright :) I guess you would want to be around when this gets merged? Monday?" [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[08:00:35] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[08:04:46] (CR) Tim Landscheidt: "Well, the last time the effects only started a day afterwards :-). But Monday is fine with me." [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[08:05:35] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[09:34:34] (CR) Yuvipanda: [C: 2] "On second thought, let's just do it and see what breaks..." [puppet] - https://gerrit.wikimedia.org/r/195079 (https://phabricator.wikimedia.org/T76802) (owner: Tim Landscheidt)
[10:12:08] (PS1) Tim Landscheidt: Ensure that apt preferences are named *.pref [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681)
[10:18:00] (CR) Tim Landscheidt: "NB: Rebasing this change requires checking that between its parent commit and the new HEAD no changes were made that use apt::pin or /etc" [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: Tim Landscheidt)
[10:37:06] PROBLEM - puppet last run on virt1011 is CRITICAL: CRITICAL: Puppet has 3 failures
[10:54:45] RECOVERY - puppet last run on virt1011 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[11:07:12] (PS1) Yuvipanda: tools: Point internal DNS hack to webproxy-02 not webproxy [puppet] - https://gerrit.wikimedia.org/r/195084
[11:08:07] (PS2) Yuvipanda: tools: Point internal DNS hack to webproxy-02 not webproxy [puppet] - https://gerrit.wikimedia.org/r/195084
[11:08:26] (CR) Yuvipanda: [C: 2 V: 2] tools: Point internal DNS hack to webproxy-02 not webproxy [puppet] - https://gerrit.wikimedia.org/r/195084 (owner: Yuvipanda)
[11:13:45] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[11:49:26] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[12:26:15] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 9 below the confidence bounds
[13:04:06] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[14:52:05] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 9 below the confidence bounds
[15:29:06] (PS1) Steinsplitter: Adding pool.publicdomainproject.org to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/195090
[15:30:56] (PS2) Steinsplitter: Adding pool.publicdomainproject.org to wgCopyUploadsDomains for GWT upload [mediawiki-config] - https://gerrit.wikimedia.org/r/195090
[15:32:30] (PS3) Steinsplitter: Adding pool.publicdomainproject.org to wgCopyUploadsDomains for GWT upload [mediawiki-config] - https://gerrit.wikimedia.org/r/195090 (https://phabricator.wikimedia.org/T91927)
[15:35:34] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 16 data above and 9 below the confidence bounds
[15:46:46] PROBLEM - Disk space on dataset1001 is CRITICAL: DISK CRITICAL - free space: /data 1521054 MB (3% inode=99%):
[15:54:07] (CR) coren: [C: 2] "Okaying the change despite the request on a Sunday; the change is minor, well-contained, and time-sensitive." [mediawiki-config] - https://gerrit.wikimedia.org/r/195090 (https://phabricator.wikimedia.org/T91927) (owner: Steinsplitter)
[15:54:16] (Merged) jenkins-bot: Adding pool.publicdomainproject.org to wgCopyUploadsDomains for GWT upload [mediawiki-config] - https://gerrit.wikimedia.org/r/195090 (https://phabricator.wikimedia.org/T91927) (owner: Steinsplitter)
[15:58:11] !log marc Synchronized wmf-config/InitialiseSettings.php: Adding pool.publicdomainproject.org to wgCopyUploadsDomains (T91927) (duration: 00m 07s)
[15:58:19] Logged the message, Master
[16:10:53] operations, Citoid: Update the citoid/deploy branch to not contain zotero deploy - https://phabricator.wikimedia.org/T89872#1099115 (akosiaris) OK, I wasn't aware of that, thanks!
[16:16:42] (PS2) Alexandros Kosiaris: Include the zotero role in the sca role [puppet] - https://gerrit.wikimedia.org/r/195041 (https://phabricator.wikimedia.org/T89869)
[16:20:21] operations: Dear ops-requests@wikimedia.org, Call for Submissions on Various Academic Disciplines - https://phabricator.wikimedia.org/T91932#1099120 (emailbot)
[16:30:02] (PS1) Glaisher: Set wgLanguageCode at cawikimedia from 'en-ca' to 'en' [mediawiki-config] - https://gerrit.wikimedia.org/r/195092 (https://phabricator.wikimedia.org/T88843)
[16:59:25] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[17:16:47] (PS1) MarkTraceur: Add throttle exception for Walker Art Center [mediawiki-config] - https://gerrit.wikimedia.org/r/195096
[17:17:11] (PS2) MarkTraceur: Add throttle exception for Walker Art Center [mediawiki-config] - https://gerrit.wikimedia.org/r/195096 (https://phabricator.wikimedia.org/T91936)
[17:17:30] Anyone around who can sanity-check that so I can deploy it quick?
[17:17:41] The event is literally happening now
[17:17:47] looking
[17:18:10] Thanks hoo|away
[17:18:59] (CR) Hoo man: [C: 1] Add throttle exception for Walker Art Center [mediawiki-config] - https://gerrit.wikimedia.org/r/195096 (https://phabricator.wikimedia.org/T91936) (owner: MarkTraceur)
[17:19:02] Huzzah.
[17:19:10] So...greg-g isn't here
[17:19:27] But if hoo|away says it's OK and I can test it...fine
[17:20:21] IMO: Just do it, it looks sane
[17:20:30] (CR) MarkTraceur: [C: 2] Add throttle exception for Walker Art Center [mediawiki-config] - https://gerrit.wikimedia.org/r/195096 (https://phabricator.wikimedia.org/T91936) (owner: MarkTraceur)
[17:20:36] (Merged) jenkins-bot: Add throttle exception for Walker Art Center [mediawiki-config] - https://gerrit.wikimedia.org/r/195096 (https://phabricator.wikimedia.org/T91936) (owner: MarkTraceur)
[17:20:37] Shit, account creation demo happening now.
[17:20:45] hurry :D
[17:21:56] !log marktraceur Synchronized wmf-config/throttle.php: Account creation throttle exemption for Walker Art Center - hopefully soon enough (duration: 00m 06s)
[17:22:00] Whew.
[17:22:03] Logged the message, Master
[17:25:35] (PS1) Glaisher: Fix typo in robots.txt: Wayback Machine [mediawiki-config] - https://gerrit.wikimedia.org/r/195097
[17:26:53] (CR) Mdann52: [C: 1] Fix typo in robots.txt: Wayback Machine [mediawiki-config] - https://gerrit.wikimedia.org/r/195097 (owner: Glaisher)
[17:30:35] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[17:45:58] * Coren says something unkind about things-that-need-to-happen-right-now that were known about long in advance.
[17:52:13] Coren: I know, I was going to do it earlier this week, I forgot about it
[17:52:21] The event organizers are less up on the tech community
[17:52:38] Poop occurs. :-)
[17:59:15] marktraceur, were the event organisers aware it was necessary?
[18:01:06] Krenair: I don't think so. I tried to explain it to them last weekend...
[18:01:11] But maybe it isn't working anyway.
[18:01:31] the exception is not working?
[18:04:05] For some reason, someone just got an account creation error
[18:05:34] marktraceur: that was reported the last time an exemption was set as well
[18:05:41] ...huh
[18:05:53] That might be an issue. This was reported last month as well.
[18:05:55] marktraceur: a trick is to create accounts on other language wikis and sul
[18:05:57] I think
[18:06:06] Yeah, that works
[18:06:12] https://phabricator.wikimedia.org/T88203
[18:07:47] YuviPanda: I'll keep that in mind, some of the organizers have account creator too
[18:09:30] YuviPanda: another trick; do it before the event ;)
[18:09:30] marktraceur: cool
[18:09:37] JohnLewis: +1
[18:09:54] JohnLewis: thanks for the ml creation!
[18:09:58] JohnLewis: Ssshhhhh
[18:10:01] * YuviPanda is in a bus
[18:10:40] YuviPanda: welcome, as I said it should work out of the box for the purpose
[18:11:00] Sweet!
[18:11:09] Now I sleep
[18:11:11] Nighty
[18:53:29] (PS6) JanZerebecki: Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) (owner: Nemo bis)
[19:48:24] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail
[20:01:14] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds
[20:01:55] PROBLEM - dhclient process on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:05] PROBLEM - puppet last run on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:16] PROBLEM - configured eth on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:16] PROBLEM - Disk space on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:16] PROBLEM - DPKG on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:44] PROBLEM - RAID on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:02:45] PROBLEM - salt-minion processes on labstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:03:36] RECOVERY - RAID on labstore1001 is OK: OK: optimal, 72 logical, 72 physical
[20:03:44] RECOVERY - salt-minion processes on labstore1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:03:55] RECOVERY - dhclient process on labstore1001 is OK: PROCS OK: 0 processes with command name dhclient
[20:04:14] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[20:04:15] RECOVERY - Disk space on labstore1001 is OK: DISK OK
[20:04:15] RECOVERY - configured eth on labstore1001 is OK: NRPE: Unable to read output
[20:04:15] RECOVERY - DPKG on labstore1001 is OK: All packages OK
[20:07:15] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[20:24:34] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[21:01:48] (PS2) Nemo bis: Just use "en" as language code for WMCA wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/195064 (https://phabricator.wikimedia.org/T88843)
[21:02:19] (CR) Nemo bis: "Dupe of https://gerrit.wikimedia.org/r/#/c/195064/ (sorry)" [mediawiki-config] - https://gerrit.wikimedia.org/r/195092 (https://phabricator.wikimedia.org/T88843) (owner: Glaisher)
[22:28:04] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about at least 6 users - https://phabricator.wikimedia.org/T91920#1099363 (Springle) a:Springle
[22:32:45] operations, MediaWiki-Core-Team, SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about at least 6 users - https://phabricator.wikimedia.org/T91920#1099367 (Springle) Binary logs don't go back that far. Starting a sync check to see how large the problem is.
[22:36:11] (PS1) Springle: depool db1068 [mediawiki-config] - https://gerrit.wikimedia.org/r/195186
[22:39:23] (CR) Springle: [C: 2] depool db1068 [mediawiki-config] - https://gerrit.wikimedia.org/r/195186 (owner: Springle)
[22:39:28] (Merged) jenkins-bot: depool db1068 [mediawiki-config] - https://gerrit.wikimedia.org/r/195186 (owner: Springle)
[22:40:22] !log springle Synchronized wmf-config/db-eqiad.php: depool db1068 T91920 (duration: 00m 06s)
[22:40:31] Logged the message, Master
[23:04:24] Coren: tools-webgrid-06.eqiad.wmflabs looks faulty
[23:05:02] hoo|away: How so?
[23:05:54] I can't ssh into it + geohack which is on it right now doesn't respond
[23:06:26] I can log into it fine. What symptoms are you seeing?
[23:06:27] guess I could try to ssh into it via ProxyCommand and not hostbased
[23:07:01] geohack is *incredibly* slow to unresponsive
[23:07:21] seems like hostbased access just isn't enabled for that instance? Is that on purpose?
[23:07:35] (I'm not member of the geohack project, so can't really do anything on my own)
[23:07:59] No, it should have been enabled. I'm guessing it was forgotten when Yuvi added it.
[23:08:06] debug1: Authentications that can continue: publickey
[23:08:07] ah ok
[23:08:49] Yup, it was forgotten. Enabled, it'll kick in next puppet run.
[23:10:25] Can you also have a look at geohack?
[23:15:13] * Coren examines.
[23:39:03] $ curl -v https://tools.wmflabs.org/magnustools/resources/css/bootstrap.min.css
[23:39:03] < HTTP/1.1 502 Bad Gateway
[23:40:40] hoo@tools-webgrid-01:~$ curl -v 'localhost:14192/magnustools/resources/css/bootstrap.min.css'
[23:40:40] < HTTP/1.1 200 OK
[23:40:47] Coren: ^ any idea about that?
[23:41:01] (also not in magnustools, so can't just restart it)
[23:41:39] The proxies appear confused, but Yuvi has been working on them lately so I'm not sure what's up.
[23:42:19] :S But the issue geohack has is different, polling that from localhost also doesn't work
[23:44:15] Geohack is genuinely slow; but I just restarted it and it seems a bit better.
[23:44:55] Way better now
[23:45:58] The proxies are really ill though.
[23:46:11] * Coren boggles.
[23:47:14] Yuvi changed things around when he made them redundant, and now I'm not sure what's going on.
[23:50:11] I see :/
[23:50:31] There's a phab ticket already open for him to find - he should be around in a couple of hours given his timezone.
[23:52:32] Ok
[23:57:19] Coren: If you're being disgustingly helpful and obliging, could I ask you to poke a Tool Labs project that's gone away? :-)
[23:57:59] Coren: https://phabricator.wikimedia.org/T89695 – no idea about it.