[00:06:33] (03PS2) 10Alex Monk: Add task ids for e41f9ab31a44a68a5979e38b5160c01f58135e49 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212436 (owner: 10Ricordisamoa)
[00:06:42] (03CR) 10Alex Monk: [C: 032] Add task ids for e41f9ab31a44a68a5979e38b5160c01f58135e49 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212436 (owner: 10Ricordisamoa)
[00:06:47] (03Merged) 10jenkins-bot: Add task ids for e41f9ab31a44a68a5979e38b5160c01f58135e49 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212436 (owner: 10Ricordisamoa)
[00:08:09] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/212436/ - docs only, no code change (how was this waiting 10 days?) (duration: 00m 14s)
[00:08:13] Logged the message, Master
[00:42:12] 6operations, 5Patch-For-Review, 7database: contacts.wikimedia.org drupal unpuppetized / retire contacts - https://phabricator.wikimedia.org/T90679#1322509 (10Dzahn)
[00:45:56] 6operations, 5Patch-For-Review, 7database: contacts.wikimedia.org drupal unpuppetized / retire contacts - https://phabricator.wikimedia.org/T90679#1322513 (10Dzahn) a:5Springle>3jcrespo
[01:03:17] ori, rcstream down?
[01:03:30] it is?
[01:03:32] * ori looks
[01:04:04] yeah, I see the alert on the list
[01:15:51] !log Deployed rcstream I797bc1244: Handle invalid JSON gracefully
[01:15:55] Logged the message, Master
[01:32:27] any idea what the invalid data was that was getting sent to it?
[01:39:05] Krenair: nope. We haven't gotten an error since I deployed.
[02:06:52] PROBLEM - puppet last run on mw1256 is CRITICAL Puppet has 1 failures
[02:22:03] RECOVERY - puppet last run on mw1256 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures
[02:28:00] I seem to be having an issue with Central Login and Phabricator at the moment. As Central Login or Mediawiki Login for both of them refuse to work.
[02:30:21] works for me
[02:30:47] !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 50s)
[02:30:55] Logged the message, Master
[02:31:16] Interesting, i keep getting the standard Wikimedia Server error when i try to login to Phabricator
[02:33:45] Lor_: Logging in on the Wikis works for you, but you can't get to phabricator from that?
[02:33:46] hoo, Centrally logging into Meta does not currently work, but clicking the 'Mediawiki' Link in the Phabricator login screen gets me a 503
[02:33:56] I see that your account did a few successful CentralAuth logins, but nothing regarding OAuth
[02:35:40] Hmm, maybe something on my End is stopping me from logging into both of them? Refreshing as prompted when i go to a meta page just keeps me logged out.
[02:35:59] !log LocalisationUpdate completed (1.26wmf7) at 2015-05-30 02:34:55+00:00
[02:36:06] Logged the message, Master
[02:36:15] Any login attempts on en-wiki seem to work fine.
[02:37:32] Yeah, saw that
[02:38:49] I don't think I have access to the phabricator error log (at least I have no idea where/ how)
[02:40:07] I don't think I can help you here... can you have someone file a bug for you?
[02:40:11] (Not being able to file a bug sucks...)
[02:40:36] Unless suddenly we move to github, doubt it. I may try with the normal LADP login and see what happens.
[02:40:46] That should work
[02:41:20] I have to call it a day... it's 4:40am here, and I have troubles keeping my eyes open
[02:41:42] PROBLEM - High load average on labstore1001 is CRITICAL 100.00% of data above the critical threshold [24.0]
[02:42:09] Not a problem, get sleep if you need it.
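The rcstream deploy logged at 01:15 above ("Handle invalid JSON gracefully") amounts to a defensive parse of incoming payloads. The sketch below only illustrates that kind of guard; it is not the actual rcstream code, and the function and logger names are invented for the example.

```python
#!/usr/bin/env python
"""Illustrative sketch of a "handle invalid JSON gracefully" guard.

Not the real rcstream implementation; names here are hypothetical.
"""
import json
import logging

log = logging.getLogger('rcstream')


def parse_event(raw):
    """Return the decoded event, or None if the payload is not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:
        # ValueError also covers json.JSONDecodeError on Python 3.
        log.warning('Discarding invalid JSON payload: %r', raw[:200])
        return None
```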
[02:43:02] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 100.00% of data above the critical threshold [35.0]
[02:48:04] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[02:48:23] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[02:51:59] !log l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 40s)
[02:52:04] Logged the message, Master
[02:56:26] !log LocalisationUpdate completed (1.26wmf8) at 2015-05-30 02:55:22+00:00
[02:56:30] Logged the message, Master
[03:28:52] PROBLEM - High load average on labstore1001 is CRITICAL 57.14% of data above the critical threshold [24.0]
[03:38:52] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[03:45:33] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[04:10:23] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 57.14% of data above the critical threshold [35.0]
[04:13:38] 6operations, 6Phabricator, 7database: Add Story points (from Sprint Extension) - https://phabricator.wikimedia.org/T100846#1322576 (10mmodell) @chasemp: it's stored in maniphest_customfieldstorage and you have to look up the fieldIndex. In my test instance the fieldIndex for story points is `yERhvoZPNPtM`...
[04:24:03] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[04:25:22] PROBLEM - MySQL Idle Transactions on db1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:26:43] RECOVERY - MySQL Idle Transactions on db1041 is OK longest blocking idle transaction sleeps for 0 seconds
[04:28:53] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[04:39:03] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[04:47:33] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[05:26:12] PROBLEM - High load average on labstore1001 is CRITICAL 85.71% of data above the critical threshold [24.0]
[05:31:53] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (101199s 100000s)
[05:32:05] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat May 30 05:31:02 UTC 2015 (duration 31m 1s)
[05:32:19] Logged the message, Master
[05:45:23] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (15384 100000s)
[06:03:13] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[06:14:52] PROBLEM - High load average on labstore1001 is CRITICAL 57.14% of data above the critical threshold [24.0]
[06:24:42] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[06:33:02] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[06:33:22] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[06:33:32] PROBLEM - puppet last run on mw1065 is CRITICAL Puppet has 1 failures
[06:34:12] PROBLEM - puppet last run on db2065 is CRITICAL Puppet has 1 failures
[06:34:32] PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 1 failures
[06:35:03] PROBLEM - puppet last run on mw1144 is CRITICAL Puppet has 1 failures
[06:35:12] PROBLEM - puppet last run on mw2017 is CRITICAL Puppet has 1 failures
[06:35:32] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures
[06:36:02] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 1 failures
[06:36:02] PROBLEM - puppet last run on mw2003 is CRITICAL Puppet has 1 failures
[06:43:12] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[06:43:32] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[06:46:53] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:46:53] RECOVERY - puppet last run on mw1065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:14] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:42] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:52] RECOVERY - puppet last run on mw2206 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:47:52] RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:02] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:03] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:42] RECOVERY - puppet last run on mw2017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:51:53] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[07:08:02] PROBLEM - puppet last run on db2002 is CRITICAL puppet fail
[07:23:59] 6operations, 10pywikibot-core, 5Patch-For-Review, 10Wikimedia-Mailing-lists: Rename pywikipedia list prefixes to pywikibot - https://phabricator.wikimedia.org/T100707#1322602 (10Ladsgroup) Sorry for not making the notice in the mailing lists, I was traveling for a while, I reached my final destination five...
[07:24:52] RECOVERY - puppet last run on db2002 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[07:35:12] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[07:38:53] PROBLEM - High load average on labstore1001 is CRITICAL 57.14% of data above the critical threshold [24.0]
[07:45:48] Hi all, is there someone here who would be able to stop a GWToolset job? we had a problem on our server last thursday and since then the GWToolset seems to overload our server https://commons.wikimedia.org/w/index.php?title=Special:Log&type=gwtoolset
[07:50:13] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[07:50:42] PROBLEM - High load average on labstore1001 is CRITICAL 57.14% of data above the critical threshold [24.0]
[07:54:03] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[08:00:53] PROBLEM - High load average on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[08:05:23] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0]
[08:10:22] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 57.14% of data above the critical threshold [35.0]
[08:24:22] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[08:35:32] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[09:04:03] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0]
[09:06:22] PROBLEM - High load average on labstore1001 is CRITICAL 100.00% of data above the critical threshold [24.0]
[09:12:02] Grrrrr....Anyone around?
[09:15:43] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[09:16:01] On firefox, i get a 503 when i try to access anything relating to Meta/central login and Phabricator.
[09:18:05] hi Lor_
[09:18:11] MatmaRex, Greetings
[09:18:26] what exactly, for example? trying to log into phabricator using your SUL account?
[09:19:09] MatmaRex, Trying to login centrally into Meta, and trying to login to phabricator...to make a ticket about not being able to login to Meta.
[09:19:30] It seems like the bug only exists on firefox, as safari seems to be fine.
[09:20:39] specifically i seem to get a 503 (Service Unavailable) with any login attempts on Phab.
[09:21:07] let me try to reproduce
[09:21:30] (i am not ops, by the way, so i probably won't be able to help :( but i'll probably be able to help file a bug)
[09:21:31] To be specific, i'm running Firefox Dev Edition 40.0a2
[09:21:50] MatmaRex, That's fine, i've been trying to file a bug all day anyway :)
[09:25:29] Lor_: i've been able to log in to phabricator via mediawiki.org oauth using firefox 38 on windows 7
[09:26:04] MatmaRex, Interesting, i'll try using Basic firefox with it...
[09:26:11] (which is apparently the latest stable, so maybe there's something wrong with the alpha firefox you're using?)
[09:26:45] MatmaRex, Possibly, maybe it's something up with a setting?
[09:29:12] if it means anything, using private browsing gets me to the 0Auth confirm screen, but stops due to a missing API key
[09:29:53] Well, it works on Safari..but i got another problem now...
[09:31:05] Not a bug...I recently switched phones, and i use 2FA on my Phab account....
[09:32:33] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 87.50% of data above the critical threshold [35.0]
[09:38:44] Lor_: hmm. unless you have a different way of proving your identity to ops, then it might not be possible to recover your account
[09:38:51] that's kind of the whole point of 2FA
[09:39:09] I got a million problems...digging for backup codes is one it seems...Shite
[09:39:20] Meh, thanks for the help anyway
[09:39:56] Lor_: we might still want to file the firefox-related bug?
[09:40:06] MatmaRex, Guess so
[09:41:01] so you don't have access to phab right now?
[09:41:08] I don't, no.
[09:41:22] I may be able to dig for my old phone later..Would still have Google Auth on it, right?
[09:41:40] probably
[09:41:51] you're Lor on phabricator?
[09:41:57] Yup
[09:43:37] Lor_: i filed https://phabricator.wikimedia.org/T100874 , if you have anything to add, you should be able to do it via email (even if you can't login)
[09:43:49] MatmaRex, Good, Cheers
[09:43:59] you should have just gotten an email copy of that bug (since i subscribed you to it), just reply to that
[09:45:01] Indeed, thanks for the help :)
[09:45:20] :)
[09:48:38] Guh, Mail Doubled a link that i typed in.
[09:49:52] But, never less, thanks for the help
[09:50:21] :)
[09:50:28] :)
[09:56:43] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[10:00:33] PROBLEM - puppet last run on mc2015 is CRITICAL puppet fail
[10:11:33] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[10:17:22] RECOVERY - puppet last run on mc2015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:39:32] 6operations, 10Wikimedia-Apache-configuration: HTTP/1.1 compliance of bits.wikimedia.org/.../load.php - https://phabricator.wikimedia.org/T30345#1322811 (10Krenair) @ori: Is there any point in fixing this, isn't bits going away?
[11:40:36] 6operations, 10Traffic, 10Wikimedia-DNS: Consider DNSSec - https://phabricator.wikimedia.org/T26413#1322813 (10Krenair)
[11:45:17] 6operations, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#1322818 (10Krenair) So what we want to do here is move /fundraising under /other/fundraising (with a 301 redirect in place), and provide a link on the /other index.html? Is that correct?
[11:52:28] 6operations, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#1322826 (10jayvdb) >>! In T42847#1322818, @Krenair wrote: > So what we want to do here is move /fundraising to /other/fundraising (with a 301 redirect in place), and provide a link on the /o...
[11:54:11] 6operations, 7Mail, 10Wikimedia-Mailing-lists: Mails to any wikimedia.org account/list from any account @wikimedia.org.ve bounces - https://phabricator.wikimedia.org/T62215#1322827 (10Krenair)
[11:54:22] 6operations, 7Mail: Mails to any wikimedia.org account/list from any account @wikimedia.org.ve bounces - https://phabricator.wikimedia.org/T62215#675294 (10Krenair)
[11:57:10] 6operations, 10Wikimedia-General-or-Unknown: Icinga has httpauth on (not accessible for public) - https://phabricator.wikimedia.org/T62112#1322835 (10Krenair) >>! In T62112#1098490, @scfc wrote: > IIRC, it could also be fixed by upgrading `neon` (?) to Trusty Yep, neon is the icinga host.
[12:10:32] PROBLEM - puppet last run on cp4014 is CRITICAL puppet fail
[12:13:16] Could someone help me with changePassword.php ?
[12:19:06] falcus, ?
[12:27:13] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[13:36:50] 6operations, 10Analytics: Upgrade stat1001 to Ubuntu Trusty - https://phabricator.wikimedia.org/T76348#1322938 (10Krenair)
[13:36:53] 6operations, 7Tracking: Upgrade Wikimedia servers to Ubuntu Trusty (14.04) (tracking) - https://phabricator.wikimedia.org/T65899#1322937 (10Krenair)
[13:40:57] 6operations, 6Commons, 6Multimedia, 7HHVM, and 4 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1322941 (10Krenair)
[13:41:00] 6operations, 10Datasets-General-or-Unknown, 7HHVM: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#1322943 (10Krenair)
[13:41:03] 6operations, 10Beta-Cluster, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1322942 (10Krenair)
[13:41:05] 6operations, 7Tracking: Upgrade Wikimedia servers to Ubuntu Trusty (14.04) (tracking) - https://phabricator.wikimedia.org/T65899#1322940 (10Krenair)
[13:43:52] 7Puppet, 6operations, 7HHVM: Local hhvm error logs not readable by deployers - https://phabricator.wikimedia.org/T78310#1322948 (10Krenair)
[13:59:46] 6operations, 10Wikimedia-General-or-Unknown: Icinga has httpauth on (not accessible for public) - https://phabricator.wikimedia.org/T62112#1322966 (10Krenair)
[14:06:53] 6operations: Install fonts-wqy-zenhei on all mediawiki app servers - https://phabricator.wikimedia.org/T84777#1322973 (10Krenair)
[15:16:02] PROBLEM - puppet last run on mw2075 is CRITICAL puppet fail
[15:36:04] RECOVERY - puppet last run on mw2075 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures
[15:36:21] 6operations, 7HHVM: Switch HAT appservers to trusty's ICU - https://phabricator.wikimedia.org/T86096#1323055 (10matmarex)
[16:16:56] 6operations, 6Labs: fawikivoyage, orwikisource. maiwiki's dbs missing from Labs - https://phabricator.wikimedia.org/T75480#751702 (10Krenair)
[16:24:23] PROBLEM - puppet last run on cp3046 is CRITICAL puppet fail
[16:26:34] 6operations, 10ops-requests, 7Monitoring, 5Patch-For-Review, 10Wikimedia-Mailing-lists: Monitor mailman - https://phabricator.wikimedia.org/T84150#1323121 (10Krenair)
[16:27:34] Krenair: around?
[16:27:40] yep
[16:27:53] > $wgUser = User::newFromName('Legoktm');
[16:27:53] > var_dump($wgUser->getOption('usebetatoolbar'));
[16:27:53] int(1)
[16:27:53] > var_dump(WikiEditorHooks::isEnabled('toolbar'));
[16:27:53] bool(true)
[16:29:23] ok...
[16:31:21] toolbar shows up for me locally...
[16:32:32] oh this is related to wikieditor disappearing on mediawiki.org?
[16:32:38] I think I saw a bug about that
[16:32:43] yes
[16:32:51] https://phabricator.wikimedia.org/T100888
[16:33:04] https://www.mediawiki.org/wiki/MediaWiki_1.26/wmf8/Changelog#WikiEditor is mainly extension registration stuff
[16:34:49] so this is an issue that's affecting wmf8 but not wmf7?
[16:34:57] ok
[16:35:29] I don't like the sound of https://gerrit.wikimedia.org/r/#/c/212854/1 - could it be playing strangely with wikimedia's settings?
[16:36:13] Krenair: that was the first thing I thought of, but nope: http://fpaste.org/227250/376114/raw/
[16:37:35] Krenair: also WMF's config is the same as extension defaults...
[16:39:24] works for me in beta
[16:39:59] alex@alex-laptop:~/Development/MediaWiki/extensions/WikiEditor (master)$ git log HEAD..origin/wmf/1.26wmf8 --oneline
[16:39:59] 62182ef Creating new wmf/1.26wmf8 branch
[16:39:59] alex@alex-laptop:~/Development/MediaWiki/extensions/WikiEditor (master)$
[16:40:58] hmm
[16:41:51] settings seem the same in beta..
[16:43:03] RECOVERY - puppet last run on cp3046 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures
[16:44:46] 6operations, 10Wikimedia-Apache-configuration: HTTP/1.1 compliance of bits.wikimedia.org/.../load.php - https://phabricator.wikimedia.org/T30345#1323151 (10ori) 5Open>3Resolved a:3ori bits is no longer actively used.
[16:46:07] I can see that the EditPageBeforeEditToolbar hook is being run
[16:46:38] Krenair: the onBeforePageDisplay hook adds a ext.wikiEditor.init module which no longer exists
[16:47:09] that one sounds like my fault
[16:47:19] but from back in march
[16:47:22] it's not been broken that long
[16:48:49] I think the issue is client side, because all the PHP hooks are running properly and modules are being loaded
[16:50:08] yes
[16:50:17] mw.loader.getState( 'ext.wikiEditor.toolbar' )
[16:50:17] "ready"
[16:51:00] (03PS1) 10Ori.livneh: Move media beacon off of bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214822
[16:51:02] (03PS1) 10Ori.livneh: Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823
[16:51:12] $.wikiEditor.modules.toolbar is set
[16:51:16] (03PS2) 10Ori.livneh: Move media beacon off of bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214822
[16:51:53] (03PS3) 10Ori.livneh: Move media beacon off of bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214822 (https://phabricator.wikimedia.org/T95448)
[16:51:55] https://gerrit.wikimedia.org/r/#/c/200750/ ?
[16:52:04] (03CR) 10jenkins-bot: [V: 04-1] Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823 (owner: 10Ori.livneh)
[16:52:42] possibly
[16:52:43] (03CR) 10Ori.livneh: [C: 032] Move media beacon off of bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214822 (https://phabricator.wikimedia.org/T95448) (owner: 10Ori.livneh)
[16:53:47] legoktm, I don't think so
[16:53:48] but if it works in beta, it had to be changed in master/wmf9?
[16:53:50] ok...
[16:54:06] what's the bug?
[16:54:14] wikieditor gone from wmf8 wikis
[16:54:42] is it the module position thing?
[16:54:42] it's still all over the dom
[16:54:46] but toolbar is missing
[16:56:06] 849cbf2 Explicitly define module position <-- in master, not wmf8
[16:56:57] legoktm, it's display:none
[16:56:58] * legoktm live hacks 1017
[16:58:14] " * hide the WikiEditor toolbar until it's css has loaded */"
[16:58:49] it fails because of ze grammar
[16:58:49] legoktm, seems to be fixed?
[16:58:59] Krenair: on mw1017 ?
[16:59:03] on test, yes
[16:59:21] then it's the position: top thing
[16:59:33] saper: ;)
[17:00:23] ori: I know bad time for jokes...
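The diagnosis above ("then it's the position: top thing", master commit 849cbf2 "Explicitly define module position") comes down to extension registration: a ResourceLoader module defined in extension.json loads at the bottom of the page unless it declares a top position, and WikiEditor keeps the toolbar display:none until its CSS has arrived. As an illustration only — this hypothetical helper is not part of the actual fix, which was an edit to WikiEditor's extension.json — a script like the following would flag modules relying on the implicit bottom position:

```python
#!/usr/bin/env python
"""Hypothetical helper: list ResourceLoader modules in an extension.json
that do not declare an explicit "position" (and so default to "bottom"
under extension registration, as of MediaWiki 1.25/1.26)."""
import json
import sys


def modules_without_position(path):
    with open(path) as f:
        registration = json.load(f)
    for name, module in registration.get('ResourceModules', {}).items():
        # Module definitions are dicts; no "position" key means the module
        # is loaded at the bottom of the page.
        if isinstance(module, dict) and 'position' not in module:
            yield name


if __name__ == '__main__':
    for name in modules_without_position(sys.argv[1]):
        print(name)
```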
[17:03:00] hmm, paladox nailed the issue from the beginning
[17:06:01] !log legoktm Synchronized php-1.26wmf8/extensions/WikiEditor/extension.json: Explicitly define module position (duration: 00m 13s)
[17:06:08] Logged the message, Master
[17:06:11] Krenair: ^
[17:06:35] works
[17:07:48] 6operations, 10ops-codfw, 7network: setup wifi in codfw - https://phabricator.wikimedia.org/T86541#1323197 (10Krenair)
[17:08:06] 6operations: Spammer using //bits.wikimedia.org/geoiplookup - https://phabricator.wikimedia.org/T100902#1323201 (10ori) 3NEW a:3BBlack
[17:15:16] * legoktm is off for a bit
[17:19:21] * Krenair waves
[17:20:50] (03PS2) 10Ori.livneh: Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823
[17:20:55] (03CR) 10jenkins-bot: [V: 04-1] Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823 (owner: 10Ori.livneh)
[17:21:38] (03PS3) 10Ori.livneh: Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823
[17:27:16] jgage: Logstash Elasticsearch status update -- ridiculously still in progress; I raised several recovery limits and things seem to be going much faster now. It would probably be reasonable to make sure the icinga maintenance window stays open for another 24h or so to make sure nobody gets paged for it.
[17:52:22] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[17:52:32] bd808: ok. i did notice that the monitors went green in icinga, but ganglia graphs tell me that data is still moving.
[17:53:58] ok i marked those services for another 24h of downtime in icinga
[18:03:38] 6operations, 7database: db1021 %iowait up - https://phabricator.wikimedia.org/T87277#1323264 (10Krenair)
[18:04:12] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[18:16:42] 6operations, 7database: mysql boxes not in ganglia - https://phabricator.wikimedia.org/T87209#1323280 (10Krenair)
[18:20:06] 6operations, 6Phabricator: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1323284 (10Krenair)
[18:22:43] 6operations, 6Labs: fawikivoyage, orwikisource. maiwiki's dbs missing from Labs - https://phabricator.wikimedia.org/T75480#1323285 (10scfc) 5Open>3Resolved a:3scfc All view databases (e. g., `fawikivoyage_p`) seem to have been created and are replicated and the `fawikivoyage.labsdb` host aliases in `mani...
[18:23:11] 6operations, 6Labs: fawikivoyage, orwikisource. maiwiki's dbs missing from Labs - https://phabricator.wikimedia.org/T75480#1323288 (10scfc) a:5scfc>3None
[18:24:53] 6operations, 10Traffic, 10hardware-requests: Upgrade eqiad LVS to 10G - https://phabricator.wikimedia.org/T89120#1323289 (10Krenair)
[18:26:19] 6operations, 6Labs: fawikivoyage, orwikisource. maiwiki's dbs missing from Labs - https://phabricator.wikimedia.org/T75480#1323291 (10scfc) @Reedy, could you please check/update the status of https://rt.wikimedia.org/Ticket/Display.html?id=8877 if it duplicates this task?
[18:29:41] 6operations, 6Labs: fawikivoyage, orwikisource. maiwiki's dbs missing from Labs - https://phabricator.wikimedia.org/T75480#1323294 (10Krenair) That would be T84757 - the restricted task which I duplicated into this earlier.
[18:30:47] 6operations, 10Traffic: Spammer using //bits.wikimedia.org/geoiplookup - https://phabricator.wikimedia.org/T100902#1323296 (10BBlack)
[18:35:28] 6operations, 7Documentation: Incident response protocol needs a refresh - https://phabricator.wikimedia.org/T89800#1323300 (10Krenair)
[18:35:43] !log hoo Synchronized php-1.26wmf7/extensions/UploadWizard/: Touch js… (duration: 00m 18s)
[18:35:47] Logged the message, Master
[18:38:12] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 1 below the confidence bounds
[18:38:36] 6operations, 6Labs, 10Labs-Infrastructure: Make labs/private really private - https://phabricator.wikimedia.org/T89642#1323303 (10Krenair) 5Open>3declined a:3Krenair
[18:39:57] 6operations, 10Traffic: Spammer using //bits.wikimedia.org/geoiplookup - https://phabricator.wikimedia.org/T100902#1323305 (10BBlack) I don't know that it makes sense to start blocking these on the referrer. In general it's publicly-hittable by design, and someone can always just proxy into it with a fake/emp...
[18:41:29] 6operations, 10Analytics-Cluster, 7Monitoring: Replace uses of monitoring::ganglia with monitoring::graphite_* - https://phabricator.wikimedia.org/T90642#1323306 (10Krenair)
[19:02:03] 6operations, 6Labs, 7network: permit syslog from labs to lithium - https://phabricator.wikimedia.org/T90695#1323313 (10Krenair) I assume this is a firewall thing?
[19:18:53] -++
[19:20:48] (oops, turns out this is what happens when you clean your keyboard while logged in)
[19:26:32] 6operations, 10Analytics, 6Security, 10Traffic, 6Zero: Purge > 90 days stat1002:/a/squid/archive/zero - https://phabricator.wikimedia.org/T92343#1323351 (10Krenair)
[19:27:20] 6operations, 10Analytics, 6Security, 10Traffic, 6Zero: Purge > 90 days stat1002:/a/squid/archive/sampled - https://phabricator.wikimedia.org/T92342#1323362 (10Krenair)
[19:28:33] 6operations, 10Analytics, 6Security, 10Traffic, and 2 others: Purge > 90 days stat1002:/a/squid/archive/mobile - https://phabricator.wikimedia.org/T92341#1323369 (10Krenair)
[19:29:44] 6operations, 10Analytics, 6Security, 10Traffic: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1323374 (10Krenair)
[19:30:50] 6operations, 10Analytics, 6Security, 10Traffic: Purge > 90 days stat1002:/a/squid/archive/api - https://phabricator.wikimedia.org/T92338#1323382 (10Krenair) You've got actual raw user passwords in there?
[19:32:25] 6operations, 10Fundraising-Backlog, 6Security: Delete gadolinium:/a/log/fundraising/ - https://phabricator.wikimedia.org/T92336#1323396 (10Krenair)
[19:51:49] (03PS3) 10Jforrester: [WIP] Make VisualEditor access RESTbase directly on private wikis too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/200107
[19:51:51] (03PS1) 10Jforrester: Make VisualEditor access RESTbase directly on all public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214833 (https://phabricator.wikimedia.org/T100026)
[20:22:32] (03CR) 10Jforrester: [C: 04-1] "Let's not do this until after the current A/B test is complete (i.e., after 11 June)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214833 (https://phabricator.wikimedia.org/T100026) (owner: 10Jforrester)
[20:50:52] PROBLEM - puppet last run on mw2127 is CRITICAL Puppet has 1 failures
[21:07:34] !log Upgraded Elasticsearch cluster to 1.3.9 on logstash100[1-6]
[21:07:37] Logged the message, Master
[21:07:42] jgage: all done
[21:07:43] RECOVERY - puppet last run on mw2127 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures
[21:10:55] 2015-05-30 22:52:26 mw1147 dawiki exception INFO: [db66cdc2] /w/api.php MWException from line 3690 of /srv/mediawiki/php-1.26wmf7/includes/User.php: CAS update failed on user_touched for user ID '149811';the version of the user to be saved is older than the current version.
[21:16:35] 6operations, 10Wikimedia-Logstash, 7Elasticsearch: Update Wikimedia apt repo to include debs for Elasticsearch on jessie - https://phabricator.wikimedia.org/T98042#1323479 (10bd808) The current cluster has been updated to elasticsearch_1.3.9_all.deb now so this is the version we should make the default for j...
[21:29:12] bd808: yay!
[21:30:55] (03CR) 10Alex Monk: [C: 04-1] "Searching https://rest.wikimedia.org for 'wikimedia.org' only shows commons, so I think we're missing a lot of wikis still." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214833 (https://phabricator.wikimedia.org/T100026) (owner: 10Jforrester)
[21:31:14] things sped up dramatically when I doubled all of the resource limits
[21:32:20] heh good call
[21:32:23] where did you make that change?
[21:33:10] Just as a transient cluster config change
[21:33:28] via curl -XPUT?
[21:33:34] run `curl localhost:9200/_cluster/settings?pretty` on any of the logstash hosts to see it
[21:33:35] yeah
[21:33:39] gotcha
[21:34:43] the defaults are tuned for EC2 instance hosting apparently. We have a lot more cpu, iops and network to throw at the problem
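The exchange above describes a transient Elasticsearch cluster-settings change, read back with `curl localhost:9200/_cluster/settings?pretty`. A minimal sketch of that write/read round trip follows, using only the Python standard library; the specific recovery keys and values are illustrative guesses, since the log does not say which limits bd808 doubled. Transient settings live in the cluster state and are lost on a full cluster restart.

```python
#!/usr/bin/env python
"""Sketch of a transient cluster-settings update against Elasticsearch 1.x.

The setting names below are common recovery knobs used for illustration,
not the actual values changed on the logstash cluster.
"""
import json
import urllib.request

ES = 'http://localhost:9200'  # assumes you are on one of the logstash hosts

body = json.dumps({
    'transient': {
        'indices.recovery.max_bytes_per_sec': '200mb',
        'cluster.routing.allocation.node_concurrent_recoveries': 4,
    }
}).encode()

req = urllib.request.Request(ES + '/_cluster/settings', data=body, method='PUT')
req.add_header('Content-Type', 'application/json')
print(urllib.request.urlopen(req).read().decode())

# Read the settings back; equivalent to:
#   curl localhost:9200/_cluster/settings?pretty
print(urllib.request.urlopen(ES + '/_cluster/settings?pretty').read().decode())
```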
[21:35:34] kids these days
[21:36:33] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected
[21:36:42] the network graph shows when I turned things up -- https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Logstash+cluster+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[21:37:09] yeah, i was watching that
[21:41:45] lol
[21:41:49] that's awesome
[21:48:23] PROBLEM - puppet last run on mw1041 is CRITICAL Puppet last ran 1 day ago
[22:01:43] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[22:03:33] RECOVERY - puppet last run on mw1041 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures
[22:15:32] (03CR) 10Alex Monk: "I think VisualEditor is expected to work on some of these:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214833 (https://phabricator.wikimedia.org/T100026) (owner: 10Jforrester)
[22:25:23] PROBLEM - puppet last run on mw2099 is CRITICAL Puppet has 1 failures
[22:27:02] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[22:32:08] 6operations, 7Easy: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1323511 (10Krenair) Looks easy for anyone with a basic knowledge of Python: https://git.wikimedia.org/blob/operations%2Fdebs%2Fadminbot.git/579ea5abfc7b4b70d7ed82fa6b45956dd30090e9/adminlog.py#L34
[22:36:54] (03CR) 10Alex Monk: "It feels like all the dependencies are backwards on this stack of changes" [debs/adminbot] - 10https://gerrit.wikimedia.org/r/181054 (owner: 10Merlijn van Deen)
[22:42:12] RECOVERY - puppet last run on mw2099 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[22:49:09] 6operations, 7Documentation: Incident response protocol needs a refresh - https://phabricator.wikimedia.org/T89800#1323514 (10Krenair) I noticed that it still referred to Asher - have replaced with Sean and Jaime. It still references Leslie though, and I don't know if ops still has someone specifically on netw...
[23:14:40] 6operations, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#1323517 (10Krenair) It looks like all old data, and I can't find references in puppet to it. Nor can I find references to frdata, which appears to be hosted in frack?
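On the T85803 comment at 22:32 ("server admin log should include year in date"), the change being asked for is simply a date format that carries the year when adminbot writes to the server admin log (adminlog.py in operations/debs/adminbot). The snippet below is a hedged sketch of that kind of format change, not the bot's actual code:

```python
#!/usr/bin/env python
"""Sketch for T85803: include the year in the server admin log date.

The format strings are illustrative, not the ones adminlog.py uses.
"""
from datetime import datetime, timezone

now = datetime.now(timezone.utc)

# A heading without the year (the sort of thing the task complains about):
print(now.strftime('=== %B %d ==='))     # e.g. === May 30 ===

# A heading with the year included:
print(now.strftime('=== %Y-%m-%d ==='))  # e.g. === 2015-05-30 ===
```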