[00:08:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[00:24:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[00:34:43] (CR) Yuvipanda: "Fair enough. I'll file a task to run a sed or something to put these in the existing ones before this webservice implementation becomes de" [software/tools-webservice] - https://gerrit.wikimedia.org/r/211098 (owner: Merlijn van Deen)
[00:37:58] (PS3) Yuvipanda: Add warning comment to manifest file [software/tools-webservice] - https://gerrit.wikimedia.org/r/211098 (owner: Merlijn van Deen)
[00:38:48] (CR) Yuvipanda: "PS3 modifies wording to remover reference to webservice, make it more generic (this is going to be used for crontab replacement too) - don" [software/tools-webservice] - https://gerrit.wikimedia.org/r/211098 (owner: Merlijn van Deen)
[00:50:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[00:55:47] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.69% of data above the critical threshold [1000.0]
[01:07:56] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[01:14:06] PROBLEM - High load average on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[01:25:27] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[01:50:47] RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[02:20:07] !log l10nupdate Synchronized php-1.26wmf5/cache/l10n: (no message) (duration: 06m 10s)
[02:20:29] Logged the message, Master
[02:23:46] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (94738s 90000s)
[02:25:12] !log LocalisationUpdate completed (1.26wmf5) at 2015-05-17 02:24:09+00:00
[02:25:18] Logged the message, Master
[02:39:50] !log l10nupdate Synchronized php-1.26wmf6/cache/l10n: (no message) (duration: 05m 18s)
[02:39:57] Logged the message, Master
[02:44:16] !log LocalisationUpdate completed (1.26wmf6) at 2015-05-17 02:43:13+00:00
[02:44:22] Logged the message, Master
[03:09:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[03:17:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[03:25:27] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[03:34:07] PROBLEM - puppet last run on cp3035 is CRITICAL Puppet has 1 failures
[03:34:27] PROBLEM - puppet last run on mw1087 is CRITICAL Puppet has 1 failures
[03:34:57] PROBLEM - puppet last run on mw1146 is CRITICAL Puppet has 1 failures
[03:35:47] PROBLEM - puppet last run on lvs1001 is CRITICAL Puppet has 1 failures
[03:36:26] PROBLEM - puppet last run on mw1210 is CRITICAL Puppet has 1 failures
[03:51:08] RECOVERY - puppet last run on mw1146 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[03:52:06] RECOVERY - puppet last run on cp3035 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:52:06] RECOVERY - puppet last run on lvs1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:52:26] RECOVERY - puppet last run on mw1087 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:52:37] RECOVERY - puppet last run on mw1210 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:32:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[04:54:56] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[05:06:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun May 17 05:05:16 UTC 2015 (duration 5m 15s)
[05:06:26] Logged the message, Master
[05:24:26] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (17593 90000s)
[06:28:07] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 14.29% of data above the critical threshold [500.0]
[06:30:17] PROBLEM - puppet last run on db1034 is CRITICAL Puppet has 1 failures
[06:30:27] PROBLEM - puppet last run on mw2056 is CRITICAL puppet fail
[06:30:37] PROBLEM - puppet last run on db1021 is CRITICAL puppet fail
[06:31:27] PROBLEM - puppet last run on cp4003 is CRITICAL Puppet has 1 failures
[06:31:58] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 3 failures
[06:34:06] PROBLEM - puppet last run on mw2030 is CRITICAL Puppet has 2 failures
[06:34:17] PROBLEM - puppet last run on mw2212 is CRITICAL Puppet has 1 failures
[06:34:36] PROBLEM - puppet last run on mw1011 is CRITICAL Puppet has 1 failures
[06:35:17] PROBLEM - puppet last run on mw2136 is CRITICAL Puppet has 1 failures
[06:39:27] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[06:46:11] RECOVERY - puppet last run on cp4003 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on db1034 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:46] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:46:56] RECOVERY - puppet last run on db1021 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:47:27] RECOVERY - puppet last run on mw2212 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:38] RECOVERY - puppet last run on mw1011 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:48:27] RECOVERY - puppet last run on mw2136 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:48:37] RECOVERY - puppet last run on mw2056 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:56] RECOVERY - puppet last run on mw2030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:07:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[07:10:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[07:39:07] PROBLEM - High load average on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[07:42:27] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[07:42:57] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[08:19:57] PROBLEM - puppet last run on elastic1025 is CRITICAL Puppet has 1 failures
[08:23:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0]
[08:36:06] RECOVERY - puppet last run on elastic1025 is OK Puppet is currently enabled, last run 1 second ago with 0 failures
[08:43:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[08:59:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[09:02:28] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[09:13:08] (PS1) Dereckson: Enable NewUserMessage on pa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/211512 (https://phabricator.wikimedia.org/T99331)
[09:44:47] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[09:52:52] (PS1) Dereckson: Logo configuration on ur.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/211514 (https://phabricator.wikimedia.org/T97510)
[09:59:26] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[10:04:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[10:08:25] (CR) 20after4: [C: 1] phabricator: Add priority keywords/labels for !priority email command [puppet] - https://gerrit.wikimedia.org/r/209445 (https://phabricator.wikimedia.org/T98356) (owner: Merlijn van Deen)
[10:09:56] (PS1) Dereckson: Namespace configuration on pt.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/211517
[10:13:37] (PS2) Dereckson: Namespace configuration on pt.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/211517 (https://phabricator.wikimedia.org/T94894)
[10:22:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[10:24:47] PROBLEM - puppet last run on graphite2001 is CRITICAL Puppet has 8 failures
[10:39:18] RECOVERY - puppet last run on graphite2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:44:22] (CR) Mobrovac: [C: 1] Removed localhost access by graphoid [puppet] - https://gerrit.wikimedia.org/r/211450 (owner: Yurik)
[10:47:53] (PS11) Mobrovac: mathoid to service::node [puppet] - https://gerrit.wikimedia.org/r/167413 (https://phabricator.wikimedia.org/T97124) (owner: Ori.livneh)
[10:51:03] (CR) Mobrovac: "The options have been included in the production config in this patchset and set not to use the new features, so we should be good to go." [puppet] - https://gerrit.wikimedia.org/r/167413 (https://phabricator.wikimedia.org/T97124) (owner: Ori.livneh)
[10:51:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[11:04:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[11:09:26] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[11:32:32] (PS1) Merlijn van Deen: tools: store verbose logrotate logs [puppet] - https://gerrit.wikimedia.org/r/211519 (https://phabricator.wikimedia.org/T96007)
[11:45:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[11:47:21] (CR) Physikerwelt: [C: 1] mathoid to service::node [puppet] - https://gerrit.wikimedia.org/r/167413 (https://phabricator.wikimedia.org/T97124) (owner: Ori.livneh)
[11:55:06] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[12:24:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[12:28:16] (CR) Hashar: [C: 1] contint: lint fixes [puppet] - https://gerrit.wikimedia.org/r/211337 (owner: Dzahn)
[12:32:37] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[12:34:04] operations, Roadmap, Wikimedia-Mailing-lists, notice, user-notice: Mailing list maintenance window - 2015-05-19 17:00 UTC to 19:00 UTC - https://phabricator.wikimedia.org/T99098#1291765 (JohnLewis)
[12:37:18] operations, Roadmap, Wikimedia-Mailing-lists, notice, user-notice: Mailing list maintenance window - 2015-05-19 17:00 UTC to 19:00 UTC - https://phabricator.wikimedia.org/T99098#1291776 (Krenair)
[12:37:19] operations, Wikimedia-Mailing-lists: Rename Wikidata-l to Wikidata - https://phabricator.wikimedia.org/T99136#1291778 (Krenair)
[12:37:26] operations, Roadmap, Wikimedia-Mailing-lists, notice, user-notice: Mailing list maintenance window - 2015-05-19 17:00 UTC to 19:00 UTC - https://phabricator.wikimedia.org/T99098#1285379 (Krenair)
[12:37:32] operations, Roadmap, Wikimedia-Mailing-lists, notice, user-notice: Mailing list maintenance window - 2015-05-19 17:00 UTC to 19:00 UTC - https://phabricator.wikimedia.org/T99098#1285379 (Krenair)
[12:37:33] operations, Wikimedia-Mailing-lists: Rename Wikidata-l to Wikidata - https://phabricator.wikimedia.org/T99136#1286032 (Krenair)
[12:49:36] PROBLEM - Varnishkafka Delivery Errors per minute on cp4004 is CRITICAL 11.11% of data above the critical threshold [20000.0]
[12:50:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0]
[12:54:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp4004 is OK Less than 1.00% above the threshold [0.0]
[13:10:08] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[13:36:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[13:39:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0]
[13:45:40] operations, MediaWiki-JobQueue, MediaWiki-JobRunner, Patch-For-Review: enwiki's job is about 28m atm and increasing - https://phabricator.wikimedia.org/T98621#1291958 (Wbm1058) Is it possible to change the DESCRIPTION at the top of this page? See https://en.wikipedia.org/wiki/Wikipedia:Village_pum...
[13:59:39] operations, MediaWiki-JobQueue, MediaWiki-JobRunner, Patch-For-Review: enwiki's job is about 28m atm and increasing - https://phabricator.wikimedia.org/T98621#1291959 (Krenair) Yes, click the edit task button in the top right hand corner, @Wbm1058. Current large job queue types, as of a few minute...
[14:04:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[14:10:03] operations, MediaWiki-JobQueue, MediaWiki-JobRunner, Patch-For-Review: enwiki's job is about 28m atm and increasing - https://phabricator.wikimedia.org/T98621#1291966 (Wbm1058)
[14:28:28] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[14:38:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[14:43:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0]
[15:09:17] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[15:14:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0]
[15:18:18] PROBLEM - puppet last run on mw2158 is CRITICAL puppet fail
[15:37:56] RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:14:27] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[16:24:18] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0]
[17:04:48] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[17:11:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[17:18:47] PROBLEM - puppet last run on mw2070 is CRITICAL puppet fail
[17:28:37] PROBLEM - puppet last run on mw2083 is CRITICAL puppet fail
[17:36:37] RECOVERY - puppet last run on mw2070 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[17:46:26] RECOVERY - puppet last run on mw2083 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:08:07] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[18:12:15] springle: Around?
[18:14:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0]
[18:16:14] nevermind
[18:16:20] * hoo goes to cry
[18:19:28] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[18:33:57] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[18:40:37] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[18:44:24] * Nemo_bis hands handkerchiefs
[18:45:27] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0]
[18:48:38] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[18:58:50] Nemo_bis: thanks
[18:59:14] I only *now* realized that the columns we use for page titles in Wikidata are too short
[18:59:18] how on earth did we not notice
[19:00:04] Q123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
[19:00:22] No, I mean the ones we use to point to the pages on clients
[19:00:34] ouch
[19:00:44] Indeed
[19:01:47] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[19:08:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[19:18:10] hoo: what's the limit?
[19:19:45] 255
[19:21:16] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[19:25:16] hoo: isn't that the normal title limit?
[19:25:33] It is, but we have the full title, with namespace
[19:25:49] while normal wikis have the namespace in a separate column
[19:26:47] ah
[19:29:27] hoo: doesn't MediaWiki check for total title length?
[19:29:36] No, apparently not
[19:29:52] Also that would cause some weird problems
[19:29:59] * restrictions
[19:42:20] I think a lot of those 255-byte limits are going to go away.
[19:42:22] Probably this year.
[19:42:32] We can go significantly higher while retaining the ability to index.
[19:43:24] Let's not for page titles, unless *really, really* needed
[19:43:30] Is that suggested somewhere?
[19:43:41] In that case, I'd go for much more than the 300 I have in mind now
[19:44:08] There's an open task somewhere about the _comment fields, I think. Sean has commented.
[19:45:07] https://phabricator.wikimedia.org/T6715#1111905
[19:45:37] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[19:45:45] Fiona: May I quote you in the bug?
[19:45:58] I'm not authoritative. :-)
[19:46:08] It's really Sean and the new guy's call whether we bump these limits.
[19:46:16] "the new guy"
[19:46:18] Because of the schema changes involved. I don't know what their priorities are.
[19:46:23] There's a new DBA, I think.
[19:46:27] yes
[19:46:30] there is
[19:47:00] Altering the page tables would actually be less horrible than revision.
[19:47:22] upsizing a varbinary shouldn't be a problem (that's why they're var)
[19:47:32] At least for old data
[19:47:56] I think we mostly have online schema change capability now as well.
[19:48:20] The large page title has come up in the context of move logs.
[19:48:35] If you have a long enough title and "Special:whatever" gets prepended, you can end up with truncation.
[19:49:41] Jaime Crespo (jynus)
[19:52:10] I love how maria just gives you a Warning (Code 1265): Data truncated for column ...
[20:19:05] hoo|away, so in some cases ips_site_page contains the full prefixed title?
[20:21:36] oh, I think I understand
[20:45:42] Krenair: In all cases
[20:50:38] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[21:26:37] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0]
[22:05:47] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 55.56% of data above the critical threshold [35.0]
[22:08:21] * bordercat watches the memcached log spam
[22:39:51] Is anyone able to successfully git-pull mediawiki-core from gerrit?
[22:39:54] It seems to be timing out
[22:39:57] pulling from github works fine
[22:40:05] pulling other repos also works
[22:40:24] It's as if it's locked or something
[22:44:09] Krinkle|detached: Works for me
[22:46:15] I pulled from it successfully too
[22:48:08] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 50.00% of data above the critical threshold [35.0]
[23:13:09] It's back now, but was unresponsive for 10-20 minutes.
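
For context on the truncation behaviour discussed above (the warning quoted at [19:52:10]), here is a minimal MariaDB sketch. The table and column names are hypothetical, not the production Wikibase schema; it only illustrates how a VARBINARY(255) column holding a full prefixed title can silently lose bytes when strict SQL mode is off.

    -- Hypothetical demo table, standing in for a column like ips_site_page.
    CREATE TABLE title_demo (
      site_page VARBINARY(255) NOT NULL
    );

    -- With strict mode disabled, MariaDB truncates over-long values and only warns.
    SET SESSION sql_mode = '';

    -- A prefixed title longer than 255 bytes: "Special:" plus a 300-byte name.
    INSERT INTO title_demo (site_page)
      VALUES (CONCAT('Special:', REPEAT('x', 300)));
    -- Warning (Code 1265): Data truncated for column 'site_page' at row 1

    SELECT LENGTH(site_page) FROM title_demo;
    -- Returns 255: everything beyond the column width was dropped.

With strict mode (e.g. STRICT_TRANS_TABLES) enabled, the same insert fails outright instead of truncating, so the over-long title would be rejected rather than silently cut.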