[01:34:24] operations, Wikimedia-Labs-wikitech-interface: fix wikitech-static updates - https://phabricator.wikimedia.org/T83596#1057022 (Andrew) Open>Resolved a:Andrew This is fixed. It broke because the mw version on -static had a version incompatibility with the version running on wikitech. Of course,...
[01:37:23] operations, Wikimedia-Labs-wikitech-interface: fix wikitech-static updates - https://phabricator.wikimedia.org/T83596#1057027 (Andrew) This is fixed. It broke because the mw version on -static had a version incompatibility with the version running on wikitech. Of course, we need to monitor. https://ph...
[01:38:07] operations, Monitoring: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1057031 (Andrew) @fgiunchedi a daily cron runs on wikitech which wikitech-static fetches. Details here: https://wikitech.wikimedia.org/wiki/Wikitech-static
[01:39:27] operations, Monitoring: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1057032 (Andrew) I propose that we have a monitoring test which uses api calls to compare the most recent edit date of https://wikitech.wikimedia.org/wiki/Server_Admin_Log with https://wi...
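The freshness check proposed at 01:39:27 could look roughly like the sketch below. It assumes the standard MediaWiki `action=query` / `prop=revisions` API on both sides. The function names, the two-day tolerance, and the example API base URL are illustrative assumptions, not anything stated in the log. The network fetch is left out so that only the comparison logic is shown.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def latest_revision_query(api_base: str, title: str) -> str:
    """Build a MediaWiki API URL asking for the newest revision timestamp of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp",
        "rvlimit": 1,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def latest_revision_timestamp(response: dict) -> datetime:
    """Extract the newest revision timestamp from a decoded API response."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # a single title was requested
    stamp = page["revisions"][0]["timestamp"]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def mirror_is_stale(source_ts: datetime, mirror_ts: datetime,
                    max_lag: timedelta = timedelta(days=2)) -> bool:
    """True when the mirror's newest edit trails the source by more than max_lag.

    The two-day default is an assumption chosen to tolerate one missed daily sync.
    """
    return source_ts - mirror_ts > max_lag
```

A monitoring probe would fetch both URLs, decode the JSON, and raise an alert when `mirror_is_stale` returns `True`.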
[01:47:06] ^ Cleaning up phab issues, pretty much the equivalent of a college student cleaning his room when a paper is due
[01:48:03] lol
[01:48:33] (PS1) Springle: depool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192169
[01:48:53] (CR) Springle: [C: 2] depool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192169 (owner: Springle)
[01:48:58] (Merged) jenkins-bot: depool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192169 (owner: Springle)
[01:49:44] !log springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 06s)
[01:49:53] Logged the message, Master
[02:03:20] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 01s)
[02:03:26] Logged the message, Master
[02:04:27] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-22 02:03:23+00:00
[02:04:31] Logged the message, Master
[02:04:51] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 01s)
[02:04:54] Logged the message, Master
[02:05:58] !log LocalisationUpdate completed (1.25wmf18) at 2015-02-22 02:04:55+00:00
[02:06:02] Logged the message, Master
[02:08:56] operations, Analytics-EventLogging, Wikimedia-Site-requests: wikitech.wikimedia.org error "$wgEventLoggingBaseUri is not set." - https://phabricator.wikimedia.org/T84965#1057052 (Andrew) I wrote that patch in a rush of trying to get wikitech to work at all :( I don't really know what is indicated b...
[02:14:43] Puppet, operations, Patch-For-Review: Make Puppet repository pass lenient and strict lint checks - https://phabricator.wikimedia.org/T87132#1057055 (Andrew) a:Andrew>hashar
[02:17:08] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Feb 22 02:16:04 UTC 2015 (duration 16m 3s)
[02:17:14] Logged the message, Master
[03:33:44] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:38:44] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:41:54] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:47:54] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:52:24] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[03:55:16] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[03:58:04] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[04:04:14] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[04:51:41] (PS1) Tim Landscheidt: Tools: Puppetize jobkill [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331)
[05:30:21] is group1 defined in InitialiseSettings.php so that I can directly enable an extension to it, or should I be defining it somewhere in CommonSettings?
[05:32:06] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#1._Create_group1_wikis_to_VERSION_patch
[05:32:39] default to true, wikipedias to false
[05:33:06] that should give you group0 & group1
[05:46:09] Krenair: that looks good. so just 'group1' => true right ?
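The "default to true, wikipedias to false" suggestion at 05:32:39 relies on per-wiki settings being resolved from most specific to least specific key. The sketch below is a simplified model of that lookup, not the actual MediaWiki `SiteConfiguration` code. The resolution order (exact wiki, then tag, then `default`), the function name, and the example tags are assumptions for illustration. The log itself leaves open whether `'wikipedia'` or `'wiki'` is the right key.

```python
def resolve_setting(values: dict, wiki: str, tags: list) -> bool:
    """Resolve one setting for one wiki: exact wiki name, then tags, then default.

    Simplified, assumed lookup order; the real configuration code has more levels.
    """
    if wiki in values:
        return values[wiki]
    for tag in tags:
        if tag in values:
            return values[tag]
    return values["default"]


# Hypothetical setting: enabled everywhere except wikis tagged 'wikipedia'.
example_setting = {
    "default": True,
    "wikipedia": False,
}

# Hypothetical tag assignments, purely for the example.
print(resolve_setting(example_setting, "enwiki", ["wikipedia"]))    # Wikipedia: off
print(resolve_setting(example_setting, "commonswiki", ["special"])) # everything else: on
```

Under these assumptions the extension stays off on Wikipedias and is on for every other wiki, which is the effect described in the log as covering group0 and group1.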
[05:49:14] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: puppet fail
[05:49:55] tonythomas: I'm not sure, but you can look for a dblist in the config repo.
[05:50:17] If I could find it for you from my phone, I would.
[05:53:47] can you just do default => true and 'wikipedia' (or maybe 'wiki') => false?
[06:09:44] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:27:08] legoktm: I will try that one then
[06:29:34] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:43] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:14] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:14] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:43] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[06:46:24] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:46:55] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:47:24] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[08:56:54] PROBLEM - very high load average likely xfs on ms-be1007 is CRITICAL: CRITICAL - load average: 228.87, 112.78, 52.94
[09:43:14] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail
[10:03:56] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[10:21:44] (PS1) Springle: repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192180
[10:22:28] (CR) Springle: [C: 2] repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192180 (owner: Springle)
[10:22:33] (Merged) jenkins-bot: repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/192180 (owner: Springle)
[10:23:22] !log springle Synchronized wmf-config/db-eqiad.php: repool db1065 (duration: 00m 05s)
[10:23:29] Logged the message, Master
[10:33:00] !log reboot ms-be1007, xfs hosed
[10:33:03] Logged the message, Master
[10:37:54] RECOVERY - very high load average likely xfs on ms-be1007 is OK: OK - load average: 29.24, 7.99, 2.72
[12:41:45] (CR) Odder: [C: 1] Add autopatrolled user group for dawikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/188928 (https://phabricator.wikimedia.org/T88591) (owner: Mjbmr)
[12:45:14] (CR) Odder: [C: 1] Set $wgBabelCategoryNames true at outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/190686 (https://phabricator.wikimedia.org/T89484) (owner: Gerardduenas)
[14:05:53] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[14:05:54] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[14:34:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:35:04] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[15:43:14] (CR) Odder: [C: 1] Enable EducationProgram in the Hebrew Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/190357 (https://phabricator.wikimedia.org/T89393) (owner: Amire80)
[16:11:54] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet last ran 1 day ago
[16:12:54] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[16:57:01] (CR) Krinkle: [C: 1] Tidy up SpecialVersionUrl hook usage [mediawiki-config] - https://gerrit.wikimedia.org/r/191927 (https://phabricator.wikimedia.org/T75759) (owner: Chad)
[17:16:48] (CR) Krinkle: [C: 1] Add legoktm to contint-admins [puppet] - https://gerrit.wikimedia.org/r/191954 (https://phabricator.wikimedia.org/T90275) (owner: Hashar)
[17:25:44] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail
[17:44:04] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[19:24:50] (PS1) Aklapper: Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617)
[19:45:59] did someone just say volunteers are lowest priority? my career in yellow journalism could start right here.
[19:50:56] pajz: no, someone is fixing that statement
[19:52:35] (CR) Nemo bis: [C: -1] "At last!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:09:39] (CR) Deskana: "Shouldn't the short name also be changed from "Volunteer"?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:25:12] (PS2) Aklapper: Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617)
[20:26:02] (CR) Aklapper: "Deskana: Uhm, thanks!" [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:29:18] (CR) Nemo bis: [C: 1] Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:36:37] (CR) Qgil: [C: 1] Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:37:38] (CR) BryanDavis: [C: 1] Tidy up SpecialVersionUrl hook usage [mediawiki-config] - https://gerrit.wikimedia.org/r/191927 (https://phabricator.wikimedia.org/T75759) (owner: Chad)
[20:44:32] springle: Error: 1176 Key 'cl_from' doesn't exist in table 'categorylinks' (10.64.48.28)
[20:44:46] doesn't seem to be all slaves
[20:45:00] why would the PK disappear?
[20:47:52] AaronS: PRIMARY KEY (`cl_from`,`cl_to`), vs UNIQUE KEY `cl_from` (`cl_from`,`cl_to`),
[20:47:58] shouldn't really make a difference
[20:48:03] unless you try to hint for it :P
[20:48:16] well, that's what's going on
[20:48:25] I doubt the hint is needed though
[20:51:07] AaronS: another topic... do you think we should wait for slaves in RecentChangesUpdateJob?
[20:51:18] on wikidatawiki eg. 23298 edits collected up
[20:51:30] I guess it's because of the too big transaction thing
[20:52:07] I guess it won't prune any edits until that patch got deployed
[20:52:29] * gets deployed
[20:54:31] (CR) Bartosz Dziewoński: [C: 1] Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[20:59:59] hoo: maybe backport a wait check to the next wmf branch?
[21:01:03] yeah, should go out with the flush patch
[21:01:04] it's mostly just the build up problem due to the mysql max_binlog_cache_size
[21:01:10] let me prepare a patch
[21:01:14] yep
[21:01:20] shouldn't be needed in master though
[21:01:33] Well, it won't harm, right?
[21:02:17] the limit 100 helps apparent slave lag a bit by letting stuff get interlaced, and without that bug it shouldn't be deleting too much
[21:02:37] if there is still a problem, I'd rather it do very low timeout waits and yield if there is lag
[21:02:51] e.g. have the job return false, triggering a retry later
[21:02:58] We could also make the loop only run a couple of times max.
[21:03:43] this would only matter if lots of edits come in, which also means lots of job enqueues, so even retries aren't needed
[21:03:52] since another job would just come along anyway soon
[21:04:04] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[21:04:04] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[21:04:06] so the job could even just return true in that case
[21:04:25] I just don't want job runner time getting spent in sleep() basically ;)
[21:05:39] (CR) Deskana: [C: 1] Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[21:10:09] AaronS: https://gerrit.wikimedia.org/r/192243 like that? I hope the return value of the global function can be used like that
[21:10:16] According to the docs it can
[21:11:09] * AaronS gripes about named function arguments
[21:12:33] Python, ftw :P
[21:15:55] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[21:15:55] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:34:40] (CR) Legoktm: [C: 1] Rename Phabricator's 'Needs Volunteer' priority to 'Lowest' [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[21:40:18] (CR) Legoktm: "wikibugs update is https://gerrit.wikimedia.org/r/192245" [puppet] - https://gerrit.wikimedia.org/r/192205 (https://phabricator.wikimedia.org/T78617) (owner: Aklapper)
[21:42:33] AaronS: hrmm ok. improper cleanup on my part after https://phabricator.wikimedia.org/T72558 when api traffic spills over to non-api slaves
[21:43:24] (PS1) Base: Abusefilter config change for ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379)
[21:43:43] springle: is that FORCE needed? I proposed https://gerrit.wikimedia.org/r/#/c/192242/
[21:44:02] the STRAIGHT JOIN and the fact that the joins are LEFT largely dictate the order
[21:44:14] (PS2) Base: Abusefilter config change for ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379)
[21:44:16] that should narrow down the index choices a lot anyway
[21:44:30] yeah the force is an overkill
[21:44:31] FORCE seems excessive at that point
[21:46:27] (CR) Hoo man: [C: -1] Abusefilter config change for ukwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base)
[22:08:29] !log springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 06s)
[22:08:37] Logged the message, Master
[22:27:10] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058003 (Nemo_bis) NEW
[22:29:05] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058013 (Nemo_bis)
[22:29:59] (CR) Base: "The duration? It is stated in the last comment by Green Zero. As to discussion it closed with duration "1.5 - 3 hours" so 2 hours mentione" [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base)
[22:37:01] !log springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 05s)
[22:37:04] Logged the message, Master
[22:38:15] (CR) Base: Abusefilter config change for ukwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base)
[22:42:05] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058022 (Nemo_bis)
[22:42:40] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058003 (Nemo_bis)
[22:44:08] (CR) Hoo man: [C: 1] "As far as I managed to understand the discussion (not much) two hours should be ok. Good to go from my side." [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base)
[22:50:56] !log springle Synchronized wmf-config/db-eqiad.php: db1065 raise load (duration: 00m 07s)
[22:51:02] Logged the message, Master
[23:01:24] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058046 (Multichill) Dear Federico Leva, spamming all mailman admins is not appreciated. Please don't do that again.
[23:02:42] (CR) GWicke: [C: 1] "LGTM." [puppet/cassandra] - https://gerrit.wikimedia.org/r/191652 (https://phabricator.wikimedia.org/T78514) (owner: Filippo Giunchedi)
[23:07:36] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058069 (DaBPunkt) Was it really necessary to inform EVERY mailing-list? This is clearly a problem that only OPs can solve, not we mail-list-admins.
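The RecentChangesUpdateJob exchange between 21:02 and 21:04 describes a pruning loop that deletes in batches of 100, runs only a few times, and gives up early when replicas lag instead of sleeping. The sketch below models that idea only. It is not the patch under review, and the function name, the callbacks, the 5-second lag threshold, and the batch cap are all hypothetical.

```python
from typing import Callable


def prune_in_batches(
    delete_batch: Callable[[int], int],
    current_lag: Callable[[], float],
    batch_size: int = 100,        # the "limit 100" mentioned in the log
    max_lag_seconds: float = 5.0, # assumed threshold
    max_batches: int = 10,        # assumed cap: "only run a couple of times max"
) -> int:
    """Delete expired rows in small batches; stop early rather than sleep on lag.

    delete_batch(n) deletes up to n rows and returns how many it deleted.
    current_lag() returns replication lag in seconds.
    Returns the total number of rows deleted by this run.
    """
    total = 0
    for _ in range(max_batches):
        if current_lag() > max_lag_seconds:
            break  # yield; a later job picks up the remaining rows
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted < batch_size:
            break  # nothing left to prune
    return total
```

Because a busy wiki keeps enqueueing new jobs, a run that stops early can still report success; the next job continues where this one left off, and no job runner time is spent waiting.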