[00:00:05] RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160113T0000). Please do the needful.
[00:00:05] Krenair RoanKattouw jgirault jan_drewniak: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:40] RoanKattouw, jgirault: looks like you guys added 4 extra patches when there was only room for 1
[00:00:43] please pick oen
[00:00:44] one*
[00:01:01] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:01:04] Krenair, are you swating? I need to sync the maps service (git deploy / git sync), would i be stepping on your toes?
[00:01:20] probably not
[00:01:24] ok
[00:01:37] I don't know much about git deploy but I think that is external to mediawiki so I'm not too bothered by it
[00:01:41] Krenair: I know there was only "room" for one, but mine are all fixing things that are broken in wmf10
[00:01:41] that we discovered right after the cut
[00:02:34] (PS3) Alex Monk: Set logos for mobile login page for Wikidata and Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/263201 (https://phabricator.wikimedia.org/T123175) (owner: Aude)
[00:02:36] (CR) Chad: [C: 1] add OfficeIT namespace to wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: ArielGlenn)
[00:02:40] (CR) Alex Monk: [C: 2] Set logos for mobile login page for Wikidata and Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/263201 (https://phabricator.wikimedia.org/T123175) (owner: Aude)
[00:03:18] (Merged) jenkins-bot: Set logos for mobile login page for Wikidata and Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/263201 (https://phabricator.wikimedia.org/T123175) (owner: Aude)
[00:04:16] greg-g, around?
[00:04:16] Krenair: You sent me a contentless ping. This is a contentless pong. Please provide a bit of information about what you want and I will respond when I am around.
[00:04:20] Krenair: I got approval from Greg to go over the patch limit yesterday.
[00:04:34] Krenair: For our one patch (the one that jgirault listed).
[00:04:35] yes, I am pinging him to find out if we can do that here too
[00:04:56] oh, he approved going over the limit for jgirault's one? ok
[00:05:17] Krenair: Yes. That approval was actually for yesterday, but we ended up having to push it to today.
[00:05:37] I confirm
[00:05:59] !log krenair@tin Synchronized images/mobile/wikidata.png: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 32s)
[00:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:06:06] hm. no note on the commit itself
[00:06:56] !log krenair@tin Synchronized images/mobile/wikivoyage.png: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 31s)
[00:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:07:08] Krenair: Here's the log, FWIW: http://pastebin.com/QEGRfAca
[00:07:16] (PS3) Dzahn: jmxtrans: update submodule for lint fix [puppet] - https://gerrit.wikimedia.org/r/263670
[00:07:30] where's that from...
[00:07:45] Krenair: Oh, sorry. #wikimedia-staff
[00:07:52] Krenair: I asked in a less-than-helpful channel.
[00:07:52] Uh.
[00:07:55] Yeah.
[00:08:42] !log switched all maps kartotherian servers to v5, restarted
[00:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:09:10] I will do it this time, but I'm going to add a note about this on wikitech
[00:09:22] Krenair, i'm done, thx
[00:10:06] Krenair: Thank you. Feel free to point out that I said Greg approved it.
[00:10:07] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263201/ (duration: 00m 30s)
[00:10:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:12:00] (PS3) Alex Monk: Prepare for merge of ApiSandbox into core [mediawiki-config] - https://gerrit.wikimedia.org/r/262999 (owner: Anomie)
[00:12:06] (CR) Alex Monk: [C: 2] Prepare for merge of ApiSandbox into core [mediawiki-config] - https://gerrit.wikimedia.org/r/262999 (owner: Anomie)
[00:13:16] (Merged) jenkins-bot: Prepare for merge of ApiSandbox into core [mediawiki-config] - https://gerrit.wikimedia.org/r/262999 (owner: Anomie)
[00:15:03] !log krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/262999/ (duration: 00m 31s)
[00:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:16:16] (PS3) Alex Monk: Fix $wgSitename for my.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/263237 (https://phabricator.wikimedia.org/T123191) (owner: Mdann52)
[00:16:23] (CR) Alex Monk: [C: 2] Fix $wgSitename for my.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/263237 (https://phabricator.wikimedia.org/T123191) (owner: Mdann52)
[00:17:00] (Merged) jenkins-bot: Fix $wgSitename for my.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/263237 (https://phabricator.wikimedia.org/T123191) (owner: Mdann52)
[00:17:57] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263237/ (duration: 00m 31s)
[00:18:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:22:19] (CR) Alex Monk: [C: 2] Enable WikiLove extension on es.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262894 (https://phabricator.wikimedia.org/T122765) (owner: Mdann52)
[00:23:07] (Merged)
jenkins-bot: Enable WikiLove extension on es.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262894 (https://phabricator.wikimedia.org/T122765) (owner: Mdann52)
[00:24:27] (CR) BryanDavis: [C: 1] add OfficeIT namespace to wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: ArielGlenn)
[00:25:31] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/262894/ (duration: 00m 30s)
[00:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:26:54] (PS6) Alex Monk: additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955) (owner: Mdann52)
[00:26:59] (CR) Alex Monk: [C: 2] additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955) (owner: Mdann52)
[00:27:27] (Merged) jenkins-bot: additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955) (owner: Mdann52)
[00:28:20] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/262895/ (duration: 00m 32s)
[00:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:29:31] (PS3) Alex Monk: Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[00:29:40] (CR) Alex Monk: [C: 2] Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[00:30:18] (Merged) jenkins-bot: Enable WikidataPageBanner extension on
Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[00:31:16] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/261994/ (duration: 00m 31s)
[00:31:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:32:22] (PS1) Nuria: Whitelisting access to js beacon in piwik [puppet] - https://gerrit.wikimedia.org/r/263778 (https://phabricator.wikimedia.org/T123260)
[00:33:22] jgirault, around?
[00:33:25] yes
[00:34:24] jgirault, ready to test this commit once it's live?
[00:34:34] Krenair: yes
[00:35:03] (PS2) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[00:35:52] (CR) Alex Monk: [C: 2] Bump portals to master (deploy new a/b/c test) [mediawiki-config] - https://gerrit.wikimedia.org/r/263770 (owner: JGirault)
[00:35:54] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930067 (eliza) NEW a:Dzahn
[00:35:54] (CR) jenkins-bot: [V: -1] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[00:36:35] (Merged) jenkins-bot: Bump portals to master (deploy new a/b/c test) [mediawiki-config] - https://gerrit.wikimedia.org/r/263770 (owner: JGirault)
[00:37:20] jgirault, syncing
[00:37:40] !log krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263770/ (duration: 00m 33s)
[00:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:40:24] operations, Mail: remove wikibugs-irc mail alias ? - https://phabricator.wikimedia.org/T123432#1930087 (Legoktm) Yeah, you can turn that off.
It hasn't been used ever since valhallasw rewrote the bot in python (wikibugs2). And then Phab happened so we re-rewrote the bot (wikibugs.py). It shouldn't be recei...
[00:40:36] jgirault, all good?
[00:40:57] (PS3) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[00:41:11] Krenair: Unless we purge varnish I need to wait a little
[00:41:19] oh, right
[00:41:41] I purged https://www.wikipedia.org/
[00:41:43] what about now?
[00:41:47] (CR) jenkins-bot: [V: -1] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[00:43:01] (PS4) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[00:44:20] (PS2) ArielGlenn: dumps: new actions 'show lastrun' and 'show alldone' for dumps admin [dumps] (ariel) - https://gerrit.wikimedia.org/r/263762
[00:44:26] jgirault, ?
[00:44:37] I don’t see any change so far
[00:46:38] (CR) ArielGlenn: [C: 2 V: 2] dumps: new actions 'show lastrun' and 'show alldone' for dumps admin [dumps] (ariel) - https://gerrit.wikimedia.org/r/263762 (owner: ArielGlenn)
[00:49:22] jgirault, well... I don't have any other ideas
[00:49:28] maybe bblack knows how to purge it
[00:50:23] operations, Mail: remove exim alias - usermetrics - https://phabricator.wikimedia.org/T123452#1930113 (JKrauska) NEW a:Dzahn
[00:50:23] Krenair, tried purging http:// ?
[00:51:54] jgirault, ^ done that, try now?
[00:52:53] Krenair: MaxSem: still do not see my changes :/
[00:53:54] jgirault, it shows to me...
[00:53:56] (PS1) Ori.livneh: piwik: do not require authentication for public endpoints [puppet] - https://gerrit.wikimedia.org/r/263786
[00:54:13] oh no wait
[00:54:23] (PS5) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[00:54:43] I was looking at something duplicated, it existed in the old version
[00:55:36] (CR) jenkins-bot: [V: -1] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[00:56:13] (PS2) Ori.livneh: piwik: do not require authentication for public endpoints [puppet] - https://gerrit.wikimedia.org/r/263786
[01:01:52] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930176 (Dzahn) done ``` -# Dan Garry -deskana: dgarry - ```
[01:02:10] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930177 (Dzahn)
[01:02:26] operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1930183 (Dzahn)
[01:02:28] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930182 (Dzahn) Open>Resolved
[01:02:37] Ohhhh.
[01:02:40] jgirault, sorry
[01:02:54] I forgot the git submodule update
[01:02:54] (PS3) Ori.livneh: piwik: do not require authentication for public endpoints [puppet] - https://gerrit.wikimedia.org/r/263786
[01:02:59] oh =)
[01:03:09] (CR) Ori.livneh: [C: 2 V: 2] piwik: do not require authentication for public endpoints [puppet] - https://gerrit.wikimedia.org/r/263786 (owner: Ori.livneh)
[01:03:42] !log krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263770/ - after having done the submodule update this time (duration: 00m 31s)
[01:03:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:04:42] operations, Mail: remove exim alias - chase, chasemp, rush - https://phabricator.wikimedia.org/T123453#1930197 (JKrauska) NEW a:Dzahn
[01:05:23] Krenair: yay, works now!! thanks :)
[01:05:34] great
[01:05:48] well, I guess SWAT is over
[01:06:02] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930211 (Deskana) I'm going to assume that this doesn't affect anything for me? :-)
[01:06:08] operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1930214 (Dzahn)
[01:06:10] operations, Mail: remove exim alias - usermetrics - https://phabricator.wikimedia.org/T123452#1930212 (Dzahn) Open>Resolved done ``` -## User Metrics (RT-4885) ## -usermetrics: dario, rfaulkner, dvanliere, erosen, dandreescu - ```
[01:10:12] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930223 (eliza) HI Dan, No it will not, your alias is already configured in LDAP/Google. We could test it - i'll send you an email. Eliza
[01:10:58] (Abandoned) Nuria: Whitelisting access to js beacon in piwik [puppet] - https://gerrit.wikimedia.org/r/263778 (https://phabricator.wikimedia.org/T123260) (owner: Nuria)
[01:11:18] operations, Mail: Move most (all?)
exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1930229 (Dzahn)
[01:11:20] operations, Mail: remove wikibugs-irc mail alias ? - https://phabricator.wikimedia.org/T123432#1930227 (Dzahn) Open>Resolved done. ``` -# Bug mail to IRC bridge -wikibugs-irc: |/usr/local/bin/wikibugs.pl - ```
[01:12:16] mutante: lol, everything pointing to that was dead ^
[01:12:43] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930233 (Deskana) Great. Thanks @eliza!
[01:12:49] operations, Mail: remove exim alias - dapatrick - https://phabricator.wikimedia.org/T123454#1930234 (JKrauska) NEW a:Dzahn
[01:13:07] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930241 (Deskana) Yep, it's all good. Thanks again!
[01:13:14] (CR) Alex Monk: "Yes, that would be my plan as well, however it's the use of MassMessage itself which I haven't needed to do before and therefore wouldn't " [mediawiki-config] - https://gerrit.wikimedia.org/r/237686 (owner: Legoktm)
[01:16:05] (PS9) Madhuvishy: wikimetrics: Puppet module for wikimetrics [puppet] - https://gerrit.wikimedia.org/r/260687
[01:18:57] !log krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Flow/modules/editor/editors/visualeditor/mw.flow.ve.Target.js: https://gerrit.wikimedia.org/r/#/c/263644/ (duration: 00m 31s)
[01:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:19:41] !log krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Echo/Resources.php: https://gerrit.wikimedia.org/r/#/c/263645/ (duration: 00m 32s)
[01:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:21:35] legoktm, around?
[01:24:31] (PS10) Madhuvishy: wikimetrics: Puppet module for wikimetrics [puppet] - https://gerrit.wikimedia.org/r/260687 (https://phabricator.wikimedia.org/T101763)
[01:26:19] Krenair, done with depl?
I will push latest tilerator service now
[01:26:57] I guess so
[01:27:46] Krenair: yes
[01:28:51] legoktm, oh, I was having issues, now it all works
[01:33:28] !log deployed tilerator maps service
[01:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:34:56] PROBLEM - MariaDB Slave Lag: m3 on db1048 is CRITICAL: CRITICAL slave_sql_lag Seconds_Behind_Master: 1758
[01:37:06] RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Seconds_Behind_Master: 0
[01:37:47] (PS4) Dzahn: jmxtrans: update submodule for lint fix [puppet] - https://gerrit.wikimedia.org/r/263670
[01:38:30] (PS1) Jdrewniak: Bump portals to master (fixes event logging around search) [mediawiki-config] - https://gerrit.wikimedia.org/r/263796
[01:39:15] (CR) Chad: [C: 1] "Super trivial and a local hack on beta." [puppet] - https://gerrit.wikimedia.org/r/263019 (https://phabricator.wikimedia.org/T85239) (owner: Gergő Tisza)
[01:40:15] (CR) Chad: [C: 1] "Also trivial and in place in beta already." [puppet] - https://gerrit.wikimedia.org/r/263012 (https://phabricator.wikimedia.org/T85239) (owner: Gergő Tisza)
[01:41:52] (CR) Chad: "*bump*" [puppet] - https://gerrit.wikimedia.org/r/217794 (https://phabricator.wikimedia.org/T94570) (owner: Muehlenhoff)
[01:41:58] operations, Mail: Get mail relay out of Yahoo! blacklist: apply to Yahoo for whitelisting bulk mail - https://phabricator.wikimedia.org/T58414#1930321 (Krenair)
[01:47:01] Jamesofur, hey
[01:47:13] just been checking through some mailing-lists bugs
[01:47:43] are you able to do any of https://phabricator.wikimedia.org/T123163 https://phabricator.wikimedia.org/T122560 or https://phabricator.wikimedia.org/T116740 ?
[01:47:51] by having the master password
[01:49:35] Krenair: I'm guessing https://gerrit.wikimedia.org/r/#/c/263767/ was not done because it wasn't urgent and was over the limit?
[01:50:12] I sort of did the other two after the swat ended because they looked like otherwise stuff would be broken
[01:50:31] OK
[01:50:42] I'll reschedule that one for tomorrow then
[01:50:44] ok
[01:53:47] (PS1) Eevans: EventBus configuration (currently disabled) [mediawiki-config] - https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786)
[01:58:37] (CR) Alex Monk: "not +1ing this myself because I'm not convinced about the name, and whether it should be a namespace or subpages/categories/etc." [mediawiki-config] - https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: ArielGlenn)
[02:00:01] (CR) Alex Monk: "I'm just being fussy and nitpicky though, don't take this as a -1" [mediawiki-config] - https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: ArielGlenn)
[02:10:36] operations, Performance-Team, Wikimedia-General-or-Unknown, Patch-For-Review, WMF-deploy-2016-01-19_(1.27.0-wmf.11): jobrunner memory leaks - https://phabricator.wikimedia.org/T122069#1930379 (ori) Yep, it's GWT: {F3221921 size=full}
[02:18:02] PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.003 second response time
[02:19:43] (CR) ArielGlenn: "I figured that discussion would happen on the ticket and this patch would be updated as needed to reflect the agreed-upon name (if indeed " [mediawiki-config] - https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: ArielGlenn)
[02:23:27] Hm.. Graphite seems to be throwing
[02:23:31] Can't view anything in Grafana
[02:23:37] Consistently getting 502
[02:25:09] OK. Who broke Graphite..
[02:26:35] (PS2) Andrew Bogott: Add Sentry hiera rules to deployment-prep [puppet] - https://gerrit.wikimedia.org/r/263012 (https://phabricator.wikimedia.org/T85239) (owner: Gergő Tisza)
[02:32:58] (CR) Andrew Bogott: [C: 2] Add Sentry hiera rules to deployment-prep [puppet] - https://gerrit.wikimedia.org/r/263012 (https://phabricator.wikimedia.org/T85239) (owner: Gergő Tisza)
[02:34:41] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 11m 13s)
[02:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:41:39] (CR) Mobrovac: [C: -1] "Some comments in-lined. Also, I see the new role introduced, but don't see it used anywhere." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[02:44:18] !log Graphite is down. Consistently returns HTTP 502 Bad Gateway for any/all requests
[02:44:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:46:17] https://ganglia.wikimedia.org/latest/?r=4hr&c=Miscellaneous+eqiad&h=graphite1001.eqiad.wmnet
[02:46:24] Looks like something spiked
[02:47:32] CPU went from 9% to 33%. Load from 12 to 20.
[02:49:52] top shows that about 8 uwsgi processes are occupying 100% of their respective CPU (32 CPUs total).
[02:55:22] RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1572 bytes in 0.025 second response time
[02:57:32] !log Manually killed uwsgi graphite-web child processes on graphite1001. Service recovered itself from there.
[02:57:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:06:51] ori: Hm.. so graphite was 100% unresponsive for the better part of an hour until I killed some processes (not the usual flip flopping where icinga reports critical for 1 minute only and recovers itself)
[03:07:13] Not sure what it was. ssh was working fine.
It had about a dozen cgi processes occuping a complete CPU each.
[03:07:17] * Krinkle away
[03:08:45] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 16m 09s)
[03:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:10:15] (PS1) Gergő Tisza: Configure bot passwords [mediawiki-config] - https://gerrit.wikimedia.org/r/263804 (https://phabricator.wikimedia.org/T123451)
[03:15:58] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 13 03:15:57 UTC 2016 (duration 7m 13s)
[03:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:21:46] operations, Mail: remove exim alias deskana - https://phabricator.wikimedia.org/T123448#1930433 (Dzahn) Thanks as well to both of you, this is the best way to solve these :)
[03:22:34] (PS5) Dzahn: jmxtrans: update submodule for lint fix [puppet] - https://gerrit.wikimedia.org/r/263670
[03:24:08] (CR) Dzahn: "@yuvipanda i'd test in compiler but it's not on a prod node .."
[puppet] - https://gerrit.wikimedia.org/r/260187 (owner: Dzahn)
[03:57:41] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: puppet fail
[04:07:23] (PS1) ArielGlenn: dumps: set up but don't enable script for dumps to run from cron [puppet] - https://gerrit.wikimedia.org/r/263807
[04:10:01] (PS2) ArielGlenn: dumps: set up but don't enable script for dumps to run from cron [puppet] - https://gerrit.wikimedia.org/r/263807
[04:12:18] operations, Dumps-Generation: Make dumps run via cron on each snapshot host - https://phabricator.wikimedia.org/T107750#1930462 (ArielGlenn)
[04:12:47] (PS3) ArielGlenn: dumps: set up but don't enable script for dumps to run from cron [puppet] - https://gerrit.wikimedia.org/r/263807 (https://phabricator.wikimedia.org/T107750)
[04:24:43] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:28] operations, Dumps-Generation: Make dumps run via cron on each snapshot host - https://phabricator.wikimedia.org/T107750#1930472 (ArielGlenn) What's left: * clean up the cron classes in the last commit https://gerrit.wikimedia.org/r/#/c/263807/ so that all conf file and other dependencies are called out,...
[04:45:50] operations, Salt: Move salt master to separate host from puppet master - https://phabricator.wikimedia.org/T115287#1930510 (ArielGlenn) I sent a second reminder mail asking folks to test neodymium as salt master for salt commands (not yet for git deploy). It's a copy of my email from Dec 30 so no new con...
[04:54:22] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:13:25] (CR) Alex Monk: [C: -1] "Not currently being branched, also the security review ticket is still open."
[mediawiki-config] - https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: Eevans)
[05:19:31] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[05:19:34] (CR) Alex Monk: [C: 1] Add temporary lift of IP cap for eswiki/wikivoyage on 2016-01-14/15 [mediawiki-config] - https://gerrit.wikimedia.org/r/263625 (https://phabricator.wikimedia.org/T123351) (owner: Mdann52)
[05:20:37] (CR) Alex Monk: "Not sure if we require local community discussion on these sorts of things... thoughts?" [mediawiki-config] - https://gerrit.wikimedia.org/r/263342 (https://phabricator.wikimedia.org/T123188) (owner: Mdann52)
[05:24:08] (CR) Peachey88: "Yes, there should be local discussion. It is changing how the categories are used on the local wiki." [mediawiki-config] - https://gerrit.wikimedia.org/r/263342 (https://phabricator.wikimedia.org/T123188) (owner: Mdann52)
[05:35:30] (CR) Alex Monk: [C: -1] "I don't think those tasks had any community consensus" [mediawiki-config] - https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: Mdann52)
[05:38:29] (CR) Alex Monk: [C: -1] Localisation of Babel categories on nap.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/263342 (https://phabricator.wikimedia.org/T123188) (owner: Mdann52)
[05:41:15] (CR) Alex Monk: [C: -1] "variable still in use" [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: Florianschmidtwelzow)
[06:30:01] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: puppet fail
[06:30:22] PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:52] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:12] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:22]
PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:23] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:32] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:42] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:52] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:01] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:01] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:22] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:02] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:38] (CR) Mdann52: "There is one active user, consensus for the other task was just them voting alone, I don't think its worth asking them to start yet anothe" [mediawiki-config] - https://gerrit.wikimedia.org/r/263342 (https://phabricator.wikimedia.org/T123188) (owner: Mdann52)
[06:43:26] !log ori@tin Synchronized php-1.27.0-wmf.10/extensions/GWToolset: Ib9375b: Make sure XMLReader::close() is always called (T122069) (duration: 01m 07s)
[06:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:44:25] (CR) Mdann52: "The community consists of one person, who submitted this change (look at the discussion for the other bug!).
Is it worth making them open " [mediawiki-config] - https://gerrit.wikimedia.org/r/263614 (https://phabricator.wikimedia.org/T121853) (owner: Mdann52)
[06:45:45] !log ori@tin Synchronized php-1.27.0-wmf.9/extensions/GWToolset: Ib9375b: Make sure XMLReader::close() is always called (T122069) (duration: 00m 32s)
[06:45:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:50:21] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:55:41] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:56:51] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[06:57:11] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:57:12] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:57:21] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:22] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:57:31] RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:32] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:52] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:52] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:58:01] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:58:22] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled,
last run 1 minute ago with 0 failures
[06:58:32] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:59:02] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:05:22] (CR) Florianschmidtwelzow: "@Krenair: What do you mean? Have you seen I63cdc0a7fd51ca3a45dc2fd83b22eb58a8de520c ? :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: Florianschmidtwelzow)
[07:12:43] (PS6) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[07:13:34] (CR) jenkins-bot: [V: -1] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[07:15:32] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:15:46] (PS7) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[07:23:42] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[07:26:40] (PS1) Yuvipanda: eventlogging: Emit \n at beginning of log message, not end [puppet] - https://gerrit.wikimedia.org/r/263816
[07:26:43] (PS1) Yuvipanda: tools: Add simple command to log tool invocations [puppet] - https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444)
[07:27:09] (CR) jenkins-bot: [V: -1] eventlogging: Emit \n at beginning of log message, not end [puppet] - https://gerrit.wikimedia.org/r/263816 (owner: Yuvipanda)
[07:27:20] (CR) jenkins-bot: [V: -1] tools: Add simple command to log tool invocations [puppet]
- 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: 10Yuvipanda) [07:28:39] (03PS2) 10Yuvipanda: tools: Add simple command to log tool invocations [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) [07:28:58] (03PS2) 10Yuvipanda: eventlogging: Emit \n at beginning of log message, not end [puppet] - 10https://gerrit.wikimedia.org/r/263816 [07:44:15] 6operations, 6Release-Engineering-Team, 10Wikimedia-Apache-configuration: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#1930647 (10Joe) a:3Joe [08:01:58] (03PS1) 10Giuseppe Lavagetto: conftool-data: add mobileapps nodes, mathoid service [puppet] - 10https://gerrit.wikimedia.org/r/263819 [08:16:39] (03PS1) 10Giuseppe Lavagetto: puppet-merge: auto-run conftool-merge [puppet] - 10https://gerrit.wikimedia.org/r/263821 [08:23:18] (03CR) 10Giuseppe Lavagetto: [C: 032] conftool-data: add mobileapps nodes, mathoid service [puppet] - 10https://gerrit.wikimedia.org/r/263819 (owner: 10Giuseppe Lavagetto) [08:27:05] (03PS1) 10Giuseppe Lavagetto: conftool-data: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/263822 [08:27:25] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/263822 (owner: 10Giuseppe Lavagetto) [08:27:44] (03CR) 10Merlijn van Deen: [C: 04-1] "minor suggestions inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: 10Yuvipanda) [08:39:12] (03CR) 10Ema: puppet-merge: auto-run conftool-merge (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/263821 (owner: 10Giuseppe Lavagetto) [08:50:12] PROBLEM - Disk space on stat1002 is CRITICAL: DISK CRITICAL - free space: /a 329241 MB (3% inode=99%) [09:00:51] RECOVERY - Disk space on stat1002 is OK: DISK OK [09:03:28] (03PS3) 10Thiemo Mättig (WMDE): Basic "Identifiers" statement section 
config for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263046 (https://phabricator.wikimedia.org/T123112) [09:06:08] (03CR) 10Thiemo Mättig (WMDE): "Moved to Wikibase.php. Thanks for the suggestion, this indeed makes much more sense." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263046 (https://phabricator.wikimedia.org/T123112) (owner: 10Thiemo Mättig (WMDE)) [09:11:54] (03PS1) 10Ema: standard_packages: add dstat and ncdu [puppet] - 10https://gerrit.wikimedia.org/r/263824 [09:20:13] ema: neat, i didn't know about ncdu [09:22:17] ori: yeah, it's really useful to find forgotten huge directories :) [09:22:37] (03CR) 10Ori.livneh: [C: 031] standard_packages: add dstat and ncdu [puppet] - 10https://gerrit.wikimedia.org/r/263824 (owner: 10Ema) [09:23:10] <_joe_> uh I didn't know ncdu [09:26:12] PROBLEM - puppet last run on cp2017 is CRITICAL: CRITICAL: puppet fail [09:26:23] (03CR) 10Alexandros Kosiaris: [C: 032] standard_packages: add dstat and ncdu [puppet] - 10https://gerrit.wikimedia.org/r/263824 (owner: 10Ema) [09:26:30] (03PS2) 10Alexandros Kosiaris: standard_packages: add dstat and ncdu [puppet] - 10https://gerrit.wikimedia.org/r/263824 (owner: 10Ema) [09:26:32] (03CR) 10Alexandros Kosiaris: [V: 032] standard_packages: add dstat and ncdu [puppet] - 10https://gerrit.wikimedia.org/r/263824 (owner: 10Ema) [09:27:42] akosiaris: thanks! [09:32:30] (03PS4) 10Faidon Liambotis: Add %D (response time in microseconds) to Apache log formats [puppet] - 10https://gerrit.wikimedia.org/r/263637 (owner: 10Ori.livneh) [09:32:56] (03CR) 10Faidon Liambotis: [C: 032] "Analytics doesn't use the Apache logs." 
[puppet] - 10https://gerrit.wikimedia.org/r/263637 (owner: 10Ori.livneh) [09:33:16] (03CR) 10Faidon Liambotis: [V: 032] Add %D (response time in microseconds) to Apache log formats [puppet] - 10https://gerrit.wikimedia.org/r/263637 (owner: 10Ori.livneh) [09:34:17] (03PS3) 10Faidon Liambotis: Get rid of .pep8 files [puppet] - 10https://gerrit.wikimedia.org/r/262598 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [09:34:28] (03CR) 10Faidon Liambotis: [C: 032] Get rid of .pep8 files [puppet] - 10https://gerrit.wikimedia.org/r/262598 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [09:36:36] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "We explicitly want this _not_ to happen as a wrong apache change can be disruptive. We want to be able to restart/reload apache (which you" [puppet] - 10https://gerrit.wikimedia.org/r/263745 (owner: 10JanZerebecki) [09:38:51] (03CR) 10Faidon Liambotis: [C: 04-1] tox entry point to run pep8==1.4.6 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [09:47:16] godog: is ms-fe3001 you? 
[09:47:40] seems so [09:47:44] <_joe_> yes [09:47:44] active icinga alerts :) [09:47:57] <_joe_> I looked in the SAL this morning too [09:53:41] RECOVERY - puppet last run on cp2017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:55:22] PROBLEM - puppet last run on wtp2009 is CRITICAL: CRITICAL: puppet fail [10:00:11] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 610 [10:07:39] paravoid: yup, expired downtime from yesterday, {{done}} [10:10:11] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 865 [10:20:11] RECOVERY - check_mysql on db1008 is OK: Uptime: 72050 Threads: 2 Questions: 421156 Slow queries: 441 Opens: 697 Flush tables: 2 Open tables: 331 Queries per second avg: 5.845 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [10:21:46] (03CR) 10Filippo Giunchedi: [C: 04-1] "IMO it should work the same as m.wikipedia.org (i.e. mobile version, not the homepage) and there's no harm in keeping it around" [dns] - 10https://gerrit.wikimedia.org/r/256597 (https://phabricator.wikimedia.org/T120143) (owner: 10Dzahn) [10:23:12] RECOVERY - puppet last run on wtp2009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:28:03] (03PS1) 10Faidon Liambotis: hhvm: use random tmpdir in hhvm-collect-heaps [puppet] - 10https://gerrit.wikimedia.org/r/263829 [10:31:34] !log upgrading grafana 2.6.0-beta1 -> 2.6.0 [10:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:33:38] hrm, starred dashboards are gone... 
[10:33:40] wonder why [10:34:51] oh, wait, these should be my starred, not featured [10:39:13] (03PS1) 10Giuseppe Lavagetto: eventbus: add dedicated cluster [puppet] - 10https://gerrit.wikimedia.org/r/263831 [10:39:15] (03PS1) 10Giuseppe Lavagetto: eventbus: add servers to conftool-data [puppet] - 10https://gerrit.wikimedia.org/r/263832 [10:41:31] (03PS3) 10Filippo Giunchedi: swift: add explicit bind_port to servers [puppet] - 10https://gerrit.wikimedia.org/r/263630 (https://phabricator.wikimedia.org/T117972) [10:41:33] (03PS3) 10Filippo Giunchedi: swift: adjust mount options for debian and ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/263629 (https://phabricator.wikimedia.org/T117972) [10:52:10] (03CR) 10Giuseppe Lavagetto: [C: 032] eventbus: add dedicated cluster [puppet] - 10https://gerrit.wikimedia.org/r/263831 (owner: 10Giuseppe Lavagetto) [10:52:50] (03CR) 10Filippo Giunchedi: "https://puppet-compiler.wmflabs.org/1593/" [puppet] - 10https://gerrit.wikimedia.org/r/263628 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [10:55:09] (03CR) 10Faidon Liambotis: [C: 04-1] "I don't think we use swauth anymore -- I switched us to tempauth years ago. 
This is a historical artifact I think and can be safely be rem" [puppet] - 10https://gerrit.wikimedia.org/r/263628 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [10:59:53] paravoid: d'oh, you are right, {{done}} [10:59:58] (03PS4) 10Filippo Giunchedi: swift: add explicit bind_port to servers [puppet] - 10https://gerrit.wikimedia.org/r/263630 (https://phabricator.wikimedia.org/T117972) [11:00:00] (03PS4) 10Filippo Giunchedi: swift: adjust mount options for debian and ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/263629 (https://phabricator.wikimedia.org/T117972) [11:00:02] (03PS2) 10Filippo Giunchedi: swift: add python-webob, remove python-swauth deps [puppet] - 10https://gerrit.wikimedia.org/r/263628 (https://phabricator.wikimedia.org/T117972) [11:03:55] (03CR) 10Giuseppe Lavagetto: [C: 032] eventbus: add servers to conftool-data [puppet] - 10https://gerrit.wikimedia.org/r/263832 (owner: 10Giuseppe Lavagetto) [11:09:10] PROBLEM - Host mr1-codfw.oob is DOWN: PING CRITICAL - Packet loss = 100% [11:10:50] 6operations, 10Traffic, 5Patch-For-Review, 7Performance: Varnish apparently unconditionally varies on session cookies - https://phabricator.wikimedia.org/T122673#1930791 (10ema) This seems to be fixed by https://gerrit.wikimedia.org/r/#/c/259882: curl -s -I --cookie 'enwikiSession=123' https://en.wikiped... 
[11:15:30] RECOVERY - Host mr1-codfw.oob is UP: PING OK - Packet loss = 0%, RTA = 35.53 ms [11:18:00] (03PS3) 10Filippo Giunchedi: swift: add python-webob, remove python-swauth deps [puppet] - 10https://gerrit.wikimedia.org/r/263628 (https://phabricator.wikimedia.org/T117972) [11:18:07] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: add python-webob, remove python-swauth deps [puppet] - 10https://gerrit.wikimedia.org/r/263628 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [11:18:16] 6operations, 10vm-requests: Site: (1) VM request for url_downloader - https://phabricator.wikimedia.org/T123472#1930804 (10akosiaris) 3NEW [11:18:42] 6operations: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1930812 (10akosiaris) Tracked in T123472 [11:18:49] 6operations, 10vm-requests: Site: (1) VM request for url_downloader - https://phabricator.wikimedia.org/T123472#1930804 (10akosiaris) [11:23:11] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:33:39] RECOVERY - Swift HTTP backend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 391 bytes in 0.231 second response time [11:34:09] RECOVERY - Swift HTTP frontend on ms-fe3001 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.182 second response time [11:34:29] godog: so swift 2.5 is not available for precise, right? 
[11:34:48] I was thinking of maybe it'd be a safer avenue to just upgrade swift first across the fleet, then do the distro upgrades [11:38:03] paravoid: there aren't afaik but I'm double checking [11:42:49] paravoid: not for precise no, 2.5 is trusty-only using cloud archive [11:45:01] paravoid: though we could upgrade eqiad to trusty, codfw has been running trusty since the beginning and the last batch of machines in eqiad runs trusty too [11:45:32] meh [11:48:34] (03PS2) 10Alexandros Kosiaris: osm: split and move roles to modules/role/ [puppet] - 10https://gerrit.wikimedia.org/r/260936 (owner: 10Dzahn) [11:51:09] PROBLEM - test icmp reachability to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 24 probes of 396 (alerts on 19) [11:53:28] PROBLEM - test icmp reachability to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 29 probes of 383 (alerts on 19) [11:54:49] (03CR) 10Alexandros Kosiaris: [C: 032] osm: split and move roles to modules/role/ [puppet] - 10https://gerrit.wikimedia.org/r/260936 (owner: 10Dzahn) [11:54:53] <_joe_> uh [11:55:03] <_joe_> what's that icmp reachability alarm about? [12:01:51] eqiad is https://atlas.ripe.net/measurements/1790945/ and codfw is https://atlas.ripe.net/measurements/1791210/ from puppet, not sure if it is atlas itself [12:11:30] (03PS5) 10Filippo Giunchedi: swift: add explicit bind_port to servers [puppet] - 10https://gerrit.wikimedia.org/r/263630 (https://phabricator.wikimedia.org/T117972) [12:11:32] (03PS5) 10Filippo Giunchedi: swift: adjust mount options for debian and ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/263629 (https://phabricator.wikimedia.org/T117972) [12:15:50] (03CR) 10Tim Landscheidt: "This installs mpt-status back in the Jessie image. 
It was previously removed in Ie52618f946685473797f6b4d2ed924e065b3d689 because it was " [puppet] - 10https://gerrit.wikimedia.org/r/263773 (owner: 10Andrew Bogott) [12:20:42] (03PS6) 10Filippo Giunchedi: swift: add explicit bind_port to servers [puppet] - 10https://gerrit.wikimedia.org/r/263630 (https://phabricator.wikimedia.org/T117972) [12:20:48] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: add explicit bind_port to servers [puppet] - 10https://gerrit.wikimedia.org/r/263630 (https://phabricator.wikimedia.org/T117972) (owner: 10Filippo Giunchedi) [12:51:45] 6operations, 10Wikimedia-SVG-rendering: Install Noto CJK (Source Han Sans) font family for SVG rendering - https://phabricator.wikimedia.org/T123223#1930897 (10chasemp) [12:55:29] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 2 failures [13:06:49] PROBLEM - RAID on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:08:49] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:08:58] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:08:59] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:10:58] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:10:58] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:17:19] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:17:19] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:09] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:09] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:19:38] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:19:39] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:09] PROBLEM - RAID on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:09] PROBLEM - puppet last run on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:18] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:29] PROBLEM - puppet last run on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:30] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:38] PROBLEM - RAID on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:39] PROBLEM - SSH on mw1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:21:00] PROBLEM - configured eth on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:21:29] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:22:19] RECOVERY - DPKG on mw1007 is OK: All packages OK [13:22:49] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:23:08] PROBLEM - nutcracker port on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:23:59] PROBLEM - nutcracker process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:24:39] PROBLEM - SSH on mw1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:29] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:26:40] (03CR) 10Hashar: "Some files are ignored since they come from some 3rd parties." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [13:27:08] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [13:27:38] RECOVERY - Disk space on mw1003 is OK: DISK OK [13:27:49] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient [13:27:59] RECOVERY - Disk space on mw1007 is OK: DISK OK [13:28:00] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:28:42] (03PS9) 10Hashar: tox entry point to run pep8==1.4.6 [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) [13:28:49] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:58] (03PS10) 10Hashar: tox entry point to run pep8==1.4.6 [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) [13:28:59] RECOVERY - SSH on mw1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:29:59] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: puppet fail [13:30:22] (03CR) 10jenkins-bot: [V: 04-1] tox entry point to run pep8==1.4.6 [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [13:33:48] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:48] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:49] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:35:48] PROBLEM - configured eth on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:37:39] PROBLEM - SSH on mw1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:10] hhvm memleak again it seems, I'll bounce hhvm on machines about to run oom but still reachable [13:38:29] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:59] PROBLEM - DPKG on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:39:28] !log bounce hhvm on mw1013 [13:39:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:41:08] RECOVERY - Disk space on mw1007 is OK: DISK OK [13:41:08] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:44:09] RECOVERY - SSH on mw1007 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:44:09] RECOVERY - RAID on mw1007 is OK: OK: no RAID installed [13:44:18] PROBLEM - Disk space on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:44:20] RECOVERY - nutcracker port on mw1007 is OK: TCP OK - 0.000 second response time on port 11212 [13:44:39] PROBLEM - dhclient process on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:45:19] RECOVERY - nutcracker process on mw1007 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:45:49] RECOVERY - DPKG on mw1007 is OK: All packages OK [13:46:07] !log bounce hhvm on mw1009, powercycle mw1003 [13:46:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:46:39] RECOVERY - configured eth on mw1007 is OK: OK - interfaces up [13:48:58] RECOVERY - Disk space on mw1003 is OK: DISK OK [13:49:10] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:49:19] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:49:19] RECOVERY - RAID on mw1003 is OK: OK: no RAID installed [13:49:19] RECOVERY - DPKG on mw1003 is OK: All packages OK [13:49:59] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:49:59] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [13:50:18] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [13:50:18] (03PS11) 10Hashar: tox entry point to run pep8==1.4.6 [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) [13:50:29] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [13:50:38] PROBLEM - RAID on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:50:39] PROBLEM - SSH on mw1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:51:10] (03CR) 10jenkins-bot: [V: 04-1] tox entry point to run pep8==1.4.6 [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [13:51:19] PROBLEM - nutcracker process on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:51:35] (03CR) 10Hashar: "Also excluded the git submodules and .git dir itself." 
[puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [13:51:39] RECOVERY - DPKG on mw1011 is OK: All packages OK [13:51:39] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:51:40] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:51:49] PROBLEM - nutcracker process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:52:18] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:52:29] RECOVERY - SSH on mw1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:52:38] RECOVERY - configured eth on mw1011 is OK: OK - interfaces up [13:52:39] RECOVERY - Disk space on mw1011 is OK: DISK OK [13:52:59] PROBLEM - nutcracker port on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:53:00] RECOVERY - dhclient process on mw1011 is OK: PROCS OK: 0 processes with command name dhclient [13:53:09] PROBLEM - configured eth on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:53:09] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [13:53:18] RECOVERY - nutcracker process on mw1011 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:53:29] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:54:00] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [13:54:18] RECOVERY - RAID on mw1011 is OK: OK: no RAID installed [13:56:00] RECOVERY - nutcracker process on mw1007 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:56:19] RECOVERY - DPKG on mw1007 is OK: All packages OK [13:56:48] RECOVERY - RAID on mw1007 is OK: OK: no RAID installed [13:56:49] RECOVERY - SSH on mw1007 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:57:08] RECOVERY - nutcracker port on mw1007 is OK: TCP OK - 0.000 second response time on port 11212 [13:57:19] RECOVERY - configured eth on mw1007 is OK: OK - interfaces up [13:57:38] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:57:39] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient [13:57:59] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:57:59] RECOVERY - Disk space on mw1007 is OK: DISK OK [13:58:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:58:29] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [13:59:38] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [14:00:58] PROBLEM - RAID on mw1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:02:57] (03PS1) 10Chad: Trivial comment fix for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/263837 [14:03:19] !log bounce hhvm on mw1005, powercycle mw1011 [14:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:04:06] (03CR) 10jenkins-bot: [V: 04-1] Trivial comment fix for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/263837 (owner: 10Chad) [14:04:58] PROBLEM - Host mw1011 is DOWN: PING CRITICAL - Packet loss = 100% [14:05:28] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 hour ago with 0 failures [14:05:29] RECOVERY - Host mw1011 is UP: PING OK - Packet loss = 0%, RTA = 1.55 ms [14:07:09] RECOVERY - RAID on mw1011 is OK: OK: no RAID installed [14:07:42] (03PS1) 10Hoo man: Disable external identifiers on Wikidata for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263838 (https://phabricator.wikimedia.org/T123447) [14:11:10] (03CR) 10Chad: "lol, I try to fix pep8 problems and jenkins yells at me for pep8 that's not my fault :p" [puppet] - 10https://gerrit.wikimedia.org/r/263837 (owner: 10Chad) [14:11:58] !log bounce hhvm on mw1007 [14:12:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:19:24] 6operations, 10DBA, 7Icinga: "db1047/eventlogging_sync processes" icinga alert is flaky since at least early January - https://phabricator.wikimedia.org/T123509#1931235 (10hoo) 3NEW [14:20:40] (03PS6) 10Filippo Giunchedi: swift: adjust mount options for debian and ubuntu [puppet] - 10https://gerrit.wikimedia.org/r/263629 (https://phabricator.wikimedia.org/T117972) [14:24:29] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures [14:32:02] (03PS1) 10Filippo Giunchedi: icinga: report atlas measurement url in alert text [puppet] - 10https://gerrit.wikimedia.org/r/263840 [14:33:21] (03CR) 10jenkins-bot: [V: 04-1] icinga: report atlas measurement url in alert text [puppet] - 
10https://gerrit.wikimedia.org/r/263840 (owner: 10Filippo Giunchedi) [14:35:38] (03CR) 10Filippo Giunchedi: "heh, ditto for https://gerrit.wikimedia.org/r/#/c/263840/ pep8 fails but not on the file under code review :(" [puppet] - 10https://gerrit.wikimedia.org/r/263837 (owner: 10Chad) [14:35:58] godog: I saw hashar tweaking pep8 files in scrollback [14:36:03] Somethin' ain't right :) [14:37:59] RECOVERY - test icmp reachability to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 399 (alerts on 19) [14:38:25] heheh indeed, my money is on https://gerrit.wikimedia.org/r/#/c/262598/ [14:39:41] I suppose we could write up an omnibus patch to fix every pep8 violation in one go, but that seems silly :) [14:39:59] RECOVERY - test icmp reachability to eqiad on ripe-atlas-eqiad is OK: OK - failed 1 probes of 386 (alerts on 19) [14:40:52] indeed, I think I can merge https://gerrit.wikimedia.org/r/#/c/244148/ and do the voting switcharoo, not seeing hashar here now tho [14:41:44] @seen hashar [14:41:44] godog: Last time I saw hashar they were quitting the network with reason: Client Quit N/A at 1/13/2016 10:16:35 AM (4h25m8s ago) [14:42:33] Is there an associated zuul conf change? [14:43:57] good question, I don't know if there's one out yet [14:44:23] That first part can merge either way, trivial. [14:44:32] The part we really want afterwords is the zuul change to shut up the old job [14:44:40] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:18] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, though after the switch I think we want submodules to be linted too since the scripts inside are likely copied to production anyway" [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [14:47:09] true [14:47:43] (03PS1) 10Steinsplitter: adding whewiki to imporsources. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/263842 [14:49:11] (03CR) 10John Vandenberg: tox entry point to run pep8==1.4.6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [14:49:50] RECOVERY - puppet last run on rdb1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:53:59] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 1 failures [14:55:07] (03PS2) 10Steinsplitter: adding w:hewiki to wgImportSources. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263842 [14:55:54] (03Abandoned) 10Aude: Explicitly define Wikibase data types [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263753 (owner: 10Aude) [15:00:30] 6operations, 10Traffic, 5Patch-For-Review, 7Performance: Varnish apparently unconditionally varies on session cookies - https://phabricator.wikimedia.org/T122673#1931309 (10BBlack) 5Open>3Resolved a:3BBlack [15:00:51] (03PS1) 10Giuseppe Lavagetto: conftool-data: sync with what is in pybal's config at the moment [puppet] - 10https://gerrit.wikimedia.org/r/263843 [15:01:10] (03PS2) 10Giuseppe Lavagetto: conftool-data: sync with what is in pybal's config at the moment [puppet] - 10https://gerrit.wikimedia.org/r/263843 [15:05:39] (03CR) 10Giuseppe Lavagetto: [C: 032] conftool-data: sync with what is in pybal's config at the moment [puppet] - 10https://gerrit.wikimedia.org/r/263843 (owner: 10Giuseppe Lavagetto) [15:06:41] (03PS1) 10Chad: pep8 W601: has_key() is deprecated, use not in [puppet] - 10https://gerrit.wikimedia.org/r/263845 [15:09:59] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:11:28] (03CR) 10Anomie: [C: 031] Configure bot passwords (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263804 (https://phabricator.wikimedia.org/T123451) (owner: 10Gergő Tisza) [15:13:26] (03CR) 10Chad: [C: 031] mediawiki: add 
conftool-specifc credentials and scripts [puppet] - 10https://gerrit.wikimedia.org/r/258979 (owner: 10Giuseppe Lavagetto) [15:13:40] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Disable external identifiers on Wikidata for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263838 (https://phabricator.wikimedia.org/T123447) (owner: 10Hoo man) [15:14:27] (03CR) 10Chad: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/263837 (owner: 10Chad) [15:18:08] joal: , bblack is pinging me about this: https://phabricator.wikimedia.org/T122651 [15:18:23] Over the holidays, I turned off the two jobs that were only using mobile [15:18:43] bblack is thinking tuesday the 19th for doing this switch [15:19:18] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:21] ottomata: No problem for me [15:19:25] i think we can stop the jobs, he can make the switch, and we can make the needed changes and then restart them once all of bblack's changes are ready [15:19:28] (03PS1) 10Giuseppe Lavagetto: lvs: use etcd for pybal config for ulsfo backups [puppet] - 10https://gerrit.wikimedia.org/r/263847 [15:19:39] <_joe_> bblack: feel adventurous? ^^ [15:19:49] ottomata: things should continue to work, and then we can change the oozie code [15:20:11] ja, i *think* that if we dont' change the oozie code, it will block if no mobile data comes in, ja? [15:20:21] ottomata: shouldn't :) [15:20:29] shouldn't block? [15:20:37] I don't think, why would it ? [15:20:53] because hourly mobile is the list of coordinator dependencies, no? [15:21:04] for some jobs? [15:21:09] Yes, but as long jobs don't fail ... 
[15:21:18] right, they won't fail, but say [15:21:32] raw mobile hour 15 won't have a _PARTITIONED file [15:21:36] It'll be cleaner to remove the mobile deps, for sure [15:21:39] which means refined won't have a _SUCCESS file [15:21:51] which means that any workflow that is waiting for that hour's mobile data [15:21:55] won't start [15:22:05] right ottomata [15:22:25] So, let's have a task about changing that code [15:22:36] aye, but it needs to be done in coordination with bblack's change [15:22:41] Sure [15:22:45] if we do it before, we'll start jobs missing mobile data [15:22:52] joal: that is this task: https://phabricator.wikimedia.org/T122651 [15:22:53] :) [15:23:09] i can work on that one [15:23:09] yeah, just noticed (I'm still a bit jetlagged :) We can still code and review, [15:23:15] ottomata: it only becomes blocking if there's zero requests, right? [15:23:18] right [15:23:30] if any data at all, things will just go [15:23:39] (03PS2) 10Giuseppe Lavagetto: lvs: use etcd for pybal config for ulsfo backups [puppet] - 10https://gerrit.wikimedia.org/r/263847 [15:23:40] ottomata: so, the change will probably occur over a window of time, so what you really want to know is the end of that window [15:23:49] yes [15:24:05] I'm planning to start on Tuesday the 19th, and then it will probably take a few hours, assuming everything goes smoothly. [15:24:14] but it could stretch into the next day for all I know [15:24:36] ok yeah, so as long as we are ready to make the change when requests finally stop flowing in to mobiles [15:24:39] that'll be fine [15:24:51] hehe, will monitoring reqs still hit mobile caches after public traffic is moved? [15:25:06] if so, our stuff will probably just keep working [15:25:41] ottomata: it'll stop working when no more data though [15:25:43] joal: also, if we restart jobs next week, will that bring in the XFF/X-Client-IP change at the same time?
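The dependency chain described above (no _PARTITIONED flag on the raw mobile hour, so no _SUCCESS flag on the refined hour, so any workflow waiting on that hour never starts) can be sketched roughly as follows. This is a minimal Python illustration, not the actual Oozie coordinator configuration: the directory layout, the source names, the hour string, and the helpers `hour_is_ready`, `mark_done` and `demo` are all assumptions made for the example.

```python
from pathlib import Path
import tempfile


def hour_is_ready(base, sources, hour, flag="_SUCCESS"):
    """Return True only if every source has its done-flag file for the hour.

    Hypothetical layout: <base>/<source>/<hour>/<flag>.
    """
    return all((Path(base) / source / hour / flag).is_file() for source in sources)


def mark_done(base, source, hour, flag="_SUCCESS"):
    """Create the done-flag for one source and hour (stands in for the refine step)."""
    hour_dir = Path(base) / source / hour
    hour_dir.mkdir(parents=True, exist_ok=True)
    (hour_dir / flag).touch()


def demo():
    """Only text data arrives; compare a job that still lists mobile with one that does not."""
    with tempfile.TemporaryDirectory() as base:
        mark_done(base, "text", "2016-01-19T15")
        with_mobile_dep = hour_is_ready(base, ["text", "mobile"], "2016-01-19T15")
        without_mobile_dep = hour_is_ready(base, ["text"], "2016-01-19T15")
        return with_mobile_dep, without_mobile_dep
```

In this sketch the job that still lists mobile as a dependency stays blocked while the one that only depends on text proceeds, which is why the dependency removal has to be timed with the traffic switch rather than done before or long after it.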
[15:26:01] https://gerrit.wikimedia.org/r/#/c/253474/ [15:26:41] i mean, will the new jobs all use X-Client-IP instead of using the UDF? [15:26:54] which would allow us to merge the change to remove XFF? [15:27:04] correct ottomata [15:27:13] ok cool. so this is a nicely timed restart then [15:27:20] :) [15:27:21] one restart two birds [15:27:24] one stone two changes [15:27:35] bblack, let's do it [15:27:47] (03PS1) 10John Vandenberg: Incorrect syntax in RCStreamCollector.collect [puppet] - 10https://gerrit.wikimedia.org/r/263848 [15:27:55] let's start early on tuesday so hopefully the window will close at a reasonable time? :) [15:28:17] ottomata: you make the changes I review ? [15:28:21] ja [15:28:33] ottomata: I plan a big day next tuesday just in case :) [15:28:40] hehe, k [15:28:50] (03CR) 10jenkins-bot: [V: 04-1] Incorrect syntax in RCStreamCollector.collect [puppet] - 10https://gerrit.wikimedia.org/r/263848 (owner: 10John Vandenberg) [15:29:39] (03CR) 10Giuseppe Lavagetto: [C: 031] Incorrect syntax in RCStreamCollector.collect [puppet] - 10https://gerrit.wikimedia.org/r/263848 (owner: 10John Vandenberg) [15:29:56] Thanks ottomata and bblack for synchro :) [15:30:05] ottomata: ok [15:30:36] ottomata: something to monitor in the meantime: will camus react ok to double-sized data in text partition [15:31:23] ottomata: many we should plan for kafka topic getting bigger, therefore more camus workers ? [15:31:30] ottomata: just in case ... 
[15:32:08] ottomata: kafka topic getting bigger --> I meant having more partitions [15:32:49] (03CR) 10Giuseppe Lavagetto: puppet-merge: auto-run conftool-merge (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/263821 (owner: 10Giuseppe Lavagetto) [15:36:57] joal: the mobile traffic is pretty small compared to the text traffic, I doubt it will be a huge increase there [15:38:58] joal: well, to put some real numbers on that, text's daily peak is ~50K reqs/sec with an average of 41K, and the same numbers for mobile are something like 16K and 14K [15:39:07] ottomata: kafka says "Generic error: 'unicode' object does not support item assignment " [15:39:38] if I combine the two in graphite, I end up with peaks ~70K and average ~55K [15:40:22] so, we could call it something like a 20-25% -ish increase [15:40:47] mutante: i know [15:40:50] it's not kafka though [15:40:53] that's eventbus service checker [15:40:54] bblack: Ok [15:40:58] i'm trying to work on that now [15:41:04] service_checker automatically urlencodes requests [15:41:09] request bodies [15:41:10] which is funky [15:41:15] _joe_ got any opinions on that? [15:41:16] ottomata: gotcha, i said it because the hosts it's on are called kafka [15:41:31] i'm trying to make eventlogging-service support x-www-form-urlencoded posts [15:41:34] but it's not quite working yet [15:41:36] aye [15:41:49] <_joe_> ottomata: no it's not if you declare another format of requests [15:41:49] joal: hm [15:41:54] via header?
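Editor's sketch: the request-rate figures quoted above can be checked with a little arithmetic. The numbers are the ones from the chat (in thousands of requests per second); the helper name `pct` is hypothetical. One possible reading, offered as an assumption rather than as what bblack meant, is that the "20-25%" figure matches mobile's share of the combined traffic, while the growth relative to today's text traffic alone is somewhat larger.

```python
# Figures quoted in the chat, in thousands of requests per second.
text_peak, text_avg = 50, 41
mobile_peak, mobile_avg = 16, 14
combined_peak, combined_avg = 70, 55   # combined graphite readings from the chat


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Mobile as a share of the combined traffic.
share_peak = pct(mobile_peak, combined_peak)
share_avg = pct(mobile_avg, combined_avg)

# Growth of the text traffic once mobile is added on top of it.
growth_peak = pct(mobile_peak, text_peak)
growth_avg = pct(mobile_avg, text_avg)

print(share_peak, share_avg, growth_peak, growth_avg)
```

Either way the increase is well short of the "double-sized data" worry raised earlier, which is the point being made in the conversation.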
[15:42:13] <_joe_> ottomata: if you declare in the spec that you send json-formatted data, I'm pretty sure, it will work [15:42:18] naw, that doesn't work either, that gets more funky [15:42:26] was going down that route last night [15:42:33] <_joe_> ottomata: I'm pretty sure it doesn't [15:42:56] <_joe_> ottomata: we already support services that expect json bodies, I think [15:43:02] maybe i'm doing it wrong [15:43:02] but [15:43:08] <_joe_> but if that's not the case, I can fix service_checker [15:43:20] it seems that client.request_encode_body [15:43:21] <_joe_> now I need a coffee, but I'll check later [15:43:26] in fetch_url [15:43:32] <_joe_> also, ask mobrovac in case [15:43:34] is passed fields= [15:43:37] yeah was talking to him [15:43:53] <_joe_> ottomata: I can anyways add support for json bodies to service_checker [15:43:56] and, fields= expects that the value be bufferable? like a string or a file [15:43:58] not a dict [15:44:18] was trying many varytions of stringifying before that [15:44:19] <_joe_> ottomata: I'm not looking at the code atm [15:44:21] k [15:44:26] <_joe_> nah don't get creative [15:44:43] <_joe_> if a new function is needed in service_checker, let's add it [15:44:51] <_joe_> where is the spec I can look at? [15:45:01] <_joe_> and what specifically is failing? [15:45:15] <_joe_> also, is there a ticket for this [15:45:18] <_joe_> ? [15:45:31] "morning" [15:46:16] _joe_: naw, just started working on it, uMmMm [15:46:20] hang on [15:46:27] needed to make some spec changes, am testing it [15:46:50] <_joe_> ottomata: you were mostly right btw [15:46:58] _joe_: https://gist.github.com/ottomata/5e01284626f382f56b4a [15:47:03] <_joe_> I need to add a json encoding function [15:47:14] that's the up to date spec, haven't committed it yet [15:47:17] <_joe_> ottomata: which spec is failing? [15:47:24] that one _joe_^^ [15:47:25] oh [15:47:26] <_joe_> which test specifically? 
[15:47:27] which x-ample [15:47:30] <_joe_> yes [15:47:31] /v1/events [15:47:33] post [15:48:00] ACKNOWLEDGEMENT - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn wrong IP (docker) [15:48:21] yeah, if i do it like that, with the content-type header in the request x-ample [15:48:28] it is still encoded by service_checker [15:48:29] test=this+is+a+test&meta=%7Bu%27topic%27%3A+u%27test.event%27%2C+u%27id%27%3A+u%2712345678-1234-5678-1234-567812345678%27%7D [15:48:38] <_joe_> ottomata: yeah I can fix that [15:48:44] ok! [15:55:05] (03CR) 10Alex Monk: "Filippo: Either all the other projects have to get one, or this one has to go." [dns] - 10https://gerrit.wikimedia.org/r/256597 (https://phabricator.wikimedia.org/T120143) (owner: 10Dzahn) [15:58:49] (03CR) 10Dzahn: "also see https://phabricator.wikimedia.org/T111967 for why this causes extra questions" [dns] - 10https://gerrit.wikimedia.org/r/256597 (https://phabricator.wikimedia.org/T120143) (owner: 10Dzahn) [15:59:20] (03CR) 10Alex Monk: "I have, but that's not in master, let alone the oldest wmf branch in production." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [16:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160113T1600). Please do the needful. [16:00:04] aude jgirault jan_drewniak Krenair: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
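Editor's sketch: the encoded body pasted at 15:48:29 shows what happens when a nested structure is pushed through form encoding. The snippet below reproduces the effect with Python 3's standard library; it is not service_checker's code, and the payload is only shaped like the test event in the chat. The `u%27` sequences in the pasted body are consistent with Python 2 unicode reprs, which Python 3 would not emit.

```python
import json
from urllib.parse import parse_qs, urlencode

# Payload shaped like the test event in the chat; values are illustrative.
event = {
    "test": "this is a test",
    "meta": {"topic": "test.event", "id": "12345678-1234-5678-1234-567812345678"},
}

# Form encoding calls str() on each value, so the nested dict is flattened
# into its Python repr and then percent-escaped.
form_body = urlencode(event)

# Decoding gives back a plain string for "meta", not the original dict.
decoded_meta = parse_qs(form_body)["meta"][0]

# JSON encoding keeps the nesting and round-trips cleanly.
json_body = json.dumps(event)
round_tripped = json.loads(json_body)

print(form_body)
print(type(decoded_meta).__name__, type(round_tripped["meta"]).__name__)
```

This is why a JSON request body is the more natural fit for the POST example in the spec: the receiving service cannot reliably turn the form-encoded repr back into the original object.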
[16:00:16] * aude waves [16:00:30] hey [16:01:27] (03PS3) 10Alex Monk: Add Wikibase-labs.php and Wikibase-production.php to noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263370 (owner: 10Aude) [16:01:33] (03CR) 10Alex Monk: [C: 032] Add Wikibase-labs.php and Wikibase-production.php to noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263370 (owner: 10Aude) [16:02:00] (03Merged) 10jenkins-bot: Add Wikibase-labs.php and Wikibase-production.php to noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263370 (owner: 10Aude) [16:02:18] Yay [16:02:55] 6operations, 10Traffic, 7Mobile, 7Varnish: Static image files from en.m.wikipedia.org are served with cache-suppressing headers - https://phabricator.wikimedia.org/T86993#1931440 (10BBlack) 5Open>3Resolved This was resolved incidentally during the mobile VCL refactoring that happened during the course... [16:03:17] !log krenair@tin Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/263370/3 (duration: 00m 31s) [16:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:03:47] looks good :) [16:04:03] aude, doesn't get listed on https://noc.wikimedia.org/conf/ ? [16:04:11] i had to break my cache [16:04:35] e.g. https://noc.wikimedia.org/conf/?randomkittens :) [16:05:50] ah yes [16:06:17] not sure about this next one [16:06:24] is NS_MODULE really defined at this point? [16:06:37] let me check [16:08:16] scribunto is included before wikibase [16:09:31] well [16:09:35] it does seem to work [16:09:45] will it still work when Scribunto switches to extension registration? [16:09:58] we could just hard code the namespace [16:10:12] yes, that's the normal way to do this sort of thing [16:10:20] ok [16:10:35] amending [16:11:04] https://gerrit.wikimedia.org/r/#/c/263838/ has an unmet dependency [16:11:05] (03CR) 10Filippo Giunchedi: "I see! 
my comment was mostly based on "users typing www.m.wikipedia.org in the browser" but I realize it can be quite the corner case" [dns] - 10https://gerrit.wikimedia.org/r/256597 (https://phabricator.wikimedia.org/T120143) (owner: 10Dzahn) [16:11:38] oh I see, that's the next patch [16:12:39] (03PS3) 10Aude: Set 828 (NS_MODULE) as a Wikibase client NS on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263354 (https://phabricator.wikimedia.org/T123234) (owner: 10Hoo man) [16:12:59] Krenair: the data types thing... doesn't matter which order [16:13:17] (03CR) 10Alex Monk: [C: 032] Set 828 (NS_MODULE) as a Wikibase client NS on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263354 (https://phabricator.wikimedia.org/T123234) (owner: 10Hoo man) [16:13:24] an unknown setting is just ignored / not used [16:13:57] (03Merged) 10jenkins-bot: Set 828 (NS_MODULE) as a Wikibase client NS on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263354 (https://phabricator.wikimedia.org/T123234) (owner: 10Hoo man) [16:14:21] (03CR) 10Luke081515: [C: 031] add OfficeIT namespace to wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: 10ArielGlenn) [16:14:51] !log krenair@tin Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/263354/ (duration: 00m 31s) [16:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:14:57] aude, please test [16:15:11] ok [16:15:40] looks good (eg. 
https://www.wikidata.org/wiki/Module:Wikidata) [16:16:34] (03CR) 10Alex Monk: [C: 032] Disable external identifiers on Wikidata for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263838 (https://phabricator.wikimedia.org/T123447) (owner: 10Hoo man) [16:17:29] (03Merged) 10jenkins-bot: Disable external identifiers on Wikidata for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263838 (https://phabricator.wikimedia.org/T123447) (owner: 10Hoo man) [16:18:03] (03CR) 10Niedzielski: [C: 031] delete www.m.wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/256597 (https://phabricator.wikimedia.org/T120143) (owner: 10Dzahn) [16:18:42] looks like wikibase runs quite a lot of tests [16:18:47] we do :/ [16:19:08] if you want to deploy other things in the meanwhile, go ahead [16:20:00] oh, then i need to update our build [16:20:02] gah [16:20:16] !log krenair@tin Synchronized wmf-config/Wikibase-production.php: https://gerrit.wikimedia.org/r/#/c/263838/ (duration: 00m 31s) [16:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:20:48] (03PS2) 10Alex Monk: Bump portals to master (fixes event logging around search) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263796 (owner: 10Jdrewniak) [16:20:53] * aude not used to having to be ready for swat + deploys in the morning and early in the day [16:21:04] (03CR) 10Alex Monk: [C: 032] Bump portals to master (fixes event logging around search) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263796 (owner: 10Jdrewniak) [16:21:28] (03Merged) 10jenkins-bot: Bump portals to master (fixes event logging around search) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263796 (owner: 10Jdrewniak) [16:21:43] jgirault, jan_drewniak: ready? 
[16:21:49] Yep [16:22:51] !log krenair@tin Synchronized portals: https://gerrit.wikimedia.org/r/#/c/263796/ (duration: 00m 31s) [16:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:23:09] jgirault, jan_drewniak: ok, please test [16:23:19] aude, is anything else needed before I sync https://gerrit.wikimedia.org/r/#/c/263820/ ? [16:23:52] yeah, update our build [16:23:57] which i am doing [16:25:47] Krenair: looks good! [16:26:39] (03PS2) 10Alex Monk: Added Simple English Wikipedia as import source for English Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263341 (https://phabricator.wikimedia.org/T123212) (owner: 10Pmlineditor) [16:26:45] (03CR) 10Alex Monk: [C: 032] Added Simple English Wikipedia as import source for English Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263341 (https://phabricator.wikimedia.org/T123212) (owner: 10Pmlineditor) [16:27:17] (03Merged) 10jenkins-bot: Added Simple English Wikipedia as import source for English Wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263341 (https://phabricator.wikimedia.org/T123212) (owner: 10Pmlineditor) [16:28:16] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/263341/ (duration: 00m 31s) [16:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:28:41] (03PS6) 10Alex Monk: Add temporary lift of IP cap for eswiki/wikivoyage on 2016-01-14/15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263625 (https://phabricator.wikimedia.org/T123351) (owner: 10Mdann52) [16:29:08] (03CR) 10Alex Monk: [C: 032] Add temporary lift of IP cap for eswiki/wikivoyage on 2016-01-14/15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263625 (https://phabricator.wikimedia.org/T123351) (owner: 10Mdann52) [16:29:40] (03Merged) 10jenkins-bot: Add temporary lift of IP cap for eswiki/wikivoyage on 2016-01-14/15 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/263625 (https://phabricator.wikimedia.org/T123351) (owner: 10Mdann52) [16:31:21] !log krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/263625/ (duration: 00m 31s) [16:31:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:32:04] aude, how's that build going? [16:32:18] waiting on jenkins [16:32:35] https://gerrit.wikimedia.org/r/#/c/263857/ [16:33:21] _joe_: it would be cool to support form encoded stuff too, buuut, i don't think i can: [16:33:22] https://gist.github.com/ottomata/44981d5fb85ac8339d8f [16:33:41] <_joe_> ottomata: don't worry, a patch is a comin [16:33:45] unless you know of a way to unencode back to the same thang [16:33:46] ok [16:35:57] aude, is the github -> gerrit change supposed to be part of this? [16:36:06] it's ok [16:36:24] i think [16:37:10] (it's been moved to gerrit and nothing from there is needed in this backport) [16:37:22] <_joe_> ottomata: so, I think there is still an error in your spec [16:37:31] very likely [16:37:38] _joe_: in the gist i sent? [16:37:50] <_joe_> yes [16:37:55] <_joe_> multiple, actually [16:38:01] k tell me! [16:38:10] am debugging this piece by piece here [16:38:34] <_joe_> uhm wait [16:38:45] <_joe_> it's not what you have atm on kafka1001, right? [16:38:47] no [16:39:01] just pushed this up https://gerrit.wikimedia.org/r/#/c/263861/ [16:39:05] haven't merged or deployed it [16:39:07] can do so now [16:39:30] _joe_: i am testing by running a service in the cli on a different port [16:39:38] want to just test that real quick? [16:39:40] run service checker on that? [16:39:59] kafka1002 port 8086 [16:41:21] <_joe_> ottomata: so in your spec at line 40 [16:41:38] <_joe_> response is misindented AFAICS [16:43:00] hm, _joe_, i get though [16:43:01] Test Produce a valid test event returned the unexpected status 500 (expecting: 201) [16:43:19] <_joe_> where do you get that?
[16:43:31] but ja it does look weird [16:43:38] /usr/local/lib/nagios/plugins/service_checker -t 5 10.64.16.41 http://10.64.16.41:8086 [16:43:54] <_joe_> oh on 02 [16:43:59] <_joe_> because you updated it [16:44:05] <_joe_> 01 still gives me a funky spec [16:44:27] yeah 02 [16:44:36] i'm running in cli there with updated code [16:44:39] haven't deployed yet [16:44:44] to main service daemon [16:44:53] <_joe_> oblivian@kafka1002:~$ python checker.py -t 5 10.64.16.41 http://10.64.16.41:8086 [16:44:54] on port 8086 [16:44:56] <_joe_> All endpoints are healthy [16:44:59] <_joe_> ok [16:45:00] awesooome [16:45:14] yeah cool, i see the event in my cli [16:45:16] <_joe_> on port 8085 though [16:45:34] (03CR) 10Andrew Bogott: "Dang. Is there an easy way to verify that this new image has the same problem?" [puppet] - 10https://gerrit.wikimedia.org/r/263773 (owner: 10Andrew Bogott) [16:45:34] <_joe_> which is where service_checker is supposed to check [16:45:37] <_joe_> I get an error [16:45:43] _joe_: 8085 is the proper one, but it has a bad spec [16:45:46] haven't deployed [16:45:48] 8086 is just test [16:45:49] <_joe_> ok [16:45:53] <_joe_> ah I see [16:45:58] <_joe_> ok lemme submit my change [16:46:01] k, me too [16:46:02] will deploy [16:46:30] (03PS1) 10Giuseppe Lavagetto: service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 [16:46:36] <_joe_> ottomata: take a look ^^ [16:46:39] aude, hm... no submodule bump? [16:46:59] <_joe_> oh shit 15 mins at the ops meeting and I didn't even look at the etherpad [16:47:00] hm [16:47:06] should be automatic? 
[16:47:14] * aude can do it manually if needed [16:47:25] oooo kw body [16:47:26] cool [16:47:31] hasn't appeared at https://github.com/wikimedia/mediawiki/commits/wmf/1.27.0-wmf.10 [16:47:34] aude, yes please [16:47:42] (03CR) 10jenkins-bot: [V: 04-1] service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:47:47] (03CR) 10Ottomata: [C: 031] service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:47:50] (03CR) 10Faidon Liambotis: [C: 04-1] "This exclusion list doesn't make any sense. Our submodules aren't all third-party for starters. Plus, I don't see how it's an improvement " [puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [16:48:00] ok [16:48:10] <_joe_> why is pep8 failing? [16:48:45] <_joe_> my change passes strict pep8 [16:48:49] looks like not your fault [16:48:50] ERROR: unknown environment 'pep8' [16:48:59] <_joe_> ottomata: anyways, feel free to merge it as you like [16:49:01] k [16:49:02] merging [16:49:07] (03CR) 10Nuria: "Is wikimetrics not going to work on vagrant any longer?" [puppet] - 10https://gerrit.wikimedia.org/r/260687 (https://phabricator.wikimedia.org/T101763) (owner: 10Madhuvishy) [16:49:12] (03CR) 10Ottomata: [C: 032 V: 032] service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:49:14] thank you! [16:49:17] Krenair: wmf/1.27.0-wmf.10 [16:49:18] <_joe_> does the change make sense to you? 
[16:49:19] ah [16:49:24] https://gerrit.wikimedia.org/r/#/c/263865/ [16:49:26] yes _joe_ [16:49:26] (03CR) 10Alexandros Kosiaris: [C: 031] Incorrect syntax in RCStreamCollector.collect [puppet] - 10https://gerrit.wikimedia.org/r/263848 (owner: 10John Vandenberg) [16:49:28] makes sense [16:49:33] <_joe_> ottomata: I'm thinking of making service_checker a separate project [16:49:47] <_joe_> it just needs some documentation on how to write the specs [16:49:49] <_joe_> :P [16:50:03] <_joe_> then it might be useful to others outside the wmf too [16:50:53] (03PS2) 10Alexandros Kosiaris: pep8 W601: has_key() is deprecated, use not in [puppet] - 10https://gerrit.wikimedia.org/r/263845 (owner: 10Chad) [16:51:18] aude, see https://git.wikimedia.org/blob/mediawiki%2Fcore.git/8c1431602a36959e02c3892f84b5eda336a39078/.gitmodules [16:51:33] those first 5 extensions don't have branch= [16:51:40] therefore won't be getting auto submodule updates [16:51:45] oops, _joe_ [16:51:52] it depends on another commit :/ [16:51:55] will see if i can rebase it [16:52:06] (I think) [16:52:09] Krenair: ok :/ [16:52:10] (03CR) 10Ottomata: service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:52:45] there might be something strange with the scripts that cut the branch (the submodule bumps used to work) [16:52:51] but ok...
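Editor's sketch: Krenair's check ("those first 5 extensions don't have branch=") amounts to scanning `.gitmodules` for submodule sections without a `branch` key. A minimal way to automate that check is shown below. The sample content, the extension names, and the function name are all made up for illustration; whether a missing `branch` key is what actually stopped the automatic submodule bumps is the hypothesis voiced in the chat, not something this snippet establishes.

```python
import re

# Made-up .gitmodules content; the extension names and URLs are placeholders.
SAMPLE = """
[submodule "extensions/Alpha"]
    path = extensions/Alpha
    url = https://example.org/Alpha.git
[submodule "extensions/Beta"]
    path = extensions/Beta
    url = https://example.org/Beta.git
    branch = wmf/1.27.0-wmf.10
"""


def submodules_without_branch(text):
    """Return the names of submodule sections that have no `branch` key."""
    keys_by_name = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r'\[submodule "(.+)"\]', line)
        if header:
            current = header.group(1)
            keys_by_name[current] = set()
        elif current is not None and "=" in line:
            keys_by_name[current].add(line.split("=", 1)[0].strip())
    return [name for name, keys in keys_by_name.items() if "branch" not in keys]


missing = submodules_without_branch(SAMPLE)
print(missing)
```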
[16:53:22] (03PS2) 10Ottomata: service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:53:43] (03CR) 10Ottomata: [C: 032 V: 032] service_checker: support sending json-encoded requests [puppet] - 10https://gerrit.wikimedia.org/r/263864 (owner: 10Giuseppe Lavagetto) [16:53:47] aude, might be due to special_extensions config in the make-wmf-branch script [16:53:56] (03CR) 10Alexandros Kosiaris: [C: 032] pep8 W601: has_key() is deprecated, use not in [puppet] - 10https://gerrit.wikimedia.org/r/263845 (owner: 10Chad) [16:54:07] (03PS3) 10Alexandros Kosiaris: pep8 W601: has_key() is deprecated, use not in [puppet] - 10https://gerrit.wikimedia.org/r/263845 (owner: 10Chad) [16:54:30] aude, it's those 5 which are listed in there [16:54:53] Krenair: ok [16:55:40] probably caused by this commit: https://gerrit.wikimedia.org/r/#/c/263537/1 [16:55:58] RECOVERY - eventlogging-service-eventbus endpoints health on kafka1002 is OK: All endpoints are healthy [16:55:58] awesome _joe_ it works! [16:56:29] RECOVERY - eventlogging-service-eventbus endpoints health on kafka1001 is OK: All endpoints are healthy [16:56:39] <_joe_> ottomata: pretty amazing given I wrote it, heh?
[16:57:00] hehe [16:57:12] (03PS1) 10John Vandenberg: Add flake8 rule for selected modules [puppet] - 10https://gerrit.wikimedia.org/r/263866 [16:58:11] (03CR) 10jenkins-bot: [V: 04-1] Add flake8 rule for selected modules [puppet] - 10https://gerrit.wikimedia.org/r/263866 (owner: 10John Vandenberg) [16:59:28] PROBLEM - mathoid endpoints health on sca1002 is CRITICAL: /{format}/ (mass-energy equivalence (complete)) is CRITICAL: Could not fetch url http://10.64.48.29:10042/complete/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy equivalence (svg)) is CRITICAL: Could not fetch url http://10.64.48.29:10042/svg/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy [16:59:29] PROBLEM - restbase endpoints health on restbase1001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.0.220:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.0.220:7231/en.wikipedia.org/v1 [17:00:08] PROBLEM - mathoid endpoints health on sca1001 is CRITICAL: /{format}/ (mass-energy equivalence (complete)) is CRITICAL: Could not fetch url http://10.64.32.153:10042/complete/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy equivalence (svg)) is CRITICAL: Could not fetch url http://10.64.32.153:10042/svg/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-ener [17:00:14] !log krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/263865/ (duration: 00m 41s) [17:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:00:22] aude, ^ [17:00:24] please test [17:00:29] PROBLEM - cxserver endpoints health on sca1001 is CRITICAL: 
/v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) is CRITICAL: Could not fetch url http://10.64.32.153:8080/v1/mt/en/es/Apertium: Generic connection error: NoneType object has no attribute lower [17:00:46] ok [17:01:08] PROBLEM - cxserver endpoints health on sca1002 is CRITICAL: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) is CRITICAL: Could not fetch url http://10.64.48.29:8080/v1/mt/en/es/Apertium: Generic connection error: NoneType object has no attribute lower [17:01:28] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.0.223:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.0.223:7231/en.wikipedia.org/v1 [17:01:49] PROBLEM - mathoid endpoints health on scb1001 is CRITICAL: /{format}/ (mass-energy equivalence (complete)) is CRITICAL: Could not fetch url http://10.64.0.16:10042/complete/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy equivalence (svg)) is CRITICAL: Could not fetch url http://10.64.0.16:10042/svg/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy e [17:01:58] Krenair: i think i need to tweak the config [17:02:08] i think the data type is 'external-identifier' and not 'external-id' [17:02:15] ok [17:02:19] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.48.38:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url 
http://10.192.48.38:7231/en.wikipedia.org/ [17:02:40] give me a minute [17:03:19] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.32.178:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.32.178:7231/en.wikipedia.org/ [17:03:28] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.16.149:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.16.149:7231/en.wikipedia.org/ [17:03:29] mark paravoid godog _joe_: could you look into mathoid? [17:04:49] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.32.124:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.32.124:7231/en.wikipedia.or [17:05:15] gwicke, I have a feeling this might be a bad moment to be asking them to do this [17:05:21] (03CR) 10John Vandenberg: "putting the exclusion list in tox.ini means it can be shared between commands." 
[puppet] - 10https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: 10Hashar) [17:06:49] !log restarted mathoid on sca1001 and sca1002 [17:06:51] !log restarted mathoid on sca1001 and sca1002 [17:06:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:06:58] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.32.125:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.32.125:7231/en.wikipedia.or [17:07:05] Krenair: i checked on my local wiki [17:07:06] (03CR) 10John Vandenberg: "it would be nice if there was a tox job in jenkins that automatically invokes any tasks in the tox envlist." [puppet] - 10https://gerrit.wikimedia.org/r/263866 (owner: 10John Vandenberg) [17:07:32] it is 'external-id' (but the patch is such that the setting applies only to non- test wikis) [17:07:38] PROBLEM - restbase endpoints health on cerium is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.16.147:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.16.147:7231/en.wikipedia.org/v1/tra [17:07:52] so think it's ok [17:08:29] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.16.149:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to 
html) is CRITICAL: Could not fetch url http://10.192.16.149:7231/en.wikiped [17:08:56] (03CR) 10Tim Landscheidt: "If you launch a new Jessie instance and after first booting it has a mpt-statusd process running, then that's the issue." [puppet] - 10https://gerrit.wikimedia.org/r/263773 (owner: 10Andrew Bogott) [17:09:16] (03PS2) 10John Vandenberg: Add flake8 rule for selected modules [puppet] - 10https://gerrit.wikimedia.org/r/263866 [17:09:58] PROBLEM - restbase endpoints health on xenon is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.0.200:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.0.200:7231/en.wikipedia.org/v1/transf [17:09:59] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.16.151:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.16.151:7231/en.wikiped [17:10:31] (03CR) 10jenkins-bot: [V: 04-1] Add flake8 rule for selected modules [puppet] - 10https://gerrit.wikimedia.org/r/263866 (owner: 10John Vandenberg) [17:10:58] PROBLEM - restbase endpoints health on restbase1003 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.32.159:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.32.159:7231/en.wikipedia.org/ [17:11:19] PROBLEM - mathoid endpoints health on 
scb1002 is CRITICAL: /{format}/ (mass-energy equivalence (complete)) is CRITICAL: Could not fetch url http://10.64.16.21:10042/complete/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy equivalence (svg)) is CRITICAL: Could not fetch url http://10.64.16.21:10042/svg/: Generic connection error: NoneType object has no attribute lower: /{format}/ (mass-energy [17:12:08] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.16.152:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.16.152:7231/en.wikipedia.or [17:12:41] !log restarted mathoid on scb1001 and scb1002 [17:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:13:18] PROBLEM - restbase endpoints health on restbase1004 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.32.160:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.32.160:7231/en.wikipedia.org/ [17:13:22] <_joe_> uh ottomata ^^ [17:13:26] <_joe_> I can fix it [17:13:50] is this just a service_checker issue? [17:14:03] <_joe_> gwicke: it is [17:14:06] hm! [17:14:11] (03PS1) 10Giuseppe Lavagetto: service_checker: fixup [puppet] - 10https://gerrit.wikimedia.org/r/263870 [17:14:14] <_joe_> gwicke: I'm fixing it [17:14:23] ah because header content type was set to none? 
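Editor's sketch: the repeated "NoneType object has no attribute lower" alerts are what Python reports when `.lower()` is called on a value that is `None`, which fits ottomata's guess that the content-type header was set to `None`. The snippet below reproduces that failure mode and one defensive alternative. Both functions are hypothetical illustrations; they are not the service_checker code or the fix that was merged, and where exactly the `.lower()` call sat in the real stack is not stated in the log.

```python
def content_type_buggy(headers):
    # Failure mode: a header explicitly set to None reaches .lower().
    return headers.get("content-type").lower()


def content_type_safe(headers):
    # Treat a missing or None header as an empty string before normalising.
    return (headers.get("content-type") or "").lower()


headers = {"content-type": None}

try:
    content_type_buggy(headers)
    error = None
except AttributeError as exc:
    error = str(exc)

safe_value = content_type_safe(headers)
print(error, repr(safe_value))
```

An alternative to guarding every read is to leave the header out of the request entirely when no content type is given, which is the direction ottomata's follow-up question in the log points at.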
[17:14:37] (03PS2) 10RobH: grant aude access to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/262836 (https://phabricator.wikimedia.org/T122977) [17:14:40] <_joe_> yeah I changed that before committing, really stupid [17:14:47] _joe_: okay, thanks [17:14:50] (03PS2) 10Giuseppe Lavagetto: service_checker: fixup [puppet] - 10https://gerrit.wikimedia.org/r/263870 [17:15:05] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/263870 (owner: 10Giuseppe Lavagetto) [17:15:07] _joe_: do you need to set it at all? [17:15:09] if it is not given? [17:15:15] <_joe_> ottomata: yep [17:15:18] 10Ops-Access-Requests, 6operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1931609 (10RobH) I missed this early in the week (sorry!) I'll merge at 10AM (after our ops meeting that is now in session.) Sorry about the delay, completely my fault! [17:15:58] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.48.37:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.48.37:7231/en.wikipedia.org/ [17:16:15] <_joe_> so in ~ 20 minutes this will be fixed [17:17:02] _joe_: might be worth logging changes that can potentially trigger alerts [17:17:38] <_joe_> gwicke: those changes are in puppet, so it's "logged" already [17:17:49] <_joe_> we don't usually log merging puppet changes [17:18:10] <_joe_> also, ottomata merged this, so you might want to discuss it with him instead [17:18:29] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url 
http://10.192.16.150:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.16.150:7231/en.wikiped [17:19:40] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.192.16.153:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.192.16.153:7231/en.wikipedia.or [17:22:14] I see the point about avoiding duplication of log entries, but checking many places doesn't scale too well [17:22:59] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.48.110:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.48.110:7231/en.wikipedia.org/ [17:23:08] PROBLEM - restbase endpoints health on restbase1002 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.0.221:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.0.221:7231/en.wikipedia.org/v1 [17:23:49] PROBLEM - restbase endpoints health on restbase1005 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.48.99:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection 
error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.48.99:7231/en.wikipedia.org/v1 [17:24:48] PROBLEM - restbase endpoints health on restbase1006 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Could not fetch url http://10.64.48.100:7231/en.wikipedia.org/v1/media/math/check/tex: Generic connection error: NoneType object has no attribute lower: /transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.48.100:7231/en.wikipedia.org/ [17:24:54] <_joe_> gwicke: it's logged here, actually, via grrrit-wm [17:25:26] (03CR) 10Florianschmidtwelzow: "Sorry, then I misinterpreted your -1 as "this change needs more work", sorry :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [17:26:22] 6operations: Google Webmaster Tools - 1000 domain limit - https://phabricator.wikimedia.org/T99132#1931627 (10Tbayer) Another aspect that both makes this more timely and possibly presents another downside of splitting the account: It's now possible to associate an Android app to the Search Console, see https:/... 
[17:26:27] (03PS3) 10Yuvipanda: tools: Add simple command to log tool invocations [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) [17:28:29] _joe_: that's a good point; I guess I should switch my attention from SAL to this channel [17:28:43] (03PS1) 10Giuseppe Lavagetto: service_checker: additional fixup [puppet] - 10https://gerrit.wikimedia.org/r/263872 [17:28:46] <_joe_> gwicke: I usually do both in fact [17:28:56] <_joe_> and, python knows how to be embarassing :/ [17:28:59] RECOVERY - restbase endpoints health on restbase1006 is OK: All endpoints are healthy [17:29:06] (03CR) 10jenkins-bot: [V: 04-1] service_checker: additional fixup [puppet] - 10https://gerrit.wikimedia.org/r/263872 (owner: 10Giuseppe Lavagetto) [17:29:23] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/263872 (owner: 10Giuseppe Lavagetto) [17:29:29] (03PS2) 10Giuseppe Lavagetto: service_checker: additional fixup [puppet] - 10https://gerrit.wikimedia.org/r/263872 [17:29:33] (03CR) 10Giuseppe Lavagetto: [V: 032] service_checker: additional fixup [puppet] - 10https://gerrit.wikimedia.org/r/263872 (owner: 10Giuseppe Lavagetto) [17:32:09] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [17:32:39] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [17:33:19] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [17:33:29] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [17:35:40] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [17:36:59] RECOVERY - mathoid endpoints health on scb1002 is OK: All endpoints are healthy [17:37:49] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [17:38:48] RECOVERY - restbase endpoints health on restbase1003 is OK: All endpoints are healthy [17:38:59] RECOVERY - 
restbase endpoints health on restbase1004 is OK: All endpoints are healthy [17:41:39] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [17:44:09] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [17:45:20] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [17:48:39] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [17:48:48] RECOVERY - restbase endpoints health on restbase1002 is OK: All endpoints are healthy [17:49:30] RECOVERY - restbase endpoints health on restbase1005 is OK: All endpoints are healthy [17:51:28] (03CR) 10Yuvipanda: tools: Add simple command to log tool invocations (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: 10Yuvipanda) [17:54:19] RECOVERY - cxserver endpoints health on sca1002 is OK: All endpoints are healthy [17:54:49] RECOVERY - mathoid endpoints health on sca1002 is OK: All endpoints are healthy [17:55:09] RECOVERY - restbase endpoints health on restbase1001 is OK: All endpoints are healthy [17:55:29] RECOVERY - mathoid endpoints health on sca1001 is OK: All endpoints are healthy [17:55:56] (03CR) 10Rush: "this is pretty neat idea, I'm not sure where you are going with it but two questions" [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: 10Yuvipanda) [17:55:59] RECOVERY - cxserver endpoints health on sca1001 is OK: All endpoints are healthy [17:56:33] (03PS2) 10Faidon Liambotis: Add dduvall to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/263676 (owner: 10Greg Grossmeier) [17:56:43] (03CR) 10Faidon Liambotis: [C: 032] Add dduvall to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/263676 (owner: 10Greg Grossmeier) [17:56:49] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [17:56:58] 
10Ops-Access-Requests, 6operations, 10Analytics: add mforns, milimetric, nuria,ottomata, madhuvishy and joal to piwik-roots - https://phabricator.wikimedia.org/T122325#1931672 (10RobH) Meetign result: We approved the analytics group folks in this request to add mforns, milimetric, nuria,ottomata, madhuvishy... [17:57:09] RECOVERY - mathoid endpoints health on scb1001 is OK: All endpoints are healthy [17:57:12] thanks paravoid :) [17:57:13] Jeff_Green: Mind if I kick off the monthly openvas scan for eqiad now? Finally got it patched and running yesterday.. [17:57:27] csteipp: go for it [17:57:35] marxarelli: you now have deploy powers [17:57:39] well, once puppet runs [17:57:47] saw that! [17:57:48] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Access for new Analytics Opsen: Luca Toscano - https://phabricator.wikimedia.org/T122925#1931673 (10RobH) Ops Meeting result: approved [17:57:48] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [17:58:11] (03CR) 10Eevans: "> Not currently being branched..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [17:58:43] marxarelli: I thought of you with the bowie news. can I just say, you are not my bowie surrogate so thanks an. 
[17:58:48] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [17:58:49] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [17:59:13] chasemp: :( [17:59:23] hard to talk about that still [17:59:24] I meant now [17:59:29] instead of not [17:59:36] man that took an unintended turn [18:00:12] haha, thought that would be egregious to just remind me that i'm _still_ not your bowie surrogate [18:00:40] I have a heart dude [18:00:48] i got what you meant :) [18:01:09] (03PS3) 10RobH: grant aude access to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/262836 (https://phabricator.wikimedia.org/T122977) [18:01:22] * RobH stalks zuul for merge [18:01:26] 6operations: reduce amount of remaining Ubuntu 12.04 systems - https://phabricator.wikimedia.org/T123525#1931694 (10Dzahn) 3NEW [18:02:15] (03CR) 10RobH: [C: 032] grant aude access to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/262836 (https://phabricator.wikimedia.org/T122977) (owner: 10RobH) [18:02:38] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [18:03:07] !log zotero deploying translators 0476aa0 [18:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:04:31] (03CR) 10Chad: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/263837 (owner: 10Chad) [18:04:32] 6operations, 10ops-eqiad: db1052 degraded RAID - https://phabricator.wikimedia.org/T122703#1931707 (10akosiaris) 5Open>3Resolved a:3akosiaris ``` Enclosure Device ID: 32 Slot Number: 0 Drive's position: DiskGroup: 0, Span: 0, Arm: 0 Enclosure position: 1 Device Id: 0 WWN: 5000C50053CB4560 Sequence Number... [18:07:47] 10Ops-Access-Requests, 6operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1931718 (10RobH) 5stalled>3Resolved Aude's access to analytics-privatedata-users is now live. 
[18:08:41] (03CR) 10Dzahn: [C: 031] setup shell/sudo access for new employee Luca Toscano [puppet] - 10https://gerrit.wikimedia.org/r/263078 (https://phabricator.wikimedia.org/T122925) (owner: 10RobH) [18:09:19] <_joe_> whoa [18:09:32] all the klines ..wow [18:09:43] is that irccloud being killed? [18:09:51] looks like it [18:09:51] were those all irccloud? [18:09:53] yeah I suspect so [18:09:54] mutante: yea that patchset has a rebase path conflict im pulling it down to fix [18:10:25] a disadvantage of centralization [18:10:32] RobH: *nod* [18:10:52] the few, the proud, the paranoid self operations of irc bouncers ;] [18:11:33] (03CR) 10Gergő Tisza: Configure bot passwords (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263804 (https://phabricator.wikimedia.org/T123451) (owner: 10Gergő Tisza) [18:15:55] (03PS2) 10RobH: setup shell/sudo access for new employee Luca Toscano [puppet] - 10https://gerrit.wikimedia.org/r/263078 (https://phabricator.wikimedia.org/T122925) [18:16:31] (03CR) 10RobH: [C: 032] setup shell/sudo access for new employee Luca Toscano [puppet] - 10https://gerrit.wikimedia.org/r/263078 (https://phabricator.wikimedia.org/T122925) (owner: 10RobH) [18:18:03] elukey: ^ ok this is your last hour of calm, then you gotta get to work! (puppet change is live, all the systems should have your key in less than an hour) [18:18:05] ;] [18:18:50] also if you are still in the office, you should get a yubikey neo from oit [18:18:57] (so we can set you up with 2FA) [18:20:33] 10Ops-Access-Requests, 6operations: Access for new Analytics Opsen: Luca Toscano - https://phabricator.wikimedia.org/T122925#1931765 (10RobH) [18:20:50] 10Ops-Access-Requests, 6operations: Access for new Analytics Opsen: Luca Toscano - https://phabricator.wikimedia.org/T122925#1931770 (10RobH) 5Open>3Resolved a:3RobH Luca's access is now live. [18:23:27] RobH: thanks! 
I was about to ask for the yubikey :) [18:24:44] I'll be in the office tomorrow for keysigning [18:24:51] i saw daniel was going to be as well [18:25:13] (03CR) 10Chad: [C: 032] add OfficeIT namespace to wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: 10ArielGlenn) [18:25:14] YuviPanda: if you make it into the office before lunch we can do ops team lunch! [18:25:34] (this isnt knocking yuvi's working since he stays up all night, merely his sleep schedule) [18:25:53] hey guys, could you please restart zotero and mobileapps? [18:26:00] on sca100x and scb100x respectively [18:26:05] (03Merged) 10jenkins-bot: add OfficeIT namespace to wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263768 (https://phabricator.wikimedia.org/T123383) (owner: 10ArielGlenn) [18:27:19] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: OfficeIT namespace on wikitech (duration: 00m 31s) [18:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:28:15] mobrovac: sorry, i tabbed away from chat and just saw your question [18:28:29] is it just a service restart for each of those and why? [18:29:00] thanks RobH :) [18:30:08] mobrovac: basically im not supposed to just restart things because someone asks, but understand why im restarting it. So I'm asking =] [18:31:46] RobH: fair enough :) [18:32:06] you guys doing work or its just run out of memory and needs restart or what? [18:32:07] RobH: i deployed new code for both, but miss the corresponding sudo rights [18:32:09] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations, 6Discovery, 10Maps: Please grant admin Cassandra access to maps-admins - https://phabricator.wikimedia.org/T122465#1931809 (10Dzahn) 5Open>3Resolved a:3Dzahn This has been brought up in the ops meeting (it wasn't tagged as access request b...
[18:32:13] ahh, that works [18:32:34] 'i restarted them due to marko's new code deploy' sounds much better than 'marko said so' [18:32:38] will do now [18:32:42] :)))) [18:33:08] i need to create access tickets for those [18:33:39] !log restarted zotero/mobileapps on sca1*/scb1* respectively [18:33:41] 6operations, 10vm-requests: Site: (1) VM request for url_downloader - https://phabricator.wikimedia.org/T123472#1931816 (10Dzahn) a:3Dzahn [18:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, RobH [18:33:55] !log restarted zotero/mobileapps on sca1*/scb1* respectively for marko's code deploy [18:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, RobH [18:33:59] (i hit enter too soon) [18:34:08] mobrovac: successful [18:34:25] thnx RobH! [18:34:27] heh, well, our ops meeting next week is likely on wednesday [18:34:34] so keep that in mind if you file a sudo request this week [18:34:39] yup, sure [18:34:41] (due to the US holiday on Monday) [18:34:51] welcome [18:35:06] apergos: im using your salt master ;] [18:46:22] (03PS1) 10Dzahn: introduce alsafi.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [18:48:15] (03CR) 10Dzahn: [C: 04-1] "eh.. no.. needs public IP" [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) (owner: 10Dzahn) [18:50:00] 6operations, 10Gerrit, 10hardware-requests: Need spare server to upgrade/migrate gerrit - https://phabricator.wikimedia.org/T123132#1931898 (10RobH) I'd like to use lead for this, but we need the memory upgraded from 16 to 32GB. I'll create a sub-task for @cmjohnson to check if he can upgrade with onsite sp... 
[18:51:03] 6operations, 10ops-eqiad, 10Gerrit, 10hardware-requests: check for memory upgrade for lead - https://phabricator.wikimedia.org/T123531#1931904 (10RobH) 3NEW a:3Cmjohnson [18:53:03] 6operations, 10ops-eqiad, 10Gerrit, 10hardware-requests: check for memory upgrade for lead - https://phabricator.wikimedia.org/T123531#1931932 (10RobH) [18:56:12] (03PS2) 10Dzahn: introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [18:56:39] (03CR) 10jenkins-bot: [V: 04-1] introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) (owner: 10Dzahn) [18:57:13] (03PS3) 10Dzahn: introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [18:57:30] RobH: nice [18:57:34] (03CR) 10jenkins-bot: [V: 04-1] introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) (owner: 10Dzahn) [18:58:08] RobH: nah, still dealing with the mold shit [18:58:38] YuviPanda: that sucks dude [18:58:45] (03PS4) 10Dzahn: introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [18:58:47] dont get sick. [18:59:56] (03PS5) 10Dzahn: introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [19:00:04] marxarelli: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160113T1900). Please do the needful. [19:02:25] (03CR) 10Yuvipanda: "It's not a blocking operation, and I'm not sure how syslog would be useful... we can add it later if needed, maybe? 
Since it'll have to be" [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: 10Yuvipanda) [19:04:24] (03PS6) 10Dzahn: introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) [19:04:28] (03PS4) 10Yuvipanda: tools: Add simple command to log tool invocations [puppet] - 10https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) [19:05:10] (03CR) 10Dzahn: [C: 032] introduce alsafi.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/263890 (https://phabricator.wikimedia.org/T123472) (owner: 10Dzahn) [19:06:00] marxarelli isn't around, I'll run the train. [19:07:25] (03PS1) 10Thcipriani: group1 wikis to 1.27.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263898 [19:07:38] 6operations, 10vm-requests, 5Patch-For-Review: Site: (1) VM request for url_downloader - https://phabricator.wikimedia.org/T123472#1931968 (10Dzahn) 5Open>3Resolved Wed Jan 13 19:06:25 2016 - INFO: Selected nodes for instance alsafi.wikimedia.org via iallocator hail: ganeti2006.codfw.wmnet, ganeti2004.c... 
[19:07:39] 6operations: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1931970 (10Dzahn) [19:08:00] 6operations: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1931979 (10Dzahn) a:3Dzahn [19:09:36] (03CR) 10Thcipriani: [C: 032] "TRAIN" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263898 (owner: 10Thcipriani) [19:09:58] (03Merged) 10jenkins-bot: group1 wikis to 1.27.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263898 (owner: 10Thcipriani) [19:09:59] OMGTRAIN, GET OUT OF THE WAY [19:10:07] :) [19:10:29] choo choo [19:10:42] add cowcatcher [19:11:11] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.10 [19:11:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:11:21] mutante: :) [19:12:18] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: puppet fail [19:16:45] hmmm, lots of resourceloader warnings showing up in mediawiki-errors log [19:16:58] 10Ops-Access-Requests, 6operations, 6Services, 3Mobile-Content-Service: Allow mobrovac to restart MobileApps - https://phabricator.wikimedia.org/T123540#1932040 (10mobrovac) 3NEW [19:17:16] 10Ops-Access-Requests, 6operations, 6Services, 3Mobile-Content-Service: Allow mobrovac to restart MobileApps - https://phabricator.wikimedia.org/T123540#1932048 (10mobrovac) @GWicke could you please approve? [19:17:24] (03PS1) 10Dzahn: install_server: add alsafi to DHCP and netboot [puppet] - 10https://gerrit.wikimedia.org/r/263904 (https://phabricator.wikimedia.org/T123386) [19:17:37] 10Ops-Access-Requests, 6operations, 6Services, 3Mobile-Content-Service: Allow mobrovac to restart MobileApps - https://phabricator.wikimedia.org/T123540#1932052 (10GWicke) Approved. 
[19:17:58] (03PS1) 10Samtar: Lift IP rate limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) [19:18:07] (03CR) 10Dzahn: [C: 032] install_server: add alsafi to DHCP and netboot [puppet] - 10https://gerrit.wikimedia.org/r/263904 (https://phabricator.wikimedia.org/T123386) (owner: 10Dzahn) [19:24:57] (03PS2) 10Samtar: Lift IP rate limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) [19:27:08] !log dropping eventlogging tables from MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 m4-master log database. These are too large and have been blacklisted from mysql. No more events will be inserted into mysql for these. We are attempting to help replication catch up on the analytics-store slave. [19:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:29:28] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [19:37:54] (03PS1) 10ArielGlenn: dumps: configure for parallelized runs for zhwiki, metawiki [puppet] - 10https://gerrit.wikimedia.org/r/263912 [19:38:51] (03PS2) 10ArielGlenn: dumps: configure for parallelized runs for zhwiki, metawiki [puppet] - 10https://gerrit.wikimedia.org/r/263912 [19:39:39] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [19:40:00] cmjohnson1: , yt? [19:40:11] yes [19:42:15] 6operations, 10ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1932134 (10Ottomata) 3NEW a:3Cmjohnson [19:42:19] cmjohnson1: https://phabricator.wikimedia.org/T123546 [19:42:42] (03CR) 10Matanya: [C: 031] adding w:hewiki to wgImportSources. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/263842 (owner: 10Steinsplitter) [19:43:05] ottomata: okay....this will require me to do several reboots, etc. Is it something I need to schedule? [19:43:53] i think so! [19:43:55] nuria: ^^ [19:44:01] ottomata: yessir [19:44:10] cmjohnson1: this is a tricky one [19:44:15] this is a mysql master [19:44:19] that i've never really looked at before [19:44:23] and jaime is on vacation [19:44:32] i'm not sure what uses it [19:44:43] perhaps just eventlogging? [19:44:44] cmjohnson1: we can schedule an outage if you let us know of a window that works for you [19:45:06] 6operations, 10ops-eqiad, 10Analytics: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1932143 (10Nuria) [19:45:24] cmjohnson1: ticket will be best place to figure out a window [19:45:30] Hrm..so it's out of warranty ...so several reboots probably not necessary but I need to see if I have spare DIMM [19:45:40] but we'll still neeed to schedule a few mins of downtime [19:45:49] yep..i will update it there thanks nuria [19:45:57] thcipriani: task filed? [19:46:05] re the RL log spam [19:46:21] (I missed your comment until now) [19:46:44] (03PS1) 10Dzahn: site.pp: add alsafi.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/263915 (https://phabricator.wikimedia.org/T123386) [19:47:21] (03PS4) 10ArielGlenn: dumps: set up but don't enable script for dumps to run from cron [puppet] - 10https://gerrit.wikimedia.org/r/263807 (https://phabricator.wikimedia.org/T107750) [19:47:33] (03CR) 10Dzahn: [C: 032] site.pp: add alsafi.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/263915 (https://phabricator.wikimedia.org/T123386) (owner: 10Dzahn) [19:48:18] greg-g: https://phabricator.wikimedia.org/T123547 [19:49:30] thcipriani: ah, it went back down, right? 
[19:50:50] 6operations, 10Traffic, 5Patch-For-Review, 7Performance: Varnish apparently unconditionally varies on session cookies - https://phabricator.wikimedia.org/T122673#1932169 (10GWicke) Thanks, @bblack @faidon @ema. [19:55:26] !log elasticsearch: wikimania2017_content was reporting as missing in logstash, ran updateSearchIndexConfig. messy aliases? Seems to be working again. [19:55:30] ebernhardson: ping ^ [19:55:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:55:55] !log *wikimania2017wiki_content [19:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:56:34] 6operations, 5Patch-For-Review: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1932209 (10Dzahn) - VM created - OS installed (jessie) - added to puppet and salt alsafi.wikimedia.org can now have a role applied to it [19:56:47] 6operations: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1932210 (10Dzahn) [19:56:56] 6operations: request VM for url-downloader in codfw - https://phabricator.wikimedia.org/T123386#1932214 (10Dzahn) 5Open>3Resolved [19:56:57] 6operations, 5Patch-For-Review: url-downloader should be set up more redundantly - https://phabricator.wikimedia.org/T122134#1932215 (10Dzahn) [19:57:42] (03PS1) 10Andrew Bogott: Don't include mpt-status in new images. [puppet] - 10https://gerrit.wikimedia.org/r/263917 [19:57:57] 6operations, 5Patch-For-Review: url-downloader should be set up more redundantly - https://phabricator.wikimedia.org/T122134#1897515 (10Dzahn) A VM in codfw has been created, OS installed (jessie), added to puppet and salt with standard. The name is **alsafi.wikimedia.org**. [19:58:32] (03PS2) 10Andrew Bogott: Don't include mpt-status in new images. [puppet] - 10https://gerrit.wikimedia.org/r/263917 [19:58:54] (03PS3) 10Andrew Bogott: Don't include mpt-status in new images. 
[puppet] - https://gerrit.wikimedia.org/r/263917
[20:01:14] !log dropped MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 from analytics-store eventlogging slave db
[20:01:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:03:07] operations, Mail: remove exim alias - dapatrick - https://phabricator.wikimedia.org/T123454#1932261 (Dzahn) done ``` -# Darian Anthony Patrick -dapatrick: dpatrick - ```
[20:03:17] operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1932274 (Dzahn)
[20:03:19] operations, Mail: remove exim alias - dapatrick - https://phabricator.wikimedia.org/T123454#1932273 (Dzahn) Open>Resolved
[20:05:54] operations, Mail: remove exim alias - chase, chasemp, rush - https://phabricator.wikimedia.org/T123453#1932292 (Dzahn) done ``` -# Chase Pettet -chase: cpettet -chasemp: cpettet -rush: cpettet - ```
[20:05:57] (CR) Andrew Bogott: [C: 2] Don't include mpt-status in new images. [puppet] - https://gerrit.wikimedia.org/r/263917 (owner: Andrew Bogott)
[20:06:06] operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1932295 (Dzahn)
[20:06:07] operations, Mail: remove exim alias - chase, chasemp, rush - https://phabricator.wikimedia.org/T123453#1932294 (Dzahn) Open>Resolved
[20:08:12] chasemp: do you even use all of these? :)
[20:08:34] I missed something I think which these is the all
[20:08:49] email aliases :)
[20:09:02] oh ha, well the rush/chasemp complication is very real
[20:09:08] not that it matters, it's free
[20:09:22] and as of 3 minutes ago, not even managed by us :)
[20:09:29] other than that cpettet is the professional assumed, I'm not sure what else there is
[20:09:34] so, kinda
[20:24:22] (CR) Reedy: "It should be fine to merge this now. At worst, we have the below as a back compat until the core patch is everywhere" [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: Florianschmidtwelzow)
[20:25:49] (CR) Alex Monk: Remove $wgCopyrightIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: Florianschmidtwelzow)
[20:28:44] (CR) Luke081515: [C: -1] Lift IP rate limit (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) (owner: Samtar)
[20:31:57] (PS3) Samtar: Lift IP rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458)
[20:32:19] Puppet, operations, Continuous-Integration-Scaling, WorkType-NewFunctionality: On Jessie, puppet does not start zuul-merger via init scripts - https://phabricator.wikimedia.org/T118861#1932403 (greg)
[20:32:28] operations, Continuous-Integration-Scaling, Patch-For-Review, WorkType-NewFunctionality: Remove CI root access from scandium - https://phabricator.wikimedia.org/T116921#1932408 (greg)
[20:32:48] Luke081515: yeah my bad, that was silly using an array - fixed ^
[20:33:00] * Luke081515 looks
[20:33:33] (PS4) Luke081515: Lift IP rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) (owner: Samtar)
[20:33:40] (CR) Luke081515: [C: 1] Lift IP rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) (owner: Samtar)
[20:34:28] myrcx: No +2 verified ;)
[20:34:35] unit tests complete
[20:35:21] (PS8) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[20:36:49] operations, ops-codfw, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932440 (ArielGlenn) NEW
[20:37:05] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932448 (ArielGlenn)
[20:37:26] myrcx: sry, I wanted to write "Now" +2 verified instead of "no"...
[20:37:41] operations, Phabricator, WorkType-Maintenance: Bahodir Mansurov locked out of Phabricator - https://phabricator.wikimedia.org/T123334#1932456 (greg)
[20:38:22] Luke081515: No worries :P I was confused for a second - thanks for reviewing!
[20:38:30] no problem ;)
[20:39:26] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932440 (ArielGlenn) I'd like 16GB ram, 16 cores if we have a spare laying around that meets those specs. This is lower powered than neodymium; my gut feel is tha...
[20:40:50] operations, Datasets-General-or-Unknown: investigate rsync between dcs with encryption - https://phabricator.wikimedia.org/T123560#1932562 (ArielGlenn) NEW a:ArielGlenn
[20:42:20] operations, Mail: remove exim aliases - mgodwin - https://phabricator.wikimedia.org/T123561#1932571 (JKrauska) NEW a:Dzahn
[20:43:19] (CR) Aude: "@thiemo that's why it's a setting, so it can be configured per-wiki." [mediawiki-config] - https://gerrit.wikimedia.org/r/263046 (https://phabricator.wikimedia.org/T123112) (owner: Thiemo Mättig (WMDE))
[20:50:31] (CR) Hashar: "We can process submodules from operations/puppet though, that will let us make sure all repos adhere to the same standard. If we were to p" [puppet] - https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: Hashar)
[20:50:38] operations, Mail: remove exim alias - mkahn - https://phabricator.wikimedia.org/T123562#1932585 (JKrauska) NEW a:Dzahn
[20:50:38] (CR) Subramanya Sastry: "ori, i am happy with this now, and is ready for us to merge into ruthenium and see what breaks. :)" [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[20:51:28] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/163814 (owner: Hashar)
[20:53:10] (CR) Mobrovac: [C: -1] "Second round of comments." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[20:55:33] subbu: reviewing now
[20:55:50] oh, mobrovac -1ed it. :)
[20:57:26] ori, ^ i'm going to address mobrovac's comments first.
[20:58:24] (CR) Mdann52: [C: 1] Lift IP rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/263905 (https://phabricator.wikimedia.org/T123458) (owner: Samtar)
[21:00:05] gwicke cscott arlolra subbu bearND mdholloway: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160113T2100).
[21:01:13] PROBLEM - Host mr1-esams.oob is DOWN: PING CRITICAL - Packet loss = 100%
[21:01:34] no parsoid deploy today
[21:02:49] !log Disabling Puppet on mw1013 (eqiad jobrunner) to hack in some debug logging into GWT jobs.
[21:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:03:23] operations, Phabricator, WorkType-Maintenance: Bahodir Mansurov locked out of Phabricator - https://phabricator.wikimedia.org/T123334#1932636 (bmansurov) Thanks!
[21:04:11] operations, hardware-requests: eqiad: (2) servers request for ORES - https://phabricator.wikimedia.org/T119598#1932637 (RobH) @yuvipanda: Your reasoning seems sound to me, we'll call these oresdb1xxx.
[21:05:44] operations, hardware-requests: eqiad: (2) servers request for ORES - https://phabricator.wikimedia.org/T119598#1932642 (yuvipanda) I would suggest oresredis1xxx since they'll be used both as queue and cache machines, no 'db' type things there (we can flush them whenever)
[21:06:32] PROBLEM - puppet last run on mw2038 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:07:18] (CR) Mdann52: [C: 1] Rename two namespaces at bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/247093 (https://phabricator.wikimedia.org/T115812) (owner: Luke081515)
[21:07:23] RECOVERY - Host mr1-esams.oob is UP: PING OK - Packet loss = 0%, RTA = 89.43 ms
[21:10:31] (PS9) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[21:10:43] (CR) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services (5 comments) [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[21:11:25] (CR) jenkins-bot: [V: -1] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[21:13:51] Hmm. Using IPv6, bast1001.wikimedia.org doesn't seem to want to work. Works fine via IPv4.
[21:14:11] (PS10) Subramanya Sastry: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778)
[21:15:32] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: puppet fail
[21:16:18] can someone check why gerrit doesn't correctly replicate https://github.com/wikimedia/data-values-value-view anymore? tracked: https://phabricator.wikimedia.org/T123521
[21:16:46] (PS1) Hashar: Revert "Get rid of .pep8 files" [puppet] - https://gerrit.wikimedia.org/r/263925 (https://phabricator.wikimedia.org/T114887)
[21:16:54] (PS2) Hashar: Revert "Get rid of .pep8 files" [puppet] - https://gerrit.wikimedia.org/r/263925 (https://phabricator.wikimedia.org/T114887)
[21:22:00] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932724 (RobH) a:ArielGlenn So we don't have any systems that are spare and have that core count without also having an insane amount of storage. We have the s...
[21:22:40] apergos: if you are awake, i just updated the salt master ticket
[21:22:45] who knows with jetlag ;]
[21:22:58] I'm awake and working
[21:23:08] goal for tomorrow: wake up by 3 pm
[21:23:20] cool, so i updated the ticket with a few options for allocation
[21:23:32] i wasnt sure if your core count meant with or without hyperthreading, but i assume with
[21:23:40] (since its very high and higher than current salt masters otherwise)
[21:25:56] (CR) Dzahn: [C: 2] Revert "Get rid of .pep8 files" [puppet] - https://gerrit.wikimedia.org/r/263925 (https://phabricator.wikimedia.org/T114887) (owner: Hashar)
[21:26:27] apergos: i typically force myself into a no sleep hellscape until its the bedtime of my proper timezone
[21:27:11] (PS1) Chad: Gerrit: Remove root-level /gitweb redirection, nothing uses it [puppet] - https://gerrit.wikimedia.org/r/263927
[21:27:11] it makes for a useless return day though =P
[21:27:43] !log Updated cirrus search mappings for testwikidata and wikidata to add new fields
[21:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:28:09] mutante: thanks :-)
[21:28:13] my method also relies on my return trip to make days later, heh
[21:28:29] hashar: yw
[21:28:48] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/163814 (owner: Hashar)
[21:29:35] operations, Mail: remove benefits@ - https://phabricator.wikimedia.org/T123567#1932778 (eliza) NEW a:Dzahn
[21:30:26] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932802 (ArielGlenn) You're right, I checked facter and forgot about hyperthreading which is enabled over there on neodymium. So we could go 2 6-cores and be fine...
[21:30:43] RobH: I did the no sleep hell on the return flight but it didn't help any
[21:31:04] also I've answered on the ticket as you see, thanks for catching that
[21:31:12] sucks for no sleep =[
[21:31:58] so in order to meet my goal of awake at 3 pm I must be in bed by 7 am
[21:32:54] PROBLEM - puppet last run on wtp2004 is CRITICAL: CRITICAL: puppet fail
[21:33:05] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1932829 (RobH) a:ArielGlenn>mark @Mark, I'm assigning this to you for your approval of spare system allocation to a salt-master in codfw. The system alloca...
[21:33:23] RECOVERY - puppet last run on mw2038 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:34:55] (PS2) Chad: Gerrit: Remove old gitweb redirects, broken [puppet] - https://gerrit.wikimedia.org/r/263927
[21:35:21] apergos: cool, updated and assigned to mark for his approval
[21:38:47] sweet, thanks
[21:39:28] operations, Performance-Team, Release-Engineering-Team, Traffic, and 2 others: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1932925 (greg)
[21:39:44] I think wikibugs needs a limiter to stop it getting kicked for flooding :P
[21:40:16] (PS1) Mobrovac: RESTBase: Create a separate project definition for wiktionaries [puppet] - https://gerrit.wikimedia.org/r/263928
[21:40:44] !log mobileapps deployed c9e7e28
[21:40:45] could an ops help me with an RB config change perhaps? ^^
[21:40:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:40:58] (PS1) Catrope: Enable the cross-wiki notifications beta feature in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/263929
[21:41:39] Krenair: Just to double-check, with our newfangled fanciness this ---^^ will work, right? (Extension wg variable without a wmg variable)
[21:42:42] I would expect it to
[21:42:51] give it a go
[21:43:10] OK
[21:43:30] Also, I've forgotten what the process around this is: do I need to put a beta-only change like this in SWAT, or can it just be merged?
[21:44:23] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:44:50] !log restbase start deploy of 559a13a
[21:44:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:44:58] RoanKattouw, I don't think we ever decided that question :P
[21:45:08] operations, Mail: remove exim alias -- eekim - https://phabricator.wikimedia.org/T123572#1932972 (JKrauska) NEW a:Dzahn
[21:45:28] I'd probably just do it since you're only touching a -labs file to be honest RoanKattouw
[21:45:41] OK
[21:45:57] (CR) Catrope: [C: 2] Enable the cross-wiki notifications beta feature in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/263929 (owner: Catrope)
[21:46:21] (Merged) jenkins-bot: Enable the cross-wiki notifications beta feature in beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/263929 (owner: Catrope)
[21:48:13] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit nfs-exports is inactive
[21:48:22] RoanKattouw: Krenair beta only changes don't need SWAT
[21:48:40] but, if you can't do it and can't convince someone to do it for you, SWAT is fine
[21:50:46] operations, ops-eqiad: patch new zayo transit connection - https://phabricator.wikimedia.org/T123574#1932996 (RobH) NEW a:Cmjohnson
[21:51:04] operations, ops-eqiad: patch new zayo transit connection - https://phabricator.wikimedia.org/T123574#1933009 (RobH)
[21:51:33] operations, ops-eqiad: cr1-eqiad new patch zayo transit connection - https://phabricator.wikimedia.org/T123574#1933018 (RobH)
[21:53:33] (PS1) Cmjohnson: adding pc1004-6 to netboot cfg [puppet] - https://gerrit.wikimedia.org/r/263933
[21:54:17] !log restbase end deploy of 559a13a
[21:54:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:56:08] (CR) Cmjohnson: [C: 2] adding pc1004-6 to netboot cfg [puppet] - https://gerrit.wikimedia.org/r/263933 (owner: Cmjohnson)
[21:57:42] RECOVERY - puppet last run on wtp2004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[21:59:32] PROBLEM - Host mr1-esams.oob is DOWN: PING CRITICAL - Packet loss = 100%
[22:05:43] RECOVERY - Host mr1-esams.oob is UP: PING OK - Packet loss = 0%, RTA = 89.30 ms
[22:06:30] Krenair: Hmm, it doesn't seem to work :( I'll add wmg plumbing
[22:06:53] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[22:07:45] RoanKattouw: ^
[22:07:51] you forgot to sync AFAICT
[22:10:38] operations, Mail: remove exim aliases -- usability, usability team - https://phabricator.wikimedia.org/T123575#1933102 (JKrauska) NEW a:Dzahn
[22:12:44] RoanKattouw, Krenair: that only works if the extension has been converted to use extension.json, Echo hasn't yet.
[22:13:12] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is OK: OK - nfs-exports is active
[22:13:54] operations, Mail: remove exim alias - aradhana - https://phabricator.wikimedia.org/T123576#1933113 (JKrauska) NEW a:Dzahn
[22:15:18] Aha OK
[22:15:29] hoo: Yeah, I forgot, I have another one coming and I'll sync after that
[22:16:04] (PS1) Catrope: Add wmg plumbing for wgEchoUseCrossWikiBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/263939
[22:16:15] operations, Mail: remove exim alias - corissa - https://phabricator.wikimedia.org/T123578#1933137 (JKrauska) NEW a:Dzahn
[22:17:27] (CR) Catrope: [C: 2] Add wmg plumbing for wgEchoUseCrossWikiBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/263939 (owner: Catrope)
[22:17:33] (CR) Jforrester: "You'll need to add it to the whitelist if you want it to go live in production, of course." [mediawiki-config] - https://gerrit.wikimedia.org/r/263939 (owner: Catrope)
[22:17:54] (Merged) jenkins-bot: Add wmg plumbing for wgEchoUseCrossWikiBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/263939 (owner: Catrope)
[22:20:24] !log catrope@tin Synchronized wmf-config/: sync labs-only config changes (duration: 00m 32s)
[22:20:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:21:23] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[22:21:53] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[22:22:07] (PS1) Andrew Bogott: Remove wikiviajesve nfs config [puppet] - https://gerrit.wikimedia.org/r/263940
[22:22:10] chasemp: ^
[22:22:57] operations, Mail: consolidate mailman redirects in exim aliases file - https://phabricator.wikimedia.org/T123581#1933178 (JKrauska) NEW a:Dzahn
[22:22:58] andrewbogott: why...is that hardcoded there?
[22:23:04] I guess I thought the whole deal was dynamic based on ldap
[22:23:05] maybe not
[22:23:09] No...
[22:23:15] it used to be, but yuvi moved it to puppet
[22:23:21] (CR) Rush: [C: 1] "sure" [puppet] - https://gerrit.wikimedia.org/r/263940 (owner: Andrew Bogott)
[22:23:26] mostly on purpose, so that people have to beg Ops to use nfs
[22:23:37] why do we still run nfs-export-daemon then?
[22:23:38] (CR) Andrew Bogott: [C: 2] Remove wikiviajesve nfs config [puppet] - https://gerrit.wikimedia.org/r/263940 (owner: Andrew Bogott)
[22:23:40] I'm confused
[22:23:53] it used to be fully dynamic
[22:24:17] so do we think this project was removed but this entry was not?
[22:24:33] yes. I deleted the project and forgot that there was a corresponding puppet change needed
[22:24:42] since most projects don’t use nfs
[22:25:08] it was quite a while ago, so this isn’t the source of our actual issue I think, just log spam
[22:25:10] so could or could not be tripping up mountd as seen via strace but
[22:25:21] we def know it's continually trying to handle that export which we don't want atm
[22:25:30] ok
[22:30:05] so now we’re back to our original problem, right?
[22:31:41] ori, i addressed mobrovac's concerns ... ping me when you are free to get that puppet out on ruthenium and see what happens.
[22:31:55] subbu: wanna do it now?
[22:31:59] andrewbogott: well yeah so iiuc
[22:32:01] sure.
[22:32:19] it is making an rpc call and getting handled off to an instance of rpc.mount
[22:32:26] but that basically hangs and the client says
[22:32:29] (PS1) Hashar: Add .gitreview [puppet/kafkatee] - https://gerrit.wikimedia.org/r/264010
[22:32:31] (PS1) Hashar: Introduce tox as a test entry point [puppet/kafkatee] - https://gerrit.wikimedia.org/r/264011
[22:32:33] (PS1) Hashar: Pass flake8 and add it to tox envlist [puppet/kafkatee] - https://gerrit.wikimedia.org/r/264012
[22:32:35] getpeername(4, 0x7fff87634ee0, [128]) = -1 ENOTCONN (Transport endpoint is not connected)
[22:32:54] and eventually echo rpc timeout to stderr
[22:33:05] rpc mount export: RPC: Timed out
[22:34:35] (CR) Hashar: "We use tox as a CI entry point ( https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_Python )." [puppet/kafkatee] - https://gerrit.wikimedia.org/r/264011 (owner: Hashar)
[22:35:19] (CR) Hashar: "Merely whitespaces/newlines tweaking" [puppet/kafkatee] - https://gerrit.wikimedia.org/r/264012 (owner: Hashar)
[22:36:12] (PS11) Ori.livneh: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[22:39:57] (PS12) Ori.livneh: Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[22:41:42] (CR) Ori.livneh: [C: 2 V: 2] Make testreduce generic and instantiate parsoid-rt services [puppet] - https://gerrit.wikimedia.org/r/263322 (https://phabricator.wikimedia.org/T118778) (owner: Subramanya Sastry)
[22:42:32] jouncebot: next
[22:42:32] In 1 hour(s) and 17 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160114T0000)
[22:43:13] (PS2) Dzahn: RESTBase: Create a separate project definition for wiktionaries [puppet] - https://gerrit.wikimedia.org/r/263928 (owner: Mobrovac)
[22:43:48] (CR) Dzahn: [C: 2] RESTBase: Create a separate project definition for wiktionaries [puppet] - https://gerrit.wikimedia.org/r/263928 (owner: Mobrovac)
[22:46:10] mobrovac: please go ahead ^
[22:46:17] yup mutante :)
[22:46:18] thnx
[22:46:42] applying puppet in staging now
[22:54:49] !log restbase start deploy of 536e15b6
[22:54:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:54:59] paravoid: yt?
[22:58:00] (PS1) Subramanya Sastry: parsoid-rt-client: Save the localsettings file in the parsoid repo [puppet] - https://gerrit.wikimedia.org/r/264019
[22:58:14] !log /etc/init.d/nfs-kernel-server restart on labstore1001
[22:58:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:02:34] (CR) Subramanya Sastry: "pcc is happy" [puppet] - https://gerrit.wikimedia.org/r/264019 (owner: Subramanya Sastry)
[23:09:18] !log restbase end deploy of 536e15b6
[23:09:24] ostriches: can you check why gerrit doesn't correctly replicate https://github.com/wikimedia/data-values-value-view anymore? tracked: https://phabricator.wikimedia.org/T123521
[23:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:10:27] jzerebecki: I was poking that earlier, unclear why
[23:11:47] :( thx anyway
[23:12:42] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:12:42] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:15:53] PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: puppet fail
[23:16:43] PROBLEM - Mobile HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:17:26] (PS1) Reedy: Add lots of wfMsg*() for wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/264023
[23:17:28] Krenair: ^
[23:17:30] * Reedy cries
[23:17:56] Reedy, I was going to see if I could backport the patches
[23:18:05] I suspect they won't
[23:18:18] Depending on the amount.... It might be easier just to do them again
[23:18:42] And then newer branches then don't have them?
[23:18:52] RECOVERY - Mobile HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:18:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:18:52] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:22:46] Reedy, yeah, it's hopeless
[23:23:10] (PS1) Subramanya Sastry: Add parsoid::testing role and use it on ruthenium [puppet] - https://gerrit.wikimedia.org/r/264024
[23:24:04] (CR) jenkins-bot: [V: -1] Add parsoid::testing role and use it on ruthenium [puppet] - https://gerrit.wikimedia.org/r/264024 (owner: Subramanya Sastry)
[23:24:21] It's a shame we can't just nuke the Semantic extensions from wikitech
[23:24:23] And/or just use master
[23:25:18] Why can't we use master? Composer? Security?
[23:25:19] Reedy, I think I'm just going to leave this broken
[23:25:30] Security
[23:25:34] Compatibility
[23:26:11] lol
[23:26:17] Krenair: I really don't want to merge my "fix"
[23:26:28] And I think SF is only not pinned to a branch, because there isn't any
[23:26:42] not like it's really my problem to fix
[23:26:52] nuria: is this one a duplicate ? https://gerrit.wikimedia.org/r/#/c/263891/3/2015/people.html just in that one file
[23:26:55] (PS2) Subramanya Sastry: Add parsoid::testing role and use it on ruthenium [puppet] - https://gerrit.wikimedia.org/r/264024
[23:29:09] operations, Gerrit, GitHub-Mirrors, ValueView, Wikidata: [Bug] ValueView GitHub mirror not updated any more - https://phabricator.wikimedia.org/T123521#1933396 (Krenair)
[23:32:00] mutante: i just corrected it but looks like you did too
[23:33:16] nuria: yep:) just noticed and merged it. thanks
[23:33:33] nuria: i assume it will go online within the next 2 hours or so
[23:34:08] mutante: thank you
[23:34:17] operations, Gerrit, GitHub-Mirrors, ValueView, Wikidata: [Bug] ValueView GitHub mirror not updated any more - https://phabricator.wikimedia.org/T123521#1933409 (Krenair) I didn't see anything extra on github (for reference, it was at https://github.com/wikimedia/data-values-value-view/commit/4...
[23:34:19] mutante: i tested locally and things are going into piwik
[23:34:38] mutante: let me know when outside domain is available as we should see visits then
[23:34:41] mutante: thanks much
[23:34:52] nuria: cool! ok, will do
[23:36:07] (PS5) Yuvipanda: tools: Add simple command to log tool invocations [puppet] - https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444)
[23:36:16] (PS6) Yuvipanda: tools: Add simple command to log tool invocations [puppet] - https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444)
[23:36:20] operations, Gerrit, GitHub-Mirrors, ValueView, Wikidata: [Bug] ValueView GitHub mirror not updated any more - https://phabricator.wikimedia.org/T123521#1933421 (JanZerebecki) p:Unbreak!>Normal
[23:37:02] (CR) Yuvipanda: [C: 2 V: 2] tools: Add simple command to log tool invocations [puppet] - https://gerrit.wikimedia.org/r/263817 (https://phabricator.wikimedia.org/T123444) (owner: Yuvipanda)
[23:37:19] operations, Gerrit, GitHub-Mirrors, ValueView, Wikidata: [Bug] ValueView GitHub mirror not updated any more - https://phabricator.wikimedia.org/T123521#1933425 (Krenair) Did the same with IPSet (was at https://github.com/wikimedia/IPSet/commit/3c2dd6706546fe616e6ceba02044e64dce4fc9be )
[23:42:53] RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[23:46:25] !log T123451: running mwscript sql.php --wiki=metawiki patch-bot_passwords.sql
[23:46:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:53:21] operations, Performance-Team, Wikimedia-General-or-Unknown, Patch-For-Review, and 3 others: jobrunner memory leaks - https://phabricator.wikimedia.org/T122069#1933466 (aaron)
[23:53:23] operations, Performance-Team, Wikimedia-General-or-Unknown, MW-1.27-release-notes, and 2 others: Record per-job-type memory usage statistics - https://phabricator.wikimedia.org/T123284#1933465 (aaron) Open>Resolved