[00:01:17] ostriches: maybe not related to any of the jobrunner stuff, but OCG looks to be having a lot of problems talking to redis too -- https://logstash.wikimedia.org/#dashboard/temp/AVNoIHZOO3D718AOe-sE [00:02:44] 6Operations, 6Labs: revise/fix labstore replicate backup jobs - https://phabricator.wikimedia.org/T127567#2113829 (10chasemp) Backups were failing again last night, and I'm pretty sure it was related to a full snapshot left behind on labstore2001 (which was one of the previous causes). I removed the offending... [00:04:52] heh. the slow-parse dashboard is the inverse of the redis error dashboard -- https://logstash.wikimedia.org/#/dashboard/elasticsearch/slow-parse [00:05:10] when google shows the answer you want but then linuxquestions.org is down [00:05:25] once jobrunners started talking to redis well the slow-parse errors jump back up [00:05:37] mutante: hit the google cache link? [00:05:48] phew, yes, it's in cache [00:07:42] Did https://grafana.wikimedia.org/dashboard/db/job-queue-health?panelId=12&fullscreen&from=now-1d&to=now finally peak? [00:10:34] Data is saying we're processing more than enqueuing finally. [00:13:56] (03CR) 10GWicke: "Before we do this, we should shut down RESTBase on 1001 and 1002. 
Once this patch is merged, those nodes will lose access to the other Cas" [puppet] - 10https://gerrit.wikimedia.org/r/276728 (owner: 10Mobrovac) [00:22:33] (03PS1) 10Dereckson: Revert "(bug 45233) Groups permissions on pt.wikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276917 (https://phabricator.wikimedia.org/T129487) [00:33:03] (03PS1) 10Dereckson: Remove Wikisaurus namespace from ko.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276919 (https://phabricator.wikimedia.org/T129631) [00:45:23] (03PS1) 10Aaron Schulz: Lower RunJobs default ?maxjobs to lower memory use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276921 [00:52:36] (03CR) 10Aaron Schulz: [C: 032] Lower RunJobs default ?maxjobs to lower memory use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276921 (owner: 10Aaron Schulz) [00:53:11] (03Merged) 10jenkins-bot: Lower RunJobs default ?maxjobs to lower memory use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276921 (owner: 10Aaron Schulz) [00:54:09] (03PS7) 10Dzahn: exim: rewriting rule for maint-announce@ mail to phab [puppet] - 10https://gerrit.wikimedia.org/r/268851 (https://phabricator.wikimedia.org/T118176) [00:55:19] !log aaron@tin Synchronized rpc/RunJobs.php: 7a8bd37247b7dfb (duration: 00m 38s) [00:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:55:23] (03PS8) 10Dzahn: phab/exim: rewriting rule for maint-announce@ mail [puppet] - 10https://gerrit.wikimedia.org/r/268851 (https://phabricator.wikimedia.org/T118176) [01:00:33] (03PS9) 10Dzahn: phab/exim: rewriting rule for maint-announce@ mail [puppet] - 10https://gerrit.wikimedia.org/r/268851 (https://phabricator.wikimedia.org/T118176) [01:01:30] (03CR) 10Dzahn: [C: 032] "tested - i just sent a single mail to maint-announce@wm and i got 2 tickets, T129720 and 10079" [puppet] - 10https://gerrit.wikimedia.org/r/268851 (https://phabricator.wikimedia.org/T118176) (owner: 10Dzahn) [01:01:32] robh: ^ 
[01:01:41] ohh [01:01:58] awesome [01:02:05] going to merge monday? [01:02:10] see the 2 tickets? [01:02:19] or already merged? [01:02:29] yes, just waiting for 'verified' [01:02:43] it's not touching mx anymore, just iridium [01:02:46] less risk [01:02:58] cool [01:03:07] i just triaged up maint-announce today in fact [01:03:17] so it's clean for next week, you and i can keep an eye on it [01:03:23] ok, cool! [01:04:32] robh: ok, applied. wanna send one mail to maint-announce@ for me? [01:05:14] sent [01:05:48] https://phabricator.wikimedia.org/T129721 [01:05:59] https://rt.wikimedia.org/Ticket/Display.html?id=10080 [01:06:09] :) [01:12:28] 6Operations, 13Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2113958 (10Dzahn) [01:24:08] 6Operations, 13Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2113973 (10Dzahn) [01:26:34] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2113977 (10Deskana) [01:26:36] 6Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2113978 (10Deskana) [01:26:38] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 2 others: Create a PKI that can be used by Puppet and for general purpose certificates - https://phabricator.wikimedia.org/T128077#2113975 (10Deskana) 5Open>3Resolved >>! In T128077#2111950, @Gehel wrote: > @Deskana As our PO, I'll... 
[01:35:50] (03PS1) 10Dzahn: phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) [01:36:14] (03CR) 10jenkins-bot: [V: 04-1] phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) (owner: 10Dzahn) [01:37:02] (03PS2) 10Dzahn: phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) [01:41:17] 6Operations, 6Labs: revise/fix labstore replicate backup jobs - https://phabricator.wikimedia.org/T127567#2114031 (10yuvipanda) Thank you for the indepth investigation and awesome write up, @chasemp! This all sounds great, but I want to challenge one assumption: > Multi-week historical copies as space allows... [01:42:13] (03PS3) 10Dzahn: phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) [01:45:56] (03Restored) 10Mattflaschen: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [01:48:11] (03PS2) 10Mattflaschen: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) [01:48:34] (03CR) 10jenkins-bot: [V: 04-1] Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [01:48:53] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2114039 (10Dzahn) thanks you. yes, the hostname change was intended but not explicitely mentioned. 
the dump import script thing sounds like a potential puppet fix? [01:51:43] (03CR) 10Mattflaschen: "We should also do https://gerrit.wikimedia.org/r/#/c/231490/ / T109031 before deploying this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [01:52:24] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2114054 (10Krenair) >>! In T126385#2114039, @Dzahn wrote: > the dump import script thing sounds like a potential puppet fix? I suppose so, if puppet were installed. [01:58:32] (03CR) 10Alex Monk: [C: 04-1] "Andrew suggested we use the keystone services catalogue instead." [puppet] - 10https://gerrit.wikimedia.org/r/276893 (https://phabricator.wikimedia.org/T129245) (owner: 10Alex Monk) [01:59:42] (03PS1) 10Dzahn: shinken: "Puppet failure" is never OK [puppet] - 10https://gerrit.wikimedia.org/r/276924 [02:00:39] RECOVERY - Last backup of the tools filesystem on labstore1001 is OK: OK - Last run for unit replicate-tools was successful [02:01:00] (03PS2) 10Dzahn: shinken: "Puppet failure" is never OK [puppet] - 10https://gerrit.wikimedia.org/r/276924 [02:01:44] (03CR) 10Dzahn: [C: 032] shinken: "Puppet failure" is never OK [puppet] - 10https://gerrit.wikimedia.org/r/276924 (owner: 10Dzahn) [02:04:56] (03PS4) 10Dzahn: phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) [02:06:58] (03CR) 10Dzahn: [C: 032] phab: add sender domains for maint-announce tickets [puppet] - 10https://gerrit.wikimedia.org/r/276923 (https://phabricator.wikimedia.org/T118176) (owner: 10Dzahn) [02:15:43] (03CR) 10Alex Monk: "Just need to figure out exactly what is wrong with my service/endpoint setup in horizon-proxy-dashboard.openstack.eqiad.wmflabs..." 
[puppet] - 10https://gerrit.wikimedia.org/r/276893 (https://phabricator.wikimedia.org/T129245) (owner: 10Alex Monk) [02:25:00] PROBLEM - Host ms-fe1004 is DOWN: PING CRITICAL - Packet loss = 100% [02:29:32] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.16) (duration: 12m 52s) [02:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:38:17] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Mar 12 02:38:16 UTC 2016 (duration 8m 45s) [02:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:42:48] 6Operations, 10ArchCom-RfC, 6Commons, 10MediaWiki-File-management, and 13 others: Use content hash based image / thumb URLs & define an official thumb API - https://phabricator.wikimedia.org/T66214#2114112 (10GWicke) > What data is fetched and used before knowing what thumbnail to request? Normally, no se... [02:55:25] 6Operations, 10ArchCom-RfC, 6Commons, 10MediaWiki-File-management, and 13 others: Use content hash based image / thumb URLs & define an official thumb API - https://phabricator.wikimedia.org/T66214#2114121 (10GWicke) > Also, a 100% compatible VCL-layer mapping of nice URLs to old URL is just not gonna happ... [03:00:11] RECOVERY - Last backup of the others filesystem on labstore1001 is OK: OK - Last run for unit replicate-others was successful [03:08:32] 6Operations, 10ArchCom-RfC, 6Commons, 10MediaWiki-File-management, and 13 others: Use content hash based image / thumb URLs & define an official thumb API - https://phabricator.wikimedia.org/T66214#2114140 (10GWicke) [03:23:37] 6Operations, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#1957157 (10Ricordisamoa) Will npm-node-4.3 CI jobs be fixed too? They actually run `npm install`... 
[03:27:38] (03PS1) 10Yuvipanda: k8s: Mount separate LVM volume for /var/lib/docker [puppet] - 10https://gerrit.wikimedia.org/r/276929 (https://phabricator.wikimedia.org/T129729) [03:40:17] 6Operations, 10ArchCom-RfC, 6Commons, 10MediaWiki-File-management, and 13 others: Use content hash based image / thumb URLs & define an official thumb API - https://phabricator.wikimedia.org/T66214#2114201 (10Tgr) >>! In T66214#2114121, @GWicke wrote: > This depends on how much information is available in... [04:00:21] RECOVERY - Last backup of the maps filesystem on labstore1001 is OK: OK - Last run for unit replicate-maps was successful [04:21:14] (03CR) 10Yuvipanda: [C: 032] k8s: Mount separate LVM volume for /var/lib/docker [puppet] - 10https://gerrit.wikimedia.org/r/276929 (https://phabricator.wikimedia.org/T129729) (owner: 10Yuvipanda) [04:44:03] !log legoktm@tin Synchronized php-1.27.0-wmf.16/extensions/CharInsert/: Revert "Remove inline event handler js from charinsert" - https://gerrit.wikimedia.org/r/#/c/276932/ T129524 (duration: 00m 29s) [04:44:04] T129524: Inserting via MediaWiki:Edittools no longer working - https://phabricator.wikimedia.org/T129524 [04:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [05:50:31] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures [06:16:09] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:29:30] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:30] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: puppet fail [06:29:50] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:31] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:31] PROBLEM - puppet last run on elastic2007 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:50] PROBLEM - puppet last run on mw2136 is CRITICAL: 
CRITICAL: puppet fail [06:31:30] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:00] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 2 failures [06:56:31] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:31] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:56:59] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:57:00] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:57:30] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:41] RECOVERY - puppet last run on elastic2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:41] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:58:01] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [07:22:00] (03PS1) 10Pmlineditor: Enable extension WikiLove on bnwikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276934 (https://phabricator.wikimedia.org/T129728) [07:32:49] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: Puppet has 1 failures [07:54:29] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: Job queue is growing and growing - https://phabricator.wikimedia.org/T129517#2114316 (10Raymond) >>! In T129517#2113465, @Luke081515 wrote: > Other ideas to solve this? I guess there only specific types of jobs affected aren't they? Otherwise users at wi... 
[07:57:59] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [08:16:44] hey, can anyone in gerrit process this request? https://phabricator.wikimedia.org/T129733 [08:16:54] I need this right to continue working [08:34:23] (03PS1) 10ArielGlenn: dumps: add pagetitles and mediatitles directories to 'other' listing [puppet] - 10https://gerrit.wikimedia.org/r/276936 [08:35:42] (03PS2) 10ArielGlenn: dumps: add pagetitles and mediatitles directories to 'other' listing [puppet] - 10https://gerrit.wikimedia.org/r/276936 [08:57:37] <_joe_> Amir1: not really, given a) it's the weekend b) it's an access request for you by you without any form of approval from anyone [08:57:54] (03CR) 10ArielGlenn: [C: 032] dumps: add pagetitles and mediatitles directories to 'other' listing [puppet] - 10https://gerrit.wikimedia.org/r/276936 (owner: 10ArielGlenn) [08:58:08] <_joe_> also, given I assume you're not going to release anything on the weekend (right?) how is this blocking you exactly? [08:59:23] _joe_: I understand, my goal is to get the biggest blocker done before the end of the weekend so we can focus on moving to prod next week [08:59:57] btw. I'm a member of the ores team, I have access [09:00:11] to almost all repos, labs instances, etc. [09:00:48] <_joe_> Amir1: I don't doubt it [09:00:49] by almost all repos I mean all repos except this, I'm admin in both ores and revscoring project, etc. [09:01:22] <_joe_> still, why is this blocking you? 
[09:01:36] <_joe_> you can always cherry-pick your changes in the meantime [09:01:57] because I need them here [09:01:58] https://github.com/wiki-ai/ores-wikimedia-config/pull/44/files [09:02:35] and I need to add more wheels, data to that repo, I'm waiting for this patch to be merged: https://gerrit.wikimedia.org/r/#/c/276310/ [09:02:55] (Aaron +2'd it but since jenkins is not enabled we need to submit it) [09:03:25] <_joe_> Amir1: I still don't get why you can't cherry-pick your patches [09:03:32] all of them are parts of this task: https://phabricator.wikimedia.org/T129109 [09:03:56] <_joe_> what is the technical operation you're unable to do without +2? [09:04:24] <_joe_> apart from merging, which could be done on monday I guess, any testing you might need to do doesn't need merges [09:04:28] <_joe_> I hope [09:04:43] it slows the work down, I can cherry pick, true. but then I have to change everything after that [09:04:52] in the ores-wikimedia-config [09:05:18] <_joe_> oh I guess you're ok to wait then. 
[09:05:43] <_joe_> I'm not breaking due process without a solid reason [09:05:45] ok [09:05:55] I understand [09:06:02] totally [09:06:05] <_joe_> sorry [09:06:24] np [09:06:29] (03PS1) 10ArielGlenn: aaaand fix the href tag in the dumps other index file, thanks emacs [puppet] - 10https://gerrit.wikimedia.org/r/276937 [09:08:00] (03CR) 10ArielGlenn: [C: 032] aaaand fix the href tag in the dumps other index file, thanks emacs [puppet] - 10https://gerrit.wikimedia.org/r/276937 (owner: 10ArielGlenn) [09:39:52] (03PS1) 10ArielGlenn: dumps: clean up old page and media title dumps [puppet] - 10https://gerrit.wikimedia.org/r/276938 [09:44:23] (03CR) 10ArielGlenn: [C: 032] dumps: clean up old page and media title dumps [puppet] - 10https://gerrit.wikimedia.org/r/276938 (owner: 10ArielGlenn) [09:50:50] (03PS1) 10ArielGlenn: fix up arg for find command for cleaning up old title list dumps [puppet] - 10https://gerrit.wikimedia.org/r/276940 [09:51:10] writing manifests when still asleep, not recommended [09:52:20] (03CR) 10ArielGlenn: [C: 032] fix up arg for find command for cleaning up old title list dumps [puppet] - 10https://gerrit.wikimedia.org/r/276940 (owner: 10ArielGlenn) [09:58:31] (03PS1) 10ArielGlenn: title list dump cleanup: find doesn't like maxdepth after type [puppet] - 10https://gerrit.wikimedia.org/r/276941 [09:58:39] what was i just saying about sleep [10:00:21] (03CR) 10ArielGlenn: [C: 032] title list dump cleanup: find doesn't like maxdepth after type [puppet] - 10https://gerrit.wikimedia.org/r/276941 (owner: 10ArielGlenn) [10:04:31] (03CR) 10Elukey: First draft for the Varnish 4 porting. 
(033 comments) [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) (owner: 10Elukey) [10:13:50] PROBLEM - puppet last run on mw2151 is CRITICAL: CRITICAL: puppet fail [10:21:51] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: Job queue is growing and growing - https://phabricator.wikimedia.org/T129517#2114476 (10mmodell) What @aaron said. This appears to be resolved. Redis errors are back down to nearly nothing. [10:22:07] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: Job queue is growing and growing - https://phabricator.wikimedia.org/T129517#2114477 (10mmodell) p:5Unbreak!>3Normal [10:41:39] RECOVERY - puppet last run on mw2151 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:09:16] (03PS1) 10Hashar: beta: nutcracker::verbosity: "4" [puppet] - 10https://gerrit.wikimedia.org/r/276950 [11:14:18] (03CR) 10Hashar: [C: 04-1] "Not taken in account somehow :-(" [puppet] - 10https://gerrit.wikimedia.org/r/276950 (owner: 10Hashar) [11:32:59] (03PS2) 10ArielGlenn: onallwikis: allow query to include wiki name in string [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/276905 [11:34:08] (03CR) 10ArielGlenn: [C: 032 V: 032] onallwikis: allow query to include wiki name in string [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/276905 (owner: 10ArielGlenn) [11:35:06] (03CR) 10ArielGlenn: "even though its dependency is merged, this query needs redoing, as it must rnu _on_ commons for all wikis. 
Need another tweak to onallwik" [puppet] - 10https://gerrit.wikimedia.org/r/276907 (owner: 10ArielGlenn) [11:57:56] (03PS1) 10ArielGlenn: add option to onallwikis to let it loop through all wikis as args from one wiki [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/276956 [11:59:37] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: Job queue is growing and growing - https://phabricator.wikimedia.org/T129517#2107949 (10ArielGlenn) Note that while the job queue no longer seems to be increasing (https://grafana.wikimedia.org/dashboard/db/job-queue-health?panelId=12&fullscreen&from=now... [12:24:00] PROBLEM - puppet last run on mw2174 is CRITICAL: CRITICAL: Puppet has 1 failures [12:49:50] RECOVERY - puppet last run on mw2174 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [12:58:01] (03CR) 10Luke081515: [C: 031] Remove Wikisaurus namespace from ko.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276919 (https://phabricator.wikimedia.org/T129631) (owner: 10Dereckson) [14:01:20] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: puppet fail [14:26:59] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:28:51] (03PS2) 10ArielGlenn: add option to onallwikis to run from a base wiki, with wikis as args [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/276956 [14:29:08] Please synch Interwiki map (Krenair?) 
[14:32:32] it's the weekend but you could request it for when people are back at work again, Steinsplitter [14:33:03] a ticket in phab in case people's irc backlog goes away [14:33:20] ok, it is not that urgent :) [14:36:19] :-) [14:44:24] Steinsplitter: yup definitely file it as a task ( maybe #wikimedia-site-requests ) [14:44:47] that has the additional benefit of spamming a lot more people than those lurking in this irc channel right now [14:44:50] :D [14:47:49] :-) [14:53:26] (03CR) 10Dereckson: [C: 031] Enable extension WikiLove on bnwikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276934 (https://phabricator.wikimedia.org/T129728) (owner: 10Pmlineditor) [14:58:16] (03PS2) 10Dereckson: Whitelist feeds included on Wikimedia Germany Engineering page on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/275815 (https://phabricator.wikimedia.org/T127176) (owner: 10WMDE-leszek) [14:59:07] (03CR) 10Dereckson: "PS2: add reference to the bug in config file" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/275815 (https://phabricator.wikimedia.org/T127176) (owner: 10WMDE-leszek) [16:44:37] (03PS7) 10Elukey: First draft for the Varnish 4 porting. 
[software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) [16:49:10] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 58.62% of data above the critical threshold [5000000.0] [16:54:21] (03CR) 10Elukey: "Ottomata: I added VSM_Error() as output of the VSM_Open's error handling, so you should be able now to get a precise error about why it is" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) (owner: 10Elukey) [17:01:03] 6Operations, 10MediaWiki-JobQueue, 13Patch-For-Review: Job queue is growing and growing - https://phabricator.wikimedia.org/T129517#2114864 (10elukey) Very interesting thing from https://grafana.wikimedia.org/dashboard/db/job-queue-health: enqueue rate is equal to processing rate, so this is probably why the... [17:03:10] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0] [17:10:19] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [17:11:59] !log Updated operations/dumps/dcat on snapshot1003 from 92ab37d94e to e97408df39 [17:12:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:36:09] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:01:59] 6Operations, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Check the redis (jobqueue) configuration in codfw - https://phabricator.wikimedia.org/T124672#2114926 (10Joe) p:5Normal>3High [19:07:31] 6Operations, 10ops-codfw, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: rack/setup/deploy rdb200[5-6] - https://phabricator.wikimedia.org/T129178#2114927 (10Joe) >>! In T129178#2112402, @Dzahn wrote: > so yea, i don't really know if it's worth actually redoing that. but _if_ you really... 
[19:07:48] 6Operations, 10ops-codfw, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: rack/setup/deploy rdb200[5-6] - https://phabricator.wikimedia.org/T129178#2114928 (10Joe) both servers are up and running and healthy AFAICS [19:07:57] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: codfw: (2) servers for redis jobrunners - https://phabricator.wikimedia.org/T126453#2114930 (10Joe) [19:07:59] 6Operations, 10ops-codfw, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: rack/setup/deploy rdb200[5-6] - https://phabricator.wikimedia.org/T129178#2114929 (10Joe) 5Open>3Resolved [19:08:37] 6Operations, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Check the redis (jobqueue) configuration in codfw - https://phabricator.wikimedia.org/T124672#2114932 (10Joe) [19:08:39] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: codfw: (2) servers for redis jobrunners - https://phabricator.wikimedia.org/T126453#2014739 (10Joe) 5stalled>3Resolved [19:09:06] (03PS1) 10Giuseppe Lavagetto: jobqueue_redis: set up encryption and cross-dc replication [puppet] - 10https://gerrit.wikimedia.org/r/276980 (https://phabricator.wikimedia.org/T124672) [19:56:17] (03PS1) 10Dereckson: Add files to noc.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276993 (https://phabricator.wikimedia.org/T116163) [20:00:19] (03CR) 10MaxSem: ""new" :P" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276993 (https://phabricator.wikimedia.org/T116163) (owner: 10Dereckson) [20:02:37] (03PS2) 10Dereckson: Add files to noc.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/276993 (https://phabricator.wikimedia.org/T116163) [20:03:08] MaxSem: disambiguated [20:03:44] (03PS2) 10Giuseppe Lavagetto: jobqueue_redis: set up encryption and cross-dc replication [puppet] - 10https://gerrit.wikimedia.org/r/276980 (https://phabricator.wikimedia.org/T124672) [20:12:48] (03PS3) 10Giuseppe Lavagetto: 
jobqueue_redis: set up encryption and cross-dc replication [puppet] - 10https://gerrit.wikimedia.org/r/276980 (https://phabricator.wikimedia.org/T124672) [20:29:53] Krenair: hey, this patch is +2'd https://gerrit.wikimedia.org/r/#/c/276310/ and also verified (with explicit +2 on that too) but it's not merged, earlier today we (Aaron and I) talked but we couldn't figure it out why it's not working. Can you take a look? [20:30:25] Amir1, what's not working? [20:31:10] It's +2'd but not merged [20:31:11] I can see that it was CR+2'd and V+2'd without being merged, but... what's wrong? [20:31:20] Okay. Did you try merging it? [20:31:28] we want this merged [20:31:33] How? [20:31:39] usually jenkins does that [20:31:50] You have to set jenkins up to automatically merge for you [20:31:55] we checked for every possible button [20:32:01] If you didn't set jenkins up to do it, you still have to merge manually [20:32:08] huh. there's no "Submit" button either. unless i don't have the right for that repo [20:32:21] ugh, I guess I need to fiddle with ACLs to allow submission [20:32:27] exactly, there is no Submit button [20:32:48] ugh. 
people have been switching it off for repos to make everyone's lives more difficult [20:33:00] I manually merged before several times but I did using submit button and it's not there [20:33:47] Okay [20:33:54] There is now a submit button for users in the research group [20:35:15] http://git.wikimedia.org/commitdiff/research.git/bd2db69cf450055f5cd824f60e0d9912d1fcc509 [20:35:27] thanks Krenair :) [20:36:12] research/ores and research/ores/wheels are owned by that silly "Project and Group Creators" group [20:36:58] Will fix that when I (or maybe another admin) make the ores group for you [20:38:25] it'll happen soon [20:40:41] (03PS3) 10Giuseppe Lavagetto: [WiP] Add ipvs-related FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/272679 [20:42:32] (03CR) 10jenkins-bot: [V: 04-1] [WiP] Add ipvs-related FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/272679 (owner: 10Giuseppe Lavagetto) [20:50:19] 6Operations, 6Labs, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2115044 (10Krenair) >>! In T126385#2105742, @Krenair wrote: > This should hopefully fix importing. Revisions from earlier today have appeared on https://wikitech-static.wikimedia.o... 
[22:29:50] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0] [22:29:59] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [5000000.0] [22:43:30] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0] [22:43:39] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [23:46:35] 6Operations, 10Beta-Cluster-Infrastructure, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2115139 (10Krenair) [23:53:35] (03PS2) 10Tim Landscheidt: Tools: Fix puppet-lint warnings [puppet] - 10https://gerrit.wikimedia.org/r/272440