[00:00:03] mutante: https://gerrit.wikimedia.org/r/194758
[00:00:04] RoanKattouw, ^d, Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150306T0000). Please do the needful.
[00:00:15] Need that now
[00:00:27] seems like the script doesn't scale as well as I initially thought
[00:00:49] operations, Analytics-EventLogging, Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1094294 (Nuria)
[00:02:28] Any other op around?
[00:02:31] who's swatting?
[00:02:49] (CR) Ori.livneh: [C: 2] Change dispatchChanges parameters for Wikidata [puppet] - https://gerrit.wikimedia.org/r/194758 (owner: Hoo man)
[00:02:55] Thanks :)
[00:02:58] I guess me?
[00:03:15] ok
[00:03:21] superm401: ping for swat
[00:03:29] Present
[00:05:03] (CR) Alex Monk: [C: -1] "List of changes is incomplete, and I don't understand why you've mentioned wmgMFRemovePageActions" [mediawiki-config] - https://gerrit.wikimedia.org/r/194549 (https://phabricator.wikimedia.org/T14423) (owner: coren)
[00:05:14] operations, Analytics-EventLogging, Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1094305 (mforns) a:mforns
[00:05:28] Krenair: Which changes are missing?
[00:06:14] Probably most of them? Those are basically the ones we talked about on IRC (which can't possibly have been all of them)
[00:07:09] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:08:05] Coren, some obvious ones: wmgUsePageImages, wgExtraInterlanguageLinkPrefixes, ...
[00:08:14] Coren, still have a redundant entry in wgMetaNamespace....
[00:09:19] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59688 bytes in 0.229 second response time
[00:09:21] I haven't even looked at the other files that aren't InitialiseSettings
[00:11:20] Krenair: Ah! I didn't think of looking for 'things-where-wikisource-has-a-setting-that-is-not-default' rather than just 'things-where-sourceswiki-was-different'
[00:11:41] (PS7) Nemo bis: Move sourceswiki special.dblist->wikisource.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/194549 (https://phabricator.wikimedia.org/T14423) (owner: coren)
[00:12:20] Coren, things where sourceswiki was specifically configured should continue to work
[00:13:05] in most of those cases, if it's become redundant due to the wikisource group already having that, it might be OK to remove
[00:13:08] But sourceswiki is SPECIAL
[00:13:35] Nemo_bis: The problem being that it's not supposed to be, and ended up being special by accident and neglect. :-)
[00:13:37] ummm
[00:14:03] It's even the only "mul" wiki we have
[00:14:13] Update ContentTranslation to 8c40c7a is not deployed
[00:14:15] kart_: ^
[00:14:24] (PS1) BBlack: depool cp3013 [puppet] - https://gerrit.wikimedia.org/r/194762
[00:14:36] a bit late/early for kart I think
[00:14:40] (CR) BBlack: [C: 2 V: 2] depool cp3013 [puppet] - https://gerrit.wikimedia.org/r/194762 (owner: BBlack)
[00:14:43] legoktm, is it on tin?
[00:15:00] no I just pulled it in
[00:15:06] !log depooled cp3013 in pybal
[00:15:11] Logged the message, Master
[00:15:28] wait no
[00:15:38] I'm just confused
[00:16:00] everything is fine
[00:16:03] kart_: ignore that :)
[00:18:22] !log legoktm Started scap: Flow and WikimediaMessages updates
[00:18:27] Logged the message, Master
[00:25:30] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1450 bytes in 0.222 second response time
[00:26:07] (PS1) BBlack: bugfix for 28ae08ad [puppet] - https://gerrit.wikimedia.org/r/194764
[00:26:25] (CR) BBlack: [C: 2 V: 2] bugfix for 28ae08ad [puppet] - https://gerrit.wikimedia.org/r/194764 (owner: BBlack)
[00:30:45] mhm Krenair and Coren, we also have a script somewhere that regenerates these lists
[00:30:59] dblists?
[00:31:04] yep
[00:31:15] ...?
[00:31:55] don't remember where it is, but suspect it would reset them
[00:33:16] !log legoktm Finished scap: Flow and WikimediaMessages updates (duration: 14m 53s)
[00:33:20] Logged the message, Master
[00:33:28] superm401: ^ done
[00:33:37] legoktm, thanks. Can I run my scripts now?
[00:33:43] Or should I wait until the end?
[00:33:44] go for it
[00:33:56] well, I don't think anyone else is deploying anything...
[00:34:33] (PS1) BBlack: move trusty global default below nodes [puppet] - https://gerrit.wikimedia.org/r/194767
[00:34:39] you're not thinking of the size-based DB lists are you, MaxSem?
[00:34:49] (CR) BBlack: [C: 2 V: 2] move trusty global default below nodes [puppet] - https://gerrit.wikimedia.org/r/194767 (owner: BBlack)
[00:35:55] no
[00:35:57] :P
[00:43:01] PROBLEM - DPKG on cp3022 is CRITICAL: Connection refused by host
[00:43:19] PROBLEM - Disk space on cp3022 is CRITICAL: Connection refused by host
[00:43:30] PROBLEM - HTTPS on cp3022 is CRITICAL: Return code of 255 is out of bounds
[00:43:42] ^ ignore that :P
[00:43:50] PROBLEM - RAID on cp3022 is CRITICAL: Connection refused by host
[00:44:10] PROBLEM - Varnish HTTP bits on cp3022 is CRITICAL: Connection refused
[00:46:11] (PS1) BBlack: Revert trusty/jessie installer defaults changes [puppet] - https://gerrit.wikimedia.org/r/194768
[00:46:13] (PS1) BBlack: cp3*/amssq* -> jessie installer [puppet] - https://gerrit.wikimedia.org/r/194769
[00:46:24] (CR) BBlack: [C: 2 V: 2] Revert trusty/jessie installer defaults changes [puppet] - https://gerrit.wikimedia.org/r/194768 (owner: BBlack)
[00:46:36] (CR) BBlack: [C: 2 V: 2] cp3*/amssq* -> jessie installer [puppet] - https://gerrit.wikimedia.org/r/194769 (owner: BBlack)
[00:51:39] PROBLEM - Host cp3022 is DOWN: PING CRITICAL - Packet loss = 100%
[00:52:41] (CR) Shizhao: [C: 1] Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[00:56:30] RECOVERY - Host cp3022 is UP: PING OK - Packet loss = 0%, RTA = 89.61 ms
[00:56:31] (PS1) BBlack: cp[14]* -> jessie installer [puppet] - https://gerrit.wikimedia.org/r/194771
[00:57:08] (CR) BBlack: [C: 2 V: 2] cp[14]* -> jessie installer [puppet] - https://gerrit.wikimedia.org/r/194771 (owner: BBlack)
[01:03:40] PROBLEM - MySQL Processlist on db1034 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 156 copy to table, 14 statistics
[01:07:21] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[01:08:01] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[01:10:41] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[01:11:21] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[01:11:21] RECOVERY - MySQL Processlist on db1034 is OK: OK 0 unauthenticated, 0 locked, 17 copy to table, 1 statistics
[01:12:57] !log killing query storm on s7, SpecialWhatLinksHere::showIndirectLinks
[01:13:03] Logged the message, Master
[01:16:41] springle: Seems to continue
[01:18:00] or maybe not
[01:18:51] * springle waits
[01:19:25] all from 46.198.138.209
[01:22:34] do you have the full query... I wonder why it was so slow
[01:22:43] or was it just the flood that made stuff trip over?
[01:23:54] both. https://phabricator.wikimedia.org/T89630
[01:24:10] s7 hasn't had the index change i mentioned there ^ yet
[01:24:41] metawiki seems to be the problem
[01:25:43] (PS1) BBlack: repool cp3014,cp3022 [puppet] - https://gerrit.wikimedia.org/r/194776
[01:27:06] !log reindexing s7 pagelinks T89630
[01:27:08] (CR) BBlack: [C: 2] repool cp3014,cp3022 [puppet] - https://gerrit.wikimedia.org/r/194776 (owner: BBlack)
[01:27:14] Logged the message, Master
[01:28:16] !log repooled cp3014,cp3022 in pybal
[01:28:22] Logged the message, Master
[01:31:58] Puppet, Multimedia, Scrum-of-Scrums, Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1094551 (Tgr)
[01:32:37] (PS1) BBlack: depool cp3017 [puppet] - https://gerrit.wikimedia.org/r/194778
[01:32:56] !log depooled cp3017 in pybal
[01:33:04] Logged the message, Master
[01:33:15] (CR) BBlack: [C: 2 V: 2] depool cp3017 [puppet] - https://gerrit.wikimedia.org/r/194778 (owner: BBlack)
[01:34:23] Puppet, Multimedia, Scrum-of-Scrums, Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1094573 (Tgr)
[01:43:03] (CR) Springle: [C: 1] "It is mostly duplicated by dbtree, but it still works ok, and does show wiki lists which I haven't put back into dbtree yet." [mediawiki-config] - https://gerrit.wikimedia.org/r/194736 (https://phabricator.wikimedia.org/T90837) (owner: Dzahn)
[01:44:30] (CR) Springle: [C: 1] dbtree: use service alias instead of server name [puppet] - https://gerrit.wikimedia.org/r/194010 (owner: Dzahn)
[01:44:44] (CR) Springle: [C: 1] add service name for tendril backend db [dns] - https://gerrit.wikimedia.org/r/194005 (owner: Dzahn)
[01:45:08] operations: improve cron spam visibility - https://phabricator.wikimedia.org/T84845#1094640 (Tgr)
[02:00:33] (PS1) Dzahn: WIP: role and module for contacts [puppet] - https://gerrit.wikimedia.org/r/194786 (https://phabricator.wikimedia.org/T90679)
[02:00:43] (CR) Dzahn: [C: 2] add service name for tendril backend db [dns] - https://gerrit.wikimedia.org/r/194005 (owner: Dzahn)
[02:01:20] (PS2) Dzahn: add service name for tendril backend db [dns] - https://gerrit.wikimedia.org/r/194005
[02:08:54] operations, Patch-For-Review: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1094689 (Dzahn) < quiddity> mutante, I've emailed anna koval, who might know. < abartov> mutante: AFAIK, the Edu team now uses Asana to manage their contacts, and it is likely their Civi inst...
[02:09:12] operations, Patch-For-Review: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1094690 (Dzahn)
[02:10:31] (CR) Dzahn: [C: 2] dbtree: use service alias instead of server name [puppet] - https://gerrit.wikimedia.org/r/194010 (owner: Dzahn)
[02:17:21] (PS1) Springle: Increase MariaDB thread_pool_size. [puppet] - https://gerrit.wikimedia.org/r/194788
[02:18:36] (CR) Springle: [C: 2] Increase MariaDB thread_pool_size. [puppet] - https://gerrit.wikimedia.org/r/194788 (owner: Springle)
[02:20:01] (CR) Dzahn: "looks pretty solid to me. i mean, it even provides tests :). hard to find nitpicks, maybe one is missing README.md in the module root beca" [puppet] - https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) (owner: Alexandros Kosiaris)
[02:22:11] (PS1) Dzahn: bugzilla: remove ferm service for port 443 [puppet] - https://gerrit.wikimedia.org/r/194789
[02:24:00] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:24:08] Logged the message, Master
[02:24:23] (PS1) Dzahn: planet: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194790
[02:25:08] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-06 02:24:04+00:00
[02:25:13] Logged the message, Master
[02:25:14] (PS1) Dzahn: racktables: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194791
[02:25:18] (PS1) BBlack: repool cp3017 [puppet] - https://gerrit.wikimedia.org/r/194792
[02:25:34] (CR) BBlack: [C: 2 V: 2] repool cp3017 [puppet] - https://gerrit.wikimedia.org/r/194792 (owner: BBlack)
[02:26:08] !log cp3017 repooled in pybal
[02:26:08] !log l10nupdate Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 04s)
[02:26:13] Logged the message, Master
[02:26:17] Logged the message, Master
[02:26:29] (PS1) Dzahn: releases: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194793
[02:27:15] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-06 02:26:12+00:00
[02:27:20] Logged the message, Master
[02:34:08] (PS2) Dzahn: releases: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194793
[02:34:10] (PS1) Dzahn: releases: typo in docs: ssh->tls/ssl [puppet] - https://gerrit.wikimedia.org/r/194795
[02:40:03] (PS1) Dzahn: WIP: move misc/labsdebrepo to module [puppet] - https://gerrit.wikimedia.org/r/194796
[02:46:41] (PS1) Dzahn: site.pp - node comments [puppet] - https://gerrit.wikimedia.org/r/194797
[02:49:03] (PS2) Dzahn: site.pp - node comments [puppet] - https://gerrit.wikimedia.org/r/194797
[02:50:26] (PS1) Dzahn: put base::firewall on calcium [puppet] - https://gerrit.wikimedia.org/r/194799 (https://phabricator.wikimedia.org/T83044)
[02:56:23] (PS1) BBlack: mark jessie hosts [puppet] - https://gerrit.wikimedia.org/r/194800
[02:56:25] (PS1) BBlack: depool amssq3[24] for reinstall [puppet] - https://gerrit.wikimedia.org/r/194801
[02:56:51] (CR) BBlack: [C: 2 V: 2] mark jessie hosts [puppet] - https://gerrit.wikimedia.org/r/194800 (owner: BBlack)
[02:57:00] (PS1) Dzahn: put base::firewall on netmon1001 [puppet] - https://gerrit.wikimedia.org/r/194802
[02:57:02] (CR) BBlack: [C: 2 V: 2] depool amssq3[24] for reinstall [puppet] - https://gerrit.wikimedia.org/r/194801 (owner: BBlack)
[02:57:22] !log amssq3[24] depooled in pybal
[02:57:30] Logged the message, Master
[03:04:22] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:04:57] (PS1) Dzahn: put base::firewall on californium [puppet] - https://gerrit.wikimedia.org/r/194804
[03:05:31] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:05:31] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[03:05:50] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:07:20] (PS1) Dzahn: horizon: add firewall hole for http [puppet] - https://gerrit.wikimedia.org/r/194805
[03:07:44] (CR) Dzahn: "this would be after https://gerrit.wikimedia.org/r/#/c/194805/" [puppet] - https://gerrit.wikimedia.org/r/194804 (owner: Dzahn)
[03:08:20] (CR) Dzahn: [C: -2] put base::firewall on netmon1001 [puppet] - https://gerrit.wikimedia.org/r/194802 (owner: Dzahn)
[03:08:48] (CR) Dzahn: [C: -2] WIP: move misc/labsdebrepo to module [puppet] - https://gerrit.wikimedia.org/r/194796 (owner: Dzahn)
[03:09:51] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:12:01] PROBLEM - Host amssq34 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:40] PROBLEM - Host amssq32 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:12] RECOVERY - Host amssq34 is UP: PING WARNING - Packet loss = 73%, RTA = 91.75 ms
[03:13:21] RECOVERY - Host amssq32 is UP: PING OK - Packet loss = 0%, RTA = 88.97 ms
[03:16:43] (CR) Dzahn: [C: 1] Puppet module for the zotero service [puppet] - https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) (owner: Alexandros Kosiaris)
[03:17:00] PROBLEM - Host amssq34 is DOWN: PING CRITICAL - Packet loss = 100%
[03:17:10] PROBLEM - Host amssq32 is DOWN: PING CRITICAL - Packet loss = 100%
[03:18:06] (CR) Dzahn: [C: -2] WIP: role and module for contacts [puppet] - https://gerrit.wikimedia.org/r/194786 (https://phabricator.wikimedia.org/T90679) (owner: Dzahn)
[03:19:11] RECOVERY - Host amssq34 is UP: PING OK - Packet loss = 0%, RTA = 88.61 ms
[03:19:51] RECOVERY - Host amssq32 is UP: PING OK - Packet loss = 0%, RTA = 89.48 ms
[03:23:39] (PS1) Dzahn: create shell account for Jeff Hobson [puppet] - https://gerrit.wikimedia.org/r/194806 (https://phabricator.wikimedia.org/T90624)
[03:24:35] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[03:24:53] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[03:28:08] (CR) Dzahn: "cool !:) can i then also move it out of misc, somehow like this: https://gerrit.wikimedia.org/r/#/c/194796/ ?:)" [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[03:30:40] (CR) Dzahn: "maybe the IP addresses in exports.dumps could become parameters in the role class?" [puppet] - https://gerrit.wikimedia.org/r/194395 (owner: coren)
[03:31:51] (CR) Dzahn: "do it" [puppet] - https://gerrit.wikimedia.org/r/162860 (owner: ArielGlenn)
[03:33:21] (CR) Dzahn: "@alex comment: apache-graceful-all is gone meanwhile, dsh is only used for scap anymore afaict" [puppet] - https://gerrit.wikimedia.org/r/160628 (owner: Matanya)
[03:35:26] (CR) Dzahn: [C: 2] "https://packages.debian.org/jessie/libdistro-info-perl" [puppet] - https://gerrit.wikimedia.org/r/191677 (owner: Hashar)
[03:38:46] (PS1) BBlack: depool cp4015 [puppet] - https://gerrit.wikimedia.org/r/194808
[03:39:05] (CR) BBlack: [C: 2 V: 2] depool cp4015 [puppet] - https://gerrit.wikimedia.org/r/194808 (owner: BBlack)
[03:39:39] !log depooled cp4015 in pybal
[03:39:45] Logged the message, Master
[03:55:19] !log Completed running FlowAddMissingModerationLogs.php and FlowFixLog.php on all Flow wikis
[03:55:24] Logged the message, Master
[04:05:20] PROBLEM - Host cp4015 is DOWN: PING CRITICAL - Packet loss = 100%
[04:09:31] (PS1) Mattflaschen: Use a dblist for Flow [mediawiki-config] - https://gerrit.wikimedia.org/r/194809
[04:14:36] (CR) Mattflaschen: [C: -1] "Let's wait until we're doing a deploy anyway and include this." [mediawiki-config] - https://gerrit.wikimedia.org/r/194809 (owner: Mattflaschen)
[04:26:56] (CR) Giuseppe Lavagetto: [C: 1] noc: add link to new pybal config files [mediawiki-config] - https://gerrit.wikimedia.org/r/194742 (owner: Dzahn)
[04:35:30] RECOVERY - Host cp4015 is UP: PING OK - Packet loss = 0%, RTA = 79.57 ms
[04:50:36] (CR) Giuseppe Lavagetto: [C: 1] "I think Coren is correct, but we shouldn't overcomplicate things ATM. You should probably merge this patch." [puppet] - https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: Yuvipanda)
[04:51:24] (CR) Giuseppe Lavagetto: [C: 1] noc: remove broken symlink to pybal [mediawiki-config] - https://gerrit.wikimedia.org/r/194735 (owner: Dzahn)
[05:09:16] (CR) GWicke: [C: 1] noc: add link to new pybal config files [mediawiki-config] - https://gerrit.wikimedia.org/r/194742 (owner: Dzahn)
[05:13:42] !log repooled cp4015 in pybal
[05:13:49] Logged the message, Master
[05:14:19] (PS1) BBlack: repool cp4015 [puppet] - https://gerrit.wikimedia.org/r/194819
[05:14:39] (CR) BBlack: [C: 2 V: 2] repool cp4015 [puppet] - https://gerrit.wikimedia.org/r/194819 (owner: BBlack)
[05:19:47] (CR) GWicke: [C: 2] noc: add link to new pybal config files [mediawiki-config] - https://gerrit.wikimedia.org/r/194742 (owner: Dzahn)
[05:19:49] (CR) jenkins-bot: [V: -1] noc: add link to new pybal config files [mediawiki-config] - https://gerrit.wikimedia.org/r/194742 (owner: Dzahn)
[05:20:49] (CR) Chiefwei: [C: 1] Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[05:29:47] (PS1) BBlack: depool cp1068 [puppet] - https://gerrit.wikimedia.org/r/194820
[05:30:45] !log depooled cp1068 in pybal
[05:30:49] Logged the message, Master
[05:34:31] PROBLEM - Host cp1068 is DOWN: PING CRITICAL - Packet loss = 100%
[05:38:31] RECOVERY - Host cp1068 is UP: PING OK - Packet loss = 0%, RTA = 2.38 ms
[05:44:50] it's possible the one from irc was due to a varnish stop->start event without the host going down
[05:44:54] maybe I'll try that once this is done
[05:50:44] (CR) Andrew Bogott: [C: 1] horizon: add firewall hole for http [puppet] - https://gerrit.wikimedia.org/r/194805 (owner: Dzahn)
[05:51:00] (CR) Andrew Bogott: [C: 1] "Now if only we could access the box..." [puppet] - https://gerrit.wikimedia.org/r/194804 (owner: Dzahn)
[05:57:11] !log repooled cp1068 in pybal
[05:57:18] Logged the message, Master
[05:59:02] (PS2) BBlack: cp1068 -> jessie [puppet] - https://gerrit.wikimedia.org/r/194820
[05:59:17] (CR) BBlack: [C: 2 V: 2] cp1068 -> jessie [puppet] - https://gerrit.wikimedia.org/r/194820 (owner: BBlack)
[06:01:31] !log depool cp1063 in pybal
[06:01:37] Logged the message, Master
[06:02:02] operations: Dear ops-requests@rt.wikimedia.org, Call for Submissions on Various Academic Disciplines - https://phabricator.wikimedia.org/T91732#1094877 (emailbot)
[06:03:47] also, btw, apparently *all* the R620's do the Lifecycle Controller thing if you use omsa to apply a bios change like hyperthreading, not just esams
[06:03:54] operations: more robust certificate chain creation in puppet - https://phabricator.wikimedia.org/T84543#1094883 (Dzahn)
[06:04:06] (which also explains why cp4015 was so wonky but eventually worked in the dark: it was going through the double reboot for that)
[06:04:12] PROBLEM - Host cp1063 is DOWN: PING CRITICAL - Packet loss = 100%
[06:05:10] operations: more robust certificate chain creation in puppet - https://phabricator.wikimedia.org/T84543#928592 (Dzahn) @qchris thanks for confirming. i made it public. and here's a related Gerrit change: https://gerrit.wikimedia.org/r/#/c/194455/
[06:12:12] RECOVERY - Host cp1063 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
[06:14:21] PROBLEM - configured eth on cp1063 is CRITICAL: Connection refused by host
[06:14:32] PROBLEM - dhclient process on cp1063 is CRITICAL: Connection refused by host
[06:14:42] PROBLEM - Disk space on cp1063 is CRITICAL: Connection refused by host
[06:14:42] PROBLEM - Varnish traffic logger on cp1063 is CRITICAL: Connection refused by host
[06:14:51] PROBLEM - puppet last run on cp1063 is CRITICAL: Connection refused by host
[06:15:02] PROBLEM - Varnish HTCP daemon on cp1063 is CRITICAL: Connection refused by host
[06:15:12] PROBLEM - RAID on cp1063 is CRITICAL: Connection refused by host
[06:15:12] PROBLEM - Varnishkafka log producer on cp1063 is CRITICAL: Connection refused by host
[06:15:22] PROBLEM - HTTPS on cp1063 is CRITICAL: Return code of 255 is out of bounds
[06:15:22] PROBLEM - salt-minion processes on cp1063 is CRITICAL: Connection refused by host
[06:15:22] PROBLEM - DPKG on cp1063 is CRITICAL: Connection refused by host
[06:15:22] PROBLEM - Varnish HTTP upload-backend on cp1063 is CRITICAL: Connection refused
[06:15:22] PROBLEM - Varnish HTTP upload-frontend on cp1063 is CRITICAL: Connection refused
[06:21:31] ^ all ok, I just forgot to disable in icinga :P
[06:25:51] (PS1) BBlack: repool amssq3[24], tag cp1063 [puppet] - https://gerrit.wikimedia.org/r/194826
[06:26:07] !log repooled amssq3[24] + cp1063 in pybal
[06:26:15] Logged the message, Master
[06:28:21] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:21] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:31] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 74804 MB (3% inode=99%):
[06:29:00] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: puppet fail
[06:29:10] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:40] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:10] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:23] (CR) BBlack: [C: 2] repool amssq3[24], tag cp1063 [puppet] - https://gerrit.wikimedia.org/r/194826 (owner: BBlack)
[06:31:31] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:50] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:20] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:01] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:40:49] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[06:41:13] <_joe_> bblack: ^^
[06:41:22] blerg
[06:41:49] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[06:45:18] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:45:39] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:29] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[06:46:39] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:46:39] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:46:59] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:47:08] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:09] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:53:25] (CR) Dzahn: "@Ori it's gone meanwhile in the change linked above, i think this can be abandoned" [puppet] - https://gerrit.wikimedia.org/r/164508 (owner: Ori.livneh)
[06:55:15] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Mar 6 06:54:11 UTC 2015 (duration 54m 10s)
[06:55:20] Logged the message, Master
[06:58:48] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:03:04] (PS1) Dzahn: add kartik and nikerabbit to statistics-users [puppet] - https://gerrit.wikimedia.org/r/194827 (https://phabricator.wikimedia.org/T91625)
[07:07:38] RECOVERY - Disk space on fluorine is OK: DISK OK
[07:42:47] (PS1) Dzahn: replace failing fonts (oriya,unfonts,kannada) [puppet] - https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685)
[07:59:28] (PS1) Tim Landscheidt: Tools: Let sql query DNS as well for aliases [puppet] - https://gerrit.wikimedia.org/r/194829 (https://phabricator.wikimedia.org/T91733)
[08:00:46] (CR) Tim Landscheidt: "Tested to work." [puppet] - https://gerrit.wikimedia.org/r/194829 (https://phabricator.wikimedia.org/T91733) (owner: Tim Landscheidt)
[08:06:12] (PS1) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - https://gerrit.wikimedia.org/r/194830
[08:06:55] <_joe_> springle: ^^ it's a work in progress, of course, but I committed it so that you can take a look
[08:16:36] (PS2) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - https://gerrit.wikimedia.org/r/194830
[08:33:30] (CR) Tim Landscheidt: "@Dzahn: Sure. I had planned that for some time in the future anyway; I just like to keep patches manageable, i. e. not combine different " [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:34:37] (PS6) Yuvipanda: labsdeprepo: Allow more than one local repository [puppet] - https://gerrit.wikimedia.org/r/118796 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:34:49] (PS4) Yuvipanda: Tools: Use labsdeprepo [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:38:19] (CR) Yuvipanda: [C: 2] labsdeprepo: Allow more than one local repository [puppet] - https://gerrit.wikimedia.org/r/118796 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:41:49] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:42:07] (CR) Tim Landscheidt: [C: -1] "With:" [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[08:43:50] (CR) Yuvipanda: "I still see some tools running manually on the tomcat node now and then. Is there a way to track this?" [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[08:44:06] (CR) Yuvipanda: [C: 2] Tools: Use labsdeprepo [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:48:24] (PS1) Yuvipanda: labs: Unify lvm roles [puppet] - https://gerrit.wikimedia.org/r/194831
[08:48:38] _joe_: ^ cleaned up some of the lvm stuff. I’ll take care of it in a while
[08:49:24] (PS2) Yuvipanda: Tools: Let sql query DNS as well for aliases [puppet] - https://gerrit.wikimedia.org/r/194829 (https://phabricator.wikimedia.org/T91733) (owner: Tim Landscheidt)
[08:52:09] (CR) Tim Landscheidt: ""qacct -j \* -q webgrid-tomcat" (+ tail/less/etc. :-)) seems to work." [puppet] - https://gerrit.wikimedia.org/r/193561 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[08:52:50] (CR) Yuvipanda: [C: 2] Tools: Let sql query DNS as well for aliases [puppet] - https://gerrit.wikimedia.org/r/194829 (https://phabricator.wikimedia.org/T91733) (owner: Tim Landscheidt)
[08:53:05] (CR) Yuvipanda: ":D works well, yay :)" [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[08:55:32] (PS2) Giuseppe Lavagetto: labs: Unify lvm roles [puppet] - https://gerrit.wikimedia.org/r/194831 (owner: Yuvipanda)
[08:55:53] (CR) Giuseppe Lavagetto: [C: 1] "This is a noop and should be perfectly good." [puppet] - https://gerrit.wikimedia.org/r/194831 (owner: Yuvipanda)
[08:56:26] (CR) Yuvipanda: [C: 2] labs: Unify lvm roles [puppet] - https://gerrit.wikimedia.org/r/194831 (owner: Yuvipanda)
[09:01:28] (PS1) Yuvipanda: tools: Cosmetic fixes to sql tool [puppet] - https://gerrit.wikimedia.org/r/194833
[09:01:50] (CR) Yuvipanda: [C: 2] tools: Cosmetic fixes to sql tool [puppet] - https://gerrit.wikimedia.org/r/194833 (owner: Yuvipanda)
[09:02:06] (CR) Yuvipanda: [V: 2] tools: Cosmetic fixes to sql tool [puppet] - https://gerrit.wikimedia.org/r/194833 (owner: Yuvipanda)
[09:02:53] (CR) Tim Landscheidt: "Still I had to manually remove /data/project/.system/deb/all/Packages.gz, so my pride has a scratch :-)." [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[09:03:26] (CR) Yuvipanda: ":) Still far better than what I would've done, which is to hand-do these things :)" [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[09:06:40] hi hashar
[09:09:27] hello
[09:10:43] hasharConf: ah, at a conf. I guess you won’t have time to help with the parsoid test today?
[09:17:26] !log Jenkins: upgrading and restarting. Wish me luck. [09:17:33] Logged the message, Master [09:17:56] YuviPanda: I have no idea how the parsoid stuff works. You probably want to reach out to the Parsoid team. Cscott should know [09:18:05] YuviPanda: else MarkTraceur and Gabriel Wicke :) [09:18:13] hasharConf: err, this is the jenkins integration with parsoid, not parsoid itself [09:18:22] the patch I had has parsoid itself working fine. [09:18:55] https://gerrit.wikimedia.org/r/#/c/193082/ [09:19:04] I responded to all your comments :) [09:22:28] PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [09:23:38] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [09:24:16] ^^ that is me [10:07:28] PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1462 bytes in 0.268 second response time [10:14:58] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Puppet has 1 failures [10:16:23] (03CR) 10Santhosh: [C: 031] add kartik and nikerabbit to statistics-users [puppet] - 10https://gerrit.wikimedia.org/r/194827 (https://phabricator.wikimedia.org/T91625) (owner: 10Dzahn) [10:23:09] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 1 failures [10:32:39] RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:39:49] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:01:43] (03CR) 10KartikMistry: [C: 031] add kartik and nikerabbit to statistics-users [puppet] - 10https://gerrit.wikimedia.org/r/194827 (https://phabricator.wikimedia.org/T91625) (owner: 10Dzahn) [11:34:06] (03CR) 10JanZerebecki: [C: 04-1] "This patch is a
good idea." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/194455 (https://phabricator.wikimedia.org/T84543) (owner: 10Dzahn) [11:45:40] <_joe_> mmmh we're missing wikibugs [11:45:50] <_joe_> YuviPanda: do you know how to restart it? [11:46:00] oh [11:46:01] maybe [11:46:03] let me poke [11:47:57] jesus [11:48:03] https://tools.wmflabs.org/?status [11:48:03] <_joe_> what? [11:48:10] something’s wrong with toollabs. [11:48:11] again. [11:48:19] * YuviPanda looks [11:48:21] <_joe_> :( [11:48:24] <_joe_> sorry [11:51:36] _joe_: I started wikibugs back up. it’s slowly joining channels (to avoid flood kick) [11:51:51] <_joe_> YuviPanda: what's up with toollabs? [11:52:07] _joe_: everything seems ok, except for the listing of tools statuses on the page I linked you to [11:52:16] <_joe_> oh ok [11:52:32] things seeming ok isn’t really an indication of anything... [11:53:07] I should convince mark to have toollabs be a ‘goal’ next quarter [11:53:28] go ahead with a proposal [11:53:34] we need to formulate our team goals before the end of next week [11:53:37] yup, I’ll write up a doc page. [11:53:38] <_joe_> he's always lurking [11:53:38] for next quarter [11:53:41] <_joe_> ;) [11:53:41] thanks :) [11:53:55] actually, etherpad. [12:05:37] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1095368 (10Joe) I found no other regressions, and - good news! - https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/WorldAviation.198409.BackCover.pdf/page1-342px-WorldAviation.198409.BackC... [12:08:31] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1095370 (10Joe) So I'd have to debug the animated gif problem, and after that I guess we can throw the new imagescaler in the ring as soon as this is resolved. 
[12:13:48] offffpfp [12:13:57] debian packaging kills me off :( [12:14:12] spent a good hour figuring out I missed the -nc option to skip the clean: target [12:15:13] DIST=precise-wikimedia git-buildpackage -S -nc --git-pbuilder [12:15:13] Building with cowbuilder for distribution sid [12:15:20] doesn't even honor my DIST env variable :( [12:21:44] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1095383 (10Joe) animated gif conversion seems to take a lot of memory ``` [4844824.480807] convert invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [4844824.480813] convert cpuset=/ m... [12:25:11] Krenair: why are you grepping through the puppet repo ? git grep works way faster and has no such issue. I do like that symlink btw. It makes my life way easier. But I am fine with alternative proposals [12:28:54] (03CR) 10Alexandros Kosiaris: [C: 04-1] ensure there is always a newline in chained certs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/194455 (https://phabricator.wikimedia.org/T84543) (owner: 10Dzahn) [12:34:11] (03CR) 10Alexandros Kosiaris: [C: 031] "I am way more comfortable with this change nowadays. Seems like the labstore issues will be resolved in T87870 which is fine and apache-gr" [puppet] - 10https://gerrit.wikimedia.org/r/160628 (owner: 10Matanya) [12:35:13] 6operations, 10Citoid: Update the citoid/deploy branch to not contain zotero deploy - https://phabricator.wikimedia.org/T89872#1095399 (10akosiaris) Change is at https://gerrit.wikimedia.org/r/#/c/194548/ (isn't a bot supposed to do what I just did ?)
[13:02:09] 6operations, 6Labs, 5Patch-For-Review: Puppetize labstore1003 - https://phabricator.wikimedia.org/T91573#1095427 (10coren) p:5Triage>3Normal [13:07:59] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1453 bytes in 0.229 second response time [13:32:22] !log db2017 testing innodb_use_native_aio=0 due to InnoDB assertion failure on kernel 3.13 [13:32:28] Logged the message, Master [13:36:28] (03PS4) 10coren: labstore1002 to Jessie [puppet] - 10https://gerrit.wikimedia.org/r/194537 (https://phabricator.wikimedia.org/T91640) [13:38:16] (03PS7) 10Alexandros Kosiaris: Puppet module for the zotero service [puppet] - 10https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) [13:38:45] (03PS1) 10Nemo bis: Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91748) [13:39:00] (03CR) 10coren: [C: 032] labstore1002 to Jessie [puppet] - 10https://gerrit.wikimedia.org/r/194537 (https://phabricator.wikimedia.org/T91640) (owner: 10coren) [13:40:21] (03PS2) 10Nemo bis: Hide "prefershttps" preference on HSTS domains (ru): it has no effect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194856 (https://phabricator.wikimedia.org/T91352) [13:45:44] (03PS1) 10Springle: MariaDB linux_use_native_aio=0 until kernel 3.16 [puppet] - 10https://gerrit.wikimedia.org/r/194857 [13:46:53] (03CR) 10Alexandros Kosiaris: "@ori. Thanks! much appreciated. 
I added some linting changes of my own" [puppet] - 10https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) (owner: 10Alexandros Kosiaris) [13:47:49] (03CR) 10Springle: [C: 032] "Yes, the linked patch came from a bug report on a ppc64 kernel, but it's the only other report I've found yet and the symptoms and tack tr" [puppet] - 10https://gerrit.wikimedia.org/r/194857 (owner: 10Springle) [13:58:40] (03CR) 10Springle: [C: 04-1] mediawiki: add configs to support the Dallas DC (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (owner: 10Giuseppe Lavagetto) [14:04:03] (03PS1) 10coren: Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) [14:04:15] YuviPanda: Step one ^^ [14:06:41] (03CR) 10Yuvipanda: [C: 04-1] "Nits...." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:06:45] Coren: \o/ [14:07:09] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: Puppet has 1 failures [14:11:28] (03CR) 10coren: Labs: manage resolv.conf in labs also (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:11:45] YuviPanda: Picked one of your nits, then put nits on the other. :-) [14:13:20] YuviPanda: In other words, sure, that would mean 'parsoid-lb' resolves if there isn't an instance of that name. Just as it would if you specified the fqdn. Harmless. :-) [14:13:49] * Coren pushes the version with the @ [14:14:04] (03PS2) 10coren: Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) [14:14:36] 6operations: Provide dh-virtualenv 0.9 dev package on apt.wikimedia.org Precise distribution - https://phabricator.wikimedia.org/T91631#1095510 (10hashar) [14:15:45] YuviPanda: Also, nameserver IP? Probably should live in hiera. 
[14:16:18] YuviPanda: We'll do this in one sweep in another patch to remove the IP from everywhere it's hardcoded. [14:16:20] (03PS3) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 [14:17:07] YuviPanda: You can test the resulting resolv.conf on tools-login btw, if you're curious. [14:18:15] 6operations, 10MediaWiki-Configuration, 3codfw-appserver-setup, 3wikis-in-codfw: Configure mediawiki to operate in the Dallas DC - https://phabricator.wikimedia.org/T91754#1095518 (10Joe) 3NEW a:3Joe [14:19:46] (03PS4) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [14:22:00] Coren: right, so the parsoid-lb thing is a change from how things are now, right? so if people start relying on parsoid-lb to resolve to the svc ip, and then someone creates an instance named parsoid-lb... [14:23:48] (03CR) 10Yuvipanda: [C: 04-1] Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:23:50] * _joe_ creates parsoid-lb [14:23:56] YuviPanda: Right, or if someone relies on an instance doing something and it no longer does in the past... that's really not an issue - if you rely on a short hostname that is not documented and not under your control, you get what you deserve. :-) [14:24:06] it *is* documented on puppet :) [14:24:39] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:24:48] assuming you want labsdb to be enwiki.labsdb.svc.eqiad.wmnet, why not just have it be enwiki.labsdb.labs.eqiad.wmnet, and just have it search labs.eqiad.wmnet [14:25:56] <_joe_> svc.eqiad.wmnet isn't reserved from production? [14:25:59] Hm. Well, I /liked/ svc because they - by definition - are where-to-reach-service-x-in-dc-y which made sense.
[14:26:14] But I can see the pros of labs.eqiad.wmnet [14:26:26] _joe_: well, labsdb is in production. [14:26:43] so .wmnet is the correct one, I think. [14:26:56] It is. It's a production service (that happens to be used by labs) [14:27:45] YuviPanda: Yeah, okay. I think your concern re svc is unwarranted; but labs.*.wmnet works for me too so let's do that. [14:28:08] \o/ cool [14:28:44] (03PS3) 10coren: Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) [14:29:05] Ima just add cnames for labstore and dumps there too. :-) [14:31:13] (03CR) 10Yuvipanda: "Another tiny nit.." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:31:17] Coren: another tiny nit... [14:31:17] sorry [14:34:48] YuviPanda: I'd rather have it there explicitly since I expect we might want to tune it in the future. [14:34:56] hmm [14:35:25] Also, it's explicit (at 3) in prod so making the difference obvious is a good thing imo [14:35:33] ah fair enough [14:35:40] (03CR) 10Yuvipanda: [C: 031] Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:37:06] Coren: \o/. I didn’t actually test the resolv.conf tho. I assume that’s good enough [14:38:09] (03CR) 10coren: [C: 032] Labs: manage resolv.conf in labs also [puppet] - 10https://gerrit.wikimedia.org/r/194858 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:43:28] 6operations, 10Citoid: Update the citoid/deploy branch to not contain zotero deploy - https://phabricator.wikimedia.org/T89872#1095559 (10chasemp) >>! In T89872#1095399, @akosiaris wrote: > Change is at https://gerrit.wikimedia.org/r/#/c/194548/ (isn't a bot supposed to do what I just did ?) I too have notice... 
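The short-hostname concern in the resolv.conf discussion above (a name like parsoid-lb resolving to a service record until someone creates an instance with the same name) can be illustrated with a small model of search-list expansion. This is only a sketch: it is not a DNS client, the search order and the records in ZONE are assumptions made up for illustration, and the function names are hypothetical. The ordering rule follows the documented "ndots" behaviour of resolv.conf, with the default threshold of 1.

```python
# Sketch only: models how a resolver expands names using a "search" list.
# It is not a DNS client. All domains and records below are hypothetical.

def candidates(name, search, ndots=1):
    """List the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots: walk the search list first, then the name as given.
    return expanded + [name]


def resolve(name, search, zone, ndots=1):
    """Return the first candidate present in `zone`, or None."""
    for fqdn in candidates(name, search, ndots):
        if fqdn in zone:
            return fqdn
    return None


SEARCH = ["eqiad.wmflabs", "labs.eqiad.wmnet"]  # assumed search order
ZONE = {"parsoid-lb.labs.eqiad.wmnet", "enwiki.labsdb.labs.eqiad.wmnet"}
```

With these assumed records, resolve("parsoid-lb", SEARCH, ZONE) finds the service record. Adding a hypothetical instance record parsoid-lb.eqiad.wmflabs to the zone makes the same short name resolve to the instance instead, because that search domain is tried first.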
[14:53:04] (03CR) 10Ottomata: [C: 032] Stabilize puppet hashes interpolation on yarn-site.xml.erb [puppet/cdh] - 10https://gerrit.wikimedia.org/r/194488 (owner: 10Alexandros Kosiaris) [14:53:11] thanks akosiaris [14:57:59] (03CR) 10Ottomata: [C: 031] add kartik and nikerabbit to statistics-users [puppet] - 10https://gerrit.wikimedia.org/r/194827 (https://phabricator.wikimedia.org/T91625) (owner: 10Dzahn) [15:10:59] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095596 (10akosiaris) I think I actually figured this out. It wasn't until late last night that I put 2+2 together. So this task and T91525 are related. T91525 is that latest (and last... [15:11:38] ottomata: ooh, thanks. it was messing up puppet compiler [15:11:41] chasemp: ^ [15:11:52] I figured it out... it's funny to say the least [15:14:01] Excellent rabbit hole my friend, good thinking [15:14:21] Will ask upstream about it [15:14:45] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095606 (10Qgil) >>! In T91648#1095596, @akosiaris wrote: > [stevens] ( 1 , 2 , 3 , 4 ) Unix Network Programming , W. Richard Stevens, 1994 Prentice Hall. Ok, call me impressed! :) [15:16:09] PROBLEM - puppet last run on mw1043 is CRITICAL: CRITICAL: Puppet has 1 failures [15:25:19] 6operations, 10ops-eqiad: labstore1002 fails to enter PERC bios, hangs on detecting devices - https://phabricator.wikimedia.org/T91677#1095610 (10Cmjohnson) 5Open>3Resolved I removed all power from labstore1002 when I rearranged the rack. That disconnected the connection to idrac. I powered the server on... [15:29:14] 6operations, 7HHVM: Switch HAT appservers to trusty's ICU - https://phabricator.wikimedia.org/T86096#1095617 (10Bawolff) You may want to give wikis some warning before doing this, there may be a short window where categories will be messed up. 
updateCollation.php --force (the --force is important) will only n... [15:33:49] RECOVERY - puppet last run on mw1043 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:35:39] (03PS5) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [15:36:46] (03PS1) 10coren: Add labs.eqiad.wmnet. subnet [dns] - 10https://gerrit.wikimedia.org/r/194865 [15:36:56] (03CR) 10jenkins-bot: [V: 04-1] Add labs.eqiad.wmnet. subnet [dns] - 10https://gerrit.wikimedia.org/r/194865 (owner: 10coren) [15:37:27] Oh, duh. [15:37:42] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1095633 (10Bawolff) Animated gifs have always been memory intensive. File:Cigarette_sales_per_Capita_in_the_United_States%2C_1970_-_2012.gif was already on the edge of what was running out of memo... [15:38:00] (03PS1) 10Chad: Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194866 [15:38:37] (03PS2) 10coren: Add labs.eqiad.wmnet. subnet [dns] - 10https://gerrit.wikimedia.org/r/194865 (https://phabricator.wikimedia.org/T63897) [15:39:19] YuviPanda: ^^ [15:41:53] (03PS1) 10Alexandros Kosiaris: Update modules/cdh to contain de621e8f8fffec [puppet] - 10https://gerrit.wikimedia.org/r/194867 [15:51:48] Amanda is dead, isn't it/she? https://wikitech.wikimedia.org/wiki/Amanda [15:57:54] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095646 (10chasemp) >>! In T91648#1095596, @akosiaris wrote: >... > * **Reset the file access creation mask** Nice dude. Lots of work upstream on phd lately so I confirmed on another... 
[16:02:57] (03CR) 10Alexandros Kosiaris: [C: 032] Update modules/cdh to contain de621e8f8fffec [puppet] - 10https://gerrit.wikimedia.org/r/194867 (owner: 10Alexandros Kosiaris) [16:07:28] (03PS2) 10Krinkle: replace failing fonts (oriya,unfonts,kannada) [puppet] - 10https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) (owner: 10Dzahn) [16:10:19] (03CR) 10Krinkle: [C: 031] "Thanks! I can cherry-pick this later to test it in CI (or just merge if you've been able to test it elsewhere already and I'll rebase our " [puppet] - 10https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) (owner: 10Dzahn) [16:10:48] PROBLEM - puppet last run on mc1014 is CRITICAL: Timeout while attempting connection [16:11:20] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095653 (10chasemp) epriestley: chasemp: good enough for me, I'll file something to get us setting the umask [16:12:18] PROBLEM - salt-minion processes on mc1014 is CRITICAL: Timeout while attempting connection [16:12:18] PROBLEM - dhclient process on mc1014 is CRITICAL: Timeout while attempting connection [16:13:29] (03CR) 10Chad: [C: 032] Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194866 (owner: 10Chad) [16:13:34] (03Merged) 10jenkins-bot: Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194866 (owner: 10Chad) [16:16:09] PROBLEM - DPKG on mc1014 is CRITICAL: Timeout while attempting connection [16:16:19] PROBLEM - Disk space on mc1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:16:49] PROBLEM - RAID on mc1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:16:58] PROBLEM - configured eth on mc1014 is CRITICAL: Timeout while attempting connection [16:17:09] !log demon Synchronized private/PrivateSettings.php: restructure swift config for multi-dc, with b/c (duration: 00m 07s) [16:17:14] Logged the message, Master [16:17:48] !log demon Synchronized wmf-config/: restructured swift config for multi-dc (duration: 00m 07s) [16:17:49] RECOVERY - salt-minion processes on mc1014 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [16:17:53] Logged the message, Master [16:18:09] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095661 (10epriestley) We're in agreement about `phd` being wrong here. Upstream task: https://secure.phabricator.com/T7475 [16:18:34] <^d> _joe_: All done ^^^, it's yours now for adding the new codfw config in PrivateSettings for swift [16:21:38] PROBLEM - Host mc1014 is DOWN: PING CRITICAL - Packet loss = 100% [16:22:09] RECOVERY - Host mc1014 is UP: PING WARNING - Packet loss = 80%, RTA = 2.45 ms [16:25:29] PROBLEM - SSH on mc1014 is CRITICAL: Connection timed out [16:26:48] PROBLEM - salt-minion processes on mc1014 is CRITICAL: Timeout while attempting connection [16:27:29] PROBLEM - Disk space on ms-be2007 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdd1 is not accessible: Input/output error [16:27:33] bblack: Do you have a few minutes to review a DNS patch for me? [16:28:19] PROBLEM - RAID on ms-be2007 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [16:28:42] (03PS1) 10Andrew Bogott: Revert "Move californium to a public ip, part two" [puppet] - 10https://gerrit.wikimedia.org/r/194880 [16:28:44] (03PS1) 10Andrew Bogott: Revert "Move californium to public IP." [puppet] - 10https://gerrit.wikimedia.org/r/194881 [16:29:08] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [16:29:17] (03PS3) 10coren: Add labs.eqiad.wmnet. 
subnet [dns] - 10https://gerrit.wikimedia.org/r/194865 (https://phabricator.wikimedia.org/T63897) [16:29:20] <_joe_> mc1014 down? [16:29:48] RECOVERY - SSH on mc1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [16:30:28] PROBLEM - Redis on mc1014 is CRITICAL: Connection timed out [16:31:14] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1095685 (10Bawolff) Some other GIF files since that's where we ran into problems (These are from the most popular list, most will be really small cases that don't stress the server, so I expect th... [16:31:43] (03PS4) 10Alexandros Kosiaris: Beta: Assign proxy for zotero [puppet] - 10https://gerrit.wikimedia.org/r/194552 [16:32:38] RECOVERY - Redis on mc1014 is OK: TCP OK - 0.997 second response time on port 6379 [16:32:53] user signup is broken - we ran out of captchas [16:33:08] fyi: Krenair is going to do a "swat" deploy soon to fix an "unbreak now!" issue in VE (basically, if your edit token expires during a session, VE helpfully gets a new one for you, but due to a change in the mw.api it now is endlessly requesting a new token and then crashes) [16:33:09] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 1 failures [16:34:09] mlitn: Do we have a bug for that? [16:34:28] don’t know, I can submit one [16:34:35] Please do, make it unbreak now [16:35:05] How simply can we fix that? [16:35:23] there's a script to generate new captchas somewhere, isn't there? [16:36:02] captcha.py, but I don't think it's a wise idea to just run it [16:36:09] PROBLEM - Memcached on mc1014 is CRITICAL: Connection timed out [16:36:20] https://phabricator.wikimedia.org/T91760 [16:36:28] PROBLEM - SSH on mc1014 is CRITICAL: Connection timed out [16:37:05] When did captchas start expiring? 
[16:37:16] Aaron did that in the past [16:37:21] there's a doc on wikitech [16:37:25] but you need a word list [16:37:36] that apparently isn't published anywhere [16:37:43] might be in Aaron's home [16:38:04] ... is it not stored on a production server? [16:38:17] https://wikitech.wikimedia.org/wiki/Generating_CAPTCHAs [16:38:22] it's on Aaron's home [16:38:28] /home/aaron [16:38:32] oh right :D [16:38:33] /usr/share/dict/words ? [16:38:52] aha, nvm :p [16:39:13] Reedy: ^ do you know? [16:39:19] PROBLEM - Redis on mc1014 is CRITICAL: Connection timed out [16:39:35] hm? [16:39:43] ah [16:39:45] About generating new catpches [16:39:48] RECOVERY - SSH on mc1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [16:39:52] * captchas, even [16:40:02] I've never done it, nope :( [16:40:06] Tim/Aaron [16:40:42] (03PS5) 10Alexandros Kosiaris: Beta: Assign proxy for zotero [puppet] - 10https://gerrit.wikimedia.org/r/194552 [16:40:45] 6operations, 10ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1095714 (10Papaul) mgmt settings, test and rack table complete on db2043 10.193.1.93 ge-6/0/12 C6 db2044 10.193.1.94 ge-6/0/13 C6 db2045 10.193.1.95 ge-6/0/14 C6 db2046 10.193.1.96 ge-6/0/15 C6 db2047... [16:41:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:41:49] RECOVERY - Memcached on mc1014 is OK: TCP OK - 0.001 second response time on port 11211 [16:43:59] !log krenair Synchronized php-1.25wmf19/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: https://gerrit.wikimedia.org/r/#/c/194869/ (duration: 00m 07s) [16:44:02] James_F, ^ [16:44:06] Is Swift dying maybe?
[16:44:06] Logged the message, Master [16:44:07] MatmaRex, ^ [16:44:15] See also https://phabricator.wikimedia.org/T91761 [16:44:23] (03PS8) 10Alexandros Kosiaris: Puppet module for the zotero service [puppet] - 10https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) [16:44:27] (03PS1) 10Milimetric: Add CORS to datasets.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/194884 (https://phabricator.wikimedia.org/T91532) [16:45:08] <_joe_> hoo: why do you say that? [16:45:09] PROBLEM - Memcached on mc1014 is CRITICAL: Connection timed out [16:45:25] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-Uploading, 6Multimedia: Upload on commons broken - https://phabricator.wikimedia.org/T91761#1095743 (10Bawolff) [16:45:28] PROBLEM - SSH on mc1014 is CRITICAL: Connection timed out [16:45:35] _joe_: Because there's people complaining [16:45:41] I'm just askin, not claiming [16:46:09] <_joe_> hoo: it's possible it is, yes [16:46:10] (03CR) 10Glaisher: "causes T91761 possibly" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194866 (owner: 10Chad) [16:46:10] <_joe_> ^d deployed a change that would affect swift [16:46:12] _joe_: Multiple people report upload is broken with error "An unknown error occurred in storage backend "local-swift-eqiad"" [16:46:33] <_joe_> ok, ping ^d, we probably have to revert his changes [16:46:39] <_joe_> ^d: you here? 
[16:46:43] 6operations, 6Phabricator: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1095745 (10chasemp) [16:46:52] 6operations, 6Phabricator, 10Phabricator-Upstream: Unhandled Exception ("CommandException") in diffusion - https://phabricator.wikimedia.org/T91648#1092221 (10chasemp) [16:46:58] <_joe_> it's one change to private and then the one glashier pointed to [16:47:39] 6operations, 6Phabricator, 10Phabricator-Upstream: PHD ensuring umask goodness - https://phabricator.wikimedia.org/T91648#1095756 (10chasemp) [16:48:27] (03PS1) 10Giuseppe Lavagetto: Revert "Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194886 [16:48:30] Fyi I have an unbreak-now VE patch lined up for wmf20 (currently only on test wikis, just following wmf19), but feel free to do the swift thing first. [16:48:42] <_joe_> Krenair: ok thanks [16:48:45] My submodule update for that is not merged or anything so it should be clean [16:49:18] <_joe_> Krenair: I'd like chad to verify what I am doing [16:49:44] 2015-03-06 16:48:18 mw1103 idwikisource: HTTP 0 () in 'SwiftFileBackend::getAuthentication' (given '[]'): HTTP return code: 0 [16:49:44] 2015-03-06 16:48:18 mw1103 idwikisource: SwiftFileBackend::doCleanInternal: cannot get container stat [16:49:44] 2015-03-06 16:48:18 mw1103 idwikisource: SwiftFileBackend::doCleanInternal: cannot get container stat [16:49:59] test wikis can wait more, wmf19 was my priority [16:50:24] * Reedy has a look at PrivateSettings [16:50:27] <_joe_> Reedy: yes, looking at how to revert a change to PrivateSettings [16:50:36] git revert hash? 
[16:50:41] <_joe_> Reedy: just revert the last change and merge https://gerrit.wikimedia.org/r/#/c/194886/ [16:51:06] "Make sure we support old config for swift too" [16:51:26] I love how MaxSem has all the commits in private :P [16:51:48] <_joe_> Reedy: if you can't do it, I will, but I have to look at the docs a bit [16:51:55] I'm just doing it [16:52:01] <_joe_> thanks :) [16:52:22] Hmm [16:52:25] The last commit to private is: [16:52:26] +$wmfSwiftEqiadConfig = $wmfSwiftConfig['eqiad']; // b/c [16:52:57] <_joe_> mmmh no, that and the one before I'd say [16:53:08] (03PS2) 10Giuseppe Lavagetto: Revert "Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194886 [16:54:13] Yeah, both commits are recent [16:54:34] - require_once( '/usr/local/apache/common/private/WikitechPrivateSettings.php' ); [16:54:35] + require_once( __DIR__ . '/WikitechPrivateSettings.php' ); [16:54:40] Don't think we want to revert that though [16:54:54] And also removing a logstash password [16:55:03] <_joe_> no not that one [16:55:09] <_joe_> lemme take a look [16:55:12] It's all in one commit [16:55:57] <_joe_> Reedy: wat? [16:56:03] <_joe_> Reedy: so ok someone messed up [16:56:13] <_joe_> not chad, I'd say [16:56:14] Yeah, I'm guessing someone made changes but didn't commit them [16:56:19] RECOVERY - SSH on mc1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [16:56:22] <_joe_> but you should just revert the swift part [16:56:25] I just reapplied those [16:56:31] <_joe_> ok [16:56:39] PROBLEM - Host mc1014 is DOWN: PING CRITICAL - Packet loss = 100% [16:57:23] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-Uploading, 6Multimedia: Upload on commons broken - https://phabricator.wikimedia.org/T91761#1095805 (10Aklapper) Thanks! 
This is currently being investigated on IRC in #wikimedia-operations [16:57:25] (03CR) 10Reedy: [C: 032] Revert "Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194886 (owner: 10Giuseppe Lavagetto) [16:57:32] <^d> Shit sorry [16:57:34] * ^d is catching scrollback [16:57:34] (03Merged) 10jenkins-bot: Revert "Restructure $wmfSwiftEqiadConfig into $wmfSwiftConfig" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194886 (owner: 10Giuseppe Lavagetto) [16:57:36] <^d> (was moving from home -> office) [16:57:51] <_joe_> ^d: never merge after breakfast on friday [16:57:56] !log reedy Synchronized wmf-config/: Unbreak uploads (duration: 00m 07s) [16:58:00] <_joe_> Reedy: thanks a lot :) [16:58:00] Logged the message, Master [16:58:01] So many friday rules [16:58:01] (03PS1) 10Rush: disable phd autorestart for now [puppet] - 10https://gerrit.wikimedia.org/r/194895 [16:58:18] <^d> mother.... [16:58:23] !log reedy Synchronized private/: Unbreak uploads (duration: 00m 06s) [16:58:28] Logged the message, Master [16:58:29] Nemo_bis: Such as you've gotta get down on a Friday? [16:58:32] <_joe_> Nemo_bis: well, his breakfast my beer o'clock [16:58:58] <_joe_> btw, it's that time of the day I guess [16:59:02] Was it just uploads borken? Or viewing too? [16:59:06] <^d> Pfft, who needs images anyway [16:59:18] I only heard reports about uploads [16:59:20] <^d> Reedy: Viewing if you got a cache miss probably ;-) [16:59:20] <_joe_> Reedy: just uploads it seems, but I won't be sure [16:59:21] 6operations, 6Multimedia, 10Wikimedia-General-or-Unknown: Upload on commons broken - https://phabricator.wikimedia.org/T91761#1095816 (10Aklapper) [16:59:28] <_joe_> bawolff: can you test now? 
[16:59:29] RECOVERY - Memcached on mc1014 is OK: TCP OK - 0.008 second response time on port 11211 [16:59:31] <^d> yay caches [16:59:43] moment [16:59:49] RECOVERY - Host mc1014 is UP: PING WARNING - Packet loss = 93%, RTA = 1.51 ms [16:59:49] 2015-03-06 16:57:55 mw1090 commonswiki: HTTP 0 () in 'SwiftFileBackend::getAuthentication' (given '[]'): HTTP return code: 0 [16:59:55] <_joe_> thanks :) [16:59:59] The error log spam has stopped [17:00:03] So it looks like [17:00:09] <_joe_> Reedy: that sounds right [17:00:32] 6operations, 6Multimedia, 10Wikimedia-General-or-Unknown: Upload on commons broken - https://phabricator.wikimedia.org/T91761#1095818 (10Aklapper) a:3Joe Looks like Joe is investigating (but feel free to reassign, please) [17:00:50] <_joe_> bawolff: btw, thanks a TON for the sample urls for the imagescalers migration, as you may have noticed, we do have one regression :/ [17:00:51] * andre__ tries some basic communication in that Phab task in the meantime [17:00:57] <^d> See, this is why I'm not left alone with the nice things. [17:01:23] Does this look resolved now? [17:01:35] !log rebooting mc1014 as totally hung box [17:01:36] It looks to be [17:01:39] Logged the message, Master [17:01:43] Krenair: We're just waiting for someone to test an upload [17:01:49] RECOVERY - Disk space on ms-be2007 is OK: DISK OK [17:01:49] PROBLEM - Host mc1014 is DOWN: PING CRITICAL - Packet loss = 100% [17:01:50] https://commons.wikimedia.org/wiki/Special:NewFiles started having new files again [17:01:56] so I think the fix worked [17:01:58] WFM [17:02:13] <_joe_> ook! [17:03:54] 6operations, 6Multimedia, 10Wikimedia-General-or-Unknown: Upload on commons broken - https://phabricator.wikimedia.org/T91761#1095836 (10Bawolff) 5Open>3Resolved This is fixed now. [17:04:51] Krenair: Should be good to deploy now [17:04:57] yep, thanks [17:05:43] thanks for the fast fix. highly appricated :-) [17:06:10] _joe_: no problem. 
I'm really excited for image scalars to be upgraded (Or more specificly I'm excited for video scalars, but image scalars is a major step in that direction) [17:06:13] (03CR) 10Ottomata: [C: 032] Add CORS to datasets.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/194884 (https://phabricator.wikimedia.org/T91532) (owner: 10Milimetric) [17:07:05] <_joe_> bawolff: I am working on it in the "spare time", more or less [17:07:09] RECOVERY - Redis on mc1014 is OK: TCP OK - 0.999 second response time on port 6379 [17:07:18] RECOVERY - Host mc1014 is UP: PING WARNING - Packet loss = 93%, RTA = 1.47 ms [17:07:39] Well then, an extra thank you for working on it [17:09:39] RECOVERY - Disk space on mc1014 is OK: DISK OK [17:10:19] RECOVERY - configured eth on mc1014 is OK: NRPE: Unable to read output [17:13:59] RECOVERY - DPKG on mc1014 is OK: All packages OK [17:14:20] !log krenair Synchronized php-1.25wmf20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: https://gerrit.wikimedia.org/r/#/c/194870/ (duration: 00m 05s) [17:14:24] Logged the message, Master [17:14:26] James_F, MatmaRex ^ [17:14:34] Thanks, Krenair. [17:14:52] mlitn, Reedy:; What happened with that confirmedit issue? [17:14:59] Nothing AFAIK [17:15:26] 6operations, 10ops-eqiad: Setup the 4 new varnish caching systems - https://phabricator.wikimedia.org/T91769#1095883 (10Cmjohnson) 3NEW [17:15:28] 7Puppet, 6Labs: Missing documentation for labs puppet roles - https://phabricator.wikimedia.org/T91770#1095890 (10awight) 3NEW [17:15:48] PROBLEM - configured eth on mc1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:16:18] PROBLEM - Disk space on mc1014 is CRITICAL: Timeout while attempting connection [17:16:19] nothing so far [17:17:48] PROBLEM - Host mc1014 is DOWN: PING CRITICAL - Packet loss = 100% [17:20:02] If no one is deploying anything right now, I'd like to do a no-op scap to get updated messages out there [17:20:39] RECOVERY - Disk space on mc1014 is OK: DISK OK [17:20:41] legoktm: Check with Krenair [17:20:47] I think he might be done [17:20:57] I'm done [17:21:09] RECOVERY - Host mc1014 is UP: PING WARNING - Packet loss = 73%, RTA = 3.13 ms [17:21:09] RECOVERY - salt-minion processes on mc1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [17:21:11] * Krenair logs out of tin [17:21:26] * bd808 is glad to see Reedy doing Reedy things today [17:21:35] Weather sucks in Florida [17:21:36] Again [17:21:41] lame [17:21:53] but you saw a rocket launch right? [17:21:58] I've seen 2 [17:22:01] (03CR) 10Chad: [C: 032] noc: remove broken symlink to pybal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194735 (owner: 10Dzahn) [17:22:02] One at night, one in the light [17:22:09] sweet [17:22:09] (03Merged) 10jenkins-bot: noc: remove broken symlink to pybal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194735 (owner: 10Dzahn) [17:22:25] Yeah [17:22:32] Heard the night one go supersonic [17:22:34] Pretty cool [17:22:44] I've see the inside of several product manager meetings... 
not quite as exciting [17:23:13] !log legoktm Started scap: no-op to update messages [17:23:19] Logged the message, Master [17:24:09] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 1 failures [17:24:41] <^d> !log tin: /srv/mediawiki-staging/ now uses https instead of ssh for origin [17:24:45] Logged the message, Master [17:24:59] (03PS1) 10Giuseppe Lavagetto: sessions: temporarily disable mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194897 [17:25:06] <_joe_> chasemp: this ^^ is one [17:25:20] got it [17:25:37] !log legoktm Finished scap: no-op to update messages (duration: 02m 23s) [17:25:41] Logged the message, Master [17:25:51] o_0 [17:25:54] that was fast [17:26:04] did it actually work? [17:26:06] That can't be right [17:26:17] <_joe_> bd808: we are now storing the mediawiki code on our new storage device, /dev/null [17:26:22] no, it didn't work :/ [17:26:26] <_joe_> it proved to be uber-efficient [17:26:27] the top text on https://en.wikipedia.org/wiki/Special:UsersWhoWillBeRenamed?uselang=de is still english [17:26:31] legoktm: Run it with --verbose? [17:26:43] !log legoktm Started scap: no-op to update messages take 2 [17:26:44] And/or run l10nupdate first to make sure it's pulled all changed messages in etc first [17:26:48] Logged the message, Master [17:26:55] _joe_: sounds very efficient [17:27:02] l10nupdate was run last night though [17:27:49] hmm [17:27:50] 17:26:57 0 languages rebuilt out of 379 [17:27:50] 17:26:57 Use --force to rebuild the caches which are still fresh. [17:27:59] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [17:27:59] !log legoktm Finished scap: no-op to update messages take 2 (duration: 01m 15s) [17:28:04] Logged the message, Master [17:28:16] should I try it with --force? or run l10nupdate manually? [17:28:25] run l10nupdate manually [17:28:28] see what that has to say [17:28:38] <_joe_> bd808: it is! 
so blazing fast I wanted to propose it as an alternative backend for restbase [17:28:39] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [17:28:41] does it have a --verbose? [17:28:49] (03PS1) 10Giuseppe Lavagetto: nutcracker: temporarily remove mc1014 from the pool [puppet] - 10https://gerrit.wikimedia.org/r/194898 [17:28:50] l10nupdate? [17:28:52] yeah [17:28:55] it's pretty verbose to begin with [17:28:58] <_joe_> chasemp: this ^^ is the second one [17:29:01] lol [17:29:01] it's plenty verbose by default [17:29:04] _joe_: got it makes sense [17:29:07] started [17:29:18] <_joe_> I'm going off for realz now [17:29:24] o/ [17:29:28] !log running l10nupdate [17:29:32] Logged the message, Master [17:29:35] one more q _joe_, would I need to force puppet update on mw* to make that live? [17:29:40] <_joe_> I just realized it's 14 hours I'm around today :P [17:29:41] then I promise I stop bothering you [17:29:55] <_joe_> chasemp: it takes 20 minutes at most, I'd let puppet run its course [17:30:00] gotcha [17:30:05] <_joe_> it's not like a super-emergency [17:30:17] no worries then just checking [17:30:39] (03CR) 10Rush: [C: 032] nutcracker: temporarily remove mc1014 from the pool [puppet] - 10https://gerrit.wikimedia.org/r/194898 (owner: 10Giuseppe Lavagetto) [17:30:48] Fetching submodule VikiSemanticTitle wat [17:30:51] (03CR) 10Rush: [C: 032] sessions: temporarily disable mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194897 (owner: 10Giuseppe Lavagetto) [17:30:56] (03Merged) 10jenkins-bot: sessions: temporarily disable mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194897 (owner: 10Giuseppe Lavagetto) [17:31:08] legoktm: SMW stuff? 
[17:31:16] no [17:31:23] I think it's using mediawiki/extensions [17:31:36] oh yeah it does [17:31:49] l10nupdate does interesting stuff [17:32:21] It keeps a full clone of mw/extensions tracking HEAD for all [17:32:45] and then merges messages from there over into the cdb files in the deployed branches [17:32:59] !log legoktm Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s) [17:33:04] Logged the message, Master [17:33:50] lots and lots of "Updated 0 CDB files" in the scap log [17:34:01] 6operations, 10ops-eqiad: mc1014 server has been flaking out and dropping connectivity - https://phabricator.wikimedia.org/T91773#1095950 (10chasemp) 3NEW a:3Cmjohnson [17:34:18] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-06 17:33:15+00:00 [17:34:22] Logged the message, Master [17:34:22] 6operations, 10ops-eqiad: mc1014 server has been flaking out and dropping connectivity - https://phabricator.wikimedia.org/T91773#1095958 (10chasemp) Also, silenced in icinga until tomorrow (for now) [17:34:49] ah. because the sync dir continues to fail [17:34:55] !log legoktm Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 04s) [17:34:59] Logged the message, Master [17:35:02] returned [255]: Permission denied (publickey). [17:35:02] (03CR) 10Rush: [C: 032] disable phd autorestart for now [puppet] - 10https://gerrit.wikimedia.org/r/194895 (owner: 10Rush) [17:35:24] 6operations, 6Phabricator, 10Phabricator-Upstream: PHD ensuring umask goodness - https://phabricator.wikimedia.org/T91648#1095959 (10chasemp) [17:35:38] So the cdbs on tin should be up to date in theory [17:35:54] now we need to get them to the wikis [17:36:04] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-06 17:35:01+00:00 [17:36:09] Logged the message, Master [17:36:12] which another scap *should* do [17:36:16] it's still running [17:36:32] !log Key for minion californium.eqiad.wmnet deleted. Key for minion californium.wikimedia.org accepted. 
[17:36:37] Logged the message, Master [17:36:46] http://fpaste.org/194407/25663391/ <-- its outputting the help info? [17:36:49] andrewbogott: ^ or salt [17:36:53] (03PS1) 10Andrew Bogott: Update horizon config for icehouse [puppet] - 10https://gerrit.wikimedia.org/r/194900 [17:36:58] legoktm: known bug [17:37:00] s/or/for [17:37:28] mutante: thanks — I remembered to clear the puppet one but always forget salt [17:37:49] legoktm: https://phabricator.wikimedia.org/T1387 [17:37:56] (03Abandoned) 10Andrew Bogott: Revert "Move californium to a public ip, part two" [puppet] - 10https://gerrit.wikimedia.org/r/194880 (owner: 10Andrew Bogott) [17:38:12] andrewbogott: and i'm doing this puppetstoredconfigclean.rb californium.eqiad.wmnet [17:38:15] (03Abandoned) 10Andrew Bogott: Revert "Move californium to public IP." [puppet] - 10https://gerrit.wikimedia.org/r/194881 (owner: 10Andrew Bogott) [17:38:18] Killing californium.eqiad.wmnet...done. [17:38:45] on next puppet that should make icinga forget about the old host [17:39:12] legoktm: looks like Reedy has a patch to fix the netcat bug that just needs an Ops merge -- https://gerrit.wikimedia.org/r/#/c/183568/ [17:39:17] it thinks californium is down when it's not because Icinga still checks the old internal IP [17:39:25] (03CR) 10BryanDavis: [C: 031] Make deploy2graphite use mw-deployment-vars.sh [puppet] - 10https://gerrit.wikimedia.org/r/183568 (https://phabricator.wikimedia.org/T1387) (owner: 10Reedy) [17:39:57] (03CR) 10Andrew Bogott: [C: 032] Update horizon config for icehouse [puppet] - 10https://gerrit.wikimedia.org/r/194900 (owner: 10Andrew Bogott) [17:40:20] 6operations, 10Citoid: Configure zotero to use an outbound proxy - https://phabricator.wikimedia.org/T89874#1095973 (10akosiaris) https://gerrit.wikimedia.org/r/#/c/194552/ works fine in Beta. Zotero is using the configured proxy as it is supposed to. I am preparing a commit for production and this task can be... 
[17:40:43] 6operations, 6Release-Engineering, 5Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1095974 (10bd808) >>! In T1387#1043064, @fgiunchedi wrote: > ping? @ori @bd808 there was a question in https://gerrit.wikimedia.org/r/#/c/18356... [17:40:51] Hello, not sure where to ask my question, and please refer me to the right persons. I'm the one who is taking care of Wikimedia Canada's website, hosted by WMF at https://ca.wikimedia.org ...I have a problems. Francophones anonymous visitors cannot use Language selector, only Canadian English is available if not logged in. I read somewhere that this feature was desable by default. Where can I enable it ? [17:40:58] ACKNOWLEDGEMENT - Host californium is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn moved to public IP [17:40:59] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:41:29] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [17:41:30] Benoit-Rochon: commons and wikidata use a gadget to let anonymous users use a language selector [17:41:45] > DB connection error: Access denied for user 'wikiadmin'@'10.64.0.196' (using password: YES) (208.80.154.136) [17:42:40] 6operations, 10Wikimedia-IRC, 10Wikimedia-Labs-wikitech-interface: Enable irc feed for wikitech.wikimedia.org site - https://phabricator.wikimedia.org/T36685#1095977 (10Glaisher) [17:44:33] Benoit-Rochon: have you read my comment on your report? [17:45:10] RECOVERY - Host californium is UP: PING OK - Packet loss = 0%, RTA = 1.57 ms [17:45:14] (03PS1) 10Andrew Bogott: Let everyone see the horizon logo [puppet] - 10https://gerrit.wikimedia.org/r/194902 [17:45:16] andrewbogott: it's recovering in icinga now ^ [17:45:51] legoktm: so are you scapping again? 
[17:45:53] icinga configs got regenerated [17:45:59] bd808: no it's still updating RL caches [17:46:00] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [17:46:04] tin is probably https://gerrit.wikimedia.org/r/#/c/194897/ but unsure if that's my calling or not greg-g or bd808? [17:46:12] should I be doing something on tin to make https://gerrit.wikimedia.org/r/#/c/194897/ sane? [17:46:12] oh. yeah that takes forever [17:46:31] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:46:33] chasemp: yeah, that needs to be syncd [17:46:48] andrewbogott: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=californium&nostatusheader [17:46:57] legoktm: ok I'm on tin what do? [17:47:09] or can someone in the know do this for me [17:47:12] so I don't blow things up [17:47:19] legoktm: you on it? [17:47:19] What are you trying to do? [17:47:21] I can do it [17:47:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Mar 6 17:46:35 UTC 2015 (duration 17m 32s) [17:47:42] Reedy: He merged https://gerrit.wikimedia.org/r/#/c/194897/1 and didn't pull/sync yet [17:47:44] Logged the message, Master [17:47:47] mutante: hm, can’t think what memcached is for... [17:48:01] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [17:48:22] PROBLEM - Memcached on californium is CRITICAL: Connection refused [17:48:26] Nome_bis I just did few seconds ago !!! Thank you. 
[17:48:27] andrewbogott: not sure, but i focused on the good part, all the other checks are happy (they weren't) [17:48:28] !log legoktm Synchronized wmf-config/session.php: https://gerrit.wikimedia.org/r/#/c/194897/ (duration: 00m 06s) [17:48:32] ACKNOWLEDGEMENT - Memcached on californium is CRITICAL: Connection refused andrew bogott Im not sure this should even be installed here [17:48:32] Logged the message, Master [17:48:42] chasemp: ^ done [17:48:44] mutante: yeah, it’s mostly good — I’ll sort out memcache eventually. Thanks! [17:48:48] thanks legoktm [17:48:50] andrewbogott: cool, yw [17:49:01] ok, l10nupdate finished [17:49:11] (03CR) 10Andrew Bogott: [C: 032] Let everyone see the horizon logo [puppet] - 10https://gerrit.wikimedia.org/r/194902 (owner: 10Andrew Bogott) [17:49:14] time to scap! [17:49:23] !log legoktm Started scap: no-op to update messages take 3 [17:49:28] Logged the message, Master [17:50:14] legoktm: I tried, but not sure why not working https://ca.wikimedia.org/wiki/MediaWiki:Vector.js?uselang=en [17:50:31] PROBLEM - Redis on mc1014 is CRITICAL: Connection timed out [17:50:42] !log legoktm Finished scap: no-op to update messages take 3 (duration: 01m 19s) [17:50:47] Logged the message, Master [17:50:53] andrewbogott: then i'll also merge the ferm changes for horizon [17:50:53] Benoit-Rochon: copy from https://www.wikidata.org/wiki/MediaWiki:Common.js [17:51:04] legoktm: bah. something's still not right. Updated 0 CDB files(s) across all hosts and branches [17:51:06] And Nikerabbit wrote I noticed many weird and broken things on the wiki [17:51:08] mutante: yes please [17:51:13] mutante: as long as that won’t block ssh :) [17:51:16] bd808: should I try using --force? [17:51:21] (03CR) 10Dzahn: [C: 032] horizon: add firewall hole for http [puppet] - 10https://gerrit.wikimedia.org/r/194805 (owner: 10Dzahn) [17:51:35] mutante: ugly and broken, but it’s working via misc-web: https://horizon.wikimedia.org/ [17:51:35] hmmm... 
[17:51:47] --force is for l10nupdate I think [17:52:01] chasemp: Feel like reviewing a dns changeset for me? [17:52:05] andrewbogott: nice! unicorn logo :) [17:52:13] Coren: I can try :) [17:52:17] chasemp: https://gerrit.wikimedia.org/r/#/c/194865/ [17:52:21] They must’ve changed the aspect ration in icehouse, it was cropped correctly in havana [17:52:26] *ratio [17:53:00] andrewbogott: arr, i added some stupid dependencies in gerrit i think.. need to fix it to merge [17:53:04] legoktm: not from wikidata! they removed en-ca [17:53:17] oh, oops [17:53:51] RECOVERY - Redis on mc1014 is OK: TCP OK - 0.012 second response time on port 6379 [17:53:58] Hi Coren [17:54:09] Benoit-Rochon: o/ [17:54:49] legoktm: Looking on tin... cdb files are being updated (timestamps changed yesterday) but json dumps are not changed... [17:55:09] Coren: I'm trying to sort out the language problems on WMCA... I'm freaking out! [17:55:22] if anyone is on iron i just sudo shut it down ...wrong terminal window [17:55:30] PROBLEM - Host iron is DOWN: PING CRITICAL - Packet loss = 100% [17:55:31] cmjohnson1: Hah! [17:55:34] cmjohnson1: oh, hah [17:55:43] cmjohnson1: Happens to the best of us. [17:55:55] Coren: i'm looking and I have no idea if this is right :) [17:55:56] At least it wasn't a poweroff of a box with no ilo. :-) [17:56:01] RECOVERY - Host iron is UP: PING OK - Packet loss = 0%, RTA = 2.26 ms [17:56:07] bblack may be the person to validate? [17:56:12] legoktm: I see that the json files for l10n in 1.25wmf19 changed at 00:23 today [17:56:22] That would be the normal l10nupdate run [17:56:33] but wmf20 did not change since the day before [17:56:43] 02:24 logmsgbot: l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s) [17:56:57] 2 hours later? or did you mean 02:23? [17:57:15] cmjohnson1: works for me again [17:57:18] nope. Mar 6 00:23 [17:57:24] Benoit-Rochon: What exactly is the problem?
[17:57:42] legoktm: and yeah the cdbs are stamped 02:23 [17:57:45] mutante: thx for letting me know [17:57:52] Benoit-Rochon: see report; please check https://panopticlick.eff.org/ and tell us what's your accept-language [17:57:53] so this is all not right :( [17:58:09] (03CR) 10Dzahn: [C: 032] put base::firewall on californium [puppet] - 10https://gerrit.wikimedia.org/r/194804 (owner: 10Dzahn) [17:58:17] Benoit-Rochon: also, wikimedia-operations is definitely the wrong channel for this sort of support requests :) [17:59:31] andrewbogott: you got firewalled. sudo iptables -L and web ui still up fine [17:59:32] (03PS1) 10Andrew Bogott: Allow horizon host access to nova/keystone/glance [puppet] - 10https://gerrit.wikimedia.org/r/194905 [17:59:32] legoktm: I'm going to spend some time heading down this rabbit hole [17:59:50] mutante: I can’t ssh :( [17:59:52] Can you? [18:00:18] bd808: alright :) [18:00:43] (03CR) 10Andrew Bogott: [C: 032] Allow horizon host access to nova/keystone/glance [puppet] - 10https://gerrit.wikimedia.org/r/194905 (owner: 10Andrew Bogott) [18:01:01] Nemo_bis : text/html, */* gzip, deflate fr-CA,fr;q=0.8,fr-FR;q=0.6,en-US;q=0.4,en;q=0.2 [18:01:06] andrewbogott: yes, i can [18:01:09] now [18:01:16] ACCEPT tcp -- iron.wikimedia.org anywhere tcp dpt:ssh [18:01:30] via iron [18:02:08] huh, why does my proxy command not work, I wonder… [18:02:16] yeah, I can get in via iron too [18:02:27] maybe because you have it set to only proxy when it's *.eqiad.wmnet ? [18:02:34] probably :) [18:02:44] Nemo_bis legoktm Coren : it's working now for anonymous users. Thanks for your support and have a nice day. [18:02:44] but now it's in .wikimedia.org but you still have to proxy .. [18:03:17] mutante: yes, that was exactly it [18:03:26] chasemp: validate what? 
[18:03:40] bblack: https://gerrit.wikimedia.org/r/#/c/194865/ [18:03:41] andrewbogott: cool, so that's ok right [18:03:58] chasemp: Yeah, I was guessing the fact that I used macros would complicate finding the right person to review. [18:04:00] yep, all good [18:04:29] paravoid: You around-ish? [18:04:52] very very bad internet [18:04:55] so very -ish [18:05:14] paravoid: Enough to do a review of a dns changeset or not? [18:05:21] I'll try [18:05:25] link? [18:05:27] https://gerrit.wikimedia.org/r/#/c/194865/ [18:06:01] paravoid: That's to remove the ugly /etc/hosts hack. [18:06:12] uughhh [18:06:41] One last question guys... where can I ask to remove Canadian English in order to have only reg English on WMCA ? [18:06:54] legoktm: These messages aren't for a new extension are they? Like an extension that didn't exist when the branch was cut? [18:07:07] bd808: no, it's WikimediaMessages [18:07:15] paravoid: I'll check it out [18:07:25] 6operations, 10ops-eqiad: mc1014 server has been flaking out and dropping connectivity - https://phabricator.wikimedia.org/T91773#1096058 (10Cmjohnson) I performed the following actions Replaced the fiber. I noticed that the current fiber cable was not "clicking" in to the sfp+ on the server side. Drained fle... [18:07:31] why do we need all that [18:07:36] bd808: oh, but the messages were backported into wmf19 and wmf20 [18:07:45] this is really very ugly [18:08:02] legoktm: ok. 
that shouldn't cause problems I don't think [18:08:06] (03PS1) 10Odder: Enable transwiki imports for Telugu Wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194908 (https://phabricator.wikimedia.org/T91635) [18:08:07] just 19 actually [18:08:14] (03PS1) 10Chad: memcached: Make sure /usr/lib/ganglia/python_modules exists [puppet] - 10https://gerrit.wikimedia.org/r/194909 [18:08:19] bd808: https://github.com/wikimedia/mediawiki-extensions-WikimediaMessages/commit/2fe5bcdde43d6352b31e3c1883f8f905007552cb is the commit I want to pull in [18:08:21] * bd808 is at the top of the rabbit warren and starting down [18:08:24] paravoid: I did it the cleanest way I could think of, but the existence of those per-db names is relied upon by pretty much every tool (comes from toolserver) [18:08:39] paravoid: And projects outside tools too. [18:08:53] mutante, Coren, try logging in to https://horizon.wikimedia.org with your labs shell name but your wikitech password [18:09:06] "comes from toolserver" isn't the best argument to make :) [18:09:25] paravoid: No, but "breaks everything if it goes away" is a pragmatic one. :-/ [18:09:42] andrewbogott: Works for me. [18:09:52] Horizon is /much/ better looking in icehouse, I hadn’t looked since havana [18:10:02] Much indeed! [18:10:43] I guess I’ll fix the logos :) [18:10:48] Coren: some of the c3 databases have underscores, that's not technically legal for a hostname [18:10:49] legoktm: And that patch was backported? Or you are hoping to pick it up from the l1n0update process? [18:11:11] bd808: hoping to pick it up [18:11:15] *nod* [18:11:20] it will get served, it works for the DNS protocol, but it's an illegal character for a hostname so usually best avoided JIC [18:11:21] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1096075 (10JKatzWMF) Hi - just repinging on this. [18:11:39] e.g. 
be_x_oldwiki [18:11:41] bblack: Ah, the zh variants and a few other oddballs. [18:11:42] I don't think we should be having all this complexity [18:11:48] esp. not in the prod DNS servers [18:12:01] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:12:14] why is it that it needs to be in wmnet instead of in wmflabs? [18:12:18] paravoid: The alternative is to preload that into dnsmasq. It'd be limited to labs, by definition, but would be more brittle. [18:12:33] I would probably be okay with it if we would delegate labs.eqiad.wmnet to a/the labs nameserver [18:12:48] paravoid: Hm. That sounds like a good idea actually. [18:12:55] but otoh this logic is why Labs has accumulated so much more cruft than production over the years :/ [18:13:37] so maybe I should not be making this distinction in my way of thinking [18:14:16] paravoid: Well, tbh, at this point my concern is "make status quo clean" more than "improve". Transitioning away from this is not trivial. [18:14:30] what was making this work before, or what's it replacing? [18:14:34] paravoid: *anything* is better than a hardcoded /etc/hosts [18:14:44] bblack: /etc/hosts pushed onto instances. [18:14:45] ^d, manybubbles: I've got two super easy config patches that need to be merged, but I won't be able to participate in SWAT deploys till Wednesday at the earliest. Could I schedule them for Monday for you guys to deploy? [18:15:12] twkozlowski: I'll be on an airplane...... So anomie? [18:15:19] the /etc/hosts entries are in labs.eqiad.wmnet as well? [18:15:20] in principle, yes [18:15:27] could you document what the ideal solution would look like and what would be needed to get there? [18:15:35] I still don't see why this isn't in the wmflabs TLD [18:15:40] <^d> twkozlowski: Mon-Wed is bad for me next week too, conferency things [18:15:40] manybubbles: Errm.. 
Your name is on the list for Monday :) [18:15:49] its always on the list, I think [18:15:56] Ah, right. [18:15:58] (after all, prod doesn't need to resolve this or connect to it, right?) [18:15:59] we just pick about 10 minutes before [18:16:00] bblack: No, they use a faux tld 'labsdb.'; labs now has ndos:2 and search labs.eqiad.wmnet in prevision [18:16:35] * YuviPanda comes back from food, reads backscroll [18:16:37] paravoid: "Ideal" solution would be for tools to look what server has what DB dynamically (there is a meta table on the replicas with that info). [18:16:50] paravoid: And do away with the per-db hostnames entirely. [18:17:00] <^d> YuviPanda: Thoughts on https://gerrit.wikimedia.org/r/#/c/194909/? [18:17:07] how does prod do it for mediaiki + all the dbs for the real sites? [18:17:38] twkozlowski: Basically, it all depends on someone being around who can test that the patch actually did whatever it's supposed to do (and didn't break anything else). Without seeing the patches, I couldn't commit to being able to do that. [18:17:50] bblack: mediawiki-config points things at the right db. But afaik there are no wikis that connects to different dbs depending on what they are doing. [18:18:06] anomie: https://phabricator.wikimedia.org/T91635 and https://phabricator.wikimedia.org/T91630 so that's super easy. [18:18:11] legoktm: not looking good so far. The string that I'm testing for is not in wmf19 cdb.json files [18:18:17] * bd808 continues to dig [18:18:28] (03CR) 10Yuvipanda: [C: 04-1] "Labs has no ganglia, and there's a has_ganglia hiera flag. So you can see where memcached::ganglia is included and not include it in labs " [puppet] - 10https://gerrit.wikimedia.org/r/194909 (owner: 10Chad) [18:18:50] <^d> YuviPanda: Ah, so we need to set that to false for staging [18:18:53] and in labs, we have one wiki pointing at many things? 
[18:18:56] <^d> I was wondering that [18:19:00] ^d: yup, it’s set to false for deployment-prep [18:19:08] paravoid: So, there is an ideal solution that provides the same semantics, and an ideal solution that does away with that system but requires code changes on almost every tool and project that talks to databases. [18:19:10] I mean, I guess I expected labs to be like prod when it came to all these language wikis + dbs [18:19:32] can we have this all on a phab task? [18:19:36] ^d: Is there a simple script on tin that lookup a value in a cdb based on a key? [18:19:47] bblack: There is already one. Lemme add more info on it. [18:19:50] <^d> bd808: mwscript cdb.php? [18:20:08] how it works, how do you think it should work, what it takes to get there and why do we need an intermediate solution [18:20:22] as generic as that sounds, I have to admit that it's the first I'm hearing about all this [18:20:40] twkozlowski: So T91635 would want someone with the appropriate rights on that wiki, and T91630 would want someone who knows how to use GWToolset (and has the necessary rights, if there are any). [18:21:03] paravoid: No, that makes sense. I'll clean up the ticket because it refers to older NAT intermediate solution that has been ripped out long ago though. [18:21:34] anomie: ... which is nothing I can help with anyway, as I don't have the necessary permissions on either project. [18:22:38] (03Abandoned) 10Chad: memcached: Make sure /usr/lib/ganglia/python_modules exists [puppet] - 10https://gerrit.wikimedia.org/r/194909 (owner: 10Chad) [18:22:55] (03PS1) 10Odder: Add a domain to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194913 (https://phabricator.wikimedia.org/T91630) [18:23:03] ^d: thanks. seems to work [18:23:09] <^d> yw [18:23:35] internet getting worse [18:23:38] bbll [18:25:15] legoktm: So... 
the messages you are wanting are in var/lib/l10nupdate/mediawiki/extensions/WikimediaMessages but not in /var/lib/l10nupdate/cache-1.25wmf{19,20} [18:25:36] * bd808 keeps looking [18:27:29] bblack: So yeah, those used to be aliases in the form 'foo.labsdb' hardcoded in /etc/hosts [18:27:42] (Well, they still are, but shouldn't be) [18:31:37] bblack: I've fix't the description of https://phabricator.wikimedia.org/T63897 to match current status better [18:32:39] legoktm: extensions/LocalisationUpdate/update.php is not seeing that there are new messages to merge. Not sure why yet [18:35:03] andrewbogott: That version of horizon is seriously boss. [18:35:16] anomie: so, in short, let me know if I can add them to the Deployments page on Wikitech. Otherwise they'll have to wait till Wednesday; I can join you on IRC that day, but I won't be able to test either of the patches. [18:35:41] Coren: Yeah. Many more features, verging on too many features :) I’m thinking now maybe we should upgrade labs to Juno just to see what’s next :) [18:35:52] * Coren chuckles. [18:36:04] Upgrading is easy, it just requires us to reboot virt hosts :( [18:36:14] "just" [18:36:28] well, if we space them out enough, there should actually be very little disruption to toollabs this time [18:36:43] YuviPanda: It's not just tools [18:36:44] :-) [18:36:59] the others usually have less people complaining :) [18:37:16] Well, we don’t need to do it this week, at least :) [18:37:28] andrewbogott: btw, do you know why it is that we can’t hit public floating IPs from inside labs? [18:37:42] * YuviPanda being able to do that would make a lot of redundancy / failover mechanisms much easier... [18:37:48] Usually upgrades don’t require a reboot, but Precise is a dead end. 
[18:37:52] Coren: we already have something like this for address rewrites in dnsmasq.conf: https://github.com/wikimedia/operations-puppet/blob/production/modules/openstack/templates/icehouse/nova/dnsmasq-nova.conf.erb [18:37:55] Coren: https://gerrit.wikimedia.org/r/#/c/194806/ it adds a user account, but it doesn't add it to any groups yet, so technically it's not the thing that gives access, but will be needed [18:38:04] couldn't we extend that same basic mechanism, but use cname= entries there? [18:38:05] YuviPanda: I don’t really know the rationale. It might just be a side-effect of some other policy. [18:38:43] andrewbogott: right. [18:38:44] Coren: I’m a bit worried that horizon allows any random user to delete images… going to test that in a couple of hours when I have a less-rooty user to collaborate with. [18:38:48] I will dig in at some point I guess. [18:38:57] YuviPanda: yeah, or send an email to Ops and cc ryan [18:39:02] Since I expect he made the call originally [18:39:58] bblack: We can. It's one of the two approaches considered. But two things bug me about it: (a) dnsmasq is already brittle as [bleep], and (b) since it's not proper zone files, behaviour of additions/changes is hard to guess at. Also (c) restarting dnsmasq to apply changes involves having to restart nova-network [18:40:03] (well, you can do cname= or address= in that config, whatever's appropriate) [18:40:50] a/c are compelling arguments, but IMHO they're compelling for replacing all of dnsmasq with a better solution that can do the same [18:40:57] bblack: What made me not so hesitant to put them in "real" DNS is the obvious analogue/parallel with the language wikis. [18:40:59] (03CR) 10Dzahn: [C: 032] bugzilla: remove ferm service for port 443 [puppet] - 10https://gerrit.wikimedia.org/r/194789 (owner: 10Dzahn) [18:41:01] e.g. powerdns-recursor can also have similar local data [18:41:39] bblack: Getting rid of dnsmasq is also on the todo list; but far from trivial. 
[18:42:12] (PS2) Dzahn: bugzilla: remove ferm service for port 443 [puppet] - https://gerrit.wikimedia.org/r/194789
[18:42:17] well for the moment, if dnsmasq breaks betalabs breaks. and these entries shouldn't need updating all the time, right? so it's not like you're going to be spamming restarts all the time
[18:42:32] I'm just saying, your eggs are already in that basket, and you need a solution there in the long term regardless
[18:42:50] YuviPanda: IIRC, Ryan said that this is because the public IP mapping is done at a different layer than the internode networking and won't propagate (i.e.: the natting is only done on the /outside/ interface where instances don't reach)
[18:44:01] YuviPanda: You'd have to have packets from the instances look back around into the outside interface for the natting to take place.
[18:44:28] s/look/loop/
[18:44:47] Coren: hmm, I wonder how much overhead that would have / if nova-network supports it at all
[18:44:47] (PS1) MaxSem: Fix WikiGrok settings init [mediawiki-config] - https://gerrit.wikimedia.org/r/194919
[18:44:54] In the big picture, I'd rather see labs confined to labs if we can help it. having labs data/functionality creeping into the prod DNS servers is not ideal, and does nothing for prod, which doesn't need these names.
[18:45:05] (CR) Dzahn: "before:" [puppet] - https://gerrit.wikimedia.org/r/194789 (owner: Dzahn)
[18:45:08] (CR) Odder: [C: 1] Setting import sources for uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: Base)
[18:45:21] bblack: prod doesn't need the names, but the replica DBs are explicitly prod services.
[18:45:33] bblack: That happen to be used mostly by labs.
[18:45:44] (PS2) Dzahn: planet: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194790
[18:46:00] (CR) Dzahn: [C: 2] planet: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194790 (owner: Dzahn)
[18:46:07] yeah but that means at best the 3x c1/c2/c3 hostnames belong in prod DNS somewhere
[18:46:15] * Coren nods.
[18:46:19] I do see your point.
[18:46:19] (do they already have other names in prod DNS?)
[18:46:39] bblack: Yes, but hardware names to which the mapping is not necessarily static.
[18:47:11] Right now c1 is labsdb1001 but that's because it is, but because it has to.
[18:47:19] s/but/not/
[18:48:33] (CR) Dzahn: "same here as with bugzilla and all other services on zirconium. they are proxied and don't need to open these holes" [puppet] - https://gerrit.wikimedia.org/r/194790 (owner: Dzahn)
[18:48:42] anomie: ^
[18:48:56] Coren: reading through the ticket, one thing stands out that I take issue with I think:
[18:49:00] "This, however, requires altering the code that runs those tools and possibly restructuring it according to the new scheme, something which is difficult to demand of every maintainer (not all of whom are active regularily)."
[18:49:09] (PS2) Dzahn: racktables: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194791
[18:49:29] IMHO, *that* is the root of the problem. In the general case, how can one expect to reasonably support and maintain code that does not have active involved maintainers for it?
[18:49:43] twkozlowski: I don't want to be responsible for something I can't verify...
[18:50:14] (PS3) Dzahn: releases: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194793
[18:50:20] do they really expect to dump off random code and walk away for years and that you'll make it keep working with magic and duct tape?
But then they have a right to come back and complain if the tool ever breaks, without working to avoid breakage?
[18:50:41] anomie: Okay, I'll try to get someone to verify it for Wednesday.
[18:51:25] (PS3) Dzahn: racktables: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194791
[18:51:47] (PS2) Dzahn: releases: typo in docs: ssh->tls/ssl [puppet] - https://gerrit.wikimedia.org/r/194795
[18:53:45] (PS3) Dzahn: site.pp - adding a few node comments [puppet] - https://gerrit.wikimedia.org/r/194797
[18:54:34] (PS4) Dzahn: site.pp - adding a few node comments [puppet] - https://gerrit.wikimedia.org/r/194797
[18:54:39] (CR) Dzahn: [C: 2] site.pp - adding a few node comments [puppet] - https://gerrit.wikimedia.org/r/194797 (owner: Dzahn)
[18:55:30] bblack: yup, that’s part of the problem.
[18:56:03] bblack: I’m a lot more ‘pro’ breaking things and then fixing them as they come up, but that usually has meant that I end up doing a lot of drudge work...
[18:56:55] (PS3) Dzahn: releases: typo in docs: ssh->tls/ssl [puppet] - https://gerrit.wikimedia.org/r/194795
[18:58:00] (CR) Dzahn: [C: 2] releases: typo in docs: ssh->tls/ssl [puppet] - https://gerrit.wikimedia.org/r/194795 (owner: Dzahn)
[18:59:21] (CR) Dzahn: [C: 2] releases: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194793 (owner: Dzahn)
[19:01:06] (CR) Dzahn: "Notice: /Stage[main]/Ferm/File[/etc/ferm/conf.d/10_releases_https]/ensure: removed" [puppet] - https://gerrit.wikimedia.org/r/194793 (owner: Dzahn)
[19:02:26] (CR) Dzahn: "closed 443 on caesium" [puppet] - https://gerrit.wikimedia.org/r/194793 (owner: Dzahn)
[19:04:19] (PS1) Papaul: added mw2136-mw2148 [puppet] - https://gerrit.wikimedia.org/r/194921
[19:04:25] (CR) Dzahn: [C: 2] racktables: remove ferm service for 443 [puppet] - https://gerrit.wikimedia.org/r/194791 (owner: Dzahn)
[19:05:43] bblack: We're not necessarily talking years either - there's also the issue that they are volunteers and may or may not be able or willing to expend time to change things as we modify the api under them when we say so. They may only be able to do it during spring break, or vacation, or whatever.
[19:05:56] (CR) Dzahn: "on magnesium: Notice: /Stage[main]/Ferm/File[/etc/ferm/conf.d/10_racktables-https]/ensure: removed" [puppet] - https://gerrit.wikimedia.org/r/194791 (owner: Dzahn)
[19:06:58] bblack: It took over a year to move everything from ts to labs, and that was because I spent great pains making sure that things needed as few changes as possible.
[19:07:30] bblack: Consider it the same magnitude problem as breaking editor workflow on enwiki.
:-)
[19:14:08] (PS1) Dzahn: annualreport/devportal: add ferm service for http [puppet] - https://gerrit.wikimedia.org/r/194922
[19:14:13] operations, Staging: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096483 (thcipriani) NEW
[19:15:05] operations, Multimedia, HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1096494 (brion) Another thing to test: .ogg / .ogv transcoding output (from .webm or opus source). I've had some troubles with ffmpeg2theora on Trusty (though the note above about multithreading...
[19:17:05] (PS1) Dzahn: etherpad: remove ferm hole for 443 [puppet] - https://gerrit.wikimedia.org/r/194924
[19:17:17] operations, Staging: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096523 (thcipriani)
[19:20:02] (CR) Dzahn: [C: 2] etherpad: remove ferm hole for 443 [puppet] - https://gerrit.wikimedia.org/r/194924 (owner: Dzahn)
[19:21:46] (CR) Dzahn: "Notice: /Stage[main]/Ferm/File[/etc/ferm/conf.d/10_etherpad_https]/ensure: removed" [puppet] - https://gerrit.wikimedia.org/r/194924 (owner: Dzahn)
[19:22:34] (CR) Dzahn: "this closed the port on zirconium because it was the last role on it that opened it." [puppet] - https://gerrit.wikimedia.org/r/194924 (owner: Dzahn)
[19:25:14] (CR) Dzahn: [C: 2] "of course it's already open, actually we repeat the rule to open it 7 times, because 7 roles, but it's the proper way to do it so we can a" [puppet] - https://gerrit.wikimedia.org/r/194922 (owner: Dzahn)
[19:27:02] Coren: yeah, I'm just saying: code on the internet is a living thing.
It can't be abandoned and expected to thrive :)
[19:27:09] (CR) Dzahn: "and now it's 9 x ACCEPT tcp -- anywhere anywhere tcp dpt:http" [puppet] - https://gerrit.wikimedia.org/r/194922 (owner: Dzahn)
[19:27:26] I sometimes really worry about how much "random labs code" the prod wikis rely on in a functional sense
[19:28:14] "semi-production"
[19:28:45] (not that I'm saying the code needs to be ditched, I just wish those code dependencies for maintenance/tools/etc were documented and maintained more rigorously)
[19:29:35] yea, actually that is a reason to not kill semantic mediawiki on wikitech
[19:29:58] because of the "add documentation" links for each project
[19:30:44] do we have an inventory of "all the tools used by bots/editors", in some checklist somewhere that we can audit e.g. when it was last updated by who, who's officially maintaining it, where the source repo is, can review changes, etc?
[19:30:47] Everything? Most TS tools are not on Labs yet
[19:31:43] bblack: all? that's going to be thousands of items
[19:31:48] yeah I know :/
[19:31:52] https://www.mediawiki.org/wiki/Upstream_projects#Invented_Here has links to some lists
[19:32:00] there is a hierarchy to that: 3rd party, on labs but manual without docs, on labs with docs and maintainers..
[19:32:13] and gadgets...
[19:32:28] it's great in the sense that it's very open, but it's a gaping black hole when it comes to "relying on the unknown and unknowable, and not breaking them"
[19:32:31] https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/List_of_Toolserver_Tools has a small portion of the tools
[19:32:56] Most of MediaWiki core is unmaintained too anyway ;)
[19:33:05] :)
[19:33:31] at least we know it, though.
it's a repo we have synced up and we can grep and submit review patches for to fix it to work with new architecture, etc [19:33:35] (03PS1) 10Thcipriani: Add /etc/mysql dir before linking inside it [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/194925 [19:34:35] 6operations, 6Engineering-Community, 6MediaWiki-Core-Team, 6Multimedia, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096558 (10bd808) 3NEW a:3bd808 [19:36:03] bd808: should we just backport the messages and be done with it for now? [19:36:18] legoktm: yes [19:36:19] (03CR) 10Southparkfan: add network variables for dumps rsync clients (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189196 (owner: 10John F. Lewis) [19:36:33] I don't see the bottom of this rabbit hole yet [19:37:09] greg-g: since l10nupdate isn't working, I'm going to backport + deploy some SUL translations [19:37:16] (03PS3) 10Chad: Add myself to the releasers group [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) [19:37:26] <^d> Coren: I waited my 3 days ^ :) [19:37:30] (03PS1) 10coren: Labs: Don't start manage-nfs-volumes at boot [puppet] - 10https://gerrit.wikimedia.org/r/194929 [19:37:56] (03CR) 10Dzahn: [C: 031] "https://commons.wikimedia.org/wiki/Category:Multiple_people_with_thumbs_up" [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) (owner: 10Chad) [19:38:12] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Chad H needs release.wm.o access to do future release(s) - https://phabricator.wikimedia.org/T91424#1096582 (10coren) [19:38:14] legoktm: kk [19:38:15] ^d So you have. 
:-) [19:38:21] (03CR) 10Dzahn: [C: 032] "https://commons.wikimedia.org/wiki/Category:Multiple_people_with_thumbs_up" [puppet] - 10https://gerrit.wikimedia.org/r/194140 (https://phabricator.wikimedia.org/T91424) (owner: 10Chad) [19:39:44] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Chad H needs release.wm.o access to do future release(s) - https://phabricator.wikimedia.org/T91424#1096585 (10coren) 5Open>3Resolved Yep. Cool enough to do releases. [19:39:55] 10Ops-Access-Requests, 6operations: Chad H needs release.wm.o access to do future release(s) - https://phabricator.wikimedia.org/T91424#1096587 (10coren) [19:40:23] andrewbogott_afk: Can you https://gerrit.wikimedia.org/r/#/c/194929/ when you get a minute? [19:40:23] ^d: created on caesium [19:40:35] <^d> ssh works, ty sir [19:40:47] welcome to releasers then [19:41:48] 10Ops-Access-Requests, 6operations: Chad H needs release.wm.o access to do future release(s) - https://phabricator.wikimedia.org/T91424#1096590 (10Dzahn) created on caesium: Notice: /Stage[main]/Admin/Admin::Hashuser[demon]/Admin::User[demon]/File[/home/demon/.ssh/authorized_keys]/ensure: created Notice: /Sta... [19:42:13] YuviPanda: You awake? [19:42:22] heh, looks like, [19:42:38] YuviPanda: Odd timezone person you. https://gerrit.wikimedia.org/r/#/c/194929/1 pretty please? I want to finish labstore1002 [19:42:40] <^d> mutante: fwiw, old releases on caesmium are owned by mah's old UID :) [19:42:46] <^d> Probably harmless [19:43:15] ^d: should we have a proper gid for releasers? [19:43:30] <^d> We do have a gid [19:43:39] <^d> eg: -rw-r--r-- 1 1232 releasers-mediawiki 20725572 Dec 17 20:17 mediawiki-1.24.1.tar.gz [19:44:35] (03CR) 10Yuvipanda: [C: 04-1] "Also, what exactly is start-nfs? I don't find it in our puppet repo, and can't seem to find it googling either..." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/194929 (owner: 10coren) [19:44:48] brion owns mobile, catrope owns VisualEditor, but mediawiki was root [19:44:53] <^d> lol [19:45:26] haha [19:45:30] hrmm.. a bit of a mess [19:45:38] the owner changes with versions [19:46:18] YuviPanda: Allergic to comments? :-) I don't mind doing delete + phab [19:46:25] ^d: 1232 = hexmode , right [19:46:29] <^d> yep [19:46:35] Coren: phab tickets get poked at, commented out code… :) [19:46:37] <^d> according to admins/data.yaml [19:46:42] YuviPanda: putting FIXME comments is an old (bad) habit of mine. :-) [19:46:43] 1.22 = root 1.23 = mglaser 1.24 = 1232 [19:46:44] :p [19:47:31] YuviPanda: You're right about start-nfs though. [19:47:55] <^d> mutante: 1.22 and below were copied from dumps/download.wm.o [19:48:00] <^d> So yeah, they'd all be root [19:48:15] also: some files wikidev, some files releasers-mediawiki [19:48:32] <^d> more umask fun! [19:49:05] Coren: yup, so whatever start-nfs is, I think that should be puppetized and then this would make sense. [19:49:17] YuviPanda: Doing so now. [19:50:24] !log chown demon:releasers-mediawiki 1.24 and below (belonged the removed user 1232/mah) [19:50:28] ^d:^ [19:50:30] Logged the message, Master [19:51:48] <^d> I own all the releases now! [19:51:51] * ^d goes drunk with power [19:52:04] no, just 1.24 :p [19:53:21] (03PS2) 10coren: Labs: Don't start manage-nfs-volumes at boot [puppet] - 10https://gerrit.wikimedia.org/r/194929 [19:53:31] YuviPanda: ^^ [19:55:26] (03CR) 10Yuvipanda: [C: 031] "(Although I'm not sure about eacth step start-nfs does, but I assume it was unpuppetized earlier)" [puppet] - 10https://gerrit.wikimedia.org/r/194929 (owner: 10coren) [19:55:55] (03CR) 10coren: [C: 032] "It was. This is puppetizing status quo." 
[puppet] - 10https://gerrit.wikimedia.org/r/194929 (owner: 10coren) [19:57:00] * andrewbogott confused by the lack of login message on labs bastion [19:57:12] Not sure how to go about testing that exactly one daemon runs though. Passive test? [19:57:23] andrewbogott: jessie? They changed motd handling. [19:58:47] Coren: no, we explicitly removed it a few days ago and then I forgot [19:58:48] !log legoktm Started scap: WikimediaMessages updates [19:58:52] Logged the message, Master [20:03:54] !log legoktm Finished scap: WikimediaMessages updates (duration: 05m 06s) [20:04:00] Logged the message, Master [20:04:12] https://en.wikipedia.org/wiki/Special:UsersWhoWillBeRenamed?uselang=de woot [20:07:55] Why is it colspan=3? [20:08:09] shouldn't it show the date and edit count? :P [20:09:32] I haven't populated the lists yet :P [20:10:41] (03PS2) 10Dzahn: put base::firewall on calcium [puppet] - 10https://gerrit.wikimedia.org/r/194799 (https://phabricator.wikimedia.org/T83044) [20:11:11] (03CR) 10Dzahn: [C: 032] put base::firewall on calcium [puppet] - 10https://gerrit.wikimedia.org/r/194799 (https://phabricator.wikimedia.org/T83044) (owner: 10Dzahn) [20:12:24] 6operations, 6Engineering-Community, 6MediaWiki-Core-Team, 6Multimedia, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096676 (10bd808) [20:13:42] legoktm: You're the worst [20:13:49] :( [20:13:51] mutante: did you merge the firewall patch I re poked you about? (Can't remember the server, can't remember if you merged it :p) [20:14:12] JohnFLewis: that one, not yet [20:14:26] i added a newer review [20:14:31] Kay, can you remember which server it was? :p [20:14:33] i think it's good [20:14:37] uranium [20:14:45] That's it. Right [20:14:57] but there's more to it than the others i merged [20:15:00] more services [20:15:22] Thought it's covered though? 
[20:15:37] +1 but +1 from akos' and i merge it
[20:15:57] Alright
[20:16:53] because gmond/gmetad/etc i don't wanna break ganglia on friday
[20:17:00] fine on monday
[20:17:26] the ones for today all just http(s)
[20:19:32] (CR) John F. Lewis: "@Faidon is there a specific reason against it assuming analytics is also stored here and this is more network stuff/firewall than purely d" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/189196 (owner: John F. Lewis)
[20:19:58] operations, HTTPS-by-default, Patch-For-Review: varnish disk cache auditing/correction - https://phabricator.wikimedia.org/T90583#1096701 (BBlack) Open>Resolved All of the existing servers are sorted out in puppet now, and will take on an optimal disk layout + varnish storage config as they're rei...
[20:20:41] But breaking ganglia on a Friday is better than Monday as people might actually use it on the Monday :p
[20:21:11] :p there is some logic there, hah
[20:21:38] operations, HTTPS-by-default: Upgrade all HTTP frontends to Debian jessie - https://phabricator.wikimedia.org/T86648#1096708 (BBlack) We're at ~15% of the cache endpoints converted and reinstalled now, and most corner-case oddities with the reinstall process are known. Next week begins mass reinstalls, ex...
[20:23:06] operations, HTTPS, HTTPS-by-default: Expand HTTP frontend clusters with new hardware - https://phabricator.wikimedia.org/T86663#1096715 (BBlack) eqiad hardware expected to be racked, cabled, and available for software install circa Wed/Thurs of next week (11th or 12th).
[20:26:33] mutante: so, I’m thinking that I need to open a firewall hole for memcache monitoring on californium… does that seem right? And, if so, what host should I open up for?
[20:27:54] andrewbogott: yes, if you want memcached running there that seems right, and you should put it into the role class for horizon
[20:28:11] mutante: yeah, horizon is using local memcache for sessions.
[20:28:13] the base::firewall is always on nodes, and the holes you poke into it are always in the roles [20:28:25] I can’t find any examples of memcache monitoring firewall stuff anyplace. [20:28:47] andrewbogott: the nrpe port should be open which a [20:29:03] it should come from base per default , but this might be special [20:29:10] *allows monitoring via icinga [20:29:36] I don’t know why it would be special... [20:29:41] maybe it’s not a firewall issue [20:30:18] class { 'memcached': [20:30:19] ip => '127.0.0.1', [20:30:19] } [20:30:19] seems uncomplicated [20:30:27] can you find the icinga checkcommand line? [20:30:34] and manually run it [20:30:36] from neon [20:31:29] I don’t have any idea where to look for that [20:31:33] but am looking anyway [20:32:09] /etc/icinga on neon [20:32:36] 20484 check_command nrpe_check!check_memcached!10 [20:32:43] so it's executed via nrpe [20:33:03] and then runs a command locally on californium [20:34:07] for that it needs nagios-nrpe to be running on californium, but it is [20:34:13] hm, where would that command be located? [20:34:16] /etc/init.d/nagios-nrpe-server status [20:34:35] no, I mean, the ‘check_memcached’ command [20:34:47] in /etc/nagios/nrpe* on californium [20:34:59] hmph, serves me right for looking in /etc/icinga [20:35:05] either nrpe_local.cfg [20:35:13] or in the nrpe.d/* [20:35:41] weird, the check uses a nonstandard port [20:35:54] yes, it checks on port 11000, ack [20:35:58] oh, wait, no it doesn't [20:36:03] 11000 looks right [20:36:14] hm... [20:36:35] so running that manually, and it works: [20:36:40] well, what the heck, I restarted memcached and it switched ports to 11000 and now everything is happy [20:36:42] /usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 11000 [20:36:42] TCP OK - 0.000 second response time on port 11000|time=0.000116s;;;0.000000;10.000000 [20:36:47] so ...uhm.. 
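The manual check in this exchange amounts to opening a TCP connection to the memcached port and timing it. A minimal Python sketch of that idea follows, assuming the 127.0.0.1:11000 address quoted in the log. `check_tcp_port` and `format_status` are hypothetical helpers written for illustration; they are not the Nagios `check_tcp` plugin and only imitate the shape of its status line.

```python
import socket
import time


def check_tcp_port(host, port, timeout=10.0):
    """Try a TCP connect and return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def format_status(ok, elapsed, port):
    """Build a one-line status similar in shape to the output in the log."""
    state = "TCP OK" if ok else "TCP CRITICAL"
    return f"{state} - {elapsed:.3f} second response time on port {port}"


if __name__ == "__main__":
    # Assumed address: the local memcached port mentioned in the conversation.
    ok, elapsed = check_tcp_port("127.0.0.1", 11000)
    print(format_status(ok, elapsed, 11000))
```

A check like this only proves that something is listening on the port. It would not have explained why memcached came up on a different port before the restart.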
[20:37:12] oh :) [20:37:26] RECOVERY - Memcached on californium is OK: TCP OK - 0.000 second response time on port 11000 [20:37:26] I think I’m going to reboot the box and see what it does [20:37:38] maybe it was a remnant from an earlier attempt [20:38:31] maybe puppet failed to run the right init script ..or upstart [20:39:10] !log rebooting californium to see what memcached does on startup [20:39:17] Logged the message, Master [20:40:46] PROBLEM - Host californium is DOWN: PING CRITICAL - Packet loss = 100% [20:42:47] RECOVERY - Host californium is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [20:49:24] mutante: a reboot and icinga is happy [20:49:55] andrewbogott: great [20:50:11] sounds like something that just fails on first boot then? [20:50:22] after fresh installs [20:53:32] mutante: maybe not even that, I’ve done some installing and reinstalling on that box. [20:54:42] akosiaris: Puppet runs every 20 minutes… but /what/ runs it? [20:55:40] 6operations, 10Staging: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096760 (10coren) p:5Triage>3Normal I'm not sure that should be considered a bug - mysql_install_db is a very, //very// destructive operation that is probably unw... [20:56:05] (03CR) 10Chad: [C: 032] style: Misc code style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194069 (owner: 10KartikMistry) [20:56:13] andrewbogott: cat /etc/cron.d/puppet [20:56:17] (03Merged) 10jenkins-bot: style: Misc code style fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194069 (owner: 10KartikMistry) [20:56:43] !log demon Synchronized wmf-config/InitialiseSettings.php: no-op, style fixes (duration: 00m 05s) [20:56:48] Logged the message, Master [20:57:07] mutante: hm, you’re right... [20:58:23] andrewbogott: do you want it temp. disabled? 
[20:58:31] mutante: no
[20:58:39] we just couldn’t find it in crontab -l and were wondering what gives
[20:59:31] ah, yea, right, not in crontab -l
[20:59:58] PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1458 bytes in 0.420 second response time
[21:00:16] it runs every 20 minutes AND @reboot
[21:02:54] mutante: cron.d is different from crontab, I take it?
[21:04:08] it doesn't belong to a specific user, it's system
[21:04:11] but not root
[21:04:19] so crontab -u root -l won't show it
[21:06:02] i'm not sure if we would care about the difference though
[21:08:20] andrewbogott: 'official answer': on a single user machine or a shared machine such as a school or college server, a user crontab would be the way to go. But in a large IT department, where several people might look after a server, then /etc/cron.d is probably the best place to install crontabs - it's a central point and saves searching
[21:08:58] mutante: that makes sense, I didn’t realize there was such a thing. I thought it was all in root’s crontab
[21:11:00] (CR) Tim Landscheidt: "Neat! :-) (Obviously, didn't test.) I think we should add a comment that the distribution of databases throughout c1databases & Co.
is f" [dns] - 10https://gerrit.wikimedia.org/r/194865 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [21:15:56] PROBLEM - HTTP on californium is CRITICAL: Connection refused [21:17:29] (03PS1) 10Hoo man: Don't dispatch Wikibase changes to closed Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194949 [21:18:06] RECOVERY - HTTP on californium is OK: HTTP OK: HTTP/1.1 200 OK - 2258 bytes in 0.920 second response time [21:21:55] (03PS3) 10Dzahn: replace failing fonts (oriya,unfonts,kannada) [puppet] - 10https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) [21:22:57] (03PS1) 10Andrew Bogott: Require https for horizon [puppet] - 10https://gerrit.wikimedia.org/r/194951 [21:23:45] 7Puppet, 6operations, 10Continuous-Integration, 5Patch-For-Review: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1096843 (10Dzahn) no, after re-checking this on a trusty prod host i had to amend and the old package names still exist on Tru... [21:24:14] greg-g: Around? 
[21:24:23] (03CR) 10Andrew Bogott: [C: 032] Require https for horizon [puppet] - 10https://gerrit.wikimedia.org/r/194951 (owner: 10Andrew Bogott) [21:24:31] (03CR) 10Chad: [C: 032] noc: rm broken symlinks to mediawikiview/VE dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194741 (owner: 10Dzahn) [21:24:33] (03CR) 10Chad: [C: 032] noc: add link to db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194736 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [21:24:37] (03PS4) 10Dzahn: install fonts-unfonts-core, not just fonts-unfonts-extra [puppet] - 10https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) [21:24:39] (03Merged) 10jenkins-bot: noc: rm broken symlinks to mediawikiview/VE dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194741 (owner: 10Dzahn) [21:24:41] (03Merged) 10jenkins-bot: noc: add link to db.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194736 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [21:24:50] (03CR) 10Aude: [C: 031] "looks sensible..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194949 (owner: 10Hoo man) [21:25:28] !log demon Synchronized docroot/noc/: (no message) (duration: 00m 09s) [21:25:35] Logged the message, Master [21:28:54] (03CR) 10Dzahn: "this will likely not work: RewriteCond %{HTTPS} !=on because horizon is proxied behind misc-web. 
nginx terminates HTTPS so Apache won't k" [puppet] - 10https://gerrit.wikimedia.org/r/194951 (owner: 10Andrew Bogott) [21:30:36] (03CR) 10Dzahn: "something like this instead: RewriteCond %{HTTP:X-Forwarded-Proto} !https" [puppet] - 10https://gerrit.wikimedia.org/r/194951 (owner: 10Andrew Bogott) [21:33:08] (03PS3) 10Chad: noc: add link to new pybal config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194742 (owner: 10Dzahn) [21:33:16] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 75525 MB (3% inode=99%): [21:33:17] ^d: thank you:) and i was about to fix that [21:33:37] <^d> The whole thing has windows line endings :p [21:33:51] <^d> s/has/had/ :p [21:34:15] (03CR) 10Chad: [C: 032] noc: add link to new pybal config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194742 (owner: 10Dzahn) [21:34:19] (03Merged) 10jenkins-bot: noc: add link to new pybal config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194742 (owner: 10Dzahn) [21:34:20] ah:) yes [21:34:31] nice [21:34:58] mutante: you’re exactly right about the problem, although RewriteCond %{HTTP:X-Forwarded-Proto} !https still gets me a redir loop [21:35:12] !log demon Synchronized docroot/noc/index.html: (no message) (duration: 00m 07s) [21:35:18] Logged the message, Master [21:36:03] andrewbogott: try to just copy this https://gerrit.wikimedia.org/r/#/c/194583/2/modules/noc/templates/dbtree.wikimedia.org.erb [21:36:14] use the same snippet for more than one service [21:36:20] we [21:36:40] that's for http->https when behind misc-web [21:37:23] mutante: the ‘header’ line too? 
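The point of the review comments here is that a backend behind a TLS-terminating proxy only ever sees plain HTTP, so it has to decide from the `X-Forwarded-Proto` header rather than from its own connection. A minimal Python sketch of that decision follows. The function names are hypothetical, and this illustrates the logic only; it is not the Apache rewrite configuration from the linked changes.

```python
def needs_https_redirect(headers):
    """Return True when the original client request was not HTTPS.

    `headers` is a plain dict of request headers as seen by the backend.
    Header names are compared case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    proto = normalized.get("x-forwarded-proto", "").strip().lower()
    return proto != "https"


def redirect_target(host, path):
    """Build the HTTPS URL to redirect to (hypothetical helper)."""
    return f"https://{host}{path}"


if __name__ == "__main__":
    # Request that reached the proxy over HTTPS: no redirect.
    print(needs_https_redirect({"X-Forwarded-Proto": "https"}))
    # Request that reached the proxy over HTTP: redirect.
    print(needs_https_redirect({"X-Forwarded-Proto": "http"}))
    # Header missing entirely: this check redirects every request.
    print(needs_https_redirect({}))
```

The last case is worth noting: if the header never reaches the backend, a rule of this shape redirects every request, which is one plausible way to end up with a redirect loop.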
[21:37:44] <^d> mutante: so many pretty links on noc now :) [21:38:43] mutante: the Header line errors out on apache… without it I get the same looping behavior [21:39:06] andrewbogott: yes, you need this to fix the error: https://gerrit.wikimedia.org/r/#/c/190280/2/manifests/role/servermon.pp [21:39:11] load mod_headers [21:39:38] ^d: yay:) gracias [21:40:03] gwicke: pybal links more visible now [21:40:26] mutante: cool, thx! [21:41:10] (03PS1) 10Andrew Bogott: Force https for horizon, take two [puppet] - 10https://gerrit.wikimedia.org/r/194959 [21:41:13] except, just noticed the README still says .. Config files for PyBal in pmtpa. [21:41:54] (03CR) 10Dzahn: [C: 031] Force https for horizon, take two [puppet] - 10https://gerrit.wikimedia.org/r/194959 (owner: 10Andrew Bogott) [21:43:12] (03CR) 10Andrew Bogott: [C: 032] Force https for horizon, take two [puppet] - 10https://gerrit.wikimedia.org/r/194959 (owner: 10Andrew Bogott) [21:45:04] mutante: works, thanks! [21:45:05] andrewbogott: :)) [21:47:36] andrewbogott: logged in and looking around, interesting [21:47:51] can already see projects [21:47:54] very busy, right? But it seems mostly usable. [21:48:28] nah, it doesn't even feel that slow [21:48:32] yes [21:49:07] by ‘busy’ I mean, there are a million things on every screen [21:49:28] andrewbogott: woah, that’s actually really, really nice [21:49:44] ah, i thought you mean slow :). well yea, not as many as as the wiki ui :) [21:49:44] https://horizon.wikimedia.org/project/access_and_security/ [21:49:45] It seems I can only see projects that I’m an admin of, not projects that I’m a member of [21:49:46] has API access [21:49:50] which is a bit troubling [21:49:52] interesting [21:50:21] i can see a surprisingly large number of projects [21:50:36] i would say more than i'm admin of [21:50:48] hm, can you double-check w/wikitech? 
I just de-admined myself from a project and it vanished from horizon [21:50:49] but i might be wrong because i created projects for others a bit [21:50:57] and that added me as admins to things i dont really use [21:51:08] YuviPanda: there are going to be lots of entries in our docs that say “ignore this page, you can’t change anything and it wouldn’t help anyway” [21:51:12] right [21:51:18] mutante: yeah, I think if you create a page you’re automatically admin [21:51:18] I wonder if we can hide them [21:51:28] YuviPanda: I hope so :) [21:51:59] andrewbogott: also, what’s the state of ‘include puppet roles / hiera from horizon’? [21:51:59] But, YuviPanda, I think your next big dev project should be making a puppet (and/or hiera) dashboard. [21:52:05] hahaha [21:52:06] yes [21:52:25] Probably including writing a little rest api for it. [21:52:31] ooooh [21:52:32] right [21:52:37] so I can make it a small python service [21:52:45] that just keeps state somewhere (filesystem / db) [21:52:53] and have a tiny puppet ENC [21:52:55] Yeah — I proposed making an official one for OpenStack a couple of years ago but there wasn’t enough interest to make it happen. [21:52:56] that just queries this one [21:53:05] Should google and make sure that it hasn’t already happened [21:53:07] definitely sign me up... [21:53:13] yeah, if it hasn’t already happened... [21:54:02] Here’s my old design https://wiki.openstack.org/wiki/PuppetConfigForNova [21:54:11] I think it should use its own DB though, different from what I wrote there [21:54:42] (There’s probably not a lot of use in that doc, now that I read it) [21:54:55] no variables, I think [21:54:57] only hieraaaa [21:55:18] yeah, at this point there’s no need for vars. [21:55:28] Actually, can it /all/ be done via hiera? [21:55:33] andrewbogott: it already is! 
[21:55:37] well [21:55:38] kinda [21:55:48] andrewbogott: http://wikitech.wikimedia.org/wiki/Hiera:staging [21:56:02] andrewbogott: everything under ‘classes’ is applied on all nodes in the project [21:56:03] YuviPanda, mutante, I’m very interested in anything you find in that interface that you shouldn’t be able to do. for instance, it offers to let people delete images although, forunately, it doesn’t actually permit you to go through with it [21:56:25] YuviPanda: ok then! [21:56:43] andrewbogott: so technically it *can* be, although _joe_ hates it, and I can see why and we should probably write an ENC... [21:57:15] Hm, the horizon interface could just be a big edit field :) [21:57:27] andrewbogott: yeah, so that’s one of the things I had in mind. [21:57:35] YuviPanda, mutante, file potential security concerns here: https://phabricator.wikimedia.org/T91784 [21:57:42] andrewbogott: basically, have one huge YAML file from which both the ENC and puppet read. [21:57:51] YuviPanda: ideally, the interface would automatically aggregate available classes. [21:58:01] andrewbogott: and start off by having people hand-edit it, and then you just put a UI on top [21:58:08] By reading code. That’s not as hard as it sounds. [21:58:15] yeah, starting with raw yaml is fine [21:58:31] andrewbogott: I don’t think we even need to read code. I think you can query the puppetmaster for ‘list of classes' [21:58:44] oh? better yet [21:59:02] andrewbogott: so yeah, I’m definitely up for a little YAML based ENC. I even have a phab for it [21:59:10] Hm… right now wikitech allows you to designate classes as being appropriate for given projects. I don’t know if we care about maintaining that though. 
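The "one YAML file, tiny ENC" idea discussed here can be sketched in a few lines. Puppet calls an external node classifier with a node name and reads a YAML document with a `classes` key from its standard output. The Python sketch below assumes that interface. The in-memory dictionaries stand in for the hand-edited file, and every node, project, and class name in it is made up for illustration.

```python
import sys

# Hypothetical stand-in for the single hand-edited file: classes per project.
PROJECT_CLASSES = {
    "staging": ["role::example::web", "role::example::base"],
}

# Hypothetical mapping from node name to project.
NODE_PROJECT = {
    "web01.staging.example": "staging",
}


def classes_for_node(node_name):
    """Return the sorted class list for a node, or [] if it is unknown."""
    project = NODE_PROJECT.get(node_name)
    return sorted(PROJECT_CLASSES.get(project, []))


def render_enc(classes):
    """Render a minimal ENC document with only a `classes` list."""
    if not classes:
        return "classes: []\n"
    lines = ["classes:"]
    lines += [f"  - '{name}'" for name in classes]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # An ENC receives the node name as its first argument.
    node = sys.argv[1] if len(sys.argv) > 1 else "web01.staging.example"
    sys.stdout.write(render_enc(classes_for_node(node)))
```

Starting from a static mapping like this matches the plan in the conversation: people hand-edit the data first, and a UI or REST API can be layered on top later without changing what the classifier emits.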
[21:59:34] andrewbogott: https://phabricator.wikimedia.org/T85279 [22:00:45] I will rejoice when puppet nodes aren’t in ldap anymore [22:00:53] andrewbogott: \o/ I shall to [22:00:54] too [22:00:57] operations, Staging: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096926 (thcipriani) >>! In T91797#1096760, @coren wrote: > I'm not sure that should be considered a bug - mysql_install_db is a very, //very// destructive operatio... [22:03:41] andrewbogott: I’ll whip up a PoC of YAML -> ENC next week hopefully [22:05:26] great [22:24:49] FYI: another friday deploy coming: hoo will be turning off the dispatching of wikidata edits to closed wikis (to help them keep up with the mega stream) [22:25:27] (PS2) Hoo man: Don't dispatch Wikibase changes to closed Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/194949 [22:25:37] (CR) Hoo man: [C: 2] Don't dispatch Wikibase changes to closed Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/194949 (owner: Hoo man) [22:26:00] (Merged) jenkins-bot: Don't dispatch Wikibase changes to closed Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/194949 (owner: Hoo man) [22:26:02] (CR) Greg Grossmeier: "+1'ing for the record (for a Friday deploy)" [mediawiki-config] - https://gerrit.wikimedia.org/r/194949 (owner: Hoo man) [22:27:16] !log hoo Synchronized wmf-config/Wikibase.php: Don't dispatch Wikibase changes to closed Wikis (duration: 00m 06s) [22:27:22] Logged the message, Master [22:29:28] hoo: that'd be cleaner done as a wmg using the preloaded db lists :P [22:30:10] Reedy: Not sure how that would work [22:30:22] oh, yeah [22:30:22] we need an array of all DBs that is enabled on [22:30:24] I'm tired [22:30:30] lol [22:30:46] I was poking at a way of keeping a cached array of each around, but it never got anyway [22:35:38] (PS1) Reedy: Remove toolserver.org from captcha whitelist 
[mediawiki-config] - https://gerrit.wikimedia.org/r/194966 [22:35:45] (PS1) Hoo man: Increase the number of dispatchers again [puppet] - https://gerrit.wikimedia.org/r/194967 [22:35:53] ori: mutante: ^ [22:36:12] If we have that, I think the dispatching will work for the weekend [22:36:23] People tend to edit more on the weekend AFAIR [22:45:14] Puppet, operations, Continuous-Integration: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097104 (Krinkle) NEW [22:46:22] Puppet, operations, Continuous-Integration: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097112 (Krinkle) [22:48:10] Puppet, operations, Continuous-Integration: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097115 (Krinkle) p:Triage>Low [22:48:44] Puppet, operations, Continuous-Integration: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097118 (Krinkle) Re-running puppet two (!) more times eventually fixed this. Lowering priority since we're moving on regardless, but it... [22:50:08] YuviPanda: https://gerrit.wikimedia.org/r/194967 [22:51:37] FYI: Apparently today isn't really Friday. legoktm will be doing a backport/deploy to start a long running SUL-related script over the weekend [22:52:33] greg-g: you failed to control the days of the week :( [22:53:33] greg-g: Next Friday the correct response is fuck no [22:53:54] JohnFLewis: I could have swore it was Friday, it feels like one, I'm in the office even, but all these deploys have me second guessing [22:53:58] Reedy: how's flying? [22:54:08] Reedy: "I let you do it once, not no more!" [22:54:23] JohnFLewis: "Floridas weather is great all year round!" 
[22:54:26] * Reedy calls bullshit [22:54:34] it is saturday in six minutes [22:54:40] so it is soon going to be safe to deploy :D [22:55:01] haha [22:55:09] Zulu or gtfo [22:56:05] + Reedy is around :) [22:56:17] I already fixeded things earlier today! :) [22:59:56] * hoo is still looking for an op for https://gerrit.wikimedia.org/r/194967 [23:00:37] Coren is on duty... [23:08:33] (CR) Ori.livneh: [C: 2] Increase the number of dispatchers again [puppet] - https://gerrit.wikimedia.org/r/194967 (owner: Hoo man) [23:10:14] !log legoktm Synchronized php-1.25wmf20/extensions/CentralAuth/includes/CentralAuthUser.php: https://gerrit.wikimedia.org/r/#/c/194709/ (duration: 00m 08s) [23:10:21] Logged the message, Master [23:10:33] !log legoktm Synchronized php-1.25wmf19/extensions/CentralAuth/includes/CentralAuthUser.php: https://gerrit.wikimedia.org/r/#/c/194709/ (duration: 00m 05s) [23:10:38] Logged the message, Master [23:11:28] greg-g: done, thanks! [23:15:09] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - santhosh - https://phabricator.wikimedia.org/T90937#1097202 (Dzahn) Open>Resolved I manually deleted the user and home directory from oxygen and gadolinium. (deluser santhosh, rm -rf /home/santhosh) . Home direct... 
[23:15:10] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1097204 (Dzahn) [23:16:56] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (Dzahn) [23:20:51] Puppet, operations, Continuous-Integration, Patch-For-Review: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097220 (Dzahn) a:Dzahn [23:21:41] !log demon Synchronized php-1.25wmf19/includes/: db profiling backport (duration: 00m 09s) [23:21:47] Logged the message, Master [23:21:51] Puppet, operations, Continuous-Integration, Patch-For-Review: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1093266 (Dzahn) ``` @integration-slave1405:~# dpkg -l | grep 'kannada\|oriya\|unfonts\|libertine' ii fonts-linuxlibertine... [23:25:49] Puppet, operations, Continuous-Integration, Patch-For-Review: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097226 (Dzahn) Open>Resolved Notice: Finished catalog run in 33.31 seconds root@integration-slave1405:~ not sure how... 
[23:29:06] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [23:29:34] (PS5) Dzahn: install fonts-unfonts-core, not just fonts-unfonts-extra [puppet] - https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) [23:34:49] <^d> mutante: you'd think -extra would depend on -core [23:34:57] <^d> (just from the name) [23:44:18] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds [23:46:37] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [23:47:39] !log started running populateListofUsersToRename.php (CentralAuth) [23:47:46] Logged the message, Master [23:52:48] ^demon|away: agreed, but it doesn't. just Recommends: fonts-unfonts-core [23:53:37] and Replaces: ttf-unfonts [23:54:03] they all tend to be fonts-something replaces ttf-something [23:54:26] and virtual vs. non-virtual packages [23:59:22] Puppet, operations, Continuous-Integration, Patch-For-Review: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097267 (Dzahn) still suggesting https://gerrit.wikimedia.org/r/#/c/194828/ but it's just something i noticed while looking...