[00:00:04] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151209T0000). [00:00:04] yurik jgirault ebernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [00:00:08] thcipriani, awesome! right in time for SWAT ) [00:00:19] ok [00:00:19] (03Abandoned) 10BBlack: Unify evaluate_cookie and evaluate_cookie_mobile [puppet] - 10https://gerrit.wikimedia.org/r/256617 (owner: 10Ori.livneh) [00:00:30] yurik: sigh :) [00:00:32] Surely we're not going to allow SWAT? [00:00:43] (03Abandoned) 10BBlack: Improve handling of disableImages cookie [puppet] - 10https://gerrit.wikimedia.org/r/257491 (https://phabricator.wikimedia.org/T120151) (owner: 10Ori.livneh) [00:00:50] James_F, why not? [00:01:28] yurik: The train just deployed after several setbacks, six hours late, due to breakages and complications. [00:02:02] James_F, we have very few minor fixes for the graph, i seriously doubt it will affect the train [00:02:17] yurik: Not my call. [00:02:19] Famous last words :D [00:02:27] hehe [00:02:48] so who's swating? [00:02:48] the other two patches i have up are both trivial config changes. One turns off an A/B test that's already run over many days (and doesn't work with the newly deployed train). the other sets the shard counts for currently unused elasticsearch indices [00:03:34] i can, unless there are other complaints as well [00:04:09] ebernhardson, i say go for it ) [00:04:18] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [00:04:20] do we have a second opinion? 
:) [00:04:25] (03CR) 10BBlack: [C: 04-1] "needs a ferm change to allow the caches to reach etherpad1001 on port 9001, I think" [puppet] - 10https://gerrit.wikimedia.org/r/255406 (owner: 10Alexandros Kosiaris) [00:04:27] icinga-wm, you don't count [00:04:42] greg-g / thcipriani: Your call. [00:06:51] these changes look fairly innocuous to me. I'd rather get the code out now that we've cleared our backlog rather than have it build up again. [00:07:18] thx thcipriani ! [00:07:36] I'd defer to greg-g (although he may not be around) [00:07:55] umm [00:08:05] mediawiki.org looks wrong [00:08:17] James_F: apparently in the undeployed but merged category we have https://gerrit.wikimedia.org/r/#/q/Id0642023abdae574e32620fc0843631d86bae006,n,z [00:08:27] https://www.mediawiki.org/wiki/MediaWiki?useskin=vector <-- the page isn't supposed to look like that [00:08:29] ebernhardson: That's in wmf.9. [00:08:46] ebernhardson: We're talking about wmf.8. [00:08:50] legoktm: yeah, just noticed that. there are some other problems too (see -releng) [00:08:59] James_F: ahh, ok [00:09:16] No SWAT for now. [00:09:35] ^ this is fair given breakage. [00:10:31] * ebernhardson sighs [00:11:00] thcipriani, i have already merged the graph's patches to 27.8 branch of graph. Is that ok? [00:11:11] and +2ed them [00:13:18] yurik: looking. [00:16:17] robh: can you help me understand about https://phabricator.wikimedia.org/T102689 ? [00:17:21] yurik: I'd like to get any changes that have merged out, but given they also touch some of the less styles it could make troubleshooting the .8 styling even more problematic. Can you revert for now and we can try again for morning SWAT? [00:17:26] Donni says that ‘virt1.wikimedia.org’ is still authoritative for some domains. 
I can’t figure out how that could be true [00:18:03] thcipriani, sure [00:18:50] (03CR) 10Krinkle: toollabs: migrate to redis::instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [00:19:05] andrewbogott: yea i just replied asking her why she says they are in use, but cannot provide a list [00:19:07] thats bullshit ;] [00:19:14] thanks :) [00:19:25] I figured that I would grep and immediately find it but... [00:19:30] cuz indeed, nothing should have pointed to them but labs stuff [00:19:34] weirdly I found virt0 listed as auth for some labs stuff but not virt1 [00:19:37] ori: https://gerrit.wikimedia.org/r/#/c/257043/1/manifests/role/ci.pp [00:19:47] i think they accidentally applied it to some other domain [00:19:49] yurik: appreciated. [00:20:18] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:20:52] I think I fixed the gadgets issue [00:20:53] > var_dump(GadgetRepo::singleton()->purgeDefinitionCache()); [00:22:46] thcipriani, https://gerrit.wikimedia.org/r/#/c/257784/ [00:23:03] thcipriani: https://gerrit.wikimedia.org/r/257785 needs to be backported, otherwise Gadgets are going to break tomorrow and thursday [00:24:07] legoktm: kk, will do, thanks. [00:25:38] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: puppet fail [00:27:51] 6operations, 10ops-eqiad: Remove all out of warranty unused cp10xx's from A2 - https://phabricator.wikimedia.org/T120856#1864265 (10RobH) Also please make sure to snag the rbf1001/rbf1002 as they were repurposed but once again spare original squid systems. [00:31:28] what's up? [00:31:48] * greg-g reads [00:32:28] greg-g: tl;dr cancelled evening swat after late deployment of wmf.8 when some breakage was discovered :( [00:33:16] :/ [00:33:28] wmf.8 still broken or fixed? 
[00:33:33] * greg-g reads in -releng too [00:34:07] legoktm fixed it, backporting as part of morning SWAT (already cherry-picked) [00:34:27] greg-g: the Gadgets issue is fixed on mw.o, unsure if anything else is broken. I think I found a regression in ED that I'll have ready for tomorrow swat [00:34:43] * greg-g nods [00:34:56] good work [00:35:00] thcipriani: fun first day? :) [00:36:05] greg-g: I did this uneventfully one other time, but yes: a fun and eventful first day :) [00:36:28] duh, of course [00:37:24] either way: working on a patch to fix the early stumbling block during branch cut that probably cost ~1hr [00:37:36] sweet [00:38:30] it's weird how small setbacks can seemingly compound in a long process. [00:40:24] thcipriani: which is why I leave for the airport 5 hours before my flight [00:43:19] PROBLEM - puppet last run on mw2147 is CRITICAL: CRITICAL: puppet fail [00:44:10] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1864303 (10RobH) a:5RobH>3chasemp I've assigned this to the @chasemp for his review of the proposed workflow/email notifications of announcements. [00:46:54] (03PS1) 10Andrew Bogott: Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) [00:51:29] (03PS2) 10Andrew Bogott: Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) [00:52:34] (03CR) 10jenkins-bot: [V: 04-1] Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) (owner: 10Andrew Bogott) [00:53:08] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [00:54:15] (03PS3) 10Andrew Bogott: Add labs_baremetal_servers hiera item. 
[puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) [00:56:20] (03PS4) 10Andrew Bogott: Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) [00:56:28] hm, the shorter the patch the more revisions [00:58:25] (03PS1) 10Jforrester: Add config for wgVisualEditorUseSingleEditTab and set true for enwiki Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257800 [00:58:27] (03PS1) 10Jforrester: Remove never-used VisualEditorBetaInTab config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257801 [00:58:29] (03PS1) 10Jforrester: Remove always-used VisualEditorShowBetaWelcome config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257802 [00:58:31] (03PS5) 10Andrew Bogott: Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) [01:01:15] (03CR) 10Andrew Bogott: [C: 032] Add labs_baremetal_servers hiera item. [puppet] - 10https://gerrit.wikimedia.org/r/257796 (https://phabricator.wikimedia.org/T120262) (owner: 10Andrew Bogott) [01:01:49] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [01:03:31] (03CR) 10Jforrester: "Does this have approval from Community Advocacy/Legal?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:10:47] RECOVERY - puppet last run on mw2147 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:13:11] (03CR) 10Peachey88: "This is a standard permission change, Legal does not need to sign off on en.wikipedia community census." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:13:36] (03PS1) 10Andrew Bogott: Allow access to the labs puppetmaster from labs metal hosts. 
[puppet] - 10https://gerrit.wikimedia.org/r/257805 (https://phabricator.wikimedia.org/T95185) [01:14:37] (03CR) 10Andrew Bogott: [C: 032] Allow access to the labs puppetmaster from labs metal hosts. [puppet] - 10https://gerrit.wikimedia.org/r/257805 (https://phabricator.wikimedia.org/T95185) (owner: 10Andrew Bogott) [01:18:36] (03CR) 10Jforrester: "Isn't there a right in this area that we don't even hand out to stewards or staff? Or did we kill it?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:21:59] (03CR) 10Alex Monk: "This right is already granted to sysops on enwiki, among other places." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:22:23] (03CR) 10Legoktm: "James, I think you're thinking of "abusefilter-private", which is "View private data in the abuse log"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:23:33] (03CR) 10Peachey88: "Yes, it's abusefilter-private as legoktm has said." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:25:14] (03CR) 10Jforrester: "Ah, right, thanks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [01:45:39] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/Gadgets/: SWAT: bump MediaWikiGadgetsDefinitionRepo cache version (duration: 00m 29s) [01:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:46:09] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. 
[01:49:52] !log catrope@tin Started scap: Graph cherry-picks for wmf.8, including i18n changes [01:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:50:33] 7Puppet, 6Analytics-Backlog, 10Analytics-Wikimetrics: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} - https://phabricator.wikimedia.org/T101763#1864407 (10yuvipanda) Hello! After every time we change any fundamental settings (DNS, LDAP, puppetmaster, etc)... [01:59:08] (03CR) 10Alex Monk: [C: 031] Add config for wgVisualEditorUseSingleEditTab and set true for enwiki Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257800 (owner: 10Jforrester) [02:01:36] ori: when you set up those ldap logs for wikitech earlier… where are they? [02:02:11] (03PS12) 10Yuvipanda: Elasticsearch with proxy for tool labs [puppet] - 10https://gerrit.wikimedia.org/r/256618 (https://phabricator.wikimedia.org/T120040) (owner: 10BryanDavis) [02:15:56] andrewbogott: fluorine:/a/mw-log/ldap.log [02:16:08] ori: thanks! [02:16:27] andrewbogott: I had to live hack a change to the log verbosity config, and that was probably clobbered by the scap earlier. 
Hang on a sec and I'll do it properly [02:16:30] !log catrope@tin Finished scap: Graph cherry-picks for wmf.8, including i18n changes (duration: 26m 38s) [02:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:20:40] (03PS1) 10Ori.livneh: Increase LDAP log verbosity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257812 [02:20:52] (03CR) 10Ori.livneh: [C: 032] Increase LDAP log verbosity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257812 (owner: 10Ori.livneh) [02:21:13] (03Merged) 10jenkins-bot: Increase LDAP log verbosity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257812 (owner: 10Ori.livneh) [02:23:37] !log ori@tin Synchronized wmf-config/wikitech.php: I4ef826af47: Increase LDAP log verbosity (duration: 00m 44s) [02:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:25:29] andrewbogott: ^ [02:27:18] ori: thanks! will look in a minute... [02:28:29] andrewbogott: there's something else you should know -- [02:29:27] the LdapAuthenticationPlugin class defines methods which are thin wrappers around the PHP LDAP API, sandwiched by error-suppressing code [02:29:56] right, so lots of errors are swallowed? [02:30:22] e.g. https://github.com/wikimedia/mediawiki-extensions-LdapAuthentication/blob/master/LdapAuthentication.php#L204-L208 [02:30:37] yes, you basically get no useful debug logging from the underlying library [02:31:07] I commented out the wfSuppressWarnings() / wfRestoreWarnings() lines earlier to get the actual error [02:31:22] ok — with luck it won’t come to that :/ [02:32:30] andrewbogott: if it does come to that, and you end up resorting to the same trick, it'll show up in /var/log/apache2/error.log ; you can grep for 'PHP Warning' [02:32:51] e.g. 
[Tue Dec 08 21:13:02.732993 2015] [:error] [pid 16960] [client 187.153.100.40:60538] PHP Warning: ldap_modify(): Modify: Insufficient access in /srv/mediawiki/php-1.27.0-wmf.7/extensions/LdapAuthentication/LdapAuthentication.php on line 206, referer: https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&returnto=Special%3ANovaInstance&type=signup [02:33:34] that... should not have included a client ip. [02:34:28] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.7) (duration: 10m 27s) [02:34:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:40:24] (03PS3) 10BBlack: varnish: cache /api/rest_v1/ in backends [puppet] - 10https://gerrit.wikimedia.org/r/257630 (https://phabricator.wikimedia.org/T96847) [02:42:40] (03CR) 10BBlack: [C: 032] varnish: cache /api/rest_v1/ in backends [puppet] - 10https://gerrit.wikimedia.org/r/257630 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [02:43:13] (03PS3) 10BBlack: varnish: security_audit backend explicitly tier-one-only [puppet] - 10https://gerrit.wikimedia.org/r/257631 (https://phabricator.wikimedia.org/T96847) [02:44:51] (03CR) 10BBlack: [C: 032] varnish: security_audit backend explicitly tier-one-only [puppet] - 10https://gerrit.wikimedia.org/r/257631 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [02:46:47] (03PS4) 10BBlack: varnish: always use backend_random for pass/hfp [puppet] - 10https://gerrit.wikimedia.org/r/257636 (https://phabricator.wikimedia.org/T96847) [02:46:49] (03PS4) 10BBlack: add backend_random to maps and upload clusters config [puppet] - 10https://gerrit.wikimedia.org/r/257635 (https://phabricator.wikimedia.org/T96847) [02:46:51] (03PS4) 10BBlack: add backend_random to maps and upload clusters in conftool-data [puppet] - 10https://gerrit.wikimedia.org/r/257634 (https://phabricator.wikimedia.org/T96847) [02:46:53] (03PS4) 10BBlack: cache_upload: remove unused "rendering" backend [puppet] - 10https://gerrit.wikimedia.org/r/257633 
(https://phabricator.wikimedia.org/T96847) [02:46:55] (03PS4) 10BBlack: varnish: return (pass) for CAL URLs [puppet] - 10https://gerrit.wikimedia.org/r/257632 (https://phabricator.wikimedia.org/T96847) [02:46:58] (03PS4) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [02:47:00] (03PS2) 10BBlack: varnish: use same VCL files for text+mobile [puppet] - 10https://gerrit.wikimedia.org/r/257699 (https://phabricator.wikimedia.org/T109286) [02:49:54] (03CR) 10BBlack: [C: 032] cache_upload: remove unused "rendering" backend [puppet] - 10https://gerrit.wikimedia.org/r/257633 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [02:51:25] (03CR) 10BBlack: [C: 032] add backend_random to maps and upload clusters in conftool-data [puppet] - 10https://gerrit.wikimedia.org/r/257634 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [02:54:14] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.8) (duration: 04m 34s) [02:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:07:18] h [03:07:18] hh [03:07:18] h [03:07:18] h [03:07:19] h [03:07:19] hh [03:07:20] hh [03:07:29] h [03:07:29] f [03:07:30] f [03:07:30] ff [03:07:30] f [03:07:30] f [03:07:31] f [03:07:32] !ops [03:08:29] Krenair: For all we know his h and f keys were stuck :p [03:13:57] PROBLEM - Router interfaces on cr1-esams is CRITICAL: CRITICAL: host 91.198.174.245, interfaces up: 76, down: 3, dormant: 0, excluded: 0, unused: 0BRxe-0/0/0: down - Peering: AMS-IX (EvoSwitch SMF.2-9/ab) {#SMF3836} [10Gbps DF]BRxe-1/0/0: down - Peering: AMS-IX (EvoSwitch SMF.2-10/ab) {#SMF3837} [10Gbps DF]BRae2: down - AMS-IX (EvoSwitch)BR [03:37:11] (03CR) 10BBlack: [C: 032] add backend_random to maps and upload clusters config [puppet] - 10https://gerrit.wikimedia.org/r/257635 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [03:37:30] there's going to be a few random 
puppetfails on cp* machines - not a huge issue, known race condition on deploy [03:39:27] RECOVERY - Router interfaces on cr1-esams is OK: OK: host 91.198.174.245, interfaces up: 82, down: 0, dormant: 0, excluded: 0, unused: 0 [03:44:23] oh I guess for this change, the race doesn't matter, no fails [04:17:37] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 10.71% of data above the critical threshold [100000000.0] [04:44:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Dec 9 04:44:57 UTC 2015 (duration 1h 50m 43s) [04:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [04:58:47] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [06:17:58] PROBLEM - puppet last run on mw2064 is CRITICAL: CRITICAL: puppet fail [06:31:48] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:49] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:49] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 3 failures [06:31:49] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 3 failures [06:32:08] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [5000000.0] [06:32:28] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:49] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:57] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:59] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:18] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:28] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:51] (03CR) 10PleaseStand: "The issue is probably that 
Gerrit is treating everything after the semicolons as comments. So putting the entire URL value in double quote" [puppet] - 10https://gerrit.wikimedia.org/r/257193 (owner: 10Paladox) [06:36:19] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0] [06:45:29] RECOVERY - puppet last run on mw2064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:47] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 1.00% above the threshold [1000000.0] [06:47:59] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 1.00% above the threshold [1000000.0] [06:57:08] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:07:35] (03PS1) 10Legoktm: Stop overriding Echo's EventLogging revision ids [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257826 [07:22:34] (03CR) 10Mdann52: "If you can do this tomorrow, that would be great :) the RfC has concluded with consensus to do this, so anytime is good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [07:26:07] Please someone re-enable Wiktionary-l and add me as admin https://phabricator.wikimedia.org/T120446 so that I can warn the members [07:40:49] !log restarted create-dbusers on labstore1001 [07:40:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:57:23] (03PS10) 10Giuseppe Lavagetto: etcd: auth puppettization [WiP] [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [08:15:01] !log Memory utilization on the job runners began to climb around 2015-12-08T10:00:00+00:00 UTC. To investigate, enabled jemalloc profiling and restarted HHVM on mw1001. 
[08:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:16:33] (03PS1) 10Faidon Liambotis: snapshot: fix logrotate.d syntax [puppet] - 10https://gerrit.wikimedia.org/r/257832 [08:16:35] (03PS1) 10Faidon Liambotis: apt: swap backport repository's components [puppet] - 10https://gerrit.wikimedia.org/r/257833 [08:16:56] (03CR) 10Faidon Liambotis: [C: 032 V: 032] snapshot: fix logrotate.d syntax [puppet] - 10https://gerrit.wikimedia.org/r/257832 (owner: 10Faidon Liambotis) [08:17:14] (03CR) 10Faidon Liambotis: [C: 032 V: 032] apt: swap backport repository's components [puppet] - 10https://gerrit.wikimedia.org/r/257833 (owner: 10Faidon Liambotis) [08:17:51] (03PS2) 10Faidon Liambotis: Extend LDAP indices for labs with puppetVar, roleOccupant and aAAARecord [puppet] - 10https://gerrit.wikimedia.org/r/257686 (owner: 10Muehlenhoff) [08:17:57] PROBLEM - puppet last run on calcium is CRITICAL: CRITICAL: Puppet has 1 failures [08:18:16] (03CR) 10Faidon Liambotis: [C: 032] Extend LDAP indices for labs with puppetVar, roleOccupant and aAAARecord [puppet] - 10https://gerrit.wikimedia.org/r/257686 (owner: 10Muehlenhoff) [08:18:34] (03CR) 10Faidon Liambotis: [V: 032] Extend LDAP indices for labs with puppetVar, roleOccupant and aAAARecord [puppet] - 10https://gerrit.wikimedia.org/r/257686 (owner: 10Muehlenhoff) [08:22:11] (03PS1) 10Faidon Liambotis: apt: add a trailing slash to backports' uri [puppet] - 10https://gerrit.wikimedia.org/r/257834 [08:22:31] (03PS2) 10Faidon Liambotis: apt: add a trailing slash to backports' uri [puppet] - 10https://gerrit.wikimedia.org/r/257834 [08:22:45] (03CR) 10Faidon Liambotis: [C: 032 V: 032] apt: add a trailing slash to backports' uri [puppet] - 10https://gerrit.wikimedia.org/r/257834 (owner: 10Faidon Liambotis) [08:33:16] (03PS11) 10Giuseppe Lavagetto: etcd: auth puppettization [WiP] [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [08:33:31] 
pppupppettttizzzattttion? :P [08:34:31] <_joe_> yeah it's WiP [08:34:40] <_joe_> I'm fixing _that_ last [08:34:50] no I'm making fun of the two t above [08:34:58] s/puppettization/puppetization/ [08:35:05] <_joe_> yep, I know [08:35:15] <_joe_> I'm too lazy to amend the commit message twice [08:35:16] sorry, bad joke :) [08:35:49] <_joe_> that's why I did notice but told myself "I still have to remove WiP there, let's wait for it" [08:36:24] 7Puppet: Puppet support for multiple Dashiki instances running on one server - https://phabricator.wikimedia.org/T120891#1864745 (10Luke081515) [08:42:00] 7Puppet: Puppet support for multiple Dashiki instances running on one server - https://phabricator.wikimedia.org/T120891#1864754 (10yuvipanda) This should just be a simple role that allows static file hosting for multiple domains with nginx. [08:43:18] RECOVERY - puppet last run on calcium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [08:49:06] 6operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#1864777 (10MoritzMuehlenhoff) 3NEW [08:49:19] 6operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#1864784 (10MoritzMuehlenhoff) p:5Triage>3Normal [09:03:00] (03PS5) 10Faidon Liambotis: Switch Central/South Asia to esams (1) [dns] - 10https://gerrit.wikimedia.org/r/239072 [09:03:29] (03CR) 10Faidon Liambotis: [C: 032] Switch Central/South Asia to esams (1) [dns] - 10https://gerrit.wikimedia.org/r/239072 (owner: 10Faidon Liambotis) [09:04:13] (03PS1) 10Faidon Liambotis: Switch Central/South Asia to esams (2) [dns] - 10https://gerrit.wikimedia.org/r/257841 [09:05:34] (03PS1) 10Faidon Liambotis: Switch Pakistan to esams (3) [dns] - 10https://gerrit.wikimedia.org/r/257842 [09:05:36] (03PS1) 10Faidon Liambotis: Switch India & BIOT to esams (4) [dns] - 10https://gerrit.wikimedia.org/r/257843 [09:07:32] (03PS4) 10Filippo Giunchedi: import debian directory [debs/bloomd] - 
10https://gerrit.wikimedia.org/r/257168 (owner: 10Ori.livneh) [09:09:46] (03PS2) 10Faidon Liambotis: Switch Afghanistan and Sri Lanka to esams (2) [dns] - 10https://gerrit.wikimedia.org/r/257841 [09:13:31] (03CR) 10Faidon Liambotis: [C: 032] Switch Afghanistan and Sri Lanka to esams (2) [dns] - 10https://gerrit.wikimedia.org/r/257841 (owner: 10Faidon Liambotis) [09:14:20] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM the package, I've updated a couple of things, still fails to start for me when ran as user bloomd:" [debs/bloomd] - 10https://gerrit.wikimedia.org/r/257168 (owner: 10Ori.livneh) [09:14:23] (03PS2) 10Faidon Liambotis: Switch Pakistan to esams (3) [dns] - 10https://gerrit.wikimedia.org/r/257842 [09:14:25] (03PS2) 10Faidon Liambotis: Switch India & BIOT to esams (4) [dns] - 10https://gerrit.wikimedia.org/r/257843 [09:22:32] (03CR) 10Faidon Liambotis: [C: 032] Switch Pakistan to esams (3) [dns] - 10https://gerrit.wikimedia.org/r/257842 (owner: 10Faidon Liambotis) [09:26:47] (03PS2) 10Lokal Profil: Make issued and modified typed [puppet] - 10https://gerrit.wikimedia.org/r/251492 (https://phabricator.wikimedia.org/T117533) [09:27:01] (03PS2) 10Lokal Profil: Localisation updates from translatewiki.net [puppet] - 10https://gerrit.wikimedia.org/r/251493 [09:29:06] why on earth is that DCAT app in puppet? [09:29:50] apergos: ? [09:30:11] eh? [09:31:05] why is DCAT, a PHP app, in ops/puppet? [09:34:38] is there a better place for it (it shouldn't really be in the mediawiki repos, it's not part of mw or an extension)? [09:34:48] just a separate repository? [09:35:17] It can but then we get to deal with how it gets deployed [09:35:37] well by "we" I mean the wikidata folks, but still [09:35:40] yes, so? [09:35:45] sec, door [09:36:36] brb they are doing work outside our apartment, it's about that I guess [09:36:56] (so if they move to a separate repo we should have a suggestion for deployment solution too) [09:39:38] it's not like we don't deploy code anywhere else.. 
[09:41:30] (03CR) 10Jcrespo: "I have not reviewed the code yet, but I would focus on having a single metric, just one, from performance schema as a proof of concept (fo" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/256007 (owner: 10Isart) [09:41:36] apergos: can you take care of it? [09:42:27] 6operations, 10MediaWiki-General-or-Unknown, 7Graphite, 5MW-1.27-release-notes, and 2 others: mediawiki should send statsd metrics in batches - https://phabricator.wikimedia.org/T116031#1864817 (10fgiunchedi) looks like this has been deployed yesterday? still seeing single-packet traffic from appservers `... [09:45:28] 6operations, 10MediaWiki-General-or-Unknown, 7Graphite, 5MW-1.27-release-notes, and 2 others: mediawiki should send statsd metrics in batches - https://phabricator.wikimedia.org/T116031#1864818 (10ori) Only to group 0 wikis; the main wikis are on the previous version for a few more hours. [09:45:49] 6operations, 10DBA, 6WMF-Legal, 7HTTPS, 5Patch-For-Review: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499#1864820 (10jcrespo) This will probably be obsoleted by T119619, so I would mark this as stalled and blocked by a decision there. [09:46:17] 6operations, 10DBA: Decide storage backend for performance schema monitoring stats - https://phabricator.wikimedia.org/T119619#1831039 (10jcrespo) [09:46:19] 6operations, 10DBA, 6WMF-Legal, 7HTTPS, 5Patch-For-Review: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499#1864822 (10jcrespo) [09:46:21] paravoid: ok, I'll talk to them. [09:46:29] 6operations, 10DBA, 6WMF-Legal, 7HTTPS, 5Patch-For-Review: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499#1864825 (10jcrespo) 5Open>3stalled [09:46:57] talk to them about? [09:47:07] just move it first? 
[09:47:15] it's not like they can deploy without you now [09:55:09] (03PS2) 10Alexandros Kosiaris: openldap: Notify slapd on acls.conf and indices.conf changes [puppet] - 10https://gerrit.wikimedia.org/r/257741 [09:55:11] (03PS4) 10Alexandros Kosiaris: openldap: Allow to specify cleartext hashing scheme [puppet] - 10https://gerrit.wikimedia.org/r/257691 [09:58:36] !log stopped slapd on serpens, run sudo -u openldap slapindex to add the new indices, started slapd [09:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:02:37] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Update WikimediaEnableMultiLines to OTRS 5.0.1 [software/otrs] - 10https://gerrit.wikimedia.org/r/248915 (owner: 10Alexandros Kosiaris) [10:02:37] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Update WikimediaTemplates to support 5.0.1 [software/otrs] - 10https://gerrit.wikimedia.org/r/248916 (owner: 10Alexandros Kosiaris) [10:02:38] !log manually ran update-ubuntu-mirror twice on carbon to fix mirror errors [10:02:38] (03PS2) 10Faidon Liambotis: puppet-run: do not let apt failures block agent [puppet] - 10https://gerrit.wikimedia.org/r/256148 (owner: 10BBlack) [10:02:38] !log stopped slapd on seaborgium, run sudo -u openldap slapindex to add the new indices, started slapd [10:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:02:38] (03CR) 10Faidon Liambotis: [C: 032 V: 032] puppet-run: do not let apt failures block agent [puppet] - 10https://gerrit.wikimedia.org/r/256148 (owner: 10BBlack) [10:02:38] ok now we got those indices populated as well [10:02:38] note that for aAAArecord we got 0 entries so really no index created but we did add it in case we do get that and since powerdns was complaining [10:03:04] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Package dropped on advice from the OTRS volunteers. 
It is not used anymore" [software/otrs] - 10https://gerrit.wikimedia.org/r/257587 (owner: 10Alexandros Kosiaris) [10:03:17] !log running hhvm-collect-heaps in screen on mw1001 with a 600s interval [10:03:41] _joe_: i'm going to leave that running ^ [10:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:03:52] if thats ok w/you [10:04:00] akosiaris: so just one package of our own left? [10:04:03] that's an improvement :) [10:04:07] paravoid: yup [10:04:13] plus Znuny4OTRS? [10:04:15] and we dropped FAQ as well [10:04:20] so now we got 3 [10:04:36] 1 of our own, 1 for znuny's repo, 1 for znuny's quick close [10:04:48] it's down from 6 in the production install so that's nice [10:04:50] !log restarting slapd on serpens/seaborgium to apply updated indices [10:04:52] yeah :) [10:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:04:58] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [10:05:06] moritzm: I already did that!!! 
[10:05:24] (11:58:35 πμ) akosiaris: !log stopped slapd on serpens, run sudo -u openldap slapindex to add the new indices, started slapd [10:05:56] akosiaris: thanks, I just realised in time before I made the actual restart [10:06:09] hehe [10:06:10] ok [10:06:11] so btw [10:06:15] something is doing this [10:06:17] Dec 9 10:05:05 seaborgium slapd[28202]: slap_global_control: unrecognized control: 1.3.6.1.4.1.4203.666.5.16 [10:06:25] trying to dereference [10:06:53] we can add the overlay to allow that [10:06:59] or just chase down the culprit [10:07:14] \o [10:08:31] it's most probably caused by the group/groupOfUniqueNames/uniqueMember lookup needed for "Directory Managers" [10:09:22] <_joe_> ori: it is [10:11:31] !sal [10:11:31] https://wikitech.wikimedia.org/wiki/Server_Admin_Log https://tools.wmflabs.org/sal/production See it and you will know all you need. [10:11:48] moritzm: no, that's an LDAP control set by the calling application [10:12:13] it does need support on the server side and of course the data being structured like that [10:12:33] somehow I am thinking gerrit [10:18:49] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [10:21:00] !log reboot seaborgium to add 3 more vCPUs [10:23:11] akosiaris: I can reproduce that error with searching for "-E deref=member:", I'll give slapo-deref a try in vagrant [10:24:48] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [10:25:03] !log reboot serpens to add 3 more vCPUs [10:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:25:17] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [10:26:08] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:26:24] (03PS1) 
10Filippo Giunchedi: diamond: fix StreamHandler module path [puppet] - 10https://gerrit.wikimedia.org/r/257853 [10:27:06] (03PS2) 10Filippo Giunchedi: diamond: fix StreamHandler module path [puppet] - 10https://gerrit.wikimedia.org/r/257853 [10:27:08] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:27:17] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [10:27:18] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:27:19] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:27:48] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:28:29] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:28:39] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:45:28] !log sudo gnt-instance modify -H cpu_type=SandyBridge seaborgium to get aesni and sse cpu flags. rebooting seaborgium to apply those [10:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:48:49] akosiaris: url-downloader proxy is OK? [10:48:59] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [10:49:07] akosiaris: we've some issues. [10:49:22] Not sure related to that but likely to that. [10:49:46] kart_: from what I know.. what kind of issues ? [10:50:38] akosiaris: Yandex fails. [10:50:55] akosiaris: we have not change any config, so wondering what went wrong. [10:52:47] kart_: logs ? 
[10:55:28] (03PS4) 10Mdann52: Allow sysops to add and remove accounts from bot group on mai.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) [10:55:55] (03CR) 10jenkins-bot: [V: 04-1] Allow sysops to add and remove accounts from bot group on mai.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) (owner: 10Mdann52) [11:01:28] kart_: can't find anything in the logs btw [11:01:40] akosiaris: url-downloader looks okay. [11:01:48] kart_: not surprised ... [11:01:51] yes. bare logs. sad. [11:02:55] if there was a big problem with url_downloader, icinga would have said so [11:03:29] there might be some weird problem with yandex and url_downloader interoperation but we can test that [11:03:43] (03PS5) 10Mdann52: Allow sysops to add and remove accounts from bot group on mai.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) [11:04:02] akosiaris: okay :) [11:04:36] Nikerabbit is checking further. 
[11:08:38] (03PS5) 10Alexandros Kosiaris: openldap: Allow to specify cleartext hashing scheme [puppet] - 10https://gerrit.wikimedia.org/r/257691 [11:08:45] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] openldap: Allow to specify cleartext hashing scheme [puppet] - 10https://gerrit.wikimedia.org/r/257691 (owner: 10Alexandros Kosiaris) [11:09:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] diamond: fix StreamHandler module path [puppet] - 10https://gerrit.wikimedia.org/r/257853 (owner: 10Filippo Giunchedi) [11:09:17] (03PS3) 10Alexandros Kosiaris: openldap: Notify slapd on acls.conf and indices.conf changes [puppet] - 10https://gerrit.wikimedia.org/r/257741 [11:09:21] (03PS3) 10Filippo Giunchedi: diamond: fix StreamHandler module path [puppet] - 10https://gerrit.wikimedia.org/r/257853 [11:09:23] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] openldap: Notify slapd on acls.conf and indices.conf changes [puppet] - 10https://gerrit.wikimedia.org/r/257741 (owner: 10Alexandros Kosiaris) [11:09:26] (03CR) 10Filippo Giunchedi: [V: 032] diamond: fix StreamHandler module path [puppet] - 10https://gerrit.wikimedia.org/r/257853 (owner: 10Filippo Giunchedi) [11:09:52] akosiaris: good to merge ^ ? [11:10:03] (03PS4) 10Alexandros Kosiaris: openldap: Notify slapd on acls.conf and indices.conf changes [puppet] - 10https://gerrit.wikimedia.org/r/257741 [11:10:13] (03CR) 10Alexandros Kosiaris: [V: 032] openldap: Notify slapd on acls.conf and indices.conf changes [puppet] - 10https://gerrit.wikimedia.org/r/257741 (owner: 10Alexandros Kosiaris) [11:10:44] godog: yeah, I was going to do 2 at a time [11:10:48] and then... race condition!!! [11:11:14] hehehe [11:14:49] does "configure" on a labs instance work for you on wikitech? 
I'm getting "the requested host does not exist" for all instances I've tried [11:15:25] godog: https://github.com/BrightcoveOS/Diamond/blob/master/src/collectors/openldap/openldap.py [11:15:35] I am thinking about enabling this [11:15:40] 4 hosts [11:15:46] I suppose it should be ok ? [11:16:25] akosiaris: yup! LGTM [11:16:57] akosiaris: unrelated but development has moved to https://github.com/python-diamond/Diamond afaik [11:17:33] The requested host does not exist. [11:17:35] godog: see https://phabricator.wikimedia.org/T120904 [11:17:38] godog: yes I get the same error [11:17:46] ah.. [11:17:50] what ? [11:18:29] ah, thanks moritzm [11:18:38] it's not the ACLs, the mentioned LDAP DN doesn't have any restricted attributes [11:21:34] lol, great piece of code [11:21:50] it tries to dereference it before checking if it exists [11:22:24] not sure what to do with this tbh... [11:22:52] godog: usually I have to logout and login again when I want to do something about an instance [11:23:28] Nemo_bis: yeah that's been my experience too, in this case logout/login doesn't seem to change the outcome [11:24:18] (03PS1) 10Filippo Giunchedi: prometheus: add graphite_exporter support [puppet] - 10https://gerrit.wikimedia.org/r/257860 (https://phabricator.wikimedia.org/T92813) [11:24:20] (03PS1) 10Filippo Giunchedi: labs: tap metrics towards graphite_exporter [puppet] - 10https://gerrit.wikimedia.org/r/257861 (https://phabricator.wikimedia.org/T92813) [11:29:21] !log Update cxserver to c88ef57 [11:29:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:35:28] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0] [11:37:21] Nikerabbit: akosiaris is here. 
[11:38:14] well, so far we know that something started giving CERT_UNTRUSTED in past 24 hours [11:38:52] (03PS1) 10Alexandros Kosiaris: ldap-labs: Issue new 2048bit certificates [puppet] - 10https://gerrit.wikimedia.org/r/257863 [11:38:56] ie {"name":"cxserver","hostname":"sca1002","pid":28028,"level":50,"msg":"MT processing error: Error: Error: CERT_UNTRUSTED","time":"2015-12-09T11:29:01.710Z","v":0} [11:39:00] akosiaris: ^^ [11:39:06] akosiaris: in the log now. [11:39:29] RECOVERY - cassandra-a CQL 10.64.32.187:9042 on restbase1008 is OK: TCP OK - 0.010 second response time on port 9042 [11:41:14] kart_: for what ? [11:41:28] I see the CERT_UNTRUSTED but for what ? [11:41:28] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0] [11:41:31] akosiaris: Yandex. [11:41:52] so, https://www.yandex.com/ ? [11:42:01] gimme a sec to finish something first [11:42:01] akosiaris: we're debugging issue. It stopped working since yesterday. [11:42:05] akosiaris: sure. [11:42:28] (03CR) 10Alexandros Kosiaris: [C: 032] ldap-labs: Issue new 2048bit certificates [puppet] - 10https://gerrit.wikimedia.org/r/257863 (owner: 10Alexandros Kosiaris) [11:42:39] translate.yandex.com [11:47:46] Nikerabbit: it is: https://translate.yandex.net to be correct. [11:49:18] right [11:51:02] !log reissue certificates for labs-ldap.{eqiad,codfw}.wikimedia.org with a 2048bit key size [11:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:51:29] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 1.00% above the threshold [1000000.0] [11:53:36] santhosh: is here. 
[11:55:48] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [11:57:30] (03PS1) 10Muehlenhoff: Enable deref overlay [puppet] - 10https://gerrit.wikimedia.org/r/257866 [11:58:12] akosiaris: ^ would enable this also for openldap/corp, which seems harmless to me, but I can rework to make it specific to openldap/labs [11:59:11] akosiaris: ping when you're free :) [12:00:00] (03PS2) 10Muehlenhoff: Enable deref overlay [puppet] - 10https://gerrit.wikimedia.org/r/257866 [12:04:27] (03CR) 10Alexandros Kosiaris: [C: 031] "there is surprisingly little information/documentation about this online and in sources. If you've tested it I am fine with it but I am a " [puppet] - 10https://gerrit.wikimedia.org/r/257866 (owner: 10Muehlenhoff) [12:04:45] kart_: ok, now I am free [12:04:57] so, what exactly fails ? able to reproduce it yet ? [12:05:32] santhosh: ^^ [12:05:40] akosiaris: node request through url-downloader proxy to translate.yandex.net [12:06:13] and I can't figure out why and why it started only recently [12:06:26] (ie yesterday most likely) [12:06:30] hmm, so curl -Iv https://translate.yandex.com on chromium (the url downloader host), says SSL certificate verify ok and everything works out ok [12:06:58] it seems it is node (not proxy downloader) which is giving the CERT_UNTRUSTED, but cxserver hadn't been restarted recently [12:07:12] so I wonder if there was any other change in the system which could cause that [12:07:34] also, nowhere else are we having problems connecting to that host except inside cxserver [12:07:43] also, curl -x url-downloader.wikimedia.org:8080 -Iv https://translate.yandex.com from sca1001, same results [12:07:51] Nikerabbit: note that I restarted cxserver after today's update. [12:07:52] which is starting to mean node is the problem... 
not the host [12:08:16] ls [12:08:21] sorry, wrong window [12:08:31] (03CR) 10Muehlenhoff: "Yeah, the lack of documentation is even acknowledged by upstream:" [puppet] - 10https://gerrit.wikimedia.org/r/257866 (owner: 10Muehlenhoff) [12:08:59] so I don't know where node gets the list of accepted cas [12:09:07] and I wonder if that list could have changed [12:09:44] I sure hope it uses the system ones in /etc/ssl/ [12:10:12] and yes those lists change occasionally [12:10:49] but if that was the problem, curl would fail too [12:10:50] (03PS1) 10Glaisher: Enable global AbuseFilter at French Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257868 (https://phabricator.wikimedia.org/T120568) [12:11:51] (03PS12) 10Giuseppe Lavagetto: etcd: auth puppetization [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [12:11:56] Nikerabbit: got some nodejs code we could try to create a connection to yandex ? [12:12:25] well there is https://github.com/nodejs/node/blob/v0.10.25-release/src/node_root_certs.h [12:12:59] was there any nodejs update recently? [12:13:08] what ? please tell me they are joking ... 
[12:13:09] akosiaris: not off hand, I would need to improvise based on https://github.com/wikimedia/mediawiki-services-cxserver/blob/61ebec42a4a63f092dec4cefdbbc97f8817a5b64/mt/Yandex.js#L41 [12:13:11] (03PS1) 10Jcrespo: Repool es1019 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257869 [12:13:11] santhosh: no [12:15:17] https://stackoverflow.com/questions/20658120/nodejs-unable-to-read-default-cas-in-ubuntu is not very enlightening [12:17:49] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [12:17:58] so, if node uses its own ca list [12:18:05] (03PS13) 10Giuseppe Lavagetto: etcd: auth puppetization [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [12:18:35] * start date: 2015-10-23 16:31:28 GMT [12:18:37] * expire date: 2017-10-22 16:31:28 GMT [12:18:50] Nikerabbit: yandex probably just swapped its certificate [12:19:07] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [12:19:18] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [12:19:20] akosiaris: yeah I would agree on that [12:19:41] issuer: C=RU; O=Yandex LLC; OU=Yandex Certification Authority; CN=Yandex CA [12:19:43] what ? [12:19:45] they got a CA ? [12:19:57] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 64 failures [12:20:02] akosiaris: yes. That's what I saw lately. [12:21:14] https://github.com/nodejs/node/issues/4175 coincidence this was just reported? 
[12:21:58] https://anonscm.debian.org/cgit/collab-maint/nodejs.git/tree/debian/patches/2014_donotinclude_root_certs.patch [12:22:09] so if that is in our node executable [12:22:16] we are barking up the wrong tree and I am most happy [12:22:24] checking [12:23:31] it is not [12:23:46] that being said Certum Trusted Network CA may be in our list at least [12:23:54] which is the parent of yandex CA [12:26:48] ahahaha [12:26:55] Nikerabbit: kart I think I've found it [12:27:18] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:27:21] node has bundled a different Certum CA than the one used by yandex [12:27:41] cool. [12:27:59] yes, root vs. trusted network? [12:28:14] but how was it working till yday? :) [12:28:31] santhosh: probably yandex replaced its cert yesterday [12:28:34] (03CR) 10Reedy: [C: 031] "(Y)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222057 (https://phabricator.wikimedia.org/T104370) (owner: 10CSteipp) [12:28:47] more up to date node would include certum trusted network [12:28:55] but I don't think we can update node just like that [12:28:59] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:29:18] so I see a couple of options: 1) try the 'ca' param of node-request, 2) disable validation [12:30:20] the good thing is that when we move cxserver to debian jessie, the good debian people seem to have dropped that internal node list [12:30:36] * akosiaris is dying to call the node people names [12:30:53] and not polite ones at that [12:31:06] Nikerabbit: I'd go for 1 [12:31:17] not sure how to though [12:31:31] akosiaris: 2 would perhaps be easier and we are soon migrating to service-runner [12:31:40] 1 would probably need a configuration option [12:32:18] well, with no validation you have no way of knowing you are really talking to yandex [12:33:49] I would take care of that after service-runner thing and get this back online for 
now [12:34:55] the actual library is switched in the rewrite [12:35:31] oh, nevermind, pref is just a wrapper around request [12:44:57] PROBLEM - HHVM rendering on mw1115 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:45:38] PROBLEM - Apache HTTP on mw1115 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:46:18] PROBLEM - Check size of conntrack table on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:46:48] PROBLEM - Disk space on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:46:48] PROBLEM - SSH on mw1115 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:46:49] PROBLEM - configured eth on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:47:09] PROBLEM - DPKG on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:47:58] PROBLEM - RAID on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:50:47] RECOVERY - Disk space on mw1115 is OK: DISK OK [12:51:10] (03PS1) 10Muehlenhoff: Further updates to LDAP indices [puppet] - 10https://gerrit.wikimedia.org/r/257871 [12:52:29] PROBLEM - nutcracker process on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:53:58] PROBLEM - HHVM processes on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:54:10] (03CR) 10Mforns: [C: 031] "LGTM, thanks!" [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/256884 (owner: 10Dzahn) [12:56:37] RECOVERY - nutcracker process on mw1115 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [12:57:57] RECOVERY - HHVM processes on mw1115 is OK: PROCS OK: 6 processes with command name hhvm [13:02:48] PROBLEM - nutcracker process on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
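A minimal sketch of option 1 from the CERT_UNTRUSTED discussion above: keep certificate validation on and explicitly trust the CA that the runtime's bundled list is missing, rather than disabling validation (option 2). cxserver is Node and would do this through the 'ca' parameter mentioned in the channel; the block below is only a Python analogue using the standard ssl module, and build_tls_context and extra_ca_pem are hypothetical names, not anything from cxserver.

```python
import ssl
from typing import Optional


def build_tls_context(extra_ca_pem: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context, optionally trusting one extra CA.

    extra_ca_pem is a hypothetical argument: the PEM text of the CA that
    the runtime's default trust list does not contain.
    """
    # Start from the platform defaults, so certificate and hostname
    # checks stay enabled.
    context = ssl.create_default_context()
    if extra_ca_pem:
        # Add the missing CA on top of the defaults instead of turning
        # verification off.
        context.load_verify_locations(cadata=extra_ca_pem)
    return context


# With no extra CA the context still verifies peers and hostnames.
context = build_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

One difference to keep in mind: Python's load_verify_locations adds to the trusted set, while Node's 'ca' option overrides the default CA list, so a Node version would likely need to pass every CA it still wants to trust.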
[13:03:40] (03CR) 10Alexandros Kosiaris: [C: 031] Further updates to LDAP indices [puppet] - 10https://gerrit.wikimedia.org/r/257871 (owner: 10Muehlenhoff) [13:06:46] !log restarting Jenkins (upgrading Gearman plugin) [13:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:08:35] http://www.wired.com/2015/12/bitcoins-creator-satoshi-nakamoto-is-probably-this-unknown-australian-genius/ [13:09:00] lol [13:09:48] wasn't he nominated for an Economic Nobel Prize? [13:10:03] no, but [13:10:04] http://www.theguardian.com/technology/2015/dec/09/bitcoin-founder-craig-wrights-home-raided-by-australian-police [13:10:08] PROBLEM - HHVM processes on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:10:11] police raided his home [13:10:39] ah he has been nominated but deemed not eligible apparently [13:10:45] ( https://www.cryptocoinsnews.com/satoshi-nakamoto-not-eligible-nobel-prize/ ) [13:10:59] PROBLEM - Disk space on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:11:58] RECOVERY - HHVM processes on mw1115 is OK: PROCS OK: 6 processes with command name hhvm [13:12:07] RECOVERY - RAID on mw1115 is OK: OK: no RAID installed [13:12:19] RECOVERY - Check size of conntrack table on mw1115 is OK: OK: nf_conntrack is 0 % full [13:12:38] RECOVERY - nutcracker process on mw1115 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:12:49] RECOVERY - Disk space on mw1115 is OK: DISK OK [13:12:57] RECOVERY - SSH on mw1115 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:12:58] RECOVERY - configured eth on mw1115 is OK: OK - interfaces up [13:13:18] RECOVERY - DPKG on mw1115 is OK: All packages OK [13:15:27] akosiaris: is there any easier way run utils/new_wmf_service.py -- it complains about existing cxserver module :) [13:15:30] (03CR) 10DCausse: [C: 031] "In order to enable the beta feature we will have to set wgUseCirrusSearchUseCompletionSuggester=true, but I think we should deploy this pa" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/256974 (owner: 10EBernhardson) [13:18:08] PROBLEM - RAID on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:08] PROBLEM - HHVM processes on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:24] 6operations, 10Dumps-Generation, 10hardware-requests: determine hardware needs for dumps in eqiad (boxes out of warranty, capacity planning) - https://phabricator.wikimedia.org/T118154#1865118 (10mark) @ArielGlenn: 3? or 4? Could you please outline what you need, in an ideal situation, if we were starting f... [13:18:28] PROBLEM - Check size of conntrack table on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:48] PROBLEM - nutcracker process on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:18:58] PROBLEM - SSH on mw1115 is CRITICAL: Server answer [13:18:59] PROBLEM - Disk space on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:19:08] PROBLEM - configured eth on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:19:28] PROBLEM - DPKG on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:21:20] kart_: nope, just rename the existing one to something else [13:21:33] it's "new"_service, not existing one ;-) [13:21:58] PROBLEM - salt-minion processes on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:21:58] PROBLEM - dhclient process on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:21:58] PROBLEM - nutcracker port on mw1115 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:24:02] akosiaris: :) [13:29:03] (03PS1) 10Alexandros Kosiaris: etherpad: ferm rules for the etherpad service port [puppet] - 10https://gerrit.wikimedia.org/r/257874 [13:29:57] RECOVERY - nutcracker port on mw1115 is OK: TCP OK - 0.000 second response time on port 11212 [13:29:57] RECOVERY - dhclient process on mw1115 is OK: PROCS OK: 0 processes with command name dhclient [13:29:57] RECOVERY - salt-minion processes on mw1115 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:29:59] RECOVERY - HHVM processes on mw1115 is OK: PROCS OK: 1 process with command name hhvm [13:30:07] RECOVERY - RAID on mw1115 is OK: OK: no RAID installed [13:30:29] RECOVERY - Check size of conntrack table on mw1115 is OK: OK: nf_conntrack is 0 % full [13:30:48] RECOVERY - nutcracker process on mw1115 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:31:01] (03CR) 10Alexandros Kosiaris: [C: 032] etherpad: ferm rules for the etherpad service port [puppet] - 10https://gerrit.wikimedia.org/r/257874 (owner: 10Alexandros Kosiaris) [13:31:08] RECOVERY - Disk space on mw1115 is OK: DISK OK [13:31:09] RECOVERY - SSH on mw1115 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [13:31:09] RECOVERY - configured eth on mw1115 is OK: OK - interfaces up [13:31:37] RECOVERY - DPKG on 
mw1115 is OK: All packages OK [13:32:00] RECOVERY - Apache HTTP on mw1115 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.128 second response time [13:32:09] (03CR) 10Alexandros Kosiaris: "Indeed. fixed in https://gerrit.wikimedia.org/r/#/c/257874/" [puppet] - 10https://gerrit.wikimedia.org/r/255406 (owner: 10Alexandros Kosiaris) [13:33:17] RECOVERY - HHVM rendering on mw1115 is OK: HTTP OK: HTTP/1.1 200 OK - 64015 bytes in 1.989 second response time [13:34:19] moritzm: I’m up. Have time to work on the wikitech thing a bit? [13:34:29] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:35:12] andrewbogott: sure, let me merge the deref overlay change first, I'll poke you in 10-15 mins, ok? [13:35:20] sure [13:39:51] godog: what's the status of "The requested host does not exist"? Reported? [13:40:16] can't find https://phabricator.wikimedia.org/search/query/UwWP7jiWKtMH/#R [13:43:02] Nemo_bis: it has a bug and we’re working on it. [13:43:21] andrewbogott: ah thanks, URL? 
[13:43:29] it’s most likely due to this: https://phabricator.wikimedia.org/T120904 [13:43:39] ok thanks [13:46:08] (03PS3) 10Muehlenhoff: Enable deref overlay [puppet] - 10https://gerrit.wikimedia.org/r/257866 [13:46:45] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable deref overlay [puppet] - 10https://gerrit.wikimedia.org/r/257866 (owner: 10Muehlenhoff) [13:47:33] (03PS8) 10KartikMistry: WIP: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [13:49:59] (03Restored) 10Alexandros Kosiaris: Trust the upstream proxy to have the correct client IP [puppet] - 10https://gerrit.wikimedia.org/r/255404 (owner: 10Alexandros Kosiaris) [13:50:12] (03PS3) 10Alexandros Kosiaris: Trust the upstream proxy to have the correct client IP [puppet] - 10https://gerrit.wikimedia.org/r/255404 [13:52:54] (03CR) 10Alexandros Kosiaris: [C: 032] Trust the upstream proxy to have the correct client IP [puppet] - 10https://gerrit.wikimedia.org/r/255404 (owner: 10Alexandros Kosiaris) [13:53:57] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [13:57:48] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [14:01:10] Nemo_bis: how’s that, better? [14:03:18] Yes! [14:03:18] (03PS1) 10Luke081515: Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) [14:03:35] Nemo_bis: ok, I need to wipe the cache and reset tokens (and log everybody out) but that should do it. 
[14:03:41] (03CR) 10jenkins-bot: [V: 04-1] Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) (owner: 10Luke081515) [14:03:53] Good [14:04:23] 6operations, 10Dumps-Generation, 10hardware-requests: determine hardware needs for dumps in eqiad (boxes out of warranty, capacity planning) - https://phabricator.wikimedia.org/T118154#1865190 (10ArielGlenn) if I were starting from scratch I would ask for 3 boxes all alike, as quoted by Robh, and I would giv... [14:04:40] akosiaris: the deref overlay fixes the wikitech problem (T120904), so it was OSM and not gerrit after all [14:05:53] lol [14:05:56] really ? [14:06:01] (03PS2) 10Luke081515: Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) [14:06:26] (03CR) 10jenkins-bot: [V: 04-1] Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) (owner: 10Luke081515) [14:06:26] ok, that explains why I could not reproduce it [14:07:05] sigh OSM is such bad software [14:07:09] and abandonware [14:07:31] (03PS3) 10Luke081515: Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) [14:08:58] akosiaris: there's also very irritating caching involved. when I enabled the deref feature it still failed, and only after Andrew disabled some caching magic it started to work again... 
[14:09:36] !log decommission restbase1007 [14:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:10:13] yeah, not surprised [14:10:17] if you look at that code [14:10:26] $this->domainInfo = LdapAuthenticationPlugin::ldap_get_entries( $wgAuth->ldapconn, $result ); $wgMemc->set( $key, $this->domainInfo, 3600 * 24 ); } if ( $this->domainInfo ) { [14:10:36] it's caching before it actually checks the result [14:12:45] (03CR) 10Alexandros Kosiaris: [C: 04-1] WIP: service-runner migration for cxserver (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [14:14:58] akosiaris: OSM is under attack since yesterday [14:15:27] Well, or they were; dunno if there are still consequences https://lists.openstreetmap.org/pipermail/talk/2015-December/075171.html [14:15:30] (03CR) 10BBlack: [C: 031] Have misc-web talk directly to etherpad-lite [puppet] - 10https://gerrit.wikimedia.org/r/255406 (owner: 10Alexandros Kosiaris) [14:16:30] Nemo_bis: it's not OSM that was under attack. it was the UK academic network [14:16:56] I wonder why .. probably they hosted something .. like IRC servers ? 
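A minimal sketch of the ordering problem pointed out above ("it's caching before it actually checks the result"): if a lookup result is stored before it is validated, a failed lookup gets cached and keeps being served until the entry expires. The sketch checks first and caches second. It is not the OpenStackManager code; get_domain_info, the dict-based cache and the lookup callback are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional

Entry = Dict[str, str]


def get_domain_info(
    key: str,
    cache: Dict[str, Entry],
    lookup: Callable[[str], Optional[Entry]],
) -> Optional[Entry]:
    """Return the entry for key, caching only successful lookups."""
    if key in cache:
        return cache[key]
    entry = lookup(key)
    # Check the result first: a failed or empty lookup must not be
    # cached, otherwise the failure is served until the entry expires.
    if not entry:
        return None
    cache[key] = entry
    return entry


cache: Dict[str, Entry] = {}
# The first lookup fails (for example, the server rejects the query),
# so nothing is stored.
print(get_domain_info("host1", cache, lambda k: None))
# Once the lookup works, the next call succeeds right away because no
# stale failure is sitting in the cache.
print(get_domain_info("host1", cache, lambda k: {"dn": "dc=host1"}))
```

With this ordering, a server-side fix takes effect on the next request and no manual cache wipe is needed to clear old failures.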
[14:17:04] or some governmental web site [14:17:49] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [14:25:59] (03PS5) 10DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/252957 (https://phabricator.wikimedia.org/T118570) [14:33:06] (03CR) 10Luke081515: [C: 031] Allow sysops to add and remove accounts from bot group on mai.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) (owner: 10Mdann52) [14:35:55] (03CR) 10Alexandros Kosiaris: WIP: service-runner migration for cxserver (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [14:37:04] (03PS14) 10Giuseppe Lavagetto: etcd: auth puppetization [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [14:38:56] 6operations, 10ops-codfw: es2010 failed disk (degraded RAID) - https://phabricator.wikimedia.org/T117848#1865288 (10jcrespo) @Cmjohnson We have mark's approval and Papaul is back, do you need something that I can help with to organize the shipping? Aside from Mark's advice, I would add safe-deleting the disks... 
[14:42:38] (03PS9) 10KartikMistry: WIP: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [14:53:25] I'm checking out a patch on tin to sync to mw1017 for testing [14:58:38] (03CR) 10Alexandros Kosiaris: [C: 04-1] WIP: service-runner migration for cxserver (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [14:59:05] (03CR) 10Jcrespo: [C: 032] Repool es1019 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257869 (owner: 10Jcrespo) [15:00:20] (03CR) 10Alexandros Kosiaris: WIP: service-runner migration for cxserver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:01:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1019 after maintenance (duration: 00m 33s) [15:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:03:50] (03PS2) 10EBernhardson: Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 [15:04:13] (03CR) 10jenkins-bot: [V: 04-1] Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (owner: 10EBernhardson) [15:09:26] (03PS5) 10BBlack: varnish: always use backend_random for pass/hfp [puppet] - 10https://gerrit.wikimedia.org/r/257636 (https://phabricator.wikimedia.org/T96847) [15:09:40] (03PS5) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [15:09:42] (03PS3) 10BBlack: varnish: use same VCL files for text+mobile [puppet] - 10https://gerrit.wikimedia.org/r/257699 (https://phabricator.wikimedia.org/T109286) [15:10:22] 6operations, 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review, 7Wikimedia-log-errors: Job queue broken for 
labswiki (jobs for wikitech.wikimedia.org are not running) - https://phabricator.wikimedia.org/T117394#1865388 (10jcrespo) 5Open>3Resolved I can confirm this is fixed, last error has timestamp... [15:13:03] (03CR) 10BBlack: [C: 032] varnish: always use backend_random for pass/hfp [puppet] - 10https://gerrit.wikimedia.org/r/257636 (https://phabricator.wikimedia.org/T96847) (owner: 10BBlack) [15:16:04] (03PS3) 10EBernhardson: Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 [15:16:34] (03CR) 10jenkins-bot: [V: 04-1] Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (owner: 10EBernhardson) [15:17:00] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1865407 (10chasemp) p:5Triage>3Normal [15:17:35] (03CR) 10EBernhardson: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (owner: 10EBernhardson) [15:17:53] (03PS43) 10Ottomata: [WIP] Puppetize eventlogging-service with systemd in role::eventbus [puppet] - 10https://gerrit.wikimedia.org/r/253465 (https://phabricator.wikimedia.org/T118780) [15:18:38] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0] [15:18:46] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1865419 (10chasemp) a:5chasemp>3RobH I modified your plan slightly to reflect the technical possibilities of spaces and task creation. i.e. phab does almost all of this natively so we should t... 
[15:22:57] (03CR) 10EBernhardson: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (owner: 10EBernhardson) [15:24:03] (03CR) 10Muehlenhoff: Add auth2001 partitioning entries Bug:T120263 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/257643 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [15:24:37] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0] [15:24:51] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1865479 (10chasemp) [15:24:58] (03CR) 10EBernhardson: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (owner: 10EBernhardson) [15:29:21] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [5000000.0] [15:30:24] (03PS10) 10KartikMistry: WIP: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [15:34:49] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 1.00% above the threshold [1000000.0] [15:34:53] (03PS3) 10Papaul: Add auth2001 partitioning entries Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257643 (https://phabricator.wikimedia.org/T120263) [15:35:16] (03PS11) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [15:36:12] (03CR) 10KartikMistry: service-runner migration for cxserver (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:38:48] PROBLEM - puppet last run on mw2187 is CRITICAL: CRITICAL: Puppet has 1 failures [15:40:56] 6operations, 10RESTBase, 6Services: Switch RESTBase to use service::node - https://phabricator.wikimedia.org/T118401#1865545 (10mobrovac) a:3mobrovac [15:44:04] 
(03PS1) 10Mobrovac: RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) [15:45:08] (03CR) 10jenkins-bot: [V: 04-1] RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) (owner: 10Mobrovac) [15:45:29] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 93 failures [15:47:25] (03PS2) 10Mobrovac: RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) [15:48:04] !log uploaded to apt.wikimedia.org jessie-wikimedia: etherpad-lite_1.5.7-2 [15:48:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:51:52] 6operations, 10ops-eqiad: Remove all out of warranty unused cp10xx's from A2 - https://phabricator.wikimedia.org/T120856#1865593 (10Cmjohnson) p:5Triage>3Normal [15:54:03] (03CR) 10EBernhardson: "suggest abandoning, will go with https://gerrit.wikimedia.org/r/#/c/255135/ which submodules the event-schemas repository" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/252957 (https://phabricator.wikimedia.org/T118570) (owner: 10DCausse) [15:54:19] (03CR) 10Mobrovac: [C: 04-1] service-runner migration for cxserver (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:54:32] (03PS15) 10Giuseppe Lavagetto: etcd: auth puppetization [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) [15:55:25] mobrovac: Thanks! [15:55:40] yw kart_! 
[15:56:41] (03Abandoned) 10DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/252957 (https://phabricator.wikimedia.org/T118570) (owner: 10DCausse) [15:57:28] (03PS4) 10EBernhardson: Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 [15:58:33] (03PS1) 10Giuseppe Lavagetto: Contint: add python-etcd to the latest version (for conftool) [puppet] - 10https://gerrit.wikimedia.org/r/257904 [15:58:40] <_joe_> hashar: ^^ [15:59:15] (03CR) 10Alexandros Kosiaris: [C: 04-1] service-runner migration for cxserver (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:59:23] * James_F waves for SWAT, BTW. [16:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151209T1600). Please do the needful. [16:00:04] thcipriani yurik jgirault James_F jzerebecki ebernhardson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [16:00:18] I'll SWAT. Lots to get through, as usual config changes first since those move quickly, followed by extension backports, followed by core and anything that needs a full scap. James_F you're up first. [16:00:23] Kk. [16:00:25] Oh, wait. [16:00:29] I've got one more. [16:00:49] And wikitechwiki has logged me out and my damn phone won't show me my 2FA because it's restarting. [16:00:51] * James_F sighs. [16:01:06] #swatproblems [16:01:18] Indeed. 
[16:01:25] (03PS5) 10EBernhardson: Use event-schemas repository for avro schemas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (https://phabricator.wikimedia.org/T118570) [16:01:34] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257800 (owner: 10Jforrester) [16:02:22] (03Merged) 10jenkins-bot: Add config for wgVisualEditorUseSingleEditTab and set true for enwiki Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257800 (owner: 10Jforrester) [16:03:27] 11 patches eh? :) [16:03:54] o/ [16:03:57] James_F: CommonSettings first looks right? [16:04:04] (03CR) 10Hashar: [C: 031] "We can not use contint::packages::ops on labs yet because it includes authdns::lint which needs GeoIP." [puppet] - 10https://gerrit.wikimedia.org/r/257904 (owner: 10Giuseppe Lavagetto) [16:04:12] thcipriani: Yes, sorry. Also, should be Beta-Cluster only. [16:04:22] (In impact, obviously you sync to both. :-)) [16:04:30] James_F: kk thanks :) [16:04:40] greg-g: I *think* all of Yuri's are deployed. [16:04:50] ah, good [16:04:51] greg-g: after a freeze there is greater instability than before? 
[16:05:08] RECOVERY - puppet last run on mw2187 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:05:27] (03CR) 10Giuseppe Lavagetto: [C: 032] Contint: add python-etcd to the latest version (for conftool) [puppet] - 10https://gerrit.wikimedia.org/r/257904 (owner: 10Giuseppe Lavagetto) [16:05:45] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Add config for wgVisualEditorUseSingleEditTab and set true for enwiki Labs Part I [[gerrit:257800]] (duration: 00m 29s) [16:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:06:25] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add config for wgVisualEditorUseSingleEditTab and set true for enwiki Labs Part II [[gerrit:257800]] (duration: 00m 28s) [16:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:06:50] ^ James_F should be sync'd (blew up logs, but that _should_ be fixed after 2nd sync) [16:06:58] * James_F nods. [16:07:25] Hmm. Not immediately having an effect. [16:08:24] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/256881 (owner: 10Jforrester) [16:08:53] jzerebecki: there's a greater number of patches going out, so theoretically yes [16:09:09] (03Merged) 10jenkins-bot: Enable importupload and some import sources for officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/256881 (owner: 10Jforrester) [16:09:29] greg-g: Probably around 500 rather than 150? [16:09:33] James_F: hmm, the logs definitely suggest that it was sync'd :) [16:09:45] thcipriani: Yeah, I'm suspecting a bug. No need to revert. [16:09:53] kk [16:10:38] thcipriani: Aha, now working. Thanks. [16:10:44] nice! [16:11:17] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [16:11:23] James_F: sounds about right [16:11:47] Also, woo: no more second edit tab. 
[16:12:16] * ebernhardson riles up the masses :P [16:12:18] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable importupload and some import sources for officewiki [[gerrit:256881]] (duration: 00m 42s) [16:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:12:23] ^ James_F check please [16:12:42] Ha. I don't actually have import rights on officewiki. [16:12:58] (03PS3) 10Mobrovac: RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) [16:13:08] greg-g: Do you? [16:14:10] not that I know of [16:14:40] * James_F calls in the cavalry. [16:14:55] thcipriani: Nothing seems amiss for now; please continue. [16:15:07] James_F: kk, thanks. [16:15:31] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/253371 (https://phabricator.wikimedia.org/T115002) (owner: 10Greg Grossmeier) [16:15:56] thcipriani: Confirmed by someone who does have +import. All working. :-) [16:16:19] (03Merged) 10jenkins-bot: Move Catalan and Hebrew Wikipedias to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/253371 (https://phabricator.wikimedia.org/T115002) (owner: 10Greg Grossmeier) [16:16:21] James_F: awesome, thanks for finding someone to check. 
[16:17:48] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: puppet fail [16:18:24] (03CR) 10Giuseppe Lavagetto: "recheck" [software/conftool] - 10https://gerrit.wikimedia.org/r/256482 (owner: 10Giuseppe Lavagetto) [16:18:43] !log thcipriani@tin Synchronized dblists/group1.dblist: SWAT: Move Catalan and Hebrew Wikipedias to group1 [[gerrit:253371]] (duration: 00m 29s) [16:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:18:55] ^ greg-g James_F sync'd (I'll doublecheck during train today) [16:19:02] 6operations, 10hardware-requests: spare swift disks order - https://phabricator.wikimedia.org/T119698#1865649 (10Papaul) [16:19:40] ebernhardson: you're up. [16:19:47] sweet! [16:19:59] * greg-g nods [16:20:05] (03CR) 10Luke081515: [C: 031] Set initial Staff password policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222057 (https://phabricator.wikimedia.org/T104370) (owner: 10CSteipp) [16:20:34] thcipriani: both are pretty straightforward, the a/b test one turns something off. The other is almost a noop, in that it doesn't change anything that's currently running, it just lets us prep for a beta feature release next week [16:20:53] (03PS6) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [16:20:55] (03PS4) 10BBlack: varnish: use same VCL files for text+mobile [puppet] - 10https://gerrit.wikimedia.org/r/257699 (https://phabricator.wikimedia.org/T109286) [16:21:01] ebernhardson: kk, I was just going to ask if the noop one needed any maintenance scripts or anything. 
[16:21:24] thcipriani: yea i'll be running it manually, then submitting a puppet patch that does re-runs once a week once verified [16:21:26] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/256974 (owner: 10EBernhardson) [16:22:40] (03Merged) 10jenkins-bot: Set initial titlesuggest shard sizes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/256974 (owner: 10EBernhardson) [16:22:59] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: puppet fail [16:25:01] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Set initial titlesuggest shard sizes [[gerrit:256974]] (duration: 00m 29s) [16:25:03] ^ ebernhardson check please [16:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:25:11] well, if possible :) [16:25:25] jynus: yt? [16:25:25] thcipriani: running on a tiny wiki :) [16:25:33] (03PS4) 10Mobrovac: RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) [16:25:47] thcipriani: worked as expected [16:25:51] (03CR) 10Giuseppe Lavagetto: [C: 031] "DTRT in labs, will test it next on k8s" [puppet] - 10https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) (owner: 10Giuseppe Lavagetto) [16:26:10] ebernhardson: nice, thanks. Next one, then. 
[16:26:47] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/254071 (https://phabricator.wikimedia.org/T118292) (owner: 10EBernhardson) [16:28:41] (03PS12) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [16:29:13] hmm, zuul not picking it up [16:29:40] (03CR) 10KartikMistry: service-runner migration for cxserver (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [16:30:50] (03PS1) 10BBlack: text VCL: move layer-common vcl_hash to text-common [puppet] - 10https://gerrit.wikimedia.org/r/257917 [16:30:52] (03PS1) 10BBlack: text VCL: move geoip code into common VCL [puppet] - 10https://gerrit.wikimedia.org/r/257918 [16:30:54] (03Merged) 10jenkins-bot: Turn off language detection user test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/254071 (https://phabricator.wikimedia.org/T118292) (owner: 10EBernhardson) [16:31:01] akosiaris: etherpad is down, not sure if you are done with the upgrade [16:31:22] gilles: it went awry [16:31:24] had to revert [16:31:40] gilles: it should be fine now [16:31:46] (03CR) 10BBlack: [C: 032] "puppet compiler diffs on cache_text are just blank lines (no-op). mobile diffs don't apply until daemon restart, so I can manually diff t" [puppet] - 10https://gerrit.wikimedia.org/r/257699 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [16:31:48] yep, thanks [16:31:50] damned software ... 
[16:32:10] (03CR) 10jenkins-bot: [V: 04-1] service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [16:32:20] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn off language detection user test [[gerrit:254071]] (duration: 00m 29s) [16:32:24] ^ ebernhardson check please [16:32:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:33:10] thcipriani: still a couple test logs coming in, but it dropped from 200/s to 4/s, i imagine the rest will trail off soon here [16:33:28] ebernhardson: kk, thanks! [16:34:26] ok, from the looks of it, the one I had up for swat sync'd last night: https://gerrit.wikimedia.org/r/#/c/257788 [16:35:35] yurik: graph patches, did these get reverted? If so can we revert revert? [16:36:00] thcipriani: I think they were pushed last night by RoanKattouw_away. [16:36:38] PROBLEM - puppet last run on rdb2004 is CRITICAL: CRITICAL: puppet fail [16:36:49] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: puppet fail [16:37:38] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:38:05] thcipriani: Could you throw in https://gerrit.wikimedia.org/r/#/c/255129/ (config change)? [16:38:07] PROBLEM - puppet last run on mw2122 is CRITICAL: CRITICAL: Puppet has 2 failures [16:38:17] hmm, yeah, spot-check confirms [16:38:27] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [16:38:40] * James_F slyly attempts to trade one new patch for five done ones. :-) [16:38:56] James_F: heh, sure [16:39:05] jzerebecki: did your patch also get out last night? [16:39:54] thcipriani: the submodule update worked fine. but is that what you asked? 
[16:39:58] (03CR) 10Mobrovac: "https://puppet-compiler.wmflabs.org/1465/ shows no functional differences for all of the affected nodes - restbase1001, aqs1001, sca1001 a" [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) (owner: 10Mobrovac) [16:40:33] jzerebecki: you still need it sync'd out this morning? (confirming since some other patches were sync'd already) [16:41:01] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [16:41:18] thcipriani: no I have no unsynced but already merged patches. I have 1 unmerged for this SWAT. [16:41:50] (03Merged) 10jenkins-bot: Add rights to CU+OS groups on en.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255129 (https://phabricator.wikimedia.org/T119446) (owner: 10Mdann52) [16:42:23] jzerebecki: sorry for the confusion, I was looking at the wrong patch set :\ [16:44:16] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add rights to CU+OS groups on en.wikipedia.org [[gerrit:255129]] (duration: 00m 29s) [16:44:19] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:44:33] ^ James_F check please [16:44:58] thcipriani: Yup, working. 
[16:45:00] (03PS1) 10GWicke: Add a robots.txt to disallow indexing of API content [puppet] - 10https://gerrit.wikimedia.org/r/257922 (https://phabricator.wikimedia.org/T119786) [16:46:08] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:48:50] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:49:00] 6operations, 10ops-codfw, 5Patch-For-Review: power off Codfw-Cisco Servers - https://phabricator.wikimedia.org/T115372#1865790 (10Papaul) Wipe complete on all the Cisco servers in C1, There are power off and disconnected from mgmt, production switches and PUD's [16:51:18] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:57:05] jzerebecki: Wikidata is a submodule that needs a core bump, correct? [16:58:52] (03CR) 10Mobrovac: [C: 031] Add a robots.txt to disallow indexing of API content [puppet] - 10https://gerrit.wikimedia.org/r/257922 (https://phabricator.wikimedia.org/T119786) (owner: 10GWicke) [16:59:54] godog: you should start seeings statd data from mediawiki come through in fewer packets now! [17:00:29] well, from test wikis at least already.. and everything else today and tommorrow [17:00:40] do we have some way of monitoring it? [17:00:57] RECOVERY - puppet last run on rdb2004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:00:59] 6operations: Make certain that jessie-backports is disabled fleetwide. - https://phabricator.wikimedia.org/T108941#1865845 (10hashar) `::apt` now always injects `jessie-backports`. That has been done on Nov. 11th by https://gerrit.wikimedia.org/r/#/c/252202/ c3888dcbe6e78c53c3f59bbb62e221d2f0fac1d8 ``` apt:... 
[17:01:51] thcipriani: it should happen automatically in this case [17:02:27] RECOVERY - puppet last run on mw2122 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:04:22] 6operations, 6Labs, 5Patch-For-Review: Investigate whether to use Debian's jessie-backports - https://phabricator.wikimedia.org/T107507#1865865 (10faidon) [17:04:24] 6operations: Make certain that jessie-backports is disabled fleetwide. - https://phabricator.wikimedia.org/T108941#1865862 (10faidon) 5Open>3declined a:3faidon Indeed. [17:05:05] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 28s) [17:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:07:46] ori, why did that fix things? [17:07:54] did you commit a really old interwiki CDB file or something? [17:08:27] For some reason the previous version of the file was missing wikis created within the last year [17:08:31] https://phabricator.wikimedia.org/T120937 [17:08:55] (03PS4) 10Muehlenhoff: Add auth2001 partitioning entries Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257643 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:09:47] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add auth2001 partitioning entries Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257643 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:14:39] jzerebecki: kk, finally wrangled wikidata, ready for sync? [17:15:52] thcipriani: yup [17:17:08] !log thcipriani@tin Synchronized php-1.27.0-wmf.8/extensions/Wikidata: SWAT: Update Wikibase [[gerrit:257901]] (duration: 00m 49s) [17:17:10] ^ jzerebecki check please [17:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:18:26] (03PS1) 10Papaul: Add MAC entries for auth2001 Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257934 (https://phabricator.wikimedia.org/T120263) [17:20:13] thcipriani: looks good. 
[17:20:17] thx [17:20:22] jzerebecki: thank you! [17:22:49] PROBLEM - Apache HTTP on mw1125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:49] PROBLEM - HHVM rendering on mw1125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:24:28] PROBLEM - RAID on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:24:37] PROBLEM - configured eth on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:24:58] PROBLEM - nutcracker port on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:24:58] PROBLEM - dhclient process on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:25:07] PROBLEM - SSH on mw1125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:09] PROBLEM - salt-minion processes on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:25:18] PROBLEM - Check size of conntrack table on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:25:18] PROBLEM - nutcracker process on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:25:48] PROBLEM - Disk space on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:26:27] PROBLEM - DPKG on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:26:28] PROBLEM - HHVM processes on mw1125 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:29:27] RECOVERY - nutcracker process on mw1125 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [17:29:45] RECOVERY - Disk space on mw1125 is OK: DISK OK [17:30:28] RECOVERY - HHVM processes on mw1125 is OK: PROCS OK: 6 processes with command name hhvm [17:30:28] RECOVERY - DPKG on mw1125 is OK: All packages OK [17:30:37] RECOVERY - RAID on mw1125 is OK: OK: no RAID installed [17:30:47] RECOVERY - configured eth on mw1125 is OK: OK - interfaces up [17:31:08] RECOVERY - nutcracker port on mw1125 is OK: TCP OK - 0.000 second response time on port 11212 [17:31:09] RECOVERY - dhclient process on mw1125 is OK: PROCS OK: 0 processes with command name dhclient [17:31:09] RECOVERY - SSH on mw1125 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [17:31:19] RECOVERY - salt-minion processes on mw1125 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [17:31:29] RECOVERY - Check size of conntrack table on mw1125 is OK: OK: nf_conntrack is 0 % full [17:31:39] !log restarted hhvm on mw1125 [17:31:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:31:50] !log restarted hhvm on mw1125, memory starvation [17:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:32:58] RECOVERY - Apache HTTP on mw1125 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.171 second response time [17:33:08] RECOVERY - HHVM rendering on mw1125 is OK: HTTP OK: HTTP/1.1 200 OK - 63976 bytes in 1.415 second response time [17:37:47] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me, will merge later on." [puppet] - 10https://gerrit.wikimedia.org/r/257934 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:37:58] bd808: I can't get into bd808-vagrant.wikimania-support.eqiad.wmflabs. [17:38:00] is this important? 
[17:38:49] it probably has a messed up self-hosted puppetmaster [17:39:25] could you take a look I just want to get ldap in a good state [17:39:44] If you can't get in I probably can't either, but I'll try [17:40:58] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [17:41:17] (03PS1) 10Alexandros Kosiaris: etherpad: Drop trustProxy, use log_x_client_ip [puppet] - 10https://gerrit.wikimedia.org/r/257938 [17:45:44] (03PS1) 10Alexandros Kosiaris: Add a simple patch to log X-Client-IP instead of XFF [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/257942 [17:45:51] AaronSchulz: anything change with the job queue yesterday? all writes to elasticsearch seem to have stopped ~19 hours ago [17:46:51] (03PS7) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [17:46:53] (03PS2) 10BBlack: text VCL: move layer-common vcl_hash to text-common [puppet] - 10https://gerrit.wikimedia.org/r/257917 [17:47:50] bblack: doesn't that break zero? [17:47:56] oh no it doesn't [17:48:04] because mobile varies x-subdomain [17:48:58] because mobilefrontend* [17:49:02] ok, got it [17:49:22] well technically it's the same appservers, it's kinda messed up [17:49:23] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add a simple patch to log X-Client-IP instead of XFF [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/257942 (owner: 10Alexandros Kosiaris) [17:49:36] (03CR) 10Alexandros Kosiaris: [C: 032] etherpad: Drop trustProxy, use log_x_client_ip [puppet] - 10https://gerrit.wikimedia.org/r/257938 (owner: 10Alexandros Kosiaris) [17:49:45] gwicke: subbu who can I talk to about getting puppet fixed up on these vm's [17:49:48] 1 citoid.services.eqiad.wmflabs. [17:49:48] 1 services-deploy.services.eqiad.wmflabs. [17:49:48] 1 st-mathoid01.services-testbed.eqiad.wmflabs. 
[17:49:50] 2 st-puppetmaster.services-testbed.eqiad.wmflabs. [17:49:52] 2 st-restbase01.services-testbed.eqiad.wmflabs. [17:49:54] 2 st-tin.services-testbed.eqiad.wmflabs. [17:49:56] paravoid: I still think we need to better-fix that problem before I can merge [17:50:25] mobrovac, ^ [17:50:33] right now if varnish sends X-Subdomain, MediaWiki sends back a response that varies on X-Subdomain, but if varnish doesn't send it, mediawiki doesn't vary on it. Which "works" for separate cache pools, but it's technically wrong [17:50:46] and it will break my current patch, due to inconsistent vary into the same cache cluster [17:50:55] (well, will break my current patch if we sent mobile requests to text-cluster) [17:51:35] (which I totally plan to test once that's merged, and even testing them would pollute the cache and break things) [17:51:35] isn't that why you did the hash_data thing? [17:51:54] the hash_data thing as it's implemented so far just splits desktop-v-mobile, not M-v-Zero [17:52:26] but it's still going to cause issues that MW sends inconsistent Vary I think? [17:52:29] chasemp: most of the services-testbed ones can actually be removed afaik [17:52:30] it splits desktop-v-mobile and then Vary splits M-v-Zero, isn't it? [17:52:43] mobrovac: can you specify which should not? [17:52:50] that was my question which I then answered myself like that :) [17:52:55] mobrovac: I can do a mass cleaning and that still leaves a few [17:52:58] what do you need MediaWiki to do? [17:53:02] paravoid: I'm not sure... [17:53:28] paravoid: technically MW will still be sending responses for the same URL (but not same vcl_hash) with a lack of Vary: X-Subdomain, which we hash-bin differently, but... 
[17:53:34] inconsistent Vary scares me [17:53:38] chasemp: actually all of them if thcipriani confirms he doesn't need them (most of them were created by him) [17:53:41] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add MAC entries for auth2001 Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257934 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:53:48] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [17:53:49] (03PS2) 10Muehlenhoff: Add MAC entries for auth2001 Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257934 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:53:53] chasemp: s/all of them/all of them in services-testbed/ [17:54:07] (03CR) 10Muehlenhoff: [V: 032] Add MAC entries for auth2001 Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/257934 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [17:54:32] the right answer is MW should be sending Vary: X-Subdomain regardless of whether it sees an X-Subdomain request header to tell it it's a "cache_mobile" request [17:54:42] mobrovac: k [17:54:48] bblack: ok, that's easy [17:55:02] 1 citoid.services.eqiad.wmflabs. [17:55:02] 1 services-deploy.services.eqiad.wmflabs. [17:55:06] mobrovac: that leaves ^ [17:55:27] (or rephrased as a general rule: MW should *always* output the same Vary header for the same Host+URL, regardless of anything) [17:55:56] bblack: why? [17:57:25] chasemp: for citoid, i have to check if it's used at all, and services-deploy i have no idea who created it nor why [17:57:32] the X-Subdomain header handling is entirely outside of Core, and straddles operations/mediawiki-config and several mobile extensions [17:57:37] chasemp: do you have some logs/something as to why they are failing? 
[17:58:18] it's a bit of a mess [17:58:19] https://github.com/search?q=%40wikimedia+x-subdomain&type=Code&utf8=%E2%9C%93 [17:58:19] if hash_data works, then I don't think we need it [17:58:37] mobrovac: services-deploy.services.eqiad.wmflabs I can't get into at all [17:59:59] 6operations, 10ops-eqiad, 6Discovery, 10Wikidata, 10Wikidata-Query-Service: install two Intel 320 Series SSDSA2CW300G3 2.5" 300GB each in wdqs1001/wdqs1002 - https://phabricator.wikimedia.org/T120712#1866075 (10Cmjohnson) 2 disks were added to each server [18:00:00] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 10hardware-requests: Additional diskspace of wdqs1001/wdqs1002 - https://phabricator.wikimedia.org/T119579#1866078 (10Cmjohnson) [18:00:00] 6operations, 10ops-eqiad, 6Discovery, 10Wikidata, 10Wikidata-Query-Service: install two Intel 320 Series SSDSA2CW300G3 2.5" 300GB each in wdqs1001/wdqs1002 - https://phabricator.wikimedia.org/T120712#1866077 (10Cmjohnson) 5Open>3Resolved [18:00:01] chasemp: hm, me neither [18:00:01] paravoid: yeah but hash_data is a hack, if we're talking standards, Vary should be consistent [18:00:01] chasemp: i guess that makes it a prime candidate for removal [18:00:02] mobrovac: ok fair :) [18:00:02] mobilefrontend is a hack :) [18:00:02] (03CR) 10Eevans: [C: 031] Add a robots.txt to disallow indexing of API content [puppet] - 10https://gerrit.wikimedia.org/r/257922 (https://phabricator.wikimedia.org/T119786) (owner: 10GWicke) [18:00:04] ori: if an origin sends two different Vary headers for the same URI (which is the only variance otherwise, in standard terms without vcl_hash hacks), there's no right way to handle that and/or it's a messy race [18:00:55] on one request it says "cache this for anyone that asks for this URI", and on another request for the same URI it says "cache this varied on this request header", but then on a subsequent request varied on the header it says again "cache this for all requests" [18:02:20] yeah, that makes sense 
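The inconsistent-Vary race described above can be sketched with a toy cache. This is a minimal illustration in Python, not Varnish's or MediaWiki's actual behaviour: `ToyCache`, `fetch`, `inconsistent_origin` and `consistent_origin` are hypothetical names, and the lookup rule (walk the variants stored for a URL and return the first one whose Vary-listed request headers match) is an assumed simplification.

```python
class ToyCache:
    """Toy HTTP cache: one list of stored variants per URL (assumed model)."""

    def __init__(self):
        # url -> list of (vary_header_names, header_values_at_store_time, body)
        self._store = {}

    def lookup(self, url, req_headers):
        # Return the first variant whose Vary-listed headers match the request.
        for vary_names, stored_values, body in self._store.get(url, []):
            if all(req_headers.get(n) == stored_values.get(n) for n in vary_names):
                return body
        return None

    def insert(self, url, req_headers, vary_names, body):
        # Remember the request's values for exactly the headers named in Vary.
        stored_values = {n: req_headers.get(n) for n in vary_names}
        self._store.setdefault(url, []).append(
            (tuple(vary_names), stored_values, body)
        )


def _body_for(req_headers):
    sub = req_headers.get("X-Subdomain")
    return "desktop page" if sub is None else f"mobile page ({sub})"


def inconsistent_origin(req_headers):
    """Hypothetical origin: sends Vary only when it sees X-Subdomain."""
    vary = ("X-Subdomain",) if "X-Subdomain" in req_headers else ()
    return vary, _body_for(req_headers)


def consistent_origin(req_headers):
    """Hypothetical origin: always sends Vary: X-Subdomain for this URL."""
    return ("X-Subdomain",), _body_for(req_headers)


def fetch(cache, url, req_headers, origin):
    """Serve from cache if possible, otherwise ask the origin and store."""
    body = cache.lookup(url, req_headers)
    if body is None:
        vary_names, body = origin(req_headers)
        cache.insert(url, req_headers, vary_names, body)
    return body


if __name__ == "__main__":
    for origin in (inconsistent_origin, consistent_origin):
        cache = ToyCache()
        desktop = fetch(cache, "/wiki/Foo", {}, origin)
        mobile = fetch(cache, "/wiki/Foo", {"X-Subdomain": "M"}, origin)
        print(origin.__name__, "| desktop:", desktop, "| mobile:", mobile)
```

In this model, an un-varied response stored first matches every later request for the same URL, so the mobile request is answered with the desktop page. When the origin always names X-Subdomain in Vary, the two stay separate regardless of request order.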
[18:03:16] 6operations, 10ops-eqiad, 6Labs, 10Labs-Infrastructure: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1866123 (10Cmjohnson) @coren: we no longer have the spar h800 we used in labstore1001 last month. [18:03:34] variant handling in mediawiki is tim's baby, so it's almost certainly basically correct at a basic level, but it may be getting misused [18:04:07] well it's just having MobileFrontend extension inject a different Vary on top of core's Vary, but only get invoked to do that based on certain incoming headers, that kinda screws it up [18:04:13] at least from my black-box perspective [18:04:25] yeah, that makes total sense [18:04:28] ebernhardson: no config change that I know of [18:05:25] probably MFE shouldn't muck with Vary per-request, but instead should tell core to add a new Vary header globally, or something [18:05:46] (or maybe just its docs say to update your global config to add that new global Vary) [18:06:01] yes [18:06:45] anyways [18:07:18] paravoid: I think maybe you're right that my current version of the patch, with the extra hash_data("Mobile-Subdomain") in there on the text cluster, makes it a non-issue. [18:07:26] but it still kinda scares me, I need to read and think first to be sure [18:08:11] but I guess hash_data() has to happen before Varying does [18:08:24] hash_data happens on the request path, doesn't it? 
[18:08:35] (while vary happens on the response path) [18:10:16] yeah vcl_hash is right after vcl_recv, if it's a (lookup) [18:10:27] vary happens on response -> store in cache [18:15:40] I guess the part that's always been unclear to me, is whether on response with Vary varnish stuffs the Vary info into the top-level metadata of the hash entry and then stores multiple hash objects in slots on the Vary beneath [18:15:42] !log inserted decryption key into database for enWP Arbcom case [18:16:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:16:15] or whether it's always got vary slots, and on response with vary it just stuffs that one response into the vary list with "if this header == what I saw in the request for this response" [18:16:37] maybe that was a horrible way of saying that heh [18:16:46] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:17:04] uh oh =[ [18:17:07] I don't know if the response-Vary is hash-entry-level metadata for all variants, or just sets conditions based on Vary + this response's request headers on one of the sub-objects [18:17:32] is tools home down expected? [18:18:04] andrewbogott ^ Coren ^ YuviPanda ? 
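[annotation] The hash-versus-Vary question above can be made concrete with a toy model. The sketch below assumes the second of the two models described in the discussion: each stored response carries its own conditions, built from the Vary header and the request that produced it. The class and method names are invented for illustration and this is not Varnish code; the real internals may differ. `x-subdomain` is the header mentioned earlier in the log.

```python
# Toy model of a Vary-aware cache. Illustrative only: the class and
# method names are invented here and this is not Varnish source code.

class ToyCache:
    def __init__(self):
        # hash key -> list of (conditions, body) variants stored under that key
        self.slots = {}

    def store(self, key, request_headers, vary, body):
        """Store a response, remembering the request's value for each Vary header."""
        conditions = {name.lower(): request_headers.get(name.lower()) for name in vary}
        self.slots.setdefault(key, []).append((conditions, body))

    def lookup(self, key, request_headers):
        """Return the first stored variant whose conditions match this request."""
        for conditions, body in self.slots.get(key, []):
            if all(request_headers.get(name) == value for name, value in conditions.items()):
                return body
        return None


desktop = {}                      # request without an X-Subdomain header
mobile = {"x-subdomain": "M"}     # header names are assumed to be lowercased

# Case 1: the origin omits Vary on the desktop response.
inconsistent = ToyCache()
inconsistent.store("/wiki/Foo", desktop, [], "desktop page")
leaked = inconsistent.lookup("/wiki/Foo", mobile)    # "desktop page": no conditions to fail

# Case 2: the origin always sends Vary: X-Subdomain.
consistent = ToyCache()
consistent.store("/wiki/Foo", desktop, ["X-Subdomain"], "desktop page")
miss = consistent.lookup("/wiki/Foo", mobile)        # None: stored variant needs the header absent
consistent.store("/wiki/Foo", mobile, ["X-Subdomain"], "mobile page")
hit = consistent.lookup("/wiki/Foo", mobile)         # "mobile page"

# Case 3: putting the header value into the hash key (the role hash_data
# plays in the discussion) separates the entries even without Vary.
hashed = ToyCache()
hashed.store(("/wiki/Foo", None), desktop, [], "desktop page")
separate = hashed.lookup(("/wiki/Foo", "M"), mobile)  # None: different hash key
```

Case 1 is the "cache this for anyone that asks for this URI" failure described at 18:00:55; case 3 is why an extra hash_data() call on the request path might make the inconsistent Vary a non-issue, under this model's assumptions.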
[18:18:23] clearly no napping for me [18:18:25] * YuviPanda looks [18:18:34] is back up [18:18:36] was a 'blip' [18:18:38] I wonder why [18:18:49] robh: something ugly is happening in labs, where whenever a new instance is created everything freezes for a few seconds [18:18:54] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 953392 bytes in 4.853 second response time [18:18:58] presumably ldap is timing out for someone [18:19:00] ouch [18:19:03] but we haven’t identified anything specific [18:19:06] that sounds horribly painful to debug [18:19:40] well, my debug plan involves moritzm saying “Oh, here it is!” and fixing it [18:19:46] so, not painful for me :) [18:19:50] except I bet moritz is done for the day [18:20:38] how would that be an ldap failure? [18:20:48] how long does that check run when we do it manually? [18:20:51] takes a look [18:21:05] (03PS1) 10Yuvipanda: quarry: Don't install python-pymqsl [puppet] - 10https://gerrit.wikimedia.org/r/257953 [18:22:18] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:22:20] (03PS2) 10Yuvipanda: quarry: Don't install python-pymqsl [puppet] - 10https://gerrit.wikimedia.org/r/257953 [18:22:41] paravoid: Not sure how, but ldap times out and that affects pretty much everything - external dns, getent() processing, etc. The tool labs landing page in particular because it looks up groups in ldap() to enumerate maintainers of tools. [18:22:58] ldap times out? [18:23:00] (03CR) 10Yuvipanda: [C: 032 V: 032] quarry: Don't install python-pymqsl [puppet] - 10https://gerrit.wikimedia.org/r/257953 (owner: 10Yuvipanda) [18:23:01] ldap* no parens. [18:23:08] when did that happen? [18:23:32] paravoid: Well, we haven't seen it time out, we're seeing things that rely on it get bursts of slow responses. [18:23:44] Although I think andrew actually caught it in the act. 
[18:23:47] Lemme check backlog [18:23:59] I don’t think I caught it really. [18:24:11] I’m blaming ldap mostly because it’s pervasive, and because of proximity to the problem [18:24:56] it's just checking http on tools.wmflabs.org [18:25:23] mutante: Specifically, it's checking that 'Magnus' appears in the content. [18:25:32] it takes about 5 seconds when i ran it manually now [18:25:33] (03PS1) 10Jcrespo: Reconfiguration of External Storage servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/257954 [18:25:47] i think over 30 seconds icinga would call it timeout [18:26:05] The check is for 5s iirc. [18:26:23] YuviPanda: ^^ am I remembering right? [18:26:37] real 0m8.085s [18:26:41] it's the default timeout [18:26:43] not 5s [18:26:50] should be 30s afair [18:26:55] it's 8 and still OK [18:27:01] yeah, if check_http is 30 then it should be too [18:27:07] Ah, no, the message says "Socket timeout after 10 seconds" [18:27:11] So 10s [18:27:12] :-) [18:27:22] ok, that's closer then, yea [18:28:03] it's changing between 4 and 8 when repeated [18:28:21] That page has almost 1M now - I really need to move the list of tools out of the landing page. [18:28:53] Especially since it's not lightweight to fetch or generate. [18:28:54] "this is the tool of the day"... [see all] [18:30:12] Having the list of tools there was cute when there were dozens. Now... not so much. [18:30:14] (03CR) 10MarcoAurelio: [C: 04-1] "Per my last review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239854 (https://phabricator.wikimedia.org/T111898) (owner: 10Mdann52) [18:30:53] hmm, needs pagination i guess, or some A-Z index [18:31:23] or we could organize them in wiki and just link from there [18:31:29] can someone check /var/log/mediawiki/jobrunner.log on mw1003 seeing ton of 503 Service Unavailable, dunno if it's normal [18:32:35] ori: hmm, so redis::instance seems borked on quarry. [18:32:40] ori: constantly restarting.. 
[18:35:42] Coren, YuviPanda, I just created four new instances at once in an attempt to cause a freeze… no dice [18:35:43] ori: fun, service redis-server stop doesn't actually seem to... do anything [18:35:43] andrewbogott: Yeah, I'm pretty sure the issue wasn't instance creation directly. [18:35:43] andrewbogott: ugh :| [18:35:44] I wonder if it's ganeti related [18:35:44] yeah, was just hoping that if I pushed my luck it would reproduce [18:35:44] (03CR) 10Alexandros Kosiaris: [C: 032] Have misc-web talk directly to etherpad-lite [puppet] - 10https://gerrit.wikimedia.org/r/255406 (owner: 10Alexandros Kosiaris) [18:35:44] (03PS3) 10Alexandros Kosiaris: Have misc-web talk directly to etherpad-lite [puppet] - 10https://gerrit.wikimedia.org/r/255406 [18:35:44] (03CR) 10Alexandros Kosiaris: [V: 032] Have misc-web talk directly to etherpad-lite [puppet] - 10https://gerrit.wikimedia.org/r/255406 (owner: 10Alexandros Kosiaris) [18:35:44] robh: hmm, ok to recycle element names for servers? like i need a new one and i'd take "technetium" because it doesn't show up anywhere. but git log told me in was long ago used for what then became payments1001 [18:35:47] YuviPanda: merged yours as well [18:35:47] mutante: if you can see it became payments then its even easier to say yes [18:35:47] akosiaris: bah, thanks. [18:35:48] but indeed, we will reuse element names if they are confirmed gone [18:35:48] and if you see its called somehting else, thats a confirmation [18:35:48] ganeti vm? [18:35:49] robh: ok, thanks! and yes, ganeti VM [18:35:58] I wonder if we're going to end up having more of those than elements eventually... 
[18:36:02] stars less likely ;] [18:36:13] (03PS8) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [18:36:15] (03PS3) 10BBlack: text VCL: move layer-common vcl_hash to text-common [puppet] - 10https://gerrit.wikimedia.org/r/257917 [18:36:17] (03PS2) 10BBlack: VCL: move geoip code to text VCL (only consumer) [puppet] - 10https://gerrit.wikimedia.org/r/257918 [18:36:22] this one will also be removed again after a while [18:36:32] then the name can go back "in the pool" [18:36:55] but i was also thinking that, there arent that many elements [18:37:19] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures [18:37:19] yea, i think we should have split ganeti and physical misc hostnames [18:37:25] but i dunno what to suggest for them so meh [18:37:47] though i think it would be cool to know its a ganeti vm in the hsotname ;] [18:37:51] g-whateves [18:38:25] (03PS1) 10Yuvipanda: wdq_mm: Kill 'labs' part in lb role [puppet] - 10https://gerrit.wikimedia.org/r/257958 [18:38:42] i tend to agree a different scheme for virtual machines from the beginning ,, but now let's worry about that when we actually run out [18:38:48] (03PS2) 10Yuvipanda: wdq_mm: Kill 'labs' part in lb role [puppet] - 10https://gerrit.wikimedia.org/r/257958 [18:39:18] (03CR) 10Yuvipanda: [C: 032 V: 032] wdq_mm: Kill 'labs' part in lb role [puppet] - 10https://gerrit.wikimedia.org/r/257958 (owner: 10Yuvipanda) [18:41:03] (03PS9) 10BBlack: text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) [18:41:11] (03CR) 10BBlack: [C: 032 V: 032] text VCL: remove hiera mobile/text conditionals [puppet] - 10https://gerrit.wikimedia.org/r/257774 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [18:41:22] dcausse: i'm looking. does not seem to be normal. 
mw1004 is different [18:41:38] mutante: thanks [18:41:54] YuviPanda: ok to merge? [18:42:08] bblack: yup! [18:42:48] so it complains about 503 when connecting to local port 9005 and that is apache [18:45:24] mutante: yes, all cirrus writes to elastic have been stopped for a few hours and we don't know why, wondering if it's related [18:45:40] (03CR) 10Jhobs: [C: 031] Enable RelatedArticles and Cards on Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255553 (https://phabricator.wikimedia.org/T116676) (owner: 10Bmansurov) [18:49:20] jobqueue-access log shows me there are many new jobs coming in, looks faster than mw1004 but not a specific pattern [18:49:38] jobqueue-error logs though: "failed to make connection to backend" [18:51:24] AaronSchulz: eh, hi, are you around ^? [18:52:13] (03PS4) 10BBlack: text VCL: move layer-common vcl_hash to text-common [puppet] - 10https://gerrit.wikimedia.org/r/257917 [18:52:24] is that one just at capacity (mw1003), mw1004 should be the same but doesnt have these errors [18:52:27] (03CR) 10BBlack: [C: 032 V: 032] text VCL: move layer-common vcl_hash to text-common [puppet] - 10https://gerrit.wikimedia.org/r/257917 (owner: 10BBlack) [18:52:37] (03PS3) 10BBlack: VCL: move geoip code to text VCL (only consumer) [puppet] - 10https://gerrit.wikimedia.org/r/257918 [18:52:43] (03CR) 10BBlack: [C: 032 V: 032] VCL: move geoip code to text VCL (only consumer) [puppet] - 10https://gerrit.wikimedia.org/r/257918 (owner: 10BBlack) [18:53:31] (03PS1) 10GWicke: Fix the api URI template for labs [puppet] - 10https://gerrit.wikimedia.org/r/257965 [18:55:46] aah..
[hhvm] [18:56:16] dcausse: looks fixed by restarting hhvm service [18:56:21] check the log [18:56:44] (03CR) 10Krinkle: [C: 04-1] toollabs: migrate to redis::instance [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [18:56:46] !log mw1003 - restarted hhvm (and apache a bit earlier too) [18:56:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:56:53] mutante: thanks! will check if our problem is resolved [18:57:24] dcausse: ok, cool [19:00:04] thcipriani: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151209T1900). Please do the needful. [19:00:39] starting promote of .8 to group1 [19:02:11] !log ebernhardson@tin Synchronized php-1.27.0-wmf.7/extensions/CirrusSearch/includes/Job/Job.php: Revert manual adjustment made to CirrusSearch Job.php (duration: 00m 29s) [19:02:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:04:22] (03PS1) 10Thcipriani: group1 wikis to 1.27.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257967 [19:07:18] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [19:08:55] (03CR) 10Thcipriani: [C: 032] "Train deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257967 (owner: 10Thcipriani) [19:09:17] (03Merged) 10jenkins-bot: group1 wikis to 1.27.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257967 (owner: 10Thcipriani) [19:10:16] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.8 [19:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:13:53] hmm, just realized cawiki and hewiki didn't get updated like they should have today.
[19:15:17] thcipriani: they wouldn't be in group1 [19:15:19] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:15:32] bd808: they were just added to group1 this morning (should have been) [19:15:32] group1 is all *except* wikipedias [19:17:26] context: https://phabricator.wikimedia.org/T115002 and https://gerrit.wikimedia.org/r/#/c/253371/ [19:17:29] 6operations, 10Salt: salt minions need 'wake up' test.ping after idle period before they respond properly to comands - https://phabricator.wikimedia.org/T120831#1866444 (10ArielGlenn) The next category of problem looks like this: Master is running, minions are running On neodymium, I stop the new master and r... [19:19:57] thcipriani: neat. Looks like the script that bumps wikiversions need to be debugged [19:20:24] bd808: indeed. Looking at that now. [19:20:28] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:58] thcipriani: looks like things don't work like that change expects. The first %% line stops all other processing [19:25:19] yeah, saw the break in the foreach :\ [19:25:36] I think that could be ripped out actually [19:25:51] I like the idea of what you are trying to do there [19:26:06] my change was totally untested :) [19:26:23] the alternative would be to add in a new "+ something" list [19:27:48] I would be worried about ripping out that piece of logic. There may be something that depends on the break, but it is certainly un-intuitive in this instance. [19:28:58] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [19:29:48] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1866486 (10Luke081515) Hello, can you add me to this group? 
There are repos, which needs projects (for extensions, etc., see T120915), so this... [19:29:54] (03PS1) 10Dzahn: dhcp: fix config syntax error [puppet] - 10https://gerrit.wikimedia.org/r/257971 [19:30:44] (03CR) 10Dzahn: [C: 032 V: 032] dhcp: fix config syntax error [puppet] - 10https://gerrit.wikimedia.org/r/257971 (owner: 10Dzahn) [19:33:01] kaldari: Regarding that question about the inserts to your new table from editing WPBannerMeta, am I correct in thinking it's still the case that the jobs would be the LinksUpdate jobs that come with any template edit? Or did you change the plan in some manner since the SoS? [19:33:51] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:35:13] anomie: they would be separate jobs generated by the parser function. The parser function doesn't actually effect the output of the page, it just records the data from the various assessment templates in the database [19:36:07] but they would all be triggered by the re-parsing, which I assume is itself triggered by the LinksUpdate job(?) [19:37:33] kaldari: Ok, so more like the cirrusSearchLinksUpdate job: a separate job that gets triggered by LinksUpdate. [19:37:46] o/ [19:37:53] Who can I ask about adding me to the "Triagers" project on Phab? [19:37:54] (03PS16) 10Krinkle: contint: install npm/grunt-cli with npm [puppet] - 10https://gerrit.wikimedia.org/r/244748 (https://phabricator.wikimedia.org/T113903) (owner: 10Hashar) [19:38:02] yes, I think that's right although I'm not familiar with cirrusSearchLinksUpdate [19:38:02] (03CR) 10Krinkle: [C: 031] contint: install npm/grunt-cli with npm [puppet] - 10https://gerrit.wikimedia.org/r/244748 (https://phabricator.wikimedia.org/T113903) (owner: 10Hashar) [19:38:35] Could someone merge https://gerrit.wikimedia.org/r/#/c/244748/ ? it's already live on the relevant puppetmaster for CI and without that patch CI would break. 
[19:38:38] (03PS1) 10Thcipriani: Add hewiki and cawiki to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257973 [19:39:43] (03PS1) 10Ottomata: Upstream release of 2.1.0 [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/257974 [19:40:05] anomie: jcrespo mentioned that the inserts might fail if the flood of insert jobs causes that particular table to lock (it's a new table), so he suggests that we try the inserts multiple times, but I have no idea how that would work. Do you know how we would do that or is that something that would automatically by handled by the job queue? [19:40:07] dbrant: Anybody on the members list at https://phabricator.wikimedia.org/project/profile/5/ can help you [19:40:52] bd808: thx! [19:40:53] (03CR) 10Ori.livneh: toollabs: migrate to redis::instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [19:41:15] jynus: ^ [19:41:17] kaldari: Really, I'm not sure. I'd have to look at existing jobs to see how they do it. [19:41:18] (03CR) 10BryanDavis: [C: 031] Add hewiki and cawiki to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257973 (owner: 10Thcipriani) [19:42:05] (03CR) 10Thcipriani: [C: 032] Add hewiki and cawiki to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257973 (owner: 10Thcipriani) [19:42:34] (03Merged) 10jenkins-bot: Add hewiki and cawiki to group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257973 (owner: 10Thcipriani) [19:42:53] (03PS3) 10Ori.livneh: toollabs: migrate to redis::instance [puppet] - 10https://gerrit.wikimedia.org/r/257534 [19:43:02] dbrant: i'll do it, just a sec [19:45:01] dbrant: done [19:45:08] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [19:45:15] ori: much obliged!! thanks [19:45:18] !log thcipriani@tin Synchronized dblists/group1-wikipedia.dblist: Train: Add hewiki and cawiki to group1. 
Part I [[gerrit:257973]] (duration: 00m 33s) [19:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:45:59] dbrant: congrats again on the 'best apps of 2015' thing -- that's really impressive. [19:46:05] !log thcipriani@tin Synchronized dblists/group1.dblist: Train: Add hewiki and cawiki to group1. Part II [[gerrit:257973]] (duration: 00m 29s) [19:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:46:29] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:48:47] (03CR) 10Mobrovac: [C: 031] Fix the api URI template for labs [puppet] - 10https://gerrit.wikimedia.org/r/257965 (owner: 10GWicke) [19:48:59] (03PS1) 10Dzahn: introduce technetium.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) [19:49:17] (03PS1) 10Krinkle: contint: Change redis memory from 128Gb back to 128Mb [puppet] - 10https://gerrit.wikimedia.org/r/257976 [19:49:28] (03CR) 10Krinkle: role::ci::slave::browsertests: migrate to redis::instance (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/257043 (owner: 10Ori.livneh) [19:49:41] (03CR) 10Ori.livneh: [C: 032 V: 032] "yikes. thanks." [puppet] - 10https://gerrit.wikimedia.org/r/257976 (owner: 10Krinkle) [19:49:42] ori: thanks! it was a bit unexpected but... i'll take it! 
[19:50:01] (03PS2) 10Dzahn: Add a robots.txt to disallow indexing of API content [puppet] - 10https://gerrit.wikimedia.org/r/257922 (https://phabricator.wikimedia.org/T119786) (owner: 10GWicke) [19:51:54] (03PS1) 10Thcipriani: hewiki and cawiki to php-1.27.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257977 [19:52:16] (03CR) 10Dzahn: [C: 032] Add a robots.txt to disallow indexing of API content [puppet] - 10https://gerrit.wikimedia.org/r/257922 (https://phabricator.wikimedia.org/T119786) (owner: 10GWicke) [19:53:01] (03PS2) 10Dzahn: Fix the api URI template for labs [puppet] - 10https://gerrit.wikimedia.org/r/257965 (owner: 10GWicke) [19:53:24] ori: i merged that on master (redis 128GB :) [19:53:36] (03CR) 10Dzahn: [C: 032] Fix the api URI template for labs [puppet] - 10https://gerrit.wikimedia.org/r/257965 (owner: 10GWicke) [19:53:47] (03CR) 10Thcipriani: [C: 032] "Train deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257977 (owner: 10Thcipriani) [19:54:09] (03Merged) 10jenkins-bot: hewiki and cawiki to php-1.27.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257977 (owner: 10Thcipriani) [19:54:48] mutante: thanks [19:56:18] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0] [19:56:24] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: hewiki and cawiki to php-1.27.0-wmf.8 [19:56:27] (03PS3) 10Krinkle: zuul: move roles into role module [puppet] - 10https://gerrit.wikimedia.org/r/257039 (owner: 10Dzahn) [19:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:57:32] (03CR) 10Dzahn: [C: 032] lint fixes [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/256884 (owner: 10Dzahn) [19:57:39] Now cawiki and hewiki are on the right versions. train is complete. [19:57:50] whole train [19:59:36] mutante: indeed. Wednesday and Thursday just wikiversion updates. 
All the l10n updates, branch cutting, etc, etc are done Tuesday. [19:59:57] (03PS4) 10Krinkle: zuul: move roles into role module [puppet] - 10https://gerrit.wikimedia.org/r/257039 (owner: 10Dzahn) [20:00:13] (03CR) 10Krinkle: [C: 031] zuul: move roles into role module [puppet] - 10https://gerrit.wikimedia.org/r/257039 (owner: 10Dzahn) [20:00:51] ori: Could you merge https://gerrit.wikimedia.org/r/#/c/244748/ as well? [20:01:02] has been deployed for a month in ci already [20:02:00] (03PS17) 10Ori.livneh: contint: install npm/grunt-cli with npm [puppet] - 10https://gerrit.wikimedia.org/r/244748 (https://phabricator.wikimedia.org/T113903) (owner: 10Hashar) [20:02:06] (03CR) 10Ori.livneh: [C: 032 V: 032] contint: install npm/grunt-cli with npm [puppet] - 10https://gerrit.wikimedia.org/r/244748 (https://phabricator.wikimedia.org/T113903) (owner: 10Hashar) [20:02:27] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [5000000.0] [20:02:29] thcipriani: :) ah! [20:10:12] Thanks [20:10:18] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 1.00% above the threshold [1000000.0] [20:10:37] (03PS1) 10Dzahn: wikimetrics: submodule update [puppet] - 10https://gerrit.wikimedia.org/r/257979 [20:12:58] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [20:13:38] (03PS2) 10Dzahn: wikimetrics: submodule update [puppet] - 10https://gerrit.wikimedia.org/r/257979 [20:15:44] (03CR) 10Dzahn: [C: 032] wikimetrics: submodule update [puppet] - 10https://gerrit.wikimedia.org/r/257979 (owner: 10Dzahn) [20:23:59] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1866696 (10Aklapper) >>! In T706#1866486, @Luke081515 wrote: > Hello, can you add me to this group? 
There are repos, which needs projects (for... [20:26:40] 6operations, 6Commons, 10MediaWiki-File-management, 6Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#1866719 (10BurritoBazooka) Seems to be behaving correctly for me now. Should probably wait a bit more before closing? Thank y... [20:26:59] (03PS1) 10BBlack: varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) [20:28:39] (03PS2) 10BBlack: varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) [20:30:22] 6operations, 6Commons, 10MediaWiki-File-management, 6Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#1866746 (10BBlack) We shouldn't close at all, as these fixes are just workarounds and don't get to the heart of the issue, but... [20:30:35] 6operations, 6Commons, 10MediaWiki-File-management, 6Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#1866747 (10BBlack) p:5Unbreak!>3High [20:40:20] (03CR) 10BBlack: [C: 031] "tested with some inputs, but https://xkcd.com/1171/" [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) (owner: 10BBlack) [20:42:26] (03PS1) 10Dzahn: openstack: fix quoting in nova role [puppet] - 10https://gerrit.wikimedia.org/r/258013 [20:45:46] (03CR) 10Paladox: "@PleaseStand do you mean something like ";" or if not how would it go." [puppet] - 10https://gerrit.wikimedia.org/r/257193 (owner: 10Paladox) [20:47:37] YuviPanda: did you get any further in figuring out why stopping the redis-server service did not work? [20:50:29] thcipriani, how's the train this time? 
[20:50:58] (03CR) 10CSteipp: [C: 031] varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) (owner: 10BBlack) [20:51:01] yurik: complete. Much less bad than yesterday :) [20:51:11] .8 should be on all group1 wikis [20:51:42] thcipriani, awesome! I need to quickly deploy maps update, ahead of the gwicke migration [20:52:10] * gwicke wasn't aware of any upcoming migrations [20:53:22] gwicke, you have something complex scheduled in 10 min https://wikitech.wikimedia.org/wiki/Deployments [20:55:39] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: puppet fail [20:55:58] yurik: gwicke: are you guys doing any deploy? Got sneak a CI upgrade right now that is going to prevent jobs from running for a few minutes [20:56:04] (Jenkins jobs I mean) [20:56:14] hashar: no deploy planned from services [20:56:16] hashar, i'm syncing maps [20:56:25] should be another 5 min [20:56:30] gwicke: thanks :)) [20:56:48] yurik: ah good. Guess I will wait for your operation to be complete [20:56:56] thx [20:57:58] (03CR) 10Dzahn: [C: 032] openstack: fix quoting in nova role [puppet] - 10https://gerrit.wikimedia.org/r/258013 (owner: 10Dzahn) [20:59:34] Reedy: Hm.. maintenance scripts are run with the correct version from multiversion, right? E.g. I can deploy a patch to wmf.8 of WikimediaMaintenance and it'll apply to wmf.8 wikis but not wmf.7 wikis [21:00:04] gwicke cscott arlolra subbu bearND mdholloway: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151209T2100). [21:01:25] Krinkle: Yeah [21:01:48] (03CR) 10Chad: "The whole value would be quoted, not just the semicolon." 
[puppet] - 10https://gerrit.wikimedia.org/r/257193 (owner: 10Paladox) [21:02:37] 6operations, 10hardware-requests: Decommission and remove from racks out of warranty spares - https://phabricator.wikimedia.org/T121007#1866856 (10Cmjohnson) 3NEW a:3mark [21:06:54] (03PS1) 10EBernhardson: Remove $wmgUseCirrus variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258026 [21:08:23] hashar, done, go [21:08:45] yurik: thanks doing [21:08:58] (03PS1) 10EBernhardson: Increase Cirrus master timeout to 2 minutes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258027 [21:09:28] no mobileapps deployment today. /cc bearND [21:09:37] mdholloway: thanks :) [21:09:47] waiting on parsoid [21:10:34] 6operations: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1866903 (10Halfak) 3NEW [21:11:00] !log Stopping Jenkins [21:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:11:13] (03PS4) 10Dzahn: lint: re-enable double quoted strings check [puppet] - 10https://gerrit.wikimedia.org/r/243859 (https://phabricator.wikimedia.org/T93645) [21:11:27] 6operations: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1866916 (10Dzahn) a:3Dzahn [21:12:10] (03PS5) 10Dzahn: lint: re-enable double quoted strings check [puppet] - 10https://gerrit.wikimedia.org/r/243859 (https://phabricator.wikimedia.org/T93645) [21:12:13] Anyone around who can help me get a long-running process unblocked? [21:12:14] https://phabricator.wikimedia.org/T121011?workflow=create [21:12:26] :D [21:12:44] (03CR) 10Dzahn: [C: 032] "This check has also been fixed globally across the repo now. Re-enabling it for jenkins." 
[puppet] - 10https://gerrit.wikimedia.org/r/243859 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [21:12:59] halfak: yea, i just took the ticket [21:13:06] Thanks mutante :) [21:13:31] Any feedback you have about how to make these tickets easier to deal with would be welcome :) [21:13:34] parsoid deploy is going to happen today .. waiting for beta labs to be udpated so i can do some quick tests before starting prod. deploy. [21:16:08] (03CR) 10Paladox: "@Chad would you have an example on how to do that for" [puppet] - 10https://gerrit.wikimedia.org/r/257193 (owner: 10Paladox) [21:16:22] (03PS1) 10Papaul: Fixed MAC info for auth2001 Bug:T120263 [puppet] - 10https://gerrit.wikimedia.org/r/258031 (https://phabricator.wikimedia.org/T120263) [21:16:57] halfak: i think you are already doing it right, ticket in phab (which looks fine) and this channel are the way to go [21:17:06] Awesome :) [21:17:49] (03CR) 10Dzahn: [C: 032] "yep, that's what i see in DHCP log on carbon" [puppet] - 10https://gerrit.wikimedia.org/r/258031 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [21:18:05] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1866930 (10RobH) [21:18:23] (03CR) 10Dzahn: [V: 032] "yep, that's what i see in DHCP log on carbon" [puppet] - 10https://gerrit.wikimedia.org/r/258031 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [21:18:25] !log starting parsoid deploy [21:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:20:06] !log Jenkins going down right now. 
[21:20:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:21:27] (03CR) 10Dzahn: "carbon dhcpd: DHCPACK on 10.192.0.138 to 14:18:77:33:1d:1c" [puppet] - 10https://gerrit.wikimedia.org/r/258031 (https://phabricator.wikimedia.org/T120263) (owner: 10Papaul) [21:22:18] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1866946 (10RobH) Please note that the projects needed for this were documented by @chasemp on T103700. We're linking in our testing tasks to this task as well. As all his proposed changes to my i... [21:22:22] halfak: really just French, Indonesian and Vietnamese? [21:22:34] !log restarted parsoid on wtp1003 as a canary [21:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:22:40] mutante, It looks like the rest of the ones I need are already there. [21:22:47] halfak: ah, those are in addition, just adding more languages, right [21:22:48] I'll get the whole list and maybe you can help me confirm. [21:23:03] halfak: yea, i did this before i think, gotcha [21:23:12] you dont have to , i just found it [21:23:27] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:23:36] Cool! Here's a paste from our docs anyway (just in case) [21:23:37] https://etherpad.wikimedia.org/p/revscoring_languages [21:24:13] ah, i just pasted in that etherpad [21:24:21] the part that i see we already have [21:24:34] remembers adding this [21:24:51] yep, so the change is very easy, hold on [21:25:09] Great. Thanks again :) [21:25:18] * halfak runs to get a soda [21:25:18] wtp1003 looking good. restarting parsoid on all nodes. [21:25:39] I am still upgrading jenkins [21:25:46] :( [21:25:47] puppet takes a whie [21:26:12] 'myspell-fr', should already be there though.. 
might be a role thing and stat1002 vs stat1003 again [21:26:43] mutante, it was myspell-fr missing the cued me into realizing I needed new packages. [21:27:17] Looks like we have "myspell-fr-gut", but not "myspell-fr" [21:29:13] halfak: myspell-fr looks installed on stat1002 and stat1003 [21:29:20] Weird. [21:29:35] our old ticket was https://phabricator.wikimedia.org/T99030 [21:30:08] OK. My method might be broken. I did "apt-get remove myspell-[TAB]" and didn't see it in the list. [21:30:16] !log Jenkins upgraded to 1.625.3 ( https://phabricator.wikimedia.org/T56599 ) [21:30:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:31:01] Woah! It's in the list now! [21:31:14] halfak: i do: dpkg -l | grep spell , but that method also works [21:31:21] And now I don't see "...-fr-gut" [21:31:56] cmjohnson1: heyas, you are on clinic duty this week [21:31:59] halfak: oh, maybe they conflict with each other [21:32:04] but we are telling puppet to install both [21:32:07] im about to change how maint-announces work, can i steal that part of your duty for the remainder of the week? [21:32:14] Could be. Either way, it seems like all is working with "fr" now. [21:32:14] checking it out [21:32:21] you still do the rest, but leave the maint-anounce queues in rt (and now in phab) alone [21:32:28] robh: IDK...i really look forward to it [21:32:39] hah...of course you can [21:32:43] see i can tell when you are trolling [21:32:48] maybe not everyone else, but you yes. [21:33:03] cool, ignore those queues starting now =] [21:33:08] okay [21:33:20] halfak: yes, that's it. we can only install myspell-fr OR myspell-fr-gut, but we tell puppet we want both [21:33:30] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1866996 (10RobH) [21:33:36] Gotcha. 
myspell-fr is preferred :) [21:33:41] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1793387 (10RobH) [21:33:43] halfak: what's better? Hydro-Quebec version or GUTenberg version? [21:33:53] ok! i'll fix that in puppet [21:34:32] !log finished deploying parsoid sha a0c626e4 [21:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:37:19] (03PS6) 10Dzahn: lint: re-enable double quoted strings check [puppet] - 10https://gerrit.wikimedia.org/r/243859 (https://phabricator.wikimedia.org/T93645) [21:37:41] (03PS1) 10Dzahn: statistics: don't install myspell-fr-gut package [puppet] - 10https://gerrit.wikimedia.org/r/258038 (https://phabricator.wikimedia.org/T121011) [21:39:10] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1867018 (10RobH) So we've tested task creation by email to maint-announce@phabricator.wikimedia.org. As that is working, I've modified the maint-announce alias to send into both RT and Phabricator... [21:42:31] (03PS1) 10Dzahn: statistics: add aspell-id, hunspell-vi [puppet] - 10https://gerrit.wikimedia.org/r/258039 (https://phabricator.wikimedia.org/T121011) [21:42:41] (03PS2) 10Dzahn: statistics: don't install myspell-fr-gut package [puppet] - 10https://gerrit.wikimedia.org/r/258038 (https://phabricator.wikimedia.org/T121011) [21:42:49] (03CR) 10Dzahn: [C: 032] statistics: don't install myspell-fr-gut package [puppet] - 10https://gerrit.wikimedia.org/r/258038 (https://phabricator.wikimedia.org/T121011) (owner: 10Dzahn) [21:43:44] 6operations, 5Patch-For-Review: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1867037 (10Dzahn) it's a follow-up to T99030 [21:44:22] (03CR) 10Alex Monk: "I thought we didn't reuse names like this?" 
[dns] - 10https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) (owner: 10Dzahn) [21:44:45] (03PS2) 10Dzahn: statistics: add aspell-id, hunspell-vi [puppet] - 10https://gerrit.wikimedia.org/r/258039 (https://phabricator.wikimedia.org/T121011) [21:45:11] (03CR) 10Dzahn: "I asked Robh because of that, and yea, we do." [dns] - 10https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) (owner: 10Dzahn) [21:45:47] (03CR) 10Dzahn: [C: 032] statistics: add aspell-id, hunspell-vi [puppet] - 10https://gerrit.wikimedia.org/r/258039 (https://phabricator.wikimedia.org/T121011) (owner: 10Dzahn) [21:47:31] ori: no [21:47:57] (03CR) 10RobH: "We did not re-use any hostnames in the past, because hostnames were the only record of a system other than it's serial number (which could" [dns] - 10https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) (owner: 10Dzahn) [21:48:12] mutante: hope that clarifies (i commented) [21:48:22] so you arent stuck saying rob said this (which is never fun ;) [21:49:22] robh: it does:) thank you [21:50:28] 6operations, 5Patch-For-Review: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1867085 (10Dzahn) Notice: /Stage[main]/Statistics::Compute/Package[hunspell-vi]/ensure: ensure changed 'purged' to 'present' Notice: /Stage[main]/Statistics::Compute/Pac... [21:50:41] uh… [21:50:42] Unhandled Exception ("Exception") [21:50:42] LDAP Exception: Unable to start TLS connection when connecting to LDAP. 
[21:50:45] LDAP Error #-11: Connect error [21:50:48] halfak: https://phabricator.wikimedia.org/T121011#1867085 [21:50:49] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1867087 (10Luke081515) I'm a meber of #Repository-Admins since today, to add these projects, so this is not a problem ;) (https://phabricator.... [21:50:50] (on phabricator) [21:51:17] halfak: there might one more of those conflicts with -de-de-oldspell .. but the -vi and -id packages you wanted got installed [21:52:09] Thanks mutante. I'll keep testing. It looks like I shouldn't need "-de-de-oldspell" for anything. [21:52:10] I would open a bug, but… you know, it's a bit hard when the bug blocks you from logging into the bug tracker :) [21:52:12] !log OS install complete on auth2001 [21:52:16] * halfak feels shame if that was in the old request [21:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:53:31] 6operations, 10ops-codfw: rack new yubico auth system - https://phabricator.wikimedia.org/T120263#1867095 (10Papaul) [21:54:34] Platonides: confirmed i have same attempting a login from my nonlogged in browser [21:54:50] andrewbogott: is this possibly from migration? [21:55:25] halfak: removing that, it's for the spelling before https://en.wikipedia.org/wiki/German_orthography_reform_of_1996 [21:56:45] folks are investigating the issue about ldap phab etc... 
[21:56:55] Platonides: thx for mentioning it, im now poking ppl [21:57:45] thanks for looking into it [21:57:56] odd to be the first one noticing [21:58:03] (03PS1) 10Dzahn: statistics: remove package myspell-de-de-oldspell [puppet] - 10https://gerrit.wikimedia.org/r/258041 (https://phabricator.wikimedia.org/T99030) [21:58:10] perhaps everyone else had it opened already [21:58:36] (03PS2) 10Dzahn: statistics: remove package myspell-de-de-oldspell [puppet] - 10https://gerrit.wikimedia.org/r/258041 (https://phabricator.wikimedia.org/T99030) [21:58:37] yea i only relogin when it forces me to do so [21:58:47] i wouldnt have noticed for at least another week [21:58:55] (03CR) 10Dzahn: [C: 032] statistics: remove package myspell-de-de-oldspell [puppet] - 10https://gerrit.wikimedia.org/r/258041 (https://phabricator.wikimedia.org/T99030) (owner: 10Dzahn) [21:59:04] greg-g: I would like to push the fix for https://phabricator.wikimedia.org/T121018 as soon as it's through code review [21:59:18] Problem is in the data type definitions only [21:59:27] Platonides: chasemp fixed it [21:59:32] try now, it works for me [21:59:52] (i didnt wanna claim credit, i just asked him about fixing it and he did it =) [22:01:14] robh: sorry, I can’t tell what the context was [22:01:34] yep, fixed [22:01:36] there was an issue with phabricator logins via ldap but chase just fixed it [22:01:37] thanks robh [22:01:43] so now its all good [22:01:47] quite welcome [22:01:47] :) [22:02:04] (03CR) 10Alex Monk: "Okay. Maybe "We do not re-use hostnames of past servers on new servers." 
(from the infrastructure naming conventions on wikitech) should b" [dns] - 10https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) (owner: 10Dzahn) [22:02:38] robh: ok :) [22:04:21] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1867159 (10Krenair) So basically, you're proposing to create at least one project for every mediawiki extension with source hosted by Wikimedi... [22:07:45] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1867172 (10Luke081515) Yeah, I can do this. But like @Aklapper said, we can create a project only for extensions, if they don't use 3rd party... [22:10:17] halfak: arg, installing hunspell-vi means removing myspell-da (weird!? they are completely different languages and no other conflicts are listed but this specific one is) [22:10:51] mutante, let's remove da for now since viwiki got to us first. [22:10:57] halfak: ok [22:11:00] We'll need to address the conflict later :/ [22:11:07] Thanks for letting me know about it. [22:11:16] Conflicts: firefox (<< 2.0.0.3-2), iceape-browser (<< 1.1.1-2), icedove (<< 2.0.0.0-4), iceweasel (<< 2.0.0.3-2), libxul0d (= 1.8.0.11-3), mozilla-browser (<< 1.8+1.1.1-2), myspell-da, openoffice.org (<= 1.0.3-2), openoffice.org-core (<< 2.1~m190-1), thunderbird (<< 2.0.0.1+0dfsg-0) [22:11:24] see how myspell-da is in that list? 
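[editor's note] The Conflicts line pasted above can be checked mechanically. This is a minimal sketch, not part of the original conversation: it assumes the field is a plain comma-separated list and treats any entry without a parenthesised version restriction as an unconditional conflict. The function name is hypothetical.

```python
def unconditional_conflicts(field):
    """Return package names in a Debian Conflicts field that carry no version restriction."""
    names = set()
    for entry in field.split(","):
        entry = entry.strip()
        # Entries like "firefox (<< 2.0.0.3-2)" only conflict with some versions;
        # entries with no parentheses conflict with every version of the package.
        if entry and "(" not in entry:
            names.add(entry)
    return names


# Conflicts field exactly as pasted in the channel at 22:11:16.
CONFLICTS = (
    "firefox (<< 2.0.0.3-2), iceape-browser (<< 1.1.1-2), icedove (<< 2.0.0.0-4), "
    "iceweasel (<< 2.0.0.3-2), libxul0d (= 1.8.0.11-3), mozilla-browser (<< 1.8+1.1.1-2), "
    "myspell-da, openoffice.org (<= 1.0.3-2), openoffice.org-core (<< 2.1~m190-1), "
    "thunderbird (<< 2.0.0.1+0dfsg-0)"
)

print(unconditional_conflicts(CONFLICTS))
```

Under these assumptions myspell-da is the only entry without a version restriction, which is why it stands out from the rest of the list.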
[22:11:35] why Danish but no others ?:p [22:11:48] it might even be a typo by the package builder [22:11:55] Seems likely [22:15:48] (03PS1) 10Dzahn: statistics: package hunspell-vi conflicts with myspell-da [puppet] - 10https://gerrit.wikimedia.org/r/258047 (https://phabricator.wikimedia.org/T121011) [22:16:29] or the maintainer is danish and that is his subtle fuck you ;] [22:16:31] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1867220 (10Krenair) I'm not sure I see the value in doing that unless it means we can kill #MediaWiki-extensions-Other So I wouldn't object to... [22:16:37] (03PS2) 10Dzahn: statistics: package hunspell-vi conflicts with myspell-da [puppet] - 10https://gerrit.wikimedia.org/r/258047 (https://phabricator.wikimedia.org/T121011) [22:16:37] (unless its our package, ten doubtful) [22:17:20] yea, it's not ours, but Danish and Vietnamese seem so very unrelated [22:17:42] (03CR) 10Dzahn: [C: 032] statistics: package hunspell-vi conflicts with myspell-da [puppet] - 10https://gerrit.wikimedia.org/r/258047 (https://phabricator.wikimedia.org/T121011) (owner: 10Dzahn) [22:21:38] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Requests for addition to the #project-creators group (in comments) - https://phabricator.wikimedia.org/T706#1867247 (10Luke081515) Sorry, my fault, I did not found #mediawiki-extensions-other before. So in this case we can use this project, but in so... [22:25:45] 6operations, 5Patch-For-Review: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1867259 (10Dzahn) i see no more conflicts on puppet runs now and: [stat1002:~] $ dpkg -l | egrep '(myspell-fr)|(aspell-id)|(hunspell-vi)' ii aspell-id... 
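[editor's note] The `dpkg -l | egrep ...` check quoted in the task comment can also be scripted. This is a rough sketch, assuming the usual `dpkg -l` column layout (status, name, version, ...). The sample text and its version strings are made up for illustration and are not the output from stat1002.

```python
def installed_packages(dpkg_output):
    """Return names of packages whose `dpkg -l` status column is 'ii' (installed)."""
    names = set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ii":
            # Drop an architecture qualifier such as ":amd64" if one is present.
            names.add(fields[1].split(":")[0])
    return names


# Hypothetical sample: versions and descriptions are invented for this sketch.
SAMPLE = """\
ii  aspell-id    0.0-1  all  Indonesian dictionary for aspell
ii  hunspell-vi  0.0-1  all  Vietnamese dictionary for hunspell
rc  myspell-da   0.0-1  all  Danish dictionary for myspell
"""

wanted = {"myspell-fr", "aspell-id", "hunspell-vi"}
missing = wanted - installed_packages(SAMPLE)
print(sorted(missing))
```

Lines with a status other than `ii` (for example `rc`, removed with config files left behind) are not counted as installed.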
[22:26:11] 6operations, 5Patch-For-Review: Install apt packages myspell-fr aspell-id hunspell-vi on stat1003 - https://phabricator.wikimedia.org/T121011#1867264 (10Dzahn) 5Open>3Resolved [22:26:33] halfak: ^ closed it. i see no more conflicts and the ones you listed are installed [22:27:16] if you run into others in the future you can just reopen that [22:41:35] (03PS1) 10Andrew Bogott: Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) [22:42:16] Anyone deploying? [22:42:35] (03CR) 10jenkins-bot: [V: 04-1] Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) (owner: 10Andrew Bogott) [22:43:08] (03PS1) 10Jdlrobson: Enable the A/B test to measure impact of collapsing content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) [22:44:28] (03PS2) 10Jdlrobson: Enable the A/B test to measure impact of collapsing content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) [22:44:53] (03PS2) 10Andrew Bogott: Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) [22:46:35] (03CR) 10BryanDavis: [C: 031] Enable the A/B test to measure impact of collapsing content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) (owner: 10Jdlrobson) [22:48:12] (03PS5) 10Dzahn: zuul: move roles into role module [puppet] - 10https://gerrit.wikimedia.org/r/257039 [22:48:48] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [22:55:28] (03PS1) 10Dzahn: base::labs: rename class with dash character [puppet] - 10https://gerrit.wikimedia.org/r/258055 [22:59:49] (03CR) 10Eevans: [C: 031] RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 
(https://phabricator.wikimedia.org/T118401) (owner: 10Mobrovac) [23:00:53] (03PS1) 10Dzahn: contint: rename slave-scripts class [puppet] - 10https://gerrit.wikimedia.org/r/258057 [23:04:48] PROBLEM - puppet last run on mw2054 is CRITICAL: CRITICAL: Puppet has 1 failures [23:08:59] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [23:09:28] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [23:12:48] (03PS3) 10Andrew Bogott: Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) [23:14:30] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [23:16:57] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:17:18] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:18:03] (03PS1) 10Jdlrobson: Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 [23:18:49] (03PS4) 10Andrew Bogott: Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) [23:19:53] (03PS2) 10BryanDavis: Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 (owner: 10Jdlrobson) [23:20:33] (03CR) 10BryanDavis: [C: 031] Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 (owner: 10Jdlrobson) [23:24:32] yurik: TimStarling: gwicke: I'm not super proud of it, but here's what I got – https://phabricator.wikimedia.org/T119043#1867611 [23:25:27] thanks Krinkle [23:25:56] Krinkle, thx, looking [23:26:34] ~log auth2001 signing puppet certs [23:26:43] !log auth2001 signing puppet certs [23:26:49] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:28:37] !log hoo@tin Synchronized php-1.27.0-wmf.8/extensions/Wikidata: Item/ Property id values need separate validators each (duration: 00m 39s) [23:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:09] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [23:29:26] Already verified: https://www.wikidata.org/w/index.php?title=Property%3AP237&type=revision&diff=282120222&oldid=282088649 [23:32:19] RECOVERY - puppet last run on mw2054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:34:23] TimStarling, Krinkle: mine is at https://phabricator.wikimedia.org/T119043#1867567 [23:34:57] (03CR) 10Andrew Bogott: [C: 032] Allow hiera to override $::labsproject [puppet] - 10https://gerrit.wikimedia.org/r/258051 (https://phabricator.wikimedia.org/T95185) (owner: 10Andrew Bogott) [23:37:36] (03PS1) 10EBernhardson: [WIP] Cron job to rebuild completion indices [puppet] - 10https://gerrit.wikimedia.org/r/258068 (https://phabricator.wikimedia.org/T112028) [23:38:40] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Cron job to rebuild completion indices [puppet] - 10https://gerrit.wikimedia.org/r/258068 (https://phabricator.wikimedia.org/T112028) (owner: 10EBernhardson) [23:40:10] (03CR) 10Jdlrobson: [C: 031] Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 (owner: 10Jdlrobson) [23:40:59] (03CR) 10BryanDavis: [C: 032] Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 (owner: 10Jdlrobson) [23:41:08] (03PS2) 10EBernhardson: [WIP] Cron job to rebuild completion indices [puppet] - 10https://gerrit.wikimedia.org/r/258068 (https://phabricator.wikimedia.org/T112028) [23:41:21] (03Merged) 10jenkins-bot: Enable RelatedArticles in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258063 (owner: 10Jdlrobson) [23:43:34] !log 
signing salt-key complete [23:43:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:45:28] (03PS3) 10EBernhardson: [WIP] Cron job to rebuild completion indices [puppet] - 10https://gerrit.wikimedia.org/r/258068 (https://phabricator.wikimedia.org/T120843) [23:51:44] 6operations, 10ops-codfw: rack new yubico auth system - https://phabricator.wikimedia.org/T120263#1867742 (10Papaul) [23:53:13] (03PS1) 10Andrew Bogott: Add direct hostname lookup to labs hiera [puppet] - 10https://gerrit.wikimedia.org/r/258071 [23:53:24] chasemp: ^ ? [23:54:47] andrewbogott: why https://gerrit.wikimedia.org/r/#/c/258071/1/hieradata/labs/hosts/.gitignore,unified [23:55:07] um... [23:55:12] it was like that when I found it [23:55:18] I wonder who was using that, and how it worked? [23:55:22] Seems clear that it could not have worked [23:55:43] I would put this at the very top [23:55:44] https://gerrit.wikimedia.org/r/#/c/258071/1/modules/puppetmaster/files/labs.hiera.yaml [23:55:45] but maybe I should leave that dir alone and create a new one [23:55:47] because at this point [23:55:48] ok [23:55:51] "labs/%{::labsproject}/host/%{::hostname}" [23:55:57] cannot be fulfilled, right? [23:56:06] there is no labsproject before [23:56:11] "labs/hosts/%{::hostname}" [23:56:22] somewhat speculative there [23:59:46] (03PS2) 10Andrew Bogott: Add direct hostname lookup to labs hiera [puppet] - 10https://gerrit.wikimedia.org/r/258071
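[editor's note] The ordering concern raised at 23:55 can be illustrated with a toy model. This is only a sketch and does not reproduce how Hiera itself interpolates variables: it simply marks a hierarchy entry as unusable when any `%{::name}` it references has no known value yet. The function name and the host name are hypothetical. The two hierarchy strings are the ones quoted in the channel.

```python
import re

# Matches top-scope interpolation tokens such as %{::hostname}.
VAR = re.compile(r"%\{::(\w+)\}")


def resolvable_levels(hierarchy, facts):
    """Split hierarchy entries into (resolved paths, entries with unknown variables)."""
    resolved, unresolved = [], []
    for entry in hierarchy:
        missing = [name for name in VAR.findall(entry) if name not in facts]
        if missing:
            unresolved.append(entry)
        else:
            resolved.append(VAR.sub(lambda m: facts[m.group(1)], entry))
    return resolved, unresolved


HIERARCHY = [
    "labs/hosts/%{::hostname}",
    "labs/%{::labsproject}/host/%{::hostname}",
]

# Hypothetical state before labsproject has been set: only the hostname is known.
facts = {"hostname": "example-host"}
resolved, unresolved = resolvable_levels(HIERARCHY, facts)
print(resolved)
print(unresolved)
```

In this model the plain per-host path resolves with only the hostname, while the project-scoped path has to wait until `labsproject` is available, which matches the suggestion to list the per-host lookup first.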