[00:00:07] well swat is over
[00:00:45] Coren: I can't find labstore2001 under site.pp, where is it?
[00:01:16] are we falling back to default? we shouldn't
[00:01:45] paravoid: It was never put in yet because that's where I ran the random destructive filesystem tests. That's why I'm reinstalling now, and I'm about to push a changeset to puppetize it.
[00:02:07] ah
[00:02:14] greg-g: i would like to deploy a couple reverted changes for wikidata
[00:02:28] paravoid: Also used the opportunity to Jessie it up. :-)
[00:02:30] they appear to be causing problems for our change dispatcher
[00:02:51] so revert to how it was pre-deploy and investigate tomorrow...
[00:06:23] (PS1) coren: Add labstore200[12] minimal configuration [puppet] - https://gerrit.wikimedia.org/r/199542
[00:08:46] (PS9) Gergő Tisza: Make vbench more generic [puppet] - https://gerrit.wikimedia.org/r/197240 (https://phabricator.wikimedia.org/T92701)
[00:10:09] paravoid: ^^ labstore200[12]. I'm holding off for the labs_storage module before I add in substantive config.
[00:11:28] why labs_storage and not labstore btw?
[00:11:37] and why a module rather than just abstract what you have into a role?
[00:11:38] (PS1) Negative24: puppet-lint: Disable case default check [puppet] - https://gerrit.wikimedia.org/r/199545
[00:14:57] (CR) Negative24: "Come to think of it, maybe this check is better to keep in. I could add default values to each of these case values to solve the problem s" [puppet] - https://gerrit.wikimedia.org/r/199545 (owner: Negative24)
[00:17:33] bd808: seems greg-g is not around... i am reverting 2 patches for wikidata and would like to deploy
[00:17:45] any objections (anyone?)
[00:17:49] else i will proceed
[00:17:56] aude: works for me
[00:18:05] k
[00:18:06] thanks
[00:20:23] paravoid: Because there are going to be a number of subcomponents, and modules only load on demand. There will also be roles, that include the right classes.
[00:20:55] paravoid: The name, however, I picked because labs_dns, labs_vagrant, labs_vmbuilder, etc. Fit the pattern
[00:21:25] paravoid: I'm not attached to it.
[00:24:07] waiting for jenkins
[00:25:12] operations, RESTBase: (nodetool) cleanup needed on restbase1006 - https://phabricator.wikimedia.org/T93079#1147280 (GWicke) Open>Resolved I ran this cleanup successfully over the weekend. Resolving.
[00:26:33] operations, Engineering-Community, ECT-March-2015: date/budget proposal for 2015 Ops Offsite - https://phabricator.wikimedia.org/T89023#1147287 (Rfarrand) p:Normal>Low
[00:29:07] is zuul broken?
[00:30:58] (Abandoned) Mattflaschen: Simplify Echo and Thanks settings [mediawiki-config] - https://gerrit.wikimedia.org/r/186539 (owner: Mattflaschen)
[00:31:59] legoktm: ^ zuul broken?
[00:32:12] I was just looking at that
[00:32:19] https://gerrit.wikimedia.org/r/#/c/199544/ is gate and submit and https://integration.wikimedia.org/ci/ looks like it's doing nothing
[00:32:22] ok
[00:32:26] probably gearman again
[00:32:35] :(
[00:33:11] * legoktm is fixing
[00:33:22] Request: POST http://integration.wikimedia.org/ci/configSubmit, from 10.64.0.171 via cp1043 cp1043 ([10.64.0.171]:80), Varnish XID 981516440
[00:33:22] Forwarded for: 76.103.130.60, 10.64.0.171
[00:33:22] Error: 503, Service Unavailable at Wed, 25 Mar 2015 00:33:16 GMT
[00:33:23] uhoh
[00:33:45] :(
[00:55:30] jouncebot: next
[00:55:30] In 13 hour(s) and 4 minute(s): Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T1400)
[01:06:27] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1147373 (bd808)
[01:20:23] operations, HTTPS, HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1147410 (Tony_Tan_98) Just to let everyone know, HTTPS traffic to desktop Wikimedia sites is no longer blocked in China. However, HTTPS to...
[01:21:31] still waiting for jenkins, now on submodule update
[01:23:13] Puppet, Labs, Phabricator: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1147422 (Negative24) NEW a:Negative24
[01:32:40] !log aude Synchronized php-1.25wmf22/extensions/Wikidata: Fix change dispatcher issues (duration: 00m 18s)
[01:32:49] Logged the message, Master
[01:32:55] done :)
[01:35:34] (PS1) Ori.livneh: Disable TCP slow-start restart on caches [puppet] - https://gerrit.wikimedia.org/r/199556
[01:35:56] bblack: ^
[02:13:43] Puppet, Labs, Phabricator: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1147460 (Negative24)
[02:31:46] (Abandoned) Negative24: Configure Puppet to use phd group setting [puppet] - https://gerrit.wikimedia.org/r/199538 (owner: Negative24)
[02:41:32] !log l10nupdate Synchronized php-1.25wmf21/cache/l10n: (no message) (duration: 09m 12s)
[02:41:40] Logged the message, Master
[02:43:38] (PS1) Gage: IPsec: Icinga monitor for Strongswan connections [puppet] - https://gerrit.wikimedia.org/r/199561
[02:48:11] !log LocalisationUpdate completed (1.25wmf21) at 2015-03-25 02:47:07+00:00
[02:48:18] Logged the message, Master
[03:03:06] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: puppet fail
[03:04:03] (CR) Alex Monk: [C: 1] Setting import sources for uawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: Base)
[03:04:15] (CR) Alex Monk: [C: 1] Remove unused variables and commented-out code from CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: Withoutaname)
[03:05:46] (CR) Alex Monk: [C: 1] Enable transwiki imports for Telugu Wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/194908 (https://phabricator.wikimedia.org/T91635) (owner: Odder)
[03:06:45] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 06m 44s)
[03:06:56] Logged the message, Master
[03:11:41] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-25 03:10:38+00:00
[03:11:50] Logged the message, Master
[03:13:59] (PS2) Tim Landscheidt: Ensure that apt preferences are named *.pref [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681)
[03:14:24] (PS1) Negative24: Default ignore alternate file domain config [puppet] - https://gerrit.wikimedia.org/r/199564 (https://phabricator.wikimedia.org/T93837)
[03:15:34] (CR) Alex Monk: [C: -1] "Hi Odder, Steinsplitter: Just been going through config change requests, found this one. I'm not clear on what our policy for adding domai" [mediawiki-config] - https://gerrit.wikimedia.org/r/194913 (https://phabricator.wikimedia.org/T91630) (owner: Odder)
[03:18:17] (CR) Tim Landscheidt: "I checked "git log -p 9d6b2e99bd1bf3ad31b86714cc8d4bda68679b25..21f1fdcc834d593ee3dddcc61d35ce115620f696" that there have been no uses of " [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: Tim Landscheidt)
[03:21:56] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[03:24:00] (CR) Alex Monk: [C: 1] Create and modify groups in eswikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/198749 (https://phabricator.wikimedia.org/T93371) (owner: Gerardduenas)
[03:30:09] (CR) Alex Monk: [C: 1] Let dawiki bureaucrats add/remove accountcreator group [mediawiki-config] - https://gerrit.wikimedia.org/r/198753 (https://phabricator.wikimedia.org/T93260) (owner: Glaisher)
[03:31:54] (PS10) Gergő Tisza: Make vbench more generic [puppet] - https://gerrit.wikimedia.org/r/197240 (https://phabricator.wikimedia.org/T92701)
[03:34:16] (CR) Alex Monk: [C: 1] Add import sources for cawikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/198786 (https://phabricator.wikimedia.org/T93203) (owner: Gerardduenas)
[03:40:41] operations, HTTPS, HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1147536 (Chmarkine) >>! In T49832#1147410, @Tony_Tan_98 wrote: > Just to let everyone know, HTTPS traffic to desktop Wikimedia sites is no...
[04:16:15] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1475 bytes in 0.214 second response time
[04:18:16] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1147632 (bd808)
[04:25:02] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1147645 (bd808)
[04:26:52] Blocked-on-Operations, operations, Continuous-Integration, Scrum-of-Scrums: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1147649 (Dzahn) I'm wondering if it's an option to just go back to the separate apache-config repo we had befor...
[04:27:25] (PS1) Alex Monk: Make references to tasks/bugs more consistent [mediawiki-config] - https://gerrit.wikimedia.org/r/199574 (https://phabricator.wikimedia.org/T31902)
[04:39:06] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1147688 (bd808)
[04:41:56] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1147700 (bd808)
[04:42:29] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096558 (bd808)
[04:46:09] (PS1) KartikMistry: WIP: Use dblist for contenttranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/199576
[04:55:20] (CR) Alex Monk: "Are you planning to introduce more config settings based on the same settings?" [mediawiki-config] - https://gerrit.wikimedia.org/r/199576 (owner: KartikMistry)
[04:58:31] operations, OTRS, Security, HTTPS: SSL-config of the OTRS is outdated - https://phabricator.wikimedia.org/T91504#1147731 (Chmarkine)
[05:03:00] (PS2) Spage: Redirect dev.wikimedia.org URLs [puppet] - https://gerrit.wikimedia.org/r/199182 (https://phabricator.wikimedia.org/T372)
[05:22:18] (PS1) Dzahn: point dev.wikimedia to cluster, not misc-web [dns] - https://gerrit.wikimedia.org/r/199581
[05:22:40] (PS2) Dzahn: point dev.wikimedia to cluster, not misc-web [dns] - https://gerrit.wikimedia.org/r/199581 (https://phabricator.wikimedia.org/T372)
[05:23:36] (CR) Dzahn: "will need https://gerrit.wikimedia.org/r/#/c/199581/ to let the Apache cluster actually get the requests for dev.wm.org" [puppet] - https://gerrit.wikimedia.org/r/199182 (https://phabricator.wikimedia.org/T372) (owner: Spage)
[05:24:01] (PS3) Mattflaschen: Enable editing of Flow posts, by autoconfirmed users, on mediawikwiki, enwiki, ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: EBernhardson)
[05:24:44] (PS4) Mattflaschen: Enable editing of Flow posts, by autoconfirmed users, on mediawikwiki, enwiki, ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: EBernhardson)
[05:33:40] operations, OTRS, Security, HTTPS: SSL-config of the OTRS is outdated - https://phabricator.wikimedia.org/T91504#1147758 (Dzahn) manifests/role/otrs.pp already uses: $ssl_settings = ssl_ciphersuite('apache-2.2', 'compat', '365') either this is not actually used or the same problems apply to a coup...
[05:42:29] operations, OTRS, Security, HTTPS: SSL-config of the OTRS is outdated - https://phabricator.wikimedia.org/T91504#1147761 (Dzahn) 'compat' => 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECD HE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256 :ECDHE-...
[05:45:33] operations, Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1147762 (Dzahn)
[05:46:09] (CR) Glaisher: "needs rebase" [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[05:48:30] operations, Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1147764 (Dzahn) subra is also up now. https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=subra poolcounter - OK Poolcounter connection - TCP OK - 0.043 second response time on...
[05:49:12] (PS4) Glaisher: Add Draft namespace on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[05:49:40] (CR) Glaisher: "rebased" [mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[05:51:59] operations, OTRS, Security, HTTPS: SSL-config of the OTRS is outdated - https://phabricator.wikimedia.org/T91504#1147772 (Chmarkine) >>! In T91504#1140486, @DaBPunkt wrote: > > Sure it does, but the webserver for our OTRS doesn’t use it. HSTS is a nice idea, yes >>! In T91504#1147758, @Dzahn wrote...
[05:52:13] operations, codfw-appserver-setup, wikis-in-codfw: Set up the mediawiki application layer in codfw - https://phabricator.wikimedia.org/T86894#1147774 (Dzahn)
[05:52:14] operations, Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1147773 (Dzahn) Open>Resolved
[05:56:25] Krenair: nothing planned at moment, dblist makes easier to maintain as list will grow in future.
[05:56:43] Krenair: re: 199576
[05:57:11] Krenair: correct me if I'm wrong :)
[06:23:58] operations, Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1147805 (Joe) @Dzahn I'll prepare and submit the mediawiki config change to use those two servers in codfw, thanks!
[06:26:27] ori: are you still awake ?
[06:27:54] anyway ori, when you come online: https://github.com/outbrain/gruffalo might be of a help for graphite issues.
[06:29:54] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:54] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:55] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:13] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:24] PROBLEM - puppet last run on mw2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:34] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:46:55] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:04] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:13] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:47:14] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:14] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:47:14] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:48:25] RECOVERY - puppet last run on mw2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:08:44] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:35:18] (CR) BBlack: [C: 2] Disable TCP slow-start restart on caches [puppet] - https://gerrit.wikimedia.org/r/199556 (owner: Ori.livneh)
[07:36:24] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[07:36:54] strontium :P
[07:41:49] (PS1) BBlack: remove CAMELLIA from ciphersuites [puppet] - https://gerrit.wikimedia.org/r/199582
[07:43:34] (CR) BBlack: [C: -1] "Needs some discussion first, just putting this up there to push the issue a bit. The upside is by dropping these ciphers we gain FIPS com" [puppet] - https://gerrit.wikimedia.org/r/199582 (owner: BBlack)
[07:43:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Mar 25 07:42:46 UTC 2015 (duration 42m 45s)
[07:43:59] Logged the message, Master
[07:50:34] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59654 bytes in 0.204 second response time
[08:20:22] (CR) Krinkle: [C: 1] contint: disable hhvm stacktraces / map [puppet] - https://gerrit.wikimedia.org/r/195035 (https://phabricator.wikimedia.org/T64788) (owner: Hashar)
[08:24:16] (PS3) Krinkle: contint: migrate to require_package() [puppet] - https://gerrit.wikimedia.org/r/188034 (owner: Hashar)
[08:24:20] (CR) Krinkle: [C: 1] contint: migrate to require_package() [puppet] - https://gerrit.wikimedia.org/r/188034 (owner: Hashar)
[08:48:48] operations, Wikimedia-Labs-wikitech-interface, HTTPS: wikitech.wikimedia.org SSL certificate considered "outdated security" in Chrome - https://phabricator.wikimedia.org/T92709#1147929 (Krinkle) >>! In T92709#1118731, @Dzahn wrote: > this should be T73156 (SHA1 needs to be replaced with a SHA256 cert)...
[08:57:59] Blocked-on-Operations, operations, Continuous-Integration, Scrum-of-Scrums: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1147951 (Krinkle) I'd recommend for someone experienced with apache config and operations/puppet (maybe @dzahn)...
[09:00:20] (CR) Hashar: "I have no idea what HSTS is nor do I have time to look at it." [puppet] - https://gerrit.wikimedia.org/r/198819 (https://phabricator.wikimedia.org/T40516) (owner: Chmarkine)
[09:05:24] PROBLEM - mediawiki-installation DSH group on mw2088 is CRITICAL: Host mw2088 is not in mediawiki-installation dsh group
[09:06:21] (PS3) Filippo Giunchedi: Ensure that apt preferences are named *.pref [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: Tim Landscheidt)
[09:06:52] (CR) Filippo Giunchedi: [C: 1] "happy to merge this, will have more time next week though to babysit" [puppet] - https://gerrit.wikimedia.org/r/195081 (https://phabricator.wikimedia.org/T60681) (owner: Tim Landscheidt)
[09:07:45] (CR) Filippo Giunchedi: [C: 2 V: 2] send additional metrics to graphite [puppet/cassandra] - https://gerrit.wikimedia.org/r/199264 (https://phabricator.wikimedia.org/T78514) (owner: Eevans)
[09:07:54] RECOVERY - NTP on mw2088 is OK: NTP OK: Offset -0.004019260406 secs
[09:09:44] RECOVERY - RAID on mw2088 is OK: OK: no RAID installed
[09:09:55] RECOVERY - dhclient process on mw2088 is OK: PROCS OK: 0 processes with command name dhclient
[09:10:04] RECOVERY - DPKG on mw2088 is OK: All packages OK
[09:10:14] RECOVERY - salt-minion processes on mw2088 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:10:14] RECOVERY - Disk space on mw2088 is OK: DISK OK
[09:10:15] RECOVERY - nutcracker port on mw2088 is OK: TCP OK - 0.000 second response time on port 11212
[09:10:34] RECOVERY - nutcracker process on mw2088 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[09:10:45] (PS1) Filippo Giunchedi: update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/199585
[09:11:03] RECOVERY - configured eth on mw2088 is OK: NRPE: Unable to read output
[09:12:16] (CR) Filippo Giunchedi: [C: 2 V: 2] update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/199585 (owner: Filippo Giunchedi)
[09:13:57] Blocked-on-Operations, Puppet, operations, Beta-Cluster: Setup a mediawiki03 (or what not) on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1147980 (hashar) >>! In T72181#734082, @dduvall wrote: > Still waiting for https://gerrit.wikimedia.o...
[09:18:15] operations, ops-codfw, codfw-appserver-setup, wikis-in-codfw: mw2208-2209 have unreachable mgmt interfaces - https://phabricator.wikimedia.org/T93857#1147984 (Joe) NEW
[09:19:27] operations, ops-codfw, codfw-appserver-setup, wikis-in-codfw: mw2050 has probably a faulty disk - https://phabricator.wikimedia.org/T93858#1147990 (Joe) NEW
[09:19:46] operations, Wikimedia-Labs-wikitech-interface, HTTPS: wikitech.wikimedia.org SSL certificate considered "outdated security" in Chrome - https://phabricator.wikimedia.org/T92709#1147996 (yuvipanda) p:Triage>Normal
[09:21:12] Blocked-on-Operations, Puppet, operations, Beta-Cluster: Setup a mediawiki03 (or what not) on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1147999 (yuvipanda) Note that the old mediawiki03 doesn't exist at all anymore...
[09:24:37] Blocked-on-Operations, Puppet, operations, Beta-Cluster: Setup a mediawiki03 (or what not) on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1148021 (yuvipanda) So another option is to just hit apache directly on one host - just open that up...
[09:28:00] mobrovac: btw the 'additional cassandra metrics' patch is merged, I don't have much time to babysit a cassandra restart ATM though, you reckon it'd be simply going around the cluster and restart individual nodes? (cc urandom)
[09:28:22] godog: yep, saw that, thnx
[09:28:29] godog: shouldn't be a pb
[09:28:44] godog: i can do it if you wish
[09:30:10] auf, we really should find a way to allow (authorised) people to delete dashboards from grafana
[09:30:15] i'm in cassandra hell there right now
[09:30:16] :P
[09:31:29] Puppet, Multimedia, Release-Engineering, Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1148047 (Gilles) Repo is live at gerrit.wikimedia.org/r/operations/software/sentry and is a clone of yesterday's state of sentry on github
[09:34:44] PROBLEM - Disk space on mw2178 is CRITICAL: Connection refused by host
[09:34:44] PROBLEM - dhclient process on mw2178 is CRITICAL: Connection refused by host
[09:35:16] mobrovac: yeah I've seen some activity on phab re: that
[09:35:35] PROBLEM - nutcracker port on mw2178 is CRITICAL: Connection refused by host
[09:35:44] PROBLEM - RAID on mw2178 is CRITICAL: Connection refused by host
[09:35:45] PROBLEM - DPKG on mw2178 is CRITICAL: Connection refused by host
[09:35:45] PROBLEM - nutcracker process on mw2178 is CRITICAL: Connection refused by host
[09:35:54] mobrovac: if you have time to bounce the cluster I'd appreciate that yeah, thanks!
[09:36:03] PROBLEM - configured eth on mw2178 is CRITICAL: Connection refused by host
[09:36:04] PROBLEM - salt-minion processes on mw2178 is CRITICAL: Connection refused by host
[09:36:16] "if you have time", hehe funny guy godog :)
[09:36:21] godog: will do
[09:36:54] PROBLEM - puppet last run on mw2178 is CRITICAL: Connection refused by host
[09:36:55] <_joe_> mmmh I feel like something wrong again in that damn dhcp file
[09:37:03] <_joe_> shit.
[09:38:55] mobrovac: hahaha thanks, wasn't meant to be sarcastic
[09:39:44] godog: no, no, i took it as "optimistic" :)
[09:43:41] <_joe_> is grafana allowing us to create dashboards correctly?
[09:43:51] <_joe_> or is still buggy and faulty?
[09:44:22] I believe that part is working
[09:44:50] _joe_: create dashboards, yes, create graphs, no, you have to copy an existing one and modify it
[09:45:02] <_joe_> ok, exactly
[09:45:08] <_joe_> that was the problem
[09:45:32] what i do is open an existing dashboard, change its name, save and then modify the graphs
[09:45:37] (PS2) Odder: Add a domain to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/194913 (https://phabricator.wikimedia.org/T91630)
[09:46:14] PROBLEM - Disk space on ms-be2002 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sde1 is not accessible: Input/output error
[09:46:24] PROBLEM - RAID on ms-be2002 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[09:48:46] operations, Mail, Monitoring: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1148128 (fgiunchedi) subscription rates would be nice too if not too hard to extract from mailman (sadly I think it'll have to be from logs)
[09:51:25] does anyone know if/where the hierator service is up and running?
[09:51:31] can't seem to find it in puppet
[09:52:21] mobrovac: isn't
[09:52:33] I see
[09:52:36] interesting
[09:52:41] oki thnx YuviPanda
[09:54:44] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[09:55:56] euh could ^^ be from enabling the new C* metrics?
[09:56:05] likely
[10:00:01] (PS1) Giuseppe Lavagetto: dhcp: swap mac addresses for mw2178 and mw2179 [puppet] - https://gerrit.wikimedia.org/r/199594
[10:00:24] (PS2) Giuseppe Lavagetto: dhcp: swap mac addresses for mw2178 and mw2179 [puppet] - https://gerrit.wikimedia.org/r/199594
[10:00:57] (CR) Giuseppe Lavagetto: [C: 2 V: 2] dhcp: swap mac addresses for mw2178 and mw2179 [puppet] - https://gerrit.wikimedia.org/r/199594 (owner: Giuseppe Lavagetto)
[10:01:01] (CR) Hashar: "I have poked Giuseppe and Ori by private email to attract review." [puppet] - https://gerrit.wikimedia.org/r/195035 (https://phabricator.wikimedia.org/T64788) (owner: Hashar)
[10:02:02] <_joe_> mh I was sure I did put -1 there, hashar :)
[10:02:04] RECOVERY - Disk space on ms-be2002 is OK: DISK OK
[10:02:17] <_joe_> I'm pretty sure it's perf_pid_map and not PerfPidMap
[10:02:22] <_joe_> but I'll check
[10:03:05] _joe_: Buongiorno :) I have merely copy pasted from ori comment iirc, though I might have looked at hhvm doc a bit
[10:04:04] RECOVERY - nutcracker port on mw2178 is OK: TCP OK - 0.000 second response time on port 11212
[10:04:10] <_joe_> hasharConfcall: https://github.com/facebook/hhvm/wiki/INI-Settings I am right
[10:04:15] (CR) Hashar: "I have no idea what HSTS is nor do I have time to look at it." [puppet] - https://gerrit.wikimedia.org/r/198458 (https://phabricator.wikimedia.org/T40516) (owner: Chmarkine)
[10:04:16] RECOVERY - RAID on mw2178 is OK: OK: no RAID installed
[10:04:16] RECOVERY - nutcracker process on mw2178 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[10:04:16] RECOVERY - DPKG on mw2178 is OK: All packages OK
[10:04:24] RECOVERY - configured eth on mw2178 is OK: NRPE: Unable to read output
[10:04:34] RECOVERY - salt-minion processes on mw2178 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:04:44] RECOVERY - Disk space on mw2178 is OK: DISK OK
[10:04:44] RECOVERY - dhclient process on mw2178 is OK: PROCS OK: 0 processes with command name dhclient
[10:05:11] _joe_: and there is a hhvm.keep_perf_pid_map parameter as well
[10:05:44] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:08:53] <_joe_> hashar: what does that do, you probably have to figure out from the sources
[10:08:54] PROBLEM - RAID on mw2178 is CRITICAL: Connection refused by host
[10:08:55] PROBLEM - DPKG on mw2178 is CRITICAL: Connection refused by host
[10:08:55] PROBLEM - nutcracker process on mw2178 is CRITICAL: Connection refused by host
[10:09:09] <_joe_> oh I am a moron
[10:09:13] PROBLEM - configured eth on mw2178 is CRITICAL: Connection refused by host
[10:10:23] PROBLEM - nutcracker port on mw2178 is CRITICAL: Connection refused by host
[10:10:45] PROBLEM - salt-minion processes on mw2178 is CRITICAL: Connection refused by host
[10:10:54] PROBLEM - Disk space on mw2178 is CRITICAL: Connection refused by host
[10:10:55] PROBLEM - dhclient process on mw2178 is CRITICAL: Connection refused by host
[10:13:35] PROBLEM - Host mw2178 is DOWN: PING CRITICAL - Packet loss = 100%
[10:14:51] operations, RESTBase, Monitoring, Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148211 (mobrovac) >>! In T78514#1147964, @gerritbot wrote: > Change 199264 merged by Filippo Giunchedi: > send additional...
[10:15:34] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[10:17:04] godog: ^^ seems too much info from C* :/
[10:17:26] <_joe_> mavhc: what?
[10:17:46] <_joe_> err mobrovac what gives you the impression it's too much info?
[10:18:40] euh well after restarting C* problems with graphite1k1 started to appear
[10:18:43] RECOVERY - salt-minion processes on mw2178 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:18:54] RECOVERY - dhclient process on mw2178 is OK: PROCS OK: 0 processes with command name dhclient
[10:18:54] RECOVERY - Disk space on mw2178 is OK: DISK OK
[10:18:54] RECOVERY - Host mw2178 is UP: PING OK - Packet loss = 0%, RTA = 45.58 ms
[10:18:58] that might be just a correlation, and not a causation though
[10:19:15] <_joe_> well that is /not/ a problem on graphite
[10:19:50] there's been a steady increase in metrics received, we'll see how many new there are
[10:19:54] RECOVERY - nutcracker port on mw2178 is OK: TCP OK - 0.000 second response time on port 11212
[10:20:04] RECOVERY - nutcracker process on mw2178 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[10:20:04] RECOVERY - DPKG on mw2178 is OK: All packages OK
[10:20:04] RECOVERY - RAID on mw2178 is OK: OK: no RAID installed
[10:20:13] <_joe_> graphite is just telling us that we have more than 500 responses with code 5xx per minute in the last 15 minutes
[10:20:14] RECOVERY - configured eth on mw2178 is OK: NRPE: Unable to read output
[10:20:23] <_joe_> mobrovac: https://gdash.wikimedia.org/dashboards/reqerror/
[10:21:32] ah i see
[10:21:38] didn't have that magic link :)
[10:21:43] _joe_: thnx
[10:21:52] too many magic links, hard to keep track
[10:22:01] <_joe_> mobrovac: you'll get used
[10:22:35] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:24:26] operations, ops-codfw, codfw-appserver-setup, wikis-in-codfw: mw2208-2209, mw2213 have unreachable mgmt interfaces - https://phabricator.wikimedia.org/T93857#1148230 (Joe)
[10:28:13] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:28:53] RECOVERY - puppet last run on mw2088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:34:40] (CR) Hashar: [C: 1] "Seems most the errors are related to puppet:///private/ . I am wondering whether it should be reported upstream, but 'private' might be wi" [puppet] - https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) (owner: Tim Landscheidt)
[11:03:19] PROBLEM - dhclient process on mw2195 is CRITICAL: Connection refused by host
[11:03:48] PROBLEM - mediawiki-installation DSH group on mw2195 is CRITICAL: Host mw2195 is not in mediawiki-installation dsh group
[11:04:09] PROBLEM - nutcracker port on mw2195 is CRITICAL: Connection refused by host
[11:04:28] PROBLEM - nutcracker process on mw2195 is CRITICAL: Connection refused by host
[11:04:39] PROBLEM - puppet last run on mw2195 is CRITICAL: Connection refused by host
[11:04:59] PROBLEM - DPKG on mw2195 is CRITICAL: Connection refused by host
[11:04:59] PROBLEM - salt-minion processes on mw2195 is CRITICAL: Connection refused by host
[11:05:19] PROBLEM - Disk space on mw2195 is CRITICAL: Connection refused by host
[11:06:38] PROBLEM - RAID on mw2195 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:07:08] PROBLEM - configured eth on mw2195 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:07:45] (PS1) Gilles: Basic role for Sentry [puppet] - https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956)
[11:10:00] <_joe_> sorry I forgot to schedule downtime
[11:10:13] <_joe_> gilles: where do you want to use sentry?
[11:10:20] <_joe_> not in prod, I hope [11:10:21] 6operations, 7Graphite: scale graphite deployment (tracking) - https://phabricator.wikimedia.org/T85451#1148278 (10fgiunchedi) [11:10:22] 6operations, 7Graphite, 5Patch-For-Review: replace txstatsd - https://phabricator.wikimedia.org/T90111#1148276 (10fgiunchedi) 5duplicate>3Open reopening, still need to replace txstatsd with statsite [11:10:51] <_joe_> I had a prod installation of sentry at WORK~1, it required SIGNIFICANT ops work to just stay up [11:11:07] <_joe_> and we had not even 1/1000th of wikimedia traffic [11:12:40] RECOVERY - dhclient process on mw2195 is OK: PROCS OK: 0 processes with command name dhclient [11:12:49] RECOVERY - DPKG on mw2195 is OK: All packages OK [11:12:49] RECOVERY - salt-minion processes on mw2195 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [11:12:49] RECOVERY - RAID on mw2195 is OK: OK: no RAID installed [11:13:09] RECOVERY - Disk space on mw2195 is OK: DISK OK [11:13:20] RECOVERY - configured eth on mw2195 is OK: NRPE: Unable to read output [11:13:29] RECOVERY - nutcracker port on mw2195 is OK: TCP OK - 0.000 second response time on port 11212 [11:13:40] RECOVERY - nutcracker process on mw2195 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [11:21:31] (03PS1) 10Filippo Giunchedi: statsite: new module [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) [11:21:33] (03PS1) 10Filippo Giunchedi: statsdlb: replace txstatsd with statsite [puppet] - 10https://gerrit.wikimedia.org/r/199600 (https://phabricator.wikimedia.org/T90111) [11:21:59] PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Puppet has 6 failures [11:24:46] mobrovac: yep that's a lot of metrics 
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=graphite1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1426676089&v=85.0&m=part_max_used&vl=%25&ti=Maximum%20Disk%20Space%20Used&z=large [11:25:08] RECOVERY - puppet last run on mw2195 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:26:00] hehe, godog you can see the increasing steps on the graphs, which match exactly my restarts of C* nodes [11:31:23] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148327 (10fgiunchedi) btw that added a lot of metrics, consuming ~9% of disk space on graphite1001, e.g. ``` graphite1001:/... [11:32:07] mobrovac: in the ganglia graphs? that's a lot of restarts :) graphite rate-limits the number of creates so that's probably that on those graphs [11:32:18] ah [11:32:48] godog: do we have a solution for this (other than getting rid of the new metrics) ? [11:34:12] mobrovac: I think we're fine with limiting them to the interesting ones, not getting rid altogether, to answer your question ATM no because graphite is single-machine [11:34:57] ok [11:38:08] (03PS1) 10Giuseppe Lavagetto: dsh: add missing codfw appservers to the mediawiki_installation group [puppet] - 10https://gerrit.wikimedia.org/r/199605 [11:38:10] (03PS1) 10Giuseppe Lavagetto: mediawiki: re-enable monitoring in codfw [puppet] - 10https://gerrit.wikimedia.org/r/199606 [11:42:29] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148347 (10mobrovac) >>! In T78514#1148327, @fgiunchedi wrote: > I think we need to filter the column family metrics to the r... 
[11:45:23] _joe_: in production, there's already been an ops thread about it and people advised to use trebuchet for its deployment [11:45:29] (03CR) 10Filippo Giunchedi: [C: 04-1] VarnishStatusCollector for diamond. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 1020after4) [11:45:32] PROBLEM - mediawiki-installation DSH group on mw2184 is CRITICAL: Host mw2184 is not in mediawiki-installation dsh group [11:45:42] <_joe_> gilles: well I'm not speaking about deploy [11:45:48] <_joe_> I'm speaking about scaling [11:46:00] _joe_: right now the goal isn't to capture all our traffic, but a specific area of the site (UploadWizard pages) [11:46:01] <_joe_> I don't know what is the message rate you expect [11:46:08] <_joe_> gilles: oh ok [11:46:20] <_joe_> gilles: then you're fine :) [11:46:23] scaling will be a separate project, and the sentry team has guidelines about what a large scale setup looks like [11:46:29] <_joe_> yeah [11:46:35] <_joe_> broken guidelines too :P [11:46:51] at least they're very responsive on irc [11:46:53] <_joe_> but, we'll see when we get there [11:47:29] <_joe_> mine was more like a warning; I do love sentry btw [11:47:57] we don't expect much in terms of performance from the default setup [11:48:00] <_joe_> it saved me incalculable time and helped me catch quite a lot of bugs [11:48:37] <_joe_> the default might not be good for what you're trying to do either, but let's see [11:51:30] sigh puppet "invalid byte sequence in utf-8" https://phabricator.wikimedia.org/T93614 [11:51:33] SIGPUPPET [11:52:41] PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: Puppet has 6 failures [11:54:21] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:04:55] (03CR) 10Giuseppe Lavagetto: [C: 032] dsh: add missing codfw appservers to the mediawiki_installation group [puppet] - 10https://gerrit.wikimedia.org/r/199605
(owner: 10Giuseppe Lavagetto) [12:06:06] 6operations, 6Labs, 10hardware-requests: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1148421 (10faidon) I don't think that the "under warranty" bit is a dealbreaker. The point of my comment on IRC is that we should be prepared for a catastrophic event for one th... [12:07:54] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1148422 (10mark) [12:08:12] 6operations, 6Labs, 10hardware-requests: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1148424 (10yuvipanda) +1 on having a hot spare. I remember the close-to-heart-attack several people got when we thought virt1000's motherboard had fried when only one of the lig... [12:14:31] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0] [12:16:16] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1148434 (10Aklapper) As written in T93760#1146356 the ACL requirements need to be clarified first. [12:17:24] (03CR) 10Mark Bergsma: [C: 04-2] "They're still there, just like all others from the same batch." [dns] - 10https://gerrit.wikimedia.org/r/199287 (owner: 10Faidon Liambotis) [12:18:30] (03CR) 10Mark Bergsma: [C: 031] Kill toolserver IPv4/IPv6 subnets [dns] - 10https://gerrit.wikimedia.org/r/199288 (owner: 10Faidon Liambotis) [12:20:32] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1148439 (10faidon) What "does not have a lot of margin on IO bandwidth and storage capacity" mean exactly? Which resource is near exhaustion (IOPS, bandwidth or capacity) in your analysis... [12:25:31] (03CR) 10Mark Bergsma: [C: 04-1] "multatuli/slauerhoff (and its array) remain. The rest is gone indeed." 
[dns] - 10https://gerrit.wikimedia.org/r/199287 (owner: 10Faidon Liambotis) [12:26:32] 7Blocked-on-Operations, 7Puppet, 6operations, 10Beta-Cluster: Setup a mediawiki03 (or what not) on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1148442 (10csteipp) The last scan found an issue in varnish, so there is benefit to having it goo throu... [12:41:51] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Puppet has 1 failures [12:44:43] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1148459 (10mark) @RobH: I don't believe we've discussed this recently, and I'm not sure we've done our due diligence on making sure that the data will be secure enough, the e-mail side still wo... [12:46:11] RECOVERY - mediawiki-installation DSH group on mw2184 is OK: OK [12:55:42] RECOVERY - mediawiki-installation DSH group on mw2148 is OK: OK [12:59:11] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:04:51] RECOVERY - mediawiki-installation DSH group on mw2195 is OK: OK [13:07:31] RECOVERY - mediawiki-installation DSH group on mw2088 is OK: OK [13:31:51] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1148587 (10Aklapper) [13:31:52] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1148585 (10Aklapper) 5Open>3stalled Setting status to STALLED; please reset once discussed [13:34:11] (03CR) 10Tim Landscheidt: "@hashar: If puppet-lint had an option to white-list certain directories apart from modules/, that would be very convenient indeed. 
Do you" [puppet] - 10https://gerrit.wikimedia.org/r/198116 (https://phabricator.wikimedia.org/T87132) (owner: 10Tim Landscheidt) [13:40:00] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Puppet has 1 failures [13:49:25] mobrovac: btw, for applying roles to hosts for restbase on staging, please amend nodes/labs/staging.yaml in ops/puppet instead of using wikitech interface :) is the new hotness [13:51:35] YuviPanda: cool, me likes that better [13:51:48] mobrovac: :) [13:52:05] YuviPanda: can that be used for beta as well? or nodes/labs is not *that* hot? :) [13:52:21] mobrovac: it totally can be, yeah. [13:52:27] good to know [13:52:41] mobrovac: beta already has stuff setup for auto puppet / salt signing and auto puppetmaster setting, so don’t have to do that manually [13:52:51] mobrovac: but you might want to give the releng folks a headsup if you start using that with beta [13:53:34] YuviPanda: RESTBase is already set up in beta, so won't be poking around it now, was just curious :P [13:53:41] mobrovac: :) [13:54:22] (03PS5) 10Matanya: nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 [13:54:59] (03PS6) 10Matanya: nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 [13:55:22] 6operations, 6MediaWiki-Core-Team, 6Multimedia, 6Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1148666 (10Qgil) [13:56:37] 6operations, 6Release-Engineering: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#1148673 (10zeljkofilipin) [13:58:48] I have 18 pending puppet patches, most of them easy to review, would anyone please help me with that?
i prefer avoiding getting into a rebase-loop [13:59:05] 6operations, 10MediaWiki-extensions-Graph, 6Services, 10service-template-node, 7service-runner: Deploy graphoid service into production - https://phabricator.wikimedia.org/T90487#1148697 (10mobrovac) [13:59:12] poking andrewbogott akosiaris and _joe_ [13:59:43] matanya: I can do a few, in a few minutes... [13:59:49] thanks [14:00:04] chasemp: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T1400). [14:00:11] nope [14:00:17] nothing this week [14:00:32] at your convenience : https://gerrit.wikimedia.org/r/#/q/owner:%22Matanya+%253Cmatanya%2540foss.co.il%253E%22+status:open,n,z [14:02:48] (03CR) 10Steinsplitter: [C: 031] "@Alex Monk: We generally whitelist domains for GWT upload. Only trusted users has GWT / upload by urls access. There was never a problem.." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194913 (https://phabricator.wikimedia.org/T91630) (owner: 10Odder) [14:06:51] (03CR) 10Ottomata: [C: 04-1] "Questions!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) (owner: 10Nuria) [14:06:55] (03CR) 10Andrew Bogott: [C: 032] nova: lint compute.pp [puppet] - 10https://gerrit.wikimedia.org/r/195535 (owner: 10Matanya) [14:08:21] (03Abandoned) 10coren: Add labs.eqiad.wmnet. subnet [dns] - 10https://gerrit.wikimedia.org/r/194865 (https://phabricator.wikimedia.org/T63897) (owner: 10coren) [14:08:40] (03PS2) 10Andrew Bogott: labs_vmbuilder: resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195764 (owner: 10Matanya) [14:09:02] matanya: want to add me as a reviewer to a few more?
I don’t see much in my list [14:09:13] yes andrewbogott thanks [14:10:29] (03CR) 10Andrew Bogott: [C: 032] labs_vmbuilder: resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195764 (owner: 10Matanya) [14:13:00] matanya: oops, I merged a patch that quoted resources and now I see a patch from you that unquotes them [14:13:08] i take it there was consensus against quotes? [14:14:25] 10Ops-Access-Requests, 6operations: Access request: +2 on cassandra submodule for services team members - https://phabricator.wikimedia.org/T93775#1148830 (10Ottomata) I'm fine with +2 for cassandra puppet repo, but I somehow doubt other opsens will like it. > The other issue is that we can't use the puppet c... [14:15:02] (03CR) 10Andrew Bogott: [C: 031] swift_new: lint and resource quoting [puppet] - 10https://gerrit.wikimedia.org/r/195607 (owner: 10Matanya) [14:15:25] godog: want to merge and babysit ^ ? [14:15:59] 10Ops-Access-Requests, 6operations: Access request: +2 on cassandra submodule for services team members - https://phabricator.wikimedia.org/T93775#1148831 (10Ottomata) Either way, you will have to get review from opsen, cause they aren't going to give you +2 on operations/puppet. Why not just accompany the su... [14:16:11] andrewbogott: yeah, I said I'd merge it and didn't :) will do now [14:16:21] thx [14:16:47] (03PS4) 10Filippo Giunchedi: swift_new: lint and resource quoting [puppet] - 10https://gerrit.wikimedia.org/r/195607 (owner: 10Matanya) [14:16:54] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift_new: lint and resource quoting [puppet] - 10https://gerrit.wikimedia.org/r/195607 (owner: 10Matanya) [14:16:56] thanks matanya ! 
[14:17:24] :) [14:17:41] (03CR) 10Andrew Bogott: [C: 032] mysql: selector outside a resource + 4 spaces [puppet] - 10https://gerrit.wikimedia.org/r/195518 (owner: 10Matanya) [14:17:46] yes andrewbogott , there was consensus against [14:18:01] 6operations, 10RESTBase, 10RESTBase-Cassandra: graphs for Cassandra metrics - https://phabricator.wikimedia.org/T93884#1148834 (10Eevans) 3NEW [14:18:21] matanya: ok. Sorry, I probably broke the rebase for another patch on that same code then :( [14:18:27] 6operations, 10RESTBase, 10RESTBase-Cassandra: graphs for Cassandra metrics - https://phabricator.wikimedia.org/T93884#1148845 (10Eevans) [14:18:47] no worries andrewbogott i can fix, what patch ? [14:20:34] matanya: the labs_vmbuilder patch I just merged [14:21:12] (03CR) 10Andrew Bogott: [C: 032] backup: resource attributes quoting [puppet] - 10https://gerrit.wikimedia.org/r/195661 (owner: 10Matanya) [14:22:06] (03CR) 10Andrew Bogott: [C: 04-1] "This one quotes where it should unquote :)" [puppet] - 10https://gerrit.wikimedia.org/r/195627 (owner: 10Matanya) [14:23:27] Coren: Subject: DegradedArray event on /dev/md126:labstore2001 [14:23:27] (03PS1) 10Matanya: labs_vmbuilder: unquote ensure [puppet] - 10https://gerrit.wikimedia.org/r/199614 [14:23:43] (03CR) 10Ottomata: "It seems I am being outnumbered here! :p But, at least, I would appreciate some responses to my arguments. Thus far the only one I have" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [14:23:45] (03CR) 10Andrew Bogott: [C: 04-1] "one addition, inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [14:23:46] andrewbogott: fix ^^ [14:24:03] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148863 (10fgiunchedi) >>! In T78514#1148347, @mobrovac wrote: >>>! 
In T78514#1148327, @fgiunchedi wrote: >> I think we need... [14:24:12] (03PS2) 10Andrew Bogott: labs_vmbuilder: unquote ensure [puppet] - 10https://gerrit.wikimedia.org/r/199614 (owner: 10Matanya) [14:24:26] (03CR) 10Andrew Bogott: [C: 032] labs_vmbuilder: unquote ensure [puppet] - 10https://gerrit.wikimedia.org/r/199614 (owner: 10Matanya) [14:24:30] (03CR) 10Matanya: limn: minor lint and Resource attributes quoting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195616 (owner: 10Matanya) [14:24:45] paravoid: The array is being rebuilt already. [14:25:03] paravoid: at 30% [14:25:08] k [14:26:01] (03CR) 10Andrew Bogott: [C: 032] ldap: selector outside a resource [puppet] - 10https://gerrit.wikimedia.org/r/195524 (owner: 10Matanya) [14:27:20] matanya: I’m going to eat some breakfast and let that last bunch of patches settle in a bit :) [14:27:24] (03PS2) 10Matanya: dynamicproxy: resource attributes quote [puppet] - 10https://gerrit.wikimedia.org/r/195627 [14:27:31] thanks much andrewbogott [14:27:51] (03PS3) 10Hashar: zuul: lint [puppet] - 10https://gerrit.wikimedia.org/r/195769 (owner: 10Matanya) [14:28:53] (03CR) 10Hashar: [C: 031 V: 031] zuul: lint [puppet] - 10https://gerrit.wikimedia.org/r/195769 (owner: 10Matanya) [14:30:29] hasharLunch: What server will ^ apply to? [14:31:11] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra/CQL query interface monitoring - https://phabricator.wikimedia.org/T93886#1148875 (10Eevans) 3NEW [14:31:35] (03CR) 10Andrew Bogott: [C: 032] zuul: lint [puppet] - 10https://gerrit.wikimedia.org/r/195769 (owner: 10Matanya) [14:31:55] andrewbogott: just gallium / lanthanum and the ci machine. 
It is just some whitespaces though :) [14:32:46] hashar: still I prefer to watch :) [14:35:33] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra/CQL query interface monitoring - https://phabricator.wikimedia.org/T93886#1148885 (10Eevans) [14:35:34] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148884 (10Eevans) [14:35:35] 6operations, 10RESTBase, 10RESTBase-Cassandra: graphs for Cassandra metrics - https://phabricator.wikimedia.org/T93884#1148886 (10Eevans) [14:36:29] (03CR) 10Hashar: [C: 04-1] "I never !log Zuul config changes myself. The git log and reflog are enough to find out what happened if needed." [puppet] - 10https://gerrit.wikimedia.org/r/197386 (owner: 10Legoktm) [14:37:11] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:40:05] I should maybe do SWAT today, it's been a while. [14:40:24] Ooh, it's empty, even better. [14:40:25] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1148898 (10RobH) I realize I corrected myself from employee to NDA. It was intentional since we don't have an employee list in phabricator. [14:41:16] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1148903 (10RobH) @Chasemp and I planned to test this once we have the project made. The project creation has to happen for the ACL testing. 
[14:41:22] marktraceur: feel free to proceed changes in operations/mediawiki-config instead :D [14:41:29] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1148905 (10RobH) 5stalled>3Open [14:41:29] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1148906 (10RobH) [14:41:55] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148907 (10Eevans) > btw that added a lot of metrics, consuming ~9% of disk space on graphite1001, e.g. Is that too much? >... [14:42:40] hashar: Or I could do my real job [14:43:31] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra/CQL query interface monitoring - https://phabricator.wikimedia.org/T93886#1148909 (10Eevans) [14:43:50] (03PS2) 10Giuseppe Lavagetto: mediawiki: re-enable monitoring in codfw [puppet] - 10https://gerrit.wikimedia.org/r/199606 [14:47:30] (03CR) 10Hashar: [C: 04-1] "I ran it through the puppet compiler for gallium.wikimedia.org and lanthanum.eqiad.wmnet: http://puppet-compiler.wmflabs.org/642/change/19" [puppet] - 10https://gerrit.wikimedia.org/r/196175 (https://phabricator.wikimedia.org/T92475) (owner: 10Faidon Liambotis) [14:51:01] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [14:51:05] marktraceur: Not empty anymore, aude is apparently creating a patch. [14:51:06] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148956 (10GWicke) >>! In T78514#1148863, @fgiunchedi wrote: >>> not sure what's with the random suffix for example >> >> Th... [14:51:09] Ah. 
[14:51:18] indeed [14:51:39] * aude could deploy myself if that was more convenient [14:51:41] (03CR) 10JanZerebecki: "Having FIPS compliance does not seem like a positive thing. Having FIPS certification is very much a negative thing. Tests that rate negat" [puppet] - 10https://gerrit.wikimedia.org/r/199582 (owner: 10BBlack) [14:51:49] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: re-enable monitoring in codfw [puppet] - 10https://gerrit.wikimedia.org/r/199606 (owner: 10Giuseppe Lavagetto) [14:52:24] (03CR) 10Jforrester: "Replacing "Bug 12345" with "T14345" would be good too… follow-up maybe?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199574 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [14:53:01] (03CR) 10Jforrester: [C: 031] Make references to tasks/bugs more consistent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199574 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [14:54:56] aude: I wouldn't mind that, if you're the only one [14:54:59] If not I'll do it [14:55:08] marktraceur: ok, can take care of it [14:55:08] It has seriously been a *long* time. [14:55:21] might be a few minutes late, while waiting for jenkins etc [14:55:43] No problem. [14:56:26] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1148964 (10fgiunchedi) >>! In T78514#1148907, @Eevans wrote: >> btw that added a lot of metrics, consuming ~9% of disk space... [14:57:34] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1148965 (10RobH) [14:58:26] 6operations, 7HTTPS, 3HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1148969 (10BBlack) We can try in case it's a lingering mistake, but if it's intentional I'm sure they'll just adapt on their side. 
[14:59:40] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1148988 (10mark) [15:00:05] manybubbles, anomie, ^d, thcipriani, marktraceur: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T1500). Please do the needful. [15:01:47] 6operations, 7HTTPS, 3HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1149006 (10faidon) Well, CN belongs in ulsfo anyway, geographically speaking. It stayed in eqiad during the ulsfo rollout on purpose, in ord... [15:02:25] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:02:30] 10Ops-Access-Requests, 6operations: Access request: +2 on cassandra submodule for services team members - https://phabricator.wikimedia.org/T93775#1149007 (10GWicke) [15:02:34] aude: OK, it's on you. I'm running to get breakfast. [15:02:40] marktraceur: k [15:03:27] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1149011 (10jeremyb) [15:03:27] 6operations, 6Phabricator, 6Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1149009 (10jeremyb) 5stalled>3Open maybe midair collision? phab doesn't do edit conflict resolution AFAIK [15:03:54] * aude will have to do scap [15:04:02] since we introduce a new message :/ [15:04:24] 10Ops-Access-Requests, 6operations: Access request: +2 on cassandra submodule for services team members - https://phabricator.wikimedia.org/T93775#1145807 (10GWicke) >>! In T93775#1148830, @Ottomata wrote: > I'm fine with +2 for cassandra puppet repo, but I somehow doubt other opsens will like it. > >> The ot... 
[15:04:40] aude: we have a patch that will make scap faster to rebuild l10n [15:04:53] bd808: yay :) [15:04:58] if you are going to scap we should get it first and save you 5-10m [15:05:37] * bd808 checks to make sure it is working right in beta [15:05:39] bd808: ok [15:05:54] i might pull in an core patch also [15:06:07] which might take a few minutes to get ready [15:06:28] 10Ops-Access-Requests, 6operations: Access request: +2 on cassandra submodule for services team members - https://phabricator.wikimedia.org/T93775#1149027 (10Ottomata) Whoops not sure how that line got linked to: https://github.com/wikimedia/integration-config/blob/master/zuul/layout.yaml#L2739 Does that mak... [15:07:44] (03CR) 10BryanDavis: "Tested via cherry-pick on deployment-bastion. "Normal" scap run times went back down to <2 minutes." [tools/scap] - 10https://gerrit.wikimedia.org/r/199318 (https://phabricator.wikimedia.org/T93737) (owner: 10BryanDavis) [15:08:15] ^d, twentyafterfour: if one of you will merge ^ I will sync it in beta + prod [15:08:29] * ^d looks [15:09:23] (03CR) 10Chad: [C: 032] Copy l10n CDB files to rebuildLocalisationCache.php tmp dir [tools/scap] - 10https://gerrit.wikimedia.org/r/199318 (https://phabricator.wikimedia.org/T93737) (owner: 10BryanDavis) [15:09:40] sweet [15:09:46] :) [15:10:12] (03Merged) 10jenkins-bot: Copy l10n CDB files to rebuildLocalisationCache.php tmp dir [tools/scap] - 10https://gerrit.wikimedia.org/r/199318 (https://phabricator.wikimedia.org/T93737) (owner: 10BryanDavis) [15:13:23] (03PS1) 10BBlack: remap CN -> ulsfo [dns] - 10https://gerrit.wikimedia.org/r/199621 [15:14:37] 6operations, 7Mail, 7Monitoring: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1149081 (10Dzahn) _should_ have been resolved back in T84150 but hasn't? also see: https://phabricator.wikimedia.org/rOPUP8135b7bf6f6ebe6e37e1a8961520f712c9d052ca https://phabricator.wikimedia.org/rOPUP754c5d5d... 
[15:14:39] (03CR) 10Faidon Liambotis: [C: 032] remap CN -> ulsfo [dns] - 10https://gerrit.wikimedia.org/r/199621 (owner: 10BBlack) [15:14:41] 7Puppet, 6Multimedia, 6Release-Engineering, 6Scrum-of-Scrums, and 3 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1149083 (10Gilles) a:3Gilles [15:14:47] bblack: shall I push? [15:14:54] (03CR) 10BBlack: [C: 032] remap CN -> ulsfo [dns] - 10https://gerrit.wikimedia.org/r/199621 (owner: 10BBlack) [15:14:55] bd808: let me know when you are done [15:14:58] too late? :) [15:15:07] heh [15:15:14] aude: will do. syncing via trebuchet now [15:15:17] k [15:16:39] 6operations, 7Mail, 7Monitoring: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1149090 (10Dzahn) existing check for mailman queue size: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sodium&service=mailman_queue_size checks /var/lib/mailman/qfiles/in while this tasks... [15:17:42] 7Puppet, 10Tool-Labs: Puppetize adding new node to OGE - https://phabricator.wikimedia.org/T88712#1149092 (10coren) p:5Triage>3Low Filing this in the "would be nice to have, lots of work" category for now. 
[15:18:26] !log trebuchet fetch of scap failed on mw1222 with return code 128 [15:18:35] Logged the message, Master [15:18:40] 7Puppet, 10Tool-Labs: Puppetize adding new node to OGE - https://phabricator.wikimedia.org/T88712#1149096 (10coren) a:5coren>3None [15:19:28] !log trebuchet checkout of scap failed on mw1113, mw1222, and mw1104 with return code 30 [15:19:33] Logged the message, Master [15:19:46] !log Updated scap to include 4a63a63 (Copy l10n CDB files to rebuildLocalisationCache.php tmp dir) [15:19:50] Logged the message, Master [15:19:51] aude: ^ all done [15:19:54] thanks [15:20:16] 7Puppet, 10Tool-Labs: Fully puppetize Grid Engine (Tracking) - https://phabricator.wikimedia.org/T88711#1149098 (10coren) a:5coren>3None [15:20:37] 7Puppet, 10Tool-Labs, 5Patch-For-Review: Puppetize adding a host to a particular queue - https://phabricator.wikimedia.org/T88713#1149102 (10coren) p:5Triage>3Low a:5coren>3None [15:20:59] (03PS3) 10Faidon Liambotis: Kill esams dead/non-existent hosts [dns] - 10https://gerrit.wikimedia.org/r/199287 [15:21:01] (03PS3) 10Faidon Liambotis: Kill toolserver IPv4/IPv6 subnets [dns] - 10https://gerrit.wikimedia.org/r/199288 [15:21:03] (03PS4) 10Faidon Liambotis: Move maerlant out of .esams.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/199289 [15:22:30] (03CR) 10Faidon Liambotis: [C: 032] Kill esams dead/non-existent hosts [dns] - 10https://gerrit.wikimedia.org/r/199287 (owner: 10Faidon Liambotis) [15:22:41] (03CR) 10Faidon Liambotis: [C: 032] Kill toolserver IPv4/IPv6 subnets [dns] - 10https://gerrit.wikimedia.org/r/199288 (owner: 10Faidon Liambotis) [15:22:51] <^d> _joe_: Is mw2048 ok again and could be brought back in as a scap proxy? 
[15:23:47] <_joe_> ^d: it should, but I have an interview later (around SWAT time) so I preferred to delay putting it back to tomorrow [15:23:58] <^d> Ok no worries, just checking back in [15:24:43] * aude waits for jenkins [15:26:53] and wonders if zuul is stuck again [15:27:40] No, it looks ok [15:27:47] think it's just maybe slow [15:27:49] but not stuck [15:27:54] :/ [15:29:07] (03CR) 10BBlack: "Re: FIPS, I don't see it as a bad thing, but I totally agree it has nothing to do with security. As for the FS argument: I tried some pat" [puppet] - 10https://gerrit.wikimedia.org/r/199582 (owner: 10BBlack) [15:30:43] ok, i see the jenkins job [15:36:23] 81%.... [15:36:27] almost done [15:36:51] but then have a second patch :/ [15:40:25] 6operations, 7HTTPS, 3HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1149141 (10BBlack) The switch of CN to ulsfo was done in https://gerrit.wikimedia.org/r/#/c/199621/ about half an hour ago. Our TTLs are co... [15:43:12] 6operations: re-deploy cp4009 - https://phabricator.wikimedia.org/T93640#1149160 (10RobH) 5Open>3Resolved a:3RobH brandon took care of this on the task for the onsite replacement, all done, resolving. [15:44:56] 7Puppet, 6Multimedia, 6Release-Engineering, 6Scrum-of-Scrums, and 3 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1149170 (10Gilles) [15:47:54] still waiting on jenkins [15:52:44] if i was fixing a more critical fatal error, i would not be this patient for jenkins :( [15:52:52] (03PS1) 10BBlack: enable OCSP Stapling everywhere [puppet] - 10https://gerrit.wikimedia.org/r/199624 [15:54:04] aude: write fewer tests? :P [15:54:17] greg-g: it's core [15:54:53] the mediawiki-phpunit-zend job is always so slow [15:54:58] yeah [15:55:14] i don't think it was always this slow but maybe it's including more things (like all extensions?) 
[15:55:51] <^d> We write more tests ;-) [15:56:11] <^d> Actually, best thing you can do to make a test faster is to not use MediaWikiTestCase [15:56:12] <^d> :p [15:56:17] yeah [15:56:26] and probably time for more profiling [15:56:37] like when we fixed tests using bcrypt everywhere [15:58:47] !log aude Started scap: Wikidata bug fixes and fix rollback bug in core [15:58:52] Logged the message, Master [16:00:11] PROBLEM - check_raid on barium is CRITICAL: CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded [16:02:05] (03CR) 10JanZerebecki: "That was in I87616455abd58c986aa960348fc20c017f097716 and I think we didn't reject it, but just instead went with first only disabling RC4" [puppet] - 10https://gerrit.wikimedia.org/r/199582 (owner: 10BBlack) [16:05:12] PROBLEM - check_raid on barium is CRITICAL: CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded [16:06:22] 6operations, 10ops-eqiad, 10ops-fundraising: barium has a failed HDD - https://phabricator.wikimedia.org/T93899#1149257 (10Jgreen) 3NEW [16:06:32] 6operations, 10ops-eqiad, 10ops-fundraising: barium has a failed HDD - https://phabricator.wikimedia.org/T93899#1149265 (10Jgreen) p:5Triage>3High [16:09:02] 6operations, 7Graphite: logins on graphite - https://phabricator.wikimedia.org/T93158#1149300 (10RobH) I can confirm that for me, my graphite and icinga login are identical (ldap) and both function. 
[16:10:12] PROBLEM - check_raid on barium is CRITICAL: CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded [16:12:13] ACKNOWLEDGEMENT - check_raid on barium is CRITICAL: CRITICAL: MegaSAS 2 logical, 4 physical: a0/v1 (2 disk array) degraded Jeff_Green see phabricator T93899 [16:21:48] !log aude Finished scap: Wikidata bug fixes and fix rollback bug in core (duration: 23m 01s) [16:21:53] Logged the message, Master [16:22:30] 6operations, 10ops-codfw: mw2050 management unreachable - https://phabricator.wikimedia.org/T93729#1149334 (10RobH) and now it just works.... oh well, its working now, resolving task. if it happens again this can show up in search histories. [16:22:37] 6operations, 10ops-codfw: mw2050 management unreachable - https://phabricator.wikimedia.org/T93729#1149335 (10RobH) 5Open>3Resolved a:3RobH [16:24:52] (03CR) 10BBlack: [C: 032] enable OCSP Stapling everywhere [puppet] - 10https://gerrit.wikimedia.org/r/199624 (owner: 10BBlack) [16:25:33] done :) [16:26:49] 6operations, 6Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1149372 (10fgiunchedi) btw the missing metrics should be related to T85641 [16:27:09] bblack, would you please review https://gerrit.wikimedia.org/r/198805 ? [16:28:22] 6operations, 7Graphite: logins on graphite - https://phabricator.wikimedia.org/T93158#1149379 (10RobH) clarification: so i tested the initial login prompt on graphite. The additional login option once the graphite web GUI loads doesn't work for me either. [16:34:33] dr0ptp4kt: yes [16:34:41] bblack: thx [16:35:18] robh: Quick review? https://gerrit.wikimedia.org/r/#/c/199542/1 [16:36:11] (03CR) 10RobH: [C: 031] Add labstore200[12] minimal configuration [puppet] - 10https://gerrit.wikimedia.org/r/199542 (owner: 10coren) [16:36:22] Danke. [16:36:25] you just wanted a +1 as proof you arent nuts right? 
[16:36:28] looks legit to me [16:36:47] robh: Yeah, it's the "Doesn't look like it'll break the world" check. :-) [16:36:53] cool [16:37:20] (03PS2) 10coren: Add labstore200[12] minimal configuration [puppet] - 10https://gerrit.wikimedia.org/r/199542 [16:41:18] (03CR) 10coren: [C: 032] Add labstore200[12] minimal configuration [puppet] - 10https://gerrit.wikimedia.org/r/199542 (owner: 10coren) [16:42:25] 6operations, 6Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1149436 (10fgiunchedi) a:3fgiunchedi [16:42:40] 6operations, 6Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1106360 (10fgiunchedi) p:5Triage>3Normal [16:43:39] (03CR) 10Ori.livneh: [C: 04-1] statsite: new module (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [16:45:45] PROBLEM - Freshness of OCSP Stapling files on amssq33 is CRITICAL: CRITICAL: File /var/cache/ocsp/sni.m.wiktionary.org.ocsp is more than 29100 secs old! [16:45:56] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: puppet fail [16:49:02] 6operations, 7HTTPS, 3HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1149444 (10Chmarkine) Great! https://en.m.wikipedia.org now works in China. According to [[https://zh.wikipedia.org/wiki/Wikipedia_talk:%E7%... [16:50:33] (03CR) 10BBlack: Do not fragment cache with provenance parameter (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/198805 (owner: 10Dr0ptp4kt) [16:51:09] dr0ptp4kt: ^ (minor nits to make PCRE do fewer useless things, basically - I think it does function correctly as-is) [16:52:03] bblack: thanks! will take a close look [16:54:43] paravoid: Did you take a peek at https://gerrit.wikimedia.org/r/#/c/199542/1 after the changes? 
Do the thresholds make sense to you? [16:55:22] 7Puppet, 6Labs, 6Phabricator, 5Patch-For-Review: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1149454 (10Negative24) [16:55:25] wrong commit? [16:55:40] Yeah, coptpasta fail. https://gerrit.wikimedia.org/r/#/c/199297/3 [16:56:25] Ah, hm. I only changed one set of comments. [16:56:30] * Coren fixes. [16:56:32] 600mbps for critical is a bit on the low side I'd say [16:57:01] Too conservative? It /does/ require 10% of the samples. [16:57:03] I'd do 600/800 or so [16:57:21] kk. [16:57:27] and put all that in a role class? [16:57:44] I hate seeing monitoring checks in site.pp [16:57:45] Now? I wanted to avoid merging too many things at once. [16:58:00] (03PS1) 10Rush: Revert "exim4.conf.SMTP_IMAP_MM.erb local mail can cause loops" [puppet] - 10https://gerrit.wikimedia.org/r/199634 [16:58:02] Because I'm bringing other things into those classes. [16:58:09] just another commit before that that moves it to a role class? [16:58:34] Okay, lemme split out the replication functionality away from the other commit and do that first then. [16:58:43] (03PS1) 10Filippo Giunchedi: graphite: enable locking writes [puppet] - 10https://gerrit.wikimedia.org/r/199636 (https://phabricator.wikimedia.org/T86316) [17:00:33] (03CR) 10Rush: [C: 032] Revert "exim4.conf.SMTP_IMAP_MM.erb local mail can cause loops" [puppet] - 10https://gerrit.wikimedia.org/r/199634 (owner: 10Rush) [17:02:24] 6operations, 10hardware-requests, 3wikis-in-codfw: setup deployment server in codfw (tin equivalent) - https://phabricator.wikimedia.org/T91678#1149486 (10mark) Ok, let's assign some bare metal then. 
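A minimal sketch of the threshold logic discussed above for the labstore network saturation check (600/800 Mbps, with at least 10% of samples required to exceed a level). The function name, the sample format, and the reading of "600/800" as warning/critical are assumptions made for illustration; this is not the check that was merged.

```python
from typing import Sequence

# Assumed values, taken from the discussion ("I'd do 600/800 or so",
# "It /does/ require 10% of the samples"); not read from the real check.
WARNING_MBPS = 600.0
CRITICAL_MBPS = 800.0
MIN_FRACTION = 0.10


def saturation_state(
    samples_mbps: Sequence[float],
    warning: float = WARNING_MBPS,
    critical: float = CRITICAL_MBPS,
    min_fraction: float = MIN_FRACTION,
) -> str:
    """Classify a window of throughput samples (hypothetical helper).

    A level only fires when at least `min_fraction` of the samples reach
    it, so one short burst does not page anyone.
    """
    if not samples_mbps:
        return "UNKNOWN"
    total = len(samples_mbps)
    over_critical = sum(1 for s in samples_mbps if s >= critical)
    over_warning = sum(1 for s in samples_mbps if s >= warning)
    if over_critical / total >= min_fraction:
        return "CRITICAL"
    if over_warning / total >= min_fraction:
        return "WARNING"
    return "OK"
```

Raising the critical level from 600 to 800 keeps the 10% rule unchanged; only the bandwidth at which the rule starts counting samples moves.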
[17:02:45] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:02:53] 6operations, 10hardware-requests, 3wikis-in-codfw: setup deployment server in codfw (tin equivalent) - https://phabricator.wikimedia.org/T91678#1149488 (10RobH) a:5mark>3RobH Claiming, I'll allocate a system for this later. [17:04:04] 6operations, 10ops-eqiad, 10ops-fundraising: barium has a failed HDD - https://phabricator.wikimedia.org/T93899#1149490 (10Cmjohnson) a:3Cmjohnson [17:04:35] (03PS3) 10Dr0ptp4kt: Do not fragment cache with provenance parameter [puppet] - 10https://gerrit.wikimedia.org/r/198805 [17:05:12] bblack: ^ i *believe* that does it. lemme know if you want me to do the X-WMF-WPROV assignment with a noncapturing expression as well (and i guess in that case use the \1 backreference instead of the \2 backreference) [17:06:36] 6operations, 10hardware-requests: order new array for dataset1001 - https://phabricator.wikimedia.org/T93118#1149493 (10RobH) a:5RobH>3ArielGlenn I'm assigning this task to @ArielGlenn, after IRC discussion about the space/disk requirements needed for this request. We're going to need to document on tas... [17:11:24] (03CR) 10Dr0ptp4kt: "Addressed comments in PS3." [puppet] - 10https://gerrit.wikimedia.org/r/198805 (owner: 10Dr0ptp4kt) [17:11:46] (03CR) 10Dr0ptp4kt: "Addressed comments in PS3." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/198805 (owner: 10Dr0ptp4kt) [17:12:12] dr0ptp4kt: in interview now, will get back to this on the hour-ish [17:12:31] 6operations, 7Graphite: logins on graphite - https://phabricator.wikimedia.org/T93158#1149514 (10Dzahn) @eevans are you clicking the "login" button in the upper right corner in the UI itself and then get the error? like i did? that doesn't work. but as RobH points out the initial pop-up auth from Apache works... [17:12:41] bblack: thank you. 
happy interviewing [17:16:56] ottomata: there's a monitoring alert about kafka broker messages. FifteenMinuteRate CRITICAL [17:18:33] bah thanks. [17:19:51] hm looks like analytics1021 hasn't been active for a while, from the grafana graph [17:20:08] since 3/21 [17:21:56] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 5162.18515082 [17:22:04] :) [17:22:56] jgage: i just did an election [17:24:08] cool thanks. i guess i missed it getting kicked out because it happened over the weekend. [17:24:42] (03PS1) 10John F. Lewis: planet: use https instead of http [puppet] - 10https://gerrit.wikimedia.org/r/199638 (https://phabricator.wikimedia.org/T70554) [17:24:56] the nice thing is that traffic kept flowing. yay redundancy. [17:24:59] jgage: ^^ mind deploying that? [17:26:21] john, sure. [17:26:32] (03CR) 10Dzahn: [C: 031] planet: use https instead of http [puppet] - 10https://gerrit.wikimedia.org/r/199638 (https://phabricator.wikimedia.org/T70554) (owner: 10John F. Lewis) [17:26:39] (03CR) 10Gage: [C: 032] planet: use https instead of http [puppet] - 10https://gerrit.wikimedia.org/r/199638 (https://phabricator.wikimedia.org/T70554) (owner: 10John F. Lewis) [17:26:41] (03PS2) 10Nuria: Adding template for apache to serve static content [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) [17:28:56] jgage: thanks :) [17:33:10] (03PS1) 10coren: Labs: Split out labstores substance into roles [puppet] - 10https://gerrit.wikimedia.org/r/199639 [17:33:48] paravoid: ^^ [17:34:05] (03CR) 10jenkins-bot: [V: 04-1] Labs: Split out labstores substance into roles [puppet] - 10https://gerrit.wikimedia.org/r/199639 (owner: 10coren) [17:34:32] Ah. Typo. [17:35:15] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [17:35:40] 25 years of C rear their heads. 
:-) [17:36:05] RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 93%, RTA = 568.78 ms [17:36:18] (03PS2) 10coren: Labs: Split out labstores substance into roles [puppet] - 10https://gerrit.wikimedia.org/r/199639 [17:40:11] (03CR) 10Nuria: ">Why is this in the limn module?" [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) (owner: 10Nuria) [17:42:17] (03PS4) 10coren: Labs: Monitor network staturation on labstores [puppet] - 10https://gerrit.wikimedia.org/r/199297 (https://phabricator.wikimedia.org/T92629) [17:42:46] paravoid: ^^ [17:43:02] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 6Release-Engineering, 6Scrum-of-Scrums: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1149620 (10dduvall) [17:43:08] (03PS1) 10Dzahn: planet: use standard for http->https redirects [puppet] - 10https://gerrit.wikimedia.org/r/199641 [17:46:03] (03CR) 10John F. Lewis: [C: 031] planet: use standard for http->https redirects [puppet] - 10https://gerrit.wikimedia.org/r/199641 (owner: 10Dzahn) [17:46:35] (03CR) 10Dzahn: [C: 032] planet: use standard for http->https redirects [puppet] - 10https://gerrit.wikimedia.org/r/199641 (owner: 10Dzahn) [17:47:15] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: puppet fail [17:48:08] (03PS1) 10RobH: raid10-gpt partman recipe corrections [puppet] - 10https://gerrit.wikimedia.org/r/199642 (https://phabricator.wikimedia.org/T93113) [17:48:11] partman, why you so hard to write recipes for? [17:48:47] (03CR) 10RobH: [C: 032] raid10-gpt partman recipe corrections [puppet] - 10https://gerrit.wikimedia.org/r/199642 (https://phabricator.wikimedia.org/T93113) (owner: 10RobH) [17:49:16] yea, partman. agree and so easy to fail [17:50:29] (03CR) 10Mattflaschen: "Ricordisamoa's -1 is just for the commit message, which has been addressed." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [17:50:39] 6operations, 7HTTPS, 5Patch-For-Review: https://planet.wikimedia.org redirects to http://meta.wikimedia.org/wiki/Planet_Wikimedia - https://phabricator.wikimedia.org/T70554#1149649 (10Dzahn) [17:54:27] twentyafterfour: around? [17:55:21] aude: yep [17:55:28] when do you think you will start deploy? [17:55:44] suppose it might be an hour once you make the branch and put it on test [17:56:28] * aude mostly wants to know when wikipedias will be switched to wmf22 [17:57:21] aude: it takes me at least an hour, I'm still slightly slow. Would you like me to let you know right before I push the change ti wmf22? [17:57:27] yeah [17:57:48] i would like to go home and eat first and then be around in case any problems with our code [17:57:57] so, ~45 minutes [17:58:04] and hoo is around also [17:58:30] indeed :) [17:58:42] cool [17:58:56] ok, back in ~45 min [17:58:57] aude: it'll that long for sure [17:59:02] k [17:59:33] (03CR) 10Ricordisamoa: Enable editing of Flow posts, by autoconfirmed users, on mediawikwiki, enwiki, ruwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [18:00:05] twentyafterfour, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T1800). [18:05:38] 6operations, 6Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1149719 (10Tgr) [18:05:55] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:07:21] (03PS1) 10RobH: testing a partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/199645 (https://phabricator.wikimedia.org/T93113) [18:07:45] _joe_: yt? 
have a few minutes to talk about the deprovisioning of tungsten? https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=tungsten [18:08:35] <_joe_> nuria: I think you need the other italian, godog [18:09:21] _joe_: ah you are right! [18:09:29] * godog the other italian [18:09:40] <_joe_> eheh [18:09:47] (03CR) 10RobH: [C: 032] "This is referencing a file that intentionally is NOT maintained by puppet. The reasoning behind this is as follows:" [puppet] - 10https://gerrit.wikimedia.org/r/199645 (https://phabricator.wikimedia.org/T93113) (owner: 10RobH) [18:09:49] <_joe_> godog: we keep getting swapped :P [18:10:18] <_joe_> the other italian in ops, to be more precise [18:10:22] godog, _joe_ : my dislexia is terrible, i totally read godog name like something else [18:11:02] anyways, now that godog and I are acquitted... [18:11:15] <_joe_> nuria: don't worry, we are on the same team, joined at the same time, and we both have nicknames that don't relate to our RL names. It happens now and then :) [18:11:39] haha yeah it did happen at the beginning alright [18:11:51] it happens less after drinkign with them. [18:11:59] drinking even. [18:12:04] nuria: hey! there's some more context at https://phabricator.wikimedia.org/T90591 what were you curious about? [18:12:06] joedog did it? [18:12:12] heh [18:12:16] ya, i am late to the party. godog: let me know if you have a few minutes to talk about tungsten deprovisioning [18:12:21] mutante: ok, that made me lol [18:13:19] godog: You know that Eeventlogging statsd reporting runs on that host, right? 
[18:14:03] godog: so when deprovisioning it we need a new host that also runs those processes (as those are the ones we use for alarming) [18:14:30] _joe_: not sure if anyone is going to work on scaling sentry in the forseeable future, but the plan was to get JS errors into logstash and have a logstash plugin report them to sentry at a controlled rate [18:14:47] that could be used to work around a lot of scaling issues [18:14:54] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 6Scrum-of-Scrums: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-7+wmf2.1 or equivalent - https://phabricator.wikimedia.org/T88798#1149803 (10akosiaris) Hello, this is taking a little bit more than... [18:15:16] <_joe_> tgr: why? [18:15:38] nuria: tungsten has been replaced by graphite1001, everything in puppet is supposed to have moved there too, is it in puppet? [18:15:44] <_joe_> introducing more moving parts instead of using celery (which basically does exactly that)? [18:15:48] 6operations, 6Phabricator: Moving procurement from RT to Phabricator - https://phabricator.wikimedia.org/T93760#1149808 (10Krenair) > Procurement project task has to be locked down by view/read/edit/everything to ONLY Wikimedia employees. > Any email attachments into task should automatically have security set... [18:16:35] <_joe_> again, asking for ops advice on things that must scale is usually a good idea. but... I am off for now, sorry, it's 7 PM and I have people to meet IRL [18:17:25] godog: ya all processes that need to run are on puppet. I do not have permits to ssh to graphite1001 so i cannot tell whether processes are running, could you check? [18:17:35] godog: that is if it is not to late for you guys [18:17:52] godog: will request permits to graphite1001 [18:20:32] nuria: sure no problem, what should I be looking at? [18:20:58] (03CR) 10BBlack: [C: 031] "Looks good to my human eyes at this point. Can you or someone test this on e.g. 
labs first and make sure it doesn't fail to compile or br" [puppet] - 10https://gerrit.wikimedia.org/r/198805 (owner: 10Dr0ptp4kt) [18:20:59] godog: ps auxfw | grep event should bring one process connected to statsd port [18:22:11] yurik or MaxSem, could one of you guys give a hand on putting https://gerrit.wikimedia.org/r/198805 into labs for testing purposes? i haven't actually done hot varnish patches on labs before. i would love to see how you guys do that sort of thing. cc bblack [18:22:12] godog: what is the full path of graphite1001? [18:23:10] mmm, not sure I know how to test varnishes, dr0ptp4kt [18:24:31] godog: got it , it's graphite1001.eqiad.wmnet [18:26:02] dr0ptp4kt: I think you can merge the gerrit changeset to labs before it's merged for prod puppet, that's the usual route. the labs varnish servers pick up their VCL from puppet like prod. [18:26:11] (03CR) 10Ottomata: "Doh, right ok looks good." [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) (owner: 10Nuria) [18:26:16] (03PS3) 10Ottomata: Adding template for apache to serve static content [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) (owner: 10Nuria) [18:26:17] nuria: no EL processes on tungsten or graphite1001, hafnium perhaps? [18:26:21] (03PS1) 10RobH: raid10-gpt.cfg partman fixed [puppet] - 10https://gerrit.wikimedia.org/r/199647 (https://phabricator.wikimedia.org/T93113) [18:27:40] I should be saying s/labs/betalabs/ [18:27:53] godog: right .. i am confused now, why are alarms running on tungsten then? [18:28:10] (03CR) 10Ottomata: [C: 032] Adding template for apache to serve static content [puppet] - 10https://gerrit.wikimedia.org/r/198782 (https://phabricator.wikimedia.org/T89255) (owner: 10Nuria) [18:28:55] RECOVERY - Freshness of OCSP Stapling files on amssq33 is OK: OK [18:29:26] bblack: so "just" +2 it in gerrit? 
i don't think i have +2 rights [18:29:32] (03CR) 10Alex Monk: "Yep, that's for a later commit." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199574 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [18:30:02] godog: from looking at icinga i thought alarms were set on tungsten, [18:30:03] dr0ptp4kt: no, +2 in gerrit sends it to prod. I think people cherrypick them to betalabs's checkout somehow. Honestly I've never been through the process. [18:30:51] alternatively, could log into one of the betalabs caches, disable puppet, manually apply the change and restart varnish there, I guess. Yurik's done this stuff before. [18:30:57] <_joe_> bblack: that is the process, yes [18:31:09] <_joe_> (cherry-picking on deployment-salt) [18:31:24] <_joe_> dr0ptp4kt: ask someone in the releng team to help with that, maybe? [18:31:33] bblack, i blocked that memory [18:31:37] <_joe_> and also, I'm off for today :) [18:32:00] 7Blocked-on-Operations, 6operations, 6Scrum-of-Scrums, 3Continuous-Integration-Isolation: Review Jenkins isolation architecture with Antoine - https://phabricator.wikimedia.org/T92324#1149882 (10chasemp) Stepping out on a limb here but I think this has the potential, or is probable, to become a large Ops r... [18:32:02] * yurik heads to therapy [18:32:38] _joe_: you’ve said that so many times today... 
[18:34:44] godog: so then tungsten was just a plain graphite host then, ok, sorry about teh confusion [18:38:02] (03CR) 10Ori.livneh: [C: 031] statsdlb: replace txstatsd with statsite [puppet] - 10https://gerrit.wikimedia.org/r/199600 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:39:02] so now we have a new partman radi10-gpt recipe [18:39:04] and it works [18:42:29] 6operations: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#1149975 (10BBlack) 3NEW a:3BBlack [18:42:55] (03PS2) 10Ori.livneh: statsite: new module [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:43:00] 6operations, 7HTTPS, 3HTTPS-by-default, 7Performance: HTTPS performance tuning - https://phabricator.wikimedia.org/T86666#1149984 (10BBlack) [18:43:58] 6operations, 7HTTPS: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#1149986 (10BBlack) [18:45:47] 6operations, 7HTTPS, 3HTTPS-by-default, 7Performance: HTTPS performance tuning - https://phabricator.wikimedia.org/T86666#1150010 (10BBlack) OCSP went out to all clusters today, as the testing over the past ~24H looked pretty good. Filed a future task to improve the robustness and adaptability of the upda... [18:46:18] (03CR) 10Ori.livneh: [C: 031] statsite: new module [puppet] - 10https://gerrit.wikimedia.org/r/199599 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [18:47:58] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Hmm, so bypassing the check that fails? On a package that translates from French to Spanish ? Both quite popular languages. 
I am not likin" [debs/contenttranslation/apertium-fr-es] - 10https://gerrit.wikimedia.org/r/195577 (https://phabricator.wikimedia.org/T92252) (owner: 10KartikMistry) [18:48:21] (03CR) 10Alex Monk: [C: 031] "OK" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194913 (https://phabricator.wikimedia.org/T91630) (owner: 10Odder) [18:48:50] twentyafterfour: back [18:49:31] (03PS1) 1020after4: Remove 1.25wmf16 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199654 [18:49:33] (03PS1) 1020after4: Add 1.25wmf23 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199655 [18:49:35] (03PS1) 1020after4: Wikipedias to 1.25wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199656 [18:49:37] (03PS1) 1020after4: Group0 to 1.25wmf23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199657 [18:49:42] (03CR) 10jenkins-bot: [V: 04-1] Remove 1.25wmf16 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199654 (owner: 1020after4) [18:49:56] 6operations, 10ops-requests: Monitor mailman - https://phabricator.wikimedia.org/T84150#1150042 (10Dzahn) [18:50:40] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1150051 (10RobH) [18:51:21] uhm .. https://integration.wikimedia.org/ci/job/operations-mw-config-tests/13605/console [18:52:46] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 0 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [18:53:09] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1150063 (10RobH) a:5RobH>3GWicke All the ops side of things on this are done except for whatever site.pp and operations repo changes are needed for service implementation. I would imagine that @gw... 
[18:53:53] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1150072 (10RobH) [18:53:54] 6operations, 10ops-eqiad: install 4 * 3TB disks in francium - sdc error - https://phabricator.wikimedia.org/T93114#1150070 (10RobH) 5Open>3Resolved turns out it was a bad disk from new order, returning it via the rt ticket and chris used the onsite spare disk. resolved. [18:54:24] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1150078 (10Dzahn) In that case i will call this resolved because that's what we are doing, checking for content in a bug, not something defined by the UI or translation files. [18:54:56] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1150087 (10Dzahn) 5Open>3Resolved [18:55:13] nuria: correct (and confusing) the checks run on the graphite host but check other things [18:55:31] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1129671 (10RobH) I could be wrong, and then it may be @arielglenn who handles this, not certain... (I just know I discussed this system request with them both.) [18:55:42] godog: ok, thank you hafnium is not changing any time soon [18:58:00] twentyafterfour: I'm not sure what is causing that config test to fail [18:58:05] (03CR) 1020after4: "RECHECK" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199654 (owner: 1020after4) [18:58:36] bd808: recheck succeeded ... 
[18:58:48] sunspots [18:58:55] it was passing for me locally too [18:59:07] (03CR) 1020after4: [C: 032] Remove 1.25wmf16 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199654 (owner: 1020after4) [18:59:30] (03CR) 1020after4: [C: 032] Add 1.25wmf23 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199655 (owner: 1020after4) [18:59:52] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1150107 (10GWicke) @RobH, thanks! The next thing we'll need is some level of access to this box, ideally sudo. It looks like I ran into the same trap of not explicitly spelling out 'and we need access... [19:01:25] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [19:02:26] 6operations, 10ops-codfw: Receive 6 new misc virt cluster nodes - https://phabricator.wikimedia.org/T91977#1150111 (10Papaul) @RobH. can you please update this ticket with racking information? Thanks. [19:02:44] 6operations, 5Patch-For-Review: deploy francium for html/zim dumps - https://phabricator.wikimedia.org/T93113#1150113 (10RobH) All sudo requests have to be their own ticket in ops-access-requests and then have them approved in the monday meeting, so yep. The other way to not wait is to have ops mgmt directly... [19:06:42] 6operations, 6Phabricator, 10Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1150129 (10JohnLewis) To get this moving (again), here is a new plan of attack: * Get the bugid of all security bugs (as the schema is seriously out of date, just grep the docroot fo... 
[19:07:57] (03PS1) 10Dzahn: fix check_mailman_queue monitoring [puppet] - 10https://gerrit.wikimedia.org/r/199662 (https://phabricator.wikimedia.org/T84150) [19:08:54] 6operations, 10ops-codfw: Receive 6 new misc virt cluster nodes - https://phabricator.wikimedia.org/T91977#1150158 (10RobH) The names of these systems was discussed on the operations mailing list. The result was down to mvirt or ganeti, and since labs will rename to labsX (from virtX), naming these ganeti (in... [19:09:58] 6operations, 10ops-codfw: Receive 6 new misc virt cluster nodes - https://phabricator.wikimedia.org/T91977#1150167 (10RobH) Actually, I'll create a sub-task to detail the networking for these. [19:10:21] jenkins dead ? [19:11:30] (03PS2) 10Dzahn: fix check_mailman_queue monitoring [puppet] - 10https://gerrit.wikimedia.org/r/199662 (https://phabricator.wikimedia.org/T84150) [19:12:09] 6operations, 7network: determine networking for ganeti2001-2006 - https://phabricator.wikimedia.org/T93932#1150173 (10RobH) 3NEW a:3mark [19:12:18] (03Merged) 10jenkins-bot: Remove 1.25wmf16 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199654 (owner: 1020after4) [19:12:20] (03Merged) 10jenkins-bot: Add 1.25wmf23 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199655 (owner: 1020after4) [19:12:45] (03PS3) 10Dzahn: fix check_mailman_queue monitoring [puppet] - 10https://gerrit.wikimedia.org/r/199662 (https://phabricator.wikimedia.org/T84150) [19:13:48] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1150190 (10vshchepakina) @Dzahn any updates on Shopify? 
[19:15:26] (03PS5) 10EBernhardson: Enable editing of Flow posts, by autoconfirmed users, on mediawikiwiki, enwiki, ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) [19:17:31] ok aude: I'm about to push version 1.25wmf22 (and wmf23) out shortly [19:17:45] Not sure she's around yet [19:17:47] but I'm [19:18:32] "13:48 aude twentyafterfour: back" [19:18:51] oh, ok [19:18:53] :) [19:20:11] !log twentyafterfour Started scap: testwiki to php-1.25wmf23 and rebuild l10n cache [19:20:19] (03PS4) 10Dzahn: fix check_mailman_queue monitoring [puppet] - 10https://gerrit.wikimedia.org/r/199662 (https://phabricator.wikimedia.org/T84150) [19:20:22] Logged the message, Master [19:20:36] ^d: how to check which groups a specific gerrit user is member of? [19:21:04] i can search for the groups [19:21:53] (03CR) 10Gergő Tisza: Basic role for Sentry (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [19:22:31] !log twentyafterfour scap failed: CalledProcessError Command 'cp '/srv/mediawiki-staging/php-1.25wmf23/cache/l10n/'*.cdb '/tmp/scap_l10n_2482639127'' returned non-zero exit status 1 (duration: 02m 19s) [19:22:35] Logged the message, Master [19:23:11] (03CR) 10Alex Monk: "Or perhaps you want to use foreachwikiindblist? Or something." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199576 (owner: 10KartikMistry) [19:24:04] (03CR) 10Alex Monk: [C: 031] Use a dblist for Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194809 (owner: 10Mattflaschen) [19:25:50] (03CR) 10Alex Monk: "This is up for swat later. Are the dependencies resolved?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [19:25:50] twentyafterfour: ok [19:26:15] unlikely there are any issues with our stuff, but like to be around anyway just in case [19:26:24] 19:22:30 scap failed: CalledProcessError Command 'cp '/srv/mediawiki-staging/php-1.25wmf23/cache/l10n/'*.cdb '/tmp/scap_l10n_2482639127'' returned non-zero exit status 1 (duration: 02m 19s) [19:26:29] bd808: ^ [19:26:53] related to the most recent change I guess? [19:28:14] (03CR) 10Dzahn: [C: 032] fix check_mailman_queue monitoring [puppet] - 10https://gerrit.wikimedia.org/r/199662 (https://phabricator.wikimedia.org/T84150) (owner: 10Dzahn) [19:29:29] twentyafterfour: there should be inforamtion in the scap log [19:29:49] and sounds maybe realted to his change [19:29:52] even though it worked for me on wmf 21 and 22 [19:30:07] <^d> mutante: Not really exposed I can think of [19:30:10] the difference is wmf23 is new and doesn't have the cdb cache [19:30:14] yeah [19:30:29] sounds like maybe a bug [19:30:51] yeah it's a bug. it should just continue instead of aborting on that command [19:41:41] greg-g: you around? [19:43:21] (03PS1) 1020after4: Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 [19:43:39] (03CR) 10jenkins-bot: [V: 04-1] Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [19:45:12] (03PS2) 1020after4: Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 [19:48:29] greg-g: _joe_ suggested maybe that releng woiuld have a pointer for testing https://gerrit.wikimedia.org/r/#/c/198805/ in beta labs. who would be someone who could help with this? ^demon|lunch are you familiar with doing these varnish things in beta labs? 
yurik has done it before, but he said it has receded from memory :) [19:48:50] !log twentyafterfour Started scap: testwiki to php-1.25wmf23 and rebuild l10n cache (attempt #2) [19:48:55] Logged the message, Master [19:49:42] dr0ptp4kt, you basically stop the puppet auto-runs, copy the original .vcl file, make changes, test, rename the original and restort the puppets [19:49:56] yurik: aha! so it is in your memory bank! [19:50:16] yurik: is this something you could lend a hand on? [19:50:35] i would have to do a lot of googling :) [19:50:39] yurik: or do you know of others, perhaps in greg-g's releng as per _joe_'s suggestion? [19:51:24] (03PS5) 10Faidon Liambotis: Move maerlant out of .esams.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/199289 [19:51:26] (03PS1) 10Faidon Liambotis: Remove br1-knams, decom'ed [dns] - 10https://gerrit.wikimedia.org/r/199673 [19:52:14] (03CR) 10Faidon Liambotis: [C: 032] Remove br1-knams, decom'ed [dns] - 10https://gerrit.wikimedia.org/r/199673 (owner: 10Faidon Liambotis) [19:52:37] (03CR) 10Faidon Liambotis: [C: 032] Move maerlant out of .esams.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/199289 (owner: 10Faidon Liambotis) [19:52:49] oh jenkins [19:55:20] 6operations, 7Mail, 7Monitoring, 5Patch-For-Review: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1150428 (10Dzahn) This should now become CRIT if "any" of the queues is above threshold (42). before it checked "in" and "out" but if the first queue was ok it would exit as OK. I fixed that... [19:55:24] (03PS4) 10Faidon Liambotis: maerlant.esams.wikimedia.org -> maerlant.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/199294 [19:55:49] dr0ptp4kt, ok, lets try [19:56:01] (03CR) 10Faidon Liambotis: [C: 032] maerlant.esams.wikimedia.org -> maerlant.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/199294 (owner: 10Faidon Liambotis) [19:56:04] yurik: thank you! 
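The scap failure discussed above came from a `cp` of `*.cdb` files in a branch that had no l10n cache yet, so the glob matched nothing. A minimal sketch of the "check before copying" idea follows. It is not the actual change in https://gerrit.wikimedia.org/r/199671; the function name `copy_cdb_files` and its arguments are hypothetical.

```python
import glob
import os
import shutil
import tempfile


def copy_cdb_files(cache_dir, dest_dir):
    """Copy any *.cdb files from cache_dir into dest_dir.

    Returns the number of files copied. An empty cache directory
    (for example a brand-new branch) is treated as "nothing to do"
    rather than as an error.
    """
    cdb_files = glob.glob(os.path.join(cache_dir, "*.cdb"))
    if not cdb_files:
        # No existing cache: skip the copy instead of aborting.
        return 0
    for path in cdb_files:
        shutil.copy2(path, dest_dir)
    return len(cdb_files)


if __name__ == "__main__":
    # Hypothetical usage with throwaway directories.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        print(copy_cdb_files(src, dst))
```

Listing the files first sidesteps the shell behaviour where an unmatched glob is passed to `cp` literally and makes it exit non-zero.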
[19:56:25] (03PS3) 10Nemo bis: Add dedicated runner for MessageIndexRebuildJob [puppet] - 10https://gerrit.wikimedia.org/r/197919 (https://phabricator.wikimedia.org/T90704) (owner: 10Nikerabbit) [20:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T2000). [20:02:00] (03CR) 1020after4: [C: 031 V: 031] "Cherry-picked on tin, seems to work" [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [20:05:53] !log deployed parsoid sha 0313fcc7 [20:05:58] Logged the message, Master [20:07:22] (03PS1) 10Yurik: Added new SSH key for yurik [puppet] - 10https://gerrit.wikimedia.org/r/199676 [20:07:40] dr0ptp4kt, around? [20:07:45] want to chat? [20:07:49] yurik: i'm here, yes [20:07:55] yurik: sure thing, hangout? [20:08:02] ok [20:08:16] paravoid, hi, could you +2 ^^^ [20:08:21] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1150507 (10Dzahn) @vshchepakina Yes, shopify apparently contacted DigiCert, the provider of their certificate. And then needed validation from us. We got emails directly from... [20:08:54] yurik: why me? [20:09:00] LOLOL [20:09:03] yurik: there's an ops duty [20:09:12] ops duty person, I mean [20:09:16] sorry faidon, robh, its on you :) [20:13:17] sync-proxies is taking FOREVER [20:13:59] yurik: https://gerrit.wikimedia.org/r/#/c/198805/ [20:15:44] oh man ... it's syncing with codfw which is taking forever [20:16:35] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1150532 (10RobH) I find it odd that we are allowing a third party company to have a certificate for our domains (shop, store, whatever.wikemieda.org) and that we cannot revoke... 
[20:18:31] (03Abandoned) 10Yuvipanda: tools: Add jdk-8 to trusty nodes [puppet] - 10https://gerrit.wikimedia.org/r/181548 (https://phabricator.wikimedia.org/T68171) (owner: 10Yuvipanda) [20:19:58] <^demon|lunch> dr0ptp4kt: I really dunno much about varnish. Hoping to learn a tad more as we build out staging [20:20:19] dr0ptp4kt: yeah, you can test that in Beta Cluster. For things like that (hard to put in if realm==BetaCluster blocks) then cherry-picking, testing, then merging is the way to do it [20:20:26] ^demon|lunch: cool. yurik is giving it a whirl [20:20:45] twentyafterfour: how long forever? [20:20:51] twentyafterfour: also, yeah, 200+ new hosts :) [20:21:01] greg-g: shushh, no putting things in realm==beta. Spent a lot of time getting those out... [20:21:19] (03CR) 10RobH: [C: 032] "the easiest way to confirm these is when folks make their own patchsets =D" [puppet] - 10https://gerrit.wikimedia.org/r/199676 (owner: 10Yurik) [20:21:24] although, varnish still has those. but not too much, just IP addresses and stuff [20:21:30] * greg-g nods [20:21:31] sorry [20:21:36] greg-g: 10 minutes to sync 12 servers and sync--common is running ....a LONG time [20:21:38] 6operations, 10RESTBase, 7Monitoring, 5Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1150555 (10GWicke) @fgiunchedi: Since we depend on this & are on track to add more stats with each service -- what is the pla... [20:21:42] I'll remove realm==beta from my vocabulary ;) [20:21:48] (03CR) 10Hashar: [C: 04-1] "On CI I am using http://jenkins-debian-glue.org/docs/ which creates cow images on demand. 
Though the base path include the architecture ( " (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/194471 (owner: 10Alexandros Kosiaris) [20:21:48] yesplease :) [20:22:02] yurik: sorry, was zoned in on another task [20:22:06] looks good, i merged [20:22:08] twentyafterfour: hmm, that seems very long [20:22:47] robh, thanks!! [20:23:17] finally moving along though. I thought it was completely frozen for a while [20:24:08] YuviPanda, what's the current preferred way of writing labs-only puppet manifests? is there a repo other than operations/puppet? [20:24:08] <^demon|lunch> twentyafterfour: you can screen now btw :) [20:24:21] MaxSem: what do you mean by ‘labs only manifests' [20:24:37] MaxSem: just put them in operations/puppet, write a module, put a role class in manifests/roles, and be done? [20:24:50] MaxSem: I wish there was a separate repo, that would be great [20:24:59] bd808: pm me yours and robla's email please? (for the mwapi-team mailing list. could look but this is easier :p) [20:25:06] YuviPanda: the issue is most of us don't have +2 on operations/puppet [20:25:17] mmm, main puppet repo is slow to iterate on:) [20:25:30] ah, the big question :) [20:25:36] MaxSem: can I get an example? [20:25:52] MaxSem: twentyafterfour so, long time ago, when I also did not have +2 on ops/puppet and faced similar issues, I wrote this puppet module... [20:25:52] it would be really awesome to have a way to test puppet stuff and ..like MaxSem says, to iterate faster ;) [20:25:57] MaxSem: twentyafterfour it’s called… puppetception. [20:26:13] :D [20:26:25] MaxSem: twentyafterfour basically what it lets you do is on labs, point to another git puppet repo, and then execute that as well, decoupled from the ops/puppet repo. [20:26:40] 6operations, 7network: determine networking for ganeti2001-2006 - https://phabricator.wikimedia.org/T93932#1150588 (10RobH) I chatted with Mark who suggested I chat with Alex about this. 
I'm assigning this to Alex so he can update the task if he likes (otherwise I'll chat with him my AM his PM tomorrow.) [20:26:43] I meant it for things that are completely 100% separate from ops/puppet, and would never make it to prod - like quarry.wmflabs.org [20:26:58] mutante, Nikerabbit: I guess https://gerrit.wikimedia.org/r/#/c/197919/ can be put out there first [20:27:01] I wanna puppetize Hierator, but to get it into main puppet repo is hard while I'm experimenting on labs, because of the limited number of people who can review and high standards [20:27:06] twentyafterfour: MaxSem however, I never ended up using it, because then it is a completely separate repo and I can’t use all the nice things I got used to from ops/puppet :) [20:27:26] MaxSem: self hosted puppetmaster + cherrypicks? [20:27:43] that’s definitely going to help more than puppetception, since ideally you’d want hierator in ops/puppet finally [20:27:56] sigh [20:27:58] MaxSem: or you can go the ottomata way, make it a git submodule, test it in vagrant [20:28:09] and then you’ll just write different roles for vagrant vs ops/puppet [20:28:18] (03PS4) 10Yuvipanda: Remove dummy redirect for fab-01 [puppet] - 10https://gerrit.wikimedia.org/r/198535 (owner: 10Negative24) [20:28:40] (03CR) 10Yuvipanda: [C: 032 V: 032] Remove dummy redirect for fab-01 [puppet] - 10https://gerrit.wikimedia.org/r/198535 (owner: 10Negative24) [20:28:43] it is a good way [20:28:45] 6operations, 7network: determine networking for ganeti2001-2006 - https://phabricator.wikimedia.org/T93932#1150605 (10RobH) a:5mark>3akosiaris [20:29:00] well, it's currently kinda hacky [20:29:04] https://gerrit.wikimedia.org/r/#/c/189149/ [20:29:16] (03PS1) 10Faidon Liambotis: Add multatuli to public1-esams [dns] - 10https://gerrit.wikimedia.org/r/199681 [20:29:23] (03PS1) 10Faidon Liambotis: Add multatuli, just standard for now [puppet] - 10https://gerrit.wikimedia.org/r/199682 [20:30:06] (03CR) 10Faidon Liambotis: [C: 032] Add multatuli
to public1-esams [dns] - 10https://gerrit.wikimedia.org/r/199681 (owner: 10Faidon Liambotis) [20:30:15] AaronSchulz: what's "there"? [20:30:37] (03CR) 10Faidon Liambotis: [C: 032] Add multatuli, just standard for now [puppet] - 10https://gerrit.wikimedia.org/r/199682 (owner: 10Faidon Liambotis) [20:31:02] ottomata: you should write it up somewhere so people can point it to MaxSem and such :) [20:31:06] paravoid: Hah, are we naming machines for Dutch authors now? [20:31:16] RoanKattouw: always have, that's esams :) [20:31:21] MaxSem: personally, I didn’t find it too hard even when I didn’t have +2 [20:31:30] MaxSem: but that of course requires a *lot* of bugging opsen... [20:31:35] I guess Multatuli is the only one that I actually recognize then :) [20:31:45] MaxSem: and I worked on things that were absolutely labs-only (tools, mostly) so it was easier... [20:31:46] RoanKattouw: https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions [20:32:02] Nikerabbit: as in "deployed" [20:32:05] RoanKattouw: current ones are hooft, nescio, eeden, maerlant, multatuli [20:32:17] ooooh, nescio is an author name? [20:32:28] I guess Hooft is a poet [20:32:29] * YuviPanda thought it was something like nestle... [20:32:47] The other ones I don't know, but I'm not that much into Dutch literature [20:32:52] YuviPanda, how hard would it be to make the jetty part of this commit ready for ops/puppet, YuviPanda ? [20:32:59] err, two names:) [20:33:01] Multatuli is very well known though [20:33:10] Everyone who went to high school in NL read Max Havelaar [20:33:11] (03CR) 10Hashar: "I don't think it is a good idea to run pip via puppet. I did that for Zuul but it has proven to be a nightmare to maintain." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [20:33:24] YuviPanda: https://en.wikipedia.org/wiki/Nescio it's a pseudonym of a Dutch writer but still :p [20:33:51] MaxSem: that exec to wget from maven is going to merit a -2 [20:34:09] MaxSem: right, so all the execs must go [20:35:04] esams/knams hosts are named after notable dutch people [20:35:04] MaxSem: needs a lot more documentation as well. jetty::service? what is it? why just upstart, when we’re moving to systemd? etc [20:35:18] https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Miscellaneous_Servers [20:35:35] yep that wget is temporary as stated in comment:) [20:35:43] MaxSem: so I dunno how hard / easy it is, but getting packaging done and done to a point where ops are satisfied is probably the biggest blocker... [20:35:59] MaxSem: outside of that, maybe not too hard? Also, I don’t see a deployment method for the wars. [20:36:11] YuviPanda, but in general, can jetty-runner be deployed with gitfat? [20:36:24] I have no idea what gitfat is, sorry [20:37:27] (03CR) 10Krinkle: "Yeah. pip is acceptable from ops for labs (though still undesirable maintenance-wise, as we've experienced in CI). For prod, usage of compo" [puppet] - 10https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: 10Gilles) [20:42:12] YuviPanda, basically, package{'foo': provider => trebuchet}, and then 'gitfat_enabled' => true, in role::deployment::config [20:42:15] ok sync-common is only 60% done after 30 minutes [20:42:29] twentyafterfour, I blame codfw:P [20:42:47] We're going to have to lengthen the deployment window [20:42:57] I was already running over time before [20:43:10] MaxSem: right, still no idea :P And I don’t feel very good about diving into trebuchet internals :D [20:43:14] I should probably sleep anyway...
[20:43:18] twentyafterfour: ugh [20:43:24] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 6Scrum-of-Scrums: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-7+wmf2.1 or equivalent - https://phabricator.wikimedia.org/T88798#1150682 (10hashar) A note: the CI instances have Debian unattended... [20:43:32] bblack: SO, even without the patch, a restart of the beta varnish threw an error. yurik is in a meeting right now, but i wanted to make you aware of this. yurik may need a hand with resolving the error. cc greg-g [20:43:58] dr0ptp4kt: is that why http://en.m.wikipedia.beta.wmflabs.org/ isn't loading now? [20:44:49] greg-g: yeah, from what i can tell at least one of the nodes won't restart [20:45:20] (03PS1) 10BBlack: various betalabs cache storage sizing fixups [puppet] - 10https://gerrit.wikimedia.org/r/199688 [20:45:38] dr0ptp4kt: see -labs [20:45:38] :( [20:45:45] also, the patch above [20:45:49] bblack: "beta cluster" :P [20:45:58] ? [20:46:17] the thing that hosts http://en.m.wikipedia.beta.wmflabs.org/ is the Beta Cluster, not "betalabs" [20:46:36] i call it beta labs all the time [20:46:41] ok whatever [20:46:44] * greg-g is pedantic because of the confusion that is caused by when people just say "labs" and mean who the hell knows which thing [20:46:53] that is annoying yes [20:46:53] (03CR) 10BBlack: [C: 032] various betalabs cache storage sizing fixups [puppet] - 10https://gerrit.wikimedia.org/r/199688 (owner: 10BBlack) [20:47:05] I figure WMF Labs gets to own "labs" as a short hand [20:47:13] greg-g: thank you for the terminology check! [20:47:15] and toollabs? :) [20:47:24] what is tool labs then :) because community often confuse me with that one [20:47:31] ugh, effing names [20:47:34] bblack: since you are poking at beta cache, wonder if you are aware of https://phabricator.wikimedia.org/T90983 [20:47:40] greg-g: deployment-prep! 
[20:47:47] got closed as a duplicate though not convinced it's exactly a duplicate [20:47:50] paravoid: shoulda just stayed with that :) [20:48:05] since things seem to get stuck in varnish [20:48:25] no idea [20:49:06] :/ [20:49:18] ebernhardson: tool labs is the job runner grid that is hosted in labs and what most labs users outside the WMF actually use rather than dedicated projects [20:49:23] dr0ptp4kt: if you can get someone to sync up betawhatsit puppet with https://gerrit.wikimedia.org/r/#/c/199688/ and puppet the mobile cache, it should fix the restart [20:49:43] yurik: ^ can you do this? [20:49:44] iirc it happens automatically every 10' or so [20:49:51] or 1h? [20:50:02] something like that :) [20:50:18] I thought it was never automatic, because they cherrypick in unmerged changes, etc [20:50:28] it rebases automatically I think [20:50:31] ah ok [20:50:39] yurik: ^ see also discussion between paravoid and bblack [20:51:44] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: puppet fail [20:51:45] bblack: I think the number of cherrypicks is down to 0 on beta cluster (modulo honestly temp ones of less than a day) [20:52:12] dr0ptp4kt, sorry, was in a meeting [20:53:00] aude: sorry, I suspect what you're seeing with bits is actually rather thorny. I think it's probably not different from prod, except that in prod things get pushed out naturally. [20:53:19] the low traffic in betabits == things don't get pushed off the cache so fast [20:53:42] we don't do real invalidation in bits (e.g. PURGE) [20:54:14] but, my understanding (and it could be wrong!) was that prod avoided issues with this by having bits URLs be content-unique somehow (e.g. 
versioned/dated URLs) [20:55:39] (on the upside, there is a somewhat easy workaround: restart varnish on betabits and the cache poofs) [20:55:44] (03PS1) 10Negative24: Add Phab labs security role [puppet] - 10https://gerrit.wikimedia.org/r/199690 [20:56:39] (03CR) 10jenkins-bot: [V: 04-1] Add Phab labs security role [puppet] - 10https://gerrit.wikimedia.org/r/199690 (owner: 10Negative24) [20:56:46] (03PS1) 10Mjbmr: Enable wgUseRCPatrol for fawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199692 (https://phabricator.wikimedia.org/T85381) [20:56:46] bblack: sounds not nice [20:56:59] i've purged individual things in beta though and it works [20:57:20] and suppose a restart could be done as last resort [20:57:30] yeah, I'm just saying, in prod we don't purge there at all on deploy, because in prod I don't think we ever rely on content changing for a unique URL [20:58:11] i think in prod, it's either stuff with changing url [20:58:24] dr0ptp4kt, bblack, the puppets are back on and running, but that patch is not updating. Do i need to get it some other way? [20:58:26] or stuff that doesn't have that is cached only 5 min or so [20:58:29] (03CR) 10Chmarkine: [C: 031] "I support removing Camellia. The major reason is that Camellia is patented, although it is available under a royalty-free license.[1] Give" [puppet] - 10https://gerrit.wikimedia.org/r/199582 (owner: 10BBlack) [20:58:39] (03PS3) 10BryanDavis: Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [20:58:51] aude: we could put a cronjob on beta to restart the varnish server every 30 minutes or something :) [20:59:07] or some VCL hack to set all object TTLs very low [20:59:24] (03PS2) 10Negative24: Add Phab labs security role [puppet] - 10https://gerrit.wikimedia.org/r/199690 [20:59:57] yurik: the beta puppetmaster needs updating first [21:00:18] bblack: could work :) [21:00:35] bblack, is someone doing it? 
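The "content-unique URL" idea mentioned above (versioned or dated URLs, so a cache never has to serve changed content under an old URL) can be sketched roughly as follows. This only illustrates the general technique, not how the production bits URLs are actually built; the `version` parameter name and the helper `versioned_url` are assumptions.

```python
import hashlib


def versioned_url(base_url, content):
    """Return base_url with a short content hash appended as a query parameter.

    When the content changes, the URL changes too, so a cache that never
    purges (only expires) still ends up fetching the new content.
    """
    digest = hashlib.sha1(content).hexdigest()[:8]
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}version={digest}"


if __name__ == "__main__":
    # Hypothetical resource URL and contents.
    url = "https://bits.example.org/load.php?modules=site"
    print(versioned_url(url, b"body { color: black; }"))
    print(versioned_url(url, b"body { color: navy; }"))
```

On a low-traffic cache such as the beta one, objects under a stable URL linger because nothing pushes them out, which is why the restart or low-TTL workarounds come up in the discussion.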
[21:00:50] i really don't want to mess up all of betalabs :) [21:00:51] not me, I don't even know the process, although I'm sure it's on wikitech somewhere [21:01:54] yurik: all of mobile beta cluster is not working as it is :) [21:01:59] yurik: /usr/local/bin/git-sync-upstream will update to the latest [21:02:12] * yurik running [21:02:36] bd808, no such file [21:02:58] i'm on mobile03 [21:03:00] on deployment-salt? There surely is because I'm looking at it [21:03:05] :) [21:03:08] lol [21:03:14] bd808, want to run it? :) [21:03:19] sure [21:03:36] you seem to know waaay more about it... and my puppet journey is still ahead of me [21:04:21] "various betalabs cache storage sizing fixups" pulled in [21:04:28] was that what you needed? [21:04:43] puppet is not a journey, it's a destination. it's where you arrive when you've exhausted all other sane language options in your search for the ultimate twisted non-sensical scoping and syntax rules. [21:04:47] bd808: yes [21:05:00] yurik: sudo puppet agent --test --verbose to force your run on mobile03 [21:05:13] doing so now [21:05:25] already in process... [21:05:29] hehe, beat me to it [21:05:40] multithreaded, multiprocessed, multimanaged [21:06:01] its on! [21:06:05] bblack, runs [21:06:07] thank! [21:06:16] aude, greg-g the beta is back! [21:06:22] dr0ptp4kt, time to hack you thing again ) [21:06:29] yurik: :) [21:07:07] yurik: yay! [21:07:29] (03CR) 10BryanDavis: [C: 031 V: 031] "Made the test a bit more "pythonic". Tested on local VM." [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [21:08:05] dr0ptp4kt, added back your patch, want to test it? [21:09:02] yurik: yes, one moment [21:09:35] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:10:24] yurik: that regex needs a tweak. 
gimme a couple minutes [21:12:15] (03Abandoned) 10Negative24: Add Phab labs security role [puppet] - 10https://gerrit.wikimedia.org/r/199690 (owner: 10Negative24) [21:13:28] dr0ptp4kt, http://www.regexr.com/ [21:13:37] yurik: thx [21:14:15] varnish regexes are PCRE, so perl is a good option for testing/experimenting too [21:14:48] e.g. run perl -ple 's/x/y/' and feed it some test strings via STDIN [21:15:41] (there are minor differences in perl/PCRE, but they tend to be esoteric things nobody uses) [21:16:04] bblack, thanks. I just like the visual aspects of that site, but you are right, best to test at the end on the proper PCRE [21:16:40] (03CR) 10Chmarkine: "And among Alexa Top 10 sites, Wikipedia is the only site that supports Camellia." [puppet] - 10https://gerrit.wikimedia.org/r/199582 (owner: 10BBlack) [21:19:33] oh there's a "pcregrep" command/package avail on debian as well [21:19:44] that would be an even more reliable test for exactly PCRE syntax [21:20:36] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: puppet fail [21:23:26] (03PS4) 10Dr0ptp4kt: Do not fragment cache with provenance parameter [puppet] - 10https://gerrit.wikimedia.org/r/198805 [21:23:42] yurik: ^ would you please try that instead? cc bblack [21:24:00] dr0ptp4kt, you go ahead, i will watch over your shoulder :) [21:24:04] vchat [21:24:12] dr0ptp4kt: heh [21:24:18] so (?i) counts as a capture too? [21:24:46] if you have a problem that you solve with regex, you have two problems... [21:24:51] bblack: [21:24:53] yurik: [21:24:56] what yurik said [21:25:44] I don't think \3 is the answer you're looking for there [21:25:49] yurik: pretty please would you do it this time? [21:25:51] (I don't think (?i) counts) [21:25:54] ok [21:26:01] bblack: try it with your perl one liner :) [21:26:20] I just did! 
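On the question of whether `(?i)` counts as a capture group: a quick check along the lines of the Perl one-liner suggested above can also be done in Python. Note the assumption here: Python's `re` module is not PCRE (which is what Varnish uses), so this is only a sanity check of the group numbering, with the `wprov` pattern taken from the discussion.

```python
import re

# Inline flag (?i) makes the match case-insensitive; it is not a capture group.
WPROV = re.compile(r"(?i)(\?|&)wprov=([^&]+)")

# Only the two parenthesised groups are counted.
print(WPROV.groups)  # 2

# \2 therefore refers to the parameter value, not to the separator.
print(WPROV.sub(r"\2", "foo?WPROV=XXX"))  # fooXXX
```

For an exact match with Varnish behaviour, `pcregrep` or Perl, as mentioned in the channel, remain the more faithful tools.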
[21:26:30] bblack: blargh [21:26:31] hang on [21:26:34] yurik: sorry [21:26:37] bblack-mba:puppet bblack$ perl -ple 's/(?i)(\?|&)wprov=([^&]+)/$2/' [21:26:40] foo?WPROV=XXX [21:26:43] fooXXX [21:26:56] what problem were you seeing with PS3 behavior? [21:27:24] dr0ptp4kt, ping me when ready [21:27:43] 3 senior devs for one regex ... sigh :) [21:28:23] (03CR) 10Alex Monk: [C: 031] "Looks like this is what they actually intended. Thanks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199692 (https://phabricator.wikimedia.org/T85381) (owner: 10Mjbmr) [21:28:55] any time you create a new regex and don't accidentally cause world war 3, you can be proud :) [21:30:09] bblack: yurik i'll get back to you after a bit. sorry for the hassle. i gotta go test another thing. maybe good to get my brain away from regexen for a moment [21:31:47] ok [21:32:06] dr0ptp4kt: FWIW, I checked the PS3 if-clause one in vcl_recv as well, and it looks ok in perl as well [21:34:20] * yurik proposes to use qbasic instead of VCL ... [21:35:42] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1150928 (10Dzahn) Yea, just changing it from shop.wikimedia.org to store.wikipedia.org. Not changing the general way we are doing it here, that part was just as i found it. I'm... [21:37:50] yurik: +2 [21:38:09] dr0ptp4kt, ? [21:38:17] dr0ptp4kt: to what you said [21:38:25] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:38:44] re qbasic? yes, that was a marvelous language... i just don't remember what games ppl wrote in it...
but they were good ) [21:39:20] https://groups.google.com/forum/#!topic/comp.lang.basic.misc/nMhtm4z32X8 [21:42:50] LOL [21:43:30] never thought basic was a WriteOnly language [21:47:56] (03PS1) 10Dzahn: switch shop name, store.wikipedia.org to shopify [dns] - 10https://gerrit.wikimedia.org/r/199751 (https://phabricator.wikimedia.org/T92438) [21:48:53] (03CR) 10John F. Lewis: [C: 031] "lgtm" [dns] - 10https://gerrit.wikimedia.org/r/199751 (https://phabricator.wikimedia.org/T92438) (owner: 10Dzahn) [21:51:25] twentyafterfour: 1.25wmf22 for group 2 is waiting on wmf23 scaping? :-( [21:51:34] 6operations, 10MediaWiki-extensions-Sentry, 6Multimedia: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1151045 (10Tgr) [21:52:12] James_F: yeah scap is going super slow ` [21:52:31] twentyafterfour: (You've got enough to do, I just need to do some post-wmf22-deployment tasks and don't know when to do them or what to tell the CLs.) [21:52:54] (03CR) 10Hoo man: [C: 031] "Looks good at a glance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199574 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [21:54:01] James_F: it's at 87%... I'll let you know once it's done [21:54:13] twentyafterfour: Are there some hosts that got suck somehow? I don't see anything in the scap log on fluorine for the last hour [21:54:19] twentyafterfour: Thanks! [21:54:24] s/suck/stuck/ [21:54:36] bd808: it's still going... very slowly [21:54:45] (03PS1) 10Dzahn: adjust shop redirects to changed shop URL [puppet] - 10https://gerrit.wikimedia.org/r/199754 (https://phabricator.wikimedia.org/T92438) [21:55:37] sync-common: 87% (ok: 419; fail: 0; left: 58) [21:55:44] All the stuck hosts are in codfw [21:55:55] are we serving from codfw yet? 
[21:56:32] no [21:56:39] no but we have started populating the mw servers there [21:56:44] and they are hung somehow [21:56:54] it worked (well, sync-common) this morning :( [21:57:15] ps ax|grep sync-common|wc -l == 59 ; ps ax|grep sync-common|grep -v codfw|wc -l == 0 [21:57:30] so this scap started at 19:48? [21:57:58] yes [21:58:04] ouch [21:58:16] twentyafterfour: I can kill them on tin. The scap should continue and then will eventually say that it failed but we know that was just codfw [21:58:20] it's been poking along very slowly from the start [21:58:45] something is messed up and it's in codfw (or between codfw and tin) [21:58:56] does it do all eqiad first, and then codfw? [21:58:58] or all mixed? [21:59:04] all mixed [21:59:07] damn [21:59:21] but all that is left running is codfw for the initial sync [22:00:19] twentyafterfour: want me to unstick you with some kills? [22:00:29] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151121 (10jeremyb) I don't care much about store vs. shop. However, I think there's substantial precedent for not putting canonical stuff like this under project domains. Why... [22:00:38] bd808: ok [22:00:40] if it was all eqiad first we could tell James_F it was basically done :p [22:01:12] can we remove codfw mw's for now until this is resolved? this will kill SWAT tonight (in one hour) [22:01:14] !log scap sync-common step stuck with 58 codfw hosts not syncing at any reasonable speed [22:01:21] Logged the message, Master [22:02:22] it's jenkins that was super annoying at swat earlier :( [22:02:28] but scap was also slowish [22:02:29] frack it's not letting me kill them [22:03:26] Krenair: :-P [22:03:49] what's the timeout on those? 
[22:03:59] I haven't had any fail yet [22:04:31] infinite [22:04:59] they just keep running until something stops each ssh session [22:05:33] (03CR) 1020after4: [C: 032] Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [22:05:50] (03Merged) 10jenkins-bot: Check for existing CDB files before attempting to copy them. [tools/scap] - 10https://gerrit.wikimedia.org/r/199671 (owner: 1020after4) [22:06:05] Can I get some root help to kill off the rogue ssh processes on tin? [22:06:13] This will kill them -- ps ax|grep sync-common|grep codfw|awk '{print $1}'|xargs kill [22:06:16] robh, ^ [22:07:04] the processes are owned by uid 4967. Not sure what user that even is [22:07:11] :/ [22:07:20] andrewbogott: got a moment? [22:07:21] I thought they should be owned by mwdeploy [22:07:30] what needs killing? [22:07:41] bd808: i can do that [22:07:59] heh, op not permitted, gotta kill with fire (9) i suppose [22:08:12] krenair@tin:~$ getent passwd "4967" | cut -d: -f1 [22:08:12] twentyafterfour [22:08:14] bd808, ^ [22:08:19] ah [22:08:28] that sort of makes sense [22:08:31] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151191 (10jeremyb) [22:08:33] greg-g: what’s up? [22:08:37] bd808: huh [22:08:40] still op not permitted [22:08:46] andrewbogott: nevermind, robh got it [22:08:48] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151194 (10csteipp) >>! In T92438#1151121, @jeremyb wrote: > Also, this is a security problem too because cookies are shared for the whole domain? @cstei... [22:08:51] andrewbogott: (killing things with root-fire) [22:09:04] greg-g: ok then :) [22:09:06] robh: you don't have the rights to kill twentyafterfour's jobs? [22:09:16] im running as sudo... 
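Since the per-host sync commands apparently have no timeout and run until something ends the ssh session, one way to bound them is to put a time limit on each child process. The sketch below shows the general technique with Python's standard `subprocess` module. It is not scap's code; the helper name `run_with_timeout` and the commands used are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and wait at most timeout_seconds for it to finish.

    Returns ("ok", returncode) on completion, or ("timeout", None) if the
    limit was exceeded. subprocess.run() kills the child when the timeout
    expires, so no stuck process is left behind.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("ok", result.returncode)


if __name__ == "__main__":
    # Stand-in for a slow remote sync: a child that sleeps for 5 seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.5))
```

A caller could then report timed-out hosts as failures and retry them individually, rather than waiting on the whole run.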
[22:09:18] wtf [22:09:27] (03PS1) 10Gage: IPsec: remove role from amssq* [puppet] - 10https://gerrit.wikimedia.org/r/199758 [22:09:39] wait there [22:09:48] ps ax|grep sync-common|grep codfw|awk '{print $1}'|sudo xargs kill -9 [22:09:50] yea [22:09:53] if they are running as me, presumably I can kill them? [22:09:54] i had missed one sudo its done [22:10:09] and looks like no effect [22:10:12] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1110814 (10jeremyb) >>! In T92438#1151194, @csteipp wrote: > Code that goes into project domains would need a full security review. to clarify this is n... [22:10:20] w00t I see updating cdb file logs [22:10:26] (03Abandoned) 10Dzahn: switch shop name, store.wikipedia.org to shopify [dns] - 10https://gerrit.wikimedia.org/r/199751 (https://phabricator.wikimedia.org/T92438) (owner: 10Dzahn) [22:10:39] or those are new sync commons firing [22:10:40] (03Abandoned) 10Dzahn: adjust shop redirects to changed shop URL [puppet] - 10https://gerrit.wikimedia.org/r/199754 (https://phabricator.wikimedia.org/T92438) (owner: 10Dzahn) [22:10:44] bd808: so its working? [22:10:54] robh: yeah all the bad ones are gone [22:10:58] cool [22:11:12] ok so, bd808 why were they running as me and how did some succeed if that's the wrong thing? 
[22:11:34] scap runs as you (or whoever starts it) [22:12:03] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151214 (10Krenair) 5Open>3declined [22:12:05] as for why/how they got stuck we'd need to look at rsync server logs I think [22:12:45] !log robh killed the stuck scap ssh connections to codfw and scap moved on to the next step [22:12:52] and, we have 48 minutes until swat, so either we figure it out or we remove codfw mw's from the list [22:12:54] Logged the message, Master [22:13:44] twentyafterfour: I did a scap that took way way way too long once. The lesson I learned was that it looks stuck it's better to get it unstuck than to hope it eventually finishes [22:14:54] and we should really add some sort of max sync duration for each host [22:15:07] not quite sure how to do that... [22:15:11] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151223 (10Krenair) [22:15:58] if one fails, you can then run sync-common on that host to retry it, right? [22:15:58] it should be possible though. The main scap process is alive and polling a queue while the remote commands run [22:16:01] bd808: it didn't really look stuck because the counter kept going up...just really slowly [22:16:27] hmm [22:16:48] I wonder if the log events from codfw don't make it back to fluorine [22:17:20] because there was nothing on fluorine for over an hour [22:17:42] it gets more verbose messaging than the cli does [22:17:46] I was just thinking the network to codfw was really slow for some reason. 
ping time < 50ms but that's not very good compared to <2ms in eqiad [22:18:15] !log twentyafterfour Finished scap: testwiki to php-1.25wmf23 and rebuild l10n cache (attempt #2) (duration: 149m 25s) [22:18:20] Logged the message, Master [22:18:26] if things are correct the rsyncs are codfw<->codfw after the initial process starts [22:19:21] are we sure things are correct? [22:19:51] one of the killed processes: 22:09:36 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', 'mw1010.eqiad.wmnet', 'mw1033.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1097.eqiad.wmnet', 'mw1216.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet', 'mw2001.codfw.wmnet', 'mw2041.codfw.wmnet', 'mw2080.codfw.wmnet', 'mw2119.codfw.wmnet', 'mw2187.codfw.wmnet'] on mw1049 returned [-9]: [22:20:08] that's got eqiad and codfw in the list [22:20:12] (03CR) 10Faidon Liambotis: [C: 032] IPsec: remove role from amssq* [puppet] - 10https://gerrit.wikimedia.org/r/199758 (owner: 10Gage) [22:20:27] the list is the rsync hosts [22:20:38] mw1049 was the host that was updating [22:20:52] 149 :( [22:20:53] so yeah.. we got some eqiad hosts [22:21:17] ok so how do we cut codfw off for a resync?
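(Editor's note on the "max sync duration for each host" idea raised at 22:14:54. The following is only a rough sketch of what a per-host deadline could look like, not how scap is implemented. The function names and the stand-in command are hypothetical; a real version would build the ssh/sync-common invocation and would most likely run hosts in parallel rather than one after another.)

```python
import subprocess
import sys


def run_with_deadline(cmd, max_seconds):
    """Run cmd and classify the outcome as 'ok', 'failed' or 'timeout'."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=max_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising,
        # so a stuck sync does not linger after the deadline.
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


def sync_all(hosts, build_cmd, max_seconds):
    """Sync each host in turn, giving every host its own deadline."""
    return {host: run_with_deadline(build_cmd(host), max_seconds) for host in hosts}


def demo_cmd(host):
    # Hypothetical stand-in for the real remote command: hosts in codfw
    # "hang" for 5 seconds, everything else returns immediately.
    delay = 5 if ".codfw." in host else 0
    return [sys.executable, "-c", f"import time; time.sleep({delay})"]


if __name__ == "__main__":
    hosts = ["mw1049.eqiad.wmnet", "mw2001.codfw.wmnet"]
    print(sync_all(hosts, demo_cmd, max_seconds=2))
```

(With a deadline like this, a host that stalls is reported as a timeout and can be retried by hand with sync-common, instead of holding up the whole run.)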
[22:22:00] remove from modules/dsh/files/group/mediawiki-installation [22:22:04] yeah [22:22:04] eg: https://gerrit.wikimedia.org/r/#/c/199605/1/modules/dsh/files/group/mediawiki-installation [22:22:28] we need all the mw2* hosts to be commented out [22:24:24] 6operations, 10MediaWiki-extensions-Sentry, 6Multimedia, 3Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1151260 (10Tgr) [22:25:07] (03PS1) 10BryanDavis: Disable scap to codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/199763 [22:25:29] (03CR) 10Greg Grossmeier: [C: 031] Disable scap to codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/199763 (owner: 10BryanDavis) [22:25:48] robh: ^ need some help to get that applied on tin to keep scap moving right now [22:26:25] bd808: can you guys create some kind of task to track that we re-enable it? [22:26:36] yeah [22:27:12] lets reference it in the changeset so we headoff the potential of upsetting folks =] and then yep i'll merge [22:27:24] has anyone without root actually used SSH configuration from https://wikitech.wikimedia.org/wiki/SSH_configuration_notes#SSH_Configration [22:27:33] I added joe to the changeset so he sees it tomorrow [22:27:34] 6operations, 6Release-Engineering: Re-enable codfw scap targets - https://phabricator.wikimedia.org/T93958#1151280 (10bd808) 3NEW [22:27:44] !log reformatting berkelium & curium [22:27:47] yurik, I think I used it once or twice [22:27:49] plus that way on the task we can simply revert the changeset easily and its referenced [22:27:51] Logged the message, Master [22:27:56] Krenair, it gives me Could not resolve hostname bastion1001.wikimedia.org [22:28:01] bast1001 [22:28:07] (03PS2) 10BryanDavis: Disable scap to codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/199763 (https://phabricator.wikimedia.org/T93958) [22:28:14] added to commit message [22:28:21] yurik, not bastion1001 :) [22:28:33] (03CR) 10RobH: [C: 032] 
Disable scap to codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/199763 (https://phabricator.wikimedia.org/T93958) (owner: 10BryanDavis) [22:28:34] 6operations, 6Release-Engineering: Re-enable codfw scap targets - https://phabricator.wikimedia.org/T93958#1151289 (10greg) p:5Triage>3Normal [22:28:38] Krenair, i followed instructions -- replace iron.wikimedia.org with bastion1001.wikimedia.org [22:28:45] waiting for tests to run then will merge [22:28:59] I've fixed it now [22:29:00] could you double check that that config is still relevant for non-roots? [22:29:02] thx! [22:29:18] I think it's out of date in other ways [22:29:22] like, fenari? [22:29:27] exactly :) [22:29:30] don't we have some bastion in codfw now? [22:29:42] not sure if there's a non-root one [22:29:54] PROBLEM - Host berkelium is DOWN: PING CRITICAL - Packet loss = 100% [22:30:04] bd808: chagne is live on puppetmaster [22:30:08] change even [22:30:14] PROBLEM - Host curium is DOWN: PING CRITICAL - Packet loss = 100% [22:30:21] robh: can you force a run on tin? [22:30:23] there is currently no non-root bastion in ulsfo [22:30:24] will do now [22:30:26] is there still sdtpa stuff? [22:30:29] That's where we need the updated file [22:30:38] Krenair: nope [22:30:56] berkelium & curium are paravoid & i doing ipsec testing [22:31:10] tin is mid puppet run, sloww ;P [22:31:23] Krenair: shouldn't [22:31:26] 6operations, 6Release-Engineering: Re-enable codfw scap targets - https://phabricator.wikimedia.org/T93958#1151317 (10bd808) @mmodell's scap ran super long: ``` !log twentyafterfour Finished scap: testwiki to php-1.25wmf23 and rebuild l10n cache (attempt #2) (duration: 149m 25s) ``` It only finis... [22:31:27] yurik, I personally use https://gist.github.com/MaxSem/2b5703bd8839e16a946c [22:31:35] I !logged it :) [22:31:45] RECOVERY - Host berkelium is UP: PING OK - Packet loss = 0%, RTA = 2.76 ms [22:32:05] MaxSem, thanks! But is all that needed for ubuntu? 
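(Editor's note on "we need all the mw2* hosts to be commented out" at 22:22:28. A minimal sketch of that edit, assuming the dsh group file lists one hostname per line and treats lines starting with "#" as comments; both the file format and the hostname pattern are assumptions here, and the actual change was made by hand in the Gerrit change referenced above.)

```python
import re

# Assumed hostname shape for codfw MediaWiki servers, e.g. mw2001.codfw.wmnet
CODFW_HOST = re.compile(r"^mw2\d+\.codfw\.wmnet$")


def comment_out_codfw(group_text):
    """Return the group file text with every codfw mw host commented out."""
    out = []
    for line in group_text.splitlines():
        stripped = line.strip()
        if CODFW_HOST.match(stripped):
            # Keep the hostname visible so the change is easy to revert.
            out.append("# " + stripped)
        else:
            # Leave eqiad hosts, blank lines and existing comments alone.
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "mw1049.eqiad.wmnet\nmw2001.codfw.wmnet\nmw2041.codfw.wmnet\n"
    print(comment_out_codfw(sample), end="")
```

(Commenting out rather than deleting keeps the revert for T93958 a simple one.)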
[22:32:23] paravoid: should that be our next devops t-shirt? "I !logged it!" [22:32:27] mmm, what you want and what doesn't work? [22:32:45] RECOVERY - Host curium is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms [22:33:33] twentyafterfour: codfw hosts are out. Want to run scap again? [22:33:41] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151330 (10Dzahn) I called digicert and shopify support to have the request canceled. Sorry about that @vshchepakina [22:33:43] MaxSem, was trying to improve my shell - trying to make it so that ssh tin.wikimedia.org (or if possible - ssh tin or ssh tin.w) would go directly to the tin [22:33:57] eh [22:34:12] first of all, there is no such host as tin.wikimedia.org [22:34:19] bd808, mutante, removed fenari, pmtpa and sdtpa references [22:34:23] is there a codfw bastion? [22:35:12] bd808: ok running it [22:35:15] PROBLEM - configured eth on berkelium is CRITICAL: Connection refused by host [22:35:27] yurik: "ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org" should be the magic you need [22:35:36] PROBLEM - Disk space on berkelium is CRITICAL: Connection refused by host [22:35:43] !log twentyafterfour Started scap: testwiki to php-1.25wmf23 (this time without codfw) [22:35:48] Under "Host *.eqiad.wmnet" in your ~/.ssh/config [22:35:49] Logged the message, Master [22:35:55] PROBLEM - dhclient process on berkelium is CRITICAL: Connection refused by host [22:36:05] PROBLEM - DPKG on curium is CRITICAL: Connection refused by host [22:36:14] PROBLEM - RAID on berkelium is CRITICAL: Connection refused by host [22:36:25] PROBLEM - salt-minion processes on berkelium is CRITICAL: Connection refused by host [22:36:26] PROBLEM - DPKG on berkelium is CRITICAL: Connection refused by host [22:36:35] PROBLEM - Disk space on curium is CRITICAL: Connection refused by host [22:36:46] PROBLEM - configured eth on curium is CRITICAL: Connection refused by host 
[22:36:55] PROBLEM - dhclient process on curium is CRITICAL: Connection refused by host [22:37:08] yurik: my big ol .ssh/config for WMF: https://phabricator.wikimedia.org/P433 [22:37:23] (mostly copied from paravoi-d ) [22:37:26] PROBLEM - salt-minion processes on curium is CRITICAL: Connection refused by host [22:37:35] PROBLEM - RAID on curium is CRITICAL: Connection refused by host [22:39:16] twentyafterfour: looks like it is zipping right along now [22:39:16] !log twentyafterfour Finished scap: testwiki to php-1.25wmf23 (this time without codfw) (duration: 03m 33s) [22:39:22] Logged the message, Master [22:40:23] bd808: yeah .. every step took a loooong time before. [22:40:38] yay [22:40:39] so... nobody know about codfw bastions? [22:40:52] Krenair: there are no non-root codfw bastions [22:41:01] Krenair: there is one but only for roots so far, bast2001 [22:41:02] 15:30 < robh> there is currently no non-root bastion in ulsfo [22:41:08] all the codfw stuff is 2xxxx [22:41:13] will stick it on the page anyway [22:41:25] ha, he asked about codfw [22:41:27] i answered ulsfo [22:41:35] hah, I mentally switched it [22:41:39] It was relevant anyway [22:43:26] RECOVERY - configured eth on berkelium is OK: NRPE: Unable to read output [22:43:55] RECOVERY - Disk space on berkelium is OK: DISK OK [22:44:06] RECOVERY - dhclient process on berkelium is OK: PROCS OK: 0 processes with command name dhclient [22:44:19] argh [22:44:25] RECOVERY - RAID on berkelium is OK: NRPE: Unable to read output [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2092 is CRITICAL: Host mw2092 is not in mediawiki-installation dsh group [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2143 is CRITICAL: Host mw2143 is not in mediawiki-installation dsh group [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2165 is CRITICAL: Host mw2165 is not in mediawiki-installation dsh group [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2173 is CRITICAL: Host mw2173 is 
not in mediawiki-installation dsh group [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2145 is CRITICAL: Host mw2145 is not in mediawiki-installation dsh group [22:44:56] PROBLEM - mediawiki-installation DSH group on mw2109 is CRITICAL: Host mw2109 is not in mediawiki-installation dsh group [22:45:15] RECOVERY - dhclient process on curium is OK: PROCS OK: 0 processes with command name dhclient [22:45:15] PROBLEM - mediawiki-installation DSH group on mw2076 is CRITICAL: Host mw2076 is not in mediawiki-installation dsh group [22:45:15] PROBLEM - mediawiki-installation DSH group on mw2097 is CRITICAL: Host mw2097 is not in mediawiki-installation dsh group [22:45:15] PROBLEM - mediawiki-installation DSH group on mw2135 is CRITICAL: Host mw2135 is not in mediawiki-installation dsh group [22:45:15] PROBLEM - mediawiki-installation DSH group on mw2062 is CRITICAL: Host mw2062 is not in mediawiki-installation dsh group [22:45:15] PROBLEM - mediawiki-installation DSH group on mw2073 is CRITICAL: Host mw2073 is not in mediawiki-installation dsh group [22:45:16] PROBLEM - mediawiki-installation DSH group on mw2114 is CRITICAL: Host mw2114 is not in mediawiki-installation dsh group [22:45:16] PROBLEM - mediawiki-installation DSH group on mw2086 is CRITICAL: Host mw2086 is not in mediawiki-installation dsh group [22:45:28] we did that ^ [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2100 is CRITICAL: Host mw2100 is not in mediawiki-installation dsh group [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2025 is CRITICAL: Host mw2025 is not in mediawiki-installation dsh group [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2003 is CRITICAL: Host mw2003 is not in mediawiki-installation dsh group [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2059 is CRITICAL: Host mw2059 is not in mediawiki-installation dsh group [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2160 is CRITICAL: Host mw2160 is not in 
mediawiki-installation dsh group [22:45:35] PROBLEM - mediawiki-installation DSH group on mw2206 is CRITICAL: Host mw2206 is not in mediawiki-installation dsh group [22:45:45] RECOVERY - salt-minion processes on curium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [22:45:45] RECOVERY - RAID on curium is OK: NRPE: Unable to read output [22:45:46] PROBLEM - mediawiki-installation DSH group on mw2102 is CRITICAL: Host mw2102 is not in mediawiki-installation dsh group [22:45:46] PROBLEM - mediawiki-installation DSH group on mw2015 is CRITICAL: Host mw2015 is not in mediawiki-installation dsh group [22:45:46] PROBLEM - mediawiki-installation DSH group on mw2037 is CRITICAL: Host mw2037 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2034 is CRITICAL: Host mw2034 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2108 is CRITICAL: Host mw2108 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2197 is CRITICAL: Host mw2197 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2172 is CRITICAL: Host mw2172 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2169 is CRITICAL: Host mw2169 is not in mediawiki-installation dsh group [22:45:55] PROBLEM - mediawiki-installation DSH group on mw2072 is CRITICAL: Host mw2072 is not in mediawiki-installation dsh group [22:46:02] paravoid, robh: how do we silence that? 
[22:46:04] RECOVERY - DPKG on curium is OK: All packages OK [22:46:06] PROBLEM - mediawiki-installation DSH group on mw2085 is CRITICAL: Host mw2085 is not in mediawiki-installation dsh group [22:46:06] PROBLEM - mediawiki-installation DSH group on mw2083 is CRITICAL: Host mw2083 is not in mediawiki-installation dsh group [22:46:06] PROBLEM - mediawiki-installation DSH group on mw2005 is CRITICAL: Host mw2005 is not in mediawiki-installation dsh group [22:46:06] PROBLEM - mediawiki-installation DSH group on mw2056 is CRITICAL: Host mw2056 is not in mediawiki-installation dsh group [22:46:06] PROBLEM - mediawiki-installation DSH group on mw2105 is CRITICAL: Host mw2105 is not in mediawiki-installation dsh group [22:46:06] (03PS1) 10Faidon Liambotis: Switch berkelium & curium to jessie [puppet] - 10https://gerrit.wikimedia.org/r/199774 [22:46:11] argh i have no idea [22:46:22] bd808: turn them back on! [22:46:24] ;D [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2011 is CRITICAL: Host mw2011 is not in mediawiki-installation dsh group [22:46:25] hahaha, my epic check works! 
:P [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2014 is CRITICAL: Host mw2014 is not in mediawiki-installation dsh group [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2066 is CRITICAL: Host mw2066 is not in mediawiki-installation dsh group [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2028 is CRITICAL: Host mw2028 is not in mediawiki-installation dsh group [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2099 is CRITICAL: Host mw2099 is not in mediawiki-installation dsh group [22:46:25] PROBLEM - mediawiki-installation DSH group on mw2041 is CRITICAL: Host mw2041 is not in mediawiki-installation dsh group [22:46:26] PROBLEM - mediawiki-installation DSH group on mw2113 is CRITICAL: Host mw2113 is not in mediawiki-installation dsh group [22:46:26] PROBLEM - mediawiki-installation DSH group on mw2053 is CRITICAL: Host mw2053 is not in mediawiki-installation dsh group [22:46:27] RECOVERY - Disk space on curium is OK: DISK OK [22:46:28] but yea... 
fuck [22:46:33] (03CR) 10Faidon Liambotis: [C: 032] Switch berkelium & curium to jessie [puppet] - 10https://gerrit.wikimedia.org/r/199774 (owner: 10Faidon Liambotis) [22:46:34] PROBLEM - mediawiki-installation DSH group on mw2164 is CRITICAL: Host mw2164 is not in mediawiki-installation dsh group [22:46:34] PROBLEM - mediawiki-installation DSH group on mw2096 is CRITICAL: Host mw2096 is not in mediawiki-installation dsh group [22:46:35] PROBLEM - mediawiki-installation DSH group on mw2080 is CRITICAL: Host mw2080 is not in mediawiki-installation dsh group [22:46:35] PROBLEM - mediawiki-installation DSH group on mw2118 is CRITICAL: Host mw2118 is not in mediawiki-installation dsh group [22:46:35] PROBLEM - mediawiki-installation DSH group on mw2199 is CRITICAL: Host mw2199 is not in mediawiki-installation dsh group [22:46:36] RECOVERY - configured eth on curium is OK: NRPE: Unable to read output [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2046 is CRITICAL: Host mw2046 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2036 is CRITICAL: Host mw2036 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2104 is CRITICAL: Host mw2104 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2192 is CRITICAL: Host mw2192 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2017 is CRITICAL: Host mw2017 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2030 is CRITICAL: Host mw2030 is not in mediawiki-installation dsh group [22:46:45] PROBLEM - mediawiki-installation DSH group on mw2166 is CRITICAL: Host mw2166 is not in mediawiki-installation dsh group [22:47:01] bd808: we should have put them off monitoring basically [22:47:04] i didnt realize that was a check [22:47:09] its going to trigger for all the codfw 
[22:47:14] MaxSem: haha, indeed it works. can we add it as commented hosts ?:) [22:47:20] once the deployment is over, can we figure out why they werent syncing? [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2016 is CRITICAL: Host mw2016 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2022 is CRITICAL: Host mw2022 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2163 is CRITICAL: Host mw2163 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2170 is CRITICAL: Host mw2170 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2112 is CRITICAL: Host mw2112 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2079 is CRITICAL: Host mw2079 is not in mediawiki-installation dsh group [22:47:26] PROBLEM - mediawiki-installation DSH group on mw2174 is CRITICAL: Host mw2174 is not in mediawiki-installation dsh group [22:47:27] PROBLEM - mediawiki-installation DSH group on mw2057 is CRITICAL: Host mw2057 is not in mediawiki-installation dsh group [22:47:27] PROBLEM - mediawiki-installation DSH group on mw2045 is CRITICAL: Host mw2045 is not in mediawiki-installation dsh group [22:47:29] I assumed we were going to [22:47:32] after swat, yeah [22:47:35] PROBLEM - mediawiki-installation DSH group on mw2082 is CRITICAL: Host mw2082 is not in mediawiki-installation dsh group [22:47:35] PROBLEM - mediawiki-installation DSH group on mw2090 is CRITICAL: Host mw2090 is not in mediawiki-installation dsh group [22:47:35] PROBLEM - mediawiki-installation DSH group on mw2136 is CRITICAL: Host mw2136 is not in mediawiki-installation dsh group [22:47:35] PROBLEM - mediawiki-installation DSH group on mw2146 is CRITICAL: Host mw2146 is not in mediawiki-installation dsh group [22:47:36] mutante, that would kill the idea ;) 
[22:47:41] cool, then i guess we just get to have them alert [22:47:42] they didnt sync because of what icinga-wm tells us? [22:47:45] PROBLEM - mediawiki-installation DSH group on mw2128 is CRITICAL: Host mw2128 is not in mediawiki-installation dsh group [22:47:45] PROBLEM - mediawiki-installation DSH group on mw2183 is CRITICAL: Host mw2183 is not in mediawiki-installation dsh group [22:47:47] (03CR) 10Faidon Liambotis: [V: 032] Switch berkelium & curium to jessie [puppet] - 10https://gerrit.wikimedia.org/r/199774 (owner: 10Faidon Liambotis) [22:47:56] (03PS2) 10Faidon Liambotis: IPsec: use aes128gcm instead of aes256gcm [puppet] - 10https://gerrit.wikimedia.org/r/198685 (owner: 10Gage) [22:48:02] (03CR) 10Faidon Liambotis: [C: 032 V: 032] IPsec: use aes128gcm instead of aes256gcm [puppet] - 10https://gerrit.wikimedia.org/r/198685 (owner: 10Gage) [22:48:02] mutante: correct, we killed codfw from syncs due to failures and swat deployment was runnig [22:48:05] PROBLEM - mediawiki-installation DSH group on mw2093 is CRITICAL: Host mw2093 is not in mediawiki-installation dsh group [22:48:05] and codfw isnt serving anything [22:48:06] PROBLEM - mediawiki-installation DSH group on mw2103 is CRITICAL: Host mw2103 is not in mediawiki-installation dsh group [22:48:15] (03PS2) 10Faidon Liambotis: IPsec: Icinga monitor for Strongswan connections [puppet] - 10https://gerrit.wikimedia.org/r/199561 (owner: 10Gage) [22:48:16] none of us realized that was a check on each mw host [22:48:20] (03CR) 1020after4: [C: 032] Wikipedias to 1.25wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199656 (owner: 1020after4) [22:48:22] (03CR) 10Faidon Liambotis: [C: 032 V: 032] IPsec: Icinga monitor for Strongswan connections [puppet] - 10https://gerrit.wikimedia.org/r/199561 (owner: 10Gage) [22:48:24] PROBLEM - mediawiki-installation DSH group on mw2023 is CRITICAL: Host mw2023 is not in mediawiki-installation dsh group [22:48:25] PROBLEM - mediawiki-installation DSH group on mw2013 
is CRITICAL: Host mw2013 is not in mediawiki-installation dsh group [22:48:25] PROBLEM - mediawiki-installation DSH group on mw2161 is CRITICAL: Host mw2161 is not in mediawiki-installation dsh group [22:48:29] the plan is to investigate and re-enable them post swat [22:48:43] mutante: https://phabricator.wikimedia.org/T93958 [22:48:44] PROBLEM - mediawiki-installation DSH group on mw2089 is CRITICAL: Host mw2089 is not in mediawiki-installation dsh group [22:48:44] PROBLEM - mediawiki-installation DSH group on mw2033 is CRITICAL: Host mw2033 is not in mediawiki-installation dsh group [22:48:56] PROBLEM - mediawiki-installation DSH group on mw2106 is CRITICAL: Host mw2106 is not in mediawiki-installation dsh group [22:48:56] ok, gotcha! [22:49:14] PROBLEM - mediawiki-installation DSH group on mw2127 is CRITICAL: Host mw2127 is not in mediawiki-installation dsh group [22:49:15] PROBLEM - mediawiki-installation DSH group on mw2043 is CRITICAL: Host mw2043 is not in mediawiki-installation dsh group [22:49:25] PROBLEM - mediawiki-installation DSH group on mw2095 is CRITICAL: Host mw2095 is not in mediawiki-installation dsh group [22:49:25] PROBLEM - mediawiki-installation DSH group on mw2202 is CRITICAL: Host mw2202 is not in mediawiki-installation dsh group [22:49:43] let me disable notifications for that [22:49:45] PROBLEM - mediawiki-installation DSH group on mw2098 is CRITICAL: Host mw2098 is not in mediawiki-installation dsh group [22:49:45] PROBLEM - mediawiki-installation DSH group on mw2032 is CRITICAL: Host mw2032 is not in mediawiki-installation dsh group [22:49:55] PROBLEM - mediawiki-installation DSH group on mw2121 is CRITICAL: Host mw2121 is not in mediawiki-installation dsh group [22:50:23] !log disabled notifications for dsh group checks in icinga - reenable me after T93958 [22:50:27] thanks mutante [22:50:28] Logged the message, Master [22:50:40] jouncebot, next [22:50:40] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf22 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/199656 (owner: 1020after4) [22:50:40] In 0 hour(s) and 9 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T2300) [22:51:00] still deploying, twentyafterfour? [22:51:24] PROBLEM - Host curium is DOWN: PING CRITICAL - Packet loss = 100% [22:51:25] PROBLEM - Host berkelium is DOWN: PING CRITICAL - Packet loss = 100% [22:51:26] that was probably just the first scap to get the new branch out there before updating the config [22:51:38] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf22 [22:51:43] yep [22:51:43] Logged the message, Master [22:51:45] It may have been [22:51:53] so what broke when scapping to codfw? [22:51:56] but it shoudn't matter really [22:52:03] MaxSem: "shit didn't work" [22:52:08] k... I had been hoping to get a ton of config patches in before swat [22:52:15] greg-g: CAlling the thing train... Do you want to get delays? Because that's how you get delays [22:52:17] James_F: 1.25wmf22 is out [22:52:17] but it looks like even swat will be delayed now [22:52:23] hoo: lol [22:52:31] unless that was it? I don't know the train deploy process [22:52:44] lol [22:52:45] The codfw rsync slaves synced with tin but then 58 MW servers hung up syncing with their slaves [22:52:52] twentyafterfour: Thanks! [22:52:58] (03CR) 1020after4: [C: 032] Group0 to 1.25wmf23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199657 (owner: 1020after4) [22:53:00] did it just rsync between DCs for hosts that are not rsync proxies? [22:53:03] (03Merged) 10jenkins-bot: Group0 to 1.25wmf23 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199657 (owner: 1020after4) [22:54:02] MaxSem: not sure. I wasn't getting detailed logs on fluorine to tell me what hosts were syncing with each other from codfw [22:54:06] Post-merge build failed. 
" Failed deployment on the EQIAD beta cluster :-/ Please contact a member of the beta project to fixup the working directory on the destination server. in 1s" [22:54:24] beta cluster? [22:54:35] RECOVERY - Host berkelium is UP: PING OK - Packet loss = 0%, RTA = 1.68 ms [22:54:39] greg-g: https://gerrit.wikimedia.org/r/#/c/199657/ [22:55:07] (03CR) 1020after4: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199657 (owner: 1020after4) [22:55:13] hoo, yeah - and trains also get wrecked. still, prolly better than falling from the skies as planes do [22:55:24] twentyafterfour, greg-g: I'll look [22:55:25] 22:53:05 error: unable to unlink old 'php' (Permission denied) [22:55:25] 22:53:05 error: unable to unlink old 'wikiversions.json' (Permission denied) [22:55:28] 22:53:05 fatal: Could not reset index file to revision 'eb8394491173352d0eaf5aaace9c84f15a5b80f6'. [22:55:35] RECOVERY - Host curium is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms [22:55:44] bah [22:56:36] ebernhardson, superm401 um [22:56:38] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf23 [22:56:45] Logged the message, Master [22:57:05] Krenair: hey [22:57:16] https://gerrit.wikimedia.org/r/#/c/199680/ [22:57:19] !log twentyafterfour Purged l10n cache for 1.25wmf21 [22:57:24] Logged the message, Master [22:57:29] Are you asking for an actual schema change here? [22:57:30] Krenair: has no effect on production [22:57:36] right, just the sync [22:57:41] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151392 (10vshchepakina) @Dzahn bummer! let's keep it store.wikimedia.org then. Can store.wikipedia.org be still redirected to store.wikimedia.org? 
[22:57:43] Krenair: it had to go through, or jenkins -1 everything [22:57:52] ok [22:57:54] Krenair: due to the recent switch to mysql running jenkins jobs [22:57:58] Krenair, just deploying it will do nothing since we never run update.php. [22:58:13] yes, I'm aware :) [22:58:15] PROBLEM - configured eth on berkelium is CRITICAL: Connection refused by host [22:58:18] ok swat can continue I'm done [22:58:26] we haven't started yet :) [22:58:32] I'm still reviewing patches [22:58:35] PROBLEM - Disk space on berkelium is CRITICAL: Connection refused by host [22:58:41] ebernhardson, did you check with springle anyway? [22:58:48] assuming he'll have to do the production schema change at some point [22:58:55] PROBLEM - RAID on curium is CRITICAL: Connection refused by host [22:58:55] PROBLEM - dhclient process on berkelium is CRITICAL: Connection refused by host [22:59:05] PROBLEM - DPKG on curium is CRITICAL: Connection refused by host [22:59:15] PROBLEM - RAID on berkelium is CRITICAL: Timeout while attempting connection [22:59:18] Krenair, there is already a bug filed asking them to deploy. [22:59:29] Krenair: yes there is a ticket for springle T93844 for that to run in prod [22:59:33] There's no huge rush; we just can't deploy to any wiki with a DB name longer than 16 until it's done. [22:59:36] PROBLEM - salt-minion processes on berkelium is CRITICAL: Timeout while attempting connection [22:59:38] PROBLEM - DPKG on berkelium is CRITICAL: Timeout while attempting connection [22:59:39] PROBLEM - Disk space on curium is CRITICAL: Timeout while attempting connection [22:59:43] So we'd like to get it deployed but it's not "drop everything" [22:59:56] PROBLEM - configured eth on curium is CRITICAL: Timeout while attempting connection [23:00:04] RoanKattouw, ^d, superm401: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150325T2300). Please do the needful. 
[23:00:05] alright, let's see [23:00:14] PROBLEM - dhclient process on curium is CRITICAL: Timeout while attempting connection [23:00:27] Krenair: I think there might still be a deploy ongoing? [23:00:35] Because of codfw problems [23:00:37] ok swat can continue I'm done [23:00:45] PROBLEM - salt-minion processes on curium is CRITICAL: Timeout while attempting connection [23:00:58] (03PS3) 10Alex Monk: Use a dblist for Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194809 (owner: 10Mattflaschen) [23:01:08] (03CR) 10Alex Monk: [C: 032] Use a dblist for Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194809 (owner: 10Mattflaschen) [23:01:25] we still need to figure out why codfw didn't sync up properly, but I'm not deploying right now. just updating the release notes [23:01:46] PROBLEM - Host berkelium is DOWN: PING CRITICAL - Packet loss = 100% [23:02:14] ebernhardson, superm401: you need to address https://gerrit.wikimedia.org/r/#/c/196068/ [23:02:36] (03Merged) 10jenkins-bot: Use a dblist for Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194809 (owner: 10Mattflaschen) [23:02:43] (03CR) 10Mattflaschen: "Yes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [23:02:55] RECOVERY - Host berkelium is UP: PING OK - Packet loss = 0%, RTA = 1.59 ms [23:03:25] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/194809/ - dblist for flow (duration: 00m 08s) [23:03:26] superm401 [23:03:31] Logged the message, Master [23:03:38] Krenair, yeah, all the blockers are resolved. [23:03:42] Krenair: I'll have a patch for swat in a few minutes [23:03:46] ok [23:05:00] is it ok superm401? [23:05:42] Something broke, not sure if that caused it. [23:05:50] But Flow is down. 
[23:06:11] reverting [23:06:14] !log krenair Synchronized wmf-config: rv (duration: 00m 07s) [23:06:19] Logged the message, Master [23:06:47] yeah something went wrong [23:07:10] thats the dblist one? [23:07:24] yes [23:07:43] ok, we will recheck and revisit that for another swat. best bet for now is revert [23:07:49] He ddi. [23:07:57] Let me check the exception. [23:08:03] 331 No such file or directory in /srv/mediawiki/wmf-config/CommonSettings.php on line 175 [23:08:04] 331 array_map(): Argument #2 should be an array or collection [23:08:04] 330 in_array() expects parameter 2 to be an array or collection in /srv/mediawiki/wmf-config/CommonSettings.php on line 176 [23:08:04] 2015-03-25 23:06:01 mw1108 mediawikiwiki: [54643e1b] /wiki/Talk:Sandbox MWException from line 324 of /srv/mediawiki/php-1.25wmf23/includes/content/ContentHandler.php: No handler for model 'flow-board' registered in $wgContentHandlers [23:08:09] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151406 (10Dzahn) 5declined>3Open @vshchepakina http://store.wikipedia.org/ is a working redirect, just to http://shop.wikimedia.org because that's the main shop URL.... [23:08:26] I wonder if flow.dblist didn't get synced properly. [23:08:31] Because that definitely exists. [23:08:49] (03PS1) 10Legoktm: Convert more extensions to use extension registration! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199779 [23:09:02] Krenair, yeah, that's what the error says: [23:09:22] flow would've been disabled if it failed to load that file [23:09:25] "$IP/../$tag.dblist" was somehow missing [23:09:33] Krenair: https://gerrit.wikimedia.org/r/#/c/199779/ is the patch, adding to wikitech now [23:09:33] Can you try again and explicitly sync-file flow.dblist separately? [23:09:50] (03CR) 10Jforrester: [C: 031] Convert more extensions to use extension registration! 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/199779 (owner: 10Legoktm) [23:10:03] Krenair, oh, did you sync-dir wmf-config? [23:10:06] Because it's not in that directory. [23:10:08] sure [23:10:23] oh. [23:10:24] damn. [23:10:27] All of the dblist files are one up. [23:10:32] of course not [23:10:36] my bad, sorry [23:10:40] No problem [23:10:54] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151410 (10Dzahn) here are the existing redirects: # Shop redirects funnel shop.wiktionary.org //shop.wikimedia.org funnel store.wiktionary.org //shop.wikimedia.org fun... [23:10:57] !log krenair Synchronized flow.dblist: (oops) (duration: 00m 08s) [23:11:03] Logged the message, Master [23:11:10] okay... [23:11:13] now we can try wmf-config [23:11:34] !log krenair Synchronized wmf-config: trying again (duration: 00m 08s) [23:11:39] Logged the message, Master [23:12:00] Fine now on both branches. [23:12:40] ori has (V-1'd) a commit from november to move them all into a 'dblists' directory [23:13:12] failed the tests, but the records of why are gone now :/ [23:14:06] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151432 (10vshchepakina) @Dzahn I really prefer store.wikimedia.org, especially since we can't have store.wikipedia.org as the main URL. Any chance you could contact Shopify... 
[23:14:12] (03PS6) 10Alex Monk: Enable editing of Flow posts, by autoconfirmed users, on mediawikiwiki, enwiki, ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [23:14:28] (03PS1) 10Faidon Liambotis: Revert "IPsec: Icinga monitor for Strongswan connections" [puppet] - 10https://gerrit.wikimedia.org/r/199781 [23:14:33] (03CR) 10Alex Monk: [C: 032] Enable editing of Flow posts, by autoconfirmed users, on mediawikiwiki, enwiki, ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [23:15:03] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Revert "IPsec: Icinga monitor for Strongswan connections" [puppet] - 10https://gerrit.wikimedia.org/r/199781 (owner: 10Faidon Liambotis) [23:15:38] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 6Release-Engineering, 6Scrum-of-Scrums: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1151434 (10Dzahn) Why do we have to write something new? I wasn't asking for a new featur... [23:18:08] What's up with jenkins? [23:18:34] RECOVERY - dhclient process on berkelium is OK: PROCS OK: 0 processes with command name dhclient [23:18:44] RECOVERY - DPKG on curium is OK: All packages OK [23:18:45] RECOVERY - RAID on berkelium is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [23:18:55] hoo, is it stuck? 
[23:19:05] RECOVERY - salt-minion processes on berkelium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [23:19:05] RECOVERY - DPKG on berkelium is OK: All packages OK [23:19:06] RECOVERY - Disk space on curium is OK: DISK OK [23:19:13] certainly looks very slow: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/4122/console [23:19:15] 6operations, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: 2 hardware access request for CI isolation on labsnet - https://phabricator.wikimedia.org/T93076#1151467 (10RobH) These are two fairly low requirement systems, can they share a single host? [23:19:25] RECOVERY - configured eth on curium is OK: NRPE: Unable to read output [23:19:26] RECOVERY - configured eth on berkelium is OK: NRPE: Unable to read output [23:19:36] RECOVERY - dhclient process on curium is OK: PROCS OK: 0 processes with command name dhclient [23:19:42] but still going [23:19:55] RECOVERY - Disk space on berkelium is OK: DISK OK [23:19:56] mh, ok [23:20:05] RECOVERY - salt-minion processes on curium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [23:20:15] RECOVERY - RAID on curium is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [23:20:26] now that it's using mysql it's pretty slow [23:21:05] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: puppet fail [23:22:16] sigh :( [23:22:43] (03Merged) 10jenkins-bot: Enable editing of Flow posts, by autoconfirmed users, on mediawikiwiki, enwiki, ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196068 (https://phabricator.wikimedia.org/T90670) (owner: 10EBernhardson) [23:22:48] the real problem is that we have tests that are dependent upon the database :P [23:22:57] legoktm++++++++++++++++ [23:23:05] that. ain't. unit. tests. 
[23:23:45] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/196068/ (duration: 00m 09s) [23:23:49] superm401 [23:23:50] Logged the message, Master [23:24:47] Krenair, works. [23:24:55] ok [23:24:58] ebernhardson's changes next [23:25:26] Krenair: kk [23:25:54] (03PS1) 10Gage: IPsec: Icinga monitor for Strongswan connections [puppet] - 10https://gerrit.wikimedia.org/r/199787 [23:26:22] ebernhardson, compare post is somehow broken. [23:26:28] https://www.mediawiki.org/w/index.php?title=Topic:Se7l8dzgtdcd1f47&action=compare-post-revisions&topic_newRevision=se880pq4xpx8bixr [23:26:29] Krenair: yet again this is supression related, the only page i know actively having the problem is hewiki, wmf22 [23:26:30] ebernhardson [23:26:36] !log krenair Synchronized php-1.25wmf22/extensions/Flow: https://gerrit.wikimedia.org/r/#/c/199686/ (duration: 00m 09s) [23:26:41] Logged the message, Master [23:26:51] ok [23:27:06] Looking now [23:28:17] ebernhardson, the wmf23 change is actually simple enough that I'm OK with only testing the wmf22 equivalent, if you guys are happy with it [23:28:48] Krenair: looks like you already synced out 22? i'm not seeing the error on he anymore [23:29:28] yes [23:29:29] I did [23:29:31] see the log :) [23:29:46] (03PS6) 10Dzahn: phab: small lint fixes phd.pp,init.pp [puppet] - 10https://gerrit.wikimedia.org/r/198750 [23:29:47] that's why I pinged you :) [23:30:35] Has anyone else found bast1001 very slow lately? [23:30:43] superm401: the problem with compare-revisions is a typo [23:30:51] superm401: $this->permission-> when it shoudl be permissions [23:31:02] * ebernhardson kinda wishes we used hacklang which staticaly asserts these things :P [23:31:11] I've had some weird issues with ssh to the wmf cluster as well, actually [23:31:21] mostly just sessions freezing if I leave them in the background for a while [23:31:34] bit of slowness about a week ago [23:31:45] superm401: patch incoming for the typo [23:31:51] Thanks. 
[23:32:01] ah, damn [23:32:12] its probably too late to fit into swat today(with how long jenkins will take) [23:33:35] PROBLEM - NTP on curium is CRITICAL: NTP CRITICAL: Offset unknown [23:33:43] Krenair: nothing to revert in regards to patch i just added, its a bug with code we shipped to train deploy [23:33:45] PROBLEM - NTP on berkelium is CRITICAL: NTP CRITICAL: Offset unknown [23:33:58] ok [23:34:02] moving on to wmf23 then [23:35:05] (03PS1) 10Dzahn: shop redirects: store instead of shop [puppet] - 10https://gerrit.wikimedia.org/r/199791 (https://phabricator.wikimedia.org/T92438) [23:35:31] (03CR) 10Dzahn: [C: 032] phab: small lint fixes phd.pp,init.pp [puppet] - 10https://gerrit.wikimedia.org/r/198750 (owner: 10Dzahn) [23:35:32] ebernhardson [23:35:36] !log krenair Synchronized php-1.25wmf23/extensions/Flow/includes: https://gerrit.wikimedia.org/r/#/c/199684/1 (duration: 00m 09s) [23:35:41] Logged the message, Master [23:35:58] Krenair: poking round [23:36:14] ah right, this is the one we thought was OK based on the wmf22 test [23:36:28] Krenair: yea, nothing seems obviously broken. probably fine [23:36:31] but the issue has not yet appeared on a wmf23 wiki, right? [23:36:34] correct [23:36:35] nothing bad in the logs [23:36:49] yep ok, that's fine [23:36:54] RECOVERY - NTP on curium is OK: NTP OK: Offset -0.05847370625 secs [23:36:55] RECOVERY - NTP on berkelium is OK: NTP OK: Offset -0.04899477959 secs [23:37:27] now... ah yes, legoktm put something up didn't he? [23:37:37] oh it's another extension registration thing [23:37:41] :D [23:37:57] Krenair: https://gerrit.wikimedia.org/r/#q,199779,n,z [23:38:05] yeah I refreshed the calendar [23:38:45] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [23:39:20] (03PS2) 10Alex Monk: Convert more extensions to use extension registration! 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/199779 (owner: 10Legoktm) [23:39:27] (03CR) 10Alex Monk: [C: 032] Convert more extensions to use extension registration! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199779 (owner: 10Legoktm) [23:39:44] PROBLEM - Host berkelium is DOWN: PING CRITICAL - Packet loss = 100% [23:39:59] (03Merged) 10jenkins-bot: Convert more extensions to use extension registration! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199779 (owner: 10Legoktm) [23:40:05] RECOVERY - Host berkelium is UP: PING OK - Packet loss = 0%, RTA = 1.32 ms [23:40:37] legoktm [23:40:41] yes hi [23:40:41] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/199779/ (duration: 00m 08s) [23:40:45] :D [23:40:46] Logged the message, Master [23:40:50] * legoktm tests [23:41:19] 6operations, 6Phabricator, 6Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1151638 (10atgo) Maybe I'm late in the discussion here and you guys are set - but prefixing the Project with //acl*// moves it to the top of the list of projects... [23:41:39] ApiSandbox good, BF good, CharInsert good [23:41:39] * Krenair adds legoktm to review all the patches in the mean time [23:41:45] :o [23:42:22] AccountAudit... that's pgehres' thing right? [23:42:32] are you guys still using that? [23:42:36] yeah [23:42:48] I'm logging into the db to test that one [23:42:48] would you like to check that as well then please :) [23:42:51] ok [23:44:02] (03CR) 10Dzahn: [C: 031] transparency: make it HTTPS only and enable HSTS [puppet] - 10https://gerrit.wikimedia.org/r/199517 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [23:44:17] Krenair: yup, confirmed [23:44:23] ok, good [23:45:36] right... 
now I have some patches from the list to do [23:46:07] (03PS4) 10Alex Monk: Remove unused variables and commented-out code from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: 10Withoutaname) [23:47:17] (03CR) 10Alex Monk: [C: 032] Remove unused variables and commented-out code from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: 10Withoutaname) [23:48:07] (03Merged) 10jenkins-bot: Remove unused variables and commented-out code from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: 10Withoutaname) [23:49:11] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/156078/ (duration: 00m 07s) [23:49:19] Logged the message, Master [23:49:47] (03PS2) 10Alex Monk: Add Nova_Resource namespace to default labswiki search options [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198460 (https://phabricator.wikimedia.org/T67132) [23:49:52] (03CR) 10Alex Monk: [C: 032] Add Nova_Resource namespace to default labswiki search options [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198460 (https://phabricator.wikimedia.org/T67132) (owner: 10Alex Monk) [23:49:59] (03Merged) 10jenkins-bot: Add Nova_Resource namespace to default labswiki search options [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198460 (https://phabricator.wikimedia.org/T67132) (owner: 10Alex Monk) [23:50:24] uh oh, hang on [23:50:50] !log krenair Synchronized wmf-config: re-sync that last one... 
(duration: 00m 08s) [23:50:55] Logged the message, Master [23:51:02] ok, that one went fine [23:52:13] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/198460/ (duration: 00m 08s) [23:52:19] Logged the message, Master [23:52:51] (03PS3) 10Alex Monk: Setting import sources for uawikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: 10Base) [23:52:59] (03CR) 10Alex Monk: [C: 032] Setting import sources for uawikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: 10Base) [23:53:02] (03PS1) 10Dzahn: shop URL: change 'shop' to 'store' [dns] - 10https://gerrit.wikimedia.org/r/199796 (https://phabricator.wikimedia.org/T92438) [23:53:04] (03Merged) 10jenkins-bot: Setting import sources for uawikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193662 (https://phabricator.wikimedia.org/T91187) (owner: 10Base) [23:53:49] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/193662/ (duration: 00m 07s) [23:53:56] Logged the message, Master [23:54:30] (03PS2) 10Alex Monk: Enable transwiki imports for Telugu Wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194908 (https://phabricator.wikimedia.org/T91635) (owner: 10Odder) [23:54:40] (03CR) 10Alex Monk: [C: 032] Enable transwiki imports for Telugu Wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194908 (https://phabricator.wikimedia.org/T91635) (owner: 10Odder) [23:54:44] (03Merged) 10jenkins-bot: Enable transwiki imports for Telugu Wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194908 (https://phabricator.wikimedia.org/T91635) (owner: 10Odder) [23:55:26] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/194908 (duration: 00m 07s) [23:55:32] Logged the message, Master [23:56:43] (03PS3) 10Alex Monk: Create and 
modify groups in eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198749 (https://phabricator.wikimedia.org/T93371) (owner: 10Gerardduenas) [23:56:48] (03CR) 10Alex Monk: [C: 032] Create and modify groups in eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198749 (https://phabricator.wikimedia.org/T93371) (owner: 10Gerardduenas) [23:56:53] (03Merged) 10jenkins-bot: Create and modify groups in eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198749 (https://phabricator.wikimedia.org/T93371) (owner: 10Gerardduenas) [23:57:10] (03PS1) 10Faidon Liambotis: strongswan: cleanup Service invocation [puppet] - 10https://gerrit.wikimedia.org/r/199799 [23:57:27] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/198749/ (duration: 00m 08s) [23:57:31] Logged the message, Master [23:57:43] (03PS2) 10Alex Monk: Let dawiki bureaucrats add/remove accountcreator group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198753 (https://phabricator.wikimedia.org/T93260) (owner: 10Glaisher) [23:57:48] (03CR) 10Alex Monk: [C: 032] Let dawiki bureaucrats add/remove accountcreator group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198753 (https://phabricator.wikimedia.org/T93260) (owner: 10Glaisher) [23:57:54] (03Merged) 10jenkins-bot: Let dawiki bureaucrats add/remove accountcreator group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198753 (https://phabricator.wikimedia.org/T93260) (owner: 10Glaisher) [23:57:58] (03CR) 10Gage: [C: 031] strongswan: cleanup Service invocation [puppet] - 10https://gerrit.wikimedia.org/r/199799 (owner: 10Faidon Liambotis) [23:58:17] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1151729 (10Dzahn) >>! In T92438#1151432, @vshchepakina wrote: > @Dzahn I really prefer store.wikimedia.org, especially since we can't have store.wikipedi... 
[23:58:28] (03CR) 10BBlack: [C: 031] strongswan: cleanup Service invocation [puppet] - 10https://gerrit.wikimedia.org/r/199799 (owner: 10Faidon Liambotis) [23:58:35] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/198753/ (duration: 00m 09s) [23:58:40] Logged the message, Master [23:58:49] (03CR) 10Faidon Liambotis: [C: 032] strongswan: cleanup Service invocation [puppet] - 10https://gerrit.wikimedia.org/r/199799 (owner: 10Faidon Liambotis) [23:59:05] (03PS4) 10Alex Monk: Add import sources for cawikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198786 (https://phabricator.wikimedia.org/T93203) (owner: 10Gerardduenas) [23:59:10] (03CR) 10Alex Monk: [C: 032] Add import sources for cawikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198786 (https://phabricator.wikimedia.org/T93203) (owner: 10Gerardduenas) [23:59:15] (03Merged) 10jenkins-bot: Add import sources for cawikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198786 (https://phabricator.wikimedia.org/T93203) (owner: 10Gerardduenas) [23:59:41] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/198786/ (duration: 00m 07s) [23:59:45] Logged the message, Master