[00:00:04] RoanKattouw, ^d: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T0000). Please do the needful.
[00:00:22] any objections?
[00:01:37] ok. lets do it
[00:01:45] lgtm
[00:01:45] ebernhardson, yt?
[00:02:05] (PS5) Springle: phabricator using mysql fulltext T89274, tweaked for mariadb/aria [puppet] - https://gerrit.wikimedia.org/r/190775
[00:02:08] MaxSem: yup
[00:02:59] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[00:03:10] (PS2) Dzahn: create shell user for Marielle Volz [puppet] - https://gerrit.wikimedia.org/r/190405 (https://phabricator.wikimedia.org/T89057)
[00:05:20] Ops-Access-Requests, Services, operations, Citoid: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1045224 (Dzahn) @mvolz we have a rotation system. this week it's @ottomata. also it had to wait for 3 (business) days. I just rebased it.
[00:05:39] !log maxsem Synchronized php-1.25wmf17/extensions/Echo/: SWAT (duration: 00m 07s)
[00:05:45] Logged the message, Master
[00:05:48] ebernhardson, ^
[00:06:18] MaxSem: ok testing
[00:06:33] ejegg, why are you merging changes during a swat?
[00:07:10] Ops-Access-Requests, Services, operations, Citoid: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1045227 (Dzahn) a:Ottomata @ottomata please see the change above. also, it needs some roles for this, so far it's just the user acount itsel
[00:07:26] oh, it was even before the swat
[00:07:38] Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1045229 (Dzahn) a:Ottomata
[00:07:48] (CR) AndyRussG: "Tests successful on the beta cluster :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/182078 (owner: AndyRussG)
[00:08:05] Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1045231 (Dzahn) a:Ottomata
[00:08:18] Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1045232 (Dzahn) a:Ottomata
[00:09:01] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1045234 (Prtksxna) >>! In T89640#1044975, @Krenair wrote: > Who sent this request and why? I sent the request on behalf of the legal team. We are working on th...
[00:09:20] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:09:28] Ops-Access-Requests, operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1045236 (Dzahn) a:Ottomata
[00:09:33] MaxSem: looks good to go,
[00:10:24] MaxSem: sorry, was just tweaking a setting on labs
[00:11:14] I won't do it again
[00:11:34] ejegg, in future, please pull on tin to avoid nagios alerts and risk your changes being reverted ;)
[00:11:47] !log maxsem Synchronized php-1.25wmf16/extensions/Echo/: SWAT (duration: 00m 06s)
[00:11:51] Logged the message, Master
[00:11:54] ebernhardson,
[00:12:04] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1045239 (Krenair) That's... Not how Git is supposed to work. There is no need to hide Wikimedia's master copy of the repository just because you want to prepare...
[00:12:07] (CR) MaxSem: [C: 2] Enable JS console recruitment on mobile. [mediawiki-config] - https://gerrit.wikimedia.org/r/187823 (https://phabricator.wikimedia.org/T85815) (owner: Jdlrobson)
[00:12:21] oops, got it
[00:13:35] !log db1048 m3-slave restart mysqld T89274
[00:13:38] Logged the message, Master
[00:14:45] MaxSem: seems to work there too. thanks!
[00:14:51] :)
[00:15:22] (Merged) jenkins-bot: Enable JS console recruitment on mobile. [mediawiki-config] - https://gerrit.wikimedia.org/r/187823 (https://phabricator.wikimedia.org/T85815) (owner: Jdlrobson)
[00:15:40] why is wiki so slow
[00:16:11] !log maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/187823 (duration: 00m 06s)
[00:16:13] Logged the message, Master
[00:17:47] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1045245 (Prtksxna) The only other way that I know of is to submit all changes as draft patches (I hear aren't exactly private either) and merge them right before...
[00:20:01] (PS1) Dzahn: add jamesur to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/191218 (https://phabricator.wikimedia.org/T89739)
[00:20:46] (CR) MaxSem: [C: 2] Adding original language of this work campaign for WikiGrok [mediawiki-config] - https://gerrit.wikimedia.org/r/188731 (owner: Kaldari)
[00:21:15] !log db1043 m3-master restart mysqld T89274
[00:21:19] Logged the message, Master
[00:21:30] (Merged) jenkins-bot: Adding original language of this work campaign for WikiGrok [mediawiki-config] - https://gerrit.wikimedia.org/r/188731 (owner: Kaldari)
[00:22:38] !log maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/188731 (duration: 00m 05s)
[00:22:41] Logged the message, Master
[00:25:49] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[00:26:09] !log maxsem Synchronized php-1.25wmf17/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/190562 (duration: 00m 07s)
[00:26:12] Logged the message, Master
[00:27:01] !log maxsem Synchronized php-1.25wmf16/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/190562 (duration: 00m 06s)
[00:27:04] Logged the message, Master
[00:29:40] (PS2) Dzahn: Use noc@ for apache2 ServerAdmin [puppet] - https://gerrit.wikimedia.org/r/188416 (owner: John F. Lewis)
[00:31:43] (CR) Dzahn: [C: 2] Use noc@ for apache2 ServerAdmin [puppet] - https://gerrit.wikimedia.org/r/188416 (owner: John F. Lewis)
[00:33:53] (PS1) Springle: actually put ft_stopword_file into the db config this time [puppet] - https://gerrit.wikimedia.org/r/191222
[00:35:45] (CR) Springle: [C: 2] actually put ft_stopword_file into the db config this time [puppet] - https://gerrit.wikimedia.org/r/191222 (owner: Springle)
[00:36:40] PROBLEM - MySQL Processlist on db1058 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 175 statistics
[00:37:48] RECOVERY - MySQL Processlist on db1058 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[00:38:33] (CR) MaxSem: [C: 2] Enable gather extension on en beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/189863 (owner: Robmoen)
[00:41:17] * MaxSem bites Zuul
[00:42:55] there is no maxsem, only zuul
[00:43:00] (Merged) jenkins-bot: Enable gather extension on en beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/189863 (owner: Robmoen)
[00:44:34] !log maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/189863 - labs only (duration: 00m 06s)
[00:44:42] Logged the message, Master
[00:45:49] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[00:48:22] hoo, aude: we've got a bunch of exceptions like HttpError from line 359 of /srv/mediawiki/php-1.25wmf17/extensions/Wikidata/extensions/Wikibase/repo/includes/LinkedData/EntityDataRequestHandler.php: Failed to load ent
[00:48:23] ity Q8889088
[00:50:51] operations, Phabricator: Mysql search issues flagged by Phabricator setup - https://phabricator.wikimedia.org/T89274#1045351 (chasemp) Open>Resolved a:chasemp done
[00:55:39] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0
[00:56:45] !log maxsem Synchronized php-1.25wmf17/extensions/MobileFrontend/: SWAT (duration: 00m 06s)
[00:56:50] Logged the message, Master
[00:57:20] Ops-Access-Requests, operations: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1045360 (thcipriani) @Dzahn—can confirm that I have access to bast1001 and gallium via bast1001. Thanks!
[01:07:28] (PS1) GWicke: Update restbase config.yaml.erb for service-runner [puppet] - https://gerrit.wikimedia.org/r/191230
[01:10:32] anybody around for a quick restbase config tweak review?
[01:11:17] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1045403 (Jalexander) >>! In T89640#1045245, @Prtksxna wrote: > The only other way that I know of is to submit all changes as draft patches (I hear aren't exactly...
[01:20:03] ottomata: ping
[01:23:36] (CR) Ori.livneh: [C: 2] "Looks sane, and not mission-critical yet, so I feel OK merging." [puppet] - https://gerrit.wikimedia.org/r/191230 (owner: GWicke)
[01:24:01] ori: thanks!
[01:24:14] gwicke: np; need me to run puppet somewhere, or can you do that yourself?
[01:24:29] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[01:25:05] ori: running already
[01:25:09] cool
[01:25:26] (PS1) Ori.livneh: vbench: Allow 'stage' to be specified from the command line [puppet] - https://gerrit.wikimedia.org/r/191234 (https://phabricator.wikimedia.org/T89536)
[01:25:38] RoanKattouw: ^
[01:31:14] (CR) Ori.livneh: [C: 2] vbench: Allow 'stage' to be specified from the command line [puppet] - https://gerrit.wikimedia.org/r/191234 (https://phabricator.wikimedia.org/T89536) (owner: Ori.livneh)
[01:31:56] (PS1) Springle: reduce db1065 non-api load [mediawiki-config] - https://gerrit.wikimedia.org/r/191236
[01:34:27] (PS1) GWicke: Small fix in restbase config [puppet] - https://gerrit.wikimedia.org/r/191237
[01:34:42] ori, small follow-up ^^
[01:35:00] (CR) Ori.livneh: [C: 2 V: 2] Small fix in restbase config [puppet] - https://gerrit.wikimedia.org/r/191237 (owner: GWicke)
[01:35:07] thx!
[01:57:54] (CR) Springle: [C: 2] reduce db1065 non-api load [mediawiki-config] - https://gerrit.wikimedia.org/r/191236 (owner: Springle)
[01:57:59] (Merged) jenkins-bot: reduce db1065 non-api load [mediawiki-config] - https://gerrit.wikimedia.org/r/191236 (owner: Springle)
[01:58:52] !log springle Synchronized wmf-config/db-eqiad.php: reduce db1065 load (duration: 00m 05s)
[01:58:57] Logged the message, Master
[02:03:10] Phabricator, operations, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1045517 (Dzahn) >>! In T85140#1038257, @jayvdb wrote: > * the header on static-bugzilla pages is / will be wrong > In order to access the Phabricator task corresponding to a Bug...
[02:06:52] (CR) Dzahn: "is it intended that the swift related things are removed from yaml?" [puppet] - https://gerrit.wikimedia.org/r/188822 (owner: Giuseppe Lavagetto)
[02:07:23] !log ori Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
[02:07:29] Logged the message, Master
[02:10:10] (PS1) GWicke: Two more fixes for the new layout [puppet] - https://gerrit.wikimedia.org/r/191240
[02:10:40] ori: if you have time for one more ^^
[02:11:26] wait, something sneaked in there
[02:12:40] (PS2) GWicke: Two more fixes for the new layout [puppet] - https://gerrit.wikimedia.org/r/191240
[02:12:51] fixed ^^
[02:16:44] ori, can I bug you once more?
[02:17:16] would be great to get https://gerrit.wikimedia.org/r/191240 merged as I can then let the load tests run on the test cluster over night
[02:21:29] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[02:21:39] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:21:48] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 2 failures
[02:21:48] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:21:58] !log l10nupdate Synchronized php-1.25wmf16/cache/l10n: (no message) (duration: 00m 02s)
[02:22:04] Logged the message, Master
[02:23:06] !log LocalisationUpdate completed (1.25wmf16) at 2015-02-18 02:22:02+00:00
[02:23:09] Logged the message, Master
[02:31:30] (CR) Ori.livneh: [C: 2] Two more fixes for the new layout [puppet] - https://gerrit.wikimedia.org/r/191240 (owner: GWicke)
[02:34:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:38:43] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 01s)
[02:38:48] Logged the message, Master
[02:38:49] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[02:38:49] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[02:38:49] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[02:39:50] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-18 02:38:47+00:00
[02:39:53] Logged the message, Master
[02:43:52] OTRS, operations: Upgrade OTRS to latest stable release (4.0 or later) - https://phabricator.wikimedia.org/T74109#1045543 (Matthewrbowker)
[03:15:25] RD, hi
[03:15:32] :P
[03:15:39] i know he shouldnt be doing that
[03:15:41] but shrug
[03:15:46] so per https://phabricator.wikimedia.org/T89789
[03:16:21] user wants to stop receiving notifications from private wiki they can't log in on
[03:17:30] I wonder if we can just mark them as email unverified or something
[03:21:04] RD, does that sound OK to you?
[03:21:33] they should just fix it
[03:21:37] :P
[03:22:04] the issue in MW?
[03:22:09] yes
[03:49:26] (CR) KartikMistry: "What should we do with cxserver-admin group? :) I guess it is on same host(s), so thought having access to stop/start similar services sho" [puppet] - https://gerrit.wikimedia.org/r/189915 (owner: KartikMistry)
[04:04:15] (PS32) KartikMistry: cxserver: Add Yandex support [puppet] - https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512)
[04:05:07] (PS1) BryanDavis: l10nupdate: use --no-shared-authsock with sync-dir [puppet] - https://gerrit.wikimedia.org/r/191251 (https://phabricator.wikimedia.org/T76061)
[04:21:32] (Abandoned) Glaisher: Create 'interface-editor' user group on cawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/187915 (https://phabricator.wikimedia.org/T85713) (owner: Glaisher)
[04:57:58] OTRS, operations: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1045668 (lfaraone) >>! In T87217#1043145, @Steinsplitter wrote: >>>! In T87217#1041222, @tommorris wrote: >> If one is using tethered mobile broadband from Three (a UK mobile operator) using an Android handset...
[05:07:57] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Feb 18 05:06:53 UTC 2015 (duration 6m 52s)
[05:08:03] Logged the message, Master
[05:26:36] (PS1) KartikMistry: Beta: Use array format for dictionary in cxserver config [puppet] - https://gerrit.wikimedia.org/r/191256
[05:29:18] (CR) KartikMistry: "Should merge along with/after, https://gerrit.wikimedia.org/r/#/c/191255/" [puppet] - https://gerrit.wikimedia.org/r/191256 (owner: KartikMistry)
[05:31:44] (PS4) KartikMistry: WIP: Use compact registry format for cxserver [puppet] - https://gerrit.wikimedia.org/r/190990
[05:32:44] (CR) Gergő Tisza: Set up beacon endpoint for virtual media views (1 comment) [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles)
[05:44:54] (PS1) BryanDavis: logstash: Ship logs via syslog udp datagrams [mediawiki-config] - https://gerrit.wikimedia.org/r/191259 (https://phabricator.wikimedia.org/T88732)
[05:48:16] Wikimedia-Logstash, operations, Incident-20150205-SiteOutage: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1045751 (bd808) a:bd808
[05:48:36] MediaWiki-Core-Team, Wikimedia-Logstash, operations, Incident-20150205-SiteOutage: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1019000 (bd808)
[05:57:10] operations: Incident response protocol needs a refresh - https://phabricator.wikimedia.org/T89800#1045765 (Eloquence) NEW
[06:00:03] ops-eqiad, operations: db1054 MCE errors logged for CPU temperature - https://phabricator.wikimedia.org/T89801#1045773 (Springle) NEW
[06:02:55] (PS1) Springle: depool db1054, T89801 [mediawiki-config] - https://gerrit.wikimedia.org/r/191260
[06:29:09] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:09] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:19] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:28] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:28] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:39] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:39] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:39] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:40] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:48] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:49] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:49] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:08] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:09] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:09] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:10] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:10] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:39] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail
[06:35:47] (PS1) KartikMistry: WIP: Do not use registry and fallback to config.default.js [puppet] - https://gerrit.wikimedia.org/r/191263
[06:39:38] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:33] (PS2) Springle: depool db1054, T89801 [mediawiki-config] - https://gerrit.wikimedia.org/r/191260
[06:49:29] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:49:59] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:49:59] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:50:38] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:51:29] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:51:58] (CR) Springle: [C: 2] depool db1054, T89801 [mediawiki-config] - https://gerrit.wikimedia.org/r/191260 (owner: Springle)
[06:52:03] (Merged) jenkins-bot: depool db1054, T89801 [mediawiki-config] - https://gerrit.wikimedia.org/r/191260 (owner: Springle)
[06:52:51] _joe_: working on tin?
[06:53:32] unstaged changes
[06:55:12] hmm twentyafterfour
[06:55:41] not my unstaged changes
[06:56:02] yeah only going by the file owner
[06:56:15] * springle keeps looking
[06:56:48] looks similar to some changes that I stashed before deployment earlier... ori?
[06:57:37] yes, i was testing something; reset
[06:57:45] i just reset it, i mean
[06:57:52] thanks
[06:59:23] !log springle Synchronized wmf-config/db-eqiad.php: depool db1054 T89801 (duration: 00m 06s)
[06:59:28] Logged the message, Master
[07:08:09] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[07:08:49] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[07:08:59] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[07:09:15] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[07:09:19] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:09:19] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:09:29] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:09:29] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:09:39] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[07:09:59] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:10:00] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:10:10] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[07:11:33] (PS1) KartikMistry: Beta: Update $wgContentTranslationSiteTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/191264
[07:13:20] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[07:16:35] <_joe_> springle: nope
[07:16:41] <_joe_> heya
[07:16:49] <_joe_> ori: still here?
[07:17:00] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[07:17:30] _joe_: yeah sorry, not you.
[07:27:56] is labs still broken? There is only one instance listed on https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep ...
[07:31:06] <_joe_> twentyafterfour: labs broken?
[07:31:39] well, beta used to have a lot more than 1 instance in the project
[07:31:53] <_joe_> twentyafterfour: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150217-LabsOutage
[07:32:08] <_joe_> twentyafterfour: try to log out
[07:32:52] <_joe_> I'm getting a blank page on wikitech
[07:32:57] <_joe_> for instances
[07:33:29] that's what I'm talking about ...
[07:33:50] <_joe_> twentyafterfour: ok so that is called wikitech
[07:33:53] <_joe_> not labs
[07:34:07] <_joe_> beta renders correctly, how could it be if no instances were around?
[07:34:27] <_joe_> tbh I don't have time to troubleshoot this now, if beta is up
[07:34:53] <_joe_> unless it's blocking someone's work right now
[07:35:13] well if it's not a good time don't worry about it. I need to create a new instance in that project though
[07:35:23] it can wait till tomorrow
[07:36:10] I'll just open a ticket about it
[07:36:23] <_joe_> twentyafterfour: thanks
[07:36:33] to phabricator! :)
[07:36:36] <_joe_> sorry but there is no obvious reason for this to fail
[07:37:33] <_joe_> and I can't find the error log on silver
[07:37:35] <_joe_> GRRR
[07:37:53] <_joe_> oh ok here it is.
[07:37:54] <_joe_> [Wed Feb 18 07:37:08.455473 2015] [:error] [pid 871] [client 79.58.168.196:54502] PHP Fatal error: Call to a member function getImageName() on a non-object in /srv/mediawiki/php-1.25wmf17/extensions/OpenStackManager/special/SpecialNovaInstance.php on line 569
[07:37:59] _joe_: that is bug of wikitech/openstack to see list of instances?
[07:38:10] one has to logout and login to see it.
[07:38:10] <_joe_> kart_: I'd say, yes
[07:38:16] <_joe_> kart_: nope
[07:38:22] <_joe_> this is a new error
[07:38:25] is this new one? :/
[07:38:41] <_joe_> I said repeatedly that linking wikitech to the deployment train was a bad idea
[07:38:48] +1
[07:45:32] (PS2) Giuseppe Lavagetto: service_unit: allow custom init script in a single initsystem [puppet] - https://gerrit.wikimedia.org/r/190815
[07:47:27] (CR) Giuseppe Lavagetto: [C: 2] service_unit: allow custom init script in a single initsystem [puppet] - https://gerrit.wikimedia.org/r/190815 (owner: Giuseppe Lavagetto)
[07:49:29] Wikimedia-Labs-wikitech-interface, operations: wikitech instances list is blank - https://phabricator.wikimedia.org/T89808#1045882 (mmodell) NEW
[07:50:46] (PS2) Giuseppe Lavagetto: memcached: systemd compatibility [puppet] - https://gerrit.wikimedia.org/r/190816
[07:53:11] (CR) Gilles: Set up beacon endpoint for virtual media views (1 comment) [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles)
[08:06:52] operations, Phabricator: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#1045912 (mmodell) This is blocked on ops, afaik. It's only used by them so I'll let them decide when to merge it.
[08:11:57] (PS3) Giuseppe Lavagetto: memcached: systemd compatibility [puppet] - https://gerrit.wikimedia.org/r/190816
[08:16:39] (CR) Giuseppe Lavagetto: "Cherry-picked on beta resulted in a noop." [puppet] - https://gerrit.wikimedia.org/r/190816 (owner: Giuseppe Lavagetto)
[08:22:24] (CR) Santhosh: [C: 1] Beta: Update $wgContentTranslationSiteTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/191264 (owner: KartikMistry)
[08:24:37] (CR) Giuseppe Lavagetto: [C: 2] memcached: systemd compatibility [puppet] - https://gerrit.wikimedia.org/r/190816 (owner: Giuseppe Lavagetto)
[08:32:41] (CR) Gergő Tisza: Set up beacon endpoint for virtual media views (1 comment) [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles)
[08:33:30] operations, Phabricator: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#1045923 (mmodell) Open>stalled
[08:48:16] <_joe_> "omgwtf.in"
[08:48:47] <_joe_> he's got all the best domains :P
[08:51:45] (CR) Nikerabbit: [C: 1] Beta: Update $wgContentTranslationSiteTemplates [mediawiki-config] - https://gerrit.wikimedia.org/r/191264 (owner: KartikMistry)
[08:55:45] (PS1) Giuseppe Lavagetto: move mc1016 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191270
[08:55:47] (PS1) Giuseppe Lavagetto: move mc1015 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191271
[08:55:49] (PS1) Giuseppe Lavagetto: move mc1014 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191272
[08:55:51] (PS1) Giuseppe Lavagetto: move mc1013 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191273
[08:55:53] (PS1) Giuseppe Lavagetto: move mc1007 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191274
[08:55:55] (PS1) Giuseppe Lavagetto: move mc1008 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191275
[08:55:57] (PS1) Giuseppe Lavagetto: move mc1009 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191276
[08:55:59] (PS1) Giuseppe Lavagetto: move mc1010 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191277
[08:56:01] (PS1) Giuseppe Lavagetto: move mc1011 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191278
[08:56:03] (PS1) Giuseppe Lavagetto: move mc1012 to a new rack/row [dns] - https://gerrit.wikimedia.org/r/191279
[08:58:43] (CR) Giuseppe Lavagetto: [C: -2] "The dns changes should happen one-by-one, see" [dns] - https://gerrit.wikimedia.org/r/190358 (owner: Cmjohnson)
[09:06:51] _joe_: I had ftbfs.in :)
[09:11:54] greetings
[09:26:54] Phabricator, operations, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1045971 (Aklapper) >>! In T85140#1045169, @JohnLewis wrote: > After looking at it, there are only two options regarding obsolete attachments. > 1. Import them into Phabricator We w...
[09:29:18] Project-Creators, operations, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1045975 (Aklapper) Thanks chasemp!
[09:44:50] (CR) Giuseppe Lavagetto: [C: 1] "I did not review every single value, but this is clearly correct." [puppet/cassandra] - https://gerrit.wikimedia.org/r/190813 (https://phabricator.wikimedia.org/T76149) (owner: Filippo Giunchedi)
[10:02:53] (PS2) Filippo Giunchedi: cassandra: deprecate cassandra::defaults class [puppet/cassandra] - https://gerrit.wikimedia.org/r/190813 (https://phabricator.wikimedia.org/T76149)
[10:03:02] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: deprecate cassandra::defaults class [puppet/cassandra] - https://gerrit.wikimedia.org/r/190813 (https://phabricator.wikimedia.org/T76149) (owner: Filippo Giunchedi)
[10:04:20] thanks akosiaris _joe_ !
[10:08:06] RESTBase-Cassandra, operations: Make the cassandra module use hiera properly - https://phabricator.wikimedia.org/T76149#1046020 (fgiunchedi) default.pp is gone with https://gerrit.wikimedia.org/r/190813, leaving this open since there other other improvements like @gwicke suggested
[10:14:25] (PS1) Filippo Giunchedi: cassandra: update submodule [puppet] - https://gerrit.wikimedia.org/r/191284
[10:15:26] hashar: Any chance you could have a look at https://gerrit.wikimedia.org/r/190690 ? Should be trivial
[10:15:35] (CR) jenkins-bot: [V: -1] cassandra: update submodule [puppet] - https://gerrit.wikimedia.org/r/191284 (owner: Filippo Giunchedi)
[10:19:32] (PS2) Filippo Giunchedi: cassandra: update submodule [puppet] - https://gerrit.wikimedia.org/r/191284
[10:22:27] Did we make ruwiki https only?
[10:22:34] And why on earth did nobody tell us
[10:22:49] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: update submodule [puppet] - https://gerrit.wikimedia.org/r/191284 (owner: Filippo Giunchedi)
[10:23:09] _joe_: godog: ^
[10:23:15] hoo: sure will do
[10:23:29] hashar: Thank you
[10:23:59] <_joe_> hashar: thanks for fixing the extensions integration job :)
[10:24:38] _joe_: you are welcome. I am still wondering why hhvm-dev was no more installed
[10:24:51] <_joe_> oh, well, whatever :)
[10:24:51] but maybe I did it manually or it was in the build-deps of hhvm at some point
[10:25:45] Could someone answer my question?
[10:26:07] hoo: look at operations/mediawiki-config.git there must be a Task associated
[10:26:09] or a conf change
[10:27:35] Can't find anything at a glance
[10:27:36] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1016 [puppet] - https://gerrit.wikimedia.org/r/191288
[10:27:38] (PS1) Giuseppe Lavagetto: nutcracker: move and lable mc1015 [puppet] - https://gerrit.wikimedia.org/r/191289
[10:27:40] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1014 [puppet] - https://gerrit.wikimedia.org/r/191290
[10:27:42] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1013 [puppet] - https://gerrit.wikimedia.org/r/191291
[10:27:44] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1007 [puppet] - https://gerrit.wikimedia.org/r/191292
[10:27:46] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1008 [puppet] - https://gerrit.wikimedia.org/r/191293
[10:27:48] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1009 [puppet] - https://gerrit.wikimedia.org/r/191294
[10:27:50] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1010 [puppet] - https://gerrit.wikimedia.org/r/191295
[10:27:52] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1011 [puppet] - https://gerrit.wikimedia.org/r/191296
[10:27:54] (PS1) Giuseppe Lavagetto: nutcracker: move and label mc1012 [puppet] - https://gerrit.wikimedia.org/r/191297
[10:29:55] <_joe_> this is gonna be soo funny
[10:30:14] <_joe_> now I have a third place where I have to change all those IPs
[10:30:37] jenkins will be thrilled with that shower of reviews
[10:30:53] <_joe_> I can only imagine when we'd have 10 services using the same memcached... I'll have to change those in 10 different files?
[10:31:01] hoo: I have updated the jobs
[10:31:08] <_joe_> or, will we learn a lesson for once?
[10:32:46] any guru would know what ssh error is: unknown key type 'ecdsa-sha2-nistp256' [10:33:00] I have grabbed the ssh known host file from tin but my ssh client does not recognize the key type :-( [10:33:15] <_joe_> yes, you have an outdated ssh client [10:33:28] * hashar raises fist at apple [10:33:34] !log Manually switched wikidatawiki's sites table entry for ruwiki from protocol relative to https URIs [10:33:35] guess I will have to upgrade my OS [10:33:38] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [10:33:39] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [10:33:40] Logged the message, Master [10:34:06] that was me, sorry [10:34:39] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [10:34:39] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[10:35:00] <_joe_> well, this alert is just a reminder [10:35:26] <_joe_> it's there for the specific reason that sometime we merge noops or we add new things and we forget to merge [10:37:29] PROBLEM - puppet last run on restbase1003 is CRITICAL: CRITICAL: puppet fail [10:38:07] <_joe_> godog: ouch ^^ [10:40:35] hah, will take a look shortly [10:40:39] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: puppet fail [10:44:09] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: puppet fail [10:44:31] hashar: Sorry to bother again: https://gerrit.wikimedia.org/r/191301 While we don't need cldr there we still load it [10:44:54] Would be to much work/ to much complexity to change the extension loading stuff just for the API tests [10:45:09] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: puppet fail [10:46:03] (03PS1) 10Filippo Giunchedi: fix additional_jvm_opts default [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191302 [10:46:20] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] fix additional_jvm_opts default [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191302 (owner: 10Filippo Giunchedi) [10:47:04] yay for submodule update double commit [10:47:09] (03PS1) 10Filippo Giunchedi: cassandra: submodule update [puppet] - 10https://gerrit.wikimedia.org/r/191303 [10:47:14] hoo: updating :] [10:47:25] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: submodule update [puppet] - 10https://gerrit.wikimedia.org/r/191303 (owner: 10Filippo Giunchedi) [10:48:33] Thanks again [10:49:18] (03PS1) 10Giuseppe Lavagetto: move mc1016 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191305 [10:49:20] (03PS1) 10Giuseppe Lavagetto: move mc1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191306 [10:49:22] (03PS1) 10Giuseppe Lavagetto: move mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191307 [10:49:24] (03PS1) 10Giuseppe Lavagetto: move mc1013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191308 [10:49:26] 
(03PS1) 10Giuseppe Lavagetto: move mc1007 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191309 [10:49:28] (03PS1) 10Giuseppe Lavagetto: move mc1008 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191310 [10:49:30] (03PS1) 10Giuseppe Lavagetto: move mc1009 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191311 [10:49:32] (03PS1) 10Giuseppe Lavagetto: move mc1010 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191312 [10:49:32] hoo: job updated [10:49:34] (03PS1) 10Giuseppe Lavagetto: move mc1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191313 [10:49:36] (03PS1) 10Giuseppe Lavagetto: move mc1012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191314 [10:50:13] Thank you :) [10:51:39] RECOVERY - puppet last run on restbase1003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:55:50] (03PS1) 10Filippo Giunchedi: update class defaults [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191315 [10:56:42] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] update class defaults [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191315 (owner: 10Filippo Giunchedi) [10:57:55] (03PS2) 10Filippo Giunchedi: add extra_classpath argument [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191067 (https://phabricator.wikimedia.org/T78514) [10:58:38] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] add extra_classpath argument [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191067 (https://phabricator.wikimedia.org/T78514) (owner: 10Filippo Giunchedi) [10:58:59] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:59:28] (03PS1) 10Filippo Giunchedi: cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/191316 [10:59:41] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/191316 (owner: 10Filippo Giunchedi) [11:01:04] (03PS1) 10Filippo Giunchedi: 
cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/191317 [11:02:01] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/191317 (owner: 10Filippo Giunchedi) [11:02:30] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:04:39] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:05:39] 3operations, Wikimedia-Git-or-Gerrit: Upgrade gerrit to 2.8.6 - https://phabricator.wikimedia.org/T65847#699035 (10Nemo_bis) [11:20:30] (03CR) 10KartikMistry: "Merged, https://gerrit.wikimedia.org/r/#/c/191255/ so this can go ahead." [puppet] - 10https://gerrit.wikimedia.org/r/191256 (owner: 10KartikMistry) [11:21:51] akosiaris: ^^ :) [11:28:58] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: Use array format for dictionary in cxserver config [puppet] - 10https://gerrit.wikimedia.org/r/191256 (owner: 10KartikMistry) [11:32:24] akosiaris: thanks! [11:37:49] (03PS1) 10ArielGlenn: add ipv6 addr for ms1001 [dns] - 10https://gerrit.wikimedia.org/r/191320 [11:42:33] (03CR) 10ArielGlenn: [C: 032] add ipv6 addr for ms1001 [dns] - 10https://gerrit.wikimedia.org/r/191320 (owner: 10ArielGlenn) [11:51:54] (03PS1) 10ArielGlenn: enable ipv6 for ms1001 [puppet] - 10https://gerrit.wikimedia.org/r/191321 [11:56:02] (03CR) 10ArielGlenn: [C: 032] enable ipv6 for ms1001 [puppet] - 10https://gerrit.wikimedia.org/r/191321 (owner: 10ArielGlenn) [11:58:38] (03CR) 10Alexandros Kosiaris: "I think we can leave that as is for now. 
The ability to restart cxserver after a deploy in case trebuchet misbehaves is useful" [puppet] - 10https://gerrit.wikimedia.org/r/189915 (owner: 10KartikMistry) [12:00:07] (03PS33) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [12:05:29] PROBLEM - Debian mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/debian is over 14 hours old. [12:08:06] 3Phabricator, operations, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1046164 (10JohnLewis) >>! In T85140#1045971, @Aklapper wrote: >>>! In T85140#1045169, @JohnLewis wrote: >> 2. Wait for T85141 to be resolved. > > How is that related to obsolete (but... [12:13:59] RECOVERY - Debian mirror in sync with upstream on carbon is OK: /srv/mirrors/debian is over 0 hours old. [12:29:25] 3Phabricator, operations, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1046176 (10Aklapper) What does "recovering obsolete attachments" mean? All obsolete attachments are accessible from current old-bugzilla.wikimedia.org without any login required, so I... [12:35:14] 3Phabricator, operations, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1046181 (10JohnLewis) The plan is to remove old-bugzilla as soon as possible as maintaining old abandoned project (in the sense of we don't use it) is not going to happen which can op... 
[12:42:09] (03CR) 1020after4: [C: 031] l10nupdate: use --no-shared-authsock with sync-dir [puppet] - 10https://gerrit.wikimedia.org/r/191251 (https://phabricator.wikimedia.org/T76061) (owner: 10BryanDavis) [12:49:09] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:56:29] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59694 bytes in 0.223 second response time [13:04:27] 3Datasets-General-or-Unknown, operations: dumps.wikimedia.org seems super-slow right now - https://phabricator.wikimedia.org/T45647#1046213 (10ArielGlenn) The ms1001 rsync should kick off in about an hour, I'll give an ETA of finishing time about an hour after that. Not more than a couple of days I would think. [13:07:12] 3operations: Our custom php packages need to create some conf.d links - https://phabricator.wikimedia.org/T89157#1046217 (10Joe) Packages uploaded to carbon; hhvm versions installed on beta, I'm going to install the php extensions on silver now. [13:11:26] (03CR) 10Alexandros Kosiaris: [C: 04-1] cxserver: Add Yandex support (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) (owner: 10KartikMistry) [13:13:02] 3operations: Our custom php packages need to create some conf.d links - https://phabricator.wikimedia.org/T89157#1046218 (10Joe) 5Open>3Resolved [13:25:30] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [13:25:44] 3Wikimedia-Bugzilla, operations, Phabricator: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1046219 (10Aklapper) I know that. It's what this task is about. :) I also understand that "Show Obsolete" does nothing currently. But I don't understand how "recovering obsolete atta... 
[13:29:53] 3Wikimedia-Bugzilla, operations, Phabricator: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1046228 (10JohnLewis) Attachments are stored in the database. Providing a database dump leaving attachments and obsolete ones in the dump would allow people to recover them in what ev... [13:36:26] (03PS1) 10ArielGlenn: datasets: make sure rsyncd is running on them [puppet] - 10https://gerrit.wikimedia.org/r/191330 [13:43:49] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:44:24] (03CR) 10ArielGlenn: [C: 032] datasets: make sure rsyncd is running on them [puppet] - 10https://gerrit.wikimedia.org/r/191330 (owner: 10ArielGlenn) [14:02:03] (03PS34) 10Alexandros Kosiaris: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) (owner: 10KartikMistry) [14:08:47] (03PS35) 10Alexandros Kosiaris: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) (owner: 10KartikMistry) [14:15:59] (03PS36) 10Alexandros Kosiaris: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) (owner: 10KartikMistry) [14:23:36] (03CR) 10Alexandros Kosiaris: [C: 032] "Noop in production, cherry-picked temporarily on beta and populates the configuration as expected. Merging" [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) (owner: 10KartikMistry) [14:25:17] <^d> apergos: About? Could do that (hopefully no-op) config change [14:25:36] ah. [14:25:39] give me 5 mins [14:25:41] ^d: [14:25:45] <^d> Mmk [14:25:52] (gotta get pasta off the stove, very latelunch) [14:27:38] akosiaris: thanks! 
[14:28:04] 3Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1046265 (10Ottomata) James, What is it you want access to? stats boxes are different than analytics cluster. If you want access to private webrequest logs on the Hadoop clu... [14:29:57] kart_: better check if everything is working as expected. I went as far as puppet [14:32:38] "nothing every takes just 5 minutes" [14:32:44] ^d: back. fire when ready [14:33:00] * ^d loads the cannon [14:33:15] (03CR) 10Chad: [C: 032] Don't load php_utfnormal.so using dl() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164009 (owner: 10PleaseStand) [14:33:27] (03Merged) 10jenkins-bot: Don't load php_utfnormal.so using dl() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/164009 (owner: 10PleaseStand) [14:34:11] akosiaris: yep. tested. [14:34:13] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1046286 (10Ottomata) Hiya! Sorry for the delay on this! (Leila, poke me next time!) Has Ashwin signed this? https://phabricator.wikimedia.org/L3 [14:34:28] !log demon Synchronized wmf-config/CommonSettings.php: remove dl() of php_utfnormal (duration: 00m 07s) [14:34:35] akosiaris: sent mail to team for that. [14:34:37] Logged the message, Master [14:34:38] <^d> apergos: And we're live [14:34:42] great [14:34:47] akosiaris: feel free to correct me there :) [14:35:00] first round I shoul know in about a minute [14:35:12] next round I'll know in about ano hour and ten (next full dump start) [14:35:38] <^d> I'm not seeing anything coming from the apaches, etc. 
Which was expected :) [14:36:52] too much prefetch going on, can't tell yet [14:37:41] and the other host is oing 7z compression ;-/ [14:39:54] (03PS1) 10Ottomata: Put new user Ashwin Pradeep Paranjape in analytics-users group [puppet] - 10https://gerrit.wikimedia.org/r/191338 (https://phabricator.wikimedia.org/T89264) [14:42:16] (03PS1) 10Filippo Giunchedi: cassandra: set rack/dc/cluster name [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T76986) [14:43:13] 3ops-eqiad, operations: rack and setup restbase production cluster in eqiad - https://phabricator.wikimedia.org/T88805#1046292 (10fgiunchedi) pending renames in racktables too, only asset name now [14:43:16] ^d: manually ran a little piece as a test, looks good [14:43:28] <^d> Yay [14:43:40] yep, here's to getting rid of cruft! [14:44:08] <^d> Removing stuff from CommonSettings makes me happy since it runs on every single request [14:44:33] _joe_ akosiaris I don't like very much the whole hiera regexp in https://gerrit.wikimedia.org/r/191339 perhaps there are better ways [14:45:27] <_joe_> godog: probably using roles [14:45:37] <_joe_> godog: I'll take a look in a few [14:45:40] thanks [14:45:55] <_joe_> yeah that's pretty ugly [14:46:09] <_joe_> also, racks should be facts [14:46:12] <_joe_> not class variables [14:46:30] <_joe_> but maybe I got that wrong someway? 
[14:47:03] (03PS1) 10Rush: phab update for deploy window [puppet] - 10https://gerrit.wikimedia.org/r/191340 [14:47:11] yeah they should, I didn't see anything rack/row related in facter -p [14:50:03] <_joe_> mmmh [14:52:44] (03CR) 10Rush: [C: 032] phab update for deploy window [puppet] - 10https://gerrit.wikimedia.org/r/191340 (owner: 10Rush) [14:52:59] (03PS1) 10Ottomata: Add new user Daisy Chen to researchers group Bug: T89418 [puppet] - 10https://gerrit.wikimedia.org/r/191341 (https://phabricator.wikimedia.org/T89418) [14:54:19] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1046308 (10Ottomata) Also, I think we need approval from a manager for Ashwin. I'm not sure who this would be. Dario? [14:54:47] (03CR) 10QEDK: [C: 031] Add new user Daisy Chen to researchers group Bug: T89418 [puppet] - 10https://gerrit.wikimedia.org/r/191341 (https://phabricator.wikimedia.org/T89418) (owner: 10Ottomata) [14:54:55] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1046310 (10Ottomata) Daisy, we need approval from your supervisor on this ticket. Thanks! [15:00:04] chasemp: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T1500). [15:03:24] (03PS2) 10Filippo Giunchedi: cassandra: set rack/dc/cluster name [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T89657) [15:09:34] is pahbricator down ? [15:09:48] chasemp: ^ _joe_ ? 
[15:09:56] matanya: see update from jouncebot, deployment in progress [15:10:06] oh, sorry [15:10:14] was refreshing a bug [15:10:18] and got 503 [15:10:20] phabricator.wikimedia.org intentionally under maintenance, see https://www.mediawiki.org/wiki/Phabricator/Maintenance [15:10:37] ehm, phabricator is down for me [15:10:37] yo andre__ [15:10:39] would be nice if the what was the page saying [15:10:40] oh :) [15:10:58] *if that was what the page said [15:11:07] (03CR) 10QEDK: [C: 031] Put new user Ashwin Pradeep Paranjape in analytics-users group [puppet] - 10https://gerrit.wikimedia.org/r/191338 (https://phabricator.wikimedia.org/T89264) (owner: 10Ottomata) [15:11:08] yeah. maybe next time. :) [15:11:10] yeah, varnish doesn't really distinc between that [15:11:17] matanya: submit a patch ? [15:11:19] <_joe_> !log shutting down mc1016 for movement to a new row [15:11:24] Logged the message, Master [15:11:26] akosiaris: maybe :) [15:13:21] * marktraceur looks around for SWATters [15:13:23] (03PS2) 10Giuseppe Lavagetto: move mc1016 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191270 [15:13:42] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1016 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191270 (owner: 10Giuseppe Lavagetto) [15:14:11] <^d> marktraceur: sup? [15:15:21] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Seems like this should work. Yes at the moment Rack/row is not exposed. There is a (flawed) approach at https://gerrit.wikimedia.org/r/#/c" [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T89657) (owner: 10Filippo Giunchedi) [15:15:49] marktraceur? 
[15:17:53] Was just wondering who's doing it this fine morning [15:18:18] (03CR) 10Anomie: CX: Do not use internal $wmgParsoidURL (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:18:30] I did nine patches yesterday morning, I think I'll sit this one out [15:19:39] I think it was nine. [15:20:59] Yeah, I was about to say that patch doesn't make much sense [15:21:32] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I'd move things around hiera, but for the rest, I agree with akosiaris" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T89657) (owner: 10Filippo Giunchedi) [15:24:07] (03CR) 10KartikMistry: CX: Do not use internal $wmgParsoidURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:24:57] kart_, how is that not accessible? [15:25:28] !log phabricator updated for T86772 [15:25:32] Logged the message, Master [15:26:16] (03CR) 10Anomie: CX: Do not use internal $wmgParsoidURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:28:50] milimetric: what are you on phab? [15:29:13] chasemp: milimetric [15:29:21] one l [15:29:29] (03CR) 10Alex Monk: CX: Do not use internal $wmgParsoidURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:29:31] I was thoroughly confused :) [15:29:36] yeah, chasemp i believe in brevity [15:29:44] why, is there someone with two LLs?! 
[15:30:03] nah you're safe :) [15:30:08] milimetric: fyi https://phabricator.wikimedia.org/T89646 [15:30:10] k, 'cause that means war [15:30:14] (03CR) 10KartikMistry: CX: Do not use internal $wmgParsoidURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:30:43] chasemp: cool, that's great, I'll tell kevinator who maintains our boards [15:32:55] (03CR) 10Anomie: CX: Do not use internal $wmgParsoidURL (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:38:22] anomie: Krenair well the real issue was as Nikerabbit mentioned in 1st PS [15:38:25] 3Wikimedia-Labs-wikitech-interface, operations: wikitech instances list is blank - https://phabricator.wikimedia.org/T89808#1046454 (10scfc) The first case (no instances showing up on https://wikitech.wikimedia.org/wiki/Special:NovaInstance) happens usually when the OpenStack auth expires, but not the MediaWiki... [15:38:45] 3operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1046466 (10ArielGlenn) 3NEW [15:38:48] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1016 [puppet] - 10https://gerrit.wikimedia.org/r/191288 [15:39:11] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: move and label mc1016 [puppet] - 10https://gerrit.wikimedia.org/r/191288 (owner: 10Giuseppe Lavagetto) [15:39:17] You want to pull articles from production>? [15:39:21] 3ops-eqiad, operations: mc1016 mgmt not working - https://phabricator.wikimedia.org/T82259#1046481 (10Cmjohnson) 5Open>3Resolved Powered the server down. This has been resolved. 
[15:39:22] 3ops-eqiad, operations, Incident-20150205-SiteOutage: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#1046483 (10Cmjohnson) [15:39:23] (03CR) 10Giuseppe Lavagetto: [V: 032] nutcracker: move and label mc1016 [puppet] - 10https://gerrit.wikimedia.org/r/191288 (owner: 10Giuseppe Lavagetto) [15:39:40] (03CR) 10KartikMistry: "More clarity: The issues are that we want to pull articles from production, and the production $wmgParsoidURL currently in use is not acce" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [15:40:49] (03PS3) 10Filippo Giunchedi: cassandra: set rack/dc/cluster name [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T89657) [15:41:09] (03CR) 10Filippo Giunchedi: cassandra: set rack/dc/cluster name (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/191339 (https://phabricator.wikimedia.org/T89657) (owner: 10Filippo Giunchedi) [15:41:20] 3operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1046518 (10ArielGlenn) [15:41:29] 3operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1046521 (10hashar) So I guess we can close this bug now. The repository has been made private while some reports are being crafted for later public release. [15:42:19] (03PS6) 10KartikMistry: CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) [15:42:44] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [15:42:52] blah. commit msg. 
[15:42:54] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail [15:42:54] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: puppet fail [15:42:54] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: puppet fail [15:43:04] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: puppet fail [15:43:04] PROBLEM - puppet last run on es1006 is CRITICAL: CRITICAL: puppet fail [15:43:04] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: puppet fail [15:43:05] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: puppet fail [15:43:14] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: puppet fail [15:43:14] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail [15:43:24] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: puppet fail [15:43:25] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: puppet fail [15:43:25] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: puppet fail [15:43:34] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: puppet fail [15:43:44] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [15:43:44] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: puppet fail [15:43:45] <_joe_> aw that's me [15:43:54] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: puppet fail [15:43:54] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [15:44:00] (03PS1) 10Giuseppe Lavagetto: Revert "nutcracker: move and label mc1016" [puppet] - 10https://gerrit.wikimedia.org/r/191345 [15:44:03] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: puppet fail [15:44:04] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail [15:44:05] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: puppet fail [15:44:12] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Revert "nutcracker: move and label mc1016" [puppet] - 10https://gerrit.wikimedia.org/r/191345 (owner: 10Giuseppe 
Lavagetto) [15:44:13] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: puppet fail [15:44:34] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: puppet fail [15:44:43] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail [15:44:43] PROBLEM - puppet last run on analytics1017 is CRITICAL: CRITICAL: puppet fail [15:44:43] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: puppet fail [15:44:44] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: puppet fail [15:44:55] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: puppet fail [15:44:56] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: puppet fail [15:44:56] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: puppet fail [15:44:56] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: puppet fail [15:44:57] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: puppet fail [15:45:03] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [15:45:04] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: puppet fail [15:45:04] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: puppet fail [15:45:15] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: puppet fail [15:45:22] <_joe_> uff, sorry [15:45:33] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [15:45:34] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: puppet fail [15:45:44] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail [15:45:44] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: puppet fail [15:45:45] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail [15:45:54] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: puppet fail [15:45:54] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: puppet fail [15:45:54] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: puppet fail [15:45:54] PROBLEM - puppet last run 
on mc1003 is CRITICAL: CRITICAL: puppet fail [15:45:55] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [15:46:03] PROBLEM - puppet last run on elastic1012 is CRITICAL: CRITICAL: puppet fail [15:46:14] PROBLEM - puppet last run on elastic1001 is CRITICAL: CRITICAL: puppet fail [15:46:14] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail [15:46:19] 3ops-eqiad, operations: cable up ms1001's second eth interface - https://phabricator.wikimedia.org/T89836#1046535 (10ArielGlenn) 3NEW [15:46:23] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:46:24] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: puppet fail [15:46:34] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: puppet fail [15:46:38] 3ops-eqiad, operations: cable up ms1001's second eth interface - https://phabricator.wikimedia.org/T89836#1046545 (10ArielGlenn) [15:46:43] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: puppet fail [15:46:43] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail [15:46:44] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: puppet fail [15:46:44] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: puppet fail [15:46:44] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: puppet fail [15:46:44] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: puppet fail [15:46:45] (03PS7) 10KartikMistry: CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) [15:47:04] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: puppet fail [15:47:04] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: puppet fail [15:47:04] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [15:47:24] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: puppet fail [15:47:24] PROBLEM - 
puppet last run on mw1226 is CRITICAL: CRITICAL: puppet fail [15:47:55] (03PS1) 10Giuseppe Lavagetto: nutcracker: move mc1016 [puppet] - 10https://gerrit.wikimedia.org/r/191349 [15:48:19] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] nutcracker: move mc1016 [puppet] - 10https://gerrit.wikimedia.org/r/191349 (owner: 10Giuseppe Lavagetto) [15:51:05] manybubbles, marktraceur, ^d, Krenair (since you're here): Who wants to SWAT today? [15:51:10] jouncebot: refresh [15:51:13] I refreshed my knowledge about deployments. [15:51:20] jouncebot: next [15:51:20] In 0 hour(s) and 8 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T1600) [15:51:34] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:51:37] <_joe_> do I have the time for a single sync-file before swat? [15:51:46] _joe_: Probably [15:51:54] <_joe_> it's the change of an IP address for sessions [15:51:56] yes [15:52:07] swat will be simple, we have two labs config changes [15:52:15] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:52:34] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:52:34] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:52:38] (03PS2) 10Giuseppe Lavagetto: move mc1016 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191305 [15:53:02] It looks like gi11es set that up [15:53:08] wrong channel [15:53:13] <^d> We don't even have to sync-file the changes [15:53:34] shouldn't we keep them in sync anyway ^d? 
[15:53:34] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:53:47] <^d> Krenair: Yeah, I usually do for completeness sake :) [15:53:48] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:53:49] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:53:50] (03CR) 10Giuseppe Lavagetto: [C: 032] "mc1016 has moved" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191305 (owner: 10Giuseppe Lavagetto) [15:54:04] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [15:54:13] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:54:41] 3operations: does this do project association? - https://phabricator.wikimedia.org/T89837#1046562 (10chasemp) 3NEW [15:55:02] ottomata: It would be nice to get a review/merge of https://gerrit.wikimedia.org/r/#/c/191251 today. twentyafterfour merged the associated scap patch. Not deployed yet, but I will do that after the swat is done [15:55:04] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [15:55:04] ^d: you doing it then? [15:55:14] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:55:15] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:55:20] 3operations: does this do project association? 
- https://phabricator.wikimedia.org/T89837#1046568 (10chasemp) 5Open>3Invalid a:3chasemp [15:55:23] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [15:55:35] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [15:55:44] <^d> Krenair: Isn't one of them still wrong and should use $wmgParsoidUrl? [15:55:54] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:56:08] Apparently they want to use production articles. [15:56:09] !log oblivian Synchronized wmf-config/session.php: mc1016 IP change (duration: 00m 07s) [15:56:11] ^d: nope. We want to use Production public parsoid URL [15:56:15] Logged the message, Master [15:56:27] bd808: will look at in an a couple of mins... [15:56:34] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:56:40] Well, if you're sure your code can handle foreign articles like that... [15:56:44] Krenair: Number of Articles in Beta is not much good for testing :) [15:56:47] ottomata: cool beans. thanks [15:56:54] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:57:14] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:57:14] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:57:14] <^d> So the comments on PS5 aren't applicable anymore? [15:57:17] and it knows where to send any API requests etc. to [15:57:34] Krenair: yes. It can handle. 
[15:57:34] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:58:45] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:58:45] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:58:54] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:00:05] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T1600). [16:00:14] RECOVERY - puppet last run on es1006 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:00:34] (03CR) 10Chad: [C: 032] Beta: Update $wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191264 (owner: 10KartikMistry) [16:00:44] (03Merged) 10jenkins-bot: Beta: Update $wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191264 (owner: 10KartikMistry) [16:01:05] (03CR) 10Chad: [C: 032] CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [16:01:07] (03CR) 10jenkins-bot: [V: 04-1] CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [16:01:13] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:01:14] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:01:14] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:01:16] 
<^d> Bah [16:01:19] <^d> Manual rebase needed [16:01:32] bah [16:01:33] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:01:44] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:01:44] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:01:54] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:01:54] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:01:55] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:02:04] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:02:05] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:02:13] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:02:24] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:02:25] (03PS8) 10Chad: CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [16:02:34] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:02:43] (03CR) 10Chad: [C: 032] CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [16:02:52] (03Merged) 10jenkins-bot: CX: Do not use internal $wmgParsoidURL [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [16:02:54] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:03:04] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:03:04] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:03:15] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:03:16] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:03:22] !log demon Synchronized wmf-config/CommonSettings-labs.php: for completeness, no-op (duration: 00m 07s) [16:03:23] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:03:23] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:03:24] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:03:28] Logged the message, Master [16:03:28] Thanks ^d [16:03:31] <^d> yw [16:03:35] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:03:39] <^d> I guess wait a minute or two and see how beta's doing [16:03:44] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:03:44] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:03:47] yes [16:03:54] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:04:04] RECOVERY - puppet last run on lead is 
OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:04:11] 3operations: Varnish GeoIP is broken for HTTPS+IPv6 traffic - https://phabricator.wikimedia.org/T89688#1046590 (10Yurik) X-Analytics=ip=..... , and as discussed before, we should do T89838 to make proxy IP management easier [16:04:14] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:04:14] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:04:14] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:04:34] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:04:47] ^d: verified. [16:04:52] <^d> \o/ [16:04:54] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:05:04] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:05:13] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:05:14] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:05:25] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:06:03] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:06:21] bd808: i have not merged anything like this before, and I don't really know the context [16:08:15] ottomata: ok. This is the core script for the nightly l10nupdate process that runs on tin to merge new l10n data into the prod l10n cache. It has been broken since December and this patch is trying to fix that. 
[16:08:29] so the worst thing that happens is it remains broken [16:08:42] and the best is that the syncs actually start working [16:08:42] 3Ops-Access-Requests, Services, operations, Citoid: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1046594 (10Ottomata) @Jdforrester-WMF, I think we need your approval on this. [16:09:41] 3ops-eqiad: new eqiad task - https://phabricator.wikimedia.org/T89839#1046599 (10chasemp) 3NEW [16:09:54] what's up with the auth sock vs key thing? will folks already ahve keys distributed or something? [16:10:15] the l10nupdate user has a key that is already distributed [16:11:12] when we introduced the shared auth sock for mw deployers it broke the l10nupdate job because that utility user can't access the shared socket [16:11:14] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1046614 (10dchen) hi guys, Jared responded to my original ops-request@ email with his approval, and it sent me this email below. I am restricted from viewing it on phab for some reason, but perhaps you guys c... 
[16:11:36] the new flag to scap lets this l10nupdate job turn off the automatic use of the shared auth sock [16:11:54] scap, sync-dir, sync-file all get the new feature [16:12:11] hm ok [16:12:18] (03PS2) 10Ottomata: l10nupdate: use --no-shared-authsock with sync-dir [puppet] - 10https://gerrit.wikimedia.org/r/191251 (https://phabricator.wikimedia.org/T76061) (owner: 10BryanDavis) [16:12:32] (03CR) 10Ottomata: [C: 032 V: 032] l10nupdate: use --no-shared-authsock with sync-dir [puppet] - 10https://gerrit.wikimedia.org/r/191251 (https://phabricator.wikimedia.org/T76061) (owner: 10BryanDavis) [16:12:34] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1015 [puppet] - 10https://gerrit.wikimedia.org/r/191289 [16:12:36] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1013 [puppet] - 10https://gerrit.wikimedia.org/r/191291 [16:12:37] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1014 [puppet] - 10https://gerrit.wikimedia.org/r/191290 [16:12:40] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1008 [puppet] - 10https://gerrit.wikimedia.org/r/191293 [16:12:42] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1007 [puppet] - 10https://gerrit.wikimedia.org/r/191292 [16:12:43] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1010 [puppet] - 10https://gerrit.wikimedia.org/r/191295 [16:12:46] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1009 [puppet] - 10https://gerrit.wikimedia.org/r/191294 [16:12:48] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1011 [puppet] - 10https://gerrit.wikimedia.org/r/191296 [16:12:50] (03PS2) 10Giuseppe Lavagetto: nutcracker: move and label mc1012 [puppet] - 10https://gerrit.wikimedia.org/r/191297 [16:13:16] 3ops-eqiad, operations: new eqiad task - https://phabricator.wikimedia.org/T89839#1046617 (10chasemp) 5Open>3Invalid a:3chasemp [16:13:37] 3ops-eqiad, operations: should get both ops-eqiad and operations - https://phabricator.wikimedia.org/T89840#1046621 
(10chasemp) 3NEW [16:14:09] 3ops-eqiad, operations: should get both ops-eqiad and operations - https://phabricator.wikimedia.org/T89840#1046626 (10chasemp) 5Open>3Invalid a:3chasemp [16:15:20] (03CR) 10Chad: [C: 032] mediawikiwiki: Allow sysop to add and remove themself from translationadmin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: 10Florianschmidtwelzow) [16:15:28] (03Merged) 10jenkins-bot: mediawikiwiki: Allow sysop to add and remove themself from translationadmin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187183 (https://phabricator.wikimedia.org/T87797) (owner: 10Florianschmidtwelzow) [16:15:38] (03PS2) 10Giuseppe Lavagetto: move mc1015 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191271 [16:16:01] !log demon Synchronized wmf-config/InitialiseSettings.php: translationadmin for sysops on mw.org (duration: 00m 08s) [16:16:07] Logged the message, Master [16:16:50] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1015 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191271 (owner: 10Giuseppe Lavagetto) [16:17:50] (03PS1) 10GWicke: Add a statsd_port parameter to the restbase class [puppet] - 10https://gerrit.wikimedia.org/r/191350 [16:18:28] (03PS3) 10Giuseppe Lavagetto: nutcracker: move and label mc1015 [puppet] - 10https://gerrit.wikimedia.org/r/191289 [16:19:07] thanks ottomata. I'll get the associated scap change deployed soon [16:19:19] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: move and label mc1015 [puppet] - 10https://gerrit.wikimedia.org/r/191289 (owner: 10Giuseppe Lavagetto) [16:20:39] sho thaqng [16:24:11] 3Ops-Access-Requests, operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1046694 (10Ottomata) Hiya, So, this ticket is essentially the same as T88988. We'll have to wait until our ops meeting on Monday before we can grant this. [16:29:23] <^d> godog: Where are we on the elastic rolling restart? 
[16:30:33] ^d: we are on the 'done' tile [16:30:59] <^d> Oh sweet [16:31:15] 3Ops-Access-Requests, operations, RESTBase: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1046747 (10Ottomata) > Planning to use root to disable Puppet seems like a bad idea on the surface. Is this something that anyone in Ops that has +2 access to operations/puppet ever does?... [16:31:40] ^d: ah, you weren't CC'ed to T86602 that explains it [16:31:50] * ^d reads [16:34:46] 3operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1046808 (10Ottomata) @bblack, I'm not entirely sure of what's involved here. I see you've worked on the interface module before. Could you help with this? Or, who should I ask? [16:35:19] 3ops-eqiad, operations: db1054 MCE errors logged for CPU temperature - https://phabricator.wikimedia.org/T89801#1046811 (10Ottomata) p:5Triage>3Normal [16:37:09] 3ops-esams, operations: Upgrade cp3011-3014 with 10G cards - https://phabricator.wikimedia.org/T88684#1046818 (10Ottomata) p:5Triage>3Normal a:3mark [16:37:25] 3ops-esams, operations: Upgrade cp3011-3014 with 10G cards - https://phabricator.wikimedia.org/T88684#1017923 (10Ottomata) Mark, I assigned this to you, as you are likely the one who would go to esams and actually do this :) [16:38:01] <^d> godog: I wonder if we could come up with some script (using salt maybe?) that could make it completely automated? 
[16:38:14] <^d> Or at least very hands-off [16:38:42] ^d: we most certainly can, haven't got around it but basically yeah a "serial salt" call would do it [16:38:59] <^d> Yeah, that's my thought [16:39:09] <^d> es-tool does the hard work, so salt could just call that probably [16:39:43] yep, one thing to think about is how to safely stop salt so that it simply doesn't go to the next host and call es-tool [16:39:48] and the next, and the next [16:39:59] 3operations: Graph data missing for "MediaWiki: Total Backend Latency" - https://phabricator.wikimedia.org/T85316#1046826 (10Ottomata) [16:40:11] 3operations, Beta-Cluster: Make www-data the web-serving user (is currently apache) - https://phabricator.wikimedia.org/T78076#835083 (10Krenair) Some things like mwscript still use apache, and this is now broken on deployment-prep. Please see T89802 [16:41:08] <^d> I need to figure out the `No handlers could be found for logger "elasticsearch"` error [16:42:03] <_joe_> are we done with SWAT? [16:42:13] <_joe_> I have another sync-file to perform [16:42:26] <^d> Yeah, it's done [16:42:28] ^d: allegedly fixed in later versions of pyes, https://github.com/elasticsearch/elasticsearch-py/issues/67 [16:42:42] <^d> Ah ok! 
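The "serial salt" idea discussed above, including godog's concern about stopping safely so the run does not simply continue to the next host, can be sketched as follows. This is a minimal illustration and not the actual es-tool or salt integration: `rolling_restart`, `restart_host` and the stop-file path are hypothetical names, and `restart_host` stands in for whatever would really restart one node and wait for the cluster to recover.

```python
import os
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart_host: Callable[[str], bool],
    stop_file: str = "/tmp/stop-rolling-restart",  # hypothetical path
) -> List[str]:
    """Restart hosts one at a time and return the hosts that were restarted.

    restart_host is a hypothetical callable that restarts a single node,
    waits for the cluster to recover, and returns True on success.
    """
    done: List[str] = []
    for host in hosts:
        # Check the stop file before every host, so an operator can halt the
        # run between nodes without killing a restart that is in progress.
        if os.path.exists(stop_file):
            break
        if not restart_host(host):
            # Never move on to the next node after a failed restart.
            break
        done.append(host)
    return done


if __name__ == "__main__":
    # Fake restart function for illustration: the third host "fails".
    fake = lambda host: host != "elastic1003"
    hosts = ["elastic1001", "elastic1002", "elastic1003", "elastic1004"]
    print(rolling_restart(hosts, fake, stop_file="/nonexistent/stop"))
    # prints ['elastic1001', 'elastic1002']
```

The stop check sits at the top of the loop on purpose: touching the stop file takes effect at the next host boundary, which is probably the only point where it is safe to interrupt a rolling restart.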
[16:42:43] <^d> Good [16:42:45] in which I've made a fool of myself too [16:42:53] (03PS2) 10Giuseppe Lavagetto: move mc1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191306 [16:43:10] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191306 (owner: 10Giuseppe Lavagetto) [16:44:10] !log oblivian Synchronized wmf-config/session.php: mc1015 IP change (duration: 00m 05s) [16:44:14] Logged the message, Master [16:44:16] 3operations: unattended elasticsearch restarts - https://phabricator.wikimedia.org/T89845#1046836 (10fgiunchedi) 3NEW a:3fgiunchedi [16:44:46] whee latest phabricator update "projects" completion does the right thing, thanks chasemp [16:45:10] 3RESTBase, operations: Investigate apparent graphite request rate under-reporting - https://phabricator.wikimedia.org/T89846#1046847 (10GWicke) 3NEW [16:45:22] (03PS1) 10Chad: es-tool: Also show unassigned shards during restart [puppet] - 10https://gerrit.wikimedia.org/r/191356 [16:45:42] <^d> godog: Trivial ^ [16:45:44] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1046854 (10GWicke) [16:45:56] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1046847 (10GWicke) p:5Triage>3Normal [16:46:20] (03CR) 10Filippo Giunchedi: [C: 031] es-tool: Also show unassigned shards during restart [puppet] - 10https://gerrit.wikimedia.org/r/191356 (owner: 10Chad) [16:48:26] ^d: indeedly! [16:48:47] (03PS2) 10Giuseppe Lavagetto: move mc1014 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191272 [16:49:27] Hey folks. Is it safe for me to update scap now? 
We have a patch that will hopefully fix l10nupdate [16:49:34] jouncebot: next [16:49:34] In 2 hour(s) and 10 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T1900) [16:49:44] (03PS1) 10Chad: es-tool: support IPv6 addresses in (un)ban-node [puppet] - 10https://gerrit.wikimedia.org/r/191357 [16:50:35] actually... twentyafterfour or ^d you should probably do the trebuchet update for scap instead of me [16:50:37] <_joe_> !log moving mc1014 to a new row [16:50:40] Logged the message, Master [16:50:52] <^d> I should probably know how [16:51:11] <^d> git-deply from /srv/scap or something? [16:51:33] ssh tin; cd /src/deployment/scap/scap; git deploy start; git fetch; git deploy sync [16:51:58] oops, add a git rebase origin/master before the sync [16:52:26] then run a sync-file at least to smoke test it [16:53:01] There are a few dead hosts in the trebuchet minions list too [16:53:20] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1014 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191272 (owner: 10Giuseppe Lavagetto) [16:53:28] <^d> Anything in gerrit I need to merge, or is it just pulling in already merged stuff? [16:53:41] so check the (d)etailed report from trebuchet when it looks like all but 6-7 hosts are done fetching [16:53:53] all merged and ready to update [16:54:51] bonus points for documenting the process on a wikitech page specific to scap [16:54:59] ^^ ++ [16:55:22] https://wikitech.wikimedia.org/wiki/Wikimedia_binaries#scap may be all that is on wikitech at the momement :( [16:56:02] An up-to-date version of https://www.mediawiki.org/wiki/Deployment_tooling/Notes/What_does_scap_do on wikitech would probably be a nice thing to have [16:56:41] <_joe_> bd808: I agree. We should ask whoever modified scap to change that [16:56:57] !log demon Synchronized README: testing scap update (duration: 00m 07s) [16:57:01] Logged the message, Master [16:57:02] if you find that guy let me know. 
he's a slacker [16:57:19] <^d> bd808: Fine other than two apaches bitching about "no import cli" from earlier [16:57:34] <^d> mw1154 & 58 [16:57:36] that is a sign of the corrupt git clone [16:57:57] <^d> hmm [16:57:59] I manually fixed that in beta just now for 3 hsots [16:58:32] one had a dangling object in the git index, the other two were completely broken clones [16:59:09] <^d> What directory on mw*? /srv/deployment/scap/scap/ isn't a git directory at all [16:59:31] that's the problem, it should be [16:59:35] did we ever put http://git-repair.branchable.com/ into production? [17:00:01] * greg-g goes into 2 hours of 1:1s [17:00:02] <^d> bd808: I can't fix then, no sudo on mw* [17:00:34] fixing will take a root -- cd /src/deployment/scap; mv scap scap-broken; sudo salt-call deploy.fetch 'scap/scap'; sudo salt-call deploy.checkout 'scap/scap' [17:00:48] <^d> yeah [17:01:01] commands from https://wikitech.wikimedia.org/wiki/Trebuchet#Troubleshooting_the_deployment_from_multiple_locations [17:03:24] PROBLEM - gdash.wikimedia.org on graphite2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.087 second response time [17:03:34] PROBLEM - graphite.wikimedia.org on graphite2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.090 second response time [17:03:44] PROBLEM - puppet last run on graphite2001 is CRITICAL: CRITICAL: Puppet has 1 failures [17:04:04] <^d> bd808: So other than that, your changes seem fine [17:04:53] PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CRITICAL: Not all configured uWSGI apps are running. [17:06:08] 3Ops-Access-Requests, operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1046921 (10mforns) Hey, no problem. Thanks! [17:06:29] graphite2001 is me forgetting downtime [17:06:54] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? 
- https://phabricator.wikimedia.org/T89846#1046930 (10fgiunchedi) likely UDP packet drop and txstatsd maxed out ``` graphite1001:~$ while sleep 1 ; do grep Udp: /proc/net/snmp ; done Udp: InDatagrams NoP... [17:08:50] (03CR) 10Filippo Giunchedi: [C: 031] Add a statsd_port parameter to the restbase class [puppet] - 10https://gerrit.wikimedia.org/r/191350 (owner: 10GWicke) [17:10:59] (03PS3) 10Giuseppe Lavagetto: nutcracker: move and label mc1014 [puppet] - 10https://gerrit.wikimedia.org/r/191290 [17:11:19] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: move and label mc1014 [puppet] - 10https://gerrit.wikimedia.org/r/191290 (owner: 10Giuseppe Lavagetto) [17:11:33] (03CR) 10Giuseppe Lavagetto: [V: 032] nutcracker: move and label mc1014 [puppet] - 10https://gerrit.wikimedia.org/r/191290 (owner: 10Giuseppe Lavagetto) [17:14:44] (03PS2) 10Legoktm: Enable GlobalUserPage extension on all public, CentralAuth wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190691 (https://phabricator.wikimedia.org/T72576) [17:14:46] (03PS1) 10Legoktm: Add GlobalUserPageWikis hook handler for test* wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191360 [17:15:13] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1046954 (10Cmjohnson) Friday seems wide open. Would 11am be okay with everyone? 
Chris [17:16:34] (03PS2) 10Giuseppe Lavagetto: move mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191307 [17:17:04] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:14] PROBLEM - puppet last run on analytics1025 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:14] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:14] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:14] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191307 (owner: 10Giuseppe Lavagetto) [17:17:15] PROBLEM - puppet last run on analytics1018 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:35] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:44] PROBLEM - puppet last run on analytics1012 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:53] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:54] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:17:54] PROBLEM - puppet last run on analytics1026 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:18:04] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet last ran 2 days ago [17:18:23] (03CR) 10Glaisher: "Yay! 
:D" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190691 (https://phabricator.wikimedia.org/T72576) (owner: 10Legoktm) [17:18:26] !log oblivian Synchronized wmf-config/session.php: mc1014 IP change (duration: 00m 07s) [17:18:30] Logged the message, Master [17:18:45] !log oblivian Synchronized wmf-config/session.php: mc1014 IP change (duration: 00m 07s) [17:19:14] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [17:19:46] <_joe_> !log mw1158 and mw1154 report broken python imports during scap [17:19:48] Logged the message, Master [17:21:45] 3operations, MediaWiki-Core-Team: Review Graphite scaling options - https://phabricator.wikimedia.org/T1018#1046987 (10fgiunchedi) 5Open>3Invalid no activity and ssd machines are provisioned, resolving [17:22:33] <_joe_> !log shutting down mc1014, moving to a different rack [17:22:36] Logged the message, Master [17:23:15] 3ops-codfw, operations: codw pfw* serial connections problem - https://phabricator.wikimedia.org/T84737#1046993 (10RobH) Serials for ease of reference: pfw2001 = AJ5112AA0049 pfw2002 = AJ5112AA0042 [17:24:05] RECOVERY - puppet last run on analytics1025 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:24:06] (03PS2) 10Giuseppe Lavagetto: move mc1013 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191273 [17:24:46] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1013 to a new rack/row [dns] - 10https://gerrit.wikimedia.org/r/191273 (owner: 10Giuseppe Lavagetto) [17:25:07] 3operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1047006 (10BBlack) Well the puppet part we should be able to copy from existing usage of interface::aggregate. The switch part I can figure out from looking at past examples in the current configs as well. Let's just pick a ti... 
[17:25:25] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:26:16] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:28:56] RECOVERY - puppet last run on analytics1026 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:29:16] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:29:56] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:30:02] godog: thanks for looking into the statsd issue! [17:31:35] gwicke: np, I feared that was the case unfortunately [17:31:49] what should we do about it? [17:31:56] (03PS1) 10Andrew Bogott: Revert "Reduce ttl to 5M for wikitech" [dns] - 10https://gerrit.wikimedia.org/r/191362 [17:32:13] I saw that the regular statsd server supports some nice perf features like batch reporting [17:32:46] https://github.com/etsy/statsd/ [17:32:53] short term, reduce load, long term yeah think about another plan, I have a couple of tickets to file about that shortly [17:33:16] RECOVERY - puppet last run on analytics1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:33:41] the other thing I could imagine helping would be running local statsd daemons & then let those report the summary upstream [17:34:36] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:34:41] yeah local statsd is the way to go I think and then pull the metrics from those [17:34:56] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:35:06] (03CR) 10Andrew Bogott: [C: 032] Revert "Reduce ttl to 5M for wikitech" [dns] - 10https://gerrit.wikimedia.org/r/191362 
(owner: 10Andrew Bogott) [17:35:16] RECOVERY - puppet last run on analytics1018 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:37:55] RECOVERY - uWSGI web apps on graphite2001 is OK: OK: All defined uWSGI apps are runnning. [17:40:27] godog: upgrading to regular statsd could also help perf, as txstatsd is implemented in python [17:41:39] https://wikitech.wikimedia.org/wiki/Graphite/Scaling#performance:_change_statsd_daemon [17:41:43] per host statsd seems weird to me, the way I've done it previously is a statsd instance per...team sort of, broken down by source more or less. Then teams use a registry like etcd to lookup where they need to send things at the time of. Doing millions and millions of metrics we never had much issue [17:41:53] 3Ops-Access-Requests, operations, RESTBase: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1047097 (10faidon) Alright, thanks to everyone who chimed in, I think I have a pretty good understanding where everyone's coming from. I'm okay with moving this forward. I'm also okay wit... [17:43:25] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047105 (10coren) Works for me. [17:45:02] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047113 (10Andrew) That's fine, although can you clarify which 11AM? [17:45:05] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047114 (10faidon) I'd prefer it if we didn't do something like that on a Friday. Also, a Labs outage should be properly announced in advance both to volunteers and staff members and preferrably coordina... [17:45:35] 3ops-eqiad, operations: cable up ms1001's second eth interface - https://phabricator.wikimedia.org/T89836#1047128 (10Cmjohnson) I added the second interface connection and labeled the port on the switch. 
ge-1/0/4 [17:46:25] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047130 (10coren) Ah, good point - I forgot that "at the end of this week" might have been agreeable ealy, but is now more problematic as we near it. [17:47:27] 3operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1047135 (10akosiaris) We could. Then again I don't see us having reached a consensus about how this bug got resolved. @Qchris was kind enough to provide https://ge... [17:49:49] <^d> qchris: per-branch permissions are totally busted in gerrit, right? [17:49:54] <^d> it's not just drafts, right? [17:50:06] <^d> (read permissions, that is) [17:50:23] For read permissions: Yes. [17:50:26] right [17:50:42] <^d> I thought so. Man I hate this bs. [17:51:08] for git in general, trying to control read access via branch names is a non-starter, since a branch name is just a pointer at the tip commit of the branch. [17:51:55] !log fixing scap on mw1158 and mw1154 will take a root to fix bad trebuchet git clones -- cd /src/deployment/scap; sudo mv scap scap-broken; sudo salt-call deploy.fetch 'scap/scap'; sudo salt-call deploy.checkout 'scap/scap' [17:51:55] <_joe_> the good way to do that is to create a private repo where you do your work, and push to a public "release" repo your artifact [17:52:01] Logged the message, Master [17:52:06] <_joe_> and use git-deploy [17:52:25] <_joe_> bd808: oh I did that yesterday on another host in the list you gave me [17:52:26] 3Ops-Access-Requests, operations, RESTBase: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1047142 (10GWicke) Thanks @faidon. Your points are all pretty uncontroversial and reflect established practice, so no problem there. 
[17:52:27] <^d> bblack: You can map sha1s to tags/branches that use them, so you can make it work [17:52:38] <_joe_> but a puppet run should be enough [17:52:48] <_joe_> bd808: I'll do that [17:52:52] <^d> (but in general it's crazy and a waste of time imho) [17:52:59] _joe_: :( not sure what is breaking the clones sometimes. I had to fix 2 in beta today as well [17:53:21] gwicke: I know, I wrote that page :) [17:53:51] godog: ah, nm then ;) [17:55:13] chasemp: my understanding that the main challenge with layered statsds is that you need to be careful about syncing the aggregation periods [17:55:31] *is [17:56:13] <_joe_> !log fixed scap on mw1154, moving /srv/deployment/scap away made puppet perform the redeploy [17:56:17] Logged the message, Master [17:56:25] 3operations: set up switch port for second ethernet interface for ms1001 - https://phabricator.wikimedia.org/T89833#1047151 (10Aklapper) [17:56:57] sure you are flooding every minute or so (for us) and if you send it all at once from all sources that can be a bad thing, but it's not more or less deterministic than doing it per host [17:57:13] and it's all about load which we have a poor idea of now [17:57:47] I'm more concerned about the effect the periods have on the values that are actually reported, especially for gauges [17:58:08] <_joe_> !log fixed scap on mw1158, moving /srv/deployment/scap away made puppet perform the redeploy [17:58:11] Logged the message, Master [17:58:42] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1047158 (10Aklapper) ...or @Jaredzimmerman-WMF could fix the access restrictions to that Conpherence thread (cannot access it either). 
[17:58:51] gwicke: not sure what you mean, a gauge is always "last value" [17:58:51] (03PS3) 10Giuseppe Lavagetto: nutcracker: move and label mc1013 [puppet] - 10https://gerrit.wikimedia.org/r/191291 [17:58:57] chasemp: actually I'm also wondering if we could use lvs to load balance multiple statsd daemons [17:59:26] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: move and label mc1013 [puppet] - 10https://gerrit.wikimedia.org/r/191291 (owner: 10Giuseppe Lavagetto) [17:59:31] 3operations: set up switch port for second ethernet interface for ms1001 - https://phabricator.wikimedia.org/T89833#1047161 (10Cmjohnson) [17:59:32] 3ops-eqiad, operations: cable up ms1001's second eth interface - https://phabricator.wikimedia.org/T89836#1047159 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson [18:00:16] chasemp, gwicke: where to put the statsd aggregators really depends on how your metrics are structured. If you keep metrics per-host then a per-host statsd makes sense. If you keep per-product then it does not. [18:00:38] but you could then easily have a statsd per product [18:00:42] if needed [18:00:47] I'm mostly concerned about the global ones [18:01:08] bd808: sort of true but not entirely the way I've done it in the past [18:02:01] chasemp: TMTOWTDI :) [18:02:21] what about my mother? [18:02:54] <_joe_> PERL ALERT [18:03:26] to get accurate values with layering it seems to be important to flush the upper layer significantly more frequently than the primary [18:03:53] gwicke: what layers are you talking about exactly I think I'm not understanding your mental model [18:04:06] do you mean to layer statsd services?
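[annotation] The layering concern raised at 18:03:26 can be illustrated with a small Python sketch. It is not statsd or txstatsd code: the function names, the 10-second lower-layer interval, and the sample values are all assumptions made up for illustration. It only models the "last value" gauge behaviour mentioned at 17:58:51 and shows what an upper layer keeps when it flushes less often, or more often, than the layer feeding it.

```python
def bucket(events, period, end):
    """Group (timestamp, value) pairs into consecutive flush windows."""
    windows = [[] for _ in range(end // period)]
    for ts, value in events:
        index = ts // period
        if index < len(windows):
            windows[index].append(value)
    return windows


def flush_gauge(windows):
    """Emit one gauge value per non-empty window: the last sample seen."""
    return [values[-1] for values in windows if values]


# Hypothetical lower layer: one gauge report every 10 seconds.
lower_reports = [(9, 5), (19, 7), (29, 3), (39, 8), (49, 6), (59, 4)]

# An upper layer flushing every 30 s keeps only the last report per window.
slow_upper = flush_gauge(bucket(lower_reports, period=30, end=60))

# An upper layer flushing every 5 s forwards every report it received.
fast_upper = flush_gauge(bucket(lower_reports, period=5, end=60))

print(slow_upper)
print(fast_upper)
```

The sketch skips empty windows; whether a real daemon repeats the previous gauge value in such a window depends on its configuration, so that part is deliberately left out.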
[18:04:09] at $DAYJOB-1 (where we had far fewer hosts) our metrics were {company}.{environment}.{datacenter}.{host}.{product}.{...} and we aggregated on each host with a statsd-like thing [18:04:20] one statsd aggregating stats per host / product / whatever, and sending the aggregate to a master statsd [18:04:35] that's just going to cause unnecessary pain [18:04:41] and I don't see the benefit? [18:04:58] so you mean a single layer, each directly reporting to graphite? [18:05:01] bd808: that's fair, we do poor normalization here for our metric paths from what I've seen [18:05:29] gwicke: a single layer made up of a statsd service more or less for each "customer" service...you could call it SOA [18:05:31] <_joe_> chasemp: we need to decide SOON on how to normalize [18:05:34] if you were so inclined :D [18:05:38] that was a joke [18:05:55] and yes they flush to graphite directly or via some conduit that has queuing if desired [18:06:01] chasemp: makes sense to me [18:06:06] but stacking statsd services is just weird and will be bad [18:06:13] *nod* I invented the whole deployment there and knew I wanted to be able to compare and trend things on those levels across all of the things [18:06:13] <_joe_> still, I do think it's a flawed model for metrics, having a hierarchy [18:06:17] agreed, which is why I was hesitant [18:07:09] so the way I've done it with a statsd service per customer we prepended a sane path at aggregation time [18:07:15] (03PS1) 10Tim Landscheidt: Tools: Install byobu [puppet] - 10https://gerrit.wikimedia.org/r/191368 (https://phabricator.wikimedia.org/T88989) [18:07:20] I'm bullish on the path/normalization btw, square peg in a round hole [18:07:22] so if you are sending from foo you do a lookup in the service registry for your statsd instance [18:07:35] and then you send it off and we handle the where in the top level via mandate [18:07:47] no one in foo service can accidentally shim a metric under bar etc [18:08:13] we did this as people stink at 
organizing their own stuff and it was the only way to make global standards [18:08:19] plus you set it in one place [18:08:21] in the service-runner lib we just extracted from restbase we also enforce service name prefixing [18:09:12] then actual "server" metrics have their own statsd service which understands where they should land, basically you lookup via key in a registry where you should send your stats and the top level layout is automagical to you as a basic consumer [18:09:18] (03PS2) 10Giuseppe Lavagetto: move mc1013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191308 [18:09:25] it's not easy to know where to put things honestly [18:09:54] <_joe_> godog: ok but if we stick to graphite, we need to choose it anyways [18:10:21] (03CR) 10Giuseppe Lavagetto: [C: 032] move mc1013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191308 (owner: 10Giuseppe Lavagetto) [18:10:25] (03Merged) 10jenkins-bot: move mc1013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191308 (owner: 10Giuseppe Lavagetto) [18:11:01] gwicke: understood but it would be better if that was transparent to you at a very top level and the abstraction was kept consistent by a service guideline for statsd / graphite itself [18:11:26] !log oblivian Synchronized wmf-config/session.php: mc1013 IP change (duration: 00m 07s) [18:11:26] chasemp: we enforce that in code [18:11:29] having every team decide what the top level metric path will be in isolation is not a scalable or sane solution [18:11:30] Logged the message, Master [18:12:02] the statsd client passed to each service unconditionally prefixes the name supplied by the puppetized config for the service [18:12:06] gwicke: sure I understand but not centralized code that is consumable across teams easily and consistently? 
My point is just, great idea but let's do it at another layer and do it for everyone [18:12:29] chasemp: it's a lib intended to be shared across services [18:12:30] in that case sounds good let's do that everywhere [18:12:41] https://github.com/wikimedia/service-runner [18:13:00] node services, that is [18:13:04] when you say services you mean nodejs [18:13:05] ah yes [18:13:11] I mean _all_ of everything [18:13:28] otherwise we solve this same problem in every context differently [18:13:30] yeah, that's slightly harder ;) [18:13:36] it's not really tho [18:13:55] the human element is harder than prepending stuff and saying go here to find your metrics [18:14:42] !log oblivian Synchronized wmf-config/session.php: mc1013 IP change (duration: 00m 05s) [18:14:45] Logged the message, Master [18:16:08] chasemp: agreed, but we can help the human side by making doing the right thing really easy [18:16:29] no argument [18:17:25] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 1 failures [18:22:01] (03CR) 10GWicke: [C: 031] create admin group restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/190500 (https://phabricator.wikimedia.org/T89366) (owner: 10Dzahn) [18:27:55] 3operations: scale statsd reporting/aggregation - https://phabricator.wikimedia.org/T89857#1047220 (10fgiunchedi) 3NEW a:3fgiunchedi [18:28:12] gwicke chasemp bd808 _joe_ ^ have at it :) [18:30:09] (03CR) 10Alexandros Kosiaris: [C: 032] Add IOPS to diskstat.py gmond plugin [puppet] - 10https://gerrit.wikimedia.org/r/191090 (owner: 10Alexandros Kosiaris) [18:30:22] godog: btw, is txstatsd only running a single thread / process? [18:31:20] 3Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1047236 (10Jalexander) Aye, sorry for the bad wording on my part, the Hadoop cluster is in fact what I'm looking for. Unfortunately when my use case comes up its for legal pro... 
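[annotation] The idea described at 18:12:02, a client that unconditionally prefixes every metric name with the service name, can be sketched in Python as below. This is not the service-runner code (that library targets node services); `PrefixedStatsClient`, `format_counter` and the example service name are hypothetical, and the only thing assumed about the wire format is the plain-text statsd counter line `name:value|c`. The sketch builds the payload instead of sending it, so it runs without a network.

```python
class PrefixedStatsClient:
    """Hypothetical client that forces every metric under one service prefix."""

    def __init__(self, service_name):
        if not service_name or "." in service_name:
            raise ValueError("service name must be a single non-empty path segment")
        self._prefix = service_name

    def _full_name(self, metric):
        # The caller cannot opt out: the prefix is always prepended.
        return f"{self._prefix}.{metric.lstrip('.')}"

    def format_counter(self, metric, count=1):
        """Return the statsd line for a counter increment."""
        return f"{self._full_name(metric)}:{count}|c"


client = PrefixedStatsClient("restbase")

# A normal metric lands under the service's own namespace.
line = client.format_counter("requests.get")
print(line)

# Trying to write under another service's name still stays inside the prefix.
sneaky = client.format_counter("bar.requests")
print(sneaky)
```

Because the prefix comes from configuration rather than from the caller, a service cannot place a metric under another service's top-level path, which is the property discussed at 18:07:47.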
[18:31:58] 3operations: diamond network collector loss not accurate - https://phabricator.wikimedia.org/T89858#1047237 (10fgiunchedi) 3NEW a:3fgiunchedi [18:33:31] gwicke: a single process, perhaps one/two threads [18:36:55] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:38:14] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1047269 (10fgiunchedi) so, short term plan is to reduce txstatsd load by stopping mw metrics into txstatsd (cc @ori) and move them to another statsd port to reli... [18:38:18] gwicke: ^ [18:38:29] (03PS1) 10Ori.livneh: Change $wgUDPProfilerPort to 8135. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191372 [18:38:30] godog: ^ [18:39:18] (03CR) 10Filippo Giunchedi: [C: 031] Change $wgUDPProfilerPort to 8135. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191372 (owner: 10Ori.livneh) [18:39:22] lolz, +1 [18:39:26] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures [18:40:03] (03CR) 10Ori.livneh: [C: 032] "See " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191372 (owner: 10Ori.livneh) [18:40:09] (03Merged) 10jenkins-bot: Change $wgUDPProfilerPort to 8135. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191372 (owner: 10Ori.livneh) [18:40:36] !log ori Synchronized wmf-config/CommonSettings.php: Id215ff962: Change $wgUDPProfilerPort to 8135. (duration: 00m 05s) [18:40:40] Logged the message, Master [18:41:08] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1047273 (10ori) >>! In T89846#1047269, @fgiunchedi wrote: > so, short term plan is to reduce txstatsd load by stopping mw metrics into txstatsd (cc @ori) and mov... 
[18:42:02] (03CR) 10Filippo Giunchedi: [C: 031] create admin group restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/190500 (https://phabricator.wikimedia.org/T89366) (owner: 10Dzahn) [18:43:56] 3Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1047277 (10Dzahn) >>! In T89739#1046265, @Ottomata wrote: > If you want access to private webrequest logs on the Hadoop cluster, then analytics-privatedata-access so the gerr... [18:47:54] 3Citoid, operations, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047292 (10Catrope) [18:48:50] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#795651 (10Catrope) [18:50:08] (03PS1) 10Alexandros Kosiaris: Move from role class parameter to hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/191379 [18:58:13] (03CR) 10Dzahn: "openstack::firewall is not applied on silver." [puppet] - 10https://gerrit.wikimedia.org/r/190147 (owner: 10Dzahn) [18:59:05] (03CR) 10Andrew Bogott: [C: 031] "I was mistaken, this is a different path from the one that includes openstack::firewall." [puppet] - 10https://gerrit.wikimedia.org/r/190147 (owner: 10Dzahn) [18:59:39] (03CR) 10John F. Lewis: [C: 031] wikitech - add ferm rules for http/https [puppet] - 10https://gerrit.wikimedia.org/r/190147 (owner: 10Dzahn) [19:00:05] twentyafterfour, greg-g, legoktm: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T1900). Please do the needful. 
[19:00:06] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:00:16] o [19:00:18] / [19:01:01] (03PS2) 10Dzahn: wikitech - add ferm rules for http/https [puppet] - 10https://gerrit.wikimedia.org/r/190147 [19:01:07] !log restart txstatsd on graphite1001 to flush old metrics [19:01:10] Logged the message, Master [19:02:29] (03CR) 10Dzahn: [C: 032] wikitech - add ferm rules for http/https [puppet] - 10https://gerrit.wikimedia.org/r/190147 (owner: 10Dzahn) [19:05:03] ori: feel like peeking here? https://gerrit.wikimedia.org/r/#/c/177080/ [19:05:28] had some comments but amended long since [19:10:54] greg-g: is twentyafterfour doing the deploy today? [19:11:13] yeah [19:11:14] legoktm: I assume so [19:11:24] ok :P [19:11:29] I was just about to start it [19:11:34] twentyafterfour: did you see my messages in -releng? [19:13:54] twentyafterfour: basically https://gerrit.wikimedia.org/r/#/c/191360/ (config) and https://gerrit.wikimedia.org/r/191370 (submodule bump in 17) should be deployed before you scap [19:15:11] ok [19:16:14] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1047415 (10GWicke) The restbase request rate as reported increased with this patch: {F43019} It is still not quite at 100% of the expected rate though. Is txst... [19:16:19] legoktm: so I need to merge those into mediawiki-config on tin ? [19:17:04] um, do you mean -staging? [19:18:36] twentyafterfour: only the first one is a config change which ends up in wmf-config.. [19:20:45] 3RESTBase, operations: Investigate apparent restbase request rate under-reporting in graphite: statsd issue? - https://phabricator.wikimedia.org/T89846#1047443 (10GWicke) @ori, re https://gerrit.wikimedia.org/r/191372: Are requests to port 8135 handled by statsd rather than txstatsd? 
[19:24:53] !log pruned stale members from trebuchet minions set for scap/scap: redis-cli srem "deploy:scap/scap:minions" fenari.wikimedia.org virt0.wikimedia.org nickel.wikimedia.org searchidx1001.eqiad.wmnet [19:26:22] gwicke: btw I have to go now, will take a closer look tomorrow too how to fix statsd [19:28:03] godog: kk [19:28:20] godog: it's lower prio than the access request from our pov [19:29:33] godog: thanks for your help & enjoy your evening! [19:34:19] gwicke: cool, yeah we are sorted on the access request I think! (modulo merge) I'll do it tomorrow if it doesn't happen over european night [19:35:43] godog: a +1 would be helpful though [19:36:50] gwicke: there already :) [19:36:54] <- off [19:37:01] ah, thx! [19:37:02] bye! [19:39:30] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1047499 (10Jaredzimmerman-WMF) Approved [19:42:57] (03PS2) 10Ottomata: Add new user Daisy Chen to researchers group Bug: T89418 [puppet] - 10https://gerrit.wikimedia.org/r/191341 (https://phabricator.wikimedia.org/T89418) [19:44:23] (03CR) 10Ottomata: [C: 032] Add new user Daisy Chen to researchers group Bug: T89418 [puppet] - 10https://gerrit.wikimedia.org/r/191341 (https://phabricator.wikimedia.org/T89418) (owner: 10Ottomata) [19:46:00] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047508 (10Cmjohnson) Excellent Point. How about Tuesday at 10am Eastern. It does not appear to interfere with anything that would affect Labs. [19:46:47] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047509 (10Andrew) Chris, can you suggest a window next week? Any day but Monday is good for me. 
[19:48:26] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047521 (10Cmjohnson) Andrew see above Tuesday 2/24 10AM Eastern (1500UTC) [19:48:39] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1047522 (10Ottomata) 5Open>3Resolved Done. Daisy, you should be able to edit your .ssh/config file and add ``` ForwardAgent no Host !bast1001.wikimedia.org *.wikimedia.org *.wmnet ProxyCommand... [19:50:06] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047527 (10GWicke) My proposal for the way forward is this: - in the short term (absent containers), contain citoid and the zotero xulrunner using - a tight apparmor policy (no writes or... [19:51:19] 3Ops-Access-Requests, operations: Access request for stat1003 - https://phabricator.wikimedia.org/T89418#1047533 (10dchen) thanks!! [19:52:29] (03PS2) 10Alexandros Kosiaris: Move from role class parameter to hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/191379 [19:52:31] (03PS1) 10Alexandros Kosiaris: Use network::constants to populate url_downloader ACLs [puppet] - 10https://gerrit.wikimedia.org/r/191385 [19:55:46] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 0 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [19:56:43] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047563 (10akosiaris) Hello, so I 've finally got some time to work on this for this and next week (other priorities before that, I am afraid). I 've kind of already started putting various... 
[19:59:47] 3operations: Backport and using zotero-standalone for the zotero service - https://phabricator.wikimedia.org/T89866#1047609 (10akosiaris) 3NEW [20:00:12] (03PS1) 10Matanya: sge: 4 digit file mode [puppet] - 10https://gerrit.wikimedia.org/r/191386 [20:02:02] 3operations: Puppetize zotero - https://phabricator.wikimedia.org/T89867#1047641 (10akosiaris) 3NEW [20:03:21] 3operations: Assign hardware for the zotero service - https://phabricator.wikimedia.org/T89869#1047664 (10akosiaris) 3NEW [20:04:34] 3operations: Assign an internal LVS service IP for zotero - https://phabricator.wikimedia.org/T89870#1047681 (10akosiaris) 3NEW [20:06:46] 3operations: Update the citoid/deploy branch to not contain zotero deploy - https://phabricator.wikimedia.org/T89872#1047721 (10akosiaris) 3NEW [20:06:49] (03PS1) 10Rush: phab update security extensions for access-request [puppet] - 10https://gerrit.wikimedia.org/r/191387 [20:08:52] (03PS1) 10Ottomata: Add kite to cloudera updates - it is a dependency for sqoop [puppet] - 10https://gerrit.wikimedia.org/r/191388 [20:09:32] 3operations: Configure citoid to use the new zotero service - https://phabricator.wikimedia.org/T89873#1047745 (10akosiaris) 3NEW [20:10:20] 3ops-codfw, operations: take a look at fdb2001 (in fundraising rack) and see whether it actually has a bad hdd - https://phabricator.wikimedia.org/T89407#1047758 (10Papaul) a:3Jgreen Drive has been replaced. old drive in shipping for return. [20:10:22] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1047760 (10Andrew) Tuesday AM sounds fine to me. 
[20:10:22] (03PS2) 10Rush: phab update security extensions for access-request [puppet] - 10https://gerrit.wikimedia.org/r/191387 [20:10:33] (03CR) 10Ottomata: [C: 032] Add kite to cloudera updates - it is a dependency for sqoop [puppet] - 10https://gerrit.wikimedia.org/r/191388 (owner: 10Ottomata) [20:10:47] 3operations: Configure zotero to use an outbound proxy - https://phabricator.wikimedia.org/T89874#1047761 (10akosiaris) 3NEW [20:11:41] 3operations: Configure citoid to use outbound proxy - https://phabricator.wikimedia.org/T89875#1047771 (10akosiaris) 3NEW [20:11:50] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047779 (10akosiaris) [20:12:16] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [20:14:31] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047822 (10akosiaris) @Gwicke well, if getting rid of xulrunner is possible (and merging zotero functionality into citoid, if I understand correctly what you are saying), it is making thing... [20:17:10] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047832 (10Jdforrester-WMF) Can we please not make this already-6-months-late project even later purely for technical architecture reasons? [20:20:04] bwerrrrrrr [20:20:14] am I doing something wrong with apt and reprepro again? [20:20:16] https://gerrit.wikimedia.org/r/#/c/191388/1/modules/install-server/files/reprepro/updates [20:20:17] did this [20:20:20] applied it on carbon [20:20:26] cd /srv/wikimedia; reprepro update [20:20:27] ... [20:20:29] no kite packages. [20:21:33] paravoid: any quick advice this time? 
[20:22:28] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1047837 (10leila) @Ottomata, Ashwin has signed the link. Approval should come from Toby. He was in the ops-requests email I sent this request in. Should I ask him to approve in... [20:22:34] (03PS1) 10Chad: Remove profiling from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191390 [20:30:32] uh, nm, there it is [20:30:39] i guess it just didn't like me the first time [20:33:17] (03CR) 10Aaron Schulz: [C: 031] Remove profiling from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191390 (owner: 10Chad) [20:33:58] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1047912 (10Tnegrin) approved [20:35:20] (03PS1) 10Ottomata: Mirror solr and sentry from cloudera; they are dependencies for kite [puppet] - 10https://gerrit.wikimedia.org/r/191401 [20:37:53] 3Citoid, operations, Scrum-of-Scrums, Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1047932 (10GWicke) > Can we please not make this already-6-months-late project even later purely for technical architecture reasons? +1 for being pragmatic in the short term. I share the c... [20:38:00] (03PS2) 10Ottomata: Put new user Ashwin Pradeep Paranjape in analytics-users group [puppet] - 10https://gerrit.wikimedia.org/r/191338 (https://phabricator.wikimedia.org/T89264) [20:38:10] (03CR) 10Chad: [C: 032] Remove profiling from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191390 (owner: 10Chad) [20:38:54] (03CR) 10Ottomata: [C: 032] Put new user Ashwin Pradeep Paranjape in analytics-users group [puppet] - 10https://gerrit.wikimedia.org/r/191338 (https://phabricator.wikimedia.org/T89264) (owner: 10Ottomata) [20:38:57] (03CR) 10Chad: [C: 04-2] "Waiting." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/191390 (owner: 10Chad) [20:39:00] (03PS2) 10Ottomata: Mirror solr and sentry from cloudera; they are dependencies for kite [puppet] - 10https://gerrit.wikimedia.org/r/191401 [20:39:10] (03PS5) 10Dzahn: move mediawiki maintenance scripts to module [puppet] - 10https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597) [20:39:20] (03CR) 10Ottomata: [C: 032 V: 032] Mirror solr and sentry from cloudera; they are dependencies for kite [puppet] - 10https://gerrit.wikimedia.org/r/191401 (owner: 10Ottomata) [20:40:04] (03CR) 10Legoktm: "Will there still be away to see how long initializing config/loading extensions takes? I'd like to be able to compare when we switch to ex" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191390 (owner: 10Chad) [20:49:56] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [20:52:32] (03PS3) 10Ottomata: create admin group restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/190500 (https://phabricator.wikimedia.org/T89366) (owner: 10Dzahn) [20:54:45] (03CR) 10Ottomata: [C: 032] create admin group restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/190500 (https://phabricator.wikimedia.org/T89366) (owner: 10Dzahn) [20:56:19] gwicke: ^ [20:57:38] ottomata: awesome, mucho gracias! [20:58:15] we can start stressing that hardware & Jessie now [20:58:56] PROBLEM - puppet last run on restbase1003 is CRITICAL: CRITICAL: puppet fail [21:00:05] gwicke, cscott, arlolra, subbu: Respected human, time to deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150218T2100). Please do the needful. [21:00:51] twentyafterfour: how goes the train? Need anything? 
[21:01:06] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: puppet fail [21:01:46] bd808: I'm being incredibly slow because I'm tediously automating each step as I go along [21:01:59] most excellent [21:02:07] go twentyafterfour go [21:02:28] I've figured out how to automate carrying forward security patches...for the most part [21:02:52] you should probably get a merit badge for just doing that [21:03:11] but in the process I noticed that there are a lot of hot fixes to extensions, do I ignore cherry-picks other than security patches? [21:03:23] or should I pull cherry picks on extensions into the new branch [21:03:37] yes. those should be in the next branch in theory [21:03:44] or better yet, log them and complain loudly about the cherries [21:03:49] yes (ignore) [21:04:07] the cherries should be the result of logged SWAT activity [21:04:16] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: puppet fail [21:04:33] ok cool [21:04:39] 2/3 of swat stuff is backporting bug fixes to the active branches [21:05:43] Hi ^d! It's been suggested that I ask you about getting in the wmf_deployment group so I can +2 config changes... I do now have deploy rights on the cluster... thanks in advance! [21:06:12] "21:01 < twentyaft> bd808: I'm being incredibly slow because I'm tediously automating each step as I go along" <3 [21:06:23] 3Analytics-Cluster, operations: Clean up permissions for privatedata files on stat1002 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#1048013 (10Ottomata) 3NEW a:3Ottomata [21:06:26] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: puppet fail [21:10:01] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1048043 (10greg) That time should work yeah. Also, what is the user-visible impact of this downtime?
[21:21:26] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1048082 (10Andrew) The impact will be VERY visible. All shared storage on labs will stop working -- it will be an almost total labs outage, with lots of processes angry about filesystem timeouts after th... [21:22:14] (03PS1) 10Andrew Bogott: Roughed in designate class [puppet] - 10https://gerrit.wikimedia.org/r/191471 [21:22:44] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1048085 (10Andrew) Chris, suppose you can do https://phabricator.wikimedia.org/T89266 during the same window? Obviously there's only one of you, but we may as well combine our outages into one big one. [21:24:14] (03CR) 10Andrew Bogott: [C: 04-2] "Man, I can't believe that passed CI tests!" [puppet] - 10https://gerrit.wikimedia.org/r/191471 (owner: 10Andrew Bogott) [21:33:03] :o [21:34:26] :o [21:40:16] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1048154 (10leila) @Ottomata, I just talked to Toby about it. He approved that it's fine to give Ashwin access to what Bob has access to given that he has signed the NDA and MOU. [21:41:33] !log deployed parsoid version 17f68256 [21:41:39] Logged the message, Master [21:44:19] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1048169 (10Ottomata) Great, thanks. Bob also has access to analytics-privatedata-users, private data in Hadoop. I think we don't need that for this. Instead, I will also gran... 
[21:47:57] (03PS1) 10Ottomata: Also add ashwinpp to statistics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/191478 (https://phabricator.wikimedia.org/T89264) [21:49:49] (03PS1) 1020after4: Remove 1.25wmf12 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191480 [21:49:51] (03PS1) 1020after4: Remove 1.25wmf13 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191481 [21:49:53] (03PS1) 1020after4: Add 1.25wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191482 [21:49:55] (03PS1) 1020after4: Wikipedias to 1.25wmf17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191483 [21:49:57] (03PS1) 1020after4: Group0 to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191484 [21:50:11] (03CR) 10Ottomata: [C: 032] Also add ashwinpp to statistics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/191478 (https://phabricator.wikimedia.org/T89264) (owner: 10Ottomata) [21:51:07] 3Ops-Access-Requests, operations: access request for researcher to analytics-users in Hadoop - https://phabricator.wikimedia.org/T89264#1048186 (10Ottomata) 5Open>3Resolved [21:54:17] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [21:54:41] twentyafterfour: um, so how long do you expect the deployment to take? [21:55:15] just about ready to scap...not sure how long that will take but ... 
[21:55:49] (03CR) 1020after4: [C: 032] Remove 1.25wmf12 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191480 (owner: 1020after4) [21:55:54] (03Merged) 10jenkins-bot: Remove 1.25wmf12 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191480 (owner: 1020after4) [21:55:57] (03CR) 1020after4: [C: 032] Remove 1.25wmf13 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191481 (owner: 1020after4) [21:56:02] (03Merged) 10jenkins-bot: Remove 1.25wmf13 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191481 (owner: 1020after4) [21:56:05] (03CR) 1020after4: [C: 032] Add 1.25wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191482 (owner: 1020after4) [21:56:10] (03Merged) 10jenkins-bot: Add 1.25wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191482 (owner: 1020after4) [21:56:15] (03CR) 1020after4: [C: 032] Wikipedias to 1.25wmf17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191483 (owner: 1020after4) [21:56:21] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf17 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191483 (owner: 1020after4) [21:56:23] (03CR) 1020after4: [C: 032] Group0 to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191484 (owner: 1020after4) [21:56:28] (03Merged) 10jenkins-bot: Group0 to 1.25wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191484 (owner: 1020after4) [21:57:18] twentyafterfour: ok, did the GlobalUserPage submodule update + config change get pulled in? 
[21:57:34] I'm just doing that now [21:58:00] \O/ [21:58:02] ok :) [21:58:04] (03CR) 1020after4: [C: 032] Add GlobalUserPageWikis hook handler for test* wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191360 (owner: 10Legoktm) [21:58:12] (03Merged) 10jenkins-bot: Add GlobalUserPageWikis hook handler for test* wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191360 (owner: 10Legoktm) [21:59:20] legoktm: should I just cherry pick https://gerrit.wikimedia.org/r/#/c/191370/ onto wmf18 branch? [21:59:44] twentyafterfour: no, wmf18 already has master since you just branched it [22:01:06] 3Triagers, Project-Creators, Phabricator, operations: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1048218 (10KLans_WMF) Please add me to Project-Creators. I am the scrum master for the app and mobile web teams and will be helping create sprints. [22:01:44] legoktm: but I have to manually update extensions/GlobalUserPage on tin? [22:02:05] (for wmf17?) [22:02:23] twentyafterfour: yeah, "git submodule update extensions/GlobalUserPage" should do it once that commit is merged [22:02:29] 3hardware-requests, Labs, ops-eqiad, operations: virt1000 memory upgrade - https://phabricator.wikimedia.org/T89266#1048219 (10Cmjohnson) I can do this on Tuesday at 1500-1700UTC same time frame as Labstore1001. Please confirm if this will work for everyone. [22:03:18] 3ops-eqiad, operations: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1048221 (10Cmjohnson) Shouldn't be a problem In fact, I will probably do that first since it's the simplest. 
I commented on the ticket T89266 [22:11:32] (PS6) Dzahn: move mediawiki maintenance scripts to module [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597) [22:13:42] § Fundraising Dash, operations: Create staging site for Dash - https://phabricator.wikimedia.org/T87809#1048240 (atgo) p:Normal>Triage [22:15:18] § Fundraising Dash, operations: Create sandbox site for Dash - https://phabricator.wikimedia.org/T87809#1048263 (K4-713) [22:17:01] ok finally ready to scap [22:17:14] (CR) Dzahn: "@ori moved the includes from site to a role as you suggested. redid the maintenance classes because of changes since original upload date" [puppet] - https://gerrit.wikimedia.org/r/178873 (https://phabricator.wikimedia.org/T88597) (owner: Dzahn) [22:17:41] * bd808 bets on 36 minutes runtime [22:18:02] § Fundraising Dash, operations: Create sandbox site for Dash - https://phabricator.wikimedia.org/T87809#1048274 (atgo) p:Triage>Normal [22:18:03] that quick? [22:18:17] I'd guess longer than that [22:19:42] What was it last week for the full new branch scap? [22:19:52] (CR) Dzahn: [C: 1] sge: 4 digit file mode [puppet] - https://gerrit.wikimedia.org/r/191386 (owner: Matanya) [22:20:10] I'm sticking with 36 [22:24:50] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf17 [22:24:54] Logged the message, Master [22:24:55] § Fundraising Dash, operations: Create sandbox site for Dash - https://phabricator.wikimedia.org/T87809#1048291 (Ejegg) Need to create another OAuth context in Civi Need a db to add the new user preference tables Guessing we want to be able to manage this by pulling arbitrary git commits, not the usual deplo... 
[22:25:21] (PS2) Andrew Bogott: Roughed in designate class [puppet] - https://gerrit.wikimedia.org/r/191471 [22:25:44] mutante: for some reason the restbase login doesn't work yet [22:27:54] bd808: I think nearly an hour last time I did it [22:28:08] Errm, https://www.mediawiki.org/ says "File not found: /srv/mediawiki/php-1.25wmf18/index.php" [22:28:14] I guess that's known? /me just joined [22:28:34] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf18 [22:28:39] group0 wikis are dead [22:28:40] Logged the message, Master [22:29:17] I broke it :( [22:29:21] ok ... [22:30:10] wrong order [22:30:25] twentyafterfour: I'll assume we're skipping our 1:1 for now :) [22:30:31] revert and resync the wikiversions file [22:30:33] Triagers, Project-Creators, Phabricator, operations: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1048304 (Aklapper) @KLans_WMF: I've added you. Usual disclaimer: Please remember to follow https://www.mediawiki.org/wiki/Phabricator/Creating_and_r... [22:30:50] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: rollback group0 to 1.25wmf17 [22:30:53] Logged the message, Master [22:31:28] greg-g: hah ..gimme a minute at least [22:31:36] np :) [22:32:41] twentyafterfour: I think you skipped this step -- https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Sync_to_cluster_and_verify_on_testwiki [22:33:57] twentyafterfour: I always did testwiki as a seperate step (before moving wikipedias), then run scap... 
Which will cause scap to build the new localisation cache, and change testwiki over to the new version at the end [22:34:14] Labs, Wikimedia-Labs-Infrastructure, operations: Make labs/private really private - https://phabricator.wikimedia.org/T89642#1048321 (Krenair) [22:34:23] Then you can merge/sync the updates for the rest of the wikipedias, and then the rest of group0 [22:34:35] !log twentyafterfour Started scap: testwiki to php-1.25wmf18 and rebuild l10n cache [22:34:38] Logged the message, Master [22:38:04] got lost in bulleted list on the wiki page ... it's pretty hard to follow honestly. I've been writing scripts for each thing as I go but I guess I missed a bullet [22:39:00] twentyafterfour: mind if I just move our 1:1 to tomorrow? my afternoon is crushed :/ [22:39:08] greg-g: that's fine [22:39:11] (and not in the "wanna crush some code" kind of way) [22:39:18] cool, thanks [22:40:06] greg-g: I'm flexible... [22:41:01] my schedule is less so, sadly :/ [22:45:27] group0 wikis are dead | I broke it :( [22:45:44] twentyafterfour: congratulations! you are officially a true member of the cabal [22:46:15] hashar: do I get a merrit badge? [22:47:00] I need to add to my collection... http://20after4.deviantart.com/badges/ [22:47:15] http://20after4.deviantart.com/badges/3526694/ [22:47:29] I don't think you get a tshirt unless you break a wikipedia [22:47:41] twentyafterfour: I like to say you are not officially endorsed as a WMF cluster maintainer until you broke the site at least once [22:47:45] we all did [22:48:02] despite all carefulness [22:48:22] lovely pixel badge [22:48:25] I earned by 'broke deviantart' dead-fella badge 3 times over [22:48:29] Krenair: mediawiki.org counts, no? 
:) [22:48:33] greg-g: Dude, crush some meetings bro [22:48:41] lol [22:49:14] there is a badge hard to earn [22:49:28] which is "we ended up calling Tim cause nobody figure it out" [22:49:43] not as a wikipedia [22:49:43] but I forgot where the etherpad was [22:50:02] http://etherpad.wikimedia.org/p/IBrokeWikipediaList [22:51:16] hah "11. mark b: [suggestion: Made every request a 301 redirect to the Belgian chapter wiki, then Squid happily cached the redirects]" [22:51:19] that's great [22:51:30] marktraceur: argh I totally need a couple of them [22:51:37] Right? [22:51:43] my lamest mistake was "drop enwiki;" [22:51:45] Why didn't you get one in SF? [22:52:31] they were rookie mistakes, nothing signed-off [22:52:52] like editing index.php and doing a typo causing fatal errors and thus blank pages for all sites :-/ [22:53:10] I sort of deserve one, bd808 want to keep one for me? :D [22:53:40] I will happily cover the shipping expenses. [22:54:30] (PS1) John F. Lewis: beta: don't rate limit office IPs [mediawiki-config] - https://gerrit.wikimedia.org/r/191489 (https://phabricator.wikimedia.org/T87841) [22:55:17] this is the one caused by me: https://wikitech.wikimedia.org/wiki/Incident_documentation/20140714-Lists greg-g does it justify ? [22:56:30] (CR) Greg Grossmeier: [C: 1] beta: don't rate limit office IPs [mediawiki-config] - https://gerrit.wikimedia.org/r/191489 (https://phabricator.wikimedia.org/T87841) (owner: John F. Lewis) [22:57:11] matanya: I don't want to be the arbitrator of t-shirts :) [22:57:13] jouncebot: next [22:57:13] In 1 hour(s) and 2 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150219T0000) [22:57:30] well I am off. Good luck everyone [22:57:40] g'night! [22:57:41] I'll let bd808 decide on this one :) [22:58:18] greg-g: want to see if twentyafterfour will merge that now or wait and stick it onto swat even though it doesn't need it? 
:) [22:58:38] o lawd, I'm on product duty [22:58:45] * Deskana blows his product whistle [22:58:50] HALT, IN THE NAME OF USERS [22:58:58] Deskana: Behave, you. :-) [22:59:04] JohnLewis: swat it [22:59:12] James_F: Aye captain. [22:59:14] kay, I'll add it now [22:59:58] I though James_F is for ever [23:00:17] only diamonds are forever [23:00:38] death also [23:00:54] :) [23:01:01] mutante: around? [23:01:09] you never knowc chasemp [23:01:18] or do I? [23:01:21] no I don't [23:01:32] but that would have been a weird unveiling if I did [23:02:17] jamesofur: I believe he's gone to do something. Anything the channel can help you with?? [23:02:26] it is reported that death is great, but no one came back to confirm [23:02:37] JohnLewis: filing a task for it :) then possibly [23:02:51] jamesofur: okay, CC me :p [23:02:57] diamonds and scap forever [23:03:14] greg-g: put on calendar, it'll be a fun SWAT for the person to do a no op merge ;) [23:03:15] matanya: I handed over the cap to Deskana for a change. [23:03:22] jamesofur: me too if you think i can help :P [23:03:28] James_F: but we love you... :( [23:03:28] :p [23:03:40] 22:52:39 Started sync-apaches [23:03:42] sync-common: 46% (ok: 123; fail: 0; left: 143) <-- maybe not forever but it seems like forever [23:04:18] time to sleep, night folks [23:04:25] * James_F gris. [23:09:28] which LDAP implementation do we use? [23:09:55] ebernhardson: the non active directory one [23:10:09] hashar: arn't there lots of ldap implementations? [23:10:10] jamesofur: I see the ticket I think [23:10:15] most likely :) [23:10:15] PROBLEM - HHVM rendering on mw1141 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:36] PROBLEM - Apache HTTP on mw1141 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:54] ebernhardson: you can ask #wikimedia-labs . I am pretty sure it is openldap [23:11:00] <^d> ebernhardson: opendj, I think? 
[23:11:29] jamesofur: seems like an awkward case actually :/ [23:11:36] * jamesofur nods [23:11:48] and actually what is being used is not necessarily obviously apparent [23:11:57] but if there is anything we can do I want to do it [23:12:39] banning is not really possible though ops can look through the logs to see if there was a possible cause and such. Nothing much than actually actually. [23:13:08] that seems like a huge security hole in the software.... but I guess... MailMan /sigh/ [23:14:09] ebernhardson: openldap in labs opendj in prod [23:15:18] jamesofur: it's a mix of mailman and ops though :) [23:15:44] JohnLewis: it's not ops fault if mailman can't ban an email address for abuse :) [23:15:57] mailman doesn't allow banning but it is possible to ban but it requires ops to block web access to an IP and exim for an email etc. [23:16:19] and ops don't want to jump through hoops :D [23:16:55] matanya: interesting we use both, any idea why the split? [23:17:03] if it's an ongoing issue I'm ok with jumping though hoops, but let's see what it looks like [23:17:24] to make them jump, just use the old 'per legal' line ;) [23:17:34] !log twentyafterfour Finished scap: testwiki to php-1.25wmf18 and rebuild l10n cache (duration: 42m 58s) [23:17:39] ebernhardson: historical reasons, from what i know, which is not much, the plan is to go with openldap all the way [23:17:39] Logged the message, Master [23:18:25] bd808: ~43min..you were close [23:18:32] Ops-Access-Requests, RESTBase, operations: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1048426 (GWicke) Thank you, @ottomata! For some reason the login doesn't work yet. Maybe just adding it in https://gerrit.wikimedia.org/r/#/c/190500/3/hieradata/role/common/restbase.yam... 
[23:19:02] * gwicke was hoping for mutante too [23:19:20] * gwicke responded to an old buffer [23:20:46] PROBLEM - HHVM queue size on mw1141 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [80.0] [23:20:56] anyone else with mailman/mail experience around from ops? [23:20:56] PROBLEM - HHVM busy threads on mw1141 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [86.4] [23:21:48] * ori looks at mw1141 [23:23:21] !log HHVM on mw1141 locked up (threads stuck in __lll_lock_wait). Depooling for further investigation. [23:23:28] Logged the message, Master [23:23:42] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf18 [23:25:08] !log twentyafterfour Purged l10n cache for 1.25wmf16 [23:25:10] Logged the message, Master [23:26:03] ops-codfw, operations: rack and initial configuration of wtp2001-2020 - https://phabricator.wikimedia.org/T86807#1048465 (Papaul) a:Papaul>RobH mgmt set up and BIOS setup complete test using root@ip racadm serveraction powercycle , console com2 complete. wtp2001 10.193.2.15 ge-4/0/17 B4 wtp2002 10.1... [23:26:16] (PS2) Jforrester: Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132 [23:26:34] <^d> twentyafterfour: You got a patch incoming for the symlink updates? Deployment's still dirty on tin. [23:26:52] mmm, new servers to installlllll [23:27:19] ^d: no. did I miss another step? :-/ [23:27:46] <^d> Possibly. It should be documented though. 
[23:27:46] <^d> `cd /srv/mediawiki-staging/; git status` [23:28:43] I've been following https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys [23:29:01] <^d> Yeah [23:30:38] jamesofur: that emails sums my life up - I've still not found someone who does :p [23:30:47] <^d> twentyafterfour: Under "Clone new branch" [23:30:51] (well enjoys it as well and wants to do it) [23:30:58] :) [23:30:58] <^d> twentyafterfour: "Create symlinks patch for changes made in common/docroot and common/w by checkoutMediaWiki" [23:32:08] ^d: that just deals with static-$version [23:32:12] but not static-current [23:32:20] I'm not actually sure what step modified static-current [23:33:53] gwicke: ugh, so there is some conflict between the roles for admin::groups between restbase and cassandra [23:34:12] <^d> twentyafterfour: checkoutMediaWiki most likely, but the changes look right [23:34:13] argg. rephrasing [23:34:22] "Conflicting value for admin::groups found in role cassandra" [23:34:36] mutante: could you comment on https://phabricator.wikimedia.org/T89904 [23:34:37] on something that (also) gets restbase [23:34:44] you are the most mailman-y knowledgeable person? [23:35:02] mutante: hm, ok -- would adding the groups on cassandra instead help? [23:35:24] in testing it's cassandra-test-roots [23:35:33] so not restbase-* [23:36:30] (PS1) 20after4: Add static-current symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/191496 [23:36:32] <^d> twentyafterfour: updateBranchPointers actually [23:36:51] (CR) 20after4: [C: 2] Add static-current symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/191496 (owner: 20after4) [23:36:55] (Merged) jenkins-bot: Add static-current symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/191496 (owner: 20after4) [23:37:25] <^d> (which is run as a part of checkoutMW) [23:39:20] !log fixed symlinks. uploaded release notes. 
deployment finished 1.5 hours behind schedule [23:39:24] Logged the message, Master [23:40:30] <^d> This is a problem ^ [23:40:42] <^d> "Knowing the deploy system back and forth" should not be a pre-req for doing a timely deploy [23:40:45] twentyafterfour: woot. Can I sync out the globaluserpage config change now? [23:41:00] legoktm: yes [23:41:08] chasemp: replied [23:41:41] the entire thing is a horrid mess IMO [23:42:07] <^d> Couple of reasons. [23:42:09] I couldn't come up with a more convoluted process if I tried [23:42:12] mutante: cool tx [23:42:55] <^d> 1) most of the modern crap came from those of us who speak git in our sleep [23:43:20] (PS3) Legoktm: Enable GlobalUserPage extension on all public, CentralAuth wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/190691 (https://phabricator.wikimedia.org/T72576) [23:43:22] <^d> 2) a lot of it was never designed, it was just bolted on as needs changed, parts of it are largely unchanged behavior for years and years [23:43:35] obviously [23:43:39] ;) [23:43:42] (CR) Legoktm: [C: 2] Enable GlobalUserPage extension on all public, CentralAuth wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/190691 (https://phabricator.wikimedia.org/T72576) (owner: Legoktm) [23:44:05] <^d> 3 (sort a followup to 2)): it's fragile, so sweeping changes are shied away from [23:44:10] twentyafterfour, do you have that bd808's tshirt already? [23:44:33] o.O why is my change waiting behind a VE change?? [23:44:35] ^d no need to make excuses for it really, I can see why it is the way it is ... the only part I can't understand is how you guys put up with it [23:44:53] ve change? 
[23:44:55] twentyafterfour: we have a lot more crappy shit to fix :P [23:45:07] <^d> twentyafterfour: Because those of us who do it often enough have a sense of stolkholm syndrome and can suffer through it [23:45:16] <^d> it only becomes super obvious when we try to teach someone new [23:45:20] (Merged) jenkins-bot: Enable GlobalUserPage extension on all public, CentralAuth wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/190691 (https://phabricator.wikimedia.org/T72576) (owner: Legoktm) [23:45:26] <^d> (ie: if you just suffered too and didn't bitch, we'd muddle on) [23:45:28] <^d> ;-) [23:46:02] some of us just threw up our hands and found other things to do :-) [23:46:17] I've got a low tolerance for tedious manual repetitive tasks, but more importantly, it's too easy to screw things up [23:46:32] twentyafterfour++ [23:46:51] !log legoktm Synchronized wmf-config: Enable GlobalUserPage extension on all public, CentralAuth wikis (duration: 00m 05s) [23:46:53] Logged the message, Master [23:47:16] <^d> twentyafterfour: I want to have a discussion about this. We should figure out what our ideal process looks like and find a way to get to it [23:47:24] <^d> Rather than just fix the most obnoxious edge cases [23:47:31] https://en.wikipedia.org/wiki/Special:Version ? [23:47:47] <^d> Was the extension on before l10n rebuild? [23:47:47] at deviantart we had a big red button on a web page... deployment took all of 3 minutes and that was 95% waiting for it to do it's thing [23:47:54] yes, it was on testwikis [23:47:58] <^d> Hmm [23:48:10] http://dt.deviantart.com/ <-- several deployments each day, any dev could deploy at any time [23:48:26] * ^d doesn't trust devs! 
[23:48:41] !log aaron Synchronized php-1.25wmf17/includes/db/LoadBalancer.php: 42a56404328547a0b8bd07f001b1c4dff67b3498 (duration: 00m 05s) [23:48:44] Logged the message, Master [23:49:21] ^d: Reedy put it in https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/extension-list-1.25wmf16, would that be related? [23:49:46] <^d> Yes, probably. [23:49:51] <^d> wtf is that version'd file? [23:50:08] messages are fine in wmf18 though? https://test.wikidata.org/wiki/Special:Version [23:50:45] hmm [23:50:51] (CR) Chad: [C: 2] Remove profiling from CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/191390 (owner: Chad) [23:51:02] I guess I'll just scap after SWAT? [23:51:16] (Merged) jenkins-bot: Remove profiling from CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/191390 (owner: Chad) [23:51:38] !log demon Synchronized wmf-config/CommonSettings.php: rm no-op profile calls (duration: 00m 06s) [23:51:43] Logged the message, Master [23:52:17] (PS1) Legoktm: Put GlobalUserPage in the normal extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/191501 [23:52:53] <^d> AaronS: 30 no-op function calls gone from CommonSettings. Yay micro-optimizations :p [23:54:26] jamesofur: I'm still looking at your request, may be possible to globally disable subscription for the email [23:54:47] thanks [23:59:16] Wikimedia-Labs-wikitech-interface, operations: wikitech instances list is blank - https://phabricator.wikimedia.org/T89808#1048561 (mmodell) I haven't logged out and the problem seems to have resolved it's self. I'm not sure what the issue was, I had assumed it was related to the outage on labs yesterday. [23:59:26] Wikimedia-Labs-wikitech-interface, operations: wikitech instances list is blank - https://phabricator.wikimedia.org/T89808#1048562 (mmodell) Open>Invalid a:mmodell