[00:00:03] ?
[00:00:07] that's in mariadb/packages
[00:00:13] mariadb::packages
[00:00:26] that’s just for the repo though
[00:00:26] let me get on the host
[00:00:32] it wouldn’t prevent the packages from installing would it?
[00:00:33] yeah, so if the repo fials then no packages
[00:00:40] assuming the packages are only in the repo
[00:00:45] no puppet errors or warnings
[00:00:51] apt-get update?
[00:01:14] also is it labtestweb2001.codfw.wmnet?
[00:01:22] yes
[00:01:25] ssh labtestweb2001.codfw.wmnet
[00:01:28] fails
[00:01:33] oh, sorry, .wikimedia.org
[00:01:39] it should at least try to install those packages, I think
[00:01:41] ah
[00:01:43] ok
[00:04:23] yeah
[00:04:27] had to mess with my ssh config to get into it
[00:04:39] am digging in
[00:04:43] puppet takes forever
[00:04:48] only to find that my user wasn't set up :P
[00:05:01] (since I don't usually send *.wikimedia.org ssh through production bastions)
[00:05:30] works now though
[00:06:27] andrewbogott: HAHAHA!
[00:06:29] found it
[00:06:44] * andrewbogott prepares to feel ashamed
[00:06:48] (PS1) Yuvipanda: mariadb: Actually include class in role [puppet] - https://gerrit.wikimedia.org/r/264901
[00:06:51] andrewbogott: ^
[00:07:06] I get bit by it every time I write a role that has same name as class
[00:07:13] ah, so the role included itself
[00:07:30] yeah
[00:07:36] and was a noop since it was already included
[00:07:37] so doesn’t that mean that potentially every one of our db servers is going to get touched by this?
[00:07:46] I don't think our db servers use role::mariadb
[00:08:32] lots use role::mariadb::core
[00:08:35] yup
[00:08:50] which does not actually use mariadb role directly at all
[00:08:54] so this has never really worked
[00:08:59] and nobody has ever used it :D
[00:09:11] how are mariadb packages actually getting installed anyplace?
[00:09:33] class { 'mariadb::packages_wmf':
[00:09:35] mariadb10 => true,
[00:09:37] }
[00:09:38] and others
[00:09:44] andrewbogott: the mariadb class is being included in places
[00:09:47] and works
[00:09:52] it's just the mariadb *role* doesn't work
[00:10:15] yep, I follow… just want to make sure your patch doesn’t break anything
[00:10:19] although I don’t know why it would
[00:10:46] (PS2) Andrew Bogott: mariadb: Actually include class in role [puppet] - https://gerrit.wikimedia.org/r/264901 (owner: Yuvipanda)
[00:11:56] (CR) Andrew Bogott: [C: 2] mariadb: Actually include class in role [puppet] - https://gerrit.wikimedia.org/r/264901 (owner: Yuvipanda)
[00:12:22] thanks andrewbogott
[00:16:28] well, I’m lost. wmf-mariadb10 claims to replace mariadb but I can’t tell what it’s supposed to do or replace.
[00:18:45] andrewbogott: is it mariadb10 vs mariadb 5.5?
[00:19:45] yeah
[00:19:49] we use mariadb 10 in a lot of places
[00:19:50] Reedy: I think that wmf-mariadb10 is what we run on production, and it includes everything I need /except/ for an init script
[00:19:57] and silver has an init script but I can’t tell were it came from
[00:19:57] heh
[00:20:08] I think it's done purposely
[00:20:25] So the debs don't fuck over the cluster mysql boxes
[00:20:42] I'm trying to curl this service running on krypton on port 8000 from stat1002 - (port was opened up here https://gerrit.wikimedia.org/r/#/c/249433/2/manifests/role/analytics/burrow.pp) The connection times out though - does anyone know what I'm missing?
[00:20:45] so I’m supposed to have to create the init script by hand?
[00:20:59] I presume there's a way round it
[00:21:27] I guess we're supposed to ask the DBA :D
[00:21:29] I'd almost be tempted to grab a deb from a mariadb mirror and install that initially
[00:21:32] And then switch over
[00:21:42] Reedy: that's what the mariadb package is doing I think (5.5?)
[00:21:51] madhuvishy: try from bastion?
[00:21:53] Likely
[00:22:00] YuviPanda: ah yes
[00:22:12] madhuvishy: stat1002 is in the analytics vlan, which has additional firewall constraints on top of ferm firewalls that might be affecting you
[00:22:13] that seems like something otto might have told me
[00:22:55] YuviPanda: bast1001?
[00:22:55] this is all terrible and obscure
[00:22:59] but I can work around it
[00:23:01] madhuvishy: yes
[00:23:13] cannot get to it
[00:23:26] oh i should proxy?
[00:23:26] madhuvishy: oh? bast1001.wikimedia.org?
[00:23:30] madhuvishy: it is the proxy :D
[00:23:33] i cant ssh
[00:23:37] oh
[00:23:40] wait
[00:23:43] you mean you can't get to krypton from bastion?
[00:23:45] or can't get to bastion?
[00:23:47] no
[00:23:53] i can't ssh into bastion
[00:24:14] so i was asking if i should somehow proxy requests to krypton via bastion
[00:24:26] (PS1) Andrew Bogott: Revert "Include role::mariadb on labtestweb2001" [puppet] - https://gerrit.wikimedia.org/r/264902
[00:24:30] because it says channel 0: open failed: administratively prohibited: open failed
[00:24:56] madhuvishy: nah, you should test it from bastion, and if that works file a bug to open up that port to the analytics vlan
[00:25:16] madhuvishy: you were trying to connect to bast1001.eqiad.wmnet
[00:25:23] madhuvishy: you should connect to bast1001.wikimedia.org
[00:25:30] gah
[00:25:31] okay
[00:25:32] thanks
[00:26:37] (PS2) Andrew Bogott: Revert "Include role::mariadb on labtestweb2001" [puppet] - https://gerrit.wikimedia.org/r/264902
[00:27:46] (CR) Andrew Bogott: [C: 2] Revert "Include role::mariadb on labtestweb2001" [puppet] - https://gerrit.wikimedia.org/r/264902 (owner: Andrew Bogott)
[00:30:28] going to eat some food
[00:39:38] (PS1) Andrew Bogott: Only check puppet freshness once per day, not 60 times in a row.
[puppet] - https://gerrit.wikimedia.org/r/264904 (https://phabricator.wikimedia.org/T121773)
[00:41:45] (CR) Andrew Bogott: [C: 2] Only check puppet freshness once per day, not 60 times in a row. [puppet] - https://gerrit.wikimedia.org/r/264904 (https://phabricator.wikimedia.org/T121773) (owner: Andrew Bogott)
[00:41:54] (PS2) Andrew Bogott: Only check puppet freshness once per day, not 60 times in a row. [puppet] - https://gerrit.wikimedia.org/r/264904 (https://phabricator.wikimedia.org/T121773)
[00:58:38] andrewbogott, so I suppose the mysql server came up, as the error message changed to Access denied for user 'wikiuser'@'208.80.153.14' (using password: YES) (208.80.153.14)
[00:58:58] yeah, I’m fighting with accounts and grants now
[01:00:53] I’ve done set password for 'wikiuser'@'208.80.153.14' = PASSWORD(‘’)
[01:00:57] but that seems insufficient
[01:01:02] I would help but deployers are no longer mysql roots, so... :P
[01:02:55] Blocked-on-Operations, Dumps-Generation, Flow, Collaboration-Team-Current, WMF-deploy-2016-01-19_(1.27.0-wmf.11): Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1942764 (Catrope)
[01:04:15] I don't think that's how you're supposed to do it andrewbogott
[01:04:23] have you seen templates/mariadb/production-grants-core.sql.erb ?
[01:06:02] I’ve also done grant index, create, select, insert, update, delete, drop, alter, lock tables on labtestwiki.* to 'wikiuser'@'localhost' identified by ‘
[01:06:57] yeah, 'localhost' won't work
[01:07:30] well, also grant index, create, select, insert, update, delete, drop, alter, lock tables on labtestwiki.* to 'wikiuser'@'208.80.153.14' identified by ‘
[01:07:37] I’m trying everything :)
[01:15:04] andrewbogott, what about copying the grant commands shown when you "show grants for wikiuser@'208.80.154.136'" on silver?
[01:19:33] Krenair: I’m wanting to run install.pp to create all the tables and such
[01:19:38] but maybe that’s unlikely in this setup
[01:19:48] install.php?
[01:20:07] I need a labtestwiki database to exist, and it needs tables
[01:20:13] before I can import anything.
[01:20:36] install.php is normally how you create an empty wiki, isn’t it?
[01:20:39] no
[01:20:42] not in wikimedia production
[01:20:59] then how?
[01:21:13] We have a script called addWiki.php... a lot of it isn't relevant to wikitech, but that's okay
[01:21:26] problem is it needs permissions, to create databases as wikiadmin
[01:22:33] mwscript WikimediaMaintenance/addWiki.php --wiki=labtestwiki ?
[01:23:00] It does all sorts of fun things like sending announcement emails to mailing lists
[01:23:38] well, I’m not looking forward to that :(
[01:24:07] right, which is why I'd only run part of it
[01:24:07] I can’t figure out what path to give to mwscript
[01:24:13] oh, how do I run part of it?
[01:24:18] please do not run the script
[01:24:50] are the grants set up in mysql?
[01:25:10] so --
[01:25:20] I won’t really know if the grants are right or wrong until I have something to test them with
[01:25:28] for example, a script that creates a wiki
[01:27:38] won't sql labtestwiki won't?
[01:28:16] yep, lgtm
[01:29:54] ?
[01:31:07] looks good to me
[01:31:15] am modifying the script to do what we need
[01:31:20] It was the "won't sql labtestwiki won’t?” part that I didn’t understand
[01:31:24] but I will stand by, thank you :)
[01:32:29] oh
[01:32:30] right
[01:32:37] that was meant to be "won't sql labtestwiki work?"
[01:35:25] operations, Analytics, Analytics-Cluster, EventBus, Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942833 (GWicke) Cross-DC consumption without replication would mean that events from an unavailable DC w...
[01:38:20] andrewbogott, look at the table list now
[01:38:45] that’s a lot of tables!
[01:38:59] so now… why is Access denied for user 'wikiuser'@'208.80.153.14'
[01:39:04] wikiadmin works, not wikiuser
[01:39:07] hm
[01:39:17] right, but I set up wikiuser too, in theory
[01:39:42] mysql> show grants for wikiuser@208.80.153.14;
[01:39:42] ERROR 1044 (42000): Access denied for user 'wikiadmin'@'208.80.153.14' to database 'mysql'
[01:39:43] meh
[01:40:28] password could be wrong, I suppose
[01:41:35] this is $wgDBsqlpassword that I want, right?
[01:41:45] for GRANT ALL PRIVILEGES ON *.* TO 'wikiuser'@'208.80.153.14' IDENTIFIED BY PASSWORD
[01:42:05] oh, wait
[01:42:32] ummm
[01:42:33] actually
[01:42:38] I'm not sure what that one does
[01:42:46] look at wgDBpassword
[01:43:32] ah, there we go
[01:43:39] on to the next error
[01:44:00] meanwhile I’m going to do this import...
[01:44:32] hm, Error: 1146 Table 'labtestwiki.smw_object_ids' doesn't exist
[01:44:37] right
[01:44:49] so this script won't have done the wikitech special extension tables
[01:45:01] just the normal brand-new-prod-wiki set
[01:45:25] yeah. So how do we get those?
[01:45:50] there'll be sql files in the extension repositories
[01:46:52] Well.
[01:46:55] There *should* be.
[01:46:58] I presume I run them as wikiadmin as well?
[01:48:26] hang on, I'll do this bit
[01:48:30] ok, thanks
[01:48:50] okay
[01:48:51] SMW set up
[01:49:10] there’s one in openstackmanager too
[01:49:13] had to dig through the source code and find this code: smwfGetStore()->setup( true );
[01:49:14] yes
[01:49:17] and oathauth
[01:49:54] and echo, apparently...
[01:50:54] Oh, yeah.
[01:51:03] operations, Analytics, Analytics-Cluster, EventBus, Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942844 (mobrovac) The point for me here are not names, but rather the ability of consumers in different...
[01:51:06] wikitech is one of the wikis with a more traditional echo setup
[01:51:23] a local echo table instead of storing it centrally on the x1 db cluster
[01:51:30] well, tables
[01:52:36] just going through the list of extensions now
[01:53:26] there are lots :(
[01:55:28] in OSM, when you added schema-changes/openstack_change_token_size.sql you didn't change openstack.sql
[01:58:23] Oh
[01:58:29] Because there is schema-changes/tokens.sql
[01:58:53] oh well
[01:58:55] done OSM
[02:00:02] OATHAuth done
[02:01:21] LdapAuthentication done (just has a ldap_domains table)
[02:02:02] Echo is supported by createExtensionTables.php of course
[02:02:35] andrewbogott, are we missing anything?
[02:02:47] I’ll run the import again, we’ll see
[02:03:10] lots of this:
[02:03:10] Trying to get property of non-object in /srv/mediawiki/php-1.27.0-wmf.9/includes/SiteStats.php on line 174
[02:03:14] but I don’t know if that matters
[02:03:40] hmm
[02:04:16] also this is I think text-only import so it might be upset about image files missing
[02:04:19] which I will do next
[02:06:37] I ran the "INSERT INTO site_stats(ss_row_id) VALUES (1)" query that addWiki usually does (but I left out of the wikitech script because I figured it'd be imported)
[02:07:21] since it seems to be mid-import I'll have to make it regenerate the site stats later
[02:09:22] hopefully you'll see less of the warning
[02:11:44] ok, I had to create the image dir, I’m re-running the import now and it seems happier
[02:12:16] this is going to take an hour or more, surely you need to sleep?
[02:14:33] pff, sleep :P
[02:14:52] I also need to eat at some point :)
[02:15:09] but yes, /me goes to get other stuff done before going to be
[02:15:11] bed
[02:15:34] thanks for all your work on this!
It will be so nice to have a test box
[02:15:54] of course lots to do yet getting openstack working properly
[02:26:10] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 40s)
[02:26:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:31] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[02:39:19] Blocked-on-Operations, Dumps-Generation, Flow, Collaboration-Team-Current, WMF-deploy-2016-01-19_(1.27.0-wmf.11): Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1828193 (Catrope)
[02:46:40] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 21s)
[02:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:53:40] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 19 02:53:40 UTC 2016 (duration 7m 0s)
[02:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:08:41] (PS1) Alex Monk: Fix IPs in wikitech apache configs [puppet] - https://gerrit.wikimedia.org/r/264915
[03:13:11] PROBLEM - puppet last run on mw2194 is CRITICAL: CRITICAL: puppet fail
[03:22:17] (PS1) Catrope: Enable cross-wiki notifications beta feature on testwiki and test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/264917
[03:23:22] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0]
[03:35:01] (PS2) Catrope: Enable cross-wiki notifications beta feature on testwiki and test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/264917
[03:35:03] (PS1) Catrope: Add cross-wiki notifications to beta features whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/264920
[03:42:22] (CR) Jforrester: [C: 1] "This 'ere +1 is wot signifies my consent, donchakno."
[mediawiki-config] - https://gerrit.wikimedia.org/r/264920 (owner: Catrope)
[03:42:32] RECOVERY - puppet last run on mw2194 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:58:52] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:07:22] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0]
[04:11:32] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:36:22] (PS2) Andrew Bogott: Fix IPs in wikitech apache configs [puppet] - https://gerrit.wikimedia.org/r/264915 (owner: Alex Monk)
[04:37:32] (CR) Andrew Bogott: [C: 2] Fix IPs in wikitech apache configs [puppet] - https://gerrit.wikimedia.org/r/264915 (owner: Alex Monk)
[04:57:53] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: puppet fail
[05:25:03] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:23:22] PROBLEM - puppet last run on mw2133 is CRITICAL: CRITICAL: puppet fail
[06:26:16] (PS1) Varnent: Add ability for sysops to self-set and self-remove flood group right.
[mediawiki-config] - https://gerrit.wikimedia.org/r/264930 (https://phabricator.wikimedia.org/T86237)
[06:31:02] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:12] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:12] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:52] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:53] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:31] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:32] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:32] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:33] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:52] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:02] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:12] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 6 failures
[06:33:13] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:50:31] RECOVERY - puppet last run on mw2133 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:56:22] PROBLEM - salt-minion processes on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:32] PROBLEM - RAID on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:42] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:56:52] PROBLEM - DPKG on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:02] PROBLEM - configured eth on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:21] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:57:21] PROBLEM - dhclient process on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:21] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:57:22] PROBLEM - Disk space on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:31] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:31] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:42] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:52] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:52] PROBLEM - puppet last run on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:58:02] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:03] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:03] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:03] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:11] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:52] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:19:02] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[07:21:11] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[07:33:16] operations, HHVM: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096#1943208 (Joe) @MoritzMuehlenhoff I don't remember the specifics, but I can look into it.
[07:49:38] (PS2) Jalexander: Add ability for OfficeWiki sysops to add and remove flood group rights from themselves. [mediawiki-config] - https://gerrit.wikimedia.org/r/264930 (https://phabricator.wikimedia.org/T86237) (owner: Varnent)
[07:55:16] (CR) Jalexander: [C: 1] Add ability for OfficeWiki sysops to add and remove flood group rights from themselves.
[mediawiki-config] - https://gerrit.wikimedia.org/r/264930 (https://phabricator.wikimedia.org/T86237) (owner: Varnent)
[07:55:28] Blocked-on-Operations, Dumps-Generation, Flow, Collaboration-Team-Current, WMF-deploy-2016-01-19_(1.27.0-wmf.11): Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1943234 (Nemo_bis)
[07:55:39] Blocked-on-Operations, Dumps-Generation, Flow, Collaboration-Team-Current, WMF-deploy-2016-01-19_(1.27.0-wmf.11): Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1828193 (Nemo_bis)
[07:56:56] Blocked-on-Operations, Dumps-Generation, Flow, Collaboration-Team-Current, WMF-deploy-2016-01-19_(1.27.0-wmf.11): Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1943244 (Nemo_bis) p:Normal>High Fix priority per blocked task.
[08:00:51] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:02:01] operations: Adding ltoscano@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T123141#1943255 (elukey)
[08:02:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[08:05:39] (PS1) Giuseppe Lavagetto: Add deployment virtual name [dns] - https://gerrit.wikimedia.org/r/264932
[08:25:31] PROBLEM - NTP on alsafi is CRITICAL: NTP CRITICAL: No response from NTP server
[08:34:02] PROBLEM - Disk space on stat1002 is CRITICAL: DISK CRITICAL - free space: /a 327215 MB (3% inode=99%)
[08:44:31] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:46:32] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:01:11] RECOVERY - Disk space on stat1002 is OK: DISK OK
[09:05:22] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[09:07:31] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:10:51] operations: Create email alias that
will send emails to all Analytics Engineers - https://phabricator.wikimedia.org/T121180#1943351 (akosiaris) Added Luca Toscano (new analytics hire) as well
[09:15:51] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[09:17:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:33:50] operations, Gerrit, GitHub-Mirrors, ValueView, and 2 others: [Task] Redirect unused extensions/ValueView repository to data-values/value-view - https://phabricator.wikimedia.org/T123624#1943360 (hashar)
[09:36:52] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[09:37:45] operations, Gerrit, GitHub-Mirrors, ValueView, and 2 others: [Task] Redirect unused extensions/ValueView repository to data-values/value-view - https://phabricator.wikimedia.org/T123624#1943370 (hashar) Open>Resolved a:hashar I have removed the extension from CI and deleted the related Gi...
[09:40:52] PROBLEM - citoid endpoints health on sca1002 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.48.29:1970/api: Timeout on connection while downloading http://10.64.48.29:1970/api
[09:45:11] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[09:47:12] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[09:47:12] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:53:32] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[09:59:26] (CR) Jakob: [C: 1] Phragile: Ensure clone before creating storage dir [puppet] - https://gerrit.wikimedia.org/r/264745 (owner: WMDE-leszek)
[09:59:42] RECOVERY - citoid endpoints health on sca1002 is OK: All endpoints are healthy
[09:59:51] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[10:02:02] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[10:06:13] PROBLEM - citoid endpoints health on sca1002 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.48.29:1970/api: Timeout on connection while downloading http://10.64.48.29:1970/api
[10:06:21] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[10:16:41] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[10:18:51] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[10:20:14] <_joe_> sigh, citoid timeouts again
[10:21:28] operations, Continuous-Integration-Infrastructure, HHVM, WorkType-Maintenance: HHVM Jenkins job throw: Unable to set CoreFileSize to 8589934592: Operation not permitted (1) - https://phabricator.wikimedia.org/T78799#1943428
(hashar) >>! In T78799#1939272, @Nemo_bis wrote: > So in other words that's...
[10:23:02] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[10:27:11] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[10:29:13] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[10:33:51] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[10:35:11] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 799
[10:35:51] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[10:39:53] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[10:40:11] RECOVERY - check_mysql on db1008 is OK: Uptime: 591651 Threads: 2 Questions: 4709460 Slow queries: 3869 Opens: 1610 Flush tables: 2 Open tables: 400 Queries per second avg: 7.959 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:46:21] RECOVERY - citoid endpoints health on sca1002 is OK: All endpoints are healthy
[10:52:53] operations, HHVM: Be able to switch programmatically between deployment servers in codfw and eqiad - https://phabricator.wikimedia.org/T124024#1943475 (Joe) NEW
[10:57:42] PROBLEM - puppet last run on db2050 is CRITICAL: CRITICAL: puppet fail
[11:03:11] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[11:07:01] what's up with alsafi?
[11:07:22] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[11:07:41] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://10.64.32.153:1970/api: Timeout on connection while downloading http://10.64.32.153:1970/api
[11:09:41] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[11:15:34] operations, ops-eqiad: asw-c1-eqiad uplinks are down - https://phabricator.wikimedia.org/T124026#1943493 (faidon) NEW a:Cmjohnson
[11:15:40] (PS1) Mdann52: Add import source for ru.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/264937 (https://phabricator.wikimedia.org/T123837)
[11:15:51] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[11:17:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[11:23:12] RECOVERY - puppet last run on db2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:26:22] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[11:32:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[11:36:32] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:41:12] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[11:43:21] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[11:53:51] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[11:55:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[12:01:53] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[12:17:03] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[12:21:21] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[12:23:52] PROBLEM - puppet last run on mw2092 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:27:33] PROBLEM - SSH on
alsafi is CRITICAL: Server answer
[12:49:22] RECOVERY - puppet last run on mw2092 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:53:02] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[12:59:22] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[13:02:44] so what's going on with alsafi? :)
[13:11:23] <_joe_> looks like the usual ganeti bug?
[13:11:28] <_joe_> we've seen it already
[13:17:08] (PS1) Giuseppe Lavagetto: scap: use logical names for the rsync master [puppet] - https://gerrit.wikimedia.org/r/264943 (https://phabricator.wikimedia.org/T124024)
[13:17:10] (PS1) Giuseppe Lavagetto: role::deployment: make it possible to switch between different servers [puppet] - https://gerrit.wikimedia.org/r/264944 (https://phabricator.wikimedia.org/T124024)
[13:17:12] (PS1) Giuseppe Lavagetto: deployment: activate redis replica between the masters [puppet] - https://gerrit.wikimedia.org/r/264945 (https://phabricator.wikimedia.org/T124024)
[13:28:53] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[13:35:31] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[13:50:12] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[13:51:57] !log powercycling alsafi
[13:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:52:24] !log powercycle ms-be1001
[13:52:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:52:32] RECOVERY - RAID on alsafi is OK: OK: no RAID installed
[13:52:41] RECOVERY - DPKG on alsafi is OK: All packages OK
[13:52:41] RECOVERY - configured eth on alsafi is OK: OK - interfaces up
[13:52:51] RECOVERY - dhclient process on alsafi is OK: PROCS OK: 0 processes with command name dhclient
[13:52:51] RECOVERY - Disk space on alsafi is OK: DISK OK
[13:53:52] RECOVERY - salt-minion processes on alsafi is OK: PROCS OK: 1 process
with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:55:02] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:55:16] operations, DBA: TokuDB crashes frequently -consider upgrade it or search for alternative engines with similar features - https://phabricator.wikimedia.org/T109069#1943829 (jcrespo) Open>stalled
[13:55:51] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms
[13:55:57] paravoid: out of curiosity, what's wrong with alsafi?
[13:56:22] I think it's our usual kvm bug, I'll follow up on that later
[13:56:34] basically some kvm instances just lock up for no good reason
[13:56:40] uh
[13:56:56] I've debugged it with the qemu maintainer and qemu upstream, didn't go anywhere
[13:57:02] :(
[13:57:02] RECOVERY - very high load average likely xfs on ms-be1001 is OK: OK - load average: 17.76, 6.88, 2.51
[13:57:14] I think aio=native fixes it, if it doesn't we'll have to just backport a new version and see how that looks like
[13:57:54] from the icinga alerts it looks like SSH was having troubles but the rest was fine?
[13:58:21] no, ssh was flapping, everything else remained down
[14:02:55] oh yes, I've just seen the alerts from earlier this morning
[14:03:53] operations, DBA, Icinga, Monitoring: "db1047/eventlogging_sync processes" icinga alert is flaky since at least early January - https://phabricator.wikimedia.org/T123509#1943858 (jcrespo) This check should be marked as non-critical, and not sending pages (but alters to chat/web interface/email). For...
[14:12:22] RECOVERY - NTP on alsafi is OK: NTP OK: Offset -0.0002576112747 secs
[14:16:01] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: puppet fail
[14:17:38] operations, Analytics, Analytics-Cluster, EventBus, Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters.
- https://phabricator.wikimedia.org/T123954#1943905 (10GWicke) > I guess in a first phase we could manually instruct them to do so, but automatising co... [14:19:10] 6operations, 5Patch-For-Review, 7Swift: swift upgrade plans - https://phabricator.wikimedia.org/T117972#1943910 (10fgiunchedi) I've dist-upgraded swift in esams to trusty, the only precise machines left are ms-fe1001 -> ms-fe1004 and ms-be1001 -> ms-be1015 in terms of next steps I think we should: 1. dist-u... [14:20:02] PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: puppet fail [14:27:32] PROBLEM - SSH on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:27:42] PROBLEM - RAID on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:28:11] PROBLEM - configured eth on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:29:21] PROBLEM - DPKG on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:29:28] !log reimporting some fawiki tables from production into labsdb hosts [14:29:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:32:01] PROBLEM - salt-minion processes on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:33:23] PROBLEM - Disk space on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:34:42] PROBLEM - dhclient process on mw1161 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:35:22] RECOVERY - Disk space on mw1161 is OK: DISK OK [14:35:42] RECOVERY - DPKG on mw1161 is OK: All packages OK [14:35:59] 7Blocked-on-Operations, 10Datasets-Archiving, 10Dumps-Generation, 10Flow, and 2 others: Publish recurring Flow dumps at http://dumps.wikimedia.org/ - https://phabricator.wikimedia.org/T119511#1943950 (10Hydriz) [14:36:11] RECOVERY - SSH on mw1161 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.4 (protocol 2.0) [14:36:11] RECOVERY - salt-minion processes on mw1161 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:36:22] RECOVERY - RAID on mw1161 is OK: OK: no RAID installed [14:36:42] RECOVERY - dhclient process on mw1161 is OK: PROCS OK: 0 processes with command name dhclient [14:36:51] RECOVERY - configured eth on mw1161 is OK: OK - interfaces up [14:41:52] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:43:42] RECOVERY - puppet last run on ganeti2004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:48:01] PROBLEM - Host labstore1002 is DOWN: PING CRITICAL - Packet loss = 100% [14:54:43] RECOVERY - Host labstore1002 is UP: PING WARNING - Packet loss = 37%, RTA = 2.38 ms [14:56:51] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [14:57:02] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [14:58:09] !log reseating asw-c-eqiad uplink module (xe-1/1/0 and xe-1/1/2) [14:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:58:25] 6operations: Create HA setup for DNS recursion - https://phabricator.wikimedia.org/T79058#1943986 (10faidon) [14:59:12] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:01:21] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 
is OK: OK: Less than 1.00% above the threshold [250.0] [15:09:06] bblack: hiyaa [15:09:52] checking in about today's scheduled mobile -> text merge [15:11:22] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail [15:13:50] Hello akosiaris - if you have a moment later can you peek at https://otrs-wiki.wikimedia.org/wiki/Administrator_requests#Wrong_translation ... I already commented - don't think there's much we can do.. [15:14:02] I find it funny, though [15:14:51] (03PS1) 10Muehlenhoff: CVE-2016-0728 [debs/linux] - 10https://gerrit.wikimedia.org/r/264952 [15:14:52] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 228, down: 0, dormant: 0, excluded: 0, unused: 0 [15:14:53] 6operations: Setup basic infrastructure services in codfw - https://phabricator.wikimedia.org/T84350#1944011 (10faidon) [15:15:01] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 207, down: 0, dormant: 0, excluded: 0, unused: 0 [15:15:05] RD: lol... soup of the day... [15:15:17] I lol'd [15:15:32] cmjohnson1: whatever you did, it worked (see the RECOVERYs above) :) [15:15:39] (03CR) 10Muehlenhoff: [C: 032 V: 032] CVE-2016-0728 [debs/linux] - 10https://gerrit.wikimedia.org/r/264952 (owner: 10Muehlenhoff) [15:15:46] RD: https://github.com/OTRS/otrs/search?utf8=%E2%9C%93&q=Soep+van+de+Dag it's present in current OTRS [15:15:56] yep...i was fortunate enough to find a spare in an existing ex4200...the module was dead [15:16:26] akosiaris: ok, like I said in the comment I don't think you really need to worry about it - we'll put an upstream bug in [15:16:40] yeah, I suppose we can submit a fix given we got the correct translation. won't make it to ticket.wikimedia.org, but will make it to the new install at some point [15:16:53] 6operations, 10ops-eqiad: asw-c1-eqiad uplinks are down - https://phabricator.wikimedia.org/T124026#1944017 (10Cmjohnson) 5Open>3Resolved The uplink module was indeed dead. 
Fortunately we had a spare module hanging out in c6. [15:16:58] 6operations, 10netops: Upgrade JunOS on cr1/cr2-codfw - https://phabricator.wikimedia.org/T113640#1944019 (10faidon) Scheduled for Thursday Jan 21st, 12:00 UTC. [15:17:17] 6operations, 10Incident-Labs-NFS-20151216: Reinstall labstore1002 to ensure consistency with labstore1001 - https://phabricator.wikimedia.org/T121905#1944022 (10Cmjohnson) [15:18:56] 6operations, 10ops-codfw: rack/setup/deploy auth2001 as codfw auth system - https://phabricator.wikimedia.org/T120263#1944025 (10faidon) [15:19:58] cmjohnson1: do we have more spares? if not, we should probably order one or two [15:20:40] we do not..i was hoping I would find one and got very luck. I will create a new taks [15:20:58] * cmjohnson1 is having trouble typing this morning [15:21:11] :) [15:25:32] 6operations, 6Discovery, 5codfw-rollout: Set up a CirrusSearch cluster in codfw (Dallas, Texas) - https://phabricator.wikimedia.org/T105703#1944034 (10faidon) [15:25:39] 6operations, 6Discovery, 5codfw-rollout: Set up a CirrusSearch cluster in codfw (Dallas, Texas) - https://phabricator.wikimedia.org/T105703#1449703 (10faidon) [15:26:09] akosiaris: I was about to file upstream but apparently (per http://bugs.otrs.org/show_bug.cgi?id=11299 ) they use another system for reporting such issues. [15:26:20] I don't feel like registering [15:26:23] :p [15:26:34] 6operations, 10ops-eqiad, 10Incident-Labs-NFS-20151216, 6Labs, 10Labs-Infrastructure: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1944037 (10Cmjohnson) The new H800 card has been installed. 
We should probably schedule a time/day to move to ls1002 [15:26:42] 6operations, 6Discovery, 5codfw-rollout: Set up a CirrusSearch cluster in codfw (Dallas, Texas) - https://phabricator.wikimedia.org/T105703#1449703 (10faidon) 5Open>3Resolved We already have two tasks tracking a real world load testing (T117714 & T121741) and this task's name is a bit misleading since we... [15:31:36] 6operations, 6Analytics-Kanban, 7HTTPS, 5Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1944056 (10Ottomata) @ironholds or @tbayer, can you confirm that `clientIP`s make more sense now? [15:34:01] 6operations, 10Analytics, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1944062 (10Ottomata) Ok, so it sounds like the master-master with topics named after DCs is needed then, yes? [15:39:04] 6operations, 10Analytics, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1944072 (10GWicke) @ottomata, I think in conventional replication terminology the mirrormaker stuff is all... [15:39:42] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:40:21] 6operations, 10Wikimedia-Apache-configuration, 10Wikimedia-Site-Requests, 5codfw-appserver-setup, 5wikis-in-codfw: Configure mediawiki to operate in the Dallas DC - https://phabricator.wikimedia.org/T91754#1944078 (10Joe) As @demon pointed out, we were just missing the swift configuration, for which I wa... 
[15:44:10] (03CR) 10Filippo Giunchedi: filebackend: add configuration for codfw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [15:47:04] 6operations, 10ops-codfw, 6Labs, 5Patch-For-Review: Update tag and racktables for labcontrol2001: renamed to labtestweb2001 - https://phabricator.wikimedia.org/T123841#1944088 (10Papaul) 5Open>3Resolved a:3Papaul Complete [15:52:34] (03PS1) 10Giuseppe Lavagetto: role::deployment::server: syncronize /srv/deployment [puppet] - 10https://gerrit.wikimedia.org/r/264954 [15:52:59] 6operations, 10Analytics, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1944108 (10Ottomata) Indeed you are right, it is much more like master-slave since the topics are distinct.... [15:55:03] 6operations, 10ops-codfw: ms-be2015.codfw.wmnet: slot=11 dev=sdl failed - https://phabricator.wikimedia.org/T123830#1944114 (10Papaul) @fgiunchedi I have also a blinking amber light on drive in slot 8. Can you please check that also. Thanks. [15:57:27] 7Blocked-on-Operations, 10Deployment-Systems, 6Release-Engineering-Team, 3Scap3: Cleanup things we're not deploying anymore. - https://phabricator.wikimedia.org/T120157#1944116 (10greg) >>! In T120157#1928203, @demon wrote: > So we just need a root to `rm -R /srv/deployment{brrd,ishamel,mwprof,reporter,sca... [16:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160119T1600). Please do the needful. [16:00:04] MatmaRex yurik kart_ anomie: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[16:01:27] (03CR) 10WMDE-Fisch: [C: 031] Phragile: Ensure clone before creating storage dir [puppet] - 10https://gerrit.wikimedia.org/r/264745 (owner: 10WMDE-leszek) [16:02:10] Around. [16:02:18] I can SWAT this morning. [16:02:33] ok [16:02:35] I already started +2ing stuff [16:02:38] 7Blocked-on-Operations, 10Deployment-Systems, 6Release-Engineering-Team, 3Scap3: Cleanup things we're not deploying anymore. - https://phabricator.wikimedia.org/T120157#1944147 (10demon) Err, `scap.old`, ofc. [16:03:24] Krenair: oh, didn't see that. Please, SWAT away :) [16:04:27] 6operations, 10ops-codfw, 10Salt, 10hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1944161 (10Papaul) @RobH the HW warranty expiration date on this system is 2016-01-13. Is it okay to allocate this system for that purpose? [16:04:51] Uhhhh [16:05:00] Krenair: MatmaRex isn't here, is someone else covering his patch? [16:05:50] hmm [16:05:52] so he isn't [16:06:04] MarkTraceur, guess not. are you volunteering? [16:06:32] based on the task it should be easy enough to test [16:09:02] Hi. I want to turn a flag for a particular extension off on enwiki (https://phabricator.wikimedia.org/T121949) Where can I make that change? Which repo/file? [16:09:21] Krenair: Actually it would be tricky because you need the campaign editor right on Commons (the only place it's enabled) [16:09:29] But I'll try to find a Commons admin to help [16:09:42] Niharika: operations/mediawiki-config [16:10:33] Krenair: Can't make it worse than not being able to save pages, though, so I'd say do it to it [16:11:12] Niharika: You'll need to make a change in InitialiseSettings and CommonSettings [16:11:24] Niharika: Repo is operations/mediawiki-config. For changing something everywhere, you would just need to edit wmf-config/CommonSettings.php. 
For configuring things per-wiki, though, you need wmf-config/InitialiseSettings.php for the setting and sometimes additional code in CommonSettings.php. [16:11:54] anomie: Sounds like they want default => true, enwiki => false [16:12:09] anomie: Reedy: Thanks. Yep, that's what we want. [16:12:41] yurik, around? [16:14:03] (PS1) Muehlenhoff: Also amend debian/changelog with a pointer to the patch [debs/linux] - https://gerrit.wikimedia.org/r/264957 [16:14:32] operations, ops-codfw: ms-be2015.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T124056#1944207 (fgiunchedi) [16:15:03] operations, ops-codfw: ms-be2015.codfw.wmnet: slot=11 dev=sdl failed - https://phabricator.wikimedia.org/T123830#1944214 (fgiunchedi) indeed slot 8 seems failed too, thanks @papaul! I've opened {T124056} for that [16:17:04] operations, ops-codfw, Salt, hardware-requests: allocate hardware for salt master in codfw - https://phabricator.wikimedia.org/T123559#1944220 (RobH) @papaul: Not yet. I have this assigned to @mark for his review. We shouldn't assign spares (in non emergency situations) without clearing it with... [16:23:04] ok...l [16:23:06] clearly not [16:24:13] kart_, what about you? [16:25:03] anomie? [16:25:12] Krenair? [16:25:26] you have a patch up for swat [16:25:29] Krenair: around. [16:25:34] Krenair: Yes, I do. [16:25:40] (CR) Luke081515: [C: 1] Add ability for OfficeWiki sysops to add and remove flood group rights from themselves. [mediawiki-config] - https://gerrit.wikimedia.org/r/264930 (https://phabricator.wikimedia.org/T86237) (owner: Varnent) [16:26:19] Krenair: I need someone to look at patch deeply. file upload is still enabled, while it is supposed to be disabled. [16:28:10] kart_, so is there a task for this? [16:28:28] operations, ops-codfw: ms-be2015.codfw.wmnet: slot=11 dev=sdl failed - https://phabricator.wikimedia.org/T123830#1944252 (fgiunchedi) [16:29:20] Krenair: no.
based on value in wgUploadNavigationUrl [16:29:25] ie false. [16:29:59] kart_, so you're saying that because it has wgUploadNavigationUrl set to false, it should have uploading disabled? [16:32:19] Krenair: I have discussion link. [16:32:34] I can add it, but it is in Gujarati. Is that OK? [16:34:12] Krenair: I didn't find any commons admins yet, I'm suspecting there aren't any available right now [16:35:52] (03CR) 10Alex Monk: [C: 032] Centralize and add rights and grants in preparation for grants moving into core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264437 (owner: 10Anomie) [16:35:56] Krenair: Putting link to https://gu.wikipedia.org/wiki/%E0%AA%B5%E0%AA%BF%E0%AA%95%E0%AA%BF%E0%AA%AA%E0%AB%80%E0%AA%A1%E0%AA%BF%E0%AA%AF%E0%AA%BE:%E0%AA%9A%E0%AB%8B%E0%AA%A4%E0%AA%B0%E0%AB%8B_(%E0%AA%85%E0%AA%A8%E0%AB%8D%E0%AA%AF)#.E0.AA.9A.E0.AA.BF.E0.AA.A4.E0.AB.8D.E0.AA.B0.E0.AB.8B_.E0.AA.9A.E0.AA.A2.E0.AA.BE.E0.AA.B5.E0.AA.B5.E0.AA.BE_.E0.AA.85.E0.AA.82.E0.AA.97.E0.AB.87.E0.AA.A8.E0.AB.80_.E0.AA.A8.E0.AB.80.E0.AA.A4.E0.AA.BF_.E0.AA.AA.E0.AA. [16:36:10] Any idea what to do? [16:36:36] (03Merged) 10jenkins-bot: Centralize and add rights and grants in preparation for grants moving into core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264437 (owner: 10Anomie) [16:36:50] (03PS1) 10Kaldari: Disable active user gadget stats on testwiki (in preparation for enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264961 (https://phabricator.wikimedia.org/T121949) [16:37:15] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1944277 (10Joe) @fgiunchedi @Eevans is this task completed? If not, what's left to do? Is client-side encryption really necessary? [16:37:48] kart_, does that say they want to disable local uploads? [16:38:38] Krenair: yes. 
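[Editor's sketch] The per-wiki override discussed above ("default => true, enwiki => false", and the guwiki upload entry) can be pictured as a lookup that prefers a wiki-specific value and otherwise falls back to a default. This is only a minimal illustration in Python, not the actual mediawiki-config code; the setting name `wgExampleFeatureEnabled` and the helper `resolve_setting` are hypothetical, and the real InitialiseSettings.php may apply further rules that are not modelled here.

```python
from typing import Any, Mapping


def resolve_setting(overrides: Mapping[str, Any], wiki: str) -> Any:
    """Return the wiki-specific value if one exists, else the 'default' entry.

    Hypothetical helper: it models only the simple "default plus per-wiki
    override" pattern mentioned in the log.
    """
    if wiki in overrides:
        return overrides[wiki]
    return overrides["default"]


# Illustrative data only; not the real contents of InitialiseSettings.php.
settings = {
    "wgExampleFeatureEnabled": {"default": True, "enwiki": False},
}

if __name__ == "__main__":
    for wiki in ("enwiki", "dewiki"):
        value = resolve_setting(settings["wgExampleFeatureEnabled"], wiki)
        print(wiki, value)
```

Under these assumptions, a wiki with its own entry (enwiki) gets that value, and every other wiki gets the default.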
[16:39:26] (03PS2) 10Alex Monk: Add missing entry to disable file upload on guwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264469 (owner: 10KartikMistry) [16:39:29] RD: yeah, transifex is kind of standard for that. [16:39:52] (03PS3) 10Alex Monk: Add missing entry to disable file upload on guwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264469 (owner: 10KartikMistry) [16:40:10] (03CR) 10Alex Monk: [C: 032] Add missing entry to disable file upload on guwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264469 (owner: 10KartikMistry) [16:40:34] (03Merged) 10jenkins-bot: Add missing entry to disable file upload on guwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264469 (owner: 10KartikMistry) [16:41:15] !log krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264437/ (duration: 00m 32s) [16:41:17] anomie, ^ [16:41:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:41:21] please check [16:41:30] jouncebot: next [16:41:31] In 0 hour(s) and 18 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160119T1700) [16:42:22] Krenair: Hmm. It seems to have broken stuff somehow. Let me investigate quick. [16:42:45] what has it broken exactly? [16:43:32] Krenair: Oh, I see. It'll be fine with wmf.11, but $wgMWOAuthGrantPermissions needs to be after the OAuth extension is loaded. [16:43:41] akosiaris: hey [16:43:52] anomie, want to upload a patch to fix that? 
[16:43:59] Krenair: Working on it now [16:44:42] 6operations, 10ops-eqiad, 5Patch-For-Review: rack/setup pc1004-1006 - https://phabricator.wikimedia.org/T121888#1944304 (10Cmjohnson) [16:44:51] doing the other config patch in the mean time [16:45:15] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/264469/ (duration: 00m 31s) [16:45:17] kart_, ^ [16:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:45:27] Krenair: checking.. [16:45:42] 6operations, 10ops-eqiad, 5Patch-For-Review: rack/setup pc1004-1006 - https://phabricator.wikimedia.org/T121888#1944305 (10Cmjohnson) a:5Cmjohnson>3jcrespo Finished with setup and install/puppet signed/salt-keys added. Assigning to @jcrespo for implementation [16:45:49] Krenair: cool. Thanks! [16:46:33] (03PS1) 10Anomie: Move $wgMWOAuthGrantPermissions to after the OAuth extension is loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264964 [16:46:36] Krenair: https://gerrit.wikimedia.org/r/264964 [16:47:21] !log krenair@tin Synchronized php-1.27.0-wmf.10/extensions/Graph/modules/graph-loader.js: https://gerrit.wikimedia.org/r/#/c/264715/ (duration: 00m 31s) [16:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:47:53] anomie, no replacement for $wgMWOAuthGrantPermissionGroups['checkuser'] = 'administration'; ? [16:48:07] Krenair: Good catch. [16:49:02] (03PS2) 10Anomie: Move $wgMWOAuthGrantPermissions to after the OAuth extension is loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264964 [16:49:55] Krenair: Done. I had checked by counting lines, and thought "14 removed, 17 added, and 3 comment lines == good". Didn't notice it was also adding a blank line, so == bad. 
[16:50:21] (03CR) 10Alex Monk: [C: 032] Move $wgMWOAuthGrantPermissions to after the OAuth extension is loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264964 (owner: 10Anomie) [16:50:48] (03Merged) 10jenkins-bot: Move $wgMWOAuthGrantPermissions to after the OAuth extension is loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264964 (owner: 10Anomie) [16:51:45] !log krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/264964/ (duration: 00m 31s) [16:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:51:49] Krenair: There we go. Looks good now. [16:52:09] ori: Thanks [16:52:26] graph fix looks good too [16:52:33] (03PS3) 10Anomie: Remove $wgMWOAuthGrantPermissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264438 [16:55:00] (03PS1) 10Cmjohnson: bug: task# T123029 Removing production DNS for erbium. Leaving mgmt for future installs. [dns] - 10https://gerrit.wikimedia.org/r/264965 [16:55:45] Why can't those dns commits be done with the same commit message format as everything else? [16:55:58] (03CR) 10Cmjohnson: [C: 032] bug: task# T123029 Removing production DNS for erbium. Leaving mgmt for future installs. [dns] - 10https://gerrit.wikimedia.org/r/264965 (owner: 10Cmjohnson) [16:57:12] 7Blocked-on-Operations, 6operations, 10ops-eqiad, 5Patch-For-Review: reclaim erbium, gadolinium into spares - https://phabricator.wikimedia.org/T123029#1944342 (10Cmjohnson) 5Open>3Resolved Production DNS removed, left the management. These severs are ready for reinstall [16:57:35] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1944344 (10GWicke) @joe, we'd like to encrypt all cross-DC traffic, and some of that traffic is directly from clients to remote Cassandra nodes. We current... [16:59:32] Krenair: self-merge is the bright future of code review! 
;) [16:59:52] so I've heard [17:00:04] akosiaris mutante: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160119T1700). Please do the needful. [17:00:04] Krenair Kaldari: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [17:00:57] here [17:06:03] 6operations, 10ops-codfw: ms-be2015.codfw.wmnet: slot=11 dev=sdl failed - https://phabricator.wikimedia.org/T123830#1944364 (10Papaul) Thank you for contacting Dell Enterprise Support! The following information includes the applicable case and dispatch numbers related to our conversation: Service Request #:... [17:07:34] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1944367 (10fgiunchedi) also if I'm reading the documentation correctly clients will fallback to codfw only when no eqiad nodes are unreachable. WRT the "se... [17:08:37] mutante, akosiaris: you there? [17:10:16] 6operations, 10Wikimedia-Apache-configuration, 10Wikimedia-Site-Requests, 5wikis-in-codfw: Configure mediawiki to operate in the Dallas DC - https://phabricator.wikimedia.org/T91754#1944370 (10Krenair) [17:19:32] kaldari, oh [17:19:36] your patch is to mediawiki-config [17:19:57] true [17:20:07] this is the puppet swat [17:20:10] normal swat was an hour ago [17:20:15] ah [17:20:18] <_joe_> heh [17:20:28] I'll move it to the evening window then [17:20:37] curse my metal body! [17:20:50] :) [17:21:21] soo... who's doing puppet swat? [17:22:28] Week of 2016-01-18: Daniel / Alex [17:22:31] https://office.wikimedia.org/wiki/Operations/Operations_Meeting_Notes/TechOps-2016-01-13#PuppetSWAT [17:22:35] mutante: akosiaris ^^^^ [17:23:08] eh, i missed swat [17:23:23] (also, no one on clinic duty?) :) [17:24:21] <_joe_> do we have patches and no one is around? 
[17:24:23] <_joe_> I can help [17:24:28] * _joe_ looks [17:24:33] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [17:25:08] <_joe_> Krenair: I already looked at the patch, so it's a no-brainer for me :) [17:27:18] <_joe_> just running it through the puppet compiler to be sure :) [17:29:40] (03CR) 10Chad: [C: 04-1] "If it's not being branched yet why do we want it in extension-list though...won't that blow up trying to reference an extension we're not " (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [17:30:00] <_joe_> Krenair: I'm manually rebasing, gerrit fails to do it [17:30:07] <_joe_> but well, it should be ok [17:30:09] ok [17:31:01] <_joe_> and ofc git succeeds where gerrit fails :P [17:31:08] (03PS5) 10Giuseppe Lavagetto: Begin to merge production and beta apache config, starting with nonexistent.conf [puppet] - 10https://gerrit.wikimedia.org/r/244237 (https://phabricator.wikimedia.org/T86644) (owner: 10Alex Monk) [17:31:14] <_joe_> which is why it worked on the puppet compiler :P [17:31:42] (03PS2) 10BBlack: cache_text: add mobile IPs to loopback [puppet] - 10https://gerrit.wikimedia.org/r/258458 (https://phabricator.wikimedia.org/T109286) [17:31:59] if you guy have time left over after puppet swat, any chance we could retry https://gerrit.wikimedia.org/r/#/c/264969/ from regular swat? 
sorry i missed it, was sick/asleep :/ and editing of all campaign pages on commons is still broken [17:32:00] (CR) Giuseppe Lavagetto: [C: 2 V: 2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/244237 (https://phabricator.wikimedia.org/T86644) (owner: Alex Monk) [17:32:21] Krenair: _joe_: ^ [17:32:28] <_joe_> MatmaRex: it's not really the same thing, but you can ask deployers :) [17:32:49] <_joe_> puppetswat is done by ops, swat by releng (mostly) [17:32:57] i know, it's an entirely different thing, but it would've been fine 30 minutes ago and if nobody's deploying anything afterwards… :) [17:32:58] * Krenair coughs [17:34:42] <_joe_> I just don't want to step on anyone's toes [17:34:44] I thought I had done half of the swats for the past week [17:36:31] <_joe_> Krenair: it seems ok, thanks for doing that [17:42:15] (CR) Giuseppe Lavagetto: "@Aaron the code seems good, I still have some doubts about how will mediawiki interact with swift:" [mediawiki-config] - https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto) [17:46:48] MatmaRex, will retry your patch [17:47:15] Krenair: thanks.
marxarelli apparently also wants to deploy it [17:47:20] so sort it out D: [17:47:23] oh, okay [17:47:27] well I +2'd the commit [17:47:39] once it goes through jenkins, either of us can do the server side part [17:48:29] (03PS2) 10Eevans: EventBus configuration (currently disabled) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) [17:48:55] One of you but hopefully not both [17:49:00] (03CR) 10Eevans: EventBus configuration (currently disabled) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [17:49:49] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944445 (10RobH) 3NEW a:3Papaul [17:50:33] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944483 (10RobH) Alternatively, Papaul may have pre-patched this and now it just needs to be rolled (fibers flipped) [17:50:34] marxarelli, are you doing it? [17:50:52] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944488 (10RobH) [17:51:06] Krenair: cutting the branch now. actual deploy will be a while [17:51:34] marxarelli, MatmaRex's UW backport... [17:51:46] not the train [17:51:50] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944445 (10RobH) [17:52:11] 6operations, 10netops: turn-up/implement zayo wave (579171) for ulsfo-codfw - https://phabricator.wikimedia.org/T122885#1944491 (10RobH) [17:52:13] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944445 (10RobH) [17:52:38] Krenair: right. 
i'll leave that one up to you if that's cool [17:52:50] ok [17:52:54] sorry, should have just left it. deployment confusion this morning [17:53:03] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:54:27] !log krenair@tin Synchronized php-1.27.0-wmf.10/extensions/UploadWizard/UploadWizard.config.php: https://gerrit.wikimedia.org/r/#/c/264969/ (duration: 00m 31s) [17:54:28] MatmaRex, ^ [17:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:54:48] yay, thank you [18:05:24] jynus, Krenair, can we try to sort out https://phabricator.wikimedia.org/T124002 ? [18:09:22] (PS1) Alex Monk: [WIP] Move portals into generic sites.pp [puppet] - https://gerrit.wikimedia.org/r/264978 (https://phabricator.wikimedia.org/T86644) [18:09:59] andrewbogott, yes, I created the task with the intention of getting the problems sorted out :P [18:10:30] Krenair: is it possible to disable the feature selectively, just on labtestweb? [18:10:37] (CR) jenkins-bot: [V: -1] [WIP] Move portals into generic sites.pp [puppet] - https://gerrit.wikimedia.org/r/264978 (https://phabricator.wikimedia.org/T86644) (owner: Alex Monk) [18:10:51] andrewbogott, I suppose so, but I also question whether we really want it on wikitech [18:11:10] ok. Maybe I don’t understand what it does… we certainly do have vandalism issues on wikitech [18:11:34] it adds centrally (meta) controlled abuse filters [18:11:52] but therefore relies on the main db servers [18:12:39] which works, right? centrally set abuse filters are currently in effect on wikitech? [18:13:09] andrewbogott, right, but I thought we didn't want to rely on the main db servers? [18:13:23] isn't that the point of silver running its own mysql server? [18:13:32] hm… it depends on what we mean by ‘rely’. If the main db servers go down, will wikitech go down? [18:14:12] or just temporarily less abuse-filtered?
[18:14:13] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 118, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO) {#11541} [10Gbps DWDM] [18:14:23] andrewbogott, not entirely, some pages will cease to work. You can see this effect currently on labtestwiki [18:14:39] what’s an example of a page that stops working? [18:14:45] see the ticket [18:14:47] Special:RecentChanges [18:14:59] oh, I see [18:15:11] that doesn’t worry me much. I think it’s probably worth it if we get some free vandal-fighting [18:15:16] okay [18:15:42] so we should disable the feature at labtestwiki instead of setting up the permissions for it? [18:15:57] The “don’t depend on main db servers” rule is so we still have access to our docs in case of catastrophe. It doesn’t sound like this breaks that rule. [18:16:44] Yeah, let’s disable it. [18:16:51] on labtestwikitech [18:17:29] sorry, i'm here now. is there stuff left that was in swat and isn't done? looking [18:18:03] no mutante [18:18:04] it's done [18:18:22] andrewbogott, other thing it breaks is history pages I believe [18:18:34] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 120, down: 0, dormant: 0, excluded: 0, unused: 0 [18:18:37] oh, hm... [18:18:37] at least, some of them: https://labtestwikitech.wikimedia.org/wiki/Main_Page?action=history [18:18:41] well, that worries me a bit more [18:19:09] (PS2) Alex Monk: [WIP] Move portals into generic sites.pp [puppet] - https://gerrit.wikimedia.org/r/264978 (https://phabricator.wikimedia.org/T86644) [18:19:15] yeah, ok, that matters. Let’s disable it everywhere! [18:19:20] * andrewbogott is suddenly easy to convince [18:19:56] Krenair: ok, thanks [18:20:15] andrewbogott, I'm not trying to convince you to go either way [18:20:39] I know :) But I am nonetheless convinced. History pages sometimes matter when troubleshooting.
[18:21:27] (btw, is this the same issue that I’m getting when I try to create an account? Can't connect to MySQL server on ’10.192.48.20’?) [18:21:45] 10Ops-Access-Requests, 6operations: Create new puppet group `discovery-analytics-deploy` - https://phabricator.wikimedia.org/T122620#1944593 (10jcrespo) Allowing sudo -u discovery /srv/deployment/discovery/analytics/bin/discovery-deploy-to-hdfs and deploying to /srv/deployment/discovery/analytics is definitily... [18:22:25] 6operations, 10netops: turn-up/implement zayo wave (579171) for ulsfo-codfw - https://phabricator.wikimedia.org/T122885#1944596 (10Papaul) [18:22:27] 6operations, 10ops-codfw: patch new circuit IPYX/125449/004/ZYO (zayo ulsfo-codfw wave) - https://phabricator.wikimedia.org/T124062#1944594 (10Papaul) 5Open>3Resolved complete [18:23:09] andrewbogott, likely. abuse filter triggers on account creations. [18:23:16] (03PS3) 10Ema: cache_text: add mobile IPs to loopback [puppet] - 10https://gerrit.wikimedia.org/r/258458 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [18:23:21] and therefore it has to go and look at global filters [18:23:29] yep, ok, two birds then [18:23:54] (03CR) 10Ema: [C: 032 V: 032] cache_text: add mobile IPs to loopback [puppet] - 10https://gerrit.wikimedia.org/r/258458 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [18:25:39] 6operations, 10ops-codfw: note/label the allocated ulsfo-eqidfw xconnects that aren't in active use (two of them) - https://phabricator.wikimedia.org/T124069#1944603 (10RobH) 3NEW a:3Papaul [18:25:57] (03PS1) 10Alex Monk: Disable global abuse filters on nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264980 (https://phabricator.wikimedia.org/T124002) [18:26:38] (03CR) 10Andrew Bogott: [C: 031] Disable global abuse filters on nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264980 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [18:28:20] 6operations, 10ops-codfw: 
note/label the allocated ulsfo-eqidfw xconnects that aren't in active use (two of them) - https://phabricator.wikimedia.org/T124069#1944639 (10RobH) This should stop us from accidentally plugging in other connections to these ports. [18:28:43] _joe_, when you put that change through puppet-compiler earlier, what node list did you use? [18:30:52] 6operations, 7Mail: remove exim alias feedbacktest@ - https://phabricator.wikimedia.org/T123665#1944643 (10Dzahn) 5Open>3Resolved Thank you, @jdlrobson. done and removed ``` -## Random -feedbacktest: tfinc, pchang, feedbackbot@jonrobson.me.uk - ``` [18:30:54] 6operations, 7Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#1944645 (10Dzahn) [18:32:26] jouncebot: next [18:32:27] In 0 hour(s) and 27 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160119T1900) [18:33:53] looks like it was mw1018 [18:34:14] (03PS2) 10Dzahn: [Planet Wikimedia] Update Pau Giner domain [puppet] - 10https://gerrit.wikimedia.org/r/264544 (owner: 10Nemo bis) [18:34:35] (03CR) 10Dzahn: "rebased/path conflict" [puppet] - 10https://gerrit.wikimedia.org/r/264544 (owner: 10Nemo bis) [18:34:57] (03PS3) 10Dzahn: [Planet Wikimedia] Update Pau Giner domain [puppet] - 10https://gerrit.wikimedia.org/r/264544 (owner: 10Nemo bis) [18:35:04] (03CR) 10Dzahn: [C: 032] [Planet Wikimedia] Update Pau Giner domain [puppet] - 10https://gerrit.wikimedia.org/r/264544 (owner: 10Nemo bis) [18:36:34] ema: hi, i see a change of yours when merging on the puppetmaster [18:37:07] bblack: did you ask jouncebot because of that change? [18:37:32] mutante: yeah [18:37:38] (03PS1) 10Cmjohnson: bug: task# T123785 Creating new shell user Justin Clark/jdcc [puppet] - 10https://gerrit.wikimedia.org/r/264982 [18:37:39] mutante: please hold, we'll sort it out [18:38:01] yep, so my planet change is totally harmless . 
can be merged anytime [18:42:51] !log Starting migration of mobile traffic to text cluster https://phabricator.wikimedia.org/T109286 [18:42:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:44:13] Krenair: shall I sign that change up for swat? [18:44:45] (03PS2) 10Chad: Generate mediawiki-installation dsh group file from hiera data [puppet] - 10https://gerrit.wikimedia.org/r/247324 (https://phabricator.wikimedia.org/T86644) [18:45:28] andrewbogott, I think my issue is different, those errors are silver trying to connect to extension1 shard [18:45:51] but probably caused by a similar thing- bad configuration [18:47:12] (03PS2) 10Cmjohnson: bug: task# T123785 Creating new shell user Justin Clark/jdcc [puppet] - 10https://gerrit.wikimedia.org/r/264982 [18:47:55] (03PS2) 10Dzahn: [Planet Wikimedia] Add Andrew Gray and William Beutler [puppet] - 10https://gerrit.wikimedia.org/r/264543 (owner: 10Nemo bis) [18:48:00] jynus: ok — but this is something that started happening recently? Or has it always been going on? [18:48:17] recent, like a month or 2 ago [18:48:46] oh, huh. [18:49:03] (03CR) 10Cmjohnson: [C: 032] bug: task# T123785 Creating new shell user Justin Clark/jdcc [puppet] - 10https://gerrit.wikimedia.org/r/264982 (owner: 10Cmjohnson) [18:54:20] (03CR) 10Dduvall: [C: 032] EventBus configuration (currently disabled) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [18:54:41] jynus: I have no idea, and am probably not the right one to ask. Do you mind opening a new phab case about that? 
[18:54:44] (03Merged) 10jenkins-bot: EventBus configuration (currently disabled) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/263797 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [18:54:50] (03PS1) 10Cmjohnson: bug: task# T123785 Granting access for jdcc to access analytics-privatedata-users group and "bastiononly" access [puppet] - 10https://gerrit.wikimedia.org/r/264984 [18:55:09] I think there is already [18:55:43] https://phabricator.wikimedia.org/T124002 ? or a different one? [18:55:57] https://phabricator.wikimedia.org/T121866 [18:56:19] (03PS2) 10Andrew Bogott: Disable global abuse filters on nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264980 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [18:57:11] oh, probably flow related [18:57:45] but also querying metawiki metawiki [18:58:19] I think it is the same issue, partially [18:58:22] Echo also uses x1 [18:58:38] yes, I am looking at the actual dbs queried [18:58:41] Echo is working on wikitech, one way or another [18:58:52] Although I suppose we might be able to configure it to not use x1 for labswiki [18:59:11] question is mostly [18:59:16] (03PS1) 10Chad: apache_status.py: two minor comment fixes for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/264986 [18:59:25] should it use those central services yes or not? [18:59:35] before trying to fix it [18:59:46] jynus: ‘not' [18:59:57] (03PS3) 10Dzahn: [Planet Wikimedia] Add Andrew Gray and William Beutler [puppet] - 10https://gerrit.wikimedia.org/r/264543 (owner: 10Nemo bis) [18:59:58] <3 the quotes [19:00:05] marxarelli: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160119T1900). Please do the needful. [19:00:11] (03CR) 10Aaron Schulz: "That will be handled in another patch. 
I haven't even touched the FileBackendMultiWrite portions yet, which will have the new backends add" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197499 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [19:00:19] (03CR) 10Dzahn: [C: 032 V: 032] [Planet Wikimedia] Add Andrew Gray and William Beutler [puppet] - 10https://gerrit.wikimedia.org/r/264543 (owner: 10Nemo bis) [19:00:30] andrewbogott, are you ok with adding flow and or deployment and sking to not use x1? [19:00:35] !log starting branch cut for 1.27.0-wmf.11 [19:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:00:59] jynus: adding to the ticket you mean? [19:01:02] yes [19:01:29] yes, but it’s most likely a config change that I can make myself [19:01:35] is it? [19:01:46] because it may require data migration, etc. [19:01:48] If you want labswiki to be cordoned off, we should be able to configure things so they never use x1 [19:01:57] I was thinking I would just turn flow off [19:02:00] Oh yes but we might need to migrate data [19:02:04] since it clearly hasn’t ever been working [19:02:04] andrewbogott, +1 [19:02:17] Yeah sounds good [19:02:35] Echo does work, so maybe the echo_* tables exist locally? [19:02:44] (03PS2) 10Dzahn: admin: jdcc to access analytics-privatedata-users and bastiononly [puppet] - 10https://gerrit.wikimedia.org/r/264984 (https://phabricator.wikimedia.org/T123785) (owner: 10Cmjohnson) [19:02:49] if it never worked, it's the best [19:02:52] 6operations, 6Labs, 7Wikimedia-log-errors: labswiki cannot connect to x1-slave (db1031), and soon, x1-master, either [Error connecting to 10.64.16.20: :real_connect(): (HY000/2003): Can't connect to MySQL server on '10.64.16.20' (4)] - https://phabricator.wikimedia.org/T121866#1944794 (10Andrew) Probably the... 
[19:03:08] I’m happy for someone else to enable it for dogfood purposes, but that should maybe go under a separate ticket [19:03:10] There is no labswiki DB on x1 [19:03:10] So for Echo etc you should be fine [19:03:27] Yeah, Flow can be enabled without x1 access but it needs special config [19:03:31] 6operations, 6Labs, 7Wikimedia-log-errors: labswiki cannot connect to x1-slave (db1031), and soon, x1-master, either [Error connecting to 10.64.16.20: :real_connect(): (HY000/2003): Can't connect to MySQL server on '10.64.16.20' (4)] - https://phabricator.wikimedia.org/T121866#1944795 (10Andrew) a:3Andrew [19:03:39] that should clear the errors [19:03:58] and leave only the ones related to the other ticket or show other errors, if any [19:04:26] the meta ones are probably related to the abusefilter [19:05:07] and there I can help, if it is a mysql issue (although probably it is network?) [19:05:28] (03PS1) 10Andrew Bogott: Disable flow on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264988 (https://phabricator.wikimedia.org/T121866) [19:05:36] RoanKattouw: ^ [19:06:16] (03CR) 10Catrope: [C: 032] Disable flow on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264988 (https://phabricator.wikimedia.org/T121866) (owner: 10Andrew Bogott) [19:06:22] is it wikitech or labswiki? [19:06:45] wikitech is a dblist that contains labswiki and labtestwiki [19:06:53] ok :-) [19:07:09] The domain name is wikitech.wm.o and the dbname is labswiki [19:07:11] (03Merged) 10jenkins-bot: Disable flow on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264988 (https://phabricator.wikimedia.org/T121866) (owner: 10Andrew Bogott) [19:07:12] For historical reasons [19:07:21] andrewbogott: Mind deploying that yourself? [19:07:35] I knew that, I didnt know about the list. Thanks! 
[19:07:51] RoanKattouw: I don’t mind although I’m slightly afraid [19:08:01] jynus: the list is new as of yesterday [19:08:22] I can do it too if you like [19:08:32] um… yes please [19:08:34] (03CR) 10Dzahn: [C: 031] "yes, analytics-privatedata-users (and bastion access) is what is requested and per https://wikitech.wikimedia.org/wiki/Analytics/Data_acce" [puppet] - 10https://gerrit.wikimedia.org/r/264984 (https://phabricator.wikimedia.org/T123785) (owner: 10Cmjohnson) [19:08:45] I haven’t done this in a while [19:08:46] thanks [19:09:07] (03PS3) 10Cmjohnson: admin: jdcc to access analytics-privatedata-users and bastiononly [puppet] - 10https://gerrit.wikimedia.org/r/264984 (https://phabricator.wikimedia.org/T123785) [19:09:33] 10Ops-Access-Requests, 6operations: Create new puppet group `discovery-analytics-deploy` - https://phabricator.wikimedia.org/T122620#1944826 (10EBernhardson) /srv/deployment/discovery/analytics would be deployed via trebuchet from tin, pretty much the same as how the existing refinery repo is deployed to /srv/... 
[19:10:35] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [19:10:35] PROBLEM - puppet last run on mw2060 is CRITICAL: CRITICAL: puppet fail [19:10:59] (03CR) 10Cmjohnson: [C: 032] admin: jdcc to access analytics-privatedata-users and bastiononly [puppet] - 10https://gerrit.wikimedia.org/r/264984 (https://phabricator.wikimedia.org/T123785) (owner: 10Cmjohnson) [19:11:25] RoanKattouw: you could deploy https://gerrit.wikimedia.org/r/#/c/264980/ too while you’re at it, if you feel like it :) [19:11:34] marxarelli: Re https://gerrit.wikimedia.org/r/#/c/263797/ please do not merge things in mediawiki-config unless you are about to deploy them imminently [19:11:38] * RoanKattouw will have to deploy that change now [19:12:15] 10Ops-Access-Requests, 6operations, 5Patch-For-Review, 5WMF-NDA: Requesting access to analytics-privatedata-users for jdcc-berkman - https://phabricator.wikimedia.org/T123785#1944854 (10Cmjohnson) [19:12:29] RoanKattouw: rgr [19:12:46] (03PS2) 10Dzahn: apache_status.py: two minor comment fixes for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/264986 (owner: 10Chad) [19:12:53] 10Ops-Access-Requests, 6operations, 5Patch-For-Review, 5WMF-NDA: Requesting access to analytics-privatedata-users for jdcc-berkman - https://phabricator.wikimedia.org/T123785#1944856 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson This task has been completed. Please try to login and re-open this task if... [19:12:53] (03CR) 10Dzahn: [C: 032] apache_status.py: two minor comment fixes for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/264986 (owner: 10Chad) [19:13:39] * andrewbogott -> breakfast and office [19:13:43] !log catrope@tin Synchronized wmf-config/extension-list: Add EventBus (duration: 00m 31s) [19:13:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:13:56] thanks RoanKattouw! 
[19:14:38] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Disable Flow on wikitech; add EventBus plumbing (duration: 00m 31s) [19:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:15:19] !log catrope@tin Synchronized wmf-config/CommonSettings.php: EventBus plumbing (duration: 00m 30s) [19:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:15:55] 6operations, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1944872 (10Ottomata) [19:17:15] that worked, no errors in the last 5 minutes except 1 related to recentchanges [19:18:00] andrewbogott: Re https://gerrit.wikimedia.org/r/#/c/264980/ let's put that in the evening SWAT [19:18:17] Since AIUI it's not causing errors and stuff [19:19:46] 10Ops-Access-Requests, 6operations, 6Services, 3Mobile-Content-Service: Allow mobrovac to restart MobileApps - https://phabricator.wikimedia.org/T123540#1944890 (10Dzahn) alright, let's make a new admin group. the service name is "mobileapps" ``` root@scb1001:~# service mobileapps status ● mobileapps.ser... [19:23:10] 10Ops-Access-Requests, 6operations: Create new puppet group `discovery-analytics-deploy` - https://phabricator.wikimedia.org/T122620#1944898 (10EBernhardson) See https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery for the existing deployment that we are duplicating from a different repository / set... 
[19:24:45] Krenair, sorry, wasn't around, but thanks for merging - checked, looks good [19:25:01] (03PS1) 10Dzahn: admin: add new group mobileapps-admins [puppet] - 10https://gerrit.wikimedia.org/r/264991 (https://phabricator.wikimedia.org/T123540) [19:27:25] (03CR) 10Dzahn: [C: 032] "adds new but empty group" [puppet] - 10https://gerrit.wikimedia.org/r/264991 (https://phabricator.wikimedia.org/T123540) (owner: 10Dzahn) [19:29:21] (03PS1) 10Chad: ganglia_gdnsd.py: two minor comment fixes for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/264993 [19:31:48] (03CR) 10Dzahn: [C: 032] ganglia_gdnsd.py: two minor comment fixes for pep8 [puppet] - 10https://gerrit.wikimedia.org/r/264993 (owner: 10Chad) [19:35:54] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:36:47] (03PS1) 10Dzahn: admin: add datacenter-ops to install-server role [puppet] - 10https://gerrit.wikimedia.org/r/264994 (https://phabricator.wikimedia.org/T123681) [19:37:10] (03PS2) 10Dzahn: admin: add datacenter-ops to install-server role [puppet] - 10https://gerrit.wikimedia.org/r/264994 (https://phabricator.wikimedia.org/T123681) [19:37:54] RECOVERY - puppet last run on mw2060 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [19:52:37] (03PS1) 10Chad: ganglia diskstat.py: pep8 fixes all over the place [puppet] - 10https://gerrit.wikimedia.org/r/264997 [19:56:59] 10Ops-Access-Requests, 6operations, 6Services, 3Mobile-Content-Service, 5Patch-For-Review: Allow mobrovac to restart MobileApps - https://phabricator.wikimedia.org/T123540#1945023 (10Cmjohnson) This requires sudo level access and will need to discussed in next ops meeting first [19:58:03] 6operations, 10ops-codfw: ms-be2015.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T124056#1945038 (10Cmjohnson) a:3Papaul Assigning this to papaul [19:58:45] 6operations, 10ops-codfw: ms-be2015.codfw.wmnet: slot=11 dev=sdl failed - 
https://phabricator.wikimedia.org/T123830#1945043 (10Cmjohnson) a:3Papaul [19:59:13] 6operations, 10ops-eqiad: decom protactinium (datacenter) - https://phabricator.wikimedia.org/T123798#1945047 (10Cmjohnson) a:3Cmjohnson [20:02:55] (03PS1) 10Dduvall: Add 1.27.0-wmf.11 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265010 [20:02:57] (03PS1) 10Dduvall: Group0 to 1.27.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265011 [20:06:11] 6operations, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1945096 (10Krinkle) >>! In T123954#1944108, @Ottomata wrote: > I think I'd prefer suffixing, but I'm not sure. E.g. `med... [20:09:34] 6operations, 6Parsing-Team, 10Parsoid, 6Services: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328#1945147 (10Dzahn) the mysql server version before: 5.5.46-0ubuntu0.12.04.2 and after it's going to be jessie with 5.5.46-0ubuntu0.12.04.2, so same thing. theref... [20:10:25] 6operations, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1945176 (10Ottomata) `main-eqiad` is the name (I have chosen :?) for the Kafka cluster, so I'd like to keep it consistent... [20:10:42] 6operations, 6Parsing-Team, 10Parsoid, 6Services: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328#1945178 (10mobrovac) >>! In T122328#1945147, @Dzahn wrote: > the mysql server version before: 5.5.46-0ubuntu0.12.04.2 and after it's going to be jessie with 5.5... [20:11:02] 6operations, 6Parsing-Team, 10Parsoid, 6Services: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328#1945181 (10Dzahn) >>! In T122328#1945178, @mobrovac wrote: >>>! 
In T122328#1945147, @Dzahn wrote: >> the mysql server version before: 5.5.46-0ubuntu0.12.04.2 an... [20:11:21] (03CR) 10Dduvall: [C: 032] Add 1.27.0-wmf.11 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265010 (owner: 10Dduvall) [20:12:03] (03Merged) 10jenkins-bot: Add 1.27.0-wmf.11 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265010 (owner: 10Dduvall) [20:12:36] !log ruthenium: service mysql stop [20:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:12:54] RoanKattouw: there is no evening swat today, is there? [20:13:11] !log ruthenium: disable puppet, copy data over to osmium (screen) [20:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:15:27] 6operations, 10ops-codfw: patch new zayo transit xconnect into cr2-codfw:xe-5/3/1 - https://phabricator.wikimedia.org/T124088#1945221 (10RobH) 3NEW a:3Papaul [20:15:38] 6operations, 10Analytics-Cluster, 10EventBus, 6Services: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1945236 (10Krinkle) >>! In T123954#1945176, @Ottomata wrote: > `main-eqiad` is the name for the Kafka cluster, so I'd like... [20:16:50] 6operations, 10ops-codfw: patch new zayo transit xconnect into cr2-codfw:xe-5/3/1 - https://phabricator.wikimedia.org/T124088#1945221 (10RobH) [20:17:07] 6operations, 10ops-codfw: patch new zayo transit xconnect into cr2-codfw:xe-5/3/1 - https://phabricator.wikimedia.org/T124088#1945221 (10RobH) [20:17:11] mutante, got a moment to help me with some puppet stuff? [20:17:38] trying to do some more prod/beta sites.pp merging [20:18:23] RoanKattouw: also it’s causing one error, which is that I can’t work on this project without it [20:18:51] jynus: did those db errors quiet down? 
[20:21:00] andrewbogott, they certainly went down, there are some left, at least metawiki, I do not know if something else [20:21:15] jynus: great, so that’s probably the second patch that’s needed. [20:22:32] it can be seen here: https://logstash.wikimedia.org/#dashboard/temp/AVJbjVI9ptxhN1XaWwP4 [20:23:33] 10Ops-Access-Requests, 6operations: Create new puppet group `discovery-analytics-deploy` - https://phabricator.wikimedia.org/T122620#1945346 (10Ottomata) @jcrespo, the deployment script that Erik is talking about here deploys files to HDFS, not to servers. Normal deploy processes are used to get repos from ti... [20:23:44] !log dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache [20:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:25:15] !log dduvall@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="labtestwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.jRNpeW67FO" ' returned non-zero exit status 1 (duration: 01m 31s) [20:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:25:46] Sweet [20:26:42] Krenair: have a moment to deply https://gerrit.wikimedia.org/r/#/c/264980/ ? [20:26:45] *deploy [20:26:50] (03PS1) 10Dzahn: osmium: temp. add rsyncd to copy ruthenium data [puppet] - 10https://gerrit.wikimedia.org/r/265097 (https://phabricator.wikimedia.org/T122328) [20:27:01] andrewbogott: The train is supposed to be going [20:27:05] If scap just didn't fail [20:27:08] oh, ok, sorry [20:27:10] marxarelli: Need a hand? 
[20:27:47] Reedy: could use it, yeah [20:28:13] I'm a little busy anyway [20:28:24] Extension /srv/mediawiki-staging/php-1.27.0-wmf.9/extensions/EventBus/extension.json doesn't exist [20:28:49] oh man, I thought it was only active wikiversions that needed a backport [20:29:03] Currently active MediaWiki versions: 1.27.0-wmf.10, 1.27.0-wmf.9 [20:29:05] .9 is active for some wikis [20:29:07] It is the active version :) [20:29:13] s/the/an/ [20:29:29] marxarelli: Simplest fix is to just add the extension to wmf.9 [20:29:39] labswiki and labtestwiki [20:29:40] I think people killed my extension-list version workaround hacks [20:30:46] (03PS2) 10Dzahn: osmium: temp. add rsyncd to copy ruthenium data [puppet] - 10https://gerrit.wikimedia.org/r/265097 (https://phabricator.wikimedia.org/T122328) [20:31:01] marxarelli: Unfortunately, we broke the semantic extension branches, hence those still being on .9 [20:31:55] marxarelli: You alright adding the extension to wmf.9? [20:32:53] Reedy: yeah, shouldn't be a problem [20:33:02] trial by fire [20:33:04] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). 
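[editor's sketch] The scap failure above came from an extension listed in extension-list that was missing from one of the active branches (wmf.9). A minimal pre-flight check along those lines might look like the following. It is an illustration only, not part of scap: the function name `missing_extensions` is hypothetical, and it assumes each extension lives at `<staging>/php-<version>/extensions/<name>/extension.json`, which mirrors the path in the error message but simplifies the real extension-list format.

```python
from pathlib import Path


def missing_extensions(staging_root, active_versions, extension_names):
    """Return (version, extension) pairs whose extension.json is absent.

    Hypothetical helper: the directory layout is assumed from the error
    message in the log, not taken from scap itself.
    """
    missing = []
    for version in active_versions:
        for name in extension_names:
            manifest = (
                Path(staging_root)
                / f"php-{version}"
                / "extensions"
                / name
                / "extension.json"
            )
            # Every active branch must carry every listed extension.
            if not manifest.is_file():
                missing.append((version, name))
    return missing
```

Called with the two active versions from the log (1.27.0-wmf.10 and 1.27.0-wmf.9) and `["EventBus"]`, it would report the wmf.9 entry if that branch lacks the extension.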
[20:33:14] (03PS1) 10Chad: ganglia compat.py: couple of pep8 fixes, mostly whitespace [puppet] - 10https://gerrit.wikimedia.org/r/265100 [20:33:16] (03PS1) 10Chad: ganglia nginx_status.py: bunch of pep8 fixes [puppet] - 10https://gerrit.wikimedia.org/r/265101 [20:33:18] (03PS1) 10Chad: ganglia: util.py: bunch of pep8 fixes [puppet] - 10https://gerrit.wikimedia.org/r/265102 [20:33:20] (03PS1) 10Chad: ganglia: udp2log_socket.py: bunch of pep8 fixes [puppet] - 10https://gerrit.wikimedia.org/r/265103 [20:33:22] (03PS1) 10Chad: ganglia: udp_stats.py: bunch of pep8 fixes [puppet] - 10https://gerrit.wikimedia.org/r/265104 [20:38:33] 6operations, 6Parsing-Team, 10Parsoid, 6Services, 5Patch-For-Review: Update ruthenium to Debian jessie from Ubuntu 12.04 - https://phabricator.wikimedia.org/T122328#1945474 (10Dzahn) I added an rsyncd on osmium and data is being copied over now. @ruthenium:/mnt/data# rsync -avz /mnt/data/ rsync://osmium... [20:39:12] I'm seeing a spike in 4xx, is that related to 20:28 < Reedy> Extension /srv/mediawiki-staging/php-1.27.0-wmf.9/extensions/EventBus/extension.json doesn't exist [20:39:22] bblack: It shouldn't be [20:39:26] seems to be going back down already [20:39:27] scap should've aborted [20:39:30] https://grafana.wikimedia.org/dashboard/db/varnish-http-errors [20:39:34] so, it won't have rolled anything out [20:39:58] andrewbogott: There is one, at 4pm Pacific, the wiki page is confusing because that's very early tomorrow in UTC [20:40:19] ah, of course. 
I’m constantly making that mistake [20:40:21] andrewbogott: If it's blocking you, then I'm OK deploying it, but it looks like marxarelli and Reedy are having fun with the deployment tools right now [20:40:37] yeah, I’ll wait for them at least [20:40:57] You should be fine to do, depending on how long it'll take marxarelli to add the extension :) [20:41:15] OK I'll just go right now [20:41:37] (03CR) 10Catrope: [C: 032] Disable global abuse filters on nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264980 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [20:42:07] oh hmm, so the "spike" is in percentage terms [20:42:14] ACKNOWLEDGEMENT - DPKG on ruthenium is CRITICAL: DPKG CRITICAL dpkg reports broken packages daniel_zahn https://phabricator.wikimedia.org/T122328 [20:42:21] stats is showing a total reqs dropoff on the right I think, but it might be artificial... [20:42:29] (03Merged) 10jenkins-bot: Disable global abuse filters on nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/264980 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [20:42:35] thank you RoanKattouw! [20:42:46] yeah there was some big lag in stats reaching graphite for reqs for a bit there... [20:42:59] it fixed itself now :) [20:45:03] 10Ops-Access-Requests, 6operations: Create new puppet group `discovery-analytics-deploy` - https://phabricator.wikimedia.org/T122620#1945521 (10jcrespo) Sorry, I misread the request. I'm ok with it. [20:46:55] Reedy: https://gerrit.wikimedia.org/r/#/c/265109/ [20:46:57] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Disable global AbuseFilters on non-global wikis (duration: 02m 04s) [20:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:47:54] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. 
[20:48:17] marxarelli: You should probably add a branch (you could reuse wmf.10) [20:48:21] !log tin: deleted unused things from /srv/deployment (T120157) [20:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:48:39] But, as long as people don't enable it on wikitech et al (which they shouldn't be), I don't think it matters [20:50:08] * Reedy uses wmf.10 [20:50:52] 7Blocked-on-Operations, 10Deployment-Systems, 6Release-Engineering-Team, 3Scap3: Cleanup things we're not deploying anymore. - https://phabricator.wikimedia.org/T120157#1945570 (10Dzahn) ok. rm -rf /srv/deployment/scap .. :) no, just kidding ``` root@tin:/srv/deployment# file /srv/deployment/{brrd,isha... [20:51:14] 7Blocked-on-Operations, 10Deployment-Systems, 6Release-Engineering-Team, 3Scap3: Cleanup things we're not deploying anymore. - https://phabricator.wikimedia.org/T120157#1945574 (10Dzahn) 5Open>3Resolved a:3Dzahn [20:51:37] ... and qunit crapped out for some reason [20:52:15] marxarelli: I'd be tempted to just C+2 and V+2 [20:52:32] mutante: ty on T120157 [20:52:57] ostriches: welcome :) [20:53:29] 6operations, 6Labs, 5Patch-For-Review, 7Wikimedia-log-errors: labswiki cannot connect to x1-slave (db1031), and soon, x1-master, either [Error connecting to 10.64.16.20: :real_connect(): (HY000/2003): Can't connect to MySQL server on '10.64.16.20' (4)] - https://phabricator.wikimedia.org/T121866#1945582 (10... [20:53:31] Reedy: yeah, the tests all pass. it looks like some teardown code that failed [20:56:11] 6operations, 7Graphite: Wes Moran not able to log into Graphite - https://phabricator.wikimedia.org/T123796#1945600 (10Dzahn) @Eliza hi, see above. Maybe you can advice Wes to use a Wikitech user (if he has one already). If he doesn't have one he could create one and then we just have to add him to the right e... [20:56:45] marxarelli: merged [20:57:08] marxarelli: stage the extension, then run scap again :) [20:57:25] Reedy: awesome. 
thanks for the help [20:58:47] !log dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache [20:59:24] Reedy: what do we need to do to get wikitech caught back up? Finally upgrade SMW? [20:59:37] !log sync-common on labtestweb2001 [20:59:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:59:53] bd808, or ditch it. [20:59:57] bd808: Upgrade SMW. Remove SMW. Backport wfMessage() fixes. Add back compat shims to wikitech.php [21:00:02] I *think* we could build a wikidata-like "cooked" extension for SMW [21:00:11] I'd like to try removing SMW from labtestwikitech and see what happens.... andrewbogott? [21:00:34] Krenair: fine with me — might be nice to have it working first though [21:01:02] haha [21:01:07] andrewbogott: you must be new around here [21:01:12] :P [21:01:26] :) [21:01:28] what, have labtestwikitech working first? [21:01:37] well, so we can tell what breaks when we break it :) [21:03:15] andrewbogott, right, so what's still not working on labtestwikitech? [21:03:36] account creation didn’t work last I tried. Trying again now [21:04:11] can’t test much of the good stuff without an account [21:04:27] I suppose we need to point it at separate ldap, nova, etc. installs? [21:04:58] it mostly is already, in theory [21:05:12] heya, i need some systemd advice again! [21:05:17] paravoid: still around? [21:05:19] hm, account creation still says Can't connect to MySQL server on '10.192.48.20' [21:05:24] oh no, you are not [21:05:25] nm [21:05:26] hmm [21:05:41] guess its pretty late in europe [21:05:49] andrewbogott: that's a s1 slave [21:06:21] right, I don’t know why we’re hitting it at all [21:06:48] looking [21:09:54] marxarelli: Is scap progressing fine? [21:11:53] ok [21:11:55] it's still AF [21:12:12] 6operations, 7Graphite: Wes Moran not able to log into Graphite - https://phabricator.wikimedia.org/T123796#1945730 (10eliza) Will reach out to Wes. Thanks. 
Eliza [21:12:16] YuviPanda: how's your systemd fu? [21:12:17] Reedy: you can stalk the scap progress on fluorine with `tail -f /a/mw-log/scap.log | python ~bd808/scaplog.py` [21:12:34] $ mwscript eval.php labtestwiki [21:12:34] > var_dump( $wgAbuseFilterCentralDB ); [21:12:34] string(8) "metawiki" [21:12:41] despite us just having tried to disable this [21:12:51] (03PS4) 10Dzahn: Use %{TIME_YEAR} instead of updating Wikimania redirects every year [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:12:52] Reedy: yep. just chugging away on l10n update [21:13:18] (03PS5) 10Dzahn: Use %{TIME_YEAR} instead of updating Wikimania redirects every year [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:13:26] ottomata: not very strong :) [21:13:29] what's going on [21:14:59] !log dduvall@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.qyk48j8kem" ' returned non-zero exit status 1 (duration: 16m 11s) [21:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:15:07] (03PS1) 10Andrew Bogott: Disable abusefilter on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265127 [21:15:09] Krenair: maybe ^ ? [21:15:36] oh boy [21:16:02] andrewbogott, I don't think we should disable the whole of abusefilter [21:16:10] ok [21:16:40] (03Abandoned) 10Andrew Bogott: Disable abusefilter on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265127 (owner: 10Andrew Bogott) [21:17:15] (03CR) 10Reedy: "Before the next wiki is created... Aren't we doing to have broken links?" [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:17:46] marxarelli: Ugh, again? [21:18:00] Reedy: :( [21:18:02] (03CR) 10Dzahn: "Reedy, yea, we'd have to make sure to create the new wikis always before New Year's Eve.. we did so far..." 
[puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:18:14] [2c8e8829] [no req] Exception from line 31 of /srv/mediawiki-staging/php-1.27.0-wmf.11/extensions/Validator/Validator.php: Validator depends on the ParamProcessor library. [21:18:25] ffs [21:18:29] wikitech strikes again [21:18:35] Reedy, do you know how tags take precedence over each other? [21:18:45] (03CR) 10Dzahn: "i don't think it's an issue because we already created 2017 wiki before we switched the redirect to 2016 wiki" [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:19:01] e.g. if I set medium => true but nonglobal => false, which applies to a wiki which is both medium and nonglobal? [21:19:45] Reedy: yea, so it's a year apart, we had 2017 wiki created before wikimania.org redirect switched from 2015 to 2016 [21:19:52] Krenair: I'm not sure it's determinstic [21:20:18] does it depend on which order PHP feels like reading them in? [21:20:26] Probably [21:20:35] Krenair: The usual workaround is to specify the dbname explicitly in the list, if you need to override a dblist [21:21:09] (03CR) 10Dzahn: [C: 031] Use %{TIME_YEAR} instead of updating Wikimania redirects every year [puppet] - 10https://gerrit.wikimedia.org/r/262670 (owner: 10Chad) [21:21:32] Reedy: something missing from vendor? [21:21:44] marxarelli: Looks like it's something "new" [21:21:57] aude: Is ParamProcessor a wikibase/wikidata thing? [21:22:09] marxarelli: Yup [21:22:10] https://github.com/wikimedia/mediawiki-extensions-Validator/blob/0935cc257740711bd37149be5c3b98fe88ba3c01/composer.json#L27 [21:22:25] Hmm, it's not new [21:22:27] wtf [21:23:44] marxarelli: https://github.com/wikimedia/mediawiki-tools-release/commit/fe463e195268b79c8813e4d9767a0e4660ee4d9a [21:24:15] marxarelli: It's not using the 0.5.x branch [21:25:14] PROBLEM - salt-minion processes on osmium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:25:37] marxarelli: Yup [21:25:46] Non of the explicitly branched extensions are branched [21:26:07] :/ [21:26:10] ostriches: You broke it [21:26:33] (03PS2) 10Dzahn: Phragile: Ensure clone before creating storage dir [puppet] - 10https://gerrit.wikimedia.org/r/264745 (owner: 10WMDE-leszek) [21:27:05] (03CR) 10Dzahn: [C: 032] Phragile: Ensure clone before creating storage dir [puppet] - 10https://gerrit.wikimedia.org/r/264745 (owner: 10WMDE-leszek) [21:27:09] Well, not branched, not using the correct target branches [21:27:34] PROBLEM - RAID on osmium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:14] PROBLEM - Disk space on osmium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:59] (03PS1) 10Alex Monk: Really disable global abusefilters on the nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265135 (https://phabricator.wikimedia.org/T124002) [21:29:13] osmium is doing that because i keep it busy by copying data [21:29:24] RECOVERY - salt-minion processes on osmium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [21:29:33] marxarelli: https://gerrit.wikimedia.org/r/265134 [21:29:43] RECOVERY - RAID on osmium is OK: OK: no RAID installed [21:29:54] (03CR) 10Alex Monk: [C: 032] Really disable global abusefilters on the nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265135 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [21:30:26] (03Merged) 10jenkins-bot: Really disable global abusefilters on the nonglobal wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265135 (https://phabricator.wikimedia.org/T124002) (owner: 10Alex Monk) [21:31:04] YuviPanda: https://gerrit.wikimedia.org/r/#/c/241582/1 [21:32:21] mutante: yeah, needs a bit more time and co-ordination and time... 
[21:32:34] RECOVERY - Disk space on osmium is OK: DISK OK [21:33:12] YuviPanda: gotcha, ok [21:33:16] !log Finished migrating mobile traffic to text cluster in codfw (Mexico + green US states on this map https://phabricator.wikimedia.org/T114659) [21:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:33:55] !log krenair@tin Synchronized dblists/nonglobal.dblist: https://gerrit.wikimedia.org/r/265135 (duration: 03m 21s) [21:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:34:44] marxarelli: I suspect breaking CentralNotice wouldn't have been good either [21:35:01] definitely not! [21:35:07] (03PS2) 10Dzahn: Sentry: really create group [puppet] - 10https://gerrit.wikimedia.org/r/263019 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [21:35:24] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/265135 (duration: 00m 32s) [21:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:30] (03PS3) 10Dzahn: Sentry: really create group [puppet] - 10https://gerrit.wikimedia.org/r/263019 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [21:35:39] (03CR) 10Dzahn: [C: 032] Sentry: really create group [puppet] - 10https://gerrit.wikimedia.org/r/263019 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [21:36:54] (03PS1) 10Aaron Schulz: Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265139 [21:38:01] (03CR) 10Aaron Schulz: [C: 032] Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265139 (owner: 10Aaron Schulz) [21:38:07] andrewbogott, so RecentChanges and history pages work now [21:38:11] andrewbogott, try create account? 
[21:38:23] (03Merged) 10jenkins-bot: Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265139 (owner: 10Aaron Schulz) [21:38:26] * andrewbogott tries [21:38:54] PROBLEM - configured eth on osmium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:54] PROBLEM - puppet last run on osmium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:38:56] it works! Or at least says it works [21:39:11] andrewbogott: [21:40:54] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:40:54] RECOVERY - configured eth on osmium is OK: OK - interfaces up [21:41:51] icinga-wm: pssst [21:42:44] Krenair: creation seemed to work but actually logging in as that user gets me [2368eae1] 2016-01-19 21:42:25: Fatal exception of type "MWException" [21:42:47] super helpful! [21:44:07] marxarelli: jenkins just merged it [21:44:25] PHP Fatal error: Class 'SFFormInput' not found [21:44:41] 2016-01-19 21:42:25 labtestweb2001 labtestwiki exception ERROR: [2368eae1] /w/index.php?title=Special:UserLogin&action=submitlogin&type=login&returnto=Main+Page MWException from line 3716 of /srv/mediawiki/php-1.27.0-wmf.9/includes/user/User.php: CAS update failed on user_touched for user ID '1' (read from slave); the version of the user to be saved is older than the current version. {"exception_id":"2368eae1"} [21:45:22] mutante: where? [21:45:32] Reedy: on silver in apache error log [21:45:43] Reedy: weeeee! updating submodules now [21:45:49] ah, ok, it’s trying to hit keystone. That’s progress! [21:45:51] I will fix [21:45:59] Still need to run scap again :( [21:46:47] (03PS2) 10Dzahn: Remove plugin repository [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:47:02] mutante: lots of them? 
[21:47:02] (03PS3) 10Dzahn: gerrit: remove plugin repository [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:47:29] Reedy: no, i think it's old. nevermind [21:47:37] heh [21:48:07] Reedy: like 2 days ago [21:48:19] Reedy: submodule update is giving me hell for some reason [21:48:39] (03PS4) 10Dzahn: gerrit: remove plugin repository [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:48:47] marxarelli: pastebin? [21:49:09] (03PS1) 10Eevans: Enable EventBus extension (post-deploy) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) [21:49:46] (03CR) 10Reedy: "Does everywhere really need it enabling? Including wikitech et al?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [21:49:48] (03CR) 10Dzahn: [C: 032] gerrit: remove plugin repository [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:50:50] urandom: FYI, in https://phabricator.wikimedia.org/T123954, we are talking about changing the topic names [21:51:28] Krenair: do you have working changes to wikiversions.json? [21:51:36] will it be possible to set the meta.topic field dynamically in mediawiki based on the datacenter/$::site name the app server is running in? [21:51:43] no [21:51:52] (03CR) 10Dzahn: "Error: Failed to apply catalog: Could not find dependency Git::Clone[operations/gerrit/plugins] for File[/var/lib/gerrit2/review_site/lib/" [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:52:02] RoanKattouw_away, or marxarelli might? [21:52:21] AaronSchulz: i do [21:52:33] train is running long and wikiversions is modified for testwiki only right now [21:52:43] marxarelli: what's the submodules error? 
[21:52:50] long train, I see [21:53:06] Reedy: sorry, sec [21:55:02] Reedy: https://phabricator.wikimedia.org/P2485 [21:55:25] git submodule update tried to rebase SemanticMediaWiki and failed [21:55:28] (03PS1) 10Dzahn: gerrit: remove require for unused plugin repo [puppet] - 10https://gerrit.wikimedia.org/r/265144 [21:55:43] You don't want it to rebase... [21:55:45] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: puppet fail [21:55:56] rm -rf extensions/SemanticMediaWiki [21:56:04] (03CR) 10Dzahn: [C: 032] gerrit: remove require for unused plugin repo [puppet] - 10https://gerrit.wikimedia.org/r/265144 (owner: 10Dzahn) [21:56:10] (03PS1) 10Andrew Bogott: Add some special-case handling for the labtestwiki OpenStack and ldap setup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265145 [21:56:11] and then git submodule update --init --recursive extensions/SemanticMediaWiki ? [21:56:12] (03CR) 10Dzahn: "needed https://gerrit.wikimedia.org/r/#/c/265144/1" [puppet] - 10https://gerrit.wikimedia.org/r/263634 (owner: 10Chad) [21:56:31] yeah, there are a number of ext submodules that borked. i'll clean em up [21:56:49] Reedy: that's what i did. not sure why it tried to rebase [21:57:11] old git problems? [21:57:24] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [21:57:54] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:58:13] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [21:58:19] Krenair: https://gerrit.wikimedia.org/r/#/c/265145/1 has more special-case ugliness [21:59:37] Reedy: maybe. think i got it cleaned up now. i just need to reapply a security patch then re-scap [22:00:11] cool [22:01:15] 3rd time lucky.. 
[22:01:47] !log dduvall@tin Started scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache [22:07:43] andrewbogott, lgtm if you know those new server IPs/hosts etc. are correct [22:09:13] Reedy: so far so good [22:09:19] Krenair: I’m sure it’s imcomplete, but should get us further. [22:09:27] marxarelli: I saw some other noise, but it's not a blocker [22:09:32] let's do this then [22:09:47] Noting scap is running [22:09:49] atm [22:10:04] ah yes [22:10:25] andrewbogott, mind waiting a bit? [22:10:31] no problem [22:11:02] labtestweb2001 isn't included in scap but I'd prefer to wait until scap has finished [22:11:28] Reedy: i don't think we use param processor [22:11:39] 99% sure, but might be used smw or somethihg else [22:11:43] Yeah, it is [22:11:55] I thought Wikidata might be including it for some reason [22:12:00] aude: nvm then, thanks anyway :) [22:12:23] 6operations, 10ops-codfw: spare EX-UM-2X4SFP ex4200 uplink check - https://phabricator.wikimedia.org/T124104#1945937 (10RobH) 3NEW a:3Papaul [22:12:26] ok [22:17:13] (03CR) 10Eevans: "> Does everywhere really need it enabling? Including wikitech et al?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [22:22:28] (03PS1) 10Milimetric: Enable limn-multimedia-data cron [puppet] - 10https://gerrit.wikimedia.org/r/265152 [22:23:05] (03CR) 10Reedy: "Restbase is, but then wmgUseRestbaseVRS disables some of the functionality. What exactly, it disabled, I've no idea. 
So it sort of looks l" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [22:24:17] (03PS2) 10Eevans: Enable EventBus extension (post-deploy) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) [22:24:31] (03CR) 10Ottomata: [C: 032] Enable limn-multimedia-data cron [puppet] - 10https://gerrit.wikimedia.org/r/265152 (owner: 10Milimetric) [22:24:54] (03CR) 10Alex Monk: "Does it *really* need to be enabled on loginwiki, votewiki etc.?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [22:25:10] MarkTraceur: so all that stuff's merged now, your jobs will run. If they fail, they'll leave logs on stat1003 and you'll basically not see the output on datasets.wikimedia.org. If that happens, let me know [22:31:01] Krenair: Question is whether login/vote should be disabled in wmgUseRestbaseVRS too [22:32:24] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:32:34] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:32:54] Ooh nifty [22:33:35] milimetric: Where will the metrics show up? [22:35:23] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Puppet has 1 failures [22:39:35] 6operations, 10ops-codfw: spare EX-UM-2X4SFP ex4200 uplink check - https://phabricator.wikimedia.org/T124104#1946008 (10Papaul) a:5Papaul>3RobH I don't have space of the EX-UM-2X4SFP on site. [22:39:49] papaul: thx for checking! [22:40:49] (03PS3) 10Eevans: Enable EventBus extension (post-deploy) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) [22:41:28] 6operations, 10ops-codfw: spare EX-UM-2X4SFP ex4200 uplink check - https://phabricator.wikimedia.org/T124104#1946015 (10RobH) 5Open>3Resolved I've noted the results on the T124078, resolving. 
[22:42:03] 6operations, 10ops-codfw: patch new zayo transit xconnect into cr2-codfw:xe-5/3/1 - https://phabricator.wikimedia.org/T124088#1946017 (10Papaul) 5Open>3Resolved patch ID 11542 [22:43:29] (03PS4) 10Eevans: Enable EventBus extension (post-deploy) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) [23:03:58] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:05:32] Reedy: done with the train deploy? [23:05:45] andrewbogott: Not me doing it, but no, it's still going [23:05:50] rebuilding cdbs [23:05:53] ok [23:06:31] It's an hour or so in [23:06:34] It shouldn't be long left [23:06:38] Well, for the long parts [23:06:45] Still the bumps and changeovers to do [23:07:27] marxarelli: What's the % done of this step? [23:07:57] Reedy: 75% on scap-rebuild-cdbs [23:08:02] getting there [23:08:58] 6operations, 10DBA, 7Icinga, 7Monitoring: "db1047/eventlogging_sync processes" icinga alert is flaky since at least early January - https://phabricator.wikimedia.org/T123509#1946164 (10hoo) [23:09:57] (03PS2) 10Dereckson: Add davidabian.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/259003 (https://phabricator.wikimedia.org/T121383) [23:13:50] !log dduvall@tin Finished scap: testwiki to php-1.27.0-wmf.11 and rebuild l10n cache (duration: 72m 03s) [23:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:14:09] (03PS2) 10Dduvall: Group0 to 1.27.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265011 [23:14:18] yay [23:14:27] (03CR) 10Alex Monk: [C: 032] Add some special-case handling for the labtestwiki OpenStack and ldap setup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265145 (owner: 10Andrew Bogott) [23:14:36] oh, you have another patch [23:14:41] this'll be really quick [23:14:43] (03CR) 10Dduvall: [C: 032] Group0 to 1.27.0-wmf.11 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/265011 (owner: 10Dduvall) [23:14:50] (03Merged) 10jenkins-bot: Add some special-case handling for the labtestwiki OpenStack and ldap setup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265145 (owner: 10Andrew Bogott) [23:14:56] bah, wtf. [23:15:02] commit 06e10cd7905e32b9616951260870451766595160 [23:15:03] Author: Aaron Schulz [23:15:03] Date: Tue Jan 19 13:37:01 2016 -0800 [23:15:03] Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog [23:15:04] What is this? [23:15:12] (03Merged) 10jenkins-bot: Group0 to 1.27.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265011 (owner: 10Dduvall) [23:15:40] AaronSchulz, ^ [23:15:58] Krenair: should i hold off on bumping versions? [23:16:38] thank you Krenair [23:16:56] you can probably get away with merging on tin and then not syncing CommonSettings [23:17:02] you're just going to sync-wikiversions right marxarelli? [23:17:08] Krenair: right [23:17:21] yeah... go for it [23:18:55] 6operations, 10MediaWiki-API, 10Traffic, 7Monitoring: Set up action API latency / error rate metrics & alerts - https://phabricator.wikimedia.org/T123854#1946193 (10GWicke) It turns out that there are some Varnish backend metrics ([example](https://graphite.wikimedia.org/render/?width=857&height=556&target... [23:18:59] !log dduvall@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.11 [23:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:19:36] Reedy: hey, thanks for the help [23:19:42] np :) [23:19:48] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[23:22:54] !log krenair@tin Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/265145 (duration: 02m 24s) [23:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:23:46] (03PS1) 10Madhuvishy: apache: Add http to https redirection for simplestatic role [puppet] - 10https://gerrit.wikimedia.org/r/265162 [23:23:46] have reverted the CommonSettings change on tin so it doesn't get accidentally deployed [23:24:03] andrewbogott, running sync-common on labtestweb2001 now [23:24:38] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [23:25:56] andrewbogott, okay... try now? [23:26:12] ok! [23:28:04] (03CR) 10Yuvipanda: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:29:24] (03CR) 10Madhuvishy: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:30:33] (03CR) 10Yuvipanda: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:32:46] (03CR) 10Madhuvishy: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:33:29] !log aaron@tin Synchronized wmf-config/CommonSettings.php: Bump $wgJobBackoffThrottling to lower the htmlcacheupdate backlog (duration: 00m 32s) [23:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:34:31] (03CR) 10Yuvipanda: apache: Add http to https redirection for simplestatic role (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:36:28] AaronSchulz, welcome back? 
You could've git reset --hard HEAD^ [23:36:44] so I've done that now [23:37:26] Krenair: I’m distracted by a different semi-urgent thing, but feel free to create yourself an account and see what happens :) Hopefully I’ll be back on this soon [23:37:59] (03CR) 10Mobrovac: [C: 031] Enable EventBus extension (post-deploy) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265142 (https://phabricator.wikimedia.org/T116786) (owner: 10Eevans) [23:39:04] There was either an authentication database error or you are not allowed to update your external account. [23:39:16] yeah, same for me [23:39:57] (03CR) 10Madhuvishy: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy) [23:40:22] 2016-01-19 23:40:09 labtestweb2001 labtestwiki ldap INFO: 2.1.0 Using servers: ldap://labtestservices2001.wikimedia.org:389 [23:40:22] 2016-01-19 23:40:09 labtestweb2001 labtestwiki ldap INFO: 2.1.0 Entering getDomain [23:40:22] 2016-01-19 23:40:09 labtestweb2001 labtestwiki ldap INFO: 2.1.0 Using TLS [23:40:22] 2016-01-19 23:40:10 labtestweb2001 labtestwiki ldap INFO: 2.1.0 Failed to start TLS. [23:41:47] andrewbogott, I can telnet to that port, but "openssl s_client -connect labtestservices2001.wikimedia.org:389" fails the SSL handshake [23:42:39] Krenair: ok, so a cert issue maybe… the ldap server works for local queries and edits but I’m not sure if that’s with ssl [23:46:14] andrewbogott, did you copy data from the normal ldap to this one? 
[23:46:27] Krenair: yes — it’s almost exactly the same as the original [23:46:31] except for some password changes [23:46:40] and that I just now deleted my account so I could recreate [23:47:09] I can ldapsearch on both LDAP servers (labtestservices2001 and ldap-labs.eqiad) from terbium [23:49:04] (03PS1) 10Dereckson: Add *.archives.gov to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265163 (https://phabricator.wikimedia.org/T124080) [23:51:07] (User creation log); 21:38 . . User account 129.12.195.232 (Talk) was created ‎ [23:51:09] ugh, wtf? [23:51:17] Krenair: ok, I’m back paying attention to this [23:51:29] so, presumably wikitech is using some other ldap protocol than ldapsearch, right? [23:51:38] It registered my IP as a user [23:51:57] pragmatic! [23:52:48] I don't know much about LDAP, but I think ldapsearch is just a tool that speaks the Lightweight Directory Access Protocol, right? [23:52:51] hm, user creation doesn’t even show up in the ldap logs, maybe it isn’t getting that far [23:53:00] Yeah, I think so [23:54:15] (03PS1) 10Dereckson: Add *.bodleian.ox.ac.uk to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265165 (https://phabricator.wikimedia.org/T121779) [23:54:49] we're using starttls [23:55:12] http://stackoverflow.com/a/18847820/1306662 suggests openssl won't work with that [23:55:27] ok, meaning that your test isn’t useful? [23:55:31] I think so [23:56:00] who are the LDAP experts on ops? [23:57:54] moritz [23:58:04] (and me, I guess :( ) [23:58:24] we don’t know that this is an ldap issue though, do we? [23:58:39] no, but moritz might know where the issue is [23:59:44] (03CR) 10Yuvipanda: apache: Add http to https redirection for simplestatic role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/265162 (owner: 10Madhuvishy)
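Note on the STARTTLS point above (23:54:49 to 23:55:31): a minimal sketch, assuming the server on port 389 follows RFC 4511, of why a bare `openssl s_client -connect host:389` cannot complete a handshake there. With StartTLS the client first sends a plaintext LDAP extended request and only begins the TLS handshake after the server answers it, so a client that opens with a TLS ClientHello is speaking the wrong protocol at that moment. The helper names `ber_tlv` and `start_tls_request` are hypothetical and exist only for this illustration; they are not part of the LdapAuthentication extension or any library mentioned in the log.

```python
# OID of the LDAP StartTLS extended operation (RFC 4511, section 4.14.1).
START_TLS_OID = b"1.3.6.1.4.1.1466.20037"


def ber_tlv(tag: int, value: bytes) -> bytes:
    """Encode one BER element using the short length form (value < 128 bytes)."""
    if len(value) >= 128:
        raise ValueError("short-form length only; value too long for this sketch")
    return bytes([tag, len(value)]) + value


def start_tls_request(message_id: int = 1) -> bytes:
    """Build the plaintext LDAPMessage that asks the server to begin TLS."""
    if not 0 <= message_id < 128:
        raise ValueError("sketch supports single-byte message IDs only")
    request_name = ber_tlv(0x80, START_TLS_OID)    # [0] requestName (the OID)
    extended_req = ber_tlv(0x77, request_name)     # [APPLICATION 23] ExtendedRequest
    msg_id = ber_tlv(0x02, bytes([message_id]))    # INTEGER messageID
    return ber_tlv(0x30, msg_id + extended_req)    # SEQUENCE: the LDAPMessage


request = start_tls_request()
# The first byte is 0x30 (a BER SEQUENCE), not 0x16 (a TLS handshake record),
# which is what a plain TLS client would send first.
print(request.hex())
```

To test the same path from the command line, a StartTLS-aware client is needed; for example, OpenLDAP's `ldapsearch -ZZ` issues this extended operation and refuses to continue if it fails.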